Algorithms & Data Structures: From Complexity Analysis to AI/ML Algorithms

Table of Contents

  1. Complexity Analysis
  2. Core Data Structures
  3. Graph Algorithms
  4. Dynamic Programming
  5. Advanced Algorithms
  6. AI/ML Algorithm Connections
  7. Coding Interview Patterns
  8. Quiz: Test Your Knowledge

1. Complexity Analysis

Big-O, Omega, and Theta Notation

The core of algorithm analysis is expressing mathematically how time and space resources grow with input size n.

| Notation | Meaning | Description |
| --- | --- | --- |
| O(f(n)) | Upper bound | Does not grow faster than f(n) in the worst case |
| Ω(f(n)) | Lower bound | Does not grow slower than f(n) in the best case |
| Θ(f(n)) | Tight bound | Grows at exactly the rate of f(n) |

Common complexity classes (from slowest to fastest growth):

O(1) < O(log n) < O(n) < O(n log n) < O(n²) < O(n³) < O(2ⁿ) < O(n!)

Recurrence Relations and the Master Theorem

The running time of a divide-and-conquer algorithm is expressed as a recurrence, which can often be solved with the Master Theorem.

T(n) = aT(n/b) + f(n)
  • a = number of subproblems, b = division factor, f(n) = split/merge cost

Three cases of the Master Theorem:

  1. f(n) = O(n^(log_b(a) - ε)) → T(n) = Θ(n^log_b(a))
  2. f(n) = Θ(n^log_b(a)) → T(n) = Θ(n^log_b(a) · log n)
  3. f(n) = Ω(n^(log_b(a) + ε)) → T(n) = Θ(f(n))

Example: Merge sort T(n) = 2T(n/2) + O(n) → Case 2 → Θ(n log n)
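
As a concrete instance of Case 2, a minimal merge sort sketch whose running time follows exactly this recurrence:

def merge_sort(arr):
    # T(n) = 2T(n/2) + O(n) -> Θ(n log n) by Case 2
    if len(arr) <= 1:
        return arr
    mid = len(arr) // 2
    left = merge_sort(arr[:mid])    # T(n/2)
    right = merge_sort(arr[mid:])   # T(n/2)
    merged, i, j = [], 0, 0         # merge step: O(n)
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    return merged + left[i:] + right[j:]

print(merge_sort([5, 2, 4, 7, 1, 3]))  # [1, 2, 3, 4, 5, 7]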

Amortized Analysis

Analyzes the average cost per operation over a sequence of operations, even when individual operations can be expensive.

Dynamic Array example (see the sketch after this list):

  • Doubling capacity on overflow → that push is O(n)
  • Total cost of n pushes: O(1) + O(1) + ... + O(n) ≈ O(2n) = O(n)
  • Amortized cost: O(1) per operation
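
A quick counting sketch of this doubling behavior (the cost model, counting element copies, is an assumption for illustration):

def count_copies(n_pushes):
    capacity, size, copies = 1, 0, 0
    for _ in range(n_pushes):
        if size == capacity:   # overflow: double capacity, copy all elements
            copies += size
            capacity *= 2
        size += 1              # the push itself
    return copies

# Total copies stay below 2n, so the amortized cost per push is O(1)
print(count_copies(1000))  # 1023 copies for 1000 pushes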

2. Core Data Structures

Complexity Summary

| Data Structure | Access | Search | Insert | Delete | Space |
| --- | --- | --- | --- | --- | --- |
| Array | O(1) | O(n) | O(n) | O(n) | O(n) |
| Linked List | O(n) | O(n) | O(1) | O(1) | O(n) |
| Stack | O(n) | O(n) | O(1) | O(1) | O(n) |
| Queue | O(n) | O(n) | O(1) | O(1) | O(n) |
| Hash Table | - | O(1)* | O(1)* | O(1)* | O(n) |
| BST (balanced) | O(log n) | O(log n) | O(log n) | O(log n) | O(n) |
| Heap | O(1) (max) | O(n) | O(log n) | O(log n) | O(n) |

*Average case

Red-Black Tree

A self-balancing BST that automatically maintains balance through rotations and recoloring.

5 properties:

  1. Every node is either red or black
  2. The root is black
  3. All leaves (NIL) are black
  4. Both children of every red node are black
  5. All paths from root to any leaf have the same number of black nodes
# Red-Black Tree Rotation - Left Rotation concept
#
#     x                y
#    / \    →         / \
#   a   y            x   c
#      / \          / \
#     b   c        a   b
#
def left_rotate(tree, x):
    y = x.right          # y is x's right child
    x.right = y.left     # move y's left subtree to x's right
    if y.left != tree.NIL:
        y.left.parent = x
    y.parent = x.parent  # link x's parent to y
    if x.parent == tree.NIL:
        tree.root = y
    elif x == x.parent.left:
        x.parent.left = y
    else:
        x.parent.right = y
    y.left = x           # put x on y's left
    x.parent = y

Heap and Priority Queue

A heap is a complete binary tree stored flat in an array: the children of the node at index i live at indices 2i+1 and 2i+2, and its parent at (i-1)//2. Python's heapq module implements a binary min-heap, which the Dijkstra example below relies on.

import heapq

# Using min-heap for Dijkstra's algorithm
def dijkstra(graph, start):
    distances = {node: float('inf') for node in graph}
    distances[start] = 0
    # Store (distance, node) tuples in the heap
    heap = [(0, start)]
    visited = set()

    while heap:
        dist, node = heapq.heappop(heap)
        if node in visited:
            continue
        visited.add(node)

        for neighbor, weight in graph[node]:
            new_dist = dist + weight
            if new_dist < distances[neighbor]:
                distances[neighbor] = new_dist
                heapq.heappush(heap, (new_dist, neighbor))

    return distances

# Example usage
graph = {
    'A': [('B', 1), ('C', 4)],
    'B': [('C', 2), ('D', 5)],
    'C': [('D', 1)],
    'D': []
}
print(dijkstra(graph, 'A'))
# {'A': 0, 'B': 1, 'C': 3, 'D': 4}

3. Graph Algorithms

BFS vs DFS Comparison

| Property | BFS | DFS |
| --- | --- | --- |
| Data structure | Queue | Stack / recursion |
| Shortest path | Yes (unweighted) | No |
| Space complexity | O(V) (queue width) | O(V) (recursion depth) |
| Use cases | Shortest path, level-order traversal | Cycle detection, topological sort, connected components |

Shortest Path Algorithm Comparison

| Algorithm | Negative edges | Detects negative cycles | Time complexity | Scope |
| --- | --- | --- | --- | --- |
| Dijkstra | No | No | O((V+E) log V) | Single source |
| Bellman-Ford | Yes | Yes | O(VE) | Single source |
| Floyd-Warshall | Yes | Yes | O(V³) | All pairs |

Floyd-Warshall Implementation

def floyd_warshall(n, edges):
    INF = float('inf')
    # dist[i][j] = shortest distance from i to j
    dist = [[INF] * n for _ in range(n)]
    for i in range(n):
        dist[i][i] = 0
    for u, v, w in edges:
        dist[u][v] = w

    for k in range(n):          # intermediate node
        for i in range(n):      # source node
            for j in range(n):  # destination node
                if dist[i][k] + dist[k][j] < dist[i][j]:
                    dist[i][j] = dist[i][k] + dist[k][j]
    return dist

MST: Kruskal vs Prim

Kruskal (edge-based, uses Union-Find; see the sketch after this list):

  • Sort all edges by weight ascending, then greedily add edges that don't form cycles
  • Time complexity: O(E log E)
  • Better for sparse graphs

Prim (vertex-based, uses min-heap):

  • Grow MST from a start node by repeatedly adding the minimum-weight edge crossing the cut
  • Time complexity: O(E log V)
  • Better for dense graphs
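
A minimal Kruskal sketch; a simplified Union-Find is inlined here (Section 5 covers the full structure with union by rank), and the (weight, u, v) edge format is assumed for convenient sorting:

def kruskal(n, edges):
    # edges: list of (weight, u, v); nodes are 0..n-1
    parent = list(range(n))

    def find(x):                       # simplified Union-Find
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    mst, total = [], 0
    for w, u, v in sorted(edges):      # sort by weight: O(E log E)
        ru, rv = find(u), find(v)
        if ru != rv:                   # skip edges that would form a cycle
            parent[ru] = rv
            mst.append((u, v, w))
            total += w
    return total, mst

print(kruskal(4, [(1, 0, 1), (2, 1, 2), (3, 0, 2), (4, 2, 3)]))
# (7, [(0, 1, 1), (1, 2, 2), (2, 3, 4)])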

Topological Sort

Orders nodes in a DAG (Directed Acyclic Graph) respecting dependencies.

from collections import deque

def topological_sort(n, edges):
    indegree = [0] * n
    graph = [[] for _ in range(n)]
    for u, v in edges:
        graph[u].append(v)
        indegree[v] += 1

    queue = deque([i for i in range(n) if indegree[i] == 0])
    result = []
    while queue:
        node = queue.popleft()
        result.append(node)
        for neighbor in graph[node]:
            indegree[neighbor] -= 1
            if indegree[neighbor] == 0:
                queue.append(neighbor)

    return result if len(result) == n else []  # empty if cycle detected

4. Dynamic Programming

Two Approaches to DP

DP applies when a problem has two properties:

  • Optimal Substructure: the optimal solution can be constructed from optimal solutions to its subproblems.
  • Overlapping Subproblems: the same subproblems are solved multiple times.

| Approach | Direction | Advantages | Disadvantages |
| --- | --- | --- | --- |
| Memoization (Top-Down) | Recursion + cache | Only computes needed subproblems | Recursive call overhead |
| Tabulation (Bottom-Up) | Iteration | No stack-overflow risk, cache-friendly | May compute unnecessary subproblems |
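
Both approaches on Fibonacci, as a minimal sketch:

from functools import lru_cache

@lru_cache(maxsize=None)            # memoization: top-down
def fib_memo(n):
    if n < 2:
        return n
    return fib_memo(n - 1) + fib_memo(n - 2)

def fib_tab(n):                     # tabulation: bottom-up
    if n < 2:
        return n
    dp = [0] * (n + 1)
    dp[1] = 1
    for i in range(2, n + 1):
        dp[i] = dp[i - 1] + dp[i - 2]
    return dp[n]

print(fib_memo(30), fib_tab(30))    # 832040 832040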

LCS (Longest Common Subsequence)

def lcs(s1, s2):
    m, n = len(s1), len(s2)
    # dp[i][j] = LCS length of s1[:i] and s2[:j]
    dp = [[0] * (n + 1) for _ in range(m + 1)]

    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if s1[i-1] == s2[j-1]:
                dp[i][j] = dp[i-1][j-1] + 1
            else:
                dp[i][j] = max(dp[i-1][j], dp[i][j-1])

    # 2D table visualization (s1="ABCD", s2="ACBD"):
    #     ""  A  C  B  D
    #  ""  0  0  0  0  0
    #  A   0  1  1  1  1
    #  B   0  1  1  2  2
    #  C   0  1  2  2  2
    #  D   0  1  2  2  3  <- LCS length = 3 ("ABD" or "ACD")
    return dp[m][n]

0/1 Knapsack

def knapsack(weights, values, capacity):
    n = len(weights)
    # dp[i][w] = max value using first i items, with capacity w
    dp = [[0] * (capacity + 1) for _ in range(n + 1)]

    for i in range(1, n + 1):
        for w in range(capacity + 1):
            # Don't include item i
            dp[i][w] = dp[i-1][w]
            # Include item i
            if weights[i-1] <= w:
                dp[i][w] = max(dp[i][w],
                               dp[i-1][w - weights[i-1]] + values[i-1])
    return dp[n][capacity]
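
Since row i of the table depends only on row i-1, a standard space optimization compresses it to one dimension by iterating capacities in reverse; a minimal sketch:

def knapsack_1d(weights, values, capacity):
    dp = [0] * (capacity + 1)
    for i in range(len(weights)):
        # Reverse order so each item is counted at most once
        for w in range(capacity, weights[i] - 1, -1):
            dp[w] = max(dp[w], dp[w - weights[i]] + values[i])
    return dp[capacity]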

5. Advanced Algorithms

Segment Tree

Handles range queries (sum, min, max) in O(log n) per operation.

class SegmentTree:
    def __init__(self, arr):
        self.n = len(arr)
        self.tree = [0] * (4 * self.n)
        self.build(arr, 0, 0, self.n - 1)

    def build(self, arr, node, start, end):
        if start == end:
            self.tree[node] = arr[start]
        else:
            mid = (start + end) // 2
            self.build(arr, 2*node+1, start, mid)
            self.build(arr, 2*node+2, mid+1, end)
            self.tree[node] = self.tree[2*node+1] + self.tree[2*node+2]

    def query(self, node, start, end, l, r):
        if r < start or end < l:
            return 0  # out of range
        if l <= start and end <= r:
            return self.tree[node]  # fully inside range
        mid = (start + end) // 2
        left = self.query(2*node+1, start, mid, l, r)
        right = self.query(2*node+2, mid+1, end, l, r)
        return left + right

    def update(self, node, start, end, idx, val):
        if start == end:
            self.tree[node] = val
        else:
            mid = (start + end) // 2
            if idx <= mid:
                self.update(2*node+1, start, mid, idx, val)
            else:
                self.update(2*node+2, mid+1, end, idx, val)
            self.tree[node] = self.tree[2*node+1] + self.tree[2*node+2]
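
A quick usage check of the class above (the node/start/end arguments begin at the root, index 0, covering the whole array):

arr = [1, 3, 5, 7, 9, 11]
st = SegmentTree(arr)
print(st.query(0, 0, st.n - 1, 1, 3))  # 3 + 5 + 7 = 15
st.update(0, 0, st.n - 1, 1, 10)       # set arr[1] = 10
print(st.query(0, 0, st.n - 1, 1, 3))  # 10 + 5 + 7 = 22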

Fenwick Tree (Binary Indexed Tree)

Simpler to implement than Segment Tree, with better memory efficiency.

class FenwickTree:
    def __init__(self, n):
        self.n = n
        self.tree = [0] * (n + 1)

    def update(self, i, delta):
        while i <= self.n:
            self.tree[i] += delta
            i += i & (-i)  # add the lowest set bit

    def query(self, i):
        total = 0
        while i > 0:
            total += self.tree[i]
            i -= i & (-i)  # remove the lowest set bit
        return total

    def range_query(self, l, r):
        return self.query(r) - self.query(l - 1)

| Structure | Implementation | Range Query | Point Update | Range Update | Memory |
| --- | --- | --- | --- | --- | --- |
| Segment Tree | Complex | O(log n) | O(log n) | O(log n) | O(4n) |
| Fenwick Tree | Simple | O(log n) | O(log n) | Limited | O(n) |
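
A quick usage check of the FenwickTree above (note the 1-based indexing):

ft = FenwickTree(5)
for i, val in enumerate([1, 3, 5, 7, 9], start=1):
    ft.update(i, val)
print(ft.query(3))           # prefix sum 1 + 3 + 5 = 9
print(ft.range_query(2, 4))  # 3 + 5 + 7 = 15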

Union-Find (Path Compression + Rank)

class UnionFind:
    def __init__(self, n):
        self.parent = list(range(n))
        self.rank = [0] * n

    def find(self, x):
        # Path compression
        if self.parent[x] != x:
            self.parent[x] = self.find(self.parent[x])
        return self.parent[x]

    def union(self, x, y):
        rx, ry = self.find(x), self.find(y)
        if rx == ry:
            return False  # already in the same set
        # Union by rank
        if self.rank[rx] < self.rank[ry]:
            rx, ry = ry, rx
        self.parent[ry] = rx
        if self.rank[rx] == self.rank[ry]:
            self.rank[rx] += 1
        return True
# Path compression + rank -> effectively O(α(n)) ≈ O(1) (inverse Ackermann)

KMP String Matching

def kmp_search(text, pattern):
    def build_lps(pattern):
        lps = [0] * len(pattern)
        length, i = 0, 1
        while i < len(pattern):
            if pattern[i] == pattern[length]:
                length += 1
                lps[i] = length
                i += 1
            elif length:
                length = lps[length - 1]
            else:
                lps[i] = 0
                i += 1
        return lps

    if not pattern:
        return []

    lps = build_lps(pattern)
    i = j = 0
    results = []
    while i < len(text):
        if text[i] == pattern[j]:
            i += 1
            j += 1
        if j == len(pattern):
            results.append(i - j)  # record match start index
            j = lps[j - 1]         # keep searching for overlapping matches
        elif i < len(text) and text[i] != pattern[j]:
            if j:
                j = lps[j - 1]     # fall back within the pattern
            else:
                i += 1
    return results
# Time O(n + m) for text of length n and pattern of length m
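
A quick check of the search above:

print(kmp_search("ababcababc", "abc"))  # [2, 7]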

6. AI/ML Algorithm Connections

k-d Tree for Nearest-Neighbor Search

The k-d tree partitions k-dimensional space to enable efficient nearest-neighbor queries.

class KDNode:
    def __init__(self, point, left=None, right=None):
        self.point = point
        self.left = left
        self.right = right

def build_kd_tree(points, depth=0):
    if not points:
        return None
    k = len(points[0])
    axis = depth % k
    sorted_points = sorted(points, key=lambda p: p[axis])
    mid = len(sorted_points) // 2
    return KDNode(
        point=sorted_points[mid],
        left=build_kd_tree(sorted_points[:mid], depth + 1),
        right=build_kd_tree(sorted_points[mid+1:], depth + 1)
    )

def nearest_neighbor(root, query, depth=0, best=None):
    if root is None:
        return best
    k = len(query)
    axis = depth % k

    dist = sum((q - p) ** 2 for q, p in zip(query, root.point)) ** 0.5
    if best is None or dist < best[0]:
        best = (dist, root.point)

    # Compare query to splitting hyperplane
    diff = query[axis] - root.point[axis]
    close, away = (root.left, root.right) if diff <= 0 else (root.right, root.left)

    best = nearest_neighbor(close, query, depth + 1, best)
    # Check if we need to explore the other side
    if abs(diff) < best[0]:
        best = nearest_neighbor(away, query, depth + 1, best)
    return best

k-d tree average complexity: O(log n), worst case: O(n) — performance degrades in high dimensions ("curse of dimensionality")

LSH (Locality-Sensitive Hashing)

An approximate nearest neighbor (ANN) method for finding similar high-dimensional vectors (e.g., embeddings) quickly.

  • Core idea: Similar vectors hash to the same bucket with high probability
  • Cosine similarity LSH: Hash by random hyperplane projections (sketched below)
  • Applications: Large-scale embedding search (Faiss, ScaNN), near-duplicate detection
  • Time complexity: Query O(1) ~ O(log n); index build O(nk), where k is the number of hash functions
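
A minimal random-hyperplane sketch for the cosine-similarity case (stdlib only; names like signature and num_planes are illustrative, not a library API):

import random

random.seed(0)

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def signature(vec, planes):
    # One bit per hyperplane: which side of the plane the vector lies on
    return tuple(dot(p, vec) > 0 for p in planes)

dim, num_planes = 64, 12
planes = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(num_planes)]

v = [random.gauss(0, 1) for _ in range(dim)]
w = [x + 0.01 * random.gauss(0, 1) for x in v]  # near-duplicate of v
u = [random.gauss(0, 1) for _ in range(dim)]    # unrelated vector

print(signature(v, planes) == signature(w, planes))  # likely True
print(signature(v, planes) == signature(u, planes))  # likely False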

Beam Search in NLP Decoding

An improvement over greedy search: maintains top k candidates (beam width) at each step.

Beam width k=2, keeping top 2 tokens per step:
Step 1: [A(0.6), B(0.4)]
Step 2: [AA(0.36), AB(0.24), BA(0.28), BB(0.12)] -> top 2: [AA, BA]
Step 3: [AAX, AAY, BAX, BAY] -> keep top 2...

Tradeoff: Larger beam width improves output quality but memory/time scales as O(k·n).
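
A minimal beam search sketch over a toy next-token distribution (next_probs stands in for a language-model step; all names here are illustrative):

import math

def beam_search(next_probs, steps, k=2):
    # Each beam entry is (cumulative log-probability, token sequence)
    beams = [(0.0, [])]
    for _ in range(steps):
        candidates = []
        for logp, seq in beams:
            for token, p in next_probs(seq).items():
                candidates.append((logp + math.log(p), seq + [token]))
        beams = sorted(candidates, reverse=True)[:k]  # keep top-k (beam width)
    return beams

# Toy "model": the same next-token distribution at every step
def toy(seq):
    return {'A': 0.6, 'B': 0.4}

for logp, seq in beam_search(toy, 3):
    print(''.join(seq), round(math.exp(logp), 3))
# AAA 0.216
# BAA 0.168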

A* Algorithm in Robotics / Path Planning

Combines Dijkstra with a heuristic function h(n) for optimal and efficient pathfinding.

f(n) = g(n) + h(n)
g(n) = actual cost from start to current node
h(n) = estimated cost from current node to goal (admissible heuristic)
  • Admissibility condition: h(n) ≤ the true remaining cost → guarantees optimality
  • Robotics applications: Grid-map motion planning, ROS Navigation Stack
  • Time complexity: O(E log V) — same as Dijkstra, but a good heuristic dramatically reduces the explored search space in practice (see the grid sketch below)
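
A minimal A* sketch on a 4-connected grid, using Manhattan distance as the admissible heuristic (the grid format, 0 = free and 1 = obstacle, is an assumption for illustration):

import heapq

def a_star(grid, start, goal):
    rows, cols = len(grid), len(grid[0])
    h = lambda r, c: abs(r - goal[0]) + abs(c - goal[1])  # Manhattan distance
    g = {start: 0}                 # g(n): best known cost from start
    heap = [(h(*start), start)]    # entries are (f = g + h, node)
    while heap:
        f, (r, c) = heapq.heappop(heap)
        if (r, c) == goal:
            return g[(r, c)]
        for nr, nc in ((r+1, c), (r-1, c), (r, c+1), (r, c-1)):
            if 0 <= nr < rows and 0 <= nc < cols and grid[nr][nc] == 0:
                ng = g[(r, c)] + 1
                if ng < g.get((nr, nc), float('inf')):
                    g[(nr, nc)] = ng
                    heapq.heappush(heap, (ng + h(nr, nc), (nr, nc)))
    return -1  # goal unreachable

grid = [[0, 0, 0],
        [1, 1, 0],
        [0, 0, 0]]
print(a_star(grid, (0, 0), (2, 0)))  # 6 (detours around the obstacle row)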

7. Coding Interview Patterns

Two Pointers

# Find two numbers in a sorted array that sum to target
def two_sum_sorted(arr, target):
    left, right = 0, len(arr) - 1
    while left < right:
        current = arr[left] + arr[right]
        if current == target:
            return (left, right)
        elif current < target:
            left += 1
        else:
            right -= 1
    return None
# Time O(n), Space O(1)

Sliding Window

# Maximum sum of subarray of length k
def max_subarray_sum(arr, k):
    window_sum = sum(arr[:k])
    max_sum = window_sum
    for i in range(k, len(arr)):
        window_sum += arr[i] - arr[i - k]
        max_sum = max(max_sum, window_sum)
    return max_sum
# Time O(n), Space O(1)

Backtracking

# N-Queens problem
def solve_n_queens(n):
    results = []
    board = [-1] * n

    def is_valid(row, col):
        for r in range(row):
            if board[r] == col:
                return False
            if abs(board[r] - col) == abs(r - row):
                return False
        return True

    def backtrack(row):
        if row == n:
            results.append(board[:])
            return
        for col in range(n):
            if is_valid(row, col):
                board[row] = col
                backtrack(row + 1)
                board[row] = -1  # undo choice

    backtrack(0)
    return results

Pattern Selection Guide

| Problem Type | Recommended Pattern |
| --- | --- |
| Sorted array, pair/triple combinations | Two Pointers |
| Contiguous subarray or substring | Sliding Window |
| Combinations, permutations, counting | Backtracking |
| Shortest path, level traversal | BFS |
| Cycles, topology, connectivity | DFS |
| Range queries and updates | Segment Tree / Fenwick Tree |
| Merging sets, connectivity checks | Union-Find |
| Optimization with overlapping subproblems | Dynamic Programming |

8. Quiz: Test Your Knowledge

Q1. Under what circumstances does a hash table's average O(1) lookup degrade to O(n)?

Answer: When all keys hash to the same bucket — a worst-case hash collision scenario.

Explanation: A hash table maps keys to buckets using a hash function. In the ideal case with no collisions, lookup is O(1). However, with a poor hash function or adversarial inputs, all n keys can land in the same bucket. With chaining, this bucket becomes a linked list that must be traversed linearly, giving O(n). Solutions include rehashing when the load factor exceeds a threshold, universal hashing (randomized hash functions), or open addressing strategies like Robin Hood hashing.

Q2. Why does Dijkstra's algorithm fail with negative-weight edges?

Answer: Dijkstra relies on the greedy assumption that once a node is settled, its distance is final. Negative edges violate this assumption.

Explanation: Dijkstra marks a node as "finalized" when it is popped from the min-heap, assuming no future path can improve it. With negative edges, a path through a later-visited node B could reduce the distance to an already-finalized node A (even without a negative cycle). The algorithm's invariant breaks down. Use Bellman-Ford for graphs with negative edges — it relaxes all edges V-1 times and can detect negative cycles.

Q3. What is the space complexity difference between memoization and tabulation in DP?

Answer: Memoization stores only the subproblems that are actually called (advantageous when many subproblems are skipped); tabulation always fills the entire table O(n) but can be space-optimized with a sliding window.

Explanation: Top-down memoization uses a cache (dict/array) and only computes subproblems encountered during recursion. If only k of n total subproblems are needed, it uses O(k) space. Bottom-up tabulation fills the entire dp table, always using O(n) space. However, for problems where dp[i] depends only on dp[i-1] (like Fibonacci), the table can be reduced to O(1) by keeping only the previous values.
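
For instance, a minimal O(1)-space Fibonacci that keeps only the two previous values:

def fib(n):
    prev, curr = 0, 1
    for _ in range(n - 1):
        prev, curr = curr, prev + curr  # slide the window forward
    return curr if n else 0

print(fib(10))  # 55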

Q4. Why is a k-d tree efficient for k-NN search on average, and what are its limitations?

Answer: It halves the search space at each level, giving average O(log n). However, the "curse of dimensionality" causes performance to approach O(n) as d grows large.

Explanation: A k-d tree splits data along one axis per level. During nearest-neighbor search, it prunes entire subtrees when the distance to the splitting hyperplane exceeds the current best distance. This pruning is highly effective in low dimensions (d ≤ 20), averaging O(log n). But in high dimensions, the distance to the hyperplane is almost always less than the current best, so pruning rarely triggers and nearly all points are visited. For high-dimensional vector search, approximate nearest neighbor methods like LSH or HNSW are preferred.

Q5. What are the implementation complexity and use-case differences between Segment Tree and Fenwick Tree (BIT)?

Answer: Segment Tree is more complex to implement but general-purpose (range updates, arbitrary aggregation); Fenwick Tree is simpler and faster but limited primarily to prefix sums and point updates.

Explanation: The Fenwick Tree (BIT) uses bitwise operations on the lowest set bit, making it very concise and cache-friendly. It excels at prefix sum queries and point updates. The Segment Tree requires more code but supports range minimum/maximum queries, lazy propagation for range updates, and complex aggregation functions (GCD, XOR, etc.). A practical rule in competitive programming: "use BIT for range sums only; use Segment Tree for everything else."


Summary

Algorithms and data structures form the bedrock of AI engineering.

  • Complexity analysis is essential for predicting bottlenecks and making scaling decisions in system design.
  • Graph algorithms are directly applied in knowledge graphs, recommendation systems, and path optimization.
  • Dynamic programming connects deeply to sequence modeling and the Bellman equation in reinforcement learning.
  • k-d trees, LSH, and Beam Search are core algorithms powering efficient inference in modern ML systems.

Combine consistent problem-solving practice with hands-on implementation to master both theory and real-world application.