Data Structures Complete Guide 2025: Array to Trie, B-Tree — Every Structure for Interviews


1. Big-O Notation

1.1 Time Complexity Overview

Algorithm efficiency is expressed as a function of the input size (n).

O(1)        Constant time      e.g., array index access
O(log n)    Logarithmic time   e.g., binary search
O(n)        Linear time        e.g., array traversal
O(n log n)  Linearithmic time  e.g., merge sort
O(n^2)      Quadratic time     e.g., bubble sort
O(2^n)      Exponential time   e.g., naive recursive Fibonacci
O(n!)       Factorial time     e.g., permutation generation

1.2 Growth Rate Comparison

n       O(1)  O(log n)  O(n)   O(n log n)  O(n^2)     O(2^n)
10      1     3         10     33          100        1,024
100     1     7         100    664         10,000     1.27e30
1000    1     10        1000   9,966       1,000,000  ---
10000   1     13        10000  132,877     100M       ---

1.3 Amortized Analysis

Individual operations may be expensive, but the average cost across a sequence of operations is low.

Dynamic array append:
  Usually: O(1) — add to empty slot
  Occasionally: O(n) — copy to 2x array when full

  Total cost for n appends: n + n/2 + n/4 + ... ~ 2n
  Amortized cost: O(2n/n) = O(1)
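The bound above can be checked empirically by counting element writes (each append plus every copy performed during a resize) across n appends; the total stays linear in n. A minimal sketch:

```python
def count_writes(n):
    """Count element writes (appends + resize copies) for n appends
    into a doubling dynamic array that starts at capacity 1."""
    capacity, size, writes = 1, 0, 0
    for _ in range(n):
        if size == capacity:
            writes += size      # copy every element into the 2x array
            capacity *= 2
        writes += 1             # write the new element itself
        size += 1
    return writes

# Total writes stay below 3n, so the amortized cost per append is O(1).
for n in (10, 100, 1000):
    print(n, count_writes(n))
```

Counting the copies alone reproduces the n/2 + n/4 + ... series from the text; adding the n direct writes keeps the total within a constant factor of n.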

2. Linear Data Structures

2.1 Array

Stores elements of the same type in contiguous memory.

Index:   0    1    2    3    4
       [10] [20] [30] [40] [50]
        ^                    ^
     arr[0]              arr[4]
Operation        Time Complexity
Access (index)   O(1)
Search           O(n)
Insert (end)     O(1) amortized
Insert (middle)  O(n)
Delete (end)     O(1)
Delete (middle)  O(n)
# Dynamic array implementation
class DynamicArray:
    def __init__(self):
        self.size = 0
        self.capacity = 1
        self.data = [None] * self.capacity

    def append(self, value):
        if self.size == self.capacity:
            self._resize(2 * self.capacity)
        self.data[self.size] = value
        self.size += 1

    def _resize(self, new_capacity):
        new_data = [None] * new_capacity
        for i in range(self.size):
            new_data[i] = self.data[i]
        self.data = new_data
        self.capacity = new_capacity

    def get(self, index):
        if 0 <= index < self.size:
            return self.data[index]
        raise IndexError("Index out of range")

2.2 Linked List

Each node holds data and a pointer to the next node.

Singly Linked List:
  [10|->] -> [20|->] -> [30|->] -> [40|->] -> None

Doubly Linked List:
  None <- [<-|10|->] <-> [<-|20|->] <-> [<-|30|->] -> None

Circular Linked List:
  [10|->] -> [20|->] -> [30|->] -+
     ^                           |
     +---------------------------+
Operation                    Time Complexity
Access (index)               O(n)
Search                       O(n)
Insert (front)               O(1)
Insert (back, tail pointer)  O(1)
Delete (known node)          O(1)
class ListNode:
    def __init__(self, val=0, next=None):
        self.val = val
        self.next = next

class LinkedList:
    def __init__(self):
        self.head = None

    def prepend(self, val):
        new_node = ListNode(val, self.head)
        self.head = new_node

    def delete(self, val):
        dummy = ListNode(0, self.head)
        prev, curr = dummy, self.head
        while curr:
            if curr.val == val:
                prev.next = curr.next
                break
            prev, curr = curr, curr.next
        self.head = dummy.next

    def reverse(self):
        prev, curr = None, self.head
        while curr:
            nxt = curr.next
            curr.next = prev
            prev = curr
            curr = nxt
        self.head = prev

Array vs Linked List:

Criterion              Array                     Linked List
Access                 O(1)                      O(n)
Insert/Delete (front)  O(n)                      O(1)
Cache efficiency       Good (contiguous memory)  Poor (scattered nodes)
Memory overhead        Low                       Extra pointer(s) per node

2.3 Stack

LIFO (Last In, First Out) structure.

push(10) -> push(20) -> push(30) -> pop()

  |    |   |    |   | 30 |   |    |
  |    |   | 20 |   | 20 |   | 20 |   <- 30 returned
  | 10 |   | 10 |   | 10 |   | 10 |
  +----+   +----+   +----+   +----+
  • Uses: Call stack, parenthesis matching, undo/redo, DFS
# Stack — implemented with Python list
class Stack:
    def __init__(self):
        self.items = []

    def push(self, item):
        self.items.append(item)

    def pop(self):
        if self.is_empty():
            raise IndexError("Stack is empty")
        return self.items.pop()

    def peek(self):
        return self.items[-1] if self.items else None

    def is_empty(self):
        return len(self.items) == 0
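The parenthesis-matching use case from the bullet above can be sketched in a few lines, using a plain Python list as the stack:

```python
def is_balanced(s):
    """Return True if every bracket in s is properly matched and nested."""
    pairs = {')': '(', ']': '[', '}': '{'}
    stack = []
    for ch in s:
        if ch in '([{':
            stack.append(ch)                      # opening bracket: push
        elif ch in pairs:
            if not stack or stack.pop() != pairs[ch]:
                return False                      # mismatch or underflow
    return not stack                              # all openings closed

print(is_balanced("a[b{c}](d)"))   # True
print(is_balanced("([)]"))         # False
```

The stack naturally enforces the LIFO nesting rule: the most recently opened bracket must be the first one closed.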

2.4 Queue

FIFO (First In, First Out) structure.

enqueue(10) -> enqueue(20) -> enqueue(30) -> dequeue()

Front ->                                    Front ->
[10]      [10][20]     [10][20][30]       [20][30]  <- 10 returned
  • Uses: BFS, task scheduling, message queues

2.5 Deque

A queue that allows insertion/deletion at both ends.

Insert/Delete at front <- [10][20][30] -> Insert/Delete at back
  • Python's collections.deque is implemented as a doubly linked list of fixed-size blocks, giving O(1) appends and pops at both ends
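A short demonstration of both-ends operations with collections.deque:

```python
from collections import deque

d = deque([10, 20, 30])
d.appendleft(5)      # O(1) insert at front -> deque([5, 10, 20, 30])
d.append(40)         # O(1) insert at back  -> deque([5, 10, 20, 30, 40])
front = d.popleft()  # 5
back = d.pop()       # 40
print(list(d))       # [10, 20, 30]
```

The same object serves as a plain FIFO queue (append + popleft) or a stack (append + pop).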

3. Hash Data Structures

3.1 Hash Table

Keys are converted via a hash function and stored in buckets.

key: "apple" -> hash("apple") -> 3
key: "banana" -> hash("banana") -> 7
key: "cherry" -> hash("cherry") -> 3  <- Collision!

Buckets:
[0] ->
[1] ->
[2] ->
[3] -> ("apple", 100) -> ("cherry", 300)  <- Chaining
[4] ->
[5] ->
[6] ->
[7] -> ("banana", 200)

3.2 Collision Resolution Strategies

Chaining: Store multiple entries in the same bucket via a linked list

[3] -> ("apple", 100) -> ("cherry", 300) -> None

Open Addressing: Find an empty slot elsewhere

Linear Probing:
  hash("cherry") = 3 -> already occupied -> check 4 -> empty -> store

Quadratic Probing:
  hash("cherry") = 3 -> 3+1^2=4 -> 3+2^2=7 -> ...

Double Hashing:
  hash("cherry") = 3 -> 3 + hash2("cherry") -> ...
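The HashMap in 3.3 below implements chaining; open addressing with linear probing can be sketched as follows (fixed capacity, no deletion or resizing, names illustrative):

```python
class LinearProbingMap:
    """Open addressing with linear probing (fixed capacity, no delete)."""
    def __init__(self, capacity=8):
        self.capacity = capacity
        self.slots = [None] * capacity   # each slot: None or (key, value)

    def put(self, key, value):
        idx = hash(key) % self.capacity
        for step in range(self.capacity):
            probe = (idx + step) % self.capacity   # walk to the next slot
            if self.slots[probe] is None or self.slots[probe][0] == key:
                self.slots[probe] = (key, value)
                return
        raise RuntimeError("table full")

    def get(self, key):
        idx = hash(key) % self.capacity
        for step in range(self.capacity):
            probe = (idx + step) % self.capacity
            if self.slots[probe] is None:
                raise KeyError(key)      # empty slot reached: key is absent
            if self.slots[probe][0] == key:
                return self.slots[probe][1]
        raise KeyError(key)
```

For quadratic probing the probe offset becomes step**2 instead of step; double hashing replaces it with step * hash2(key). Note that real deletions under open addressing require tombstone markers, which this sketch omits.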

3.3 Load Factor and Rehashing

Load Factor = number of stored elements / number of buckets

LF > 0.75 -> Rehash (double the buckets, redistribute all elements)
class HashMap:
    def __init__(self, capacity=16):
        self.capacity = capacity
        self.size = 0
        self.buckets = [[] for _ in range(capacity)]
        self.load_factor_threshold = 0.75

    def _hash(self, key):
        return hash(key) % self.capacity

    def put(self, key, value):
        if self.size / self.capacity >= self.load_factor_threshold:
            self._rehash()

        idx = self._hash(key)
        for i, (k, v) in enumerate(self.buckets[idx]):
            if k == key:
                self.buckets[idx][i] = (key, value)
                return
        self.buckets[idx].append((key, value))
        self.size += 1

    def get(self, key):
        idx = self._hash(key)
        for k, v in self.buckets[idx]:
            if k == key:
                return v
        raise KeyError(key)

    def _rehash(self):
        old_buckets = self.buckets
        self.capacity *= 2
        self.buckets = [[] for _ in range(self.capacity)]
        self.size = 0
        for bucket in old_buckets:
            for key, value in bucket:
                self.put(key, value)
Operation  Average  Worst
Insert     O(1)     O(n)
Lookup     O(1)     O(n)
Delete     O(1)     O(n)

4. Tree Data Structures

4.1 Binary Search Tree (BST)

Follows the rule: left child < parent < right child.

        8
       / \
      3   10
     / \    \
    1   6    14
       / \   /
      4   7 13
class TreeNode:
    def __init__(self, val):
        self.val = val
        self.left = None
        self.right = None

class BST:
    def __init__(self):
        self.root = None

    def insert(self, val):
        self.root = self._insert(self.root, val)

    def _insert(self, node, val):
        if not node:
            return TreeNode(val)
        if val < node.val:
            node.left = self._insert(node.left, val)
        elif val > node.val:
            node.right = self._insert(node.right, val)
        return node

    def search(self, val):
        return self._search(self.root, val)

    def _search(self, node, val):
        if not node or node.val == val:
            return node
        if val < node.val:
            return self._search(node.left, val)
        return self._search(node.right, val)

    def inorder(self, node, result=None):
        if result is None:
            result = []
        if node:
            self.inorder(node.left, result)
            result.append(node.val)
            self.inorder(node.right, result)
        return result
Operation  Average   Worst (skewed)
Search     O(log n)  O(n)
Insert     O(log n)  O(n)
Delete     O(log n)  O(n)
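The BST above omits deletion. The standard three cases (leaf, one child, two children via the in-order successor) can be sketched as a free function, with TreeNode redefined here so the snippet is self-contained:

```python
class TreeNode:
    def __init__(self, val):
        self.val = val
        self.left = None
        self.right = None

def delete(node, val):
    """Delete val from the subtree rooted at node; return the new root."""
    if not node:
        return None
    if val < node.val:
        node.left = delete(node.left, val)
    elif val > node.val:
        node.right = delete(node.right, val)
    else:
        if not node.left:            # case 1/2: zero or one child ->
            return node.right        # promote the other side
        if not node.right:
            return node.left
        succ = node.right            # case 3: in-order successor is the
        while succ.left:             # minimum of the right subtree
            succ = succ.left
        node.val = succ.val          # copy successor's value up, then
        node.right = delete(node.right, succ.val)  # remove the duplicate
    return node
```

An in-order traversal after any delete still yields sorted order, which is a convenient correctness check.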

4.2 AVL Tree (Self-Balancing BST)

A BST where the height difference between left/right subtrees is at most 1 for every node.

Rotations to restore balance when imbalanced:

LL Rotation (Right Rotation):      RR Rotation (Left Rotation):
    z                              z
   /                                \
  y         ->      y                 y       ->      y
 /                 / \                \             / \
x                 x   z                x           z   x

LR Rotation:                        RL Rotation:
    z                              z
   /                                \
  x         ->      y                x       ->      y
   \               / \             /               / \
    y             x   z           y               z   x
  • All operations guaranteed O(log n)
  • Slight overhead from rotations on insert/delete

4.3 Red-Black Tree

A self-balancing BST satisfying 5 rules.

Rules:
1. Every node is red or black
2. Root is black
3. All leaves (NIL) are black
4. Red node children must all be black (no consecutive reds)
5. Equal black node count from any node to all its leaves

        (B:8)
       /      \
    (R:3)    (R:10)
    / \        \
 (B:1)(B:6)  (B:14)
       / \     /
    (R:4)(R:7)(R:13)
  • Used in Java TreeMap, C++ std::map
  • Faster insert/delete than AVL (fewer rotations)
  • Slightly slower search than AVL (less strictly balanced)

4.4 B-Tree / B+Tree

Self-balancing multi-way search trees optimized for disk-based storage.

B-Tree (order 3, at most 2 keys per node):
          [10, 20]
         /   |    \
   [3,7]  [12,15]  [25,30]

B+Tree (order 3):
          [10, 20]
         /   |    \
   [3,7]  [10,15]  [20,25]
      -> -> ->          <- Leaf nodes form a linked list

B-Tree vs B+Tree:

Property       B-Tree         B+Tree
Data location  All nodes      Leaf nodes only
Leaf linking   None           Linked list
Range queries  Inefficient    Efficient
Typical usage  General index  DB index (MySQL InnoDB)

4.5 Segment Tree

Handles range queries (sum, min, max, etc.) in O(log n).

Array: [1, 3, 5, 7, 9, 11]

Range Sum Segment Tree:
              [36]           (0-5)
           /        \
        [9]          [27]     (0-2), (3-5)
       /   \        /    \
     [4]   [5]   [16]   [11]  (0-1),(2),(3-4),(5)
    / \          / \
  [1] [3]     [7] [9]
class SegmentTree:
    def __init__(self, arr):
        self.n = len(arr)
        self.tree = [0] * (4 * self.n)
        self.build(arr, 1, 0, self.n - 1)

    def build(self, arr, node, start, end):
        if start == end:
            self.tree[node] = arr[start]
        else:
            mid = (start + end) // 2
            self.build(arr, 2 * node, start, mid)
            self.build(arr, 2 * node + 1, mid + 1, end)
            self.tree[node] = self.tree[2 * node] + self.tree[2 * node + 1]

    def query(self, node, start, end, l, r):
        if r < start or end < l:
            return 0
        if l <= start and end <= r:
            return self.tree[node]
        mid = (start + end) // 2
        left = self.query(2 * node, start, mid, l, r)
        right = self.query(2 * node + 1, mid + 1, end, l, r)
        return left + right
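The SegmentTree above supports queries only. A point update follows the same recursion; it is sketched here as a free function over the same 1-indexed tree array, so it can be pasted into the class as a method (replacing tree with self.tree):

```python
def update(tree, node, start, end, idx, value):
    """Set arr[idx] = value in a sum segment tree (1-indexed node layout)."""
    if start == end:
        tree[node] = value                   # reached the leaf for idx
    else:
        mid = (start + end) // 2
        if idx <= mid:
            update(tree, 2 * node, start, mid, idx, value)
        else:
            update(tree, 2 * node + 1, mid + 1, end, idx, value)
        # Recompute this node's sum from its two children
        tree[node] = tree[2 * node] + tree[2 * node + 1]
```

Only the O(log n) nodes whose ranges contain idx are touched, so both update and query run in O(log n).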

5. Heap

5.1 Min Heap / Max Heap

A complete binary tree where the parent is smaller (Min) or larger (Max) than its children.

Min Heap:                 Max Heap:
      1                        9
     / \                      / \
    3   5                    7   8
   / \ / \                  / \ / \
  7  8 9  10               3  5 4  6

Array representation:

Index:   0  1  2  3  4  5  6
Min:   [ 1, 3, 5, 7, 8, 9, 10]

Parent: (i-1) // 2
Left child: 2*i + 1
Right child: 2*i + 2

5.2 Heap Operations

class MinHeap:
    def __init__(self):
        self.heap = []

    def push(self, val):
        self.heap.append(val)
        self._sift_up(len(self.heap) - 1)

    def pop(self):
        if not self.heap:
            raise IndexError("Heap is empty")
        self._swap(0, len(self.heap) - 1)
        val = self.heap.pop()
        if self.heap:
            self._sift_down(0)
        return val

    def _sift_up(self, i):
        parent = (i - 1) // 2
        while i > 0 and self.heap[i] < self.heap[parent]:
            self._swap(i, parent)
            i = parent
            parent = (i - 1) // 2

    def _sift_down(self, i):
        n = len(self.heap)
        while True:
            smallest = i
            left = 2 * i + 1
            right = 2 * i + 2
            if left < n and self.heap[left] < self.heap[smallest]:
                smallest = left
            if right < n and self.heap[right] < self.heap[smallest]:
                smallest = right
            if smallest == i:
                break
            self._swap(i, smallest)
            i = smallest

    def _swap(self, i, j):
        self.heap[i], self.heap[j] = self.heap[j], self.heap[i]
Operation                Time Complexity
push                     O(log n)
pop (min)                O(log n)
peek                     O(1)
heapify (array to heap)  O(n)

5.3 Heap Sort

import heapq

def heap_sort(arr):
    heapq.heapify(arr)  # O(n) bottom-up heapify (mutates arr)
    return [heapq.heappop(arr) for _ in range(len(arr))]
  • Time complexity: O(n log n)
  • Space complexity: O(1) for the classic in-place version; the heapq sketch above returns a new list, so it uses O(n)
  • Uses: Priority queues, Top-K problems, Dijkstra's algorithm
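The Top-K use case from the bullets above, via the standard heapq helpers:

```python
import heapq

nums = [5, 1, 9, 3, 7, 8, 2]

# Top-3 largest via the library helper: O(n log k)
print(heapq.nlargest(3, nums))   # [9, 8, 7]

# Equivalent manual version: maintain a size-k min-heap, where the
# root is always the smallest of the current top k candidates
def top_k(nums, k):
    heap = []
    for x in nums:
        heapq.heappush(heap, x)
        if len(heap) > k:
            heapq.heappop(heap)   # evict the smallest of the k+1
    return sorted(heap, reverse=True)

print(top_k(nums, 3))            # [9, 8, 7]
```

Keeping the heap at size k is the key trick: it gives O(n log k) instead of sorting everything at O(n log n).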

6. Trie

6.1 Trie Structure

A tree data structure optimized for string searching.

Words: "cat", "car", "card", "dog", "do"

Root
+-- c
|   +-- a
|       +-- t (*)
|       +-- r (*)
|           +-- d (*)
+-- d
    +-- o (*)
        +-- g (*)

(*) = end of word

6.2 Trie Implementation

class TrieNode:
    def __init__(self):
        self.children = {}
        self.is_end = False

class Trie:
    def __init__(self):
        self.root = TrieNode()

    def insert(self, word):
        node = self.root
        for char in word:
            if char not in node.children:
                node.children[char] = TrieNode()
            node = node.children[char]
        node.is_end = True

    def search(self, word):
        node = self._find_node(word)
        return node is not None and node.is_end

    def starts_with(self, prefix):
        return self._find_node(prefix) is not None

    def _find_node(self, prefix):
        node = self.root
        for char in prefix:
            if char not in node.children:
                return None
            node = node.children[char]
        return node

    def autocomplete(self, prefix):
        """Return all words starting with prefix"""
        node = self._find_node(prefix)
        if not node:
            return []
        results = []
        self._dfs(node, prefix, results)
        return results

    def _dfs(self, node, path, results):
        if node.is_end:
            results.append(path)
        for char, child in node.children.items():
            self._dfs(child, path + char, results)
Operation      Time Complexity (m = string length)
Insert         O(m)
Search         O(m)
Prefix search  O(m)
  • Uses: Autocomplete, spell checking, IP routing (Longest Prefix Match), dictionary lookup

7. Graph

7.1 Graph Representation

Graph:
  A --- B
  |   / |
  |  /  |
  C --- D

Adjacency List:
  A: [B, C]
  B: [A, C, D]
  C: [A, B, D]
  D: [B, C]

Adjacency Matrix:
     A  B  C  D
  A [0, 1, 1, 0]
  B [1, 0, 1, 1]
  C [1, 1, 0, 1]
  D [0, 1, 1, 0]
Comparison              Adjacency List  Adjacency Matrix
Space                   O(V + E)        O(V^2)
Edge existence check    O(degree)       O(1)
Traverse all neighbors  O(degree)       O(V)
Best for                Sparse graphs   Dense graphs

7.2 BFS (Breadth-First Search)

Visits nodes level by level using a queue.
from collections import deque

def bfs(graph, start):
    visited = set([start])
    queue = deque([start])
    order = []

    while queue:
        node = queue.popleft()
        order.append(node)
        for neighbor in graph[node]:
            if neighbor not in visited:
                visited.add(neighbor)
                queue.append(neighbor)
    return order
BFS order (starting from A):
Level 0: A
Level 1: B, C
Level 2: D

Queue: [A] -> [B, C] -> [C, D] -> [D] -> []

7.3 DFS (Depth-First Search)

Explores each branch as deep as possible before backtracking, using recursion or an explicit stack.
def dfs(graph, start, visited=None):
    if visited is None:
        visited = set()
    visited.add(start)
    result = [start]
    for neighbor in graph[start]:
        if neighbor not in visited:
            result.extend(dfs(graph, neighbor, visited))
    return result

7.4 Dijkstra Algorithm

Finds the shortest path in a weighted graph.

import heapq

def dijkstra(graph, start):
    distances = {node: float('inf') for node in graph}
    distances[start] = 0
    pq = [(0, start)]  # (distance, node)

    while pq:
        dist, node = heapq.heappop(pq)
        if dist > distances[node]:
            continue
        for neighbor, weight in graph[node]:
            new_dist = dist + weight
            if new_dist < distances[neighbor]:
                distances[neighbor] = new_dist
                heapq.heappush(pq, (new_dist, neighbor))

    return distances
  • Time complexity: O((V + E) log V) with a min heap
  • Only works with non-negative weights

7.5 Topological Sort

An ordering of vertices in a DAG (Directed Acyclic Graph) that respects precedence relations.

from collections import deque

def topological_sort(graph, in_degree):
    queue = deque([v for v in graph if in_degree[v] == 0])
    result = []

    while queue:
        node = queue.popleft()
        result.append(node)
        for neighbor in graph[node]:
            in_degree[neighbor] -= 1
            if in_degree[neighbor] == 0:
                queue.append(neighbor)

    return result if len(result) == len(graph) else []  # cycle detection
  • Uses: Build dependencies, task scheduling, compilation order
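The topological_sort above expects a precomputed in_degree map. A self-contained variant that derives it from the adjacency list (shown here on a hypothetical build-dependency graph):

```python
from collections import deque

def topo_order(graph):
    """Kahn's algorithm: derive in-degrees from the adjacency list, then sort."""
    in_degree = {v: 0 for v in graph}
    for node in graph:
        for neighbor in graph[node]:
            in_degree[neighbor] += 1

    queue = deque([v for v in graph if in_degree[v] == 0])
    result = []
    while queue:
        node = queue.popleft()
        result.append(node)
        for neighbor in graph[node]:
            in_degree[neighbor] -= 1
            if in_degree[neighbor] == 0:
                queue.append(neighbor)
    return result if len(result) == len(graph) else []  # [] signals a cycle

# Hypothetical dependencies: A must come before B and C; B and C before D
deps = {"A": ["B", "C"], "B": ["D"], "C": ["D"], "D": []}
print(topo_order(deps))   # ['A', 'B', 'C', 'D']
```

Returning an empty list on a cycle mirrors the cycle-detection check in the version above.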

7.6 Bellman-Ford Algorithm

Finds shortest paths even in graphs with negative weights.

def bellman_ford(edges, num_vertices, start):
    distances = [float('inf')] * num_vertices
    distances[start] = 0

    for _ in range(num_vertices - 1):
        for u, v, w in edges:
            if distances[u] + w < distances[v]:
                distances[v] = distances[u] + w

    # Negative cycle detection
    for u, v, w in edges:
        if distances[u] + w < distances[v]:
            return None  # Negative cycle exists

    return distances
  • Time complexity: O(V * E)

8. Advanced Data Structures

8.1 Union-Find (Disjoint Set)

Initial: Each node represents itself
{0} {1} {2} {3} {4}

union(0, 1): {0, 1} {2} {3} {4}
union(2, 3): {0, 1} {2, 3} {4}
union(0, 3): {0, 1, 2, 3} {4}

find(1) == find(3)?  -> Yes (same set)
find(1) == find(4)?  -> No (different sets)
class UnionFind:
    def __init__(self, n):
        self.parent = list(range(n))
        self.rank = [0] * n

    def find(self, x):
        # Path Compression
        if self.parent[x] != x:
            self.parent[x] = self.find(self.parent[x])
        return self.parent[x]

    def union(self, x, y):
        # Union by Rank
        px, py = self.find(x), self.find(y)
        if px == py:
            return False
        if self.rank[px] < self.rank[py]:
            px, py = py, px
        self.parent[py] = px
        if self.rank[px] == self.rank[py]:
            self.rank[px] += 1
        return True
  • Path compression + union by rank: nearly O(1) (inverse Ackermann function)
  • Uses: Kruskal MST, connected components, cycle detection
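A usage sketch of the cycle-detection bullet above: in an undirected graph, an edge whose endpoints are already in the same set closes a cycle. UnionFind is redefined minimally here (path halving instead of recursive compression) so the snippet is self-contained:

```python
class UnionFind:
    def __init__(self, n):
        self.parent = list(range(n))

    def find(self, x):
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, x, y):
        px, py = self.find(x), self.find(y)
        if px == py:
            return False   # already connected: this edge closes a cycle
        self.parent[py] = px
        return True

def has_cycle(n, edges):
    uf = UnionFind(n)
    return any(not uf.union(u, v) for u, v in edges)

print(has_cycle(4, [(0, 1), (1, 2), (2, 0)]))   # True  (triangle)
print(has_cycle(4, [(0, 1), (1, 2), (2, 3)]))   # False (a simple path)
```

This is exactly the check Kruskal's MST algorithm performs when deciding whether to accept an edge.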

8.2 Skip List

A probabilistic data structure that adds multiple levels of indexes to a sorted linked list.

Level 3: HEAD -------------------- 50 ------- NIL
Level 2: HEAD ---- 20 ----------- 50 --- 70 -- NIL
Level 1: HEAD -- 10 -- 20 -- 30 -- 50 -- 70 -- 80 -- NIL
Level 0: HEAD -- 10 -- 20 -- 30 -- 40 -- 50 -- 60 -- 70 -- 80 -- NIL
Operation  Average   Worst
Search     O(log n)  O(n)
Insert     O(log n)  O(n)
Delete     O(log n)  O(n)
  • Used in Redis Sorted Set
  • Simpler to implement than Red-Black Tree

8.3 Bloom Filter

A data structure that probabilistically determines whether an element belongs to a set.

k=3 hash functions, m=10 bit array

insert("apple"):
  h1("apple")=2, h2("apple")=5, h3("apple")=8
  Bit array: [0,0,1,0,0,1,0,0,1,0]

query("apple"): positions 2,5,8 all 1 -> "probably exists"
query("grape"): h1=1, h2=5, h3=7 -> position 1 is 0 -> "definitely not"
  • False Positive: May say "exists" when it actually does not
  • False Negative: Impossible (a "definitely not" is guaranteed)
  • Uses: Cache filter, spam filter, DB existence checks
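The scheme above can be sketched in a few lines. The k hash functions are simulated by salting a single hashlib digest, which is an illustration rather than a tuned filter (real implementations size m and k from the target false-positive rate):

```python
import hashlib

class BloomFilter:
    def __init__(self, m=100, k=3):
        self.m, self.k = m, k
        self.bits = [0] * m

    def _positions(self, item):
        # Simulate k independent hash functions by salting one hash
        for i in range(self.k):
            digest = hashlib.sha256(f"{i}:{item}".encode()).hexdigest()
            yield int(digest, 16) % self.m

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos] = 1

    def might_contain(self, item):
        # False -> definitely absent; True -> probably present
        return all(self.bits[pos] for pos in self._positions(item))

bf = BloomFilter()
bf.add("apple")
print(bf.might_contain("apple"))   # True ("probably exists")
```

Because add only ever sets bits, an inserted item can never be reported absent, which is why false negatives are impossible.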

8.4 LRU Cache

A cache that evicts the least recently used item.

from collections import OrderedDict

class LRUCache:
    def __init__(self, capacity):
        self.capacity = capacity
        self.cache = OrderedDict()

    def get(self, key):
        if key not in self.cache:
            return -1
        self.cache.move_to_end(key)
        return self.cache[key]

    def put(self, key, value):
        if key in self.cache:
            self.cache.move_to_end(key)
        self.cache[key] = value
        if len(self.cache) > self.capacity:
            self.cache.popitem(last=False)
  • O(1) get/put via HashMap + Doubly Linked List
  • Uses: Browser cache, database buffer, CDN

9. Time Complexity Comparison Table

Data Structure  Access  Search     Insert     Delete     Space
Array           O(1)    O(n)       O(n)       O(n)       O(n)
Dynamic Array   O(1)    O(n)       O(1)*      O(n)       O(n)
Linked List     O(n)    O(n)       O(1)       O(1)       O(n)
Stack           O(n)    O(n)       O(1)       O(1)       O(n)
Queue           O(n)    O(n)       O(1)       O(1)       O(n)
Hash Table      -       O(1)*      O(1)*      O(1)*      O(n)
BST             -       O(log n)*  O(log n)*  O(log n)*  O(n)
AVL Tree        -       O(log n)   O(log n)   O(log n)   O(n)
Red-Black Tree  -       O(log n)   O(log n)   O(log n)   O(n)
B-Tree          -       O(log n)   O(log n)   O(log n)   O(n)
Heap            -       O(n)       O(log n)   O(log n)   O(n)
Trie            -       O(m)       O(m)       O(m)       O(n*m)
Skip List       -       O(log n)*  O(log n)*  O(log n)*  O(n)*

* = average time. Worst case may differ. m = string length.


10. Data Structure Selection Guide

10.1 Decision Tree

Need fast insert/delete? -> Hash table or linked list
Need sorted order? -> BST family (AVL, Red-Black)
Need fast min/max? -> Heap
Need string prefix search? -> Trie
Need range queries? -> Segment tree or B-Tree
Need fast set membership? -> HashSet or Bloom filter
Need connected component management? -> Union-Find
Need shortest path? -> Graph + Dijkstra/BFS
FIFO? -> Queue
LIFO? -> Stack

10.2 Practical Use Case Map

Problem Type           Recommended Structure      Reason
Cache implementation   HashMap + DLL (LRU)        O(1) get/put
Autocomplete           Trie                       Prefix search O(m)
Real-time leaderboard  Skip List or Balanced BST  Sorted + fast updates
Task scheduler         Priority Queue (Heap)      Priority-based extraction
Social network         Graph (adjacency list)     Relationship traversal
DB index               B+Tree                     Range queries + disk optimization
Duplicate detection    HashSet or Bloom Filter    O(1) check
Range sum/min          Segment Tree               O(log n) query
Network connectivity   Union-Find                 Nearly O(1) union/find
Parenthesis matching   Stack                      LIFO pattern
Breadth-first search   Queue + Graph              Level-order traversal

11. Top 15 Interview Questions

Q1. Explain the differences between arrays and linked lists, and when to use each.

Arrays use contiguous memory allocation with O(1) index access but O(n) for mid insertion/deletion. Linked lists use nodes connected by pointers with O(1) insertion/deletion but O(n) index access. Use arrays when random access is frequent and size is predictable; use linked lists when frequent insertions/deletions and dynamic sizing are needed.

Q2. Explain hash table collision resolution methods.

Chaining stores multiple entries in the same bucket via linked lists. It is simple to implement and degrades gracefully at high load factors. Open Addressing finds other empty slots on collision. Variants include linear probing, quadratic probing, and double hashing. It has good cache efficiency but performance degrades sharply at high load factors.

Q3. Explain BST deletion.

There are three cases: (1) Leaf node: simply remove. (2) One child: replace with child. (3) Two children: replace with the in-order successor (minimum of right subtree) or in-order predecessor (maximum of left subtree), then delete that node.

Q4. Explain the differences between a Heap and a BST.

A Heap is a complete binary tree where parent is larger (Max) or smaller (Min) than children, with O(1) min/max extraction and O(log n) insert/delete. A BST follows the left child < parent < right child rule, allowing sorted-order traversal and O(log n) search. Heaps are suited for priority queues; BSTs are suited for sorted data management.

Q5. Explain Trie time/space complexity and pros/cons of HashMap-based implementation.

Trie has O(m) insert/search for string length m. Space is O(n*m) worst case but less in practice due to shared prefixes. HashMap-based implementation (children as HashMap per node) is flexible for any alphabet size and memory-efficient, but has larger constant factors than array-based approaches.

Q6. Explain the differences between BFS and DFS and their use cases.

BFS uses a queue for level-order traversal and is suited for shortest paths in unweighted graphs. DFS uses a stack (or recursion) for depth-first traversal and is suited for path finding, cycle detection, and topological sorting. Both run in O(V+E) time with O(V) space.

Q7. Explain the differences between Red-Black Trees and AVL Trees.

Both are self-balancing BSTs. AVL has stricter balance (height difference <= 1), making search faster but requiring more rotations on insert/delete. Red-Black has looser balance, making insert/delete faster but search slightly slower. AVL is better for read-heavy workloads; Red-Black is better for write-heavy workloads.

Q8. Explain how to implement an LRU Cache in O(1).

Combine a HashMap and a doubly linked list. The HashMap stores key to node references, and the doubly linked list maintains usage order. Get performs O(1) lookup via HashMap then moves the node to the end of the list. Put adds a new node at the end, and evicts the front node when over capacity.

Q9. Explain path compression and union by rank in Union-Find.

Path Compression makes every node on the find path point directly to the root, making subsequent finds nearly O(1). Union by Rank always attaches the smaller tree under the larger tree, minimizing tree height. Together, they achieve nearly O(alpha(n)) per operation, which is effectively constant time.

Q10. Explain why B-Trees are suitable for database indexes.

B-Trees have a high branching factor, resulting in low tree height. Since disk I/O operates in blocks, B-Trees minimize I/O by storing multiple keys per node. B+Trees store all data in leaves, and leaves form a linked list, making range queries efficient.

Q11. What is a Bloom Filter and where is it used?

A Bloom Filter probabilistically determines set membership. False Positives (says "exists" but doesn't) are possible, but False Negatives (says "doesn't exist" but does) are impossible. It is space-efficient and used in cache filtering, spam URL checking, and preventing unnecessary disk reads in databases.

Q12. Explain the differences between Dijkstra and Bellman-Ford algorithms.

Dijkstra finds shortest paths in graphs without negative weights, using a priority queue in O((V+E)log V). Bellman-Ford handles negative weights and detects negative cycles, but runs in O(V*E). Use Dijkstra when there are no negative weights; use Bellman-Ford otherwise.

Q13. Explain how to implement a Queue using Stacks.

Use two stacks. Enqueue pushes to the push-stack. On dequeue, if the pop-stack is empty, transfer all elements from the push-stack to the pop-stack. This achieves amortized O(1) for both enqueue and dequeue.
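The two-stack queue described above, as a short sketch:

```python
class QueueViaStacks:
    def __init__(self):
        self.push_stack = []   # receives enqueues
        self.pop_stack = []    # serves dequeues in FIFO order

    def enqueue(self, x):
        self.push_stack.append(x)

    def dequeue(self):
        if not self.pop_stack:
            # Transferring reverses the order, so the oldest element
            # ends up on top of the pop-stack
            while self.push_stack:
                self.pop_stack.append(self.push_stack.pop())
        if not self.pop_stack:
            raise IndexError("Queue is empty")
        return self.pop_stack.pop()

q = QueueViaStacks()
q.enqueue(1); q.enqueue(2); q.enqueue(3)
print(q.dequeue())   # 1
q.enqueue(4)
print(q.dequeue())   # 2
```

Each element is pushed and popped at most twice in total, which is where the amortized O(1) bound comes from.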

Q14. Explain how Load Factor affects hash table performance.

Load Factor is the number of stored elements divided by the number of buckets. As it increases, collision probability rises: chaining sees longer linked lists, and open addressing suffers from clustering. Typically 0.75 is the threshold, triggering rehashing (bucket expansion) when exceeded.

Q15. Explain how to detect cycles in a graph.

Undirected graph: Use Union-Find (cycle if adding an edge connects nodes in the same set), or DFS (cycle if a visited non-parent node is encountered). Directed graph: DFS detects a cycle when a node in the current recursion stack is revisited (back edge detection). Topological sort fails to visit all nodes if a cycle exists.


12. 5 Quizzes

Quiz 1. What is the maximum number of comparisons for binary search in a sorted array of size n?

floor(log2(n)) + 1 comparisons. Binary search halves the search range each time, giving O(log n). For example, n=1000 requires at most 10 comparisons.

Quiz 2. What is the time complexity of finding the maximum value in a Min Heap?

O(n). In a Min Heap, the maximum must be in a leaf node, and there are about n/2 leaf nodes that must all be checked. A Heap only provides O(1) access to the minimum.

Quiz 3. How do you output all keys in sorted order from a hash table?

O(n log n) is needed. Hash tables do not maintain sorted order, so all keys must be extracted and sorted. If sorted order is frequently needed, use a TreeMap (balanced BST-based) instead.

Quiz 4. What is the height of a complete binary tree with n nodes?

floor(log2(n)). Level k of a complete binary tree holds up to 2^k nodes, so about log2(n) levels are needed for n nodes.

Quiz 5. When deleting "abc" from a Trie where "ab" is also a word, what happens?

Only the end-of-word marker (is_end) for "abc" is set to false. If node 'c' has no children, it is removed. Node 'b' is NOT removed because it marks the end of "ab". Deletion recursively removes leaf nodes that have no children and are not end-of-word markers.


13. References

Books

  1. Thomas H. Cormen et al., "Introduction to Algorithms (CLRS)" (4th ed., 2022)
  2. Robert Sedgewick, Kevin Wayne, "Algorithms" (4th ed., 2011)
  3. Steven S. Skiena, "The Algorithm Design Manual" (3rd ed., 2020)

Online Learning

  1. LeetCode: https://leetcode.com/
  2. NeetCode Roadmap: https://neetcode.io/
  3. Visualgo — Data Structure Visualization: https://visualgo.net/
  4. Big-O Cheat Sheet: https://www.bigocheatsheet.com/

Courses

  1. MIT 6.006 Introduction to Algorithms (OpenCourseWare)
  2. Stanford CS 161 Design and Analysis of Algorithms
  3. UC Berkeley CS 61B Data Structures

Coding Interview Prep

  1. Gayle Laakmann McDowell, "Cracking the Coding Interview" (6th ed.)
  2. Alex Xu, "System Design Interview" Vol. 1 & 2
  3. Aditya Bhargava, "Grokking Algorithms" (2nd ed., 2024)

Web Resources

  1. GeeksforGeeks Data Structures: https://www.geeksforgeeks.org/data-structures/
  2. CP-Algorithms: https://cp-algorithms.com/
  3. Tech Interview Handbook: https://www.techinterviewhandbook.org/