Data Structures Complete Guide 2025: Array to Trie, B-Tree — Every Structure for Interviews
- Youngju Kim (@fjvbn20031)
- 1. Big-O Notation
- 2. Linear Data Structures
- 3. Hash Data Structures
- 4. Tree Data Structures
- 5. Heap
- 6. Trie
- 7. Graph
- 8. Advanced Data Structures
- 9. Time Complexity Comparison Table
- 10. Data Structure Selection Guide
- 11. Top 15 Interview Questions
- 12. 5 Quizzes
- 13. References
1. Big-O Notation
1.1 Time Complexity Overview
Algorithm efficiency is expressed as a function of the input size (n).
O(1) — Constant time e.g., Array index access
O(log n) — Logarithmic time e.g., Binary search
O(n) — Linear time e.g., Array traversal
O(n log n) — Linearithmic e.g., Merge sort
O(n^2) — Quadratic time e.g., Bubble sort
O(2^n) — Exponential time e.g., Fibonacci recursion
O(n!) — Factorial e.g., Permutation generation
1.2 Growth Rate Comparison
| n | O(1) | O(log n) | O(n) | O(n log n) | O(n^2) | O(2^n) |
|---|---|---|---|---|---|---|
| 10 | 1 | 3 | 10 | 33 | 100 | 1,024 |
| 100 | 1 | 7 | 100 | 664 | 10,000 | 1.27e30 |
| 1,000 | 1 | 10 | 1,000 | 9,966 | 1,000,000 | --- |
| 10,000 | 1 | 13 | 10,000 | 132,877 | 100,000,000 | --- |
1.3 Amortized Analysis
Individual operations may be expensive, but the average cost across a sequence of operations is low.
Dynamic array append:
- Usually O(1): write into an empty slot
- Occasionally O(n): when full, copy into an array of twice the capacity
- Total cost of n appends: n + n/2 + n/4 + ... ~ 2n
- Amortized cost per append: 2n / n = 2 -> O(1)
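The geometric series above can be checked empirically. A small sketch (the `CountingArray` name is illustrative) counts how many element copies doubling costs across n appends:

```python
class CountingArray:
    """Dynamic array that counts how many element copies resizing costs."""
    def __init__(self):
        self.capacity = 1
        self.size = 0
        self.data = [None]
        self.copies = 0  # total elements moved by all resizes

    def append(self, value):
        if self.size == self.capacity:
            self.capacity *= 2
            new_data = [None] * self.capacity
            for i in range(self.size):
                new_data[i] = self.data[i]
                self.copies += 1
            self.data = new_data
        self.data[self.size] = value
        self.size += 1

arr = CountingArray()
n = 1024
for i in range(n):
    arr.append(i)
print(arr.copies)  # 1 + 2 + 4 + ... + 512 = 1023, i.e. fewer than 2n copies
```

For n = 1024 appends the total copy work is 1023 < 2n, matching the amortized O(1) bound.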
2. Linear Data Structures
2.1 Array
Stores elements of the same type in contiguous memory.
```
Index:   0    1    2    3    4
       [10] [20] [30] [40] [50]
         ^                   ^
      arr[0]              arr[4]
```
| Operation | Time Complexity |
|---|---|
| Access (index) | O(1) |
| Search | O(n) |
| Insert (end) | O(1) amortized |
| Insert (middle) | O(n) |
| Delete (end) | O(1) |
| Delete (middle) | O(n) |
```python
# Dynamic array implementation
class DynamicArray:
    def __init__(self):
        self.size = 0
        self.capacity = 1
        self.data = [None] * self.capacity

    def append(self, value):
        if self.size == self.capacity:
            self._resize(2 * self.capacity)
        self.data[self.size] = value
        self.size += 1

    def _resize(self, new_capacity):
        new_data = [None] * new_capacity
        for i in range(self.size):
            new_data[i] = self.data[i]
        self.data = new_data
        self.capacity = new_capacity

    def get(self, index):
        if 0 <= index < self.size:
            return self.data[index]
        raise IndexError("Index out of range")
```
2.2 Linked List
Each node holds data and a pointer to the next node.
```
Singly Linked List:
[10|->] -> [20|->] -> [30|->] -> [40|->] -> None

Doubly Linked List:
None <- [10] <-> [20] <-> [30] -> None

Circular Linked List:
[10|->] -> [20|->] -> [30|->] --+
    ^                           |
    +---------------------------+   (last node points back to the first)
```
| Operation | Time Complexity |
|---|---|
| Access (index) | O(n) |
| Search | O(n) |
| Insert (front) | O(1) |
| Insert (back, tail pointer) | O(1) |
| Delete (known node) | O(1) |
```python
class ListNode:
    def __init__(self, val=0, next=None):
        self.val = val
        self.next = next

class LinkedList:
    def __init__(self):
        self.head = None

    def prepend(self, val):
        new_node = ListNode(val, self.head)
        self.head = new_node

    def delete(self, val):
        dummy = ListNode(0, self.head)
        prev, curr = dummy, self.head
        while curr:
            if curr.val == val:
                prev.next = curr.next
                break
            prev, curr = curr, curr.next
        self.head = dummy.next

    def reverse(self):
        prev, curr = None, self.head
        while curr:
            nxt = curr.next
            curr.next = prev
            prev = curr
            curr = nxt
        self.head = prev
```
Array vs Linked List:
| Criterion | Array | Linked List |
|---|---|---|
| Access | O(1) | O(n) |
| Insert/Delete (front) | O(n) | O(1) |
| Cache efficiency | Good (contiguous memory) | Poor (scattered memory) |
| Memory overhead | Low | Pointer overhead |
2.3 Stack
LIFO (Last In, First Out) structure.
```
push(10)   push(20)   push(30)   pop()
|    |     |    |     | 30 |     |    |
|    |     | 20 |     | 20 |     | 20 |   <- 30 returned
| 10 |     | 10 |     | 10 |     | 10 |
+----+     +----+     +----+     +----+
```
- Uses: Call stack, parenthesis matching, undo/redo, DFS
```python
# Stack — implemented with Python list
class Stack:
    def __init__(self):
        self.items = []

    def push(self, item):
        self.items.append(item)

    def pop(self):
        if self.is_empty():
            raise IndexError("Stack is empty")
        return self.items.pop()

    def peek(self):
        return self.items[-1] if self.items else None

    def is_empty(self):
        return len(self.items) == 0
```
2.4 Queue
FIFO (First In, First Out) structure.
```
enqueue(10) -> enqueue(20) -> enqueue(30) -> dequeue()

Front                                      Front
  v                                          v
[10]      [10][20]      [10][20][30]      [20][30]   <- 10 returned
```
- Uses: BFS, task scheduling, message queues
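A list-based queue pays O(n) for every `pop(0)`, so a common approach is wrapping `collections.deque`, which pops from the front in O(1). A minimal sketch:

```python
from collections import deque

class Queue:
    def __init__(self):
        self.items = deque()

    def enqueue(self, item):
        self.items.append(item)      # O(1) at the back

    def dequeue(self):
        if not self.items:
            raise IndexError("Queue is empty")
        return self.items.popleft()  # O(1) at the front

    def peek(self):
        return self.items[0] if self.items else None

    def is_empty(self):
        return len(self.items) == 0

q = Queue()
for x in (10, 20, 30):
    q.enqueue(x)
print(q.dequeue())  # → 10
```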
2.5 Deque
A queue that allows insertion/deletion at both ends.
```
Insert/Delete at front <- [10][20][30] -> Insert/Delete at back
```
- Python's `collections.deque` is implemented as a doubly linked list of fixed-size blocks, so pushes and pops at both ends are O(1)
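A quick demonstration of both-end operations on `collections.deque`:

```python
from collections import deque

d = deque([10, 20, 30])
d.appendleft(5)      # insert at front: deque([5, 10, 20, 30])
d.append(40)         # insert at back:  deque([5, 10, 20, 30, 40])
front = d.popleft()  # 5
back = d.pop()       # 40
print(list(d))       # → [10, 20, 30]
```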
3. Hash Data Structures
3.1 Hash Table
Keys are converted via a hash function and stored in buckets.
```
key: "apple"  -> hash("apple")  -> 3
key: "banana" -> hash("banana") -> 7
key: "cherry" -> hash("cherry") -> 3   <- Collision!

Buckets:
[0] ->
[1] ->
[2] ->
[3] -> ("apple", 100) -> ("cherry", 300)   <- Chaining
[4] ->
[5] ->
[6] ->
[7] -> ("banana", 200)
```
3.2 Collision Resolution Strategies
Chaining: Store multiple entries in the same bucket via a linked list
[3] -> ("apple", 100) -> ("cherry", 300) -> None
Open Addressing: Find an empty slot elsewhere
Linear Probing:
hash("cherry") = 3 -> already occupied -> check 4 -> empty -> store
Quadratic Probing:
hash("cherry") = 3 -> 3+1^2=4 -> 3+2^2=7 -> ...
Double Hashing:
hash("cherry") = 3 -> 3 + hash2("cherry") -> ...
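Linear probing can be sketched as a small map (a simplified illustration — deletion is omitted, since removing entries under open addressing requires tombstone markers):

```python
class LinearProbingMap:
    """Simplified open-addressing hash map using linear probing."""
    def __init__(self, capacity=8):
        self.capacity = capacity
        self.slots = [None] * capacity  # each slot: None or (key, value)
        self.size = 0

    def put(self, key, value):
        if self.size >= self.capacity // 2:   # keep the load factor low
            self._resize(2 * self.capacity)
        i = hash(key) % self.capacity
        while self.slots[i] is not None:
            if self.slots[i][0] == key:       # update existing key
                self.slots[i] = (key, value)
                return
            i = (i + 1) % self.capacity       # probe the next slot
        self.slots[i] = (key, value)
        self.size += 1

    def get(self, key):
        i = hash(key) % self.capacity
        while self.slots[i] is not None:
            if self.slots[i][0] == key:
                return self.slots[i][1]
            i = (i + 1) % self.capacity
        raise KeyError(key)

    def _resize(self, new_capacity):
        old = [s for s in self.slots if s is not None]
        self.capacity = new_capacity
        self.slots = [None] * new_capacity
        self.size = 0
        for k, v in old:
            self.put(k, v)

m = LinearProbingMap()
m.put("apple", 100)
m.put("cherry", 300)
print(m.get("cherry"))  # → 300
```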
3.3 Load Factor and Rehashing
Load Factor = number of stored elements / number of buckets
LF > 0.75 -> Rehash (double the buckets, redistribute all elements)
```python
class HashMap:
    def __init__(self, capacity=16):
        self.capacity = capacity
        self.size = 0
        self.buckets = [[] for _ in range(capacity)]
        self.load_factor_threshold = 0.75

    def _hash(self, key):
        return hash(key) % self.capacity

    def put(self, key, value):
        if self.size / self.capacity >= self.load_factor_threshold:
            self._rehash()
        idx = self._hash(key)
        for i, (k, v) in enumerate(self.buckets[idx]):
            if k == key:
                self.buckets[idx][i] = (key, value)
                return
        self.buckets[idx].append((key, value))
        self.size += 1

    def get(self, key):
        idx = self._hash(key)
        for k, v in self.buckets[idx]:
            if k == key:
                return v
        raise KeyError(key)

    def _rehash(self):
        old_buckets = self.buckets
        self.capacity *= 2
        self.buckets = [[] for _ in range(self.capacity)]
        self.size = 0
        for bucket in old_buckets:
            for key, value in bucket:
                self.put(key, value)
```
| Operation | Average | Worst |
|---|---|---|
| Insert | O(1) | O(n) |
| Lookup | O(1) | O(n) |
| Delete | O(1) | O(n) |
4. Tree Data Structures
4.1 Binary Search Tree (BST)
Follows the rule: left child < parent < right child.
```
        8
       / \
      3   10
     / \    \
    1   6    14
       / \   /
      4   7 13
```
```python
class TreeNode:
    def __init__(self, val):
        self.val = val
        self.left = None
        self.right = None

class BST:
    def __init__(self):
        self.root = None

    def insert(self, val):
        self.root = self._insert(self.root, val)

    def _insert(self, node, val):
        if not node:
            return TreeNode(val)
        if val < node.val:
            node.left = self._insert(node.left, val)
        elif val > node.val:
            node.right = self._insert(node.right, val)
        return node

    def search(self, val):
        return self._search(self.root, val)

    def _search(self, node, val):
        if not node or node.val == val:
            return node
        if val < node.val:
            return self._search(node.left, val)
        return self._search(node.right, val)

    def inorder(self, node, result=None):
        if result is None:
            result = []
        if node:
            self.inorder(node.left, result)
            result.append(node.val)
            self.inorder(node.right, result)
        return result
```
| Operation | Average | Worst (skewed) |
|---|---|---|
| Search | O(log n) | O(n) |
| Insert | O(log n) | O(n) |
| Delete | O(log n) | O(n) |
4.2 AVL Tree (Self-Balancing BST)
A BST where the height difference between left/right subtrees is at most 1 for every node.
Rotations to restore balance when imbalanced:
```
LL case (Right Rotation):          RR case (Left Rotation):
      z                 y              z                  y
     /                 / \              \                / \
    y         ->      x   z              y       ->     z   x
   /                                      \
  x                                        x

LR case (Left-Right Rotation):     RL case (Right-Left Rotation):
      z                 y              z                  y
     /                 / \              \                / \
    x         ->      x   z              x       ->     z   x
     \                                  /
      y                                y
```
- All operations guaranteed O(log n)
- Slight overhead from rotations on insert/delete
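The right rotation for the LL case can be sketched as follows (height bookkeeping included; a full AVL insert would also dispatch the RR, LR, and RL cases, which are symmetric):

```python
class AVLNode:
    def __init__(self, val):
        self.val = val
        self.left = None
        self.right = None
        self.height = 1

def height(node):
    return node.height if node else 0

def update_height(node):
    node.height = 1 + max(height(node.left), height(node.right))

def rotate_right(z):
    """Fix the LL case: z's left child y becomes the new subtree root."""
    y = z.left
    z.left = y.right   # y's right subtree moves under z
    y.right = z
    update_height(z)   # z is now below y, so update z first
    update_height(y)
    return y

# Build the skewed chain z <- y <- x, then rotate
z = AVLNode(30); y = AVLNode(20); x = AVLNode(10)
z.left = y
y.left = x
update_height(x); update_height(y); update_height(z)
root = rotate_right(z)
print(root.val, root.left.val, root.right.val)  # → 20 10 30
```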
4.3 Red-Black Tree
A self-balancing BST satisfying 5 rules.
Rules:
1. Every node is red or black
2. Root is black
3. All leaves (NIL) are black
4. Red node children must all be black (no consecutive reds)
5. Equal black node count from any node to all its leaves
```
             (B:13)
            /      \
        (R:8)      (R:17)
        /   \      /    \
    (B:1) (B:11) (B:15) (B:25)
        \                /   \
      (R:6)         (R:22) (R:27)
```
- Used in Java TreeMap, C++ std::map
- Faster insert/delete than AVL (fewer rotations)
- Slightly slower search than AVL (less balanced)
4.4 B-Tree / B+Tree
Self-balancing multi-way search trees optimized for disk-based storage.
```
B-Tree (order 3):
          [10, 20]
         /    |    \
    [3,7] [12,15] [25,30]

B+Tree (order 3):
          [10, 20]
         /    |    \
    [3,7] -> [10,15] -> [20,25]    <- leaf nodes form a linked list
```
B-Tree vs B+Tree:
| Property | B-Tree | B+Tree |
|---|---|---|
| Data location | All nodes | Leaf nodes only |
| Leaf linking | None | Linked list |
| Range queries | Inefficient | Efficient |
| Usage | General index | DB index (MySQL InnoDB) |
4.5 Segment Tree
Handles range queries (sum, min, max, etc.) in O(log n).
```
Array: [1, 3, 5, 7, 9, 11]

Range Sum Segment Tree:
                [36]                 (0-5)
               /    \
           [9]        [27]           (0-2), (3-5)
          /   \      /    \
        [4]   [5]  [16]   [11]       (0-1), (2), (3-4), (5)
       /   \      /    \
     [1]   [3]  [7]    [9]
```
```python
class SegmentTree:
    def __init__(self, arr):
        self.n = len(arr)
        self.tree = [0] * (4 * self.n)
        self.build(arr, 1, 0, self.n - 1)

    def build(self, arr, node, start, end):
        if start == end:
            self.tree[node] = arr[start]
        else:
            mid = (start + end) // 2
            self.build(arr, 2 * node, start, mid)
            self.build(arr, 2 * node + 1, mid + 1, end)
            self.tree[node] = self.tree[2 * node] + self.tree[2 * node + 1]

    def query(self, node, start, end, l, r):
        if r < start or end < l:
            return 0
        if l <= start and end <= r:
            return self.tree[node]
        mid = (start + end) // 2
        left = self.query(2 * node, start, mid, l, r)
        right = self.query(2 * node + 1, mid + 1, end, l, r)
        return left + right
```
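The class above only builds and queries; a point update follows the same recursion in O(log n). A standalone function-based sketch (it re-includes build and query so the example runs on its own):

```python
def st_build(arr):
    n = len(arr)
    tree = [0] * (4 * n)
    def build(node, start, end):
        if start == end:
            tree[node] = arr[start]
            return
        mid = (start + end) // 2
        build(2 * node, start, mid)
        build(2 * node + 1, mid + 1, end)
        tree[node] = tree[2 * node] + tree[2 * node + 1]
    build(1, 0, n - 1)
    return tree, n

def st_update(tree, n, index, value, node=1, start=0, end=None):
    """Set arr[index] = value, fixing sums on the path to the root."""
    if end is None:
        end = n - 1
    if start == end:
        tree[node] = value
        return
    mid = (start + end) // 2
    if index <= mid:
        st_update(tree, n, index, value, 2 * node, start, mid)
    else:
        st_update(tree, n, index, value, 2 * node + 1, mid + 1, end)
    tree[node] = tree[2 * node] + tree[2 * node + 1]

def st_query(tree, n, l, r, node=1, start=0, end=None):
    if end is None:
        end = n - 1
    if r < start or end < l:
        return 0
    if l <= start and end <= r:
        return tree[node]
    mid = (start + end) // 2
    return (st_query(tree, n, l, r, 2 * node, start, mid)
            + st_query(tree, n, l, r, 2 * node + 1, mid + 1, end))

tree, n = st_build([1, 3, 5, 7, 9, 11])
print(st_query(tree, n, 1, 3))   # 3 + 5 + 7 = 15
st_update(tree, n, 2, 10)        # array becomes [1, 3, 10, 7, 9, 11]
print(st_query(tree, n, 1, 3))   # 3 + 10 + 7 = 20
```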
5. Heap
5.1 Min Heap / Max Heap
A complete binary tree where the parent is smaller (Min) or larger (Max) than its children.
```
Min Heap:                Max Heap:
        1                       9
      /   \                   /   \
     3     5                 7     8
    / \   / \               / \   / \
   7   8 9  10             3   5 4   6

Array representation (Min Heap):
Index:   0  1  2  3  4  5  6
Min:   [ 1, 3, 5, 7, 8, 9, 10]

Parent:      (i - 1) // 2
Left child:  2*i + 1
Right child: 2*i + 2
```
5.2 Heap Operations
```python
class MinHeap:
    def __init__(self):
        self.heap = []

    def push(self, val):
        self.heap.append(val)
        self._sift_up(len(self.heap) - 1)

    def pop(self):
        if not self.heap:
            raise IndexError("Heap is empty")
        self._swap(0, len(self.heap) - 1)
        val = self.heap.pop()
        if self.heap:
            self._sift_down(0)
        return val

    def _sift_up(self, i):
        parent = (i - 1) // 2
        while i > 0 and self.heap[i] < self.heap[parent]:
            self._swap(i, parent)
            i = parent
            parent = (i - 1) // 2

    def _sift_down(self, i):
        n = len(self.heap)
        while True:
            smallest = i
            left = 2 * i + 1
            right = 2 * i + 2
            if left < n and self.heap[left] < self.heap[smallest]:
                smallest = left
            if right < n and self.heap[right] < self.heap[smallest]:
                smallest = right
            if smallest == i:
                break
            self._swap(i, smallest)
            i = smallest

    def _swap(self, i, j):
        self.heap[i], self.heap[j] = self.heap[j], self.heap[i]
```
| Operation | Time Complexity |
|---|---|
| push | O(log n) |
| pop (min) | O(log n) |
| peek | O(1) |
| heapify (array to heap) | O(n) |
5.3 Heap Sort
```python
import heapq

def heap_sort(arr):
    heapq.heapify(arr)  # O(n)
    return [heapq.heappop(arr) for _ in range(len(arr))]  # n pops, O(log n) each
```
- Time complexity: O(n log n)
- Space complexity: this heapq sketch builds an O(n) result list; the classical in-place variant needs only O(1) auxiliary space
- Uses: Priority queues, Top-K problems, Dijkstra algorithm
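For reference, the classical in-place version sorts ascending with a max heap and sift-down, using O(1) auxiliary space:

```python
def heap_sort_inplace(arr):
    n = len(arr)

    def sift_down(i, size):
        while True:
            largest, left, right = i, 2 * i + 1, 2 * i + 2
            if left < size and arr[left] > arr[largest]:
                largest = left
            if right < size and arr[right] > arr[largest]:
                largest = right
            if largest == i:
                return
            arr[i], arr[largest] = arr[largest], arr[i]
            i = largest

    # Build a max heap bottom-up: O(n)
    for i in range(n // 2 - 1, -1, -1):
        sift_down(i, n)
    # Repeatedly swap the max to the end and shrink the heap: n * O(log n)
    for end in range(n - 1, 0, -1):
        arr[0], arr[end] = arr[end], arr[0]
        sift_down(0, end)
    return arr

print(heap_sort_inplace([5, 2, 9, 1, 7]))  # → [1, 2, 5, 7, 9]
```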
6. Trie
6.1 Trie Structure
A tree data structure optimized for string searching.
Words: "cat", "car", "card", "dog", "do"
```
Root
 +-- c
 |    +-- a
 |         +-- t (*)
 |         +-- r (*)
 |              +-- d (*)
 +-- d
      +-- o (*)
           +-- g (*)

(*) = end of word
```
6.2 Trie Implementation
```python
class TrieNode:
    def __init__(self):
        self.children = {}
        self.is_end = False

class Trie:
    def __init__(self):
        self.root = TrieNode()

    def insert(self, word):
        node = self.root
        for char in word:
            if char not in node.children:
                node.children[char] = TrieNode()
            node = node.children[char]
        node.is_end = True

    def search(self, word):
        node = self._find_node(word)
        return node is not None and node.is_end

    def starts_with(self, prefix):
        return self._find_node(prefix) is not None

    def _find_node(self, prefix):
        node = self.root
        for char in prefix:
            if char not in node.children:
                return None
            node = node.children[char]
        return node

    def autocomplete(self, prefix):
        """Return all words starting with prefix"""
        node = self._find_node(prefix)
        if not node:
            return []
        results = []
        self._dfs(node, prefix, results)
        return results

    def _dfs(self, node, path, results):
        if node.is_end:
            results.append(path)
        for char, child in node.children.items():
            self._dfs(child, path + char, results)
```
| Operation | Time Complexity (m = string length) |
|---|---|
| Insert | O(m) |
| Search | O(m) |
| Prefix search | O(m) |
- Uses: Autocomplete, spell checking, IP routing (Longest Prefix Match), dictionary lookup
7. Graph
7.1 Graph Representation
```
Graph:
A --- B
|   / |
|  /  |
C --- D

Adjacency List:
A: [B, C]
B: [A, C, D]
C: [A, B, D]
D: [B, C]

Adjacency Matrix:
     A  B  C  D
A [  0, 1, 1, 0 ]
B [  1, 0, 1, 1 ]
C [  1, 1, 0, 1 ]
D [  0, 1, 1, 0 ]
```
| Comparison | Adjacency List | Adjacency Matrix |
|---|---|---|
| Space | O(V + E) | O(V^2) |
| Edge existence check | O(degree) | O(1) |
| Traverse all neighbors | O(degree) | O(V) |
| Best for | Sparse graphs | Dense graphs |
7.2 BFS (Breadth-First Search)
```python
from collections import deque

def bfs(graph, start):
    visited = set([start])
    queue = deque([start])
    order = []
    while queue:
        node = queue.popleft()
        order.append(node)
        for neighbor in graph[node]:
            if neighbor not in visited:
                visited.add(neighbor)
                queue.append(neighbor)
    return order
```
BFS order (starting from A):
Level 0: A
Level 1: B, C
Level 2: D
Queue: [A] -> [B, C] -> [C, D] -> [D] -> []
7.3 DFS (Depth-First Search)
```python
def dfs(graph, start, visited=None):
    if visited is None:
        visited = set()
    visited.add(start)
    result = [start]
    for neighbor in graph[start]:
        if neighbor not in visited:
            result.extend(dfs(graph, neighbor, visited))
    return result
```
7.4 Dijkstra Algorithm
Finds the shortest path in a weighted graph.
```python
import heapq

def dijkstra(graph, start):
    distances = {node: float('inf') for node in graph}
    distances[start] = 0
    pq = [(0, start)]  # (distance, node)
    while pq:
        dist, node = heapq.heappop(pq)
        if dist > distances[node]:
            continue
        for neighbor, weight in graph[node]:
            new_dist = dist + weight
            if new_dist < distances[neighbor]:
                distances[neighbor] = new_dist
                heapq.heappush(pq, (new_dist, neighbor))
    return distances
```
- Time complexity: O((V + E) log V) with a min heap
- Only works with non-negative weights
7.5 Topological Sort
An ordering of vertices in a DAG (Directed Acyclic Graph) that respects precedence relations.
```python
from collections import deque

def topological_sort(graph, in_degree):
    queue = deque([v for v in graph if in_degree[v] == 0])
    result = []
    while queue:
        node = queue.popleft()
        result.append(node)
        for neighbor in graph[node]:
            in_degree[neighbor] -= 1
            if in_degree[neighbor] == 0:
                queue.append(neighbor)
    return result if len(result) == len(graph) else []  # cycle detection
```
- Uses: Build dependencies, task scheduling, compilation order
7.6 Bellman-Ford Algorithm
Finds shortest paths even in graphs with negative weights.
```python
def bellman_ford(edges, num_vertices, start):
    distances = [float('inf')] * num_vertices
    distances[start] = 0
    for _ in range(num_vertices - 1):
        for u, v, w in edges:
            if distances[u] + w < distances[v]:
                distances[v] = distances[u] + w
    # Negative cycle detection
    for u, v, w in edges:
        if distances[u] + w < distances[v]:
            return None  # Negative cycle exists
    return distances
```
- Time complexity: O(V * E)
8. Advanced Data Structures
8.1 Union-Find (Disjoint Set)
```
Initial: each node represents itself
{0} {1} {2} {3} {4}

union(0, 1): {0, 1} {2} {3} {4}
union(2, 3): {0, 1} {2, 3} {4}
union(0, 3): {0, 1, 2, 3} {4}

find(1) == find(3)?  -> Yes (same set)
find(1) == find(4)?  -> No (different sets)
```
```python
class UnionFind:
    def __init__(self, n):
        self.parent = list(range(n))
        self.rank = [0] * n

    def find(self, x):
        # Path Compression
        if self.parent[x] != x:
            self.parent[x] = self.find(self.parent[x])
        return self.parent[x]

    def union(self, x, y):
        # Union by Rank
        px, py = self.find(x), self.find(y)
        if px == py:
            return False
        if self.rank[px] < self.rank[py]:
            px, py = py, px
        self.parent[py] = px
        if self.rank[px] == self.rank[py]:
            self.rank[px] += 1
        return True
```
- Path compression + union by rank: nearly O(1) (inverse Ackermann function)
- Uses: Kruskal MST, connected components, cycle detection
8.2 Skip List
A probabilistic data structure that adds multiple levels of indexes to a sorted linked list.
```
Level 3: HEAD ----------------------- 50 ------------------- NIL
Level 2: HEAD ---- 20 --------------- 50 --------- 70 ------ NIL
Level 1: HEAD 10 - 20 - 30 --------- 50 ---------- 70 - 80 - NIL
Level 0: HEAD 10 - 20 - 30 - 40 - 50 - 60 - 70 - 80 ------- NIL
```
| Operation | Average | Worst |
|---|---|---|
| Search | O(log n) | O(n) |
| Insert | O(log n) | O(n) |
| Delete | O(log n) | O(n) |
- Used in Redis Sorted Set
- Simpler to implement than Red-Black Tree
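A minimal insert/search sketch (fixed maximum level and coin-flip promotion; deletion is omitted for brevity, and `MAX_LEVEL = 4` is an illustrative constant):

```python
import random

class SkipNode:
    def __init__(self, val, level):
        self.val = val
        self.forward = [None] * (level + 1)  # one forward pointer per level

class SkipList:
    MAX_LEVEL = 4

    def __init__(self):
        self.head = SkipNode(None, self.MAX_LEVEL)
        self.level = 0

    def _random_level(self):
        lvl = 0
        while random.random() < 0.5 and lvl < self.MAX_LEVEL:
            lvl += 1  # promote with probability 1/2 per level
        return lvl

    def search(self, val):
        node = self.head
        for lvl in range(self.level, -1, -1):        # start at the top level
            while node.forward[lvl] and node.forward[lvl].val < val:
                node = node.forward[lvl]             # move right
        node = node.forward[0]                       # drop to level 0
        return node is not None and node.val == val

    def insert(self, val):
        update = [self.head] * (self.MAX_LEVEL + 1)
        node = self.head
        for lvl in range(self.level, -1, -1):
            while node.forward[lvl] and node.forward[lvl].val < val:
                node = node.forward[lvl]
            update[lvl] = node          # last node before val on this level
        lvl = self._random_level()
        self.level = max(self.level, lvl)
        new_node = SkipNode(val, lvl)
        for i in range(lvl + 1):        # splice into each level
            new_node.forward[i] = update[i].forward[i]
            update[i].forward[i] = new_node

sl = SkipList()
for v in [30, 10, 50, 20]:
    sl.insert(v)
print(sl.search(20), sl.search(40))  # → True False
```

Search results are deterministic even though node levels are random — the randomness only affects how many levels a search can skip through.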
8.3 Bloom Filter
A data structure that probabilistically determines whether an element belongs to a set.
```
k = 3 hash functions, m = 10 bit array

insert("apple"):
  h1("apple")=2, h2("apple")=5, h3("apple")=8
  Bit array: [0,0,1,0,0,1,0,0,1,0]

query("apple"): positions 2,5,8 all 1 -> "probably exists"
query("grape"): h1=1, h2=5, h3=7 -> position 1 is 0 -> "definitely not"
```
- False Positive: May say "exists" when it actually does not
- False Negative: Impossible (a "definitely not" is guaranteed)
- Uses: Cache filter, spam filter, DB existence checks
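A small sketch deriving k hash positions from one `hashlib` digest via double hashing (the parameters m and k are illustrative — real deployments size them from the expected element count and target false-positive rate):

```python
import hashlib

class BloomFilter:
    def __init__(self, m=1000, k=3):
        self.m = m            # bit array size
        self.k = k            # number of hash functions
        self.bits = [0] * m

    def _positions(self, item):
        # Derive k hash values from two halves of one digest
        # (the Kirsch-Mitzenmacher double-hashing trick)
        digest = hashlib.sha256(item.encode()).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big")
        return [(h1 + i * h2) % self.m for i in range(self.k)]

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos] = 1

    def might_contain(self, item):
        return all(self.bits[pos] for pos in self._positions(item))

bf = BloomFilter()
bf.add("apple")
bf.add("banana")
print(bf.might_contain("apple"))   # → True ("probably exists")
print(bf.might_contain("grape"))   # usually False, with a small FP chance
```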
8.4 LRU Cache
A cache that evicts the least recently used item.
```python
from collections import OrderedDict

class LRUCache:
    def __init__(self, capacity):
        self.capacity = capacity
        self.cache = OrderedDict()

    def get(self, key):
        if key not in self.cache:
            return -1
        self.cache.move_to_end(key)
        return self.cache[key]

    def put(self, key, value):
        if key in self.cache:
            self.cache.move_to_end(key)
        self.cache[key] = value
        if len(self.cache) > self.capacity:
            self.cache.popitem(last=False)
```
- O(1) get/put via HashMap + Doubly Linked List
- Uses: Browser cache, database buffer, CDN
9. Time Complexity Comparison Table
| Data Structure | Access | Search | Insert | Delete | Space |
|---|---|---|---|---|---|
| Array | O(1) | O(n) | O(n) | O(n) | O(n) |
| Dynamic Array | O(1) | O(n) | O(1)* | O(n) | O(n) |
| Linked List | O(n) | O(n) | O(1) | O(1) | O(n) |
| Stack | O(n) | O(n) | O(1) | O(1) | O(n) |
| Queue | O(n) | O(n) | O(1) | O(1) | O(n) |
| Hash Table | - | O(1)* | O(1)* | O(1)* | O(n) |
| BST | - | O(log n)* | O(log n)* | O(log n)* | O(n) |
| AVL Tree | - | O(log n) | O(log n) | O(log n) | O(n) |
| Red-Black Tree | - | O(log n) | O(log n) | O(log n) | O(n) |
| B-Tree | - | O(log n) | O(log n) | O(log n) | O(n) |
| Heap | - | O(n) | O(log n) | O(log n) | O(n) |
| Trie | - | O(m) | O(m) | O(m) | O(n*m) |
| Skip List | - | O(log n)* | O(log n)* | O(log n)* | O(n)* |
* = average/expected. Worst case may differ (e.g., skip list space is O(n log n) worst case). m = string length.
10. Data Structure Selection Guide
10.1 Decision Tree
Need fast insert/delete? -> Hash table or linked list
Need sorted order? -> BST family (AVL, Red-Black)
Need fast min/max? -> Heap
Need string prefix search? -> Trie
Need range queries? -> Segment tree or B-Tree
Need fast set membership? -> HashSet or Bloom filter
Need connected component management? -> Union-Find
Need shortest path? -> Graph + Dijkstra/BFS
FIFO? -> Queue
LIFO? -> Stack
10.2 Practical Use Case Map
| Problem Type | Recommended Structure | Reason |
|---|---|---|
| Cache implementation | HashMap + DLL (LRU) | O(1) get/put |
| Autocomplete | Trie | Prefix search O(m) |
| Real-time leaderboard | Skip List or Balanced BST | Sorted + fast updates |
| Task scheduler | Priority Queue (Heap) | Priority-based extraction |
| Social network | Graph (adjacency list) | Relationship traversal |
| DB index | B+Tree | Range queries + disk optimization |
| Duplicate detection | HashSet or Bloom Filter | O(1) check |
| Range sum/min | Segment Tree | O(log n) query |
| Network connectivity | Union-Find | Nearly O(1) union/find |
| Parenthesis matching | Stack | LIFO pattern |
| Breadth-first search | Queue + Graph | Level-order traversal |
11. Top 15 Interview Questions
Q1. Explain the differences between arrays and linked lists, and when to use each.
Arrays use contiguous memory allocation with O(1) index access but O(n) for mid insertion/deletion. Linked lists use nodes connected by pointers with O(1) insertion/deletion but O(n) index access. Use arrays when random access is frequent and size is predictable; use linked lists when frequent insertions/deletions and dynamic sizing are needed.
Q2. Explain hash table collision resolution methods.
Chaining stores multiple entries in the same bucket via linked lists. It is simple to implement and degrades gracefully at high load factors. Open Addressing finds other empty slots on collision. Variants include linear probing, quadratic probing, and double hashing. It has good cache efficiency but performance degrades sharply at high load factors.
Q3. Explain BST deletion.
There are three cases: (1) Leaf node: simply remove. (2) One child: replace with child. (3) Two children: replace with the in-order successor (minimum of right subtree) or in-order predecessor (maximum of left subtree), then delete that node.
Q4. Explain the differences between a Heap and a BST.
A Heap is a complete binary tree where the parent is larger (Max) or smaller (Min) than its children, giving O(1) access to the min/max and O(log n) insert and extract. A BST follows the left child < parent < right child rule, allowing sorted-order traversal and O(log n) search. Heaps are suited for priority queues; BSTs are suited for sorted data management.
Q5. Explain Trie time/space complexity and pros/cons of HashMap-based implementation.
Trie has O(m) insert/search for string length m. Space is O(n*m) worst case but less in practice due to shared prefixes. HashMap-based implementation (children as HashMap per node) is flexible for any alphabet size and memory-efficient, but has larger constant factors than array-based approaches.
Q6. Explain the differences between BFS and DFS and their use cases.
BFS uses a queue for level-order traversal and is suited for shortest paths in unweighted graphs. DFS uses a stack (or recursion) for depth-first traversal and is suited for path finding, cycle detection, and topological sorting. Both run in O(V+E) time with O(V) space.
Q7. Explain the differences between Red-Black Trees and AVL Trees.
Both are self-balancing BSTs. AVL has stricter balance (height difference <= 1), making search faster but requiring more rotations on insert/delete. Red-Black has looser balance, making insert/delete faster but search slightly slower. AVL is better for read-heavy workloads; Red-Black is better for write-heavy workloads.
Q8. Explain how to implement an LRU Cache in O(1).
Combine a HashMap and a doubly linked list. The HashMap stores key to node references, and the doubly linked list maintains usage order. Get performs O(1) lookup via HashMap then moves the node to the end of the list. Put adds a new node at the end, and evicts the front node when over capacity.
Q9. Explain path compression and union by rank in Union-Find.
Path Compression makes every node on the find path point directly to the root, making subsequent finds nearly O(1). Union by Rank always attaches the smaller tree under the larger tree, minimizing tree height. Together, they achieve nearly O(alpha(n)) per operation, which is effectively constant time.
Q10. Explain why B-Trees are suitable for database indexes.
B-Trees have a high branching factor, resulting in low tree height. Since disk I/O operates in blocks, B-Trees minimize I/O by storing multiple keys per node. B+Trees store all data in leaves, and leaves form a linked list, making range queries efficient.
Q11. What is a Bloom Filter and where is it used?
A Bloom Filter probabilistically determines set membership. False Positives (says "exists" but doesn't) are possible, but False Negatives (says "doesn't exist" but does) are impossible. It is space-efficient and used in cache filtering, spam URL checking, and preventing unnecessary disk reads in databases.
Q12. Explain the differences between Dijkstra and Bellman-Ford algorithms.
Dijkstra finds shortest paths in graphs without negative weights, using a priority queue in O((V+E)log V). Bellman-Ford handles negative weights and detects negative cycles, but runs in O(V*E). Use Dijkstra when there are no negative weights; use Bellman-Ford otherwise.
Q13. Explain how to implement a Queue using Stacks.
Use two stacks. Enqueue pushes to the push-stack. On dequeue, if the pop-stack is empty, transfer all elements from the push-stack to the pop-stack. This achieves amortized O(1) for both enqueue and dequeue.
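The two-stack queue described above, as a sketch:

```python
class QueueViaStacks:
    def __init__(self):
        self.push_stack = []   # receives enqueues
        self.pop_stack = []    # serves dequeues in reversed (FIFO) order

    def enqueue(self, item):
        self.push_stack.append(item)   # O(1)

    def dequeue(self):
        if not self.pop_stack:         # refill only when empty: each element
            while self.push_stack:     # moves at most twice -> amortized O(1)
                self.pop_stack.append(self.push_stack.pop())
        if not self.pop_stack:
            raise IndexError("Queue is empty")
        return self.pop_stack.pop()

q = QueueViaStacks()
q.enqueue(1); q.enqueue(2); q.enqueue(3)
print(q.dequeue(), q.dequeue())  # → 1 2
```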
Q14. Explain how Load Factor affects hash table performance.
Load Factor is the number of stored elements divided by the number of buckets. As it increases, collision probability rises: chaining sees longer linked lists, and open addressing suffers from clustering. Typically 0.75 is the threshold, triggering rehashing (bucket expansion) when exceeded.
Q15. Explain how to detect cycles in a graph.
Undirected graph: Use Union-Find (cycle if adding an edge connects nodes in the same set), or DFS (cycle if a visited non-parent node is encountered). Directed graph: DFS detects a cycle when a node in the current recursion stack is revisited (back edge detection). Topological sort fails to visit all nodes if a cycle exists.
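The directed case (back-edge detection via a recursion-stack coloring) can be sketched as:

```python
def has_cycle_directed(graph):
    """Detect a cycle in a directed graph via DFS back-edge detection."""
    WHITE, GRAY, BLACK = 0, 1, 2    # unvisited / in recursion stack / finished
    color = {v: WHITE for v in graph}

    def dfs(v):
        color[v] = GRAY
        for w in graph[v]:
            if color[w] == GRAY:    # back edge into the current stack -> cycle
                return True
            if color[w] == WHITE and dfs(w):
                return True
        color[v] = BLACK
        return False

    return any(color[v] == WHITE and dfs(v) for v in graph)

acyclic = {"A": ["B"], "B": ["C"], "C": []}
cyclic  = {"A": ["B"], "B": ["C"], "C": ["A"]}
print(has_cycle_directed(acyclic), has_cycle_directed(cyclic))  # → False True
```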
12. 5 Quizzes
Quiz 1. What is the maximum number of comparisons for binary search in a sorted array of size n?
floor(log2(n)) + 1 comparisons. Binary search halves the search range each time, giving O(log n). For example, n=1000 requires at most 10 comparisons.
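The bound can be checked directly, counting one three-way comparison per halving step:

```python
def binary_search(arr, target):
    """Return (index or -1, number of comparisons)."""
    lo, hi = 0, len(arr) - 1
    comparisons = 0
    while lo <= hi:
        mid = (lo + hi) // 2
        comparisons += 1            # one three-way comparison per iteration
        if arr[mid] == target:
            return mid, comparisons
        if arr[mid] < target:
            lo = mid + 1
        else:
            hi = mid - 1
    return -1, comparisons

arr = list(range(1000))
idx, comps = binary_search(arr, 999)
print(idx, comps)  # comps stays within floor(log2(1000)) + 1 = 10
```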
Quiz 2. What is the time complexity of finding the maximum value in a Min Heap?
O(n). In a Min Heap, the maximum must be in a leaf node, and there are about n/2 leaf nodes that must all be checked. A Heap only provides O(1) access to the minimum.
Quiz 3. How do you output all keys in sorted order from a hash table?
O(n log n) is needed. Hash tables do not maintain sorted order, so all keys must be extracted and sorted. If sorted order is frequently needed, use a TreeMap (balanced BST-based) instead.
Quiz 4. What is the height of a complete binary tree with n nodes?
floor(log2(n)). Level k of a complete binary tree holds up to 2^k nodes, so about log2(n) levels are needed for n nodes.
Quiz 5. When deleting "abc" from a Trie where "ab" is also a word, what happens?
Only the end-of-word marker (is_end) for "abc" is set to false. If node 'c' has no children, it is removed. Node 'b' is NOT removed because it marks the end of "ab". Deletion recursively removes leaf nodes that have no children and are not end-of-word markers.
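The deletion described above, as a recursive sketch (a minimal `TrieNode` and `insert` are re-included so the example runs on its own):

```python
class TrieNode:
    def __init__(self):
        self.children = {}
        self.is_end = False

def insert(root, word):
    node = root
    for ch in word:
        node = node.children.setdefault(ch, TrieNode())
    node.is_end = True

def delete(node, word, depth=0):
    """Return True if the caller should remove `node` from its parent."""
    if depth == len(word):
        node.is_end = False                 # unmark end of word
    else:
        ch = word[depth]
        child = node.children.get(ch)
        if child and delete(child, word, depth + 1):
            del node.children[ch]           # prune the now-useless child
    # Removable only if it ends no word and has no remaining children
    return not node.is_end and not node.children

root = TrieNode()
insert(root, "ab")
insert(root, "abc")
delete(root, "abc")
print("c" in root.children["a"].children["b"].children)  # → False
print(root.children["a"].children["b"].is_end)           # → True ("ab" survives)
```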
13. References
Books
- Thomas H. Cormen et al., "Introduction to Algorithms (CLRS)" (4th ed., 2022)
- Robert Sedgewick, Kevin Wayne, "Algorithms" (4th ed., 2011)
- Steven S. Skiena, "The Algorithm Design Manual" (3rd ed., 2020)
Online Learning
- LeetCode: https://leetcode.com/
- NeetCode Roadmap: https://neetcode.io/
- Visualgo — Data Structure Visualization: https://visualgo.net/
- Big-O Cheat Sheet: https://www.bigocheatsheet.com/
Courses
- MIT 6.006 Introduction to Algorithms (OpenCourseWare)
- Stanford CS 161 Design and Analysis of Algorithms
- UC Berkeley CS 61B Data Structures
Coding Interview Prep
- Gayle Laakmann McDowell, "Cracking the Coding Interview" (6th ed.)
- Alex Xu, "System Design Interview" Vol. 1 & 2
- Aditya Bhargava, "Grokking Algorithms" (2nd ed., 2024)
Web Resources
- GeeksforGeeks Data Structures: https://www.geeksforgeeks.org/data-structures/
- CP-Algorithms: https://cp-algorithms.com/
- Tech Interview Handbook: https://www.techinterviewhandbook.org/