Data Structures and Algorithms II
Mar ~ Apr 6 weeks, 6 lectures & 6 labs
Learning Objectives:

slides online: https://mth252.fastzhong.com/
https://mth252.fastzhong.com/mth252.pdf
labs: https://github.com/fastzhong/mth252/tree/main/public/notebooks
π Python & Big O review: MTH251 lab1
πββοΈ learning by doing, implementing the algo from scratch
ππ»ββοΈ problem solving
if you want to dive deeper into proofs and the mathematics of computer science:
π Building Blocks for Theoretical Computer Science by Margaret M. Fleck
β οΈ always seek the best time and space complexity by applying the DSA taught in MTH251 & MTH252
β οΈ in principle, only standard ADT operations are allowed by default, as solutions have to be language-independent
β οΈ advanced features and built-in functions from Python are not allowed unless clearly asked for by the question, e.g. sort/search/find (in)/min(list)/max(list)/set/match β¦, as the complexity becomes unknown and Python-dependent
e.g. printer queue, cpu task scheduler, etc.
add(k, v), remove_max(), max(), size(), is_empty()
| Operation | unsorted list | sorted list |
|---|---|---|
| is_empty | O(1) | O(1) |
| size | O(1) | O(1) |
| add | O(1) | O(n) π |
| remove_min | O(n) π | O(1) |
| min | O(n) π | O(1) |
Complete Binary Tree
the key of any node in the tree is smaller than the keys of its child nodes

π A perfect binary tree is a binary tree in which every non-leaf node has two child nodes and all leaves are at the same level.
π A complete binary tree is a tree in which every level, except possibly the last, is completely filled, and all nodes are as far left as possible.
key of root is always the smallest (MinHeap) or the largest (MaxHeap)
subtree is also a binary heap
from root to any leaf, the key values are in non-decreasing order
given height h, total nodes of a binary heap: 2^h ≤ n ≤ 2^(h+1) − 1
given total nodes n, binary heap height: log2(n+1) − 1 ≤ h ≤ log2(n)

| 2 | 4 | 3 | 5 | 6 | 6 | 9 | 6 | 7 | 8 |
| 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |
| current node | : | i |
| parent(i) | = | (i - 1) // 2 |
| left_child(i) | = | 2 * i + 1 |
| right_child(i) | = | 2 * i + 2 |
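A minimal array-based MinHeap sketch built on the index formulas above; the class and method names mirror the PQ ADT but are otherwise illustrative. It stores (key, value) pairs ordered by key, matching how heap_sort below calls add((e, e)) and remove_min():

```python
# illustrative array-based MinHeap of (key, value) pairs
class MinHeap:
    def __init__(self):
        self._data = []

    def size(self):
        return len(self._data)

    def is_empty(self):
        return len(self._data) == 0

    def min(self):
        return self._data[0]

    def add(self, kv):
        # append at the end, then sift up to restore the heap property
        self._data.append(kv)
        self._sift_up(len(self._data) - 1)

    def remove_min(self):
        # swap root with the last element, pop it, then sift down the new root
        self._data[0], self._data[-1] = self._data[-1], self._data[0]
        kv = self._data.pop()
        if self._data:
            self._sift_down(0)
        return kv

    def _sift_up(self, i):
        while i > 0:
            parent = (i - 1) // 2
            if self._data[i][0] >= self._data[parent][0]:
                break
            self._data[i], self._data[parent] = self._data[parent], self._data[i]
            i = parent

    def _sift_down(self, i):
        n = len(self._data)
        while 2 * i + 1 < n:
            child = 2 * i + 1                      # left child
            if child + 1 < n and self._data[child + 1][0] < self._data[child][0]:
                child += 1                         # right child is smaller
            if self._data[i][0] <= self._data[child][0]:
                break
            self._data[i], self._data[child] = self._data[child], self._data[i]
            i = child
```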
add new element:
π https://www.cs.usfca.edu/~galles/visualization/Heap.html

remove the min:
π https://www.cs.usfca.edu/~galles/visualization/Heap.html

| Operation | unsorted list | sorted list | binary heap |
|---|---|---|---|
| is_empty | O(1) | O(1) | O(1) |
| size | O(1) | O(1) | O(1) |
| add | O(1) | O(n) | O(logN) π |
| remove_min | O(n) | O(1) | O(logN) π |
| min | O(n) | O(1) | O(1) |
Priority Queue (PQ) is used:
used in certain implementations of Dijkstraβs Shortest Path algorithm
used by Minimum Spanning Tree (MST) algorithms
Best First Search (BFS) algorithms such as A* use a PQ to continuously grab the next most promising node
used in Huffman coding (which is often used for lossless data compression)
anytime you need to dynamically fetch the "next best" or "next worst" element
β¦
Steps:
create a binary heap
add all elements to the heap
repeatedly obtain the min/max element from the heap
```python
# heap sort
# time complexity: O(NlogN)
# space complexity: O(N)
def heap_sort(nums: list) -> list:
    if not nums:
        return []
    minH = MinHeap()
    for e in nums:
        minH.add((e, e))
    # repeatedly remove the min to fill nums in sorted order
    for i in range(len(nums)):
        nums[i] = minH.remove_min()[0]
    return nums
```
Heapify: convert the array to be a binary heap
by "sift down" the non-leaf node one by one from top to bottom
Complete Binary Tree (Perfect Binary Tree in worst case)
| last level nodes: | n/2 | sift_down: | (n/2) · 0 |
| 2nd last level nodes: | n/4 | sift_down: | (n/4) · 1 |
| β¦ | β¦ | β¦ | β¦ |
| (h+1)-th last level nodes: | n/2^(h+1) | sift_down: | (n/2^(h+1)) · h |
Time Complexity: O(n)

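heap_sort2 below relies on two helpers, heapify and heapify_sift_down, that are not shown on the slide; a possible max-heap version, assuming the bottom-up sift-down scheme described above:

```python
def heapify_sift_down(nums, i, n):
    # sift nums[i] down within nums[0:n] to restore the max-heap property
    while 2 * i + 1 < n:
        child = 2 * i + 1
        if child + 1 < n and nums[child + 1] > nums[child]:
            child += 1                     # pick the bigger child
        if nums[i] >= nums[child]:
            break
        nums[i], nums[child] = nums[child], nums[i]
        i = child

def heapify(nums):
    # sift down every non-leaf node, from the last parent up to the root: O(n)
    n = len(nums)
    for i in range(n // 2 - 1, -1, -1):
        heapify_sift_down(nums, i, n)
    return nums
```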
""" heap_sort: sort nums in place time complexity: O(N) space complexity: O(1) """ def heap_sort2(nums: []) -> []: nums = heapify(nums) print(" heapify: ", nums) # swap from the last element for i in range(len(nums) - 1, 0, -1): # move the biggest to the end nums[0], nums[i] = nums[i], nums[0] # sift down the first element after swap # so nums[0, i) is still a max heap heapify_sift_down(nums, 0, i) return nums""" heap_sort: sort nums in place time complexity: O(N) space complexity: O(1) """ def heap_sort2(nums: []) -> []: nums = heapify(nums) print(" heapify: ", nums) # swap from the last element for i in range(len(nums) - 1, 0, -1): # move the biggest to the end nums[0], nums[i] = nums[i], nums[0] # sift down the first element after swap # so nums[0, i) is still a max heap heapify_sift_down(nums, 0, i) return nums

A Map is an abstract data type (ADT):
k - unique
v - can be repeated
A Sorted Map is an extension of Map whose keys are sorted in increasing order.
π€ Can we use None/null as a key?
students score:
| Student(key) | Score(value) |
|---|---|
| A | 80 |
| B | 70 |
| C | 60 |
| β¦ | β¦ |
for a map
- map[k], map.get(k)
- map.pop(k)
- map[k] = v, map.set(k) = v, map.put(k, v), map.setdefault(k, default)
- map.keys()
- map.values()
- del map[k], map.clear()
- size(map)
- iter(map), map.items()
for a sortedMap
- sortedMap.find_min()
- sortedMap.find_max()
- sortedMap.find_lt(k)
- sortedMap.find_le(k)
- sortedMap.find_gt(k)
- sortedMap.find_ge(k)
- sortedMap.find_range(k1, k2)
- sortedMap.reversed()
π Complexity: O(n)
A Hash table is a data structure that provides a mapping from keys to values using a technique called hashing.
A hash function H(x) is a function that maps a general key βxβ to a whole number in a fixed range [0, N-1].
π‘ locate the element without searching: O(n) → O(1)
step 1. Hash Code: arbitrary object → integer
A key is converted into an integer by using a hash function; this integer can then be used as an index to store the original element in the hash table.
`hash = hashcode(key)`
step 2. Compression: integer → [0, N−1] (N is the size of the hash table)
The element is stored in the hash table, where it can be quickly retrieved using the hashed key.
`index = hash % array_size`
```python
# XOR byte by byte
def byte_xor(ba1, ba2):
    return bytes([_a ^ _b for _a, _b in zip(ba1, ba2)])

# produce a 32-byte hash code:
# chop the data into 32-byte long chunks (padding with zeros if required)
# then XOR all chunks
def bitwise_xor(data):
    chunks = [data[i:i+32] for i in range(0, len(data), 32)]
    for i in range(len(chunks)):
        chunk = chunks[i]
        len_diff = 32 - len(chunk)
        if len_diff:
            hexchunk = chunk.hex() + '00' * len_diff
            chunk = bytearray.fromhex(hexchunk)
        chunks[i] = chunk
    res = bytes.fromhex('00' * 32)
    for chunk in chunks:
        res = byte_xor(res, chunk)
    return res
```
for an n-tuple (x0, x1, x2, ..., x(n−1)), if position is important, we can multiply the value at position i by a^i, e.g.:
x0·a^0 + x1·a^1 + x2·a^2 + ... + x(n−1)·a^(n−1)
for bitwise hashing, we can also apply a cyclic-shift function instead of multiplication, where shift(x, y) means cyclic-shift x by y bits:
shift(x0, 0 mod 32) ⊕ shift(x1, 1 mod 32) ⊕ shift(x2, 2 mod 32) ⊕ ... ⊕ shift(x(n−1), (n−1) mod 32)
π¬ a 5-bit cyclic shift achieves the smallest total number of collisions when hashing 230,000 English words
for hash code x:
Division Method: x mod N
MAD: ((a·x + b) mod p) mod N, where p is a prime number, p > N, and a, b are random numbers with a ∈ [1, p−1], b ∈ [0, p−1]
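A sketch of the two compression schemes, assuming N is the table size; the specific prime p is an assumption, and the ranges of a and b follow the MAD definition above:

```python
import random

def compress_division(hashcode, N):
    # Division Method: map the hash code into [0, N-1]
    return hashcode % N

def make_mad(N, p=109345121):
    # MAD: ((a*x + b) mod p) mod N, with p a prime larger than N
    a = random.randrange(1, p)   # a in [1, p-1] so the map is non-degenerate
    b = random.randrange(0, p)   # b in [0, p-1]
    def compress(hashcode):
        return ((a * hashcode + b) % p) % N
    return compress
```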
a number of (k, v) pairs with key set {βabcdefβ, βbcdefaβ, βcdefabβ , βdefabcβ}
The ASCII values of a, b, c, d, e, and f are 97, 98, 99, 100, 101, and 102 respectively.
| key | Hash Function | Index |
|---|---|---|
| abcdef | (97 + 98 + 99 + 100 + 101 + 102) % 599 | 597 |
| bcdefa | (98 + 99 + 100 + 101 + 102 + 97) % 599 | 597 |
| cdefab | (99 + 100 + 101 + 102 + 97 + 98) % 599 | 597 |
| defabc | (100 + 101 + 102 + 97 + 98 + 99) % 599 | 597 |

a number of (k, v) with key set {βabcdefβ, βbcdefaβ, βcdefabβ , βdefabcβ}
The ASCII values of a, b, c, d, e, and f are 97, 98, 99, 100, 101, and 102 respectively.
| key | Hash Function | Index |
|---|---|---|
| abcdef | (97·1 + 98·2 + 99·3 + 100·4 + 101·5 + 102·6) % 2069 | 38 |
| bcdefa | (98·1 + 99·2 + 100·3 + 101·4 + 102·5 + 97·6) % 2069 | 23 |
| cdefab | (99·1 + 100·2 + 101·3 + 102·4 + 97·5 + 98·6) % 2069 | 14 |
| defabc | (100·1 + 101·2 + 102·3 + 97·4 + 98·5 + 99·6) % 2069 | 11 |

one element in the bucket: O(1)
more than one element: traverse the bucket's linked list, O(1) + O(n) in the worst case

Finding an unused, or open, location in the hash table is called open addressing.
The process of locating an open location in the hash table is called probing.
Various probing techniques:
β οΈ probing may go into cycles (an infinite loop) (hashing attack)
β οΈ naive removal breaks probe sequences, causing chaos when removing elements from the hash table


M: total number of map elements
O: number of occupied buckets
N: size of hash table
P: new size of hash table (expand or shrink)
when:
how:

H(x):
Designing good hash functions requires a blending of sophisticated mathematics and clever engineering.
π‘ coding tips:
```python
class UserGroup:
    def __init__(self, name, status):
        self.name = name
        self.status = status

    def __hash__(self):
        result = 17
        result = 31 * result + hash(self.name)
        # only mix in status when it is present
        if self.status is not None:
            result = 31 * result + hash(self.status)
        return result

    def __eq__(self, other):
        if isinstance(other, UserGroup):
            return self.__hash__() == other.__hash__()
        return False
```
| Operation | avg case | best case | worst case |
|---|---|---|---|
| Search | O(1) | O(1) | O(n) |
| Insert | O(1) | O(1) | O(n) |
| Delete | O(1) | O(1) | O(n) |
HashMap
| Key Type | hashCode(k) |
|---|---|
| boolean | k ? 0 : 1 |
| byte, char, short, int | (int)k |
| long | (int)(k XOR (k >>> 32)) |
| float | Float.floatToIntBits(k) |
| double | long l = Double.doubleToLongBits(k); (int)(l XOR (l >>> 32)) |
| string/array | s[0]·31^(n−1) + s[1]·31^(n−2) + ... + s[n−1] |
similarly, for a compound data type, like an object k with n fields, one way to implement hashCode():
h = h(k0)·31^(n−1) + h(k1)·31^(n−2) + ... + h(k(n−1))
```java
String name;
int age;

@Override
public int hashCode() {
    int result = name != null ? name.hashCode() : 0;
    result = 31 * result + age;
    return result;
}
```
HashMap
when the table size n is 2^y: x % 2^y = x & (2^y − 1)
e.g.
6 % 8 = 6, 6 & 7 = 6; 10 % 8 = 2, 10 & 7 = 2

π the binary bit operation & is much faster than decimal mod
then XOR the high 16 bits into the low 16 bits:
H(k) = (h XOR (h >>> 16)) & (n − 1)
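A Python sketch of the same index computation, assuming the table size n is a power of two; the function name is illustrative:

```python
def bucket_index(h, n):
    # spread the high 16 bits into the low 16 bits (emulating 32-bit Java int)
    h = (h ^ (h >> 16)) & 0xFFFFFFFF
    # n must be a power of two, so & (n - 1) equals % n
    return h & (n - 1)
```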

HashMap


HashMap
size 16 → 16·2 = 32:


HashMap
π source code: https://github.com/frohoff/jdk8u-jdk/blob/master/src/share/classes/java/util/HashMap.java
A Binary Search Tree (BST) is a binary tree in which:
π‘ any BST subtree is still a BST
π‘ BST node must be comparable
π¬ in some BST implementations all values are unique, so we exclude duplicates here
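A minimal BST sketch (no duplicates, per the note above); the node fields and function names are illustrative:

```python
class TreeNode:
    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None

def bst_insert(root, key):
    # recursive insert; returns the (possibly new) subtree root
    if root is None:
        return TreeNode(key)
    if key < root.key:
        root.left = bst_insert(root.left, key)
    elif key > root.key:
        root.right = bst_insert(root.right, key)
    return root                    # duplicates are ignored

def bst_search(root, key):
    # walk down, going left/right by comparison; O(h)
    while root is not None and root.key != key:
        root = root.left if key < root.key else root.right
    return root
```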

π¬ 1962, Hibbard Deletion
π https://www.cs.usfca.edu/~galles/visualization/BST.html
del min
del max




"node with 1 child node"



"node with 2 child nodes"
| Operation | avg case | worst case |
|---|---|---|
| Search | O(logN) | O(n) |
| Insert | O(logN) | O(n) |
| Delete | O(logN) | O(n) |
Binary Search Tree (BST) is used:
implementation of AVL Tree, Red-Black Tree, etc.
syntax trees used by compilers and calculators
Treap - a probabilistic data structure
β¦
a Heap is a balanced tree; a BST may not be
a Heap allows duplicates; our BST does not
a BST is an ordered data structure; a Heap is not
worst case for building a BST of n nodes is O(n²) (O(n·logn) on average); building a Heap is O(n) (heapify)

named after inventors G. M. Adel'son-Vel'skii and E. M. Landis, 1962
first type of Balanced Binary Search Tree (BBST)
height balanced: BF - balance factor
BF = H(node.right) − H(node.left)
BF ∈ {−1, 0, 1}
height vs. number of nodes: h = O(logN)
Examples:



Rebalance ?
Steps:
update Height
compute Balance Factor
left/right rotation if unbalanced
π https://www.cs.usfca.edu/~galles/visualization/AVLtree.html
T1 < z < T2 < y < T3 < x < T4
- y.right = x
- x.left = T3

T4 < x < T3 < y < T1 < z < T2
- y.left = x
- x.right = T3

1. LR → LL: left rotate
2. LL: right rotate

1. RL → RR: right rotate
2. RR: left rotate

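A sketch of the four rotation cases above for an AVL node with a cached height attribute; the node fields (left, right, height) are assumptions, and BF follows the H(right) − H(left) convention used earlier:

```python
def height(node):
    return node.height if node else 0

def update_height(node):
    node.height = 1 + max(height(node.left), height(node.right))

def balance_factor(node):
    return height(node.right) - height(node.left)

def rotate_right(x):
    y = x.left
    x.left = y.right
    y.right = x
    update_height(x)
    update_height(y)
    return y                                   # new subtree root

def rotate_left(x):
    y = x.right
    x.right = y.left
    y.left = x
    update_height(x)
    update_height(y)
    return y

def rebalance(node):
    update_height(node)
    bf = balance_factor(node)
    if bf < -1:                                # left-heavy
        if balance_factor(node.left) > 0:      # LR: left rotate the left child first
            node.left = rotate_left(node.left)
        return rotate_right(node)              # LL: right rotate
    if bf > 1:                                 # right-heavy
        if balance_factor(node.right) < 0:     # RL: right rotate the right child first
            node.right = rotate_right(node.right)
        return rotate_left(node)               # RR: left rotate
    return node
```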
| Operation | BST avg | BST worst | AVL avg | AVL worst |
|---|---|---|---|---|
| Search | O(logN) | O(N) | O(logN) | O(logN) |
| Insert | O(logN) | O(N) | O(logN) | O(logN) |
| Delete | O(logN) | O(N) | O(logN) | O(logN) |
AVL: O(N) → O(logN)
A red-black tree is a binary tree that satisfies the following red-black properties:
A 2-3 tree is a B-tree of order 3:

Insertion is always done at a leaf



A Red-Black Tree is equivalent to a 2-3 Tree

A Red-Black Tree is equivalent to a 2-3 Tree



"black balanced"
N nodes → height ≤ 2·logN, Complexity: O(logN)
Red Black Tree vs AVL Tree
always insert the new node as a red node
- node.right = x.left
- x.left = node
- x.color = node.color
- node.color = RED

- node.color = RED
- node.left.color = BLACK
- node.right.color = BLACK

- node.left = T1
- x.right = node
- x.color = node.color
- node.color = RED










A Skip List node has an array of pointers: pointers[0] stores a level-0 pointer, pointers[1] stores a level-1 pointer, and so on.
When inserting a new value, the level (depth) of the new node is randomized.
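A sketch of the randomized level choice: each extra level is added with probability 1/2, capped at an assumed max_level:

```python
import random

def random_level(max_level=16, p=0.5):
    # start at level 1 and keep flipping a biased coin
    level = 1
    while random.random() < p and level < max_level:
        level += 1
    return level
```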

| Operation | Skip List avg | Linked List avg |
|---|---|---|
| Search | O(logN) π | O(N) |
| Insert | O(logN) | O(N) |
| Delete | O(logN) | O(N) |
π Skip Lists: A Probabilistic Alternative to Balanced Trees (William Pugh, 1990)



John von Neumann


merge top down: sort_merge_recursive_v1
improvement: sort_merge_recursive_v2
merge bottom up: sort_merge_bottomup
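A minimal top-down merge sort sketch corresponding to the recursive variants above (the lab function names may differ):

```python
def sort_merge_recursive(nums):
    if len(nums) <= 1:
        return nums
    mid = len(nums) // 2
    left = sort_merge_recursive(nums[:mid])
    right = sort_merge_recursive(nums[mid:])
    # merge two sorted halves
    merged, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:          # <= keeps the sort stable
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged
```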
worst case ?
at each level i = 0, 1, 2, ..., log2(n)
there are 2^i subproblems, each of size n/2^i
total # of operations at level i ≤ 2^i · c·(n/2^i) = c·n
complexity at each level is O(n)
"recursion tree"

Tony Hoare


worst case: O(n²)
improvement:
π randomized algorithm, time complexity O(N·log2N) (π Introduction to Algorithms); see the sketch below
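A sketch of the randomized improvement: picking a random pivot makes already-sorted input no longer trigger the O(n²) worst case. The partition helper here is also reused by the quickselect discussion later:

```python
import random

def partition(arr, l, r):
    # move a random pivot to the end, then do a Lomuto partition
    p = random.randint(l, r)
    arr[p], arr[r] = arr[r], arr[p]
    pivot, i = arr[r], l
    for j in range(l, r):
        if arr[j] < pivot:
            arr[i], arr[j] = arr[j], arr[i]
            i += 1
    arr[i], arr[r] = arr[r], arr[i]
    return i                     # final index of the pivot

def sort_quick_recursive(arr, l, r):
    if l < r:
        p = partition(arr, l, r)
        sort_quick_recursive(arr, l, p - 1)
        sort_quick_recursive(arr, p + 1, r)
```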
Simple Sort:
Efficient Sort:
| sorting | avg | best | worst | in-place | space | stability |
|---|---|---|---|---|---|---|
| Selection | O(n²) | O(n²) | O(n²) | in-place | O(1) | NOT stable |
| Insertion | O(n²) | O(n) π | O(n²) | in-place | O(1) | stable |
| Bubble | O(n²) | O(n)* | O(n²) | in-place | O(1) | stable |
| Heap | O(nlogn) | O(nlogn) | O(nlogn) | in-place | O(1) | NOT stable |
| Merge | O(nlogn) | O(nlogn) → O(n)* | O(nlogn) | OUT-of-place | O(n) π | stable |
| Quick | O(nlogn) | O(nlogn) → O(n)* | O(n²) π | in-place | O(1) | NOT stable |
| | n² | nlogn | faster |
|---|---|---|---|
| n = 10 | 100 | 33 | 3 |
| n = 100 | 10,000 | 664 | 15 |
| n = 1,000 | 1,000,000 | 9,966 | 100 |
| n = 10,000 | 100,000,000 | 132,877 | 753 |
π 1hr vs 31days (≈ 753hrs)
Stable sort algorithms sort equal elements in the same order that they appear in the input:


list.sort() or sorted(list)
| sorting | best | avg | worst |
|---|---|---|---|
| Merge | O(nlogn) | O(nlogn) | O(nlogn) |
| Quick | O(nlogn) | O(nlogn) | O(n2) |
| Timsort | O(n) π | O(nlogn) | O(nlogn) |
size < 64 → insertion sort
otherwise → "adaptive merge sort"
2 types of sorting operations:
β οΈ comparing objects is very expensive
list.sort() or sorted(list)
[1, 2, 3, ..., 100, 101, 102, 103, ..., 200]
run1 = [1, 2, 3, ..., 100]
run2 = [101, 102, 103, ..., 200]
......
run2[2^(n−1) − 1] ≤ run1[0] ≤ run2[2^n − 1]
binary search: O(N) → O(logN)
Arrays.sort() or Collections.sort()
primitive array: Dual Pivot Quicksort
object array: Timsort
In computer science, a selection algorithm is an algorithm for finding the kth smallest number in a list or array; such a number is called the kth order statistic. This includes the cases of finding the minimum, maximum, and median elements.
π sort the array first O(NlogN)
Solution 1: complexity O(N)+O(KlogN)
Solution 2: complexity O(K)+O(NlogK)
create a Max Heap of size k, add elements one by one:
min()
```python
p = partition(arr, l, r)
sort_quick_recursive(arr, l, p - 1)
sort_quick_recursive(arr, p + 1, r)
```
k==p?
k<p?
k>p?
complexity O(n + n/2 + n/4 + ... + 1) = O(2n) = O(n)
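A quickselect sketch for the k-th smallest element (0-indexed), reusing the randomized partition sketched earlier; it follows exactly the k == p / k < p / k > p cases above:

```python
def quickselect(arr, k):
    # assumes 0 <= k < len(arr); recurse into one side only
    l, r = 0, len(arr) - 1
    while True:
        p = partition(arr, l, r)
        if k == p:
            return arr[p]        # pivot landed exactly at position k
        elif k < p:
            r = p - 1            # answer is in the left part
        else:
            l = p + 1            # answer is in the right part
```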
The elimination of a large group of possibilities in one step is known as pruning:
π binary search
π alpha-beta pruning
π decision tree pruning
docs editors, emails, messages
web sites, search engine
bio-data (genome sequencing)
natural language processing (NLP)
β¦
Example:
"Hello World" "http://www.google.com" "CGTAAACTGCTTTAATCAAACGC"
Pattern Matching
For a text string S of length n and a pattern string P of length m, decide whether P is a substring of S.
Approximate Pattern Matching
input: a text string S of length n, a pattern string P of length m, and an integer d
output: all positions in S where P appears as a substring with at most d mismatches
Brute Force
Rabin-Karp (Michael O. Rabin and Richard M. Karp, 1987)
Knuth-Morris-Pratt Algorithm (KMP) (Knuth, Morris and Pratt, 1977)
Boyer-Moore Algorithm (BM) (Robert S. Boyer and J Strother Moore, 1970, 1974, 1977 )
Sunday Algorithm (Sunday) (Daniel M. Sunday, 1990)


S is a text string of length n, and P is a pattern string of length m
for i = m−1 ... n−1, compute the hash h(i) for substring S[i−m+1 ... i]:
B = 256
M = 1e9 + 7
BP = B^(m−1) % M
h(i−1) covers S[i−m] S[i−m+1] … S[i−1]
h(i) covers S[i−m+1] … S[i−1] S[i]
remove the leading char: h = h(i−1) − S[i−m]·B^(m−1) % M
avoid negative values: h = (h(i−1) − S[i−m]·B^(m−1) % M + M) % M
with BP precomputed: h = (h(i−1) − S[i−m]·BP % M + M) % M
append the new char: h(i) = (h·B + S[i]) % M
e.g. m = 5
"54321": 5β104+4β103+3β102+2β101+1β100
"bbabc": bβ2564+bβ2563+aβ2562+bβ2561+cβ2560
(a+b)%M=(a%M+b%M)%M
(aβb)%M=(a%Mβb%M)%M
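A rolling-hash search sketch following the formulas above, with B, M, and BP as defined; the final slice comparison guards against hash collisions:

```python
def search_rabin_karp(s, p):
    n, m = len(s), len(p)
    if m == 0 or m > n:
        return -1
    B, M = 256, 10**9 + 7
    BP = pow(B, m - 1, M)                 # B^(m-1) % M
    hp = h = 0
    for i in range(m):                    # initial hashes of p and s[0:m]
        hp = (hp * B + ord(p[i])) % M
        h = (h * B + ord(s[i])) % M
    for i in range(m - 1, n):
        if i >= m:
            # slide the window: drop s[i-m], append s[i]
            h = ((h - ord(s[i - m]) * BP % M + M) % M * B + ord(s[i])) % M
        if h == hp and s[i - m + 1:i + 1] == p:   # verify on hash match
            return i - m + 1
    return -1
```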
prefix/suffix
| S | b | c | b | c | b | d | b | c | b | e |
| P | b | c | b | c | b | e | a |
| S | b | c | b | c | b | d | b | c | b | e | π |
| P | b | c | b | c | b | e | a |
| S | b | c | b | c | b | d | b | c | b | e | π |
| P | b | c | b | c | b | e | a |

P: b c b c b e a. Prebuild the next array to store the next position to shift for the prefixes of P: b, b c, b c b, b c b c, b c b c b, b c b c b e
next array
| substring | last pos | prefix last char | next pos |
|---|---|---|---|
| b | 0 | -1 | next[0] = -1 |
| b c | 1 | -1 | next[1] = -1 |
| b c b | 2 | 0 (b) | next[2] = 0 |
| b c b c | 3 | 1 (b c) | next[3] = 1 |
| b c b c b | 4 | 2 (b c b) | next[4] = 2 |
| b c b c b e | 5 | -1 | next[5] = -1 |
| S | b | c | b | c | b | d | b | c | b | e |
| P | b | c | b | c | b | e | a | |||
| j | 0 | 1 | 2 | 3 | 4 | 5 |
char to compare in pattern: j = 5
bad char: P[5]
matched substring: P[0:5] (b c b c b)
look for its longest common prefix/suffix: next[5−1]
next char to compare in pattern: j = next[5−1] + 1 = 2 + 1 = 3
| S | b | c | b | c | b | d | b | c | b | e |
| P | b | c | b | c | b | e | a | |||
| j | 0 | 1 | 2 | 3 |
for i = 0β¦n, compare S[i] and P[j]:
S[i]==P[j]
S[i]!=P[j] (bad char)
```python
for i in range(len_s):
    while j > 0 and s[i] != p[j]:
        # refer to slides: "bad char" found
        j = next_arr[j - 1] + 1
    if s[i] == p[j]:
        j += 1
    if j == len_p:
        return i - len_p + 1
```
next array
Given next[i−1] = k and P[i] = x, what is next[i]?
case 1: P[k+1] == x, then next[i] = k + 1
case 2: if not, keep looking for k′ such that P[k′+1] == x
k: next[i−1] → next[k] → next[next[k]] → …
compare x against: P[k+1] → P[next[k]+1] → P[next[next[k]]+1] → …
```python
while k != -1 and p[k + 1] != p[i]:
    k = next_arr[k]
if p[k + 1] == p[i]:
    k += 1
next_arr[i] = k
```

worst case: O(n + (m−1)·n/m + m)

best case: O(n + 1·n/m + m)








"bad char"
look for the rightmost occurrence of the bad char:



bad char shift d = j − b
π¬ can the bad char offset b be pre-built, for each bad char position j (P[j]) and every possible mismatched char, in a 2-D lookup table badchar_tbl?


| S | B | D | C | B | C | D | D | B | B | C | D |
| P | B | C | D | B | A | C | D | ||||
| j | 0 | 1 | 2 | 3 | 4 | 5 |
d = 5 − 2 = 3
| S | B | D | C | B | C | D | D | B | B | C | D |
| i | 0 | 1 | 2 | 3 | |||||||
| P | B | C | D | B | A | C | D |
let P[b] be the rightmost occurrence of the bad char in P, regardless of j
case 1: bad char not found: d = j − b = j − (−1)
case 2: bad char found and b < j: d = j − b
case 3: when d ≤ 0, let d = 1 (might not be the optimal shift)
| P | a | b | d | a |
| index | 0 | 1 | 2 | 3 |
badchar_tbl becomes 1-D table:
| bad char | 0 | 1 | ... | 97 | 98 | 99 | 100 | ... | 255 |
| b | -1 | -1 | ... | 3 | 1 | -1 | 2 | ... | -1 |
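A sketch of building the 1-D bad-char table above, assuming an alphabet of 256 byte values:

```python
def build_badchar_tbl(p):
    # for each char, store the rightmost index where it occurs in P (or -1)
    tbl = [-1] * 256
    for j, ch in enumerate(p):
        tbl[ord(ch)] = j          # later occurrences overwrite earlier ones
    return tbl
```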
case1:
| S | a | b | c | d | e | f | g | f |
| P | e | f | g | f |
| S | a | b | c | d | e | f | g | f |
| P | e | f | g | f |
case2:
| S | a | b | a | e | c | d | g | f |
| P | a | e | c | d |
| S | a | b | a | e | c | d | g | f |
| P | a | e | c | d |
case3:
| S | a | a | a | a | a | a | a | a |
| P | b | a | a |
| S | a | a | a | a | a | a | a | a |
| P | b | a | a |



good suffix found again on the left, shift d = j − s
π¬ the good suffix offset s for bad char position P[j] can be pre-built in a suffix array

good suffix NOT on the left, shift d=m ?
β οΈ Prefix
bad char: "c", good suffix: "d c a", prefix: "c a"
| S | a | b | c | b | d | c | a | d | c | d | c | a | c | c |
| P | c | a | d | c | d | c | a |
| S | a | b | c | b | d | c | a | d | c | d | c | a | c | c | β |
| P | c | a | d | c | d | c | a |
| S | a | b | c | b | d | c | a | d | c | d | c | a | c | c | β |
| P | c | a | d | c | d | c | a |
a prefix of P matches part of the good suffix, shift d = r
π¬ the prefix check for length m − r can be pre-built in a prefix array

P: "abcdab"
| good suffix | len | suffix | prefix |
|---|---|---|---|
| b | 1 | suffix[1] = 1 | prefix[1] = false |
| ab | 2 | suffix[2] = 0 | prefix[2] = true |
| dab | 3 | suffix[3] = -1 | prefix[3] = false |
| cdab | 4 | suffix[4] = -1 | prefix[4] = false |
| bcdab | 5 | suffix[5] = -1 | prefix[5] = false |
```python
def build_goodsuffix_arr(p):
    m = len(p)
    suffix = [-1 for _ in range(m)]
    prefix = [False for _ in range(m)]
    for i in range(m - 1):
        # two pointers to compare suffixes
        j = i
        k = 0
        while j >= 0 and p[j] == p[m - 1 - k]:
            k += 1
            suffix[k] = j
            j -= 1
        if j == -1:
            prefix[k] = True
    return suffix, prefix
```

```python
def search_bm_shift_goodsuffix(suffix, prefix, m, j):
    k = m - 1 - j  # length of good suffix
    # look for the rightmost suffix match of length k
    if suffix[k] != -1:  # "suffix" exists
        return j - suffix[k] + 1
    # look for the largest prefix in P[j+2...m]: j+2 <= r <= m-1
    for r in range(j + 2, m, 1):
        if prefix[m - r]:  # "prefix" exists
            return r
    # no suffix, no prefix
    return m
```
locate bad char P[j] then decide d:
bad char d1
good suffix d2
d=max(d1,d2)
```python
# p is searched in s; badchar_tbl, suffix, prefix are pre-built
i = 0
while i <= len_s - len_p:
    j = len_p - 1
    # compare backward to find the "bad char"
    while j >= 0 and s[i + j] == p[j]:
        j -= 1
    if j < 0:
        # p is found
        return i
    # bad char found at s[i+j] != p[j]
    # bad char shift d1
    d1 = j - badchar_tbl[ord(s[i + j])]
    d1 = 1 if d1 <= 0 else d1
    # good suffix shift d2
    d2 = 0
    if len_p - 1 - j > 0:
        d2 = search_bm_shift_goodsuffix(suffix, prefix, len_p, j)
    i += max(d1, d2)
```
π A new proof of the linearity of the Boyer-Moore string searching algorithm
π Tight bounds on the complexity of the Boyer-Moore string matching algorithm








| Pattern Matching | avg. | best | worst |
|---|---|---|---|
| KMP | O(m + n) | O(n) | O(m + n) |
| BM | O(m + n) | O(m + n/m) | O(m + n·m) |
| Sunday | O(m + n) | O(m + n/m) | O(m + nΒ·m) |
π KMP always linear
π‘ graph can model many things in real world such as roads, airline routes, social media connections, etc.
π‘ graph theory, the study of graphs, is a field of discrete mathematics.



A graph is an ordered pair G = (V, E) comprising a set V of vertices or nodes and a collection E of pairs of vertices from V, known as the edges of the graph. For example, for the graph below:
V = {1, 2, 3, 4, 5, 6}
E = {(1,4), (1,6), (2,6), (4,5), (5,6)}

An undirected graph (graph) is a graph in which edges have no orientation. The edge (x, y) is identical to the edge (y, x), i.e., they are not ordered pairs. The maximum number of edges possible in an undirected graph without loops is n·(n−1)/2.

social: friends
A directed graph (digraph) is a graph in which edges have orientations, i.e., the edge (x, y) is not identical to the edge (y, x).


social: follows
A mixed graph has both undirected and directed edges.
A weighted graph associates a value (weight) with every edge in the graph.
An unweighted graph does not have any value (weight) associated with its edges. In other words, an unweighted graph is a weighted graph with all edge weights equal to 1. Unless specified otherwise, all graphs are assumed to be unweighted by default.

A multigraph is an undirected graph in which multiple edges (and sometimes loops) are allowed.
A simple graph is an undirected graph in which both multiple edges and loops are disallowed as opposed to a multigraph. In a simple graph with n vertices, every vertexβs degree is at most nβ1.

A connected graph has a path between every pair of vertices. In other words, there are no unreachable vertices. A disconnected graph is a graph that is not connected.

A Directed Acyclic Graph (DAG) is a directed graph that contains no cycles.

A complete graph is one in which every two vertices are adjacent: all edges that could exist are present.
|E| = |V|·(|V|−1)/2

If a graph G has vertex set V and m edges, then Σ(v∈V) deg(v) = 2m
If a directed graph G has vertex set V and m edges, then Σ(v∈V) indeg(v) = Σ(v∈V) outdeg(v) = m
A simple graph G has n vertices and m edges:
An undirected graph G has n vertices and m edges:
An adjacency matrix is a square matrix used to represent a finite graph. The elements of the matrix indicate whether pairs of vertices are adjacent or not in the graph. For a simple unweighted graph with vertex set V, the adjacency matrix is a square |V|×|V| matrix A such that its element:
A[i][j] = 1, when there is an edge from vertex Vi to vertex Vj, and
A[i][j] = 0, when there is no edge.
Each row in the matrix represents source vertices, and each column represents destination vertices. The diagonal elements of the matrix are all zero, since edges from a vertex to itself (loops) are not allowed in simple graphs. If the graph is undirected, the adjacency matrix is symmetric. Also, for a weighted graph, A[i][j] can represent edge weights.

An adjacency list representation for the graph associates each vertex in the graph with the collection of its neighboring vertices or edges, i.e., every vertex stores a list of adjacent vertices. There are many variations of adjacency list representation depending upon the implementation. This data structure allows the storage of additional data on the vertices but is practically very efficient when the graph contains only a few edges. i.e. the graph is sparse.


vertex_count()
edge_count()
vertices()
edges()
get_edge(vi, vj)
degree(v, out=True), degree(v, out=False)
incident_edge(v, out=True), incident_edge(v, out=False)
insert_vertex(w=None)
insert_edge(vi,vj,e=None)
remove_vertex(v)
remove_edge(e)
| | Space | construction | has_edge | adj |
|---|---|---|---|---|
| Adjacency Matrix | O(V²) | O(E) | O(1) | O(V) π |
| Adjacency List | O(V + E) | O(E), O(E·V) | O(deg(v)), O(V) | O(deg(v)), O(V) |
Red-Black Tree?
HashSet?


```
preorder(root)

preorder(TreeNode node):
    if node != None:
        list.add(node.val)
        preorder(node.left)
        preorder(node.right)
```
```
visited[0..V-1] = false
dfs(0)

dfs(int v):
    visited[v] = true
    list.add(v)
    for (int w : incident_edges(v)):
        if !visited[w]:
            dfs(w)
```
dfs(0) → dfs(1) → dfs(3) → dfs(2) → dfs(6) → dfs(5)
dfs(0) → dfs(1)
dfs(0) → dfs(1) → dfs(4)
list: 0 1 3 2 6 5 4
O(V+E)

0 : 1 2
1 : 0 3 4
2 : 0 3 6
3 : 1 2 5
4 : 1
5 : 3 6
6 : 2 5
prev[]:
1 ← 0
2 ← 3
3 ← 1
4 ← 1
6 ← 2
path from 0 to 6 (backtracking prev): 6 → 2 → 3 → 1 → 0

a bipartite graph (or bigraph) is a graph whose vertices can be divided into two disjoint and independent sets U and V, such that every edge connects a vertex in U to one in V.

Queue
0 :
2 ← 1 : 0
4 ← 3 ← 2 : 1
6 ← 5 ← 4 ← 3 : 2
6 ← 5 ← 4 : 3
6 ← 5 : 4
6 : 5
: 6
BFS order: 0 1 2 3 4 5 6

Queue
0 :
2 ← 1 : 0
4 ← 3 ← 2 : 1
6 ← 4 ← 3 : 2
5 ← 6 ← 4 : 3
5 ← 6 : 4
5 : 6
: 5
BFS order: 0 1 2 3 4 6 5
O(V+E)
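A BFS sketch over an adjacency list (a dict mapping each vertex to its neighbors), matching the traces above; it also records prev[] so a shortest path can be recovered by backtracking:

```python
from collections import deque

def bfs(adj, s):
    visited = {s}
    prev = {s: None}
    order = []
    queue = deque([s])
    while queue:
        v = queue.popleft()
        order.append(v)
        for w in adj[v]:
            if w not in visited:
                visited.add(w)
                prev[w] = v          # remember how we reached w
                queue.append(w)
    return order, prev
```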

π shortest path

Each iteration:
| 0 | 1 | 2 | 3 | 4 |
| 0 | ∞ | ∞ | ∞ | ∞ |
| 0 | 4 | 2 | ∞ | ∞ |
| 0 | 3 | 2 | 6 | 7 |
| 0 | 3 | 2 | 6 | 7 |
| 0 | 3 | 2 | 5 | 6 |
| 0 | 3 | 2 | 5 | 6 |
| 0 | 3 | 2 | 5 | 6 |
Path (0 → 1): Minimum cost = 3, Route = [0, 2, 1]
Path (0 → 2): Minimum cost = 2, Route = [0, 2]
Path (0 → 3): Minimum cost = 5, Route = [0, 2, 1, 3]
Path (0 → 4): Minimum cost = 6, Route = [0, 2, 1, 4]


Space Complexity: O(V)
Time Complexity: O(V²) → O(ElogV) → O(E + VlogV)
shortest path from src to dest
all-pairs shortest paths: O(V·ElogV)
negative weights?
π Bellman-Ford
π Floyd-Warshall
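A Dijkstra sketch using Python's heapq as the priority queue, matching the O(ElogV) bound above; adj maps each vertex to (neighbor, weight) pairs, and the names are illustrative:

```python
import heapq

def dijkstra(adj, src):
    dist = {v: float('inf') for v in adj}
    dist[src] = 0
    pq = [(0, src)]
    while pq:
        d, v = heapq.heappop(pq)
        if d > dist[v]:
            continue                 # stale entry, v was already relaxed
        for w, weight in adj[v]:
            if d + weight < dist[w]:
                dist[w] = d + weight
                heapq.heappush(pq, (dist[w], w))
    return dist
```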
Spanning Tree: DFS & BFS

repeatedly add the next lightest edge if this doesn't produce a cycle
greedy algorithm
1, 2: 1 ✓
3, 4: 1 ✓
0, 1: 2 ✓
0, 5: 2 ✓
1, 4: 3 ✓
1, 3 ✗; 2, 4 ✗; 2, 5 ✗
3, 6: 5 ✓
0, 3 ✗; 4, 6 ✗

In graph theory, a cut is a partition that divides a graph into two disjoint subsets.
A cut C = (S1, S2) in a connected graph G(V, E) partitions the vertex set V into two disjoint subsets S1 and S2.
The cut set of a cut C(S1, S2) of a connected graph G(V, E) is the set of edges that have one endpoint in S1 and the other in S2: C(S1, S2) = {(i, j) ∈ E | i ∈ S1, j ∈ S2}
A crossing edge is an edge that connects a node from one set to a node from the other set.
π bipartite graph: find a cut so that every edge is a crossing edge
According to the cut property, given any cut, the minimum weight crossing edge is in the MST.

0, 3: 7
1, 3: 4
1, 4: 3 π
2, 4: 4

ππ»ββοΈ implementation
O(ElogE)
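A Kruskal sketch with a simple union-find; edges is assumed to be a list of (weight, u, v) tuples, and the O(ElogE) sort dominates:

```python
def kruskal(n, edges):
    parent = list(range(n))

    def find(x):                        # path-compressed find
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    mst = []
    for w, u, v in sorted(edges):       # lightest edge first
        ru, rv = find(u), find(v)
        if ru != rv:                    # adding (u, v) creates no cycle
            parent[ru] = rv
            mst.append((u, v, w))
    return mst
```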
start from 0, find the lightest crossing edge, and repeatedly expand the cut
greedy algorithm
0, 1: 2, S1 = {0}
1, 2: 1, S1 = {0, 1}
0, 5: 2, S1 = {0, 1, 2}
1, 4: 3, S1 = {0, 1, 2, 5}
3, 4: 1, S1 = {0, 1, 2, 5, 4}
3, 6: 5, S1 = {0, 1, 2, 5, 4, 3}
S1 = {0, 1, 2, 5, 4, 3, 6}

ππ»ββοΈ implementation
O(V²), O(ElogV), O(ElogE)
Fredman-Tarjan: O(E + VlogV)
Chazelle: O(E·α(E, V))
Input: Problem P
Algorithm divideAndConquer(P):
Merge-sort an array of n elements:

A greedy algorithm is a simple and efficient algorithmic approach that solves a problem by selecting the best available option at each moment, without regard to future consequences.
To create a greedy algorithm:
0-1 Knapsack: greedy and most valuable item?
w=[1,2,3], v=[6,10,12], C=5
| 0 | 1 | 2 | |
|---|---|---|---|
| weight | 1 | 2 | 3 |
| value | 6 | 10 | 12 |
| v/w | 6 | 5 | 4 |
greedy by v/w: 6 + 10 = 16 ✗
optimal: 10 + 12 = 22 ✓
Given an array of intervals intervals where intervals[i] = [starti, endi], return the minimum number of intervals you need to remove to make the rest of the intervals non-overlapping.
Input: intervals = [[1,2],[2,3],[3,4],[1,3]]
Output: 1
Explanation: [1,3] can be removed and the rest of the intervals are non-overlapping.
Input: intervals = [[1,2],[1,2],[1,2]]
Output: 2
Explanation: You need to remove two [1,2] to make the rest of the intervals non-overlapping.
Greedy: sort by end_i, then repeatedly select the non-overlapping intervals[i] with the smallest end_i (see the sketch below)
why correct?
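A sketch of the greedy strategy on the problem above: sort by end, keep every interval that starts at or after the last kept end, and count the rest as removed:

```python
def erase_overlap_intervals(intervals):
    intervals.sort(key=lambda x: x[1])   # sort by end_i
    removed, last_end = 0, float('-inf')
    for start, end in intervals:
        if start >= last_end:
            last_end = end               # keep this interval
        else:
            removed += 1                 # it overlaps the kept one: remove it
    return removed

# erase_overlap_intervals([[1,2],[2,3],[3,4],[1,3]]) -> 1
# erase_overlap_intervals([[1,2],[1,2],[1,2]]) -> 2
```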

Input
[["5","3",".",".","7",".",".",".","."],
["6",".",".","1","9","5",".",".","."],
[".","9","8",".",".",".",".","6","."],
["8",".",".",".","6",".",".",".","3"],
["4",".",".","8",".","3",".",".","1"],
["7",".",".",".","2",".",".",".","6"],
[".","6",".",".",".",".","2","8","."],
[".",".",".","4","1","9",".",".","5"],
[".",".",".",".","8",".",".","7","9"]]

Output
[["5","3","4","6","7","8","9","1","2"],
["6","7","2","1","9","5","3","4","8"],
["1","9","8","3","4","2","5","6","7"],
["8","5","9","7","6","1","4","2","3"],
["4","2","6","8","5","3","7","9","1"],
["7","1","3","9","2","4","8","5","6"],
["9","6","1","5","3","7","2","8","4"],
["2","8","7","4","1","9","6","3","5"],
["3","4","5","2","8","6","1","7","9"]]
backtracking vs brute force:
backtracking vs recursion:
Dynamic Programming (DP) is a method for solving a complex problem by breaking it down into a collection of simpler subproblems, solving each of those subproblems just once, and storing their solutions, ideally using a memory-based data structure.
Writes down "1+1+1+1+1 =" on a sheet of paper.
"What's that equal to?"
Counting: "Five!"
Writes down another "1+" on the left.
"What about that?"
"Six!" "How'd you know it was six so fast?"
"You just added one more!"
"So you didn't need to recount because you remembered there were five!
Dynamic Programming is just a fancy way to say remembering stuff to save time later!"
```
Fibonacci sequence: 0, 1, 1, 2, 3, 5, 8, ...
when n = 1, fib(1) = 0
when n = 2, fib(2) = 1
when n > 2, fib(n) = fib(n-1) + fib(n-2)
```
```python
# Time Complexity: O(2^n)
def fib_recursive(n):
    global fib_run
    fib_run += 1
    if n == 1:
        return 0
    if n == 2:
        return 1
    return fib_recursive(n - 1) + fib_recursive(n - 2)
```

Any problem is said to have overlapping subproblems if calculating its solution involves solving the same subproblem multiple times.
This method of remembering the solutions of already solved subproblems is called Memoization.
```python
# Time Complexity: O(n)
def fib_recursive_memo(n):
    global fib_run, fib_memo
    fib_run += 1
    # return directly from cache if found
    if n in fib_memo:
        return fib_memo[n]
    if n == 1:
        return 0
    if n == 2:
        return 1
    # cache the result
    fib_memo[n] = fib_recursive_memo(n - 1) + fib_recursive_memo(n - 2)
    return fib_memo[n]
```

Bottom up is the opposite of the top-down approach: it avoids recursion by solving all the related subproblems first.
```python
# Time Complexity: O(n)
def fib_dp(n):
    dp_memo = {}
    dp_memo[1] = 0
    dp_memo[2] = 1
    # build up from the smallest subproblems
    for i in range(3, n + 1):
        dp_memo[i] = dp_memo[i - 1] + dp_memo[i - 2]
    return dp_memo[n]
```

DP in a nutshell:

π‘ usually DP problem is hard
π‘ different DP through exercises
Given weights and values of n items, put these items in a knapsack of capacity C to get the maximum total value in the knapsack. In other words, given two integer arrays v[0..n−1] and w[0..n−1] which represent values and weights associated with n items respectively, and given an integer C which represents the knapsack capacity, find out the maximum value subset of v such that the sum of the weights of this subset is smaller than or equal to C. You cannot break an item; either pick the complete item or don't pick it (0-1 property).
w=[10,20,30], v=[60,100,120], C=50
solution: 220
w=10; v=60;
w=20; v=100;
w=30; v=120;
w=(20+10); v=(100+60);
w=(30+10); v=(120+60);
w=(30+20); v=(120+100);
w=(30+20+10) > 50
k(n, C): n items, capacity C, return max value
k(i, C):
k(i, C) = max(k(i−1, C), v[i] + k(i−1, C−w[i]))
Time Complexity: O(2^n)
w=[1,1,1]
v=[10,20,30]
C=2

id=[0,1,2]
w=[1,2,3]
v=[6,10,12]
C=5
k(i, c) = max(k(i−1, c), v[i] + k(i−1, c−w[i]))
| | 0 | 1 | 2 | 3 | 4 | 5 |
|---|---|---|---|---|---|---|
| 0 | 0 | 6 | 6 | 6 | 6 | 6 |
| 1 | 0 | 6 | 10 | 16 | 16 | 16 |
| 2 | 0 | 6 | 10 | 16 | 18 | 22 |
Time Complexity: O(n·C)
Space Complexity: O(n·C)
```python
def knapsack2(w, v, n, C):
    k = [[0 for x in range(C + 1)] for x in range(n + 1)]
    # build table k from the bottom up
    for i in range(n + 1):
        for c in range(C + 1):
            if i == 0 or c == 0:
                k[i][c] = 0
            elif w[i - 1] > c:
                k[i][c] = k[i - 1][c]
            else:
                k[i][c] = max(v[i - 1] + k[i - 1][c - w[i - 1]], k[i - 1][c])
    return k[n][C]
```
k(i, c) = max(k(i−1, c), v[i] + k(i−1, c−w[i]))
row i depends only on row i−1, so only 2 rows are required:
Time Complexity: O(n·C)
Space Complexity: O(2·C)
```python
def knapsack3(w, v, n, C):
    # 2 rows only: i % 2
    k = [[0 for x in range(C + 1)] for x in range(2)]
    # build table k from the bottom up
    for i in range(n + 1):
        for c in range(C + 1):
            if i == 0 or c == 0:
                k[i % 2][c] = 0
            elif w[i - 1] > c:
                k[i % 2][c] = k[(i - 1) % 2][c]
            else:
                k[i % 2][c] = max(v[i - 1] + k[(i - 1) % 2][c - w[i - 1]],
                                  k[(i - 1) % 2][c])
    return k[n % 2][C]
```
id=[0,1,2]
w=[1,2,3]
v=[6,10,12]
C=5
k(i, c) = max(k(i−1, c), v[i] + k(i−1, c−w[i]))
| 0 | 1 | 2 | 3 | 4 | 5 |
|---|---|---|---|---|---|
| 0 | 6 | 6 | 6 | 6 | 6 |

| 0 | 1 | 2 | 3 | 4 | 5 |
|---|---|---|---|---|---|
| 0 | 6 | 10 | 16 | 16 | 16 |
Time Complexity: O(n·C)
Space Complexity: O(C)
```python
def knapsack4(w, v, n, C):
    k = [0 for i in range(C + 1)]
    for i in range(1, n + 1):
        # compute from the back (right to left)
        for c in range(C, 0, -1):
            if w[i - 1] <= c:
                k[c] = max(v[i - 1] + k[c - w[i - 1]], k[c])
    return k[C]
```
You are given k identical eggs and you have access to a building with n floors labeled from 1 to n.
You know that there exists a floor f where 0<=f<=n such that any egg dropped at a floor higher than f will break, and any egg dropped at or below floor f will not break.
In each move, you may take an unbroken egg and drop it from any floor x (where 1<=x<=n). If the egg breaks, you can no longer use it. However, if the egg does not break, you may reuse it in future moves.
Return the minimum number of moves that you need to determine with certainty what the value of f is.
Input: k = 2, n = 100
Output: 14
if we drop an egg at floor x: if it breaks, search below x with k−1 eggs and m−1 moves; if it survives, search above x with k eggs and m−1 moves
with m drops: floor[k][m] = floor[k−1][m−1] + floor[k][m−1] + 1
when floor[k][m] == n, what is the minimum m? (see the sketch below)
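A DP sketch of the recurrence above: with e eggs and m moves we can cover floor[e][m] floors, so return the smallest m with floor[k][m] >= n. On k = 2, n = 100 this yields 14, matching the example:

```python
def super_egg_drop(k, n):
    # floor[e][m]: max floors distinguishable with e eggs and m moves
    floor = [[0] * (n + 1) for _ in range(k + 1)]
    for m in range(1, n + 1):
        for e in range(1, k + 1):
            # breaks: e-1 eggs, m-1 moves below; survives: e eggs, m-1 moves above
            floor[e][m] = floor[e - 1][m - 1] + floor[e][m - 1] + 1
        if floor[k][m] >= n:
            return m
    return n
```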
The most important difference in Divide and Conquer strategy is that the subproblems are independent of each other. When a problem is divided into subproblems, they do not overlap which is why each subproblem is to be solved only once.
Whereas in DP, a subproblem solved as part of a bigger problem may be required again as part of another subproblem (the concept of overlapping subproblems), so the result of a subproblem is stored so that the next time it is encountered, the result is simply fetched and returned.
| Parameters | Dynamic Programming | Greedy Approach |
|---|---|---|
| Optimality | A guaranteed optimal solution, as DP considers all possible cases and chooses the best among them. | No guarantee of reaching the global optimum. |
| Memory | DP requires a table or cache for remembering, which increases its memory complexity. | More memory efficient, as it never looks back or revises its previous choices. |
| Time complexity | DP is generally slower due to considering all possible cases and then choosing the best among them. | Generally faster. |
| Feasibility | The decision at each step is made after evaluating the current problem and the solutions to previously solved subproblems. | The choice that seems best at the moment is made, in the hope of reaching a global optimum. |
Review "sift up" and "sift down" of MinHeap, implement your MaxHeap.
Give an O(logN * logN) algorithm to merge two binary heaps.
Design a Min-Max Heap that supports both remove_min and remove_max in O(logN) per operation.
Implement a classic Cuckoo Hash Table (Cuckoo hashing) and support the basic operations: insert, get and remove a key.
Implement BST search operation with iterative solution.
before(k) & after(k) in BST would not work if BST doesnot contain the key k, pls improve the algorithm to support such case.
(a) convert a BST into a MinHeap
(b) convert a MinHeap into a BST
Optimize the AVL tree so that when there is no change to the height of nodes, the rebalance process can be stopped.
Implement a map data structure with AVL tree and support the basic operations: insert, get and remove a key.
Implement your Insertion Sort and sort the number array from right to left.
Implement your Bubble Sort and sort the number array from left to right.
Implement your Merge Sort using the bottom-up approach.
Shell Sort is an optimization of Insertion Sort. Implement your Shell Sort.
Dual Pivot Quick Sort, by Vladimir Yaroslavskiy, Jon Bentley, and Joshua Bloch, offers O(NlogN) performance on many data sets that cause other quicksorts to degrade to quadratic performance, and is typically faster than traditional (one-pivot) Quicksort implementations. Implement your Dual Pivot Quick Sort.
Review KMP and implement your solution
Review BM and implement your solution
Iterative implementation of DFS with Adjacency List
Iterative implementation of BFS with Adjacency List
Modify DFS to detect cycles in an undirected graph
Modify BFS to find a path (Single Source Shortest Path) in an undirected graph
Implement your Dijkstra algorithm with a WeightedGraph class
Implement your Kruskal algorithm with a WeightedGraph class
Implement your Prim algorithm with a WeightedGraph class
Q&A