algorithms

bounds

  • upper bound: f(n) = O(g(n)) if f(n) grows at most as fast as g(n)
  • lower bound: f(n) = Ω(g(n)) if f(n) grows at least as fast as g(n)
  • tight bound: f(n) = Θ(g(n)) if f(n) grows exactly as fast as g(n)
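a quick worked example of a tight bound (the constants 3 and 8 are just illustrative):

```latex
f(n) = 3n^2 + 5n = \Theta(n^2), \text{ since for all } n \ge 1:\quad
3n^2 \;\le\; 3n^2 + 5n \;\le\; 3n^2 + 5n^2 = 8n^2,
```

so f(n) = O(n^2) and f(n) = Ω(n^2) simultaneously.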

sum of theta

  • if f(n) = Θ(g(n)) and h(n) = Θ(g(n)), then f(n) + h(n) = Θ(max(f(n), h(n))) = Θ(g(n))

product of theta

  • if f(n) = Θ(g1(n)) and h(n) = Θ(g2(n)), then f(n) · h(n) = Θ(g1(n) · g2(n))

constants in time complexity

  • if we have T1(n) = 1000000n = Θ(n) and T2(n) = 0.0001n^2 = Θ(n^2), then T1(n) is worse for all but really large n
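this can be checked directly; the crossover point is where 1000000n = 0.0001n^2 (the function names below are just for illustration):

```python
# hypothetical cost functions from the note above
def T1(n):
    return 1_000_000 * n

def T2(n):
    return 0.0001 * n ** 2

# T1 = T2 when 1_000_000 * n = 0.0001 * n**2, i.e. n = 1_000_000 / 0.0001
crossover = 1_000_000 / 0.0001  # 1e10: below this T1 is worse, above it T2 is worse
```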

arrays (python)

  • accessing an element given its address: O(1)

  • to resize array, new chunk of memory needs to be found; old values copied over

  • np.append takes O(n), since the whole array is copied (better to pre-allocate)
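a pure-Python sketch of why copy-per-append is slow (numpy itself not required; these helper names are made up for illustration):

```python
def append_by_copy(arr, value):
    # mimics np.append: allocate a new array one element larger and
    # copy everything over -- O(n) per call
    new = [None] * (len(arr) + 1)
    new[:len(arr)] = arr
    new[-1] = value
    return new

def fill_by_copy(n):
    # n appends, each copying the whole array: O(n^2) total work
    arr = []
    for i in range(n):
        arr = append_by_copy(arr, i)
    return arr

def fill_preallocated(n):
    # allocate once up front, then each write is O(1): O(n) total work
    arr = [None] * n
    for i in range(n):
        arr[i] = i
    return arr
```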

linked list

  • retrieve k-th elem: O(k) if you don't know its address, O(1) if you do
  • append/pop: O(1)
  • insert/remove k-th elem: O(k) if you don't know its address, O(1) if you do
  • no pre-allocation or resizing needed
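a minimal singly linked list sketch showing the O(1) append (via a tail pointer) and the O(k) walk to the k-th element (class and method names are assumptions):

```python
class _Node:
    def __init__(self, value):
        self.value = value
        self.next = None

class LinkedList:
    def __init__(self):
        self.head = None
        self.tail = None

    def append(self, value):
        # O(1): we keep the tail's address, so no traversal is needed
        node = _Node(value)
        if self.tail is None:
            self.head = self.tail = node
        else:
            self.tail.next = node
            self.tail = node

    def get(self, k):
        # O(k): we don't know the k-th node's address, so walk k links
        node = self.head
        for _ in range(k):
            node = node.next
        return node.value
```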

dynamic array

  • allocate memory for underlying array
    • there is physical size and logical size
    • when the logical size reaches the physical size, allocate a new underlying array whose physical size is γ (growth factor) times the previous
    • normally γ = 2 (doubling)
    • the append that finds the physical size full takes O(n); every other append takes O(1)
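a minimal sketch of the scheme above (β and γ defaults are illustrative):

```python
class DynamicArray:
    # `physical` is the allocated capacity, `logical` the number of
    # elements actually in use
    def __init__(self, beta=4, gamma=2):
        self.gamma = gamma
        self.physical = beta
        self.logical = 0
        self.data = [None] * beta

    def append(self, value):
        if self.logical == self.physical:
            # full: allocate a gamma-times-larger array, copy old values -- O(n)
            self.physical *= self.gamma
            new_data = [None] * self.physical
            new_data[: self.logical] = self.data
            self.data = new_data
        # usual case: write into the next free slot -- O(1)
        self.data[self.logical] = value
        self.logical += 1
```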

amortized analysis

  • spreading cost of something over time

  • in a sequence of n appends, how many caused physical size to grow?

    • for simplicity: assume n is such that the n-th append causes the physical size to grow
    • let β be the initial physical size and x the number of resizes
    • n = βγ^x
    • x = log_γ(n/β)
    • number of allocations is log_γ(n/β) + 1 (the resizes plus the initial allocation)
  • time for the j-th resize: O(cβγ^(j-1)), where c is a constant that depends on γ

  • total time: cβ ∑_{j=1}^{x} γ^(j-1) = Θ(cβ(γ^x − 1)/(γ − 1)) = Θ(c(n − β)/(γ − 1)) = Θ(n), since c, β, and γ are constants

  • amortized time per append: Θ(n)/n = Θ(1)
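the analysis can be checked empirically by counting copies over a sequence of appends (a simulation sketch, not real allocation):

```python
def total_copy_cost(n, beta=1, gamma=2):
    # simulate n appends and count how many elements get copied during
    # resizes; beta is the initial physical size, gamma the growth factor
    physical, logical, copies = beta, 0, 0
    for _ in range(n):
        if logical == physical:
            copies += logical        # old values copied to the new array
            physical *= gamma
        logical += 1
    return copies
```

with γ = 2 the total copy work stays below 2n, so the amortized cost per append is constant.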

binary heaps

  • binary tree, normally used to implement priority queues
  • full if every node has either 0 or 2 children
  • perfect if all internal nodes have 2 children and all leaves are at the same level
  • complete if all levels are filled except possibly the last, which is filled from left to right
  • height of a node is the largest number of edges from that node down to a leaf
  • height of a tree is the height of its root
  • height of a complete binary tree with n nodes is Θ(log n)

binary max heap (resp. min heap) is a binary tree such that:

  • each node has a key

  • shape: tree is complete

  • max heap: key of node is greater than or equal to keys of children

  • min heap: key of node is less than or equal to keys of children

  • .max takes O(1)

  • .insert takes O(log n)

  • .delete takes O(log n)

  • .pop_max_key takes O(log n)

def _push_down(self, i):
    # restore the max-heap property by sinking the key at index i:
    # repeatedly swap it with its largest child
    left = left_child(i)
    right = right_child(i)
    if (
        left < len(self.keys)
        and
        self.keys[left] > self.keys[i]
    ):
        largest = left
    else:
        largest = i

    if (
        right < len(self.keys)
        and
        self.keys[right] > self.keys[largest]
    ):
        largest = right

    if largest != i:
        # a child is larger: swap and keep sinking
        self._swap(i, largest)
        self._push_down(largest)
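a minimal array-based max heap built around the same _push_down, plus the complementary _push_up used by .insert (the helper names mirror the snippet above but are otherwise assumptions):

```python
def left_child(i):
    return 2 * i + 1

def right_child(i):
    return 2 * i + 2

def parent(i):
    return (i - 1) // 2

class MaxHeap:
    def __init__(self):
        self.keys = []

    def _swap(self, i, j):
        self.keys[i], self.keys[j] = self.keys[j], self.keys[i]

    def _push_up(self, i):
        # float the key at i up while it exceeds its parent -- O(log n)
        while i > 0 and self.keys[i] > self.keys[parent(i)]:
            self._swap(i, parent(i))
            i = parent(i)

    def _push_down(self, i):
        # sink the key at i below its larger child -- O(log n)
        largest = i
        left, right = left_child(i), right_child(i)
        if left < len(self.keys) and self.keys[left] > self.keys[largest]:
            largest = left
        if right < len(self.keys) and self.keys[right] > self.keys[largest]:
            largest = right
        if largest != i:
            self._swap(i, largest)
            self._push_down(largest)

    def max(self):
        return self.keys[0]               # O(1)

    def insert(self, key):
        self.keys.append(key)             # O(log n)
        self._push_up(len(self.keys) - 1)

    def pop_max_key(self):
        # move the last key to the root, shrink, restore the heap -- O(log n)
        self._swap(0, len(self.keys) - 1)
        top = self.keys.pop()
        if self.keys:
            self._push_down(0)
        return top
```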

online median

  • given a stream of numbers one at a time, find the median of all numbers seen so far
  • keep a max heap of the first half and a min heap of the second half
  • the two heaps may become unbalanced in size, so we need to rebalance after each insert
def _rebalance(self):
    # keep the halves within one element of each other: max_heap may
    # hold at most one more element than min_heap; moving a root across
    # is one pop plus one insert, both O(log n)
    if len(self.max_heap) > len(self.min_heap) + 1:
        self.min_heap.insert(self.max_heap.pop_max_key())
    elif len(self.min_heap) > len(self.max_heap):
        self.max_heap.insert(self.min_heap.pop_min_key())
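a runnable sketch of the whole scheme using Python's heapq (heapq only provides min heaps, so the max heap stores negated values -- a standard trick, not the course's code):

```python
import heapq

class OnlineMedian:
    def __init__(self):
        self.max_heap = []   # smaller half, stored as negated values
        self.min_heap = []   # larger half

    def insert(self, x):
        # send x to the half it belongs to, then fix the sizes
        if not self.max_heap or x <= -self.max_heap[0]:
            heapq.heappush(self.max_heap, -x)
        else:
            heapq.heappush(self.min_heap, x)
        self._rebalance()

    def _rebalance(self):
        # max_heap may hold at most one more element than min_heap
        if len(self.max_heap) > len(self.min_heap) + 1:
            heapq.heappush(self.min_heap, -heapq.heappop(self.max_heap))
        elif len(self.min_heap) > len(self.max_heap):
            heapq.heappush(self.max_heap, -heapq.heappop(self.min_heap))

    def median(self):
        if len(self.max_heap) > len(self.min_heap):
            return -self.max_heap[0]
        return (-self.max_heap[0] + self.min_heap[0]) / 2
```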

dynamic set

  • similar to a dictionary
  • store n elements in a hash table
  • initial cost: O(n)
  • insert: O(1) expected
  • search: O(1) expected

hashing

  • colliding keys go in the same bin, stored there in a linked list (separate chaining)
  • a good hash function should utilize all bins evenly
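a separate-chaining sketch (Python lists stand in for the per-bin linked lists; the class and its fixed bin count are assumptions):

```python
class ChainedHashSet:
    def __init__(self, n_bins=8):
        self.bins = [[] for _ in range(n_bins)]

    def _bin(self, key):
        # a good hash function spreads keys evenly across the bins
        return hash(key) % len(self.bins)

    def insert(self, key):
        chain = self.bins[self._bin(key)]
        if key not in chain:          # scan only this bin's chain
            chain.append(key)

    def contains(self, key):
        # only one bin's chain is scanned: O(1) expected while chains
        # stay short
        return key in self.bins[self._bin(key)]
```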

binary search tree

  • an alternative way to implement dynamic sets

  • slower insertion/deletion and queries than a hash table

  • preserves data in sorted order

  • BST is binary tree that satisfies following for any node x:

    • left subtree of x contains only keys less than x
    • right subtree of x contains only keys greater than x
    • left and right subtrees must also be binary search trees
  • Each element stored as a node at an arbitrary address in memory
  • Each node has a key and pointers to left child, right child, and parent nodes (if they exist).

(diagram 1: tree representation)

      12
     /  \
    6    33

(diagram 2: memory representation with pointers)

An array-like structure representing memory slots.

[ , l, 12, r, p, , l, 33, r, p, , , l, 6, r, p, , ]

  • pointers are shown with arrows:

    • the 'l' (left child) pointer for node 12 points to node 6.
    • the 'r' (right child) pointer for node 12 points to node 33.
    • the 'p' (parent) pointer for node 6 points to node 12.
    • the 'p' (parent) pointer for node 33 points to node 12.
    • other pointers (like children of 6 and 33, and parent of 12) might be null or point elsewhere depending on the full structure.
  • in-order traversal: O(n) - visits left subtree, root, right subtree; prints keys in ascending order

  • pre-order traversal: O(n) - visits root, left subtree, right subtree

  • post-order traversal: O(n) - visits left subtree, right subtree, then root

  • deleting a node with no children (a leaf) is easy; a node with one child is also easy

  • otherwise, rotate the node down toward the bottom (each rotation preserves the BST property); once it is a leaf or has one child, delete it

  • right rotation: (figure)

  • left rotation: (figure)

  • a rotation takes O(1)
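a minimal sketch tying the pieces above together: BST insertion, in-order traversal, and a right rotation (all names are illustrative, not the course's code):

```python
class Node:
    def __init__(self, key, left=None, right=None):
        self.key, self.left, self.right = key, left, right

def insert(root, key):
    # walk down as in a search and attach the new node where it ends
    if root is None:
        return Node(key)
    if key < root.key:
        root.left = insert(root.left, key)
    elif key > root.key:
        root.right = insert(root.right, key)
    return root

def in_order(root):
    # left subtree, node, right subtree -- keys come out ascending, O(n)
    if root is None:
        return []
    return in_order(root.left) + [root.key] + in_order(root.right)

def rotate_right(y):
    # y's left child x becomes the subtree root; x's right subtree moves
    # under y -- O(1) pointer updates, and the BST order is preserved
    x = y.left
    y.left = x.right
    x.right = y
    return x
```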

range queries in BST

  • ceiling of x is the smallest key greater than or equal to x

  • successor of node u is the smallest key greater than u's key

  • strategy: find the ceiling of a (where [a, b] is the range), then repeatedly take successors until the key exceeds b
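a sketch of this strategy; it assumes integer keys so that the successor of k can be found as ceiling(k + 1) (for general keys a proper successor routine would be needed):

```python
class Node:
    def __init__(self, key, left=None, right=None):
        self.key, self.left, self.right = key, left, right

def ceiling(root, x):
    # smallest key >= x, or None if every key is < x -- O(h)
    best = None
    node = root
    while node is not None:
        if node.key == x:
            return x
        if node.key < x:
            node = node.right
        else:
            best = node.key      # candidate; try to find a smaller one
            node = node.left
    return best

def range_query(root, a, b):
    # find ceiling(a), then hop successor to successor until > b
    out = []
    key = ceiling(root, a)
    while key is not None and key <= b:
        out.append(key)
        key = ceiling(root, key + 1)   # successor, for integer keys
    return out
```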

BST : in general vs balanced vs unbalanced

  • query, insertion/deletion: O(h) in general, O(log n) if balanced, O(n) if unbalanced

balanced BSTs

  • AVL trees
  • red-black trees
    • BST whose nodes are colored red and black
    • leaf nodes are "nil"
    • must satisfy:
      • the root node is black
      • every leaf ("nil") node is black
      • if a node is red, both of its children are black
      • for any node, all paths from that node down to a leaf contain the same number of black nodes
    • with n internal (non-nil) nodes, the height is at most 2 log(n + 1)
    • non-modifying operations: O(log n)
    • expected height of a BST built by inserting n random keys: O(log n)
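a small validator for the coloring properties above (it checks colors and black-heights only, not key order; the tuple encoding is an assumption):

```python
# nodes are (color, key, left, right) tuples; None stands for a black
# "nil" leaf
def black_height(node):
    # returns the black-height if the subtree satisfies the red-black
    # properties, otherwise raises ValueError
    if node is None:
        return 1                     # nil leaves count as black
    color, _key, left, right = node
    if color == "red":
        for child in (left, right):
            if child is not None and child[0] == "red":
                raise ValueError("red node with a red child")
    hl, hr = black_height(left), black_height(right)
    if hl != hr:
        raise ValueError("paths with different black counts")
    return hl + (1 if color == "black" else 0)

def is_red_black(root):
    if root is not None and root[0] != "black":
        return False                 # the root must be black
    try:
        black_height(root)
        return True
    except ValueError:
        return False
```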

treaps

  • binary tree in which each node has both key and priority

    • max heap with respect to priority
    • BST with respect to key
  • insertion:

    • using the key, find the place to insert the node as if in a plain BST
    • while the node's priority is greater than its parent's, rotate it up: a right rotation if it is a left child, a left rotation if it is a right child
    • repeat until the heap property is satisfied
  • deletion:

    • while the node is not a leaf, rotate it with its child of highest priority
    • once it is a leaf, delete it
  • if all keys and priorities are unique, the treap is unique.

randomized binary search trees

  • when inserting a node into the treap, generate a random priority for it
  • runtimes are O(log n) in expectation but can be O(n) in the worst case (very rare)
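a sketch of treap insertion with random priorities (names and structure are assumptions; rotations happen as the recursion unwinds, bubbling the new node up while the heap property is violated):

```python
import random

class TreapNode:
    def __init__(self, key):
        self.key = key
        self.priority = random.random()   # generated at insertion time
        self.left = None
        self.right = None

def rotate_right(y):
    x = y.left
    y.left, x.right = x.right, y
    return x

def rotate_left(x):
    y = x.right
    x.right, y.left = y.left, x
    return y

def insert(root, key):
    # BST insert by key, then rotate up while the max-heap property on
    # priorities is violated
    if root is None:
        return TreapNode(key)
    if key < root.key:
        root.left = insert(root.left, key)
        if root.left.priority > root.priority:
            root = rotate_right(root)
    elif key > root.key:
        root.right = insert(root.right, key)
        if root.right.priority > root.priority:
            root = rotate_left(root)
    return root

def in_order(root):
    if root is None:
        return []
    return in_order(root.left) + [root.key] + in_order(root.right)

def is_heap(root):
    # every node's priority must be >= its children's priorities
    if root is None:
        return True
    for child in (root.left, root.right):
        if child is not None and child.priority > root.priority:
            return False
    return is_heap(root.left) and is_heap(root.right)
```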

citation: dsc190 advanced algos @ UCSD