Terminology - Trees

  • Trees
    • A data structure with a root node and children
    • It is a specfic type of graphs (all trees are graphs).
    • A node can have any number of children (0 to N), but every node can have only one parent!
      • This is what separates a tree from a graph: if a node has 2 parents, it is no longer a tree but it's a graph
    • There is no strictly required ordering of data
    • An example: A linkedlist is a tree that doesn't fork
  • Binary Tree
    • A specific type of tree that can follows the binary tree rule: at most 2 children (left and right)
    • There is still no required ordering of the values.
      • So if you were to search for some node, it would be O(n).

Different tree types

Various ways of organizing data within a tree structure:


  • Binary Search Tree (BST)
    • A specific type of binary tree
    • Rule
      • Must satisfy the binary tree rule (max 2 children) AND The BST property
      • The BST property
        • Everything in the left subtree is smaller than the parent
        • Everything in the right subtree is larger than the parent
        • Left < Parent < Right
    • Purpose: For fast searching
      • Because of the sorting, you can cut the search space in half every step (binary search) -> O(log n).
      • But this only works if the tree is balanced. Imagine a linkedlist tree... then it's O(n)
  • The Heap (Binary Heap)
    • A specific type of binary tree & always a complete tree (balanced)
    • Rule
      • The parent is always greater than (Max-Heap) or smaller than (Min-Heap) its children.
      • There is no relationship between the left and right children.
    • Purpose: Find min/max
      • Gives fast access to the extreme element (max or min), O(1)
  • Trie (Prefix Tree)
  • B-Tree / B+ Tree
  • AVL / Red-Black Trees
    • Self balancing BSTs

Types of binary trees

  • Full Binary Tree
    • Every node has either 0 or 2 children.
      1              1
     / \            / \
    2   3          2   3
   / \            /
  4   5          4
  (Full)      (Not Full - Node 2 has 1 child)
  • Perfect Binary Tree
    • All internal nodes have 2 children, and all leaf nodes are at the exact same level. (The triangle is perfectly filled)

    • Any level in the tree that has any nodes is completely filled all the way across

       1                 1
     /   \             /   \
    2     3           2     3
   / \   / \         / \    /
  4   5 6   7       4   5  6
 (Perfect)         (Not Perfect)
 (Last level       (Last level
  filled)          not filled)
  • Complete Binary Tree
    • It must have all levels filled completely, except possibly the last level
      • Crucially, the nodes in the last level must be filled from left to right
    • A perfect tree is also complete
       1                 1
     /   \             /   \
    2     3           2     3
   / \   /           / \     \
  4   5 6           4   5     7
 (Complete)       (Not Complete)
 (Last level       (Node 7 exists, but
  filled L->R)      Node 6 is missing)

Balanced VS Unbalanced Trees

"Balanced" refers only to the shape (height) of the tree & not the values inside.

  • Balanced Tree
    • The left and right subtrees of any node differ in height by at most 1
    • Guarantees that search/insert/delete operations take O(log n) time
    • Examples: AVL Trees, Red-Black Trees (these algorithms automatically rearrange nodes to keep the tree balanced)
  • Unbalanced Tree
    • The tree leans heavily to one side
    • A Degenerate Tree (or Pathological Tree) is a tree where every parent has only one child -> It essentially becomes a Linked List
    • Searching becomes O(n) (slow), effectively losing the advantage of using a tree

Depth, Height, and Big O

  • Depth (Level)
    • Measures how far a node is down from the root (depth 0)
    • We count edges downwards
  • Height
    • Measures how far a node is up from the deepest leaf (height 0)
    • The "height of the tree" refers to the height of the root node
        (Root)
          A           <-- Depth: 0 | Height: 2
        /   \
       B     C        <-- Depth: 1 | Height: 1
      / \     \
     D   E     F      <-- Depth: 2 | Height: 0

In a Perfect Binary Tree, every time we go down one level (depth), the number of nodes we can hold doubles:

Depth 0:       O            (2^0 = 1 node)
             /   \
Depth 1:    O     O         (2^1 = 2 nodes)
           / \   / \
Depth 2:  O   O O   O       (2^2 = 4 nodes)

Total Nodes (n) = 7
Height (h) = 2
  • Formulas
    • Total nodes = n=2h+1βˆ’1n = 2^{h+1} - 1
      • So roughly, total nodes is close to 2h2^h, or nβ‰ˆ2hn \approx 2^h
    • Nodes at last level: n=2hn = 2^h
    • hβ‰ˆlog2(n)h \approx \text{log}_2(n)
      • hh: height of tree
      • nn: number of total nodes
  • Big O analysis
    • Best case: Balanced Tree
      • hβ‰ˆlog⁑2nh \approx \log_{2}n
      • Complexity: O(log⁑n)O(\log n)
      • The search speed is very fast because every step we cut the search speed in half
    • Worst case: Unbalanced / Degenerate Tree
      • hβ‰ˆnh \approx n
      • Complexity: O(n)O(n)
      • The search speed is very slow because we can't cut the search space in half but have to search through every node.

Array representation

     1
   /   \
  2     3
 /     /
4     6

[_,1,2,3,4,_,6]
  • You can represent binary trees as an array.
  • Given some index i:
    • left child: 2i
    • right child: 2i + 1

Algorithms (+ Code)

# Binary Tree
class TreeNode:
    def __init__(self, val, left=None, right=None):
        self.val = val
        self.left = left
        self.right = right

    def __str__(self):
        return str(self.val)

BFS

  • Also called level order traversal
  • Implemented with a queue
from collections import deque

def bfs(node):
  if not root:
    return
  q = deque([node])
  result = []

  while q:
    cur = q.popleft()
    result.append(cur)
    if cur.left:
      q.append(cur.left)
    if cur.right:
      q.append(cur.right)
  • You initialize the stack with the 1st node

DFS

  • Usually implemented using a stack (or just recursion)

Three different ways in doing DFS

It helps if you see the order of the root!

Next, these are code to print out the nodes' values in the following orders, storing the results in a results arraylist (in Java).

PreOrder

  • Root -> Left -> Right
def pre_order(node):
  if not node:
    return

  print(node)
  pre_order(node.left)
  pre_order(node.right)

InOrder

  • Left -> Root -> Right
  • Look at the name "inorder" -> the results will be in order!
def pre_order(node):
  if not node:
    return

  pre_order(node.left)
  print(node)
  pre_order(node.right)

PostOrder

  • Left -> Right -> Root
def pre_order(node):
  if not node:
    return

  pre_order(node.left)
  pre_order(node.right)
  print(node)

Sources