balanced search trees coursenotes

Upload: bsitler

Post on 14-Apr-2018

233 views

Category:

Documents


0 download

TRANSCRIPT

  • 7/30/2019 Balanced Search Trees Coursenotes

    1/57

    http://algs4.cs.princeton.edu

    Algorithms ROBERT SEDGEWICK | KEVIN WAYNE

    3.3 BALANCED SEARCH TREES

    2-3 search trees

    red-black BSTs

    B-trees

  • 7/30/2019 Balanced Search Trees Coursenotes

    2/57

    2

    Symbol table review

    Challenge. Guarantee performance.

    This lecture. 2-3 trees, left-leaning red-black BSTs, B-trees.

  • 7/30/2019 Balanced Search Trees Coursenotes

    3/57

    http://algs4.cs.princeton.edu

    2-3 search trees

    red-black BSTs

    B-trees

    3.3 BALANCED SEARCH TREES

  • 7/30/2019 Balanced Search Trees Coursenotes

    4/57

    Allow 1 or 2 keys per node.

    2-node: one key, two children.

    3-node: two keys, three children.

    Perfect balance. Every path from root to null link has same length.

    2-3 tree

    4

    S XA C PH

    R

    M

    L

    3-node

    E J

    2-node

    null link

  • 7/30/2019 Balanced Search Trees Coursenotes

    5/57

    Allow 1 or 2 keys per node.

    2-node: one key, two children.

    3-node: two keys, three children.

    Perfect balance. Every path from root to null link has same length.

    Symmetric order. Inorder traversal yields keys in ascending order.

    2-3 tree

    5

    between E and J

    larger than Jsmaller than E

    S XA C PH

    R

    M

    L

    E J

  • 7/30/2019 Balanced Search Trees Coursenotes

    6/57

    Search.

    Compare search key against keys in node.

    Find interval containing search key.

    Follow associated link (recursively).

    2-3 tree demo

    6

    search for H

    S XA C PH

    R

    M

    L

    E J

  • 7/30/2019 Balanced Search Trees Coursenotes

    7/57

    Insertion into a 3-node at bottom.

    Add new key to 3-node to create temporary 4-node.

    Move middle key in 4-node into parent.

    Repeat up the tree, as necessary.

    If you reach the root and it's a 4-node, split it into three 2-nodes.

    2-3 tree demo

    7

    S XA C PH

    E R

    Linsert L

  • 7/30/2019 Balanced Search Trees Coursenotes

    8/57

    2-3 tree construction demo

    8

    insert S

    S

  • 7/30/2019 Balanced Search Trees Coursenotes

    9/57

    S XA C

    2-3 tree construction demo

    9

    2-3 tree

    PH

    E R

    L

  • 7/30/2019 Balanced Search Trees Coursenotes

    10/57

    Splitting a 4-node is a local transformation: constant number of operations.

    b c d

    a e

    betweena andb

    lessthan a

    betweenb andc

    betweend ande

    greaterthan e

    betweenc andd

    betweena andb

    lessthan a

    betweenb andc

    betweend ande

    greaterthan e

    betweenc andd

    b d

    a c e

    10

    Local transformations in a 2-3 tree

  • 7/30/2019 Balanced Search Trees Coursenotes

    11/57

    Invariants. Maintains symmetric order and perfect balance.

    Pf. Each transformation maintains symmetric order and perfect balance.

    11

    Global properties in a 2-3 tree

    b

    right

    middle

    left

    right

    left

    b db c d

    a ca

    a b c

    d

    ca

    b d

    a b cca

    root

    parent is a 2-node

    parent is a 3-node

    c e

    b d

    c d e

    a b

    b c d

    a e

    a b d

    a c e

    a b c

    d e

    ca

    b d e

  • 7/30/2019 Balanced Search Trees Coursenotes

    12/57

    12

    2-3 tree: performance

    Perfect balance. Every path from root to null link has same length.

    Tree height.

    Worst case:

    Best case:

  • 7/30/2019 Balanced Search Trees Coursenotes

    13/57

    13

    2-3 tree: performance

    Perfect balance. Every path from root to null link has same length.

    Tree height.

    Worst case: lgN. [all 2-nodes]

    Best case: log3N .631 lgN. [all 3-nodes]

    Between 12 and 20 for a million nodes.

    Between 18 and 30 for a billion nodes.

    Guaranteed logarithmic performance for search and insert.

  • 7/30/2019 Balanced Search Trees Coursenotes

    14/57

    ST implementations: summary

    14

    constants depend upon implementation

  • 7/30/2019 Balanced Search Trees Coursenotes

    15/57

    15

    2-3 tree: implementation?

    Direct implementation is complicated, because:

    Maintaining multiple node types is cumbersome.

    Need multiple compares to move down tree.

    Need to move back up the tree to split 4-nodes.

    Large number of cases for splitting.

    Bottom line. Could do it, but there's a better way.

  • 7/30/2019 Balanced Search Trees Coursenotes

    16/57

    http://algs4.cs.princeton.edu

    2-3 search trees

    red-black BSTs

    B-trees

    3.3 BALANCED SEARCH TREES

  • 7/30/2019 Balanced Search Trees Coursenotes

    17/57

    http://algs4.cs.princeton.edu

    2-3 search trees

    red-black BSTs

    B-trees

    3.3 BALANCED SEARCH TREES

  • 7/30/2019 Balanced Search Trees Coursenotes

    18/57

  • 7/30/2019 Balanced Search Trees Coursenotes

    19/57

    A BST such that:

    No node has two red links connected to it.

    Every path from root to null link has the same number of black links.

    Red links lean left.

    19

    An equivalent definition

    "perfect black balance"

    X

    SH

    P

    J R

    E

    A

    M

    C

    L

    c ree

  • 7/30/2019 Balanced Search Trees Coursenotes

    20/57

    Key property. 11 correspondence between 23 and LLRB.

    20

    Left-leaning red-black BSTs: 1-1 correspondence with 2-3 trees

    X

    SH

    P

    J R

    E

    A

    M

    C

    L

    XSH P

    J RE

    A

    M

    C L

    redblack tree

    horizontal red links

    2-3 tree

    E J

    H L

    M

    R

    P S XA C

  • 7/30/2019 Balanced Search Trees Coursenotes

    21/57

    Search implementation for red-black BSTs

    Observation. Search is the same as for elementary BST (ignore color).

    Remark. Most other ops (e.g., floor, iteration, selection) are also identical.

    21

    but runs faster because of better balance

    X

    SH

    P

    J R

    E

    A

    M

    C

    L

    ree

  • 7/30/2019 Balanced Search Trees Coursenotes

    22/57

    Red-black BST representation

    Each node is pointed to by precisely one link (from its parent)

    can encode color of links in nodes.

    22

    null links are black

    J

    G

    E

    A D

    C

    hh.left.coloris RED

    h.right.coloris BLACK

  • 7/30/2019 Balanced Search Trees Coursenotes

    23/57

    Elementary red-black BST operations

    Left rotation. Orient a (temporarily) right-leaning red link to lean left.

    Invariants. Maintains symmetric order and perfect black balance.

    23

    greater

    than S

    x

    h

    S

    between

    E and S

    less

    than E

    E

    rotate E left

    (before)

  • 7/30/2019 Balanced Search Trees Coursenotes

    24/57

    Elementary red-black BST operations

    Left rotation. Orient a (temporarily) right-leaning red link to lean left.

    Invariants. Maintains symmetric order and perfect black balance.

    24

    greater

    than S

    less

    than E

    x

    h E

    between

    E and S

    S

    rotate E left

    (after)

  • 7/30/2019 Balanced Search Trees Coursenotes

    25/57

    Elementary red-black BST operations

    Right rotation. Orient a left-leaning red link to (temporarily) lean right.

    Invariants. Maintains symmetric order and perfect black balance.

    25

    rotate S right

    (before)

    greater

    than S

    less

    than E

    h

    x E

    between

    E and S

    S

  • 7/30/2019 Balanced Search Trees Coursenotes

    26/57

    Elementary red-black BST operations

    Right rotation. Orient a left-leaning red link to (temporarily) lean right.

    Invariants. Maintains symmetric order and perfect black balance.

    26

    rotate S right

    (after)

    greater

    than S

    h

    x

    S

    between

    E and S

    less

    than E

    E

  • 7/30/2019 Balanced Search Trees Coursenotes

    27/57

    Color flip. Recolor to split a (temporary) 4-node.

    Invariants. Maintains symmetric order and perfect black balance.

    Elementary red-black BST operations

    27

    greater

    than S

    between

    E and S

    between

    A and E

    less

    than A

    Eh

    SA

    flip colors

    (before)

  • 7/30/2019 Balanced Search Trees Coursenotes

    28/57

    Color flip. Recolor to split a (temporary) 4-node.

    Invariants. Maintains symmetric order and perfect black balance.

    Elementary red-black BST operations

    28

    Eh

    SA

    flip colors

    (after)

    greater

    than S

    between

    E and S

    between

    A and E

    less

    than A

  • 7/30/2019 Balanced Search Trees Coursenotes

    29/57

    Basic strategy. Maintain 1-1 correspondence with 2-3 trees by

    applying elementary red-black BST operations.

    Insertion in a LLRB tree: overview

    29

    E

    A

    E

    R

    S

    R

    S

    A

    C

    E

    R

    S

    C

    A

    add newnode here

    right link redso rotate left

    insert C

  • 7/30/2019 Balanced Search Trees Coursenotes

    30/57

    Warmup 1. Insert into a tree with exactly 1 node.

    Insertion in a LLRB tree

    30

    red link tonew node

    containingaconverts 2-node

    to 3-node

    search endsat this null link

    b

    a

    b

    root

    root

    left

    search endsat this null link

    attached new nodewith red link

    rotated leftto make a

    legal 3-node

    a

    b

    a

    a

    b

    root

    root

    right

  • 7/30/2019 Balanced Search Trees Coursenotes

    31/57

    Case 1. Insert into a 2-node at the bottom.

    Do standard BST insert; color new link red.

    If new red link is a right link, rotate left.

    Insertion in a LLRB tree

    31

    E

    A

    E

    R

    S

    R

    S

    A

    C

    E

    R

    S

    C

    A

    add newnode here

    right link redso rotate left

    insert C

  • 7/30/2019 Balanced Search Trees Coursenotes

    32/57

  • 7/30/2019 Balanced Search Trees Coursenotes

    33/57

  • 7/30/2019 Balanced Search Trees Coursenotes

    34/57

  • 7/30/2019 Balanced Search Trees Coursenotes

    35/57

    Red-black BST construction demo

    35

    S

    insert S

  • 7/30/2019 Balanced Search Trees Coursenotes

    36/57

    Red-black BST construction demo

    36

    C

    A

    E

    X

    S

    P

    M

    R

    red-black BST

    L

    H

  • 7/30/2019 Balanced Search Trees Coursenotes

    37/57

    Insertion in a LLRB tree: Java implementation

    Same code for all cases.

    Right child red, left child black: rotate left.Left child, left-left grandchild red:rotate right.

    Both children red: flip colors.

    37

    insert at bottom

    (and color it red)

    split 4-node

    balance 4-node

    lean left

    only a few extra lines of code provides near-perfect balance

    flipcolors

    rightrotate

    leftrotate

    h

    h

    h

  • 7/30/2019 Balanced Search Trees Coursenotes

    38/57

    Insertion in a LLRB tree: visualization

    38

    255 insertions in ascending order

  • 7/30/2019 Balanced Search Trees Coursenotes

    39/57

    39

    Insertion in a LLRB tree: visualization

    255 insertions in descending order

  • 7/30/2019 Balanced Search Trees Coursenotes

    40/57

    40

    Insertion in a LLRB tree: visualization

    255 random insertions

  • 7/30/2019 Balanced Search Trees Coursenotes

    41/57

    41

    Balance in LLRB trees

    Proposition. Height of tree is 2 lgNin the worst case.

    Pf.Every path from root to null link has same number of black links.

    Never two red links in-a-row.

    Property. Height of tree is ~1.00 lgNin typical applications.

  • 7/30/2019 Balanced Search Trees Coursenotes

    42/57

    ST implementations: summary

    42

    * exact value of coefficient unknown but extremely close to 1

  • 7/30/2019 Balanced Search Trees Coursenotes

    43/57

    Xerox PARC innovations. [1970s]

    Alto.GUI.

    Ethernet.

    Smalltalk.

    InterPress.

    Laser printing.

    Bitmapped display.

    WYSIWYG text editor.

    ...

    War story: why red-black?

    43

    A DIClIROlV1ATIC FUAl\lE\V()HK Fo n BALANCED TREES

    Leo J. Guibas.Xerox Palo Alto Research Center,Palo Alto, California, andCarnegie-Afellon University

    andRobert Sedgewick*Program in Computer ScienceBrown UniversityProvidence, R. I.

    ABSTUACT

    I() this paper we present a uniform framework for the implementat ionand s tudy of halanced tree algorithms. \Ve show how to imhcd in thisframework the best known halanced tree tecilIl iques and thell usc theframework to deVl'lop new which per form the update andrebalancing in one pas s, Oil the way down towards a leaf. \Veconclude with a study of performance issues and concurrent updating.

    O. Introduction

    the way down towards a leaf. As we will see, this has a number ofsignificant advantages ovcr the older methods. We shall cxamine anumhcr of variations on a common theme and exhibit fullimplementat ions which are notable for their brcvity. Oneimp1cn1entation is exatnined carefully, and some properties about itsbehavior are proved.]n both sections 1 and 2 par ti cular a ttent ion is paid to practicalimplementation issues, and cOlnplcte impletnentations are given forall of the itnportant algorithms. '1l1is is significant because onemeasure under which balanced tree algorithtns can differ greatly is

    Xerox Alto

  • 7/30/2019 Balanced Search Trees Coursenotes

    44/57

    War story: red-black BSTs

    44

    Telephone company contracted with database provider to build real-time

    database to store customer information.

    Database implementation.

    Red-black BST search and insert; Hibbard deletion.

    Exceeding height limit of 80 triggered error-recovery process.

    Extended telephone service outage.

    Main cause = height bounded exceeded!

    Telephone company sues database provider.

    Legal testimony:

    allows for up to 240 keys

    Hibbard deletion

    was the problem

  • 7/30/2019 Balanced Search Trees Coursenotes

    45/57

    http://algs4.cs.princeton.edu

    2-3 search trees red-black BSTs

    B-trees

    3.3 BALANCED SEARCH TREES

  • 7/30/2019 Balanced Search Trees Coursenotes

    46/57

    http://algs4.cs.princeton.edu

    2-3 search trees red-black BSTs

    B-trees

    3.3 BALANCED SEARCH TREES

  • 7/30/2019 Balanced Search Trees Coursenotes

    47/57

    47

    File system model

    Page. Contiguous block of data (e.g., a file or 4,096-byte chunk).

    Probe. First access to a page (e.g., from disk to memory).

    Property. Time required for a probe is much larger than time to access

    data within a page.

    Cost model. Number of probes.

    Goal. Access data using minimum number of probes.

    slow fast

  • 7/30/2019 Balanced Search Trees Coursenotes

    48/57

    B-tree. Generalize 2-3 trees by allowing up toM- 1 key-link pairs per node.

    At least 2 key-link pairs at root.At leastM/2 key-link pairs in other nodes.

    External nodes contain client keys.

    Internal nodes contain copies of keys to guide search.

    48

    B-trees (Bayer-McCreight, 1972)

    choose M as large as possible sothat M links fit in a page, e.g., M = 1024

    Anatomy of a B-tree set (M = 6)

    2-node

    external3-node external 5-node (full)

    internal 3-node

    external 4-node

    all nodes except the root are 3-, 4- or 5-nodes

    * B C

    sentinel key

    D E F H I J K M N O P Q R T

    * D H

    * K

    K Q U

    U W X Y

    each red key is a copy

    of min key in subtree

    client keys (black)are in external nodes

  • 7/30/2019 Balanced Search Trees Coursenotes

    49/57

    Start at root.

    Find interval for search key and take corresponding link.Search terminates in external node.

    * B C

    searching for E

    D E F H I J K M N O P Q R T

    * D H

    * K

    K Q U

    U W X

    search forE inthis external node

    follow this link becauseE is between * andK

    follow this link becauseE

    is betweenD

    andH

    Searching in a B-tree set (M = 6)

    49

    Searching in a B-tree

  • 7/30/2019 Balanced Search Trees Coursenotes

    50/57

    Search for new key.

    Insert at bottom.Split nodes withMkey-link pairs on the way up the tree.

    50

    Insertion in a B-tree

    * A B C E F H I J K M N O P Q R T

    * C H

    * K

    K Q U

    U W X

    * A B C E F H I J K M N O P Q R T U W X

    * C H K Q U

    * A B C E F H I J K M N O P Q R T U W X

    * H K Q U

    * B C E F H I J K M N O P Q R T U W X

    * H K Q U

    new key (A) causesoverflow and split

    root split causesa new root to be created

    new key (C) causesoverflow and split

    Inserting a new key into a B-tree set

    inserting A

  • 7/30/2019 Balanced Search Trees Coursenotes

    51/57

    Proposition. A search or an insertion in a B-tree of order MwithNkeys

    requires between logM-

    1Nand logM/2Nprobes.

    Pf. All internal nodes (besides root) have between M/ 2 andM- 1 links.

    In practice. Number of probes is at most 4.

    Optimization. Always keep root page in memory.

    51

    Balance in B-tree

    M = 1024; N = 62 billionlog M/2 N 4

  • 7/30/2019 Balanced Search Trees Coursenotes

    52/57

    52

    Building a large B tree

    full page splits intotwo half -full pages

    then a new key is addedto one of them

    full page, about to split

    white: unoccupied portion of page

    black: occupied portion of page

    each line shows the resultof inserting one key

    in some page

  • 7/30/2019 Balanced Search Trees Coursenotes

    53/57

    53

    Balanced trees in the wild

    Red-black trees are widely used as system symbol tables.

    Java: java.util.TreeMap, java.util.TreeSet.C++ STL: map, multimap, multiset.

    Linux kernel: completely fair scheduler, linux/rbtree.h.

    B-tree variants. B+ tree, B*tree, B# tree,

    B-trees (and variants) are widely used for file systems and databases.

    Windows: HPFS.

    Mac: HFS, HFS+.

    Linux: ReiserFS, XFS, Ext3FS, JFS.

    Databases: ORACLE, DB2, INGRES, SQL, PostgreSQL.

  • 7/30/2019 Balanced Search Trees Coursenotes

    54/57

    54

    Red-black BSTs in the wild

    Common sense. Sixth sense.Together they're the

    FBI's newest team.

  • 7/30/2019 Balanced Search Trees Coursenotes

    55/57

    Red-black BSTs in the wild

    55

  • 7/30/2019 Balanced Search Trees Coursenotes

    56/57

    http://algs4.cs.princeton.edu

    2-3 search trees red-black BSTs

    B-trees

    3.3 BALANCED SEARCH TREES

    Algorithms S |

  • 7/30/2019 Balanced Search Trees Coursenotes

    57/57

    http://algs4.cs.princeton.edu

    Algorithms ROBERT SEDGEWICK | KEVIN WAYNE

    3.3 BALANCED SEARCH TREES

    2-3 search trees red-black BSTs

    B-trees