dictiona - itu.dk€¦ · smaller and f aster: succinct data structures session 1: intro, bit v...

71

Upload: others

Post on 21-Aug-2020

4 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Dictiona - itu.dk€¦ · Smaller and F aster: Succinct Data Structures Session 1: Intro, Bit V ecto rs Jérémy Ba rba y (some gures b y Gonzalo Nava rro) 1/ 24

Smaller and Faster: Su in t Data Stru turesSession 1: Intro, Bit Ve torsJérémy Barbay(some �gures by Gonzalo Navarro)1/ 24

Page 2: Dictiona - itu.dk€¦ · Smaller and F aster: Succinct Data Structures Session 1: Intro, Bit V ecto rs Jérémy Ba rba y (some gures b y Gonzalo Nava rro) 1/ 24

OutlineThe CourseSu in t Data Stru turesWhy?How?What?Bit Ve tors and Di tionariesDi tionaries in a pre-grad ourse.Whi h Di tionary DS do you know?Di tionary as a Bit Ve torHomework 3/ 24

Page 3: Dictiona - itu.dk€¦ · Smaller and F aster: Succinct Data Structures Session 1: Intro, Bit V ecto rs Jérémy Ba rba y (some gures b y Gonzalo Nava rro) 1/ 24

Stru ture of the Course1. Basi Con epts; Bit Ve tors and Stati Di tionaries;2. Trees; Graphs;3. Permutations; Fun tions;4. Strings; Binary Relations;5. Labeled Obje ts; Overview;Talk From Sorting to Compression

4/ 24

Page 4: Dictiona - itu.dk€¦ · Smaller and F aster: Succinct Data Structures Session 1: Intro, Bit V ecto rs Jérémy Ba rba y (some gures b y Gonzalo Nava rro) 1/ 24

ForewordWarning◮ Not a spe ialist.◮ Not te hni ally in lined.◮ Five Sessions is very short.

5/ 24

Page 5: Dictiona - itu.dk€¦ · Smaller and F aster: Succinct Data Structures Session 1: Intro, Bit V ecto rs Jérémy Ba rba y (some gures b y Gonzalo Nava rro) 1/ 24

ForewordWarning◮ Not a spe ialist.◮ Not te hni ally in lined.◮ Five Sessions is very short.Why do I give those le tures?◮ Tea hing = learning better.◮ Did not �nd a good (English) tutorial.◮ Su in t Data Stru tures get to be pra ti al.

5/ 24

Page 6: Dictiona - itu.dk€¦ · Smaller and F aster: Succinct Data Structures Session 1: Intro, Bit V ecto rs Jérémy Ba rba y (some gures b y Gonzalo Nava rro) 1/ 24

What you should get from it◮ List of useful referen es;◮ Sele tion of te hni al examples;◮ Some general ulture about te hniques to apply to yourproblem.

6/ 24

Page 7: Dictiona - itu.dk€¦ · Smaller and F aster: Succinct Data Structures Session 1: Intro, Bit V ecto rs Jérémy Ba rba y (some gures b y Gonzalo Nava rro) 1/ 24

OutlineThe CourseSu in t Data Stru turesWhy?How?What?Bit Ve tors and Di tionariesDi tionaries in a pre-grad ourse.Whi h Di tionary DS do you know?Di tionary as a Bit Ve torHomework 7/ 24

Page 8: Dictiona - itu.dk€¦ · Smaller and F aster: Succinct Data Structures Session 1: Intro, Bit V ecto rs Jérémy Ba rba y (some gures b y Gonzalo Nava rro) 1/ 24

Su in t Data Stru turesWhy?Se ondary Memory arguments◮ Ma hines run faster,◮ Memory gets larger,◮ A ess gets slower.

8/ 24

Page 9: Dictiona - itu.dk€¦ · Smaller and F aster: Succinct Data Structures Session 1: Intro, Bit V ecto rs Jérémy Ba rba y (some gures b y Gonzalo Nava rro) 1/ 24

Su in t Data Stru turesWhy?(Old) Se ondary Memory arguments◮ Ma hines run faster,◮ Memory gets larger,◮ A ess gets slower.More spe i� ally◮ Lower bounds (e.g. Beame and Fi h (1999))◮ Provably optimal (e.g. Golynski (2007))

8/ 24

Page 10: Dictiona - itu.dk€¦ · Smaller and F aster: Succinct Data Structures Session 1: Intro, Bit V ecto rs Jérémy Ba rba y (some gures b y Gonzalo Nava rro) 1/ 24

The Memory Hierar hyCurrent Situation◮ Rough numbers:

◮ A few CPU registers, less than 1 nanose ond.◮ A few KBs of a he L1, a few 10 nanose unds.◮ A few MBs of a he L2, a few 30 nanose unds.◮ A few GBs of RAM, a few 60 nanose unds.◮ A few TBs of dis o, a few 10 milise unds of laten y, more afew 500 nanose unds per word transfered.

◮ The memory hierar hy is more and more relevant!�The di�eren e in time between a essing RAM and a essinga hard drive is omparable to the di�eren e between a essinga book on the shelves on your o� e and getting to the otherend of the globe and oming ba k.� 9/ 24

Page 11: Dictiona - itu.dk€¦ · Smaller and F aster: Succinct Data Structures Session 1: Intro, Bit V ecto rs Jérémy Ba rba y (some gures b y Gonzalo Nava rro) 1/ 24

Su in t Data Stru turesHow?◮ An obje t hosen among N is optimally en oded by ⌈lgN⌉bits

10/ 24

Page 12: Dictiona - itu.dk€¦ · Smaller and F aster: Succinct Data Structures Session 1: Intro, Bit V ecto rs Jérémy Ba rba y (some gures b y Gonzalo Nava rro) 1/ 24

Su in t Data Stru turesHow?◮ An obje t hosen among N is optimally en oded by ⌈lgN⌉bits, but rarely usable as su h (e.g. trees).

10/ 24

Page 13: Dictiona - itu.dk€¦ · Smaller and F aster: Succinct Data Structures Session 1: Intro, Bit V ecto rs Jérémy Ba rba y (some gures b y Gonzalo Nava rro) 1/ 24

Su in t Data Stru turesHow?◮ An obje t hosen among N is optimally en oded by ⌈lgN⌉bits, but rarely usable as su h (e.g. trees).◮ We explore the relation between:

10/ 24

Page 14: Dictiona - itu.dk€¦ · Smaller and F aster: Succinct Data Structures Session 1: Intro, Bit V ecto rs Jérémy Ba rba y (some gures b y Gonzalo Nava rro) 1/ 24

Su in t Data Stru turesHow?◮ An obje t hosen among N is optimally en oded by ⌈lgN⌉bits, but rarely usable as su h (e.g. trees).◮ We explore the relation between:

◮ Spa e to ode;10/ 24

Page 15: Dictiona - itu.dk€¦ · Smaller and F aster: Succinct Data Structures Session 1: Intro, Bit V ecto rs Jérémy Ba rba y (some gures b y Gonzalo Nava rro) 1/ 24

Su in t Data Stru turesHow?◮ An obje t hosen among N is optimally en oded by ⌈lgN⌉bits, but rarely usable as su h (e.g. trees).◮ We explore the relation between:

◮ Spa e to ode;◮ Spa e to represent;

10/ 24

Page 16: Dictiona - itu.dk€¦ · Smaller and F aster: Succinct Data Structures Session 1: Intro, Bit V ecto rs Jérémy Ba rba y (some gures b y Gonzalo Nava rro) 1/ 24

Su in t Data Stru turesHow?◮ An obje t hosen among N is optimally en oded by ⌈lgN⌉bits, but rarely usable as su h (e.g. trees).◮ We explore the relation between:

◮ Spa e to ode;◮ Spa e to represent;◮ Time to pre ompute the representation;

10/ 24

Page 17: Dictiona - itu.dk€¦ · Smaller and F aster: Succinct Data Structures Session 1: Intro, Bit V ecto rs Jérémy Ba rba y (some gures b y Gonzalo Nava rro) 1/ 24

Su in t Data Stru turesHow?◮ An obje t hosen among N is optimally en oded by ⌈lgN⌉bits, but rarely usable as su h (e.g. trees).◮ We explore the relation between:

◮ Spa e to ode;◮ Spa e to represent;◮ Time to pre ompute the representation;◮ Time to navigate and sear h.

10/ 24

Page 18: Dictiona - itu.dk€¦ · Smaller and F aster: Succinct Data Structures Session 1: Intro, Bit V ecto rs Jérémy Ba rba y (some gures b y Gonzalo Nava rro) 1/ 24

Su in t Data Stru turesHow?◮ An obje t hosen among N is optimally en oded by ⌈lgN⌉bits, but rarely usable as su h (e.g. trees).◮ We explore the relation between:

◮ Spa e to ode;◮ Spa e to represent;◮ Time to pre ompute the representation;◮ Time to navigate and sear h.

◮ What is the smallest amount of spa e and timeneeded to pre ompute a representation allowing tonavigate and sear h su h an obje t?10/ 24

Page 19: Dictiona - itu.dk€¦ · Smaller and F aster: Succinct Data Structures Session 1: Intro, Bit V ecto rs Jérémy Ba rba y (some gures b y Gonzalo Nava rro) 1/ 24

Su in t Data Stru turesExample: ordinal trees◮ 2n − Θ(lg n) bits to ode an ordinal tree;

11/ 24

Page 20: Dictiona - itu.dk€¦ · Smaller and F aster: Succinct Data Structures Session 1: Intro, Bit V ecto rs Jérémy Ba rba y (some gures b y Gonzalo Nava rro) 1/ 24

Su in t Data Stru turesExample: ordinal trees◮ 2n − Θ(lg n) bits to ode an ordinal tree;◮ Using pointers

◮ 4n lg n ≈ 128× n bits represents◮ and navigates it in onstant time,◮ with O(n) prepro essing time.

11/ 24

Page 21: Dictiona - itu.dk€¦ · Smaller and F aster: Succinct Data Structures Session 1: Intro, Bit V ecto rs Jérémy Ba rba y (some gures b y Gonzalo Nava rro) 1/ 24

Su in t Data Stru turesExample: ordinal trees◮ 2n − Θ(lg n) bits to ode an ordinal tree;◮ Using pointers

◮ 4n lg n ≈ 128× n bits represents◮ and navigates it in onstant time,◮ with O(n) prepro essing time.

◮ Using te hniques from [Sadakane (2008)℄◮ 2n + n/ lg n ∈ 2n + o(n) bits (< 3n in pra ti e) represents◮ and navigates and sear hes it in onstant time,◮ with O(n) prepro essing time.

11/ 24

Page 22: Dictiona - itu.dk€¦ · Smaller and F aster: Succinct Data Structures Session 1: Intro, Bit V ecto rs Jérémy Ba rba y (some gures b y Gonzalo Nava rro) 1/ 24

Su in t Data Stru turesExample: ordinal trees◮ 2n − Θ(lg n) bits to ode an ordinal tree;◮ Using pointers

◮ 4n lg n ≈ 128× n bits represents◮ and navigates it in onstant time,◮ with O(n) prepro essing time.

◮ Using te hniques from [Sadakane (2008)℄◮ 2n + n/ lg n ∈ 2n + o(n) bits (< 3n in pra ti e) represents◮ and navigates and sear hes it in onstant time,◮ with O(n) prepro essing time.(All results in a θ(lg n) bits ar hite ture.) 11/ 24

Page 23: Dictiona - itu.dk€¦ · Smaller and F aster: Succinct Data Structures Session 1: Intro, Bit V ecto rs Jérémy Ba rba y (some gures b y Gonzalo Nava rro) 1/ 24

Su in t Data Stru turesWhat?◮ An obje t hosen among N is optimally oded by ⌈lgN⌉ bits,but rarely usable as su h (e.g. trees).

12/ 24

Page 24: Dictiona - itu.dk€¦ · Smaller and F aster: Succinct Data Structures Session 1: Intro, Bit V ecto rs Jérémy Ba rba y (some gures b y Gonzalo Nava rro) 1/ 24

Su in t Data Stru turesWhat?◮ An obje t hosen among N is optimally oded by ⌈lgN⌉ bits,but rarely usable as su h (e.g. trees).

12/ 24

Page 25: Dictiona - itu.dk€¦ · Smaller and F aster: Succinct Data Structures Session 1: Intro, Bit V ecto rs Jérémy Ba rba y (some gures b y Gonzalo Nava rro) 1/ 24

Su in t Data Stru turesWhat?◮ An obje t hosen among N is optimally oded by ⌈lgN⌉ bits,but rarely usable as su h (e.g. trees).◮ What is the smallest amount of spa e and time needed torepresent, navigate and sear h su h an obje t?

12/ 24

Page 26: Dictiona - itu.dk€¦ · Smaller and F aster: Succinct Data Structures Session 1: Intro, Bit V ecto rs Jérémy Ba rba y (some gures b y Gonzalo Nava rro) 1/ 24

Su in t Data Stru turesWhat?◮ An obje t hosen among N is optimally oded by ⌈lgN⌉ bits,but rarely usable as su h (e.g. trees).◮ What is the smallest amount of spa e and time needed torepresent, navigate and sear h su h an obje t?◮ Several type of answers:

12/ 24

Page 27: Dictiona - itu.dk€¦ · Smaller and F aster: Succinct Data Structures Session 1: Intro, Bit V ecto rs Jérémy Ba rba y (some gures b y Gonzalo Nava rro) 1/ 24

Su in t Data Stru turesWhat?◮ An obje t hosen among N is optimally oded by ⌈lgN⌉ bits,but rarely usable as su h (e.g. trees).◮ What is the smallest amount of spa e and time needed torepresent, navigate and sear h su h an obje t?◮ Several type of answers:

◮ A Su in t En oding uses ⌈lgN⌉ + o(lgN) bits for everything;12/ 24

Page 28: Dictiona - itu.dk€¦ · Smaller and F aster: Succinct Data Structures Session 1: Intro, Bit V ecto rs Jérémy Ba rba y (some gures b y Gonzalo Nava rro) 1/ 24

Su in t Data Stru turesWhat?◮ An obje t hosen among N is optimally oded by ⌈lgN⌉ bits,but rarely usable as su h (e.g. trees).◮ What is the smallest amount of spa e and time needed torepresent, navigate and sear h su h an obje t?◮ Several type of answers:

◮ A Su in t En oding uses ⌈lgN⌉ + o(lgN) bits for everything;◮ A Su in t Index uses o(lgN) bits and a ess to theunmodi�ed data;

12/ 24

Page 29: Dictiona - itu.dk€¦ · Smaller and F aster: Succinct Data Structures Session 1: Intro, Bit V ecto rs Jérémy Ba rba y (some gures b y Gonzalo Nava rro) 1/ 24

Su in t Data Stru turesWhat?◮ An obje t hosen among N is optimally oded by ⌈lgN⌉ bits,but rarely usable as su h (e.g. trees).◮ What is the smallest amount of spa e and time needed torepresent, navigate and sear h su h an obje t?◮ Several type of answers:

◮ A Su in t En oding uses ⌈lgN⌉ + o(lgN) bits for everything;◮ A Su in t Index uses o(lgN) bits and a ess to theunmodi�ed data;◮ An Ultra-Su in t En oding uses H(I ) + o(lgN) bits to ompress the data;

12/ 24

Page 30: Dictiona - itu.dk€¦ · Smaller and F aster: Succinct Data Structures Session 1: Intro, Bit V ecto rs Jérémy Ba rba y (some gures b y Gonzalo Nava rro) 1/ 24

Su in t Data Stru turesWhat?◮ An obje t hosen among N is optimally oded by ⌈lgN⌉ bits,but rarely usable as su h (e.g. trees).◮ What is the smallest amount of spa e and time needed torepresent, navigate and sear h su h an obje t?◮ Several type of answers:

◮ A Su in t En oding uses ⌈lgN⌉ + o(lgN) bits for everything;◮ A Su in t Index uses o(lgN) bits and a ess to theunmodi�ed data;◮ An Ultra-Su in t En oding uses H(I ) + o(lgN) bits to ompress the data;◮ A Self-Index uses O(H(I )) bits to ompress and index the data;

12/ 24

Page 31: Dictiona - itu.dk€¦ · Smaller and F aster: Succinct Data Structures Session 1: Intro, Bit V ecto rs Jérémy Ba rba y (some gures b y Gonzalo Nava rro) 1/ 24

Su in t Data Stru turesWhat?◮ An obje t hosen among N is optimally oded by ⌈lgN⌉ bits,but rarely usable as su h (e.g. trees).◮ What is the smallest amount of spa e and time needed torepresent, navigate and sear h su h an obje t?◮ Several type of answers:

◮ A Su in t En oding uses ⌈lgN⌉ + o(lgN) bits for everything;◮ A Su in t Index uses o(lgN) bits and a ess to theunmodi�ed data;◮ An Ultra-Su in t En oding uses H(I ) + o(lgN) bits to ompress the data;◮ A Self-Index uses O(H(I )) bits to ompress and index the data;

12/ 24

Page 32: Dictiona - itu.dk€¦ · Smaller and F aster: Succinct Data Structures Session 1: Intro, Bit V ecto rs Jérémy Ba rba y (some gures b y Gonzalo Nava rro) 1/ 24

Su in t Data Stru turesWhat?◮ An obje t hosen among N is optimally oded by ⌈lgN⌉ bits,but rarely usable as su h (e.g. trees).◮ What is the smallest amount of spa e and time needed torepresent, navigate and sear h su h an obje t?◮ Several type of answers:

◮ A Su in t En oding uses ⌈lgN⌉ + o(lgN) bits for everything;◮ A Su in t Index uses o(lgN) bits and a ess to theunmodi�ed data;◮ An Ultra-Su in t En oding uses H(I ) + o(lgN) bits to ompress the data;◮ A Self-Index uses O(H(I )) bits to ompress and index the data;Note that:

◮ Su in t En oding ⇐ Su in t Index ⇐ Compressed Index;◮ Su in t En oding ⇐ Compressed Self Index. 12/ 24

Page 33: Dictiona - itu.dk€¦ · Smaller and F aster: Succinct Data Structures Session 1: Intro, Bit V ecto rs Jérémy Ba rba y (some gures b y Gonzalo Nava rro) 1/ 24

OutlineThe CourseSu in t Data Stru turesWhy?How?What?Bit Ve tors and Di tionariesDi tionaries in a pre-grad ourse.Whi h Di tionary DS do you know?Di tionary as a Bit Ve torHomework 13/ 24

Page 34: Dictiona - itu.dk€¦ · Smaller and F aster: Succinct Data Structures Session 1: Intro, Bit V ecto rs Jérémy Ba rba y (some gures b y Gonzalo Nava rro) 1/ 24

Di tionary ADT (from CS240, Waterloo)◮ Container of key-element pairs◮ Required operations:

◮ insert( k,e ),◮ remove( k ),◮ find( k ),◮ isEmpty().

14/ 24

Page 35: Dictiona - itu.dk€¦ · Smaller and F aster: Succinct Data Structures Session 1: Intro, Bit V ecto rs Jérémy Ba rba y (some gures b y Gonzalo Nava rro) 1/ 24

Di tionary ADT (from CS240, Waterloo)◮ Container of key-element pairs◮ Required operations:

◮ insert( k,e ),◮ remove( k ),◮ find( k ),◮ isEmpty().

◮ When an order is provided:◮ losestKeyBefore( k ),◮ losestElemAfter( k ),◮ rank(k): nb. of values smaller than k in the set,◮ sele t(r): r -th smallest number in the set.Note: No dupli ate keys 14/ 24

Page 36: Dictiona - itu.dk€¦ · Smaller and F aster: Succinct Data Structures Session 1: Intro, Bit V ecto rs Jérémy Ba rba y (some gures b y Gonzalo Nava rro) 1/ 24

Spe i� Di tionary ADTs and their DS

15/ 24

Page 37: Dictiona - itu.dk€¦ · Smaller and F aster: Succinct Data Structures Session 1: Intro, Bit V ecto rs Jérémy Ba rba y (some gures b y Gonzalo Nava rro) 1/ 24

Spe i� Di tionary ADTs and their DSUnordered ◮ Array

15/ 24

Page 38: Dictiona - itu.dk€¦ · Smaller and F aster: Succinct Data Structures Session 1: Intro, Bit V ecto rs Jérémy Ba rba y (some gures b y Gonzalo Nava rro) 1/ 24

Spe i� Di tionary ADTs and their DSUnordered ◮ Array◮ Sequen e

15/ 24

Page 39: Dictiona - itu.dk€¦ · Smaller and F aster: Succinct Data Structures Session 1: Intro, Bit V ecto rs Jérémy Ba rba y (some gures b y Gonzalo Nava rro) 1/ 24

Spe i� Di tionary ADTs and their DSUnordered ◮ Array◮ Sequen eOrdered ◮ Array

15/ 24

Page 40: Dictiona - itu.dk€¦ · Smaller and F aster: Succinct Data Structures Session 1: Intro, Bit V ecto rs Jérémy Ba rba y (some gures b y Gonzalo Nava rro) 1/ 24

Spe i� Di tionary ADTs and their DSUnordered ◮ Array◮ Sequen eOrdered ◮ Array◮ Sequen e (Skip Lists)

15/ 24

Page 41: Dictiona - itu.dk€¦ · Smaller and F aster: Succinct Data Structures Session 1: Intro, Bit V ecto rs Jérémy Ba rba y (some gures b y Gonzalo Nava rro) 1/ 24

Spe i� Di tionary ADTs and their DSUnordered ◮ Array◮ Sequen eOrdered ◮ Array◮ Sequen e (Skip Lists)◮ Binary Sear h Tree (BST)

15/ 24

Page 42: Dictiona - itu.dk€¦ · Smaller and F aster: Succinct Data Structures Session 1: Intro, Bit V ecto rs Jérémy Ba rba y (some gures b y Gonzalo Nava rro) 1/ 24

Spe i� Di tionary ADTs and their DSUnordered ◮ Array◮ Sequen eOrdered ◮ Array◮ Sequen e (Skip Lists)◮ Binary Sear h Tree (BST)◮ AVL

15/ 24

Page 43: Dictiona - itu.dk€¦ · Smaller and F aster: Succinct Data Structures Session 1: Intro, Bit V ecto rs Jérémy Ba rba y (some gures b y Gonzalo Nava rro) 1/ 24

Spe i� Di tionary ADTs and their DSUnordered ◮ Array◮ Sequen eOrdered ◮ Array◮ Sequen e (Skip Lists)◮ Binary Sear h Tree (BST)◮ AVL◮ (2, 4) Trees

15/ 24

Page 44: Dictiona - itu.dk€¦ · Smaller and F aster: Succinct Data Structures Session 1: Intro, Bit V ecto rs Jérémy Ba rba y (some gures b y Gonzalo Nava rro) 1/ 24

Spe i� Di tionary ADTs and their DSUnordered ◮ Array◮ Sequen eOrdered ◮ Array◮ Sequen e (Skip Lists)◮ Binary Sear h Tree (BST)◮ AVL◮ (2, 4) Trees◮ B-Trees

15/ 24

Page 45: Dictiona - itu.dk€¦ · Smaller and F aster: Succinct Data Structures Session 1: Intro, Bit V ecto rs Jérémy Ba rba y (some gures b y Gonzalo Nava rro) 1/ 24

Spe i� Di tionary ADTs and their DSUnordered ◮ Array◮ Sequen eOrdered ◮ Array◮ Sequen e (Skip Lists)◮ Binary Sear h Tree (BST)◮ AVL◮ (2, 4) Trees◮ B-TreesValued ◮ Hash Tables

15/ 24

Page 46: Dictiona - itu.dk€¦ · Smaller and F aster: Succinct Data Structures Session 1: Intro, Bit V ecto rs Jérémy Ba rba y (some gures b y Gonzalo Nava rro) 1/ 24

Spe i� Di tionary ADTs and their DSUnordered ◮ Array◮ Sequen eOrdered ◮ Array◮ Sequen e (Skip Lists)◮ Binary Sear h Tree (BST)◮ AVL◮ (2, 4) Trees◮ B-TreesValued ◮ Hash Tables◮ Extendible Hashing

15/ 24

Page 47: Dictiona - itu.dk€¦ · Smaller and F aster: Succinct Data Structures Session 1: Intro, Bit V ecto rs Jérémy Ba rba y (some gures b y Gonzalo Nava rro) 1/ 24

Notes on Computation Model◮ Di tionaries and Sets are implemented (almost) identi ally.◮ Fo us on stati data stru tures and operators

◮ losestKeyBefore( k ), losestElemAfter( k )◮ rank(k), sele t(r).

◮ Advan ed implementations address the other operations.◮ We will onsider the Θ(lg n)-RAM model, where operations on

Θ(lg n) bits are supported in onstant time.16/ 24

Page 48: Dictiona - itu.dk€¦ · Smaller and F aster: Succinct Data Structures Session 1: Intro, Bit V ecto rs Jérémy Ba rba y (some gures b y Gonzalo Nava rro) 1/ 24

Di tionary as Bit Ve torString Su in t En odings support◮ bin_rank(α, x): nb. of α-o urren es before pos. x ;◮ bin_sele t(α, r): position of r -th α-o urren e.Example: 0 0 0 1 0 0 0 1 0 0◮ bin_rank(1, 6) = ?

◮ bin_sele t(1, 2) = ?

17/ 24

Page 49: Dictiona - itu.dk€¦ · Smaller and F aster: Succinct Data Structures Session 1: Intro, Bit V ecto rs Jérémy Ba rba y (some gures b y Gonzalo Nava rro) 1/ 24

Di tionary as Bit Ve torString Su in t En odings support◮ bin_rank(α, x): nb. of α-o urren es before pos. x ;◮ bin_sele t(α, r): position of r -th α-o urren e.Example: 0 0 0 1 0 0 0 1 0 0◮ bin_rank(1, 6) = 1◮ bin_sele t(1, 2) = ?

17/ 24

Page 50: Dictiona - itu.dk€¦ · Smaller and F aster: Succinct Data Structures Session 1: Intro, Bit V ecto rs Jérémy Ba rba y (some gures b y Gonzalo Nava rro) 1/ 24

Di tionary as Bit Ve torString Su in t En odings support◮ bin_rank(α, x): nb. of α-o urren es before pos. x ;◮ bin_sele t(α, r): position of r -th α-o urren e.Example: 0 0 0 1 0 0 0 1 0 0◮ bin_rank(1, 6) = 1◮ bin_sele t(1, 2) = 8

17/ 24

Page 51: Dictiona - itu.dk€¦ · Smaller and F aster: Succinct Data Structures Session 1: Intro, Bit V ecto rs Jérémy Ba rba y (some gures b y Gonzalo Nava rro) 1/ 24

General Te hniqueSummaries of Summaries◮ Pre ompute (and ode) the fun tion for a small sele tion ofvalues;◮ For some in-between values, ode the di�eren e;◮ For the smallest sequen es of left in-between values, use agreedy approa h.Bender and Fara h-Colton (2004) gave a veryni e writing of the te hnique. Even though theywere not aiming at su in t data stru ture butrather small pre omputing time, they a hievethe same goal, and explain it in a ni e in re-mental way. 18/ 24

Page 52: Dictiona - itu.dk€¦ · Smaller and F aster: Succinct Data Structures Session 1: Intro, Bit V ecto rs Jérémy Ba rba y (some gures b y Gonzalo Nava rro) 1/ 24

Spe i� Te hniqueBasi s◮ Binary Rank through re ursive de�nition of the bit ve tor insuperblo s and blo s;◮ Binary Sele t through re ursive de�nition of the parameterspa e in sparse and dense blo s.Further Improvements◮ Raman et al. (2002) showed how to ompress to the binary entropy.◮ Golynski (2006) redu ed the spa erequired through the ommon use of ounting index. 19/ 24

Page 53: Dictiona - itu.dk€¦ · Smaller and F aster: Succinct Data Structures Session 1: Intro, Bit V ecto rs Jérémy Ba rba y (some gures b y Gonzalo Nava rro) 1/ 24

Binary Entropyif there are n0 zeroes and n1 ones in a bit ve tor B(n0 + n1 = n = |B |)

H0(B) =n0n lg nn0 +

n1n lg nn1=

1n lg( nn0) + O ( lg nn )

20/ 24

Page 54: Dictiona - itu.dk€¦ · Smaller and F aster: Succinct Data Structures Session 1: Intro, Bit V ecto rs Jérémy Ba rba y (some gures b y Gonzalo Nava rro) 1/ 24

Binary RankTrivial Solution: Constant Time◮ Divide B in blo s of b = (log n)/2 bits.◮ Pre ompute in an array R [1, n/b] the values of bin_rank atthe begining of ea h blo ◮ Pre ompute an array T [0, 2b − 1][0, b − 1], su h thatT [x , r ] = total number of 1's in x [1, r ]where x is a blo of b bits.◮ Compute bin_rank(B , i) as follows:

◮ De ompose i = q · b + r , 0 ≤ r < b;◮ The number of 1's till position qb is R[q];◮ The number of 1's in B[qb + 1, qb + r ] isT [B[qb + 1, qb + b], r ];◮ The binary rank is obtained in onstant time throughR[q] + T [B[qb + 1..qb + b], r ] 21/ 24

Page 55: Dictiona - itu.dk€¦ · Smaller and F aster: Succinct Data Structures Session 1: Intro, Bit V ecto rs Jérémy Ba rba y (some gures b y Gonzalo Nava rro) 1/ 24

Binary RankTrivial Solution: Linear Spa e◮ The array R usesnb log n =

n(log n)/2 · log n = 2n bits

◮ The array T , indexed on [0, b − 1], uses2b · b · log b = 2 log n2 · log n2 · (log log n − 1)≤ 12√n log n log log n = o(n) bits.For instan e, for n = 232, T requires only 512 KB.

◮ This solution takes 3n + o(n) bits. 22/ 24

Page 56: Dictiona - itu.dk€¦ · Smaller and F aster: Succinct Data Structures Session 1: Intro, Bit V ecto rs Jérémy Ba rba y (some gures b y Gonzalo Nava rro) 1/ 24

Binary RankTrivial Solution

23/ 24

Page 57: Dictiona - itu.dk€¦ · Smaller and F aster: Succinct Data Structures Session 1: Intro, Bit V ecto rs Jérémy Ba rba y (some gures b y Gonzalo Nava rro) 1/ 24

Binary RankSu in t Solution◮ Divide B in superblo s ofs =

(log n)22 = b log n bits.◮ Pre ompute in an array S [1, n/s] the values of bin_rank atthe begining of ea h superblo , relatively to the value ofbin_rank at the begining of ea h blo :R [q] = bin_rank(B , qb) − S [⌊q/ log n⌋]

= bin_rank(B , qb) − bin_rank(B , ⌊q/ log n⌋ · log n)24/ 24

Page 58: Dictiona - itu.dk€¦ · Smaller and F aster: Succinct Data Structures Session 1: Intro, Bit V ecto rs Jérémy Ba rba y (some gures b y Gonzalo Nava rro) 1/ 24

Binary RankSu in t Solution: sublinear indexing spa e◮ Ea h value of bin_rank in S requires log n bits, so that Suses in totalns · log n =

n(log n)2/2 · log n =

2nlog n = o(n) bits◮ Ea h value of bin_rank in R is less than s = O(log n)2 andrequires 2 log log n bits, so that R uses in totalnb · 2 log log n =

n(log n)/2 · 2 log log n

=4n · log log nlog n = o(n) bits. 25/ 24

Page 59: Dictiona - itu.dk€¦ · Smaller and F aster: Succinct Data Structures Session 1: Intro, Bit V ecto rs Jérémy Ba rba y (some gures b y Gonzalo Nava rro) 1/ 24

Binary RankSu in t Solution: onstant time◮ How to ompute bin_rank(B , i) ?

◮ De ompose i = q · b + r , 0 ≤ r < b.◮ De ompose i = q′ · s + r ′, 0 ≤ r ′ < s.◮ The sum of three tabulated value yields the result:bin_rank(B , i) = S [q′] + R[q] + T [B[qb + 1..qb + b], r ]

26/ 24

Page 60: Dictiona - itu.dk€¦ · Smaller and F aster: Succinct Data Structures Session 1: Intro, Bit V ecto rs Jérémy Ba rba y (some gures b y Gonzalo Nava rro) 1/ 24

Binary RankSu in t Solution

27/ 24

Page 61: Dictiona - itu.dk€¦ · Smaller and F aster: Succinct Data Structures Session 1: Intro, Bit V ecto rs Jérémy Ba rba y (some gures b y Gonzalo Nava rro) 1/ 24

Binary Sele tSele t in onstant time

◮ Based on sampling on arguments , not on B .◮ More omplex te hnique from bin_rank, based on thepartitioning of blo s into dense and sparse blo s.◮ Solution in o(n) = O(n/ lg lg n).

28/ 24

Page 62: Dictiona - itu.dk€¦ · Smaller and F aster: Succinct Data Structures Session 1: Intro, Bit V ecto rs Jérémy Ba rba y (some gures b y Gonzalo Nava rro) 1/ 24

Binary Sele t◮ Divide the arguments of bin_sele t in superblo s ofs = lg2 n arguments.◮ Ea h blo orresponds to s ones in an interval of distin t sizein B : a blo is sparse if spanning at leasts lg n lg lg n = lg3 n lg lg n positions; and dense otherwise.◮ Note that all sparse blo s require in total at mostO(n/ lg lg n). bits.

29/ 24

Page 63: Dictiona - itu.dk€¦ · Smaller and F aster: Succinct Data Structures Session 1: Intro, Bit V ecto rs Jérémy Ba rba y (some gures b y Gonzalo Nava rro) 1/ 24

Binary Sele t◮ Pre ompute a bit ve tor E [1, n/s] indi ating whi h superblo sare sparse.◮ Join the answers to bin_sele t in the sparse superblo s inan array of arrays R :bin_sele t(B , i) = R [bin_rank(E , ⌊i/s⌋)] [1 + (i mod s)]for ea h position i su h that E [⌊i/s⌋] = 1.◮ Add the starting position in B of ea h dense superblo in anarray P [1, n/s].◮ Ea h superblo is spanning at most s lg n lg lg n = lg3 n lg lg npositions in B . 30/ 24

Page 64: Dictiona - itu.dk€¦ · Smaller and F aster: Succinct Data Structures Session 1: Intro, Bit V ecto rs Jérémy Ba rba y (some gures b y Gonzalo Nava rro) 1/ 24

Binary Sele t◮ Divide ea h dense superblo in blo s of b = (lg lg n)2arguments.◮ Apply the same te hnique re ursively inside ea h densesuperblo .◮ Require lg(lg3 n lg lg n) ≤ 4 lg lg n bits to ode ea h positioninside a dense superblo .◮ A blo is sparse if spanning at least 4b(lg lg n)2 positions in B ,and dense otherwise.◮ Combine all the (relative) answers of the sparse blo s inside thedense superblo s in an array, using in total O(n/ lg lg n) bits.

31/ 24

Page 65: Dictiona - itu.dk€¦ · Smaller and F aster: Succinct Data Structures Session 1: Intro, Bit V ecto rs Jérémy Ba rba y (some gures b y Gonzalo Nava rro) 1/ 24

Binary Sele t◮ Indi ate whi h blo s are sparse in an array E ′[1, s/b].◮ Store the starting position of ea h dense superblo in an arrayP ′[1, s/b] using O(n/ log log n) bits.◮ Ea h dense blo spans at most 4(log log n)4 = o(log n)positions, so that they are supported in onstant time throughtable look-up.

32/ 24

Page 66: Dictiona - itu.dk€¦ · Smaller and F aster: Succinct Data Structures Session 1: Intro, Bit V ecto rs Jérémy Ba rba y (some gures b y Gonzalo Nava rro) 1/ 24

Binary Sele t◮ How to ompute bin_sele t(B , i)?

◮ De ompose i = q · s + r .◮ If E [q] = 1, i falls in a sparse superblo , return R[...].◮ Otherwise, i falls in a dense superblo starting in P[q].◮ De ompose r = q′ · b + r ′.◮ If E ′q[q′] = 1, i falls in a sparse blo , return P[q] + R ′[...].◮ Otherwise, i falls in a dense blo starting in p = P[q] + P ′[q′].◮ Sear h for the r ′-th one in B[p...p + 4(log log n)4] throughtable look-up.

33/ 24

Page 67: Dictiona - itu.dk€¦ · Smaller and F aster: Succinct Data Structures Session 1: Intro, Bit V ecto rs Jérémy Ba rba y (some gures b y Gonzalo Nava rro) 1/ 24

Binary Sele t

34/ 24

Page 68: Dictiona - itu.dk€¦ · Smaller and F aster: Succinct Data Structures Session 1: Intro, Bit V ecto rs Jérémy Ba rba y (some gures b y Gonzalo Nava rro) 1/ 24

OutlineThe CourseSu in t Data Stru turesWhy?How?What?Bit Ve tors and Di tionariesDi tionaries in a pre-grad ourse.Whi h Di tionary DS do you know?Di tionary as a Bit Ve torHomework 36/ 24

Page 69: Dictiona - itu.dk€¦ · Smaller and F aster: Succinct Data Structures Session 1: Intro, Bit V ecto rs Jérémy Ba rba y (some gures b y Gonzalo Nava rro) 1/ 24

HomeworkBalan ed Parenthesis: (()())◮ How does this relate to bit ve tors?◮ What kind of operators would you support on it?◮ How do you support them?Ordinal trees◮ How does this relate to BP?◮ What kind of operators would you support on it?◮ How do you support them?

37/ 24

Page 70: Dictiona - itu.dk€¦ · Smaller and F aster: Succinct Data Structures Session 1: Intro, Bit V ecto rs Jérémy Ba rba y (some gures b y Gonzalo Nava rro) 1/ 24

Stru ture of the Course1. Basi Con epts; Bit Ve tors and Stati Di tionaries;2. Trees; Graphs;3. Permutations; Fun tions;4. Strings; Binary Relations;5. Labeled Obje ts; Overview;Talk From Sorting to Compression

38/ 24

Page 71: Dictiona - itu.dk€¦ · Smaller and F aster: Succinct Data Structures Session 1: Intro, Bit V ecto rs Jérémy Ba rba y (some gures b y Gonzalo Nava rro) 1/ 24

BibliographyBeame, P. and Fi h, F. E. (1999). Optimal bounds for theprede essor problem. In Pro eedings of the 31st annual ACMsymposium on Theory of omputing, pages 295�304.Bender, M. A. and Fara h-Colton, M. (2004). The level an estorproblem simpli�ed. Theor. Comput. S i., 321(1):5�12.Golynski, A. (2006). Optimal lower bounds for rank and sele tindexes. In Pro eedings of the International Colloquium onAutomata, Languages and Programming (ICALP).Golynski, A. (2007). Upper and Lower Bounds for Text IndexingData Stru tures. PhD thesis, University of Waterloo.Raman, R., Raman, V., and Rao, S. S. (2002). Su in t indexabledi tionaries with appli ations to en oding k-ary trees andmultisets. In Pro eedings of the 13th Annual ACM-SIAMSymposium on Dis rete algorithms, pages 233�242. 39/ 24