Post on 20-Dec-2015
Algorithmic Aspects of Searching in the Past
Thomas Ottmann
Institut für Informatik, Universität Freiburg, Germany
(Lecture 13: Persistence and Oblivious Data Structures)
Advanced Algorithms & Data Structures
2
Overview
• Motivation: Oblivious and persistent structures
• Examples: Arrays, search trees, Z-stratified search trees, relaxation
• Making structures persistent: Structure-copying, path-copying, and DSST methods
• Application: Point location
• Application: Time-evolving data: Capture and replay of whiteboard data,
in particular handwriting traces
• Oblivious structures: Randomized and uniquely represented structures, c-level
jump lists
3
Motivation
A structure storing a set of keys is called oblivious if its generation history cannot be
inferred from its current shape.
A structure is called persistent if it supports access to multiple versions.
Partially persistent: All versions can be accessed but only the newest version can be
modified.
Fully persistent: All versions can be accessed and modified.
Confluently persistent: Two or more old versions can be combined into one new
version.
4
Example: Arrays
Array:
2 4 8 15 17 43 47 ……
Uniquely represented structure, hence, oblivious!
Access: In time O(log n) by binary search.
Update (Insertion, Deletion): Θ(n) in the worst case
Caution: Storage structure may still depend on generation history!
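A minimal sketch of why a sorted array is uniquely represented: whatever the insertion order, the resulting array is the same. The helper name build_sorted_array is hypothetical and only serves this illustration.

```python
from bisect import insort


def build_sorted_array(insertion_order):
    """Insert the keys one by one, keeping the array sorted."""
    array = []
    for key in insertion_order:
        # Binary search finds the position; shifting the tail costs linear time.
        insort(array, key)
    return array


a = build_sorted_array([1, 3, 5, 7])
b = build_sorted_array([5, 1, 3, 7])
```

Both calls yield the same list, so the insertion order cannot be inferred from the result.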
5
Example: Natural search trees
Only partially oblivious!
• Insertion history can sometimes be reconstructed.
• Deleted keys are not visible.
Access, insertion, deletion of keys may take time Θ(n) in the worst case
[Figure: the two natural search trees produced by the insertion orders 1, 3, 5, 7 and 5, 1, 3, 7.]
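The effect of the insertion order can be reproduced with a small sketch. It assumes a natural (unbalanced) search tree encoded as nested tuples (key, left, right); this encoding and the function names are choices made for the illustration only.

```python
def insert(tree, key):
    """Insert key into a natural search tree given as (key, left, right)."""
    if tree is None:
        return (key, None, None)
    root, left, right = tree
    if key < root:
        return (root, insert(left, key), right)
    return (root, left, insert(right, key))


def build(order):
    """Build a tree by inserting the keys in the given order."""
    tree = None
    for key in order:
        tree = insert(tree, key)
    return tree


def height(tree):
    """Number of nodes on the longest root-to-leaf path."""
    return 0 if tree is None else 1 + max(height(tree[1]), height(tree[2]))


chain = build([1, 3, 5, 7])   # degenerates into a linear chain
bushy = build([5, 1, 3, 7])   # same keys, different shape
```

The two trees store the same set but differ in shape, which is why part of the insertion history remains visible.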
6
Example: Balanced search tree
Problem: Updates come in sudden bursts (Example: Recording ink-traces from pen input)
Not enough time to serialize insertions and rebalancing transformations
Solution: Relaxed balancing: Carry out updates and rebalancing transformations concurrently!
[Figure: example balanced search trees over the keys 2, 5, 6, 7, 9, 10, 11, 15, 20, 23, 30.]
7
Stratified search trees
[Figure: schematic of a stratified search tree.]
8
Example
9
Example
10
Insertion
Insert the new key among the leaves at the expected position and deposit a „push-up-request“.
[Figure: a new key x inserted among the leaves, with a push-up-request at its parent p.]
11
Iterative sequence of insertions
12
Handling of push-up-requests (1)
• A push-up-request either leads to a local structural change and halt, which can be
carried out in time O(1) (Case 1)
• or (exclusively) to a recursive shift of the push-up-requests to the next higher
stratum without any structural change (Case 2)
Case 1 [There is still room on the next higher stratum]
[Figure: Case 1 transformations on the subtrees 1, 2, 3 and 1, 2, 3, 4.]
13
Handling of push-up-requests (2)
Case 2 [Next higher stratum is full]
Append a new apex if a node is pushed over the topmost stratum border.
[Figure: Case 2 on the subtrees 1 to 5; the push-up-request moves to the next higher stratum.]
14
Deletion
Locate x among the leaves.
Deposit a removal request at x.
Handle the removal request.
[Figure: schematic of a stratified search tree with a removal request at a leaf.]
15
Handling removal requests
Case 1 [Enough nodes at bottommost stratum]
Case 2 [Bottommost stratum too sparse]
Deposit a „pull-down-request“.
16
Handling of pull-down-requests (1)
Case 1 [There are enough nodes on the next higher stratum]
Finite structural change and halt!
[Figure: Case 1 transformations for a pull-down-request p on the subtrees 1, 2, 3 and 1, 2, 3, 4.]
17
Handling of pull-down-requests (2)
Case 2 [Not enough nodes on the next higher stratum]
Recursively shift the pull-down-request p to the next higher stratum, but no structural change!
18
Z-stratified search trees: Observations
Insertions, deletions, and rebalancing transformations (removal of push-up and pull-down
requests) can be arbitrarily interleaved.
The amortized restructuring costs per insertion or deletion are constant.
The generation history of a current version may be partially reconstructed (the sequence
of insertions and deletions is partially visible).
But:
• Update operations are always applied to the current version
• Z-stratified search trees are not persistent
19
Overview
• Motivation: Oblivious and persistent structures
• Examples: Arrays, search trees, Z-stratified search trees, relaxation
• Making structures persistent: Structure-copying, path-copying, and DSST methods
• Application: Point location
• Application: Time-evolving data: Capture and replay of whiteboard data,
in particular handwriting traces
• Oblivious structures: Randomized and uniquely represented structures, c-level
jump lists
20
Simple methods for making structures persistent
• Copy the structure and apply the update operation to the copy. This yields full persistence
at the price of Θ(n) time per update and Θ(m·n) space for m updates applied to
structures of size n. (Structure-copying method)
• Do nothing, but store a log file of all updates! In order to access version i, first carry
out i updates, starting with the initial structure, and generate version i. Θ(i) time per
access, Θ(m) space for m operations.
• Hybrid method: Store the complete sequence of updates and additionally each k-th
version for a suitably chosen k. Result: Time and space requirements increase at
least by a factor of √m!
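The first two methods above can be sketched as follows. This is a minimal illustration that assumes the structure is a plain Python set of keys; the class names StructureCopying and UpdateLog and the helper apply_update are hypothetical.

```python
def apply_update(keys, op, key):
    """Apply one update ("insert" or "delete") to a set of keys in place."""
    if op == "insert":
        keys.add(key)
    else:
        keys.discard(key)


class StructureCopying:
    """Every update copies a whole version: linear time and space per update."""

    def __init__(self):
        self.versions = [set()]          # version 0 is the empty structure

    def update(self, base, op, key):
        new = set(self.versions[base])   # full copy of the chosen base version
        apply_update(new, op, key)
        self.versions.append(new)
        return len(self.versions) - 1    # index of the new version

    def access(self, i):
        return self.versions[i]


class UpdateLog:
    """Only the updates are stored; version i is regenerated on access."""

    def __init__(self):
        self.log = []

    def update(self, op, key):
        self.log.append((op, key))
        return len(self.log)             # index of the new version

    def access(self, i):
        keys = set()
        for op, key in self.log[:i]:     # replay the first i updates
            apply_update(keys, op, key)
        return keys
```

In this sketch StructureCopying allows updates on any old version (full persistence), while UpdateLog only appends to the newest one.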
Are there any better methods, for example for search trees?
21
Persistent search trees (1)
Path-copying method
[Figure: version 0, a search tree with the keys 1, 3, 5, 7; the root array holds entry 0.]
22
Persistent search trees (1)
Path-copying method
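version 1: Insert (2)
[Figure: the search path 5, 1, 3 is copied and the new node 2 is attached to the copy; node 7 is shared; the root array holds entries 0 and 1.]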
23
Persistent search trees (1)
Path-copying method
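version 1: Insert (2)
version 2: Insert (4)
[Figure: the search path 5, 1, 3 is copied once more and the new node 4 is attached; the root array holds entries 0, 1, and 2.]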
24
Persistent search trees (1)
Path-copying method
Restructuring costs: O(log n) per update operation
version 1: Insert (2)
version 2: Insert (4)
[Figure: versions 0, 1, and 2 with their copied search paths and the root array 0, 1, 2.]
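A minimal sketch of the path-copying idea, assuming an unbalanced search tree with immutable nodes (the slides do not fix a concrete tree type). Only the nodes on the search path are copied; everything else is shared between versions. The names Node, insert, and keys are chosen for this illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Node:
    key: int
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def insert(root, key):
    """Return the root of a new version; the old version stays untouched."""
    if root is None:
        return Node(key)
    if key < root.key:
        # Copy this node, share the whole right subtree with the old version.
        return Node(root.key, insert(root.left, key), root.right)
    # Copy this node, share the whole left subtree with the old version.
    return Node(root.key, root.left, insert(root.right, key))


def keys(root):
    """Keys of one version in ascending order."""
    if root is None:
        return []
    return keys(root.left) + [root.key] + keys(root.right)


version0 = None
for k in (5, 1, 7, 3):
    version0 = insert(version0, k)
versions = [version0]                          # root array, indexed by version
versions.append(insert(versions[-1], 2))       # version 1: Insert (2)
versions.append(insert(versions[-1], 4))       # version 2: Insert (4)
```

Each version remains accessible through its entry in the root array, and node 7 is the same object in all three versions.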
25
Persistent search trees (2)
DSST-method: Extend each node by a time-stamped modification box
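Modification boxes
• are initially empty
• are filled bottom up
[Figure: a node with key k, left pointer lp, right pointer rp, and a modification box holding the entry „t: rp“; the original pointers apply to all versions before time t, the pointer in the box to all versions after time t.]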
26
DSST method
[Figure: version 0, a search tree with the keys 1, 3, 5, 7.]
27
DSST method
[Figure: inserting 2 into version 0; the new node 2 is attached through the entry „1 lp“ in the modification box of node 3.]
28
DSST method
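version 1: Insert (2)
version 2: Insert (4)
[Figure: the modification box of node 3 is already filled with „1 lp“, so node 3 is copied and the new node 4 is attached to the copy.]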
29
DSST method
The amortized costs (time and space) per update operation are O(1)
version 1: Insert (2)
version 2: Insert (4)
[Figure: the copy of node 3 is recorded as „2 rp“ in the modification box of node 1; node 3 keeps its entry „1 lp“.]
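A simplified sketch of the modification-box idea for partial persistence. It assumes an unbalanced search tree, insertions only, and one modification box per node; the class and method names are hypothetical, and the sketch makes no attempt to reproduce the amortized analysis.

```python
class Node:
    def __init__(self, key, left=None, right=None):
        self.key = key
        self.left = left
        self.right = right
        self.mod = None  # at most one entry: (version, field, child)

    def child(self, field, version):
        """Pointer stored in field ("left" or "right") as seen by version."""
        if self.mod is not None:
            mod_version, mod_field, mod_child = self.mod
            if mod_field == field and mod_version <= version:
                return mod_child
        return getattr(self, field)


class PersistentTree:
    def __init__(self):
        self.roots = [None]  # roots[v] is the root of version v; version 0 is empty

    def insert(self, key):
        old = len(self.roots) - 1
        new = old + 1
        path = []
        node = self.roots[old]
        while node is not None:          # search in the newest version
            field = "left" if key < node.key else "right"
            path.append((node, field))
            node = node.child(field, old)
        child = Node(key)
        root = self.roots[old]
        while path:                      # fill modification boxes bottom up
            node, field = path.pop()
            if node.mod is None:
                node.mod = (new, field, child)
                break
            # Box already full: copy the node with its newest pointers.
            copy = Node(node.key, node.child("left", old), node.child("right", old))
            setattr(copy, field, child)
            child = copy
        else:
            root = child                 # the change reached the root
        self.roots.append(root)

    def keys(self, version):
        def walk(node):
            if node is None:
                return []
            return (walk(node.child("left", version)) + [node.key]
                    + walk(node.child("right", version)))
        return walk(self.roots[version])


tree = PersistentTree()
for k in (5, 1, 7, 3, 2, 4):
    tree.insert(k)
```

In this sketch each insertion creates one new version, so the tree with the keys 1, 3, 5, 7 is version 4, and the insertions of 2 and 4 produce versions 5 and 6.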
30
Overview
• Motivation: Oblivious and persistent structures
• Examples: Arrays, search trees, Z-stratified search trees, relaxation
• Making structures persistent: Structure-copying, path-copying, and DSST methods
• Application: Point location
• Application: Time-evolving data: Capture and replay of whiteboard data,
in particular handwriting traces
• Oblivious structures: Randomized and uniquely represented structures, c-level
jump lists
31
Application: Planar Pointlocation
Suppose that the Euclidean plane is subdivided into polygons by n line segments that intersect only at their endpoints.
Given such a polygonal subdivision and an on-line sequence of query points in the plane, the planar point location problem is to determine for each query point the polygon containing it.
Measure an algorithm by three parameters:
1) The preprocessing time.
2) The space required for the data structure.
3) The time per query.
32
Planar point location -- example
33
Planar point location -- example
34
Solving planar point location (Cont.)
Partition the plane into vertical slabs by drawing a vertical line through each endpoint.
Within each slab the lines are totally ordered.
Allocate one search tree per slab that stores the slab's lines at its leaves; with each line, associate the polygon above it.
Allocate another search tree on the x-coordinates of the vertical lines.
35
Solving planar point location (Cont.)
To answer a query:
• first find the appropriate slab,
• then search the slab to find the polygon.
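The two query steps can be sketched as follows. This is a minimal illustration that assumes non-vertical segments given as pairs of (x, y) endpoints, uses sorted lists in place of search trees, and returns the segment directly below the query point rather than a polygon label. All function names are hypothetical.

```python
from bisect import bisect_right


def y_at(seg, x):
    """y-coordinate of a non-vertical segment at abscissa x."""
    (x1, y1), (x2, y2) = seg
    return y1 + (y2 - y1) * (x - x1) / (x2 - x1)


def build_slabs(segments):
    """One sorted list of crossing segments per slab (quadratic space)."""
    xs = sorted({x for seg in segments for (x, _) in seg})
    slabs = []
    for left, right in zip(xs, xs[1:]):
        mid = (left + right) / 2
        crossing = [s for s in segments
                    if min(s[0][0], s[1][0]) <= left
                    and right <= max(s[0][0], s[1][0])]
        crossing.sort(key=lambda s: y_at(s, mid))  # total order inside the slab
        slabs.append(crossing)
    return xs, slabs


def segment_below(xs, slabs, x, y):
    """Find the slab of x, then binary-search the segments inside it."""
    i = bisect_right(xs, x) - 1
    if i < 0 or i >= len(slabs):
        return None                      # query lies outside all slabs
    crossing = slabs[i]
    lo, hi = 0, len(crossing)
    while lo < hi:
        mid = (lo + hi) // 2
        if y_at(crossing[mid], x) <= y:
            lo = mid + 1
        else:
            hi = mid
    return crossing[lo - 1] if lo > 0 else None


s1 = ((0, 0), (4, 0))
s2 = ((0, 0), (2, 2))
s3 = ((2, 2), (4, 0))
xs, slabs = build_slabs([s1, s2, s3])
```

Both steps are binary searches, so a query costs logarithmic time once the slabs are built.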
36
Planar point location -- example
37
Planar point location -- analysis
Query time is O(log n)
How about the space?
Θ(n²) in the worst case.
And so could be the preprocessing time.
38
Planar point location -- bad example
Total # lines O(n), and number of lines in each slab is O(n).
39
Planar point location & persistence
So how do we improve the space bound ?
Key observation: The lists of the lines in adjacent slabs are very similar.
Create the search tree for the first slab.
Then obtain the next one by deleting the lines that end at the corresponding vertex and adding the lines that start at that vertex.
How many insertions/deletions are there altogether?
2n
40
Planar point location & persistence (cont)
Updates should be persistent since we need all search trees at the end.
Partial persistence is enough.
Well, we already have the path-copying method, so let's use it. What do we get?
O(n log n) space and O(n log n) preprocessing time.
We can improve the space bound to O(n) by using the DSST method.
41
Overview
• Motivation: Oblivious and persistent structures
• Examples: Arrays, search trees, Z-stratified search trees, relaxation
• Making structures persistent: Structure-copying, path-copying, and DSST methods
• Application: Point location
• Application: Time-evolving data: Capture and replay of whiteboard data,
in particular handwriting traces
• Oblivious structures: Randomized and uniquely represented structures, c-level
jump lists
42
Time evolving data: Presentation recording
Input media
• Whiteboard
• TouchScreen
• Tablet PC
[Figure: lightweight content creation; data sources from the author are recorded into a learning module or document for the audience.]
43
Cintiq Tablet (Wacom)
• Pen input, large display
• Eye contact with audience
44
Random access facility
Access of an ink-object sj corresponding to time tj requires the immediate presentation
of sj and of all ink-objects since t0
45
Whiteboard data
Whiteboard data-stream requires
• Fast insertion and deletion of graphical objects (lines, circles, pen-traces, …) in
large quantities,
• Partially persistent storage which allows:
• Fast access (display and „rendering“) of all data for a given time stamp,
• Synchronisability (as slave) with audio-stream (master).
Problem: Find a suitable method for storing the whiteboard-action stream!
46
Postprocessing
Whiteboard-stream is made persistent by the structure-copying method:
For each time stamp t a complete list of all objects visible on the board at time t is (pre-)computed and stored for random access.
Disadvantage: Highly redundant, very large data-volume
Advantage: Visible scrolling
Storage and representation of freehand ink-traces: Find a suitable compromise between conflicting goals:
• Data volume
• Access cost (time) and dynamic replay (visible scrolling)
• Individual, personal style
• Scalability (vector- vs. raster-based representation)
47
Overview
• Motivation: Oblivious and persistent structures
• Examples: Arrays, search trees, Z-stratified search trees, relaxation
• Making structures persistent: Structure-copying, path-copying, and DSST methods
• Application: Point location
• Application: Time-evolving data: Capture and replay of whiteboard data,
in particular handwriting traces
• Oblivious structures: Randomized and uniquely represented structures, c-level
jump lists
48
Methods for making structures oblivious
Unique representation of the structure:
• Set/size uniqueness: For each set of n keys there is exactly one structure which
can store such a set.
• The storage is order-unique, i.e. the nodes of the structure are ordered and the
keys are stored in ascending order in nodes with ascending numbers.
Randomise the structure:
Ensure that the probability with which a structure storing a set M of keys occurs
is independent of how M was generated.
Observation: The address assignment of pointers also has to be subject to a
randomised regime!
49
Example of a randomised structure
Z-stratified search tree
On each stratum, randomly choose the distribution of trees from Z.
Insertion? Deletion?
[Figure: schematic of a Z-stratified search tree.]
50
Uniquely represented structures
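(a) Generation history determines structure
[Figure: the insertion orders 1, 3, 5, 7 and 5, 1, 3, 7 produce two different search trees.]
(b) Set-uniqueness: Set determines structure
[Figure: the key set 1, 3, 5, 7 always produces the same search tree.]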
51
Uniquely represented structures
(c) Size-uniqueness: Size determines structure
[Figure: the key sets 1, 3, 5, 7 and 2, 4, 5, 8 are stored in a common structure.]
Order-uniqueness: A fixed ordering of the nodes determines where the keys are to be stored.
52
Set- and order-unique structures
Lower bounds?
Assumptions: A dictionary of size n is represented by a graph of n nodes.
Node degree finite (fixed),
Fixed order of the nodes,
i-th node stores i-largest key.
Operations allowed to change a graph:
Creation | Removal of a node
Pointer change
Exchange of keys
Theorem: For each set- and order-unique representation of a dictionary with n keys, at
least one of the operations access, insertion, or deletion must require time $\Omega(n^{1/3})$.
53
Uniquely represented dictionaries
Problem: Find set-unique or size-unique representations of the ADT „dictionary“
Known solutions:
(1) set-unique, order-unique
Aragon/Seidel, FOCS 1989: Randomized Search Trees
Each key s ∈ X is stored as the pair (s, h(s)), where the priority h(s) is given by a universal hash function.
Update as for priority search trees!
Search, insert, delete can be carried out in O(log n) expected time.
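A minimal sketch of a search tree whose shape is determined by the key set alone. Each key gets a priority computed from the key itself, and the tree is kept heap-ordered on the priorities by rotations. The sketch uses SHA-256 as a stand-in for the universal hash function, which is an assumption of this illustration; all names are hypothetical.

```python
import hashlib


def priority(key):
    """Priority derived from the key alone (SHA-256 stand-in for h)."""
    digest = hashlib.sha256(repr(key).encode()).digest()
    return int.from_bytes(digest[:8], "big")


class Node:
    def __init__(self, key):
        self.key = key
        self.prio = priority(key)
        self.left = None
        self.right = None


def insert(root, key):
    """Insert as in a search tree, then rotate to restore heap order."""
    if root is None:
        return Node(key)
    if key == root.key:
        return root
    if key < root.key:
        root.left = insert(root.left, key)
        if root.left.prio > root.prio:       # rotate right
            child = root.left
            root.left = child.right
            child.right = root
            return child
    else:
        root.right = insert(root.right, key)
        if root.right.prio > root.prio:      # rotate left
            child = root.right
            root.right = child.left
            child.left = root
            return child
    return root


def build(order):
    root = None
    for key in order:
        root = insert(root, key)
    return root


def shape(root):
    """Nested-tuple view of the tree, used to compare shapes."""
    if root is None:
        return None
    return (root.key, shape(root.left), shape(root.right))


def inorder(root):
    return [] if root is None else inorder(root.left) + [root.key] + inorder(root.right)
```

As long as the priorities are distinct, the key with the highest priority is the root and the same holds recursively in both subtrees, so every insertion order produces the same tree.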
54
The Jelly Fish
(2) L. Snyder, 1976, set-unique, order-unique
Upper bound: Jelly Fish; search, insert, and delete in time $O(\sqrt{n})$.
body: $\sqrt{n}$ nodes
$\sqrt{n}$ tentacles of length $\sqrt{n}$ each
[Figure: a jelly fish structure with example keys.]
55
Lower bound for tree-based structures
set-unique, order-unique
Lower bound: For “tree-based” structures the following holds:
Update-time · Search-time = Ω(n)
Number of nodes: n ≤ h · L + 1, hence L ≥ (n − 1)/h
Consider the updates Delete $x_1$ and Insert $x_{n+1} > x_n$: at least L − 1 keys must have moved from leaves to internal nodes. Therefore, an update requires time Ω(L).
[Figure: a tree of height h with L leaves storing the keys $x_1, \dots, x_n$.]
56
Cons-structures
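(3) Sundar/Tarjan, STOC 1990, upper bound: (nearly) full binary search trees
Only operation permitted for updates: Cons, which joins a left subtree L, a key x, and a right subtree R into one tree with root x
Search time: O(log n)
Insertion and deletion are possible in time $O(\sqrt{n})$
[Figure: Cons applied to L, x, and R.]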
57
Jump-lists
(Half-dynamic) 2-level jump-list
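2-level jump-list of size n, where $i^2 \le n < (i+1)^2$
Example: $3^2 \le 11 < 4^2$
Search: $O(i) = O(\sqrt{n})$ time
Insertion, deletion: $O(\sqrt{n})$ time
[Figure: a 2-level jump-list over the keys 2 3 5 7 8 10 11 12 14 17 19 with a tail node; level-1 pointers connect the positions 0, i, 2i, …, $\lfloor (n-1)/i \rfloor \cdot i$.]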
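A minimal sketch of searching in a 2-level jump-list. It assumes that the nodes are simulated by positions in a sorted Python list and that the level-1 pointers connect every i-th position with i = ⌊√n⌋; the function name search and the step counter are choices made for this illustration.

```python
from math import isqrt


def search(keys, x):
    """Return (found, steps) where steps counts the pointers followed."""
    n = len(keys)
    if n == 0:
        return False, 0
    i = isqrt(n)                 # i*i <= n < (i+1)*(i+1)
    pos = 0
    steps = 0
    # Level 1: jump i positions at a time while the target is not overshot.
    while pos + i < n and keys[pos + i] <= x:
        pos += i
        steps += 1
    # Level 0: walk single positions inside the block of at most i keys.
    end = min(pos + i, n)
    while pos < end and keys[pos] < x:
        pos += 1
        steps += 1
    return pos < end and keys[pos] == x, steps


example = [2, 3, 5, 7, 8, 10, 11, 12, 14, 17, 19]
```

For the example list with n = 11 and i = 3, no search follows more than 2i + 1 pointers.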
58
Jump-lists: Dynamization
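2-level jump-list of size n, where $i^2 \le n < (i+1)^2$
Example: $3^2 \le 11 < 4^2$
Search: $O(i) = O(\sqrt{n})$ time
Insert, delete: $O(\sqrt{n})$ time
Can be made fully dynamic:
[Figure: the size n moving between the thresholds $(i-1)^2$, $i^2$, $(i+1)^2$, $(i+2)^2$; below it the 2-level jump-list over the keys 2 3 5 7 8 10 11 12 14 17 19 with a tail node and level-1 pointers at the positions 0, i, 2i, …, $\lfloor (n-1)/i \rfloor \cdot i$.]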
59
3-level jump-lists
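$i^3 \le n < (i+1)^3$
Example: $n = 30$, $3^3 \le n < 4^3$, $i = 3$
Search(x): locate x by following
• level-2 pointers, identifying $i^2$ keys among which x may occur,
• level-1 pointers, identifying i keys among which x may occur,
• level-0 pointers, identifying x.
Time: $O(i) = O(n^{1/3})$
[Figure: a 3-level jump-list with its level-2 pointers; node positions 0, i, 2i, …, $i^2$, $i^2 + i$, …, $2 \cdot i^2$.]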
60
3-level jump-lists
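$i^3 \le n < (i+1)^3$
Example: $n = 30$, $3^3 \le n < 4^3$, $i = 3$
An update requires
• changing 2 pointers on level 0,
• changing i pointers on level 1,
• changing all i pointers on level 2.
Update time: $O(i) = O(n^{1/3})$
[Figure: a 3-level jump-list with its level-2 pointers; node positions 0, i, 2i, …, $i^2$, $i^2 + i$, …, $2 \cdot i^2$.]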
61
c-level jump-lists
Let $i^c \le n < (i+1)^c$.
Lower levels:
level 0: all pointers of length 1
...
level j: all pointers of length $i^{j-1}$
...
level c/2: ...
Upper levels:
level j: connect in a list all nodes
$1,\; 1 \cdot i^{j-1} + 1,\; 2 \cdot i^{j-1} + 1,\; 3 \cdot i^{j-1} + 1,\; \dots$
level c: ...
62
c-level jump-lists
Theorem:
For each c ≥ 3, the c-level jump-list is a size- and order-unique representation
of dictionaries with the following characteristics:
Space requirement: $O(c \cdot n)$
Access time: $O(c \cdot n^{1/c})$
Update time: $O(\sqrt{n})$ if c is even, $O(n^{(c-1)/(2c)})$ if c is odd
63
Shared-search-trees
Reduction of search time
• 1 top-level tree with $\sqrt{n}$ leaves
• All low-level trees, one for each sequence of $\sqrt{n}$ consecutive keys
The top-level tree directs the search to the root of the currently active low-level tree.
Semi-dynamic structure: $2^{2k} \le n < 2^{2(k+1)}$, $n = s \cdot (2^k - 1) + r$, $r < 2^k - 1$
Low-level tree size + 1 = top-level tree size = $O(\sqrt{n})$
64
Shared-search-trees
Pointers at:
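Level 0: $(p - 2^0) \leftarrow p \rightarrow (p + 2^0)$
Level 1: $(p - 2^1) \leftarrow p \rightarrow (p + 2^1)$
…
Level k−2: $(p - 2^{k-2}) \leftarrow p \rightarrow (p + 2^{k-2})$
2(k−1) pointers per node p, k = O(log n)
Search time: O(log n)
Space: O(n log n)
[Figure: nodes 0 to 17 with their level-0 and level-1 pointers.]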
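A minimal sketch of how the pointers at distances 2^j turn every node into the root of an implicit low-level search tree. It assumes the nodes are positions in a sorted Python list and that the starting root is already known; the top-level tree that selects this root is omitted. The function name low_level_search is hypothetical.

```python
def low_level_search(keys, root, k, x):
    """Search x in the implicit tree of 2**k - 1 nodes rooted at position root.

    From level k-2 down to level 0 the search follows the pointer to
    position p - 2**level or p + 2**level, as listed on the slide.
    The caller must choose root so that the tree fits into the list:
    2**(k-1) - 1 <= root < len(keys) - (2**(k-1) - 1).
    """
    p = root
    for level in range(k - 2, -1, -1):
        if keys[p] == x:
            return p
        p = p - 2 ** level if x < keys[p] else p + 2 ** level
    return p if keys[p] == x else None


keys = list(range(0, 36, 2))     # 18 keys stored at positions 0 to 17
```

With k = 3, the node at position 3 is the root of a tree covering positions 0 to 6, and the node at position 10 covers positions 7 to 13; a search follows at most k − 1 pointers.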
65
Insertion:
• Determine the insertion position;
• Change all pointers jumping over the insertion position;
• Add 2 new pointers per level;
• Completely rebuild the top-level tree.
$2^{2k} \le n < 2^{2(k+1)}$
66
Number of pointer changes:
level 0: $2 \cdot 2^0 + 2$
level 1: $2 \cdot 2^1 + 2$
…
level k−2: $2 \cdot 2^{k-2} + 2$
Total: $2 \cdot (2^{k-1} - 1) + 2(k-1) = O(\sqrt{n})$
$2^{2k} \le n < 2^{2(k+1)}$
67
Shared search trees: Summary
Theorem: Shared search trees are a size- and order-unique representation of
dictionaries with the following characteristics:
Space requirement: O(n log n)
Search time: O(log n)
Update time: $O(\sqrt{n})$
Open problem:
Is there a size- and order-unique representation of dictionaries by graphs with bounded node
degree, search time O(log n), and update time o(n) (e.g. $O(\sqrt{n})$)?