CSC 213 – Large Scale Programming
Dictionaries in the Real World
  Often need a large database spread across many machines
    Split search terms across machines
    Updating & searching work split between machines
    Database way too large for any single machine
  If you think about it, this is incredibly common
    Where?
Splitting Keys From Values
  In the real world, we often have many indices
    Simple units that record where we can find values
    Values could be searched for in multiple ways
Index & Data Files
  Split information into two (or more) files
    Data file uses fixed-size records to store data
    Index files contain search terms & data locations
  Fixed-size records usually used in data file
    Each record will use exactly that much space
    Extra space wasted if the value is smaller
    But limits data size, cannot get more space
    Makes it far easier to reuse space & rebuild index
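A minimal sketch of what a fixed-size record could look like in Java. The class name `StockRecord` and the field widths (100 bytes of name, a 4-byte price, 8 bytes of ticker) are assumptions made for illustration; they were picked only so that each record fills exactly 112 bytes. A value shorter than its field wastes the padding, and a value longer than its field is rejected.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical fixed-size record: every record occupies exactly SIZE bytes.
class StockRecord {
    static final int NAME_BYTES = 100;   // assumed field width
    static final int TICKER_BYTES = 8;   // assumed field width
    static final int SIZE = NAME_BYTES + Integer.BYTES + TICKER_BYTES; // 112

    final String name;
    final int price;
    final String ticker;

    StockRecord(String name, int price, String ticker) {
        this.name = name;
        this.price = price;
        this.ticker = ticker;
    }

    // Zero-pad text to exactly 'width' bytes; refuse text that does not fit.
    private static byte[] fixed(String text, int width) {
        byte[] raw = text.getBytes(StandardCharsets.US_ASCII);
        if (raw.length > width) {
            throw new IllegalArgumentException("value too large for field: " + text);
        }
        return Arrays.copyOf(raw, width);
    }

    // Strip the zero padding added by fixed().
    private static String text(byte[] padded) {
        return new String(padded, StandardCharsets.US_ASCII).replace("\0", "");
    }

    byte[] toBytes() {
        ByteBuffer buf = ByteBuffer.allocate(SIZE);
        buf.put(fixed(name, NAME_BYTES)).putInt(price).put(fixed(ticker, TICKER_BYTES));
        return buf.array();
    }

    static StockRecord fromBytes(byte[] bytes) {
        ByteBuffer buf = ByteBuffer.wrap(bytes);
        byte[] n = new byte[NAME_BYTES];
        buf.get(n);
        int price = buf.getInt();
        byte[] t = new byte[TICKER_BYTES];
        buf.get(t);
        return new StockRecord(text(n), price, text(t));
    }
}
```

Because every record has the same size, record number k always starts at byte k * SIZE, which is what makes reusing a slot or rescanning the file to rebuild an index straightforward.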
Index File Format
  No standard format – depends on type of data
    Often variable sized, but this is not a specific requirement
    Each entry in index file begins with exact search term
    Followed by position containing matching data
    As a result, often find indexes smushed together
  Can read indexes at start of program execution
    Reasonably assumes index file smaller than data file
    Changes written immediately, however
    When program starts, do NOT read data file
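One possible index file layout, sketched in Java: each entry is the search term followed by the position of its record, with entries written back to back. The class `IndexFile` and the choice of `writeUTF`/`writeLong` for the encoding are assumptions for illustration; the slides note there is no standard format. The whole index is read into memory at startup, and the data file is not touched.

```java
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.EOFException;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical index file: variable-sized entries of (term, position).
class IndexFile {
    // Write every entry back to back: the term, then the record position.
    static void save(File file, Map<String, Long> index) throws IOException {
        try (DataOutputStream out = new DataOutputStream(
                new BufferedOutputStream(new FileOutputStream(file)))) {
            for (Map.Entry<String, Long> entry : index.entrySet()) {
                out.writeUTF(entry.getKey());
                out.writeLong(entry.getValue());
            }
        }
    }

    // Read the whole index into memory at program start.
    static TreeMap<String, Long> load(File file) throws IOException {
        TreeMap<String, Long> index = new TreeMap<>();
        try (DataInputStream in = new DataInputStream(
                new BufferedInputStream(new FileInputStream(file)))) {
            while (true) {
                String term;
                try {
                    term = in.readUTF();
                } catch (EOFException endOfIndex) {
                    break; // no more entries
                }
                index.put(term, in.readLong());
            }
        }
        return index;
    }
}
```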
Indexed Files
  Enables splitting search terms across computers
    Alphabetical split searches faster on many servers
  [Figure: one server per key range: A-C, D-E, F-H, I-P, Q-R, S-T, U-X, Y-Z]
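A small sketch of how a search term might be routed to the server that holds its part of the alphabet. The class `KeyRouter` and the server names used with it are hypothetical; only the letter ranges come from the slide. Each range is registered by its first letter, and `TreeMap.floorEntry` finds the range a term falls into.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical router: maps the first letter of a term to a server name.
class KeyRouter {
    private final TreeMap<Character, String> rangeStartToServer = new TreeMap<>();

    // Register the server responsible for all terms from 'firstLetter'
    // up to (but not including) the next registered letter.
    void addRange(char firstLetter, String server) {
        rangeStartToServer.put(Character.toUpperCase(firstLetter), server);
    }

    String serverFor(String term) {
        if (term.isEmpty()) {
            throw new IllegalArgumentException("empty search term");
        }
        char first = Character.toUpperCase(term.charAt(0));
        // Greatest registered range start that is <= the term's first letter.
        Map.Entry<Character, String> range = rangeStartToServer.floorEntry(first);
        if (range == null) {
            throw new IllegalArgumentException("no server covers: " + term);
        }
        return range.getValue();
    }
}
```

With ranges starting at A, D, F, I, Q, S, U and Y, a search for "IBM" would go to the server registered for I and a search for "Ford" to the server registered for F.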
Indexed Files
  Enables splitting search terms across computers
    Create indexes for different types of searching
  [Figure: separate indexes by song name and by song length]
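As an illustration of several indexes over the same data, the sketch below keeps one index by song name and one by song length, both pointing at record positions in a single data file. The class `SongIndexes`, its method names, and the use of seconds for length are assumptions for illustration. Several songs can share a length, so that index maps each length to a list of positions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

// Hypothetical pair of indexes over one data file of song records.
class SongIndexes {
    final TreeMap<String, Long> byName = new TreeMap<>();
    final TreeMap<Integer, List<Long>> byLength = new TreeMap<>();

    // Register one record under both search terms.
    void add(String name, int lengthSeconds, long position) {
        byName.put(name, position);
        byLength.computeIfAbsent(lengthSeconds, k -> new ArrayList<>()).add(position);
    }

    // Positions of all records whose length lies in [min, max], shortest first.
    List<Long> positionsWithLengthBetween(int min, int max) {
        List<Long> result = new ArrayList<>();
        for (List<Long> positions : byLength.subMap(min, true, max, true).values()) {
            result.addAll(positions);
        }
        return result;
    }
}
```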
How Does This Work?
  Using index files simplified using positions
    Look in index structure to find position of data in file
    With this position can then seek to specific record
    Create instance & initialize by reading data from file
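The three lookup steps above might look like this with `java.io.RandomAccessFile`. The names `CompanyRecord` and `IndexedLookup` and the record layout (100-byte name, 4-byte price, 8-byte ticker) are assumptions for illustration, not something the slides specify.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;
import java.util.Map;

// Hypothetical in-memory instance built from one record in the data file.
class CompanyRecord {
    final String name;
    final int price;
    final String ticker;

    CompanyRecord(String name, int price, String ticker) {
        this.name = name;
        this.price = price;
        this.ticker = ticker;
    }
}

class IndexedLookup {
    static final int NAME_BYTES = 100;  // assumed field width
    static final int TICKER_BYTES = 8;  // assumed field width

    // Returns the record for 'term', or null if the term is not in the index.
    static CompanyRecord find(Map<String, Long> index, RandomAccessFile data, String term)
            throws IOException {
        Long position = index.get(term);       // 1. look in the index structure
        if (position == null) {
            return null;
        }
        data.seek(position);                    // 2. seek to the specific record
        String name = readText(data, NAME_BYTES);
        int price = data.readInt();
        String ticker = readText(data, TICKER_BYTES);
        return new CompanyRecord(name, price, ticker); // 3. create & initialize instance
    }

    // Read a zero-padded, fixed-width text field.
    private static String readText(RandomAccessFile data, int width) throws IOException {
        byte[] raw = new byte[width];
        data.readFully(raw);
        return new String(raw, StandardCharsets.US_ASCII).replace("\0", "");
    }
}
```

Only the one record that the index points to is read; the rest of the data file is never scanned.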
Starting with Indexed Files
  [Figure: data file with its index files]
  Data file records: IBM 106 IBM; AT & T 23 T; Ford 2 F
  Index entries: F 224; IBM 0; T 112
  Index entries: American Telephone & Telegraph 0; International Business Machines 112; Ford Motorcars, Inc. 224
How Does This Work?
  Adding new records takes only a few steps
    Add space for record with setLength on data file
    Update index structure(s) to include new record
    Records in data file updated at each change
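A hedged sketch of the add steps using `RandomAccessFile.setLength`. The class `RecordAppender` and the 112-byte record size are assumptions for illustration. The new record is placed at the current end of the data file, so in a file that already holds three records it lands at position 336.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.util.Map;

// Hypothetical helper for adding a record to the data file and the index.
class RecordAppender {
    static final int RECORD_SIZE = 112; // assumed fixed record size

    // Adds the record and returns the position where it was stored.
    static long add(RandomAccessFile data, Map<String, Long> index, String term, byte[] record)
            throws IOException {
        if (record.length != RECORD_SIZE) {
            throw new IllegalArgumentException("record must be " + RECORD_SIZE + " bytes");
        }
        long position = data.length();           // new record goes at the current end
        data.setLength(position + RECORD_SIZE);  // 1. add space for the record
        index.put(term, position);               // 2. update the in-memory index
        data.seek(position);
        data.write(record);                      // 3. data file is updated right away
        return position;
    }
}
```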
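Adding New Data To The Files
  [Figure: indexes updated for the new record, data record not yet filled in]
  Data file records: IBM 106 IBM; AT & T 23 T; Ford 2 F
  Index entries: C 336; F 224; IBM 0; T 112
  Index entries: American Telephone & Telegraph 0; Citibank 336; International Business Machines 112; Ford Motorcars, Inc. 224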
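Adding New Data To The Files
  [Figure: new record written to the data file]
  Data file records: IBM 106 IBM; AT & T 23 T; Ford 2 F; Citibank -2 C
  Index entries: C 336; F 224; IBM 0; T 112
  Index entries: American Telephone & Telegraph 0; Citibank 336; International Business Machines 112; Ford Motorcars, Inc. 224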
How Does This Work?
  Removing records even easier
    To prevent using record, remove items from indexes
    Do NOT update index file(s) until program completes
    Use impossible magic numbers for record in data file
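A minimal sketch of removal under stated assumptions: the class `RecordRemover`, the 112-byte layout with the ticker stored in the last 8 bytes, and the use of an all-zero ticker as the "impossible" marker are illustrative choices, not the slides' actual magic numbers. The term is dropped from the in-memory index only; the index file would be rewritten when the program completes.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.util.Map;

// Hypothetical helper for removing a record without shrinking the data file.
class RecordRemover {
    static final int TICKER_OFFSET = 104; // assumed: ticker is the last field
    static final int TICKER_BYTES = 8;    // assumed field width

    // Returns true if the term was indexed and its record is now marked removed.
    static boolean remove(RandomAccessFile data, Map<String, Long> index, String term)
            throws IOException {
        Long position = index.remove(term);   // in-memory index only
        if (position == null) {
            return false;
        }
        data.seek(position + TICKER_OFFSET);
        data.write(new byte[TICKER_BYTES]);   // all-zero ticker marks the slot as unused
        return true;
    }

    // Lets an index rebuild or a later add recognize a removed slot.
    static boolean isRemoved(RandomAccessFile data, long position) throws IOException {
        data.seek(position + TICKER_OFFSET);
        byte[] ticker = new byte[TICKER_BYTES];
        data.readFully(ticker);
        for (byte b : ticker) {
            if (b != 0) {
                return false;
            }
        }
        return true;
    }
}
```

The data file keeps its length, so the positions stored for every other record remain valid and the marked slot can be reused later.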
Removing Data As We Go
  [Figure: files before the removal]
  Data file records: IBM 106 IBM; AT & T 23 T; Ford 2 F; Citibank -2 C
  Index entries: C 336; F 224; IBM 0; T 112
  Index entries: American Telephone & Telegraph 0; Citibank 336; International Business Machines 112; Ford Motorcars, Inc. 224
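Removing Data As We Go
  [Figure: Ford removed from the indexes and its data record overwritten]
  Data file records: IBM 106 IBM; AT & T 23 T; Ford 0 Ø; Citibank -2 C
  Index entries: C 336; IBM 0; T 112
  Index entries: American Telephone & Telegraph 0; Citibank 336; International Business Machines 112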