LECTURE 11: INDEXED FILES

CSC 213 – Large Scale Programming

Dictionaries in Real World

Often need large database on many machines
Split search terms across machines
Updating & searching work split between machines
Database way too large for any single machine
If you think about it, this is incredibly common
Where?

Split Dictionaries

Splitting Keys From Values

In real world, we often have many indices
Simple units measure where we can find values
Values could be searched for in multiple ways

Index & Data Files

Split information into two (or more) files
Data file uses fixed-size records to store data
Index files contain search terms & data locations
Fixed-size records usually used in data file
Each record will use exactly that much space
Extra space wasted if the value is smaller
But limits data size, cannot get more space
Makes it far easier to reuse space & rebuild index
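A minimal Java sketch of the fixed-size record idea. The class name `FixedRecord`, the 112-byte record size (picked only because the example positions in this lecture are 0, 112, 224), and the zero-padded UTF-8 layout are assumptions for illustration, not the course's actual format.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical helper; names and layout are illustrative assumptions.
class FixedRecord {
    // Assumed record size, matching the spacing of positions 0, 112, 224.
    static final int RECORD_SIZE = 112;

    // Encode a value into exactly RECORD_SIZE bytes, zero-padded on the right.
    static byte[] encode(String value) {
        byte[] raw = value.getBytes(StandardCharsets.UTF_8);
        if (raw.length > RECORD_SIZE) {
            // Fixed-size records limit data size: there is no way to get more space.
            throw new IllegalArgumentException("value does not fit in one record");
        }
        // Smaller values waste the remaining bytes as padding.
        return Arrays.copyOf(raw, RECORD_SIZE);
    }

    // Decode a record by reading up to the first padding byte.
    static String decode(byte[] record) {
        int len = 0;
        while (len < record.length && record[len] != 0) {
            len++;
        }
        return new String(record, 0, len, StandardCharsets.UTF_8);
    }

    // Because every record has the same size, record n always starts here.
    static long positionOf(int recordNumber) {
        return (long) recordNumber * RECORD_SIZE;
    }
}
```

Since every record occupies the same number of bytes, a freed slot can hold any later record, and the index can be rebuilt by walking the data file in `RECORD_SIZE` steps.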

Index File Format

No standard format – depends on type of data
Often variable sized, but this not specific requirement
Each entry in index file begins with exact search term
Followed by position containing matching data
As a result, often find indexes smushed together
Can read indexes at start of program execution
Reasonably assumes index file smaller than data file
Changes written immediately, however
When program starts, do NOT read data file
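One way the index could be read at startup, sketched in Java. Since there is no standard index format, the layout here is an assumption: a text file with one entry per line, the search term first and the data position as the last whitespace-separated token. `IndexLoader` and its methods are hypothetical names.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical loader for an ASSUMED text index format: "<search term> <position>".
class IndexLoader {
    // Build the in-memory index from the lines of an index file.
    static Map<String, Long> parse(Iterable<String> lines) {
        Map<String, Long> index = new TreeMap<>();
        for (String line : lines) {
            String trimmed = line.trim();
            if (trimmed.isEmpty()) {
                continue; // skip blank lines
            }
            // The position is the last token, so terms may contain spaces.
            int split = trimmed.lastIndexOf(' ');
            if (split < 0) {
                throw new IllegalArgumentException("entry has no position: " + trimmed);
            }
            String term = trimmed.substring(0, split).trim();
            long position = Long.parseLong(trimmed.substring(split + 1));
            index.put(term, position);
        }
        return index;
    }

    // Read only the (small) index file at startup; the data file is not opened here.
    static Map<String, Long> load(Path indexFile) throws IOException {
        return parse(Files.readAllLines(indexFile));
    }
}
```

Only the index is held in memory; the data file stays on disk until a specific record is needed.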

Never Read Data File

Indexed Files

Enables splitting search terms across computers
Alphabetical split searches faster on many servers
[Figure: servers labeled A-C, D-E, F-H, I-P, Q-R, S-T, U-X, Y-Z]
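The alphabetical split can be pictured as a small routing table. The Java sketch below uses the letter ranges from the slide; the class name `ShardRouter` and the server names (`server0` through `server7`) are made up for illustration.

```java
import java.util.TreeMap;

// Hypothetical router: maps a search term to the server owning its first letter.
class ShardRouter {
    // Key = first letter of each range; value = (made-up) server name.
    private final TreeMap<Character, String> rangeStartToServer = new TreeMap<>();

    ShardRouter() {
        // Range starts for A-C, D-E, F-H, I-P, Q-R, S-T, U-X, Y-Z.
        char[] starts = {'A', 'D', 'F', 'I', 'Q', 'S', 'U', 'Y'};
        for (int i = 0; i < starts.length; i++) {
            rangeStartToServer.put(starts[i], "server" + i);
        }
    }

    // Pick the range whose start is the greatest letter <= the term's first letter.
    String serverFor(String term) {
        if (term.isEmpty()) {
            throw new IllegalArgumentException("empty search term");
        }
        char first = Character.toUpperCase(term.charAt(0));
        if (first < 'A' || first > 'Z') {
            throw new IllegalArgumentException("term must start with a letter A-Z");
        }
        return rangeStartToServer.floorEntry(first).getValue();
    }
}
```

Each server then needs only the index entries for its own range, so both the index and the search work are divided.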

Indexed Files

Enables splitting search terms across computers
Create indexes for different types of searching
[Figure: separate indexes by song name and by song length]

How Does This Work?

Using index files simplified using positions
Look in index structure to find position of data in file
With this position can then seek to specific record
Create instance & initialize by reading data from file
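The lookup steps above, sketched in Java with `java.io.RandomAccessFile`. The class `IndexedLookup`, the 112-byte record size, and the record layout (a zero-padded UTF-8 string) are assumptions; a real record would likely have several fields and be turned into a proper object.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;
import java.util.Map;

// Hypothetical lookup helper; record layout is an assumption.
class IndexedLookup {
    static final int RECORD_SIZE = 112; // assumed fixed record size

    // Returns the record's value, or null when the term is not indexed.
    static String find(Map<String, Long> index, RandomAccessFile data, String term)
            throws IOException {
        // 1. Look in the index structure to find the position of the data.
        Long position = index.get(term);
        if (position == null) {
            return null;
        }
        // 2. Seek directly to that record; nothing else in the data file is read.
        data.seek(position);
        // 3. Read exactly one record and initialize the result from it.
        byte[] record = new byte[RECORD_SIZE];
        data.readFully(record);
        int len = 0;
        while (len < record.length && record[len] != 0) {
            len++; // stop at the zero padding
        }
        return new String(record, 0, len, StandardCharsets.UTF_8);
    }
}
```

For instance, with an index holding `T` at position 112, `find(index, data, "T")` reads only bytes 112 through 223 of the data file.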

Starting with Indexed Files

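[Figure: a data file of fixed-size records (IBM 106 IBM; AT & T 23 T; Ford 2 F), one index by symbol (F 224, IBM 0, T 112), and one index by company name (American Telephone & Telegraph; International Business Machines; Ford Motorcars, Inc.) holding the positions 0, 112, and 224]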

How Does This Work?

Adding new records takes only a few steps
Add space for record with setLength on data file
Update index structure(s) to include new record
Records in data file updated at each change
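A sketch of those steps in Java. `RandomAccessFile.setLength` is the real call named on the slide; the class `IndexedAppend`, the 112-byte record size, and the zero-padded UTF-8 record layout are assumptions.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Map;

// Hypothetical helper for appending a record; layout is an assumption.
class IndexedAppend {
    static final int RECORD_SIZE = 112; // assumed fixed record size

    // Adds a record at the end of the data file and returns its position.
    static long add(Map<String, Long> index, RandomAccessFile data,
                    String term, String value) throws IOException {
        byte[] raw = value.getBytes(StandardCharsets.UTF_8);
        if (raw.length > RECORD_SIZE) {
            throw new IllegalArgumentException("value does not fit in one record");
        }
        // 1. Add space for the record with setLength on the data file.
        long position = data.length();
        data.setLength(position + RECORD_SIZE);
        // 2. Write the record right away. The whole padded record is written
        //    because the contents of the newly extended region are not defined.
        data.seek(position);
        data.write(Arrays.copyOf(raw, RECORD_SIZE));
        // 3. Update the in-memory index structure to include the new record.
        index.put(term, position);
        return position;
    }
}
```

With three 112-byte records already in the file, the next call returns 336, which matches the position shown for Citibank in the lecture's example.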

Adding New Data To The Files

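[Figure: adding a record, first step. The index entries C 336 and Citibank 336 appear alongside F 224, IBM 0, and T 112; position 336 is the new space at the end of the data file]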

Adding New Data To The Files

[Figure: adding a record, second step. The new record (Citibank -2 C) is written into the data file at position 336; indexes hold C 336, F 224, IBM 0, T 112]

How Does This Work?

Removing records even easier
To prevent using record, remove items from indexes
Do NOT update index file(s) until program completes
Use impossible magic numbers for record in data file
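A Java sketch of removal under assumed details. The slide does not say which magic numbers are used, so this example assumes records are zero-padded UTF-8 text and marks a dead record by overwriting its first byte with 0xFF, a byte that never occurs in valid UTF-8. `IndexedRemove` and `DELETED` are hypothetical names.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.util.Map;

// Hypothetical helper for removing a record; the marker choice is an assumption.
class IndexedRemove {
    // 0xFF never appears in valid UTF-8, so no live record can start with it.
    static final int DELETED = 0xFF;

    // Removes the term; returns false when the term was not indexed.
    static boolean remove(Map<String, Long> index, RandomAccessFile data, String term)
            throws IOException {
        // 1. Remove the item from the in-memory index so it can no longer be found.
        //    The index FILE is left alone until the program completes.
        Long position = index.remove(term);
        if (position == null) {
            return false;
        }
        // 2. Stamp the record in the data file with the impossible value.
        data.seek(position);
        data.write(DELETED);
        return true;
    }

    // True when the record at this position carries the deletion marker.
    static boolean isDeleted(RandomAccessFile data, long position) throws IOException {
        data.seek(position);
        return data.read() == DELETED;
    }
}
```

Marking the record in place means a later index rebuild can skip it, and its slot can be reused by a future record of the same fixed size.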

Removing Data As We Go

[Figure: files before the removal. Indexes hold C 336, F 224, IBM 0, T 112; the data file includes the Citibank -2 C record at position 336]

Removing Data As We Go

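[Figure: files after removing Ford. The entry F 224 is gone from the index (C 336, IBM 0, T 112 remain), and the Ford record in the data file is overwritten with magic values (Ford 0 Ø)]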

For Next Lecture

Weekly assignment still available online
Continues to be due Wednesday at 5PM
Ask me questions, if you have trouble on a problem

Reading Section 9.1 in textbook about Map ADT
How do we look up data?
What other ADTs are out there?
How could they relate to today's lecture?
