I/O-Algorithms. Lars Arge. January 31, 2005.
Lars Arge
I/O-algorithms
2
Random Access Machine Model
• Standard theoretical model of computation:
– Infinite memory
– Uniform access cost
[Figure label: RAM]
Hierarchical Memory
• Modern machines have complicated memory hierarchy
– Levels get larger and slower further away from CPU
– Levels have different associativity and replacement strategies
– Large access time amortized using block transfer between levels
• Bottleneck often transfers between largest memory levels in use
[Figure labels: L1, L2, RAM]
I/O-Bottleneck
• I/O is often the bottleneck when handling massive datasets
– Disk access is 10^6 times slower than main memory access
– Large transfer block size (typically 8-16 Kbytes)
• Important to obtain “locality of reference”
– Need to store and access data to take advantage of blocks
[Figure labels: track, magnetic surface, read/write arm, read/write head]
Massive Data
• Massive datasets are being collected everywhere
• Storage management software is billion-$ industry
Examples:
• Phone: AT&T 20TB phone call database, wireless tracking
• Consumer: WalMart 70TB database, buying patterns
• WEB: Web crawl of 200M pages and 2000M links, Akamai stores 7 billion clicks per day
• Geography: NASA satellites generate 1.2TB per day
Example: Grid Terrain Data
Appalachian Mountains (800km x 800km)
• 500MB at 100m resolution
• 5.5GB at 30m resolution
– NASA SRTM mission acquired 30m data for 80% of the Earth's land mass
• 50GB at 10m resolution (some of US available from USGS)
• 5TB at 1m resolution
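The sizes above are roughly what a square grid over the 800km x 800km region would need. A quick check, assuming 8 bytes per grid cell (an assumption inferred from the figures, not stated on the slide; `grid_bytes` is a hypothetical helper name):

```python
def grid_bytes(side_km: float, resolution_m: float, bytes_per_cell: int = 8) -> float:
    """Storage for a square grid covering side_km x side_km at the given cell size."""
    cells_per_side = side_km * 1000 / resolution_m
    return cells_per_side ** 2 * bytes_per_cell


# Print the estimated size for each resolution listed on the slide.
for res in (100, 30, 10, 1):
    print(f"{res}m resolution: {grid_bytes(800, res) / 1e9:.1f} GB")
```

Under that assumption the estimates come out at about 0.5 GB, 5.7 GB, 51 GB and 5.1 TB, close to the rounded figures on the slide.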
I/O-Model
• Parameters
N = # elements in problem instance
B = # elements that fit in a disk block
M = # elements that fit in main memory
K = # elements in output of searching problem
• We often assume that M > B^2
• I/O: Movement of block between memory and disk
[Figure labels: D, Block I/O, M, P]
List Ranking
• Trivial internal memory algorithm takes O(N) time
– and causes O(N) page faults in external memory
• O(N/B) is the number of I/Os we need to read N elements
– Difference between N and N/B is extremely important in practice
• Can we develop an O(N/B) algorithm?
– Answer is NO
[Figure: linked list with elements A through H, B = M/B = 2]
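The gap between N and N/B page faults can be seen in a small simulation. This is a sketch, not from the slides: it assumes an LRU paging policy with M/B block frames, and `count_ios` and the parameter values are made up for illustration. It shows the cost of the trivial traversal only, not the lower bound.

```python
import random
from collections import OrderedDict


def count_ios(order, B, M):
    """Count block loads when visiting disk positions in `order`,
    using an LRU cache of M // B blocks with B elements per block."""
    cache = OrderedDict()
    capacity = M // B
    ios = 0
    for pos in order:
        block = pos // B
        if block in cache:
            cache.move_to_end(block)       # hit: mark block as most recently used
        else:
            ios += 1                       # miss: one block transfer from disk
            cache[block] = True
            if len(cache) > capacity:
                cache.popitem(last=False)  # evict the least recently used block
    return ios


N, B, M = 1 << 16, 64, 1024                # assumed sizes, in elements
sequential = list(range(N))                # list nodes stored in traversal order
scattered = sequential[:]
random.Random(0).shuffle(scattered)        # list nodes stored in arbitrary order

print("sequential layout:", count_ios(sequential, B, M), "I/Os")
print("scattered layout: ", count_ios(scattered, B, M), "I/Os")
```

With the nodes laid out in traversal order the walk costs exactly N/B block loads. With an arbitrary layout almost every pointer hop lands in a block that is not in memory, so the count approaches N.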
Fundamental Bounds [AV88]
                 Internal      External
• Scanning:      N             N/B
• Sorting:       N log N       (N/B) log_{M/B}(N/B)
• Permuting:     N             min{N, (N/B) log_{M/B}(N/B)}
– List rank
• Searching:     log_2 N       log_B N
• Note:
– Permuting not linear
– Permuting and sorting bounds are equal in all practical cases
– B factor VERY important: N/B < (N/B) log_{M/B}(N/B) << N
– Cannot sort optimally with search tree
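To get a feel for the B factor, the external-memory expressions can be evaluated for concrete parameters. This is a sketch only: the bounds are asymptotic, so constant factors are ignored, and the function names and the values of N, M and B are assumptions chosen for illustration.

```python
import math


def scan_ios(N, B):
    """Scanning bound: N/B."""
    return math.ceil(N / B)


def sort_ios(N, M, B):
    """Sorting bound: (N/B) log_{M/B}(N/B)."""
    return (N / B) * math.log(N / B, M / B)


def permute_ios(N, M, B):
    """Permuting bound: min{N, (N/B) log_{M/B}(N/B)}."""
    return min(N, sort_ios(N, M, B))


def search_ios(N, B):
    """Searching bound: log_B N."""
    return math.log(N, B)


# Assumed example: 2^30 elements, 2^10 elements per block, 2^20 elements of memory.
N, M, B = 2**30, 2**20, 2**10
print("scan:   ", scan_ios(N, B))
print("sort:   ", round(sort_ios(N, M, B)))
print("permute:", round(permute_ios(N, M, B)))
print("search: ", round(search_ios(N, B), 2), "vs log_2 N =", round(math.log2(N)))
```

For these values log_{M/B}(N/B) is only 2, so sorting costs about twice a scan, while N itself is about 500 times larger than the sorting bound. That is the N/B < (N/B) log_{M/B}(N/B) << N relation in the note above.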
Sorting
• Merge sort:
– Create N/M memory sized sorted runs
– Merge runs together M/B at a time
– O(log_{M/B}(N/M)) phases, using O(N/B) I/Os each
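The two steps of external merge sort can be sketched as an in-memory simulation that counts block transfers. This is an illustrative sketch under stated assumptions, not the lecture's implementation: `external_merge_sort` is a hypothetical name, the "disk" is an ordinary Python list, and the merge fan-in is M/B - 1 rather than M/B because one block of memory is reserved as an output buffer (asymptotically the same).

```python
import heapq
import random


def external_merge_sort(data, M, B):
    """Sort `data`, counting block I/Os in the I/O model.
    Returns (sorted_list, io_count, merge_phases)."""
    def blocks(n):
        return -(-n // B)  # ceil(n / B)

    ios = 0
    # Run formation: read M elements, sort them in memory, write the run back.
    runs = []
    for i in range(0, len(data), M):
        run = sorted(data[i:i + M])
        ios += 2 * blocks(len(run))
        runs.append(run)

    # Merge phases: merge up to fan_in runs at a time until one run remains.
    fan_in = max(2, M // B - 1)
    phases = 0
    while len(runs) > 1:
        merged = []
        for i in range(0, len(runs), fan_in):
            group = runs[i:i + fan_in]
            out = list(heapq.merge(*group))
            ios += sum(blocks(len(r)) for r in group)  # read each input run
            ios += blocks(len(out))                    # write the merged run
            merged.append(out)
        runs = merged
        phases += 1
    return (runs[0] if runs else []), ios, phases


# Assumed example sizes, in elements.
N, M, B = 8192, 256, 16
data = [random.Random(1).randrange(10**6) for _ in range(N)]
result, ios, phases = external_merge_sort(data, M, B)
print("sorted:", result == sorted(data), "I/Os:", ios, "merge phases:", phases)
```

In this example run formation and each merge phase read and write all N/B = 512 blocks once, so the total is 2 (N/B) (1 + number of merge phases).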