External data structures
File organization and indexes on databases
Primary vs. Secondary storage
• Primary storage
  – Main memory (RAM)
  – Volatile
  – Small capacity
  – Fast
  – You read/write a single byte
• Secondary storage
  – Hard disk
  – Non-volatile
  – Large capacity
  – Slow
  – You read/write a record (several bytes)
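A minimal sketch of record-wise access, assuming fixed-length records. The layout (a 4-byte id plus a 20-byte name), the helper names and the sample data are illustrative assumptions, and an in-memory buffer stands in for the hard disk.

```python
import io
import struct

# Assumed record layout: 4-byte little-endian id + 20-byte name field.
RECORD = struct.Struct("<i20s")


def write_record(f, slot, record_id, name):
    # Jump to the start of the slot and write the whole record at once.
    f.seek(slot * RECORD.size)
    f.write(RECORD.pack(record_id, name.encode("utf-8")))


def read_record(f, slot):
    # Read one whole record (RECORD.size bytes), not a single byte.
    f.seek(slot * RECORD.size)
    record_id, raw_name = RECORD.unpack(f.read(RECORD.size))
    return record_id, raw_name.rstrip(b"\x00").decode("utf-8")


disk = io.BytesIO()  # stands in for a file on the hard disk
write_record(disk, 0, 14, "Anders")
write_record(disk, 1, 3, "Bente")
```

Because every record has the same size, record number n starts at byte offset n * RECORD.size, so it can be fetched in one read.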
Databases
• External data structures are used by Database Management Systems (DBMS) to store data on the hard disk
• Requirements
  – Add new data (fast)
  – Find the data (fast)
  – Update the data (fast)
File organizations
How to organize database records inside a file on the hard disk
• Heap (unordered) file
  – Records are placed on disk in no particular order
• Sequential (ordered) file
  – Records are ordered by the value of a specified field
• Hash file
  – Records are placed on disk according to a hash function
Heap files
• Records placed in the file in insertion order.
• Access methods
  – Insert: Fast
  – Find: Slow (linear search)
  – Delete: Mark record as deleted
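The three access methods can be sketched as follows. This is a toy in-memory model, not a real DBMS file; the class name HeapFile, its methods and the sample records are assumptions made for illustration.

```python
class HeapFile:
    """Toy in-memory stand-in for a heap (unordered) file."""

    def __init__(self):
        self.slots = []  # each slot is [record, deleted_flag]

    def insert(self, record):
        # Fast: append at the end, no ordering to maintain.
        self.slots.append([record, False])
        return len(self.slots) - 1  # slot number of the new record

    def find(self, field, value):
        # Slow: linear search over every slot.
        for record, deleted in self.slots:
            if not deleted and record[field] == value:
                return record
        return None

    def delete(self, field, value):
        # Mark the first matching record as deleted; the slot is not reclaimed.
        for slot in self.slots:
            if not slot[1] and slot[0][field] == value:
                slot[1] = True
                return True
        return False


heap = HeapFile()
heap.insert({"id": 14, "name": "Anders"})
heap.insert({"id": 3, "name": "Bente"})
heap.delete("id", 14)
```

After the delete, the file still holds two slots; the deleted record is only skipped by later searches.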
Ordered files
• Records in the file are sorted by 1 (or more) fields forming the key of the data set
• We can use binary search on the field the file is ordered by
  – If the file is ordered by id
    • select ... where id = 14
  – No binary search on non-ordering fields
    • select ... where name = 'Anders'
• Insert / delete: Slow
  – We have to keep the sequence.
  – Solution: Overflow file with new elements
    • Finding elements is a little slower
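A minimal sketch of an ordered file with an overflow area, assuming the file is ordered by id. The class OrderedFile and the sample records are hypothetical; Python lists stand in for the disk file.

```python
import bisect


class OrderedFile:
    """Toy ordered file: main area sorted by id, plus an unsorted overflow area."""

    def __init__(self, records):
        self.main = sorted(records, key=lambda r: r["id"])
        self.keys = [r["id"] for r in self.main]  # ordering-field values
        self.overflow = []

    def insert(self, record):
        # Avoid shifting the sorted main area: new records go to overflow.
        self.overflow.append(record)

    def find_by_id(self, record_id):
        # Binary search in the sorted main area.
        i = bisect.bisect_left(self.keys, record_id)
        if i < len(self.keys) and self.keys[i] == record_id:
            return self.main[i]
        # Then a linear scan of the overflow area (the "little slower" part).
        for record in self.overflow:
            if record["id"] == record_id:
                return record
        return None

    def find_by_name(self, name):
        # name is not the ordering field: only a linear scan is possible.
        for record in self.main + self.overflow:
            if record["name"] == name:
                return record
        return None


f = OrderedFile([{"id": 14, "name": "Anders"}, {"id": 3, "name": "Bente"}])
f.insert({"id": 9, "name": "Carl"})
```

In a real system the overflow area would be merged back into the main file from time to time; that step is left out here.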
Hash files
• Hash function calculates the address of the record
• Hash field
  – Usually the key
• Collision
  – Problem
    • Different records hash to the same address
  – Solution
    • Buckets can hold multiple records
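A sketch of a hash file where each address is a bucket that can hold several records. The modulo hash function, the bucket count and the sample records are assumptions chosen to make a collision easy to see.

```python
class HashFile:
    """Toy hash file: the hash field is id, each address is a bucket."""

    def __init__(self, num_buckets=4):
        self.buckets = [[] for _ in range(num_buckets)]

    def _address(self, key):
        # Assumed hash function: key modulo the number of buckets.
        return key % len(self.buckets)

    def insert(self, record):
        # Colliding records simply share the same bucket.
        self.buckets[self._address(record["id"])].append(record)

    def find(self, key):
        # Go straight to one bucket, then scan only that bucket.
        for record in self.buckets[self._address(key)]:
            if record["id"] == key:
                return record
        return None


hf = HashFile(num_buckets=4)
hf.insert({"id": 14, "name": "Anders"})
hf.insert({"id": 6, "name": "Bente"})  # 14 % 4 == 6 % 4 == 2: a collision
```

Both records end up in bucket 2, and a search for either id only has to look inside that one bucket.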
Indexes
• Index in a book
  – Auxiliary structure
    • Not really part of the book
  – Used when searching for an item in the book
    • To avoid linear search
• Index in a database
  – Auxiliary structure
    • Not really part of the database tables
  – Used when searching for a record in the database
    • To avoid linear search
  – Data file
    • Contains records (the data)
  – Index file
    • Contains the index, ordered according to some field
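The split between data file and index file can be sketched like this. List positions stand in for disk addresses, and the names data_file, index_file and find, as well as the sample records, are illustrative assumptions.

```python
import bisect

# Data file: records in insertion order (positions stand in for disk addresses).
data_file = [
    {"id": 14, "name": "Anders"},
    {"id": 3, "name": "Bente"},
    {"id": 9, "name": "Carl"},
]

# Index file: (field value, position in the data file), ordered by field value.
index_file = sorted((rec["id"], pos) for pos, rec in enumerate(data_file))
index_keys = [key for key, _ in index_file]


def find(record_id):
    # Binary search in the index, then one direct access into the data file.
    i = bisect.bisect_left(index_keys, record_id)
    if i < len(index_keys) and index_keys[i] == record_id:
        return data_file[index_file[i][1]]
    return None
```

The data file itself stays unordered; only the much smaller index is kept sorted.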
Types of indexes
• Primary index
  – Ordering in the data file using a key field
• Clustering index
  – Ordering in the data file using a non-key field
    • Called the clustering attribute
  – At most 1 primary or clustering index per file
• Secondary index
  – Auxiliary data structure
  – Fast finding on attributes other than the primary/clustering attribute
Pros and cons of secondary indexes
• Benefits
  – Fast search (select)
• Liabilities
  – Slow updates (insert, update, delete)
    • The data + the index must be updated
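The trade-off can be illustrated with a toy table that keeps a secondary index on the non-key field name. The class IndexedTable, its writes counter and the sample records are hypothetical; counting two writes per update is a simplification that only shows both structures being touched.

```python
from collections import defaultdict


class IndexedTable:
    """Toy table with a secondary index on the non-key field 'name'."""

    def __init__(self):
        self.rows = {}                   # the data: id -> record
        self.by_name = defaultdict(set)  # secondary index: name -> ids
        self.writes = 0                  # simplified count of write operations

    def insert(self, record):
        self.rows[record["id"]] = record                # write 1: the data
        self.by_name[record["name"]].add(record["id"])  # write 2: the index
        self.writes += 2

    def delete(self, record_id):
        record = self.rows.pop(record_id)                # write 1: the data
        self.by_name[record["name"]].discard(record_id)  # write 2: the index
        self.writes += 2

    def find_by_name(self, name):
        # Fast search: look the name up in the index instead of scanning rows.
        return [self.rows[i] for i in sorted(self.by_name.get(name, ()))]


table = IndexedTable()
table.insert({"id": 14, "name": "Anders"})
table.insert({"id": 20, "name": "Anders"})
table.insert({"id": 3, "name": "Bente"})
table.delete(14)
```

Each additional secondary index would add one more write to every insert and delete.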
Multilevel indexes
• Problem
  – Large indexes => slow search in the index
• Solution
  – Indexes with several levels
    • Index on the index!
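A two-level index can be sketched as follows. The block size, the key values and the "record-N" strings (standing in for record addresses) are assumptions for illustration.

```python
import bisect

BLOCK_SIZE = 3  # assumed number of index entries per block

# First level: sorted (key, pointer) entries, split into blocks.
keys = [3, 9, 14, 21, 27, 35, 42, 58, 60]
level1 = [(k, f"record-{k}") for k in keys]
blocks = [level1[i:i + BLOCK_SIZE] for i in range(0, len(level1), BLOCK_SIZE)]

# Second level: the index on the index, holding the first key of each block.
level2 = [block[0][0] for block in blocks]


def find(key):
    # Search the small second level to pick a single first-level block.
    b = bisect.bisect_right(level2, key) - 1
    if b < 0:
        return None  # key is smaller than every indexed key
    # Search only inside that block.
    for k, pointer in blocks[b]:
        if k == key:
            return pointer
    return None
```

Only one first-level block is read per search, however large the first-level index grows; more levels can be stacked in the same way.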
References
• Connolly & Begg: Database Systems, 4th edition, Addison Wesley 2004
  – Appendix C: File organizations and indexes