CSC 213 Lecture 10: BTrees
Announcements
You should not need to do more than the lab exercise states
If it only says add a CharRange, you should not need to define a CharClass
Use your finite state machine drawings and description of classes
Lies My Professor Told Me
Big-Oh notation does not always correctly model algorithm performance
For example, big-Oh treats all memory accesses as equal:
Register: 1 cycle
Cache: 20 cycles
RAM: 240 cycles
Hard Drive: 200,000 cycles
Paging == Bad
What happens when the heap needs more memory than is in RAM?
Disk accesses dominate total running time
Execution can take 20 or more times longer
Virtual Memory Organization
Virtual memory works by dividing memory into pages
Size of a page is constant throughout the system
4096 bytes is used in a lot of systems
Operating system then handles each page separately
When a page is not being used, the OS may evict it to disk
An evicted page must be reloaded whenever it is accessed
Problems with Trees
Trees are an important and common way to organize information
Provide consistent O(log n) access time, something we all like
But nodes contain only 1-3 entries and 2-4 children
Get spread over entire heap in no real order
Great way to torture computer & make hard drive beg for mercy
Good when using roommate’s machine, not so good for your own
BTrees to the Rescue!
BTrees are the real-world solution to paging
Apple uses them to track directories and files in their file system
Organizational scheme used within MySQL database (i.e., the most popular database)
It also makes julienne fries
What is “the BTree?”
BTrees are similar to (2,4) trees
All leaves are at same level
Nodes contain variable number of children and entries
But nodes are much, much bigger
Usually discuss a BTree of order m
Internal nodes have m/2 to m entries
Root node has m or fewer entries
BTree Order
Select order to minimize paging
Sized so that a full node, including its entries and references to the children, fills a page
Since each node has at least m/2 entries, each page is at least 50% full
How many pages will we access searching for an element?
Insertion into a BTree
Insertion begins as normal
Search through tree to find location to insert
Add entry into the node where it belongs
Check for overflow: does the node now have m+1 entries?
If so, split into two nodes of size m/2 and promote median (middle) entry into parent
Check if this causes the parent to overflow
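The split-and-promote steps above can be sketched in Java. This is a minimal in-memory sketch, not the course's implementation: the names BTreeInsertSketch, Node, and Split are hypothetical, keys are plain ints, and m is assumed to be even so that m+1 entries split cleanly into two halves of m/2 around the median.

```java
import java.util.ArrayList;
import java.util.List;

class BTreeInsertSketch {
    // Hypothetical node: sorted keys plus child references (no children at the bottom level).
    static class Node {
        final List<Integer> keys = new ArrayList<>();
        final List<Node> children = new ArrayList<>();
    }

    // Result of a split: the promoted median and the new right-hand node.
    static class Split {
        final int median;
        final Node right;
        Split(int median, Node right) { this.median = median; this.right = right; }
    }

    final int m;            // order: most entries a node may hold (assumed even)
    Node root = new Node();

    BTreeInsertSketch(int m) { this.m = m; }

    void insert(int key) {
        Split s = insert(root, key);
        if (s != null) {    // the root overflowed, so the tree grows one level
            Node newRoot = new Node();
            newRoot.keys.add(s.median);
            newRoot.children.add(root);
            newRoot.children.add(s.right);
            root = newRoot;
        }
    }

    // Inserts below node; returns a Split if node overflowed, otherwise null.
    private Split insert(Node node, int key) {
        int i = 0;
        while (i < node.keys.size() && key > node.keys.get(i)) i++;
        if (node.children.isEmpty()) {
            node.keys.add(i, key);                       // bottom level: add the entry
        } else {
            Split s = insert(node.children.get(i), key);
            if (s != null) {                             // child split: absorb its median
                node.keys.add(i, s.median);
                node.children.add(i + 1, s.right);
            }
        }
        return node.keys.size() > m ? split(node) : null;
    }

    // Splits a node holding m+1 entries into two nodes of m/2 and returns the median.
    private Split split(Node node) {
        int mid = node.keys.size() / 2;
        int median = node.keys.get(mid);
        Node right = new Node();
        right.keys.addAll(node.keys.subList(mid + 1, node.keys.size()));
        node.keys.subList(mid, node.keys.size()).clear();
        if (!node.children.isEmpty()) {
            right.children.addAll(node.children.subList(mid + 1, node.children.size()));
            node.children.subList(mid + 1, node.children.size()).clear();
        }
        return new Split(median, right);
    }

    public static void main(String[] args) {
        BTreeInsertSketch tree = new BTreeInsertSketch(4);
        for (int k = 1; k <= 5; k++) tree.insert(k);
        System.out.println(tree.root.keys);                  // [3]
        System.out.println(tree.root.children.get(0).keys);  // [1, 2]
        System.out.println(tree.root.children.get(1).keys);  // [4, 5]
    }
}
```

With m = 4, the fifth insertion overflows the single node, so the median 3 is promoted into a new root. Returning the Split to the caller is one way to let overflow propagate upward without parent pointers.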
Removal from a BTree
Swap entry to be removed with its successor at the bottom level, then remove it
If node at bottom level now has fewer than m/2 entries:
If possible, move entry from sibling to parent and steal entry from parent
Otherwise, combine node with its sibling and steal entry from parent
Check if this propagates underflow to parent
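The two underflow repairs can be sketched as a single helper. This is only an illustration under assumptions: BTreeUnderflowSketch, Node, leaf, and fixUnderflow are hypothetical names, keys are ints, min stands for m/2, and the entry itself is assumed to have been removed already.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class BTreeUnderflowSketch {
    // Hypothetical node: sorted keys plus child references (no children at the bottom level).
    static class Node {
        final List<Integer> keys = new ArrayList<>();
        final List<Node> children = new ArrayList<>();
    }

    // Convenience builder for a node that holds the given keys and no children.
    static Node leaf(Integer... ks) {
        Node n = new Node();
        n.keys.addAll(Arrays.asList(ks));
        return n;
    }

    // Repairs parent.children.get(i) if it holds fewer than min entries.
    // Returns true when the parent lost an entry and may itself underflow.
    static boolean fixUnderflow(Node parent, int i, int min) {
        Node node = parent.children.get(i);
        if (node.keys.size() >= min) return false;
        Node left = i > 0 ? parent.children.get(i - 1) : null;
        Node right = i + 1 < parent.children.size() ? parent.children.get(i + 1) : null;
        if (left != null && left.keys.size() > min) {
            // Steal the separator from the parent; left sibling's largest entry replaces it.
            node.keys.add(0, parent.keys.get(i - 1));
            parent.keys.set(i - 1, left.keys.remove(left.keys.size() - 1));
            if (!left.children.isEmpty())
                node.children.add(0, left.children.remove(left.children.size() - 1));
            return false;
        }
        if (right != null && right.keys.size() > min) {
            // Mirror image: right sibling's smallest entry moves up to the parent.
            node.keys.add(parent.keys.get(i));
            parent.keys.set(i, right.keys.remove(0));
            if (!right.children.isEmpty())
                node.children.add(right.children.remove(0));
            return false;
        }
        // No sibling can spare an entry: combine two siblings around the parent's separator.
        int sep = left != null ? i - 1 : i;
        Node a = parent.children.get(sep);
        Node b = parent.children.get(sep + 1);
        a.keys.add(parent.keys.remove(sep));
        a.keys.addAll(b.keys);
        a.children.addAll(b.children);
        parent.children.remove(sep + 1);
        return true;
    }

    public static void main(String[] args) {
        Node parent = leaf(10, 20);
        parent.children.addAll(Arrays.asList(leaf(1, 2, 3), leaf(15), leaf(25, 30)));
        boolean parentShrank = fixUnderflow(parent, 1, 2);
        System.out.println(parent.keys + " " + parent.children.get(1).keys + " " + parentShrank);
        // prints: [3, 20] [10, 15] false
    }
}
```

A caller would invoke the helper again one level up whenever it returns true, which is the "propagates underflow to parent" check.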
Choosing a Good m
How do we choose a BTree’s order?
Want to minimize number of disk accesses
Want to maximize page usage
Select m so a full node fills a page
Smallest node (using m/2 entries) uses ½ a page
What is the maximum number of pages used for any search, insert or remove?
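One way to put numbers on this is the sketch below. The sizes are assumptions for illustration only: a 4096-byte page, 16-byte entries (an 8-byte ID plus an 8-byte file position), and 8-byte child references. BTreeOrderSketch and its methods are hypothetical names. The level bound assumes the root holds at least 1 entry and every other node at least m/2 (rounded up), which means a tree with h levels holds at least 2(m/2 + 1)^(h-1) - 1 entries.

```java
class BTreeOrderSketch {
    // Largest m such that m entries plus m+1 child references fit in one page.
    static int orderForPage(int pageBytes, int entryBytes, int refBytes) {
        return (pageBytes - refBytes) / (entryBytes + refBytes);
    }

    // Most levels a BTree of order m can have while holding n entries (n >= 1).
    // Each level visited costs at most one page access.
    static int maxLevels(long n, int m) {
        long minChildren = (m + 1) / 2 + 1;     // a node with m/2 entries has m/2 + 1 children
        int levels = 1;
        long minEntries = 2 * minChildren - 1;  // fewest entries in a tree with levels + 1 levels
        while (minEntries <= n) {
            levels++;
            minEntries = (minEntries + 1) * minChildren - 1;
        }
        return levels;
    }

    public static void main(String[] args) {
        int m = orderForPage(4096, 16, 8);
        System.out.println("order m = " + m);                          // order m = 170
        System.out.println("max levels = " + maxLevels(1_000_000L, m)); // max levels = 3
    }
}
```

Under these assumed sizes, a search among a million entries touches at most 3 pages, compared with roughly 20 node visits in a balanced binary tree.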
Using BTrees
One very common place to find BTrees is in databases
Often have too much data to fit in RAM
Need simple, efficient organization
But databases also need data to be stored permanently
Does not interact well with heap objects, since heap is stored in RAM
Database BTrees
Maintain BTree in memory…
… but keep records on disk
Entries include ID and where in file to find the record
Immediately write changes back to the disk
So we know that all updates will be saved
Also means we do not need to keep file in any specific order
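The scheme above might look roughly like this in Java. It is a sketch under assumptions: RecordStoreSketch is a hypothetical class, a java.util.TreeMap stands in for the in-memory BTree, and each record is stored as an int ID followed by a string written with writeUTF.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.util.TreeMap;

class RecordStoreSketch {
    private final RandomAccessFile raf;
    // Stand-in for the in-memory BTree: maps record ID -> byte position in the file.
    private final TreeMap<Integer, Long> index = new TreeMap<>();

    RecordStoreSketch(File f) throws IOException {
        raf = new RandomAccessFile(f, "rw");
    }

    // Writes the record to disk immediately, then remembers where it was written.
    void put(int id, String data) throws IOException {
        long pos = raf.length();   // append at the end; the file needs no particular order
        raf.seek(pos);
        raf.writeInt(id);
        raf.writeUTF(data);
        index.put(id, pos);
    }

    // Looks up the position in memory, then reads just that record from disk.
    String get(int id) throws IOException {
        Long pos = index.get(id);
        if (pos == null) return null;
        raf.seek(pos);
        raf.readInt();             // skip over the stored ID
        return raf.readUTF();
    }

    void close() throws IOException {
        raf.close();
    }

    public static void main(String[] args) throws IOException {
        File f = File.createTempFile("records", ".dat");
        f.deleteOnExit();
        RecordStoreSketch store = new RecordStoreSketch(f);
        store.put(42, "alice");
        store.put(7, "bob");
        System.out.println(store.get(7));   // bob
        store.close();
    }
}
```

Only the small (ID, position) entries live in memory; the record contents are read from disk on demand.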
RandomAccessFile
For this scheme to work, we cannot read and write file sequentially
Instead, we must be able to jump around throughout entire file
Also need a way to specify locations in the file
Java’s solution: the RandomAccessFile class
RandomAccessFile
Instances can create new files or work with already existing files
RandomAccessFile raf = new RandomAccessFile("file.txt", "rw");
Creates file.txt if it does not already exist
Allows read and write access to the file
Throws an IOException if a problem arises
Can now use variable raf to access/modify the file
Reading RandomAccessFile
Read from RandomAccessFile using normal file input methods:
boolean readBoolean(), int readInt(), double readDouble()… read and return the appropriate value
int read(byte[] b) reads up to b.length bytes and stores them in b; returns the number of bytes read (or -1 at end of file)
Writing to RandomAccessFile
Write to RandomAccessFile using normal file output methods:
void writeInt(int i), void writeDouble(double d)… write the appropriate value to the file
void write(byte[] b) writes the contents of the array b to the file
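A short round trip shows the read and write methods working together. ReadWriteSketch and its two helper methods are hypothetical names for illustration; the file is opened once for writing and again for reading, so the values come back in the order they were written.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;

class ReadWriteSketch {
    // Writes an int followed by a double at the start of the file.
    static void writeValues(File f, int i, double d) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(f, "rw")) {
            raf.writeInt(i);      // 4 bytes
            raf.writeDouble(d);   // 8 bytes
        }
    }

    // Reads the values back in the same order and returns them as text.
    static String readValues(File f) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(f, "r")) {
            int i = raf.readInt();
            double d = raf.readDouble();
            return i + " " + d;
        }
    }

    public static void main(String[] args) throws IOException {
        File f = File.createTempFile("values", ".dat");
        f.deleteOnExit();
        writeValues(f, 213, 0.5);
        System.out.println(readValues(f));   // 213 0.5
    }
}
```

The values are stored in binary form, so they must be read with the matching method and in the same order.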
Typical File I/O
Ordinarily we read and write files sequentially
RandomAccessFile raf = new …;
char c = 0;
while (c != 's') {
    c = (char) raf.readByte();  // one byte per text character; readChar() would read two
}
This is an example file we access
Typical File I/O
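Ordinarily we read and write files sequentially
RandomAccessFile raf = new …;
char c = 0;
while (c != 's') {
    c = (char) raf.readByte();  // one byte per text character; readChar() would read two
    raf.writeByte(c);           // overwrites the next byte and advances past it
}
This is an example file we access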
TTis is an example file we access
TTii is an example file we access
TTii  s an example file we access
TTii  ssan example file we access
RandomAccessFile
RandomAccessFile includes the ability to position where we next read from/write to anywhere in the file
void seek(long pos) moves anywhere in the file
Positions are specified by the number of bytes from the beginning of the file
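Because positions are byte counts, fixed-size values make the arithmetic easy. The sketch below assumes a file made up only of consecutive 4-byte ints; SeekSketch, readNth, and writeNth are hypothetical names.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;

class SeekSketch {
    // Reads the k-th int (counting from 0) from a file of consecutive 4-byte ints.
    static int readNth(RandomAccessFile raf, long k) throws IOException {
        raf.seek(k * Integer.BYTES);   // byte position of the k-th int
        return raf.readInt();
    }

    // Overwrites the k-th int in place without touching its neighbours.
    static void writeNth(RandomAccessFile raf, long k, int value) throws IOException {
        raf.seek(k * Integer.BYTES);
        raf.writeInt(value);
    }

    public static void main(String[] args) throws IOException {
        File f = File.createTempFile("seek", ".dat");
        f.deleteOnExit();
        try (RandomAccessFile raf = new RandomAccessFile(f, "rw")) {
            for (int v : new int[] {10, 20, 30, 40}) raf.writeInt(v);
            writeNth(raf, 1, 99);
            System.out.println(readNth(raf, 1) + " " + readNth(raf, 2));   // 99 30
        }
    }
}
```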
RandomAccessFile I/O
Ordinarily we read and write files sequentially
RandomAccessFile raf = new …;
raf.seek(raf.length() - 1);
char c = (char) raf.readByte();  // readChar() needs two bytes and would hit end of file here
raf.seek(0);
raf.writeByte(c);
This is an example file we access
shis is an example file we access
So, how do we use this?
We use these file positions to simplify our BTrees and entries
Each Entry contains the ID number and position of the record within the file
We can even use this to simplify building our BTree whenever we start the program
Record contents of BTree at end of file
Store num. elements in BTree at start of file
Can then find and read BTree at startup
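A possible layout for this startup scheme is sketched below, with several assumptions that are not from the slides: the first 4 bytes of the file are reserved for the entry count, each saved entry is a 4-byte int ID plus an 8-byte long position, and a TreeMap again stands in for the BTree. IndexPersistenceSketch, save, and load are hypothetical names.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.util.Map;
import java.util.TreeMap;

class IndexPersistenceSketch {
    static final int ENTRY_BYTES = Integer.BYTES + Long.BYTES;   // ID + file position

    // Appends every (ID, position) entry at the end of the file, then stores the count at byte 0.
    static void save(RandomAccessFile raf, TreeMap<Integer, Long> index) throws IOException {
        if (raf.length() < Integer.BYTES) {   // reserve the header slot in a brand-new file
            raf.seek(0);
            raf.writeInt(0);
        }
        raf.seek(raf.length());
        for (Map.Entry<Integer, Long> e : index.entrySet()) {
            raf.writeInt(e.getKey());
            raf.writeLong(e.getValue());
        }
        raf.seek(0);
        raf.writeInt(index.size());
    }

    // Reads the count from the start, jumps to where the entries begin, and rebuilds the index.
    static TreeMap<Integer, Long> load(RandomAccessFile raf) throws IOException {
        raf.seek(0);
        int count = raf.readInt();
        raf.seek(raf.length() - (long) count * ENTRY_BYTES);
        TreeMap<Integer, Long> index = new TreeMap<>();
        for (int i = 0; i < count; i++) {
            int id = raf.readInt();
            long pos = raf.readLong();
            index.put(id, pos);
        }
        return index;
    }

    public static void main(String[] args) throws IOException {
        File f = File.createTempFile("index", ".dat");
        f.deleteOnExit();
        TreeMap<Integer, Long> index = new TreeMap<>();
        index.put(42, 4L);
        index.put(7, 20L);
        try (RandomAccessFile raf = new RandomAccessFile(f, "rw")) {
            save(raf, index);
            System.out.println(load(raf));   // {7=20, 42=4}
        }
    }
}
```

Because every entry has the same size, the count alone is enough to locate the saved entries: they start count × 12 bytes before the end of the file.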