CSC 213 Lecture 10: BTrees

Upload: maximillian-hutchinson

Post on 03-Jan-2016


Announcements

You should not need to do more than the lab exercise states. If it only says to add a CharRange, you should not need to define a CharClass.

Use your finite state machine drawings and description of classes.

Red-Black Tree Follow-up

Delete 3, 6, 5, 2, 8, 4, 1, 7 from this tree

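[Figure: the red-black tree to delete from, rooted at 4 and holding the keys 1 through 8; the node layout and colors were lost in extraction]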

Lies My Professor Told Me

Big-Oh notation does not always model algorithm performance correctly.

For example, big-Oh treats all memory accesses as equal, but their costs differ enormously:

  Register: 1 cycle
  Cache: 20 cycles
  RAM: 240 cycles
  Hard drive: 200,000 cycles

Paging == Bad

What happens when the heap needs more memory than is in RAM?

Disk accesses dominate the total running time.

Execution can take 20 or more times longer.


Virtual Memory Organization

Virtual memory works by dividing memory into pages. The size of a page is constant throughout the system; 4096 bytes is used in a lot of systems.

The operating system then handles each page separately.

When a page is not being used, the OS may evict it to disk, and it must reload the page whenever it is accessed again.
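To make the page arithmetic concrete, here is a small Java sketch that maps a byte address to its page number and offset, assuming 4096-byte pages, and counts how many distinct pages a sequence of accesses touches. The class `PageMath` and its methods are hypothetical names, and the addresses are made-up numbers rather than real heap addresses.

```java
import java.util.HashSet;
import java.util.Set;

public class PageMath {
    // 4096-byte pages, the common size mentioned above (an assumption; systems vary)
    static final long PAGE_SIZE = 4096;

    /** Which page a byte address falls on. */
    static long pageNumber(long address) {
        return address / PAGE_SIZE;
    }

    /** How far into its page the address is. */
    static long pageOffset(long address) {
        return address % PAGE_SIZE;
    }

    /** Number of distinct pages touched by a sequence of accesses. */
    static int pagesTouched(long... addresses) {
        Set<Long> pages = new HashSet<>();
        for (long a : addresses) {
            pages.add(pageNumber(a));
        }
        return pages.size();
    }

    public static void main(String[] args) {
        System.out.println(pageNumber(10_000) + " " + pageOffset(10_000)); // 2 1808
        // Three small nodes packed together share one page...
        System.out.println(pagesTouched(0, 64, 128));                     // 1
        // ...while three nodes scattered across the heap need three pages.
        System.out.println(pagesTouched(0, 50_000, 900_000));             // 3
    }
}
```

Each page in the second count is one potential trip to disk once memory runs short, which is why scattered tree nodes hurt.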

Problems with Trees

Trees are an important and common way to organize information. Balanced trees provide consistent O(log n) access time, something we all like.

But their nodes contain only 1 to 3 entries and 2 to 4 children (as in a (2,4) tree), and they get spread over the entire heap in no real order. That is a great way to torture the computer and make the hard drive beg for mercy: good when using your roommate’s machine, not so good for your own.

BTrees to the Rescue!

BTrees are the real-world solution to paging. Apple uses them to track directories and files in its file system, and they are the organizational scheme used within the MySQL database (one of the most popular databases). They also make julienne fries.

What is “the BTree”?

BTrees are similar to (2,4) trees: all leaves are at the same level, and nodes contain a variable number of children and entries. But the nodes are much, much bigger.

We usually discuss a BTree of order m: every node except the root has m/2 to m entries, and the root node has m or fewer entries.
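A minimal in-memory sketch of a BTree node and its search, assuming int keys only (no record data) and the hypothetical names `BTreeSearchDemo`, `Node`, and `search`. Each loop iteration reads one node, which in a disk-based BTree would mean reading one page.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class BTreeSearchDemo {
    /** One BTree node; on disk, each node would occupy one page. */
    static final class Node {
        final List<Integer> keys;      // sorted entries (just int keys in this sketch)
        final List<Node> children;     // empty for a leaf, otherwise keys.size() + 1 children

        Node(List<Integer> keys, List<Node> children) {
            this.keys = new ArrayList<>(keys);
            this.children = new ArrayList<>(children);
        }

        boolean isLeaf() {
            return children.isEmpty();
        }
    }

    /**
     * Returns the level (root = 1) at which key is found, which is also the
     * number of nodes read, or -1 if key is not in the tree.
     */
    static int search(Node root, int key) {
        int level = 0;
        for (Node node = root; node != null; ) {
            level++;
            int i = Collections.binarySearch(node.keys, key);
            if (i >= 0) {
                return level;                 // key is in this node
            }
            if (node.isLeaf()) {
                return -1;                    // reached the bottom without finding it
            }
            node = node.children.get(-i - 1); // child between the keys surrounding key
        }
        return -1;
    }

    static Node leaf(Integer... keys) {
        return new Node(Arrays.asList(keys), Collections.<Node>emptyList());
    }

    public static void main(String[] args) {
        // An order-4 example: every non-root node holds 2 to 4 keys.
        Node root = new Node(Arrays.asList(20, 40),
                Arrays.asList(leaf(5, 10), leaf(25, 30, 35), leaf(45, 50)));
        System.out.println(search(root, 20)); // 1
        System.out.println(search(root, 30)); // 2
        System.out.println(search(root, 33)); // -1
    }
}
```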

BTree Order

Select the order to minimize paging: size nodes so that a full node, including its entries and the references to its children, fills a page. Since each node other than the root has at least m/2 entries, each of those pages is at least 50% full.

How many pages will we access when searching for an element?
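As a worked example of sizing a node to a page, the sketch below solves header + m × entryBytes + (m + 1) × childBytes ≤ pageSize for the largest m. The byte sizes are assumptions for illustration: a 4-byte entry count as the node header, entries made of a 4-byte int ID plus an 8-byte long file position, and children identified by 8-byte page numbers. `OrderCalc` and `maxOrder` are hypothetical names.

```java
public class OrderCalc {
    /**
     * Largest order m such that a full node (a header, m entries, and m + 1
     * child references) fits in one page:
     *   headerBytes + m * entryBytes + (m + 1) * childBytes <= pageSize
     */
    static int maxOrder(int pageSize, int headerBytes, int entryBytes, int childBytes) {
        return (pageSize - headerBytes - childBytes) / (entryBytes + childBytes);
    }

    public static void main(String[] args) {
        // Assumed layout: 4-byte header, 12-byte entries (int ID + long position),
        // 8-byte child references, 4096-byte pages.
        int m = maxOrder(4096, 4, 4 + 8, 8);
        System.out.println(m);                         // 204
        System.out.println(4 + m * 12 + (m + 1) * 8);  // 4092 bytes, so it fits in 4096
    }
}
```

With these sizes a half-full node (102 entries) still uses about half the page, which is the 50% guarantee above.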

Insertion into a BTree

Insertion begins as normal: search through the tree to find the location to insert, then add the entry into the node where it belongs.

Check for overflow: does the node now have m+1 entries?

If so, split it into two nodes of size m/2 and promote the median (middle) entry into the parent.

Then check whether this causes the parent to overflow.
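A sketch of insertion at the bottom level only, assuming a leaf is a sorted `List<Integer>` and using the hypothetical names `LeafSplitDemo`, `Split`, and `insertIntoLeaf`. It adds the key in sorted position and, on overflow, splits the m + 1 entries around the median. A full BTree would also split the children of an overflowing internal node and then insert the median into the parent, which is where the parent-overflow check comes in.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class LeafSplitDemo {
    /** Outcome of splitting an overflowing node: two halves plus the median to promote. */
    static final class Split {
        final List<Integer> left;
        final int median;
        final List<Integer> right;

        Split(List<Integer> left, int median, List<Integer> right) {
            this.left = left;
            this.median = median;
            this.right = right;
        }
    }

    /**
     * Adds key to a sorted leaf of a BTree of order m. Returns null if the leaf
     * still has at most m entries; otherwise returns the split, and the caller
     * would replace the leaf with the two halves and push the median upward.
     */
    static Split insertIntoLeaf(List<Integer> leaf, int key, int m) {
        int pos = Collections.binarySearch(leaf, key);
        if (pos < 0) {
            pos = -pos - 1;                  // insertion point for a key not yet present
        }
        leaf.add(pos, key);
        if (leaf.size() <= m) {
            return null;                     // no overflow
        }
        int mid = leaf.size() / 2;           // index of the median of the m + 1 entries
        return new Split(new ArrayList<>(leaf.subList(0, mid)),
                         leaf.get(mid),
                         new ArrayList<>(leaf.subList(mid + 1, leaf.size())));
    }

    public static void main(String[] args) {
        List<Integer> leaf = new ArrayList<>(Arrays.asList(1, 2, 4, 5));
        Split s = insertIntoLeaf(leaf, 3, 4);  // order 4: a fifth entry overflows
        System.out.println(s.left + " " + s.median + " " + s.right); // [1, 2] 3 [4, 5]
    }
}
```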

Removal from a BTree

Swap the entry to be removed with its successor at the bottom level (unless it is already there), then remove it.

If the bottom-level node now has fewer than m/2 entries:

  If possible, move an entry from a sibling up to the parent and steal an entry from the parent.
  Otherwise, combine the node with its sibling and steal an entry from the parent.
  Check whether this propagates the underflow to the parent.
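The underflow repair can be sketched for the simplest case, where the underflowing node and its siblings are leaves, so no child references have to move. The representation (a parent's key list plus a list of child key lists, with `parentKeys.get(j)` separating children j and j + 1) and the names `LeafUnderflowDemo`, `fixUnderflow`, and `list` are assumptions for illustration. The sketch tries the left sibling, then the right one, before falling back to a merge.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class LeafUnderflowDemo {
    /**
     * Repairs child i of a parent whose children are all leaves, after a removal
     * left that child with fewer than minEntries keys. parentKeys.get(j) is the
     * separator between children j and j + 1.
     */
    static void fixUnderflow(List<Integer> parentKeys, List<List<Integer>> children,
                             int i, int minEntries) {
        List<Integer> node = children.get(i);
        if (i > 0 && children.get(i - 1).size() > minEntries) {
            // Left sibling can spare one: separator moves down, sibling's largest moves up.
            List<Integer> left = children.get(i - 1);
            node.add(0, parentKeys.get(i - 1));
            parentKeys.set(i - 1, left.remove(left.size() - 1));
        } else if (i + 1 < children.size() && children.get(i + 1).size() > minEntries) {
            // Right sibling can spare one: separator moves down, sibling's smallest moves up.
            List<Integer> right = children.get(i + 1);
            node.add(parentKeys.get(i));
            parentKeys.set(i, right.remove(0));
        } else if (i > 0) {
            // No spare entries: merge into the left sibling, pulling the separator down.
            List<Integer> left = children.get(i - 1);
            left.add(parentKeys.remove(i - 1));
            left.addAll(node);
            children.remove(i);
        } else {
            // Leftmost child: merge the right sibling into this node instead.
            List<Integer> right = children.get(i + 1);
            node.add(parentKeys.remove(i));
            node.addAll(right);
            children.remove(i + 1);
        }
        // The parent may now be short an entry; a full BTree repeats this one level up.
    }

    static List<Integer> list(Integer... keys) {
        return new ArrayList<>(Arrays.asList(keys));
    }

    public static void main(String[] args) {
        // Order 4, so non-root nodes need at least 2 entries; the middle leaf is short.
        List<Integer> parent = list(10, 20);
        List<List<Integer>> kids = new ArrayList<>(Arrays.asList(list(1, 2, 3), list(12), list(25, 30)));
        fixUnderflow(parent, kids, 1, 2);
        System.out.println(parent + " " + kids); // [3, 20] [[1, 2], [10, 12], [25, 30]]
    }
}
```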

Choosing a Good m

How do we choose a BTree’s order? We want to minimize the number of disk accesses and maximize page usage, so select m so that a full node fills a page.

The smallest non-root node (with m/2 entries) then uses ½ a page.

What is the maximum number of pages used for any search, insert, or remove?
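One way to bound the page count: a search reads one node, and so one page, per level, so the question becomes how tall a BTree of order m holding n entries can get. The tallest such tree is the sparsest one: a root with 1 entry and every other node with only k = m/2 entries, which holds 2(k+1)^(h-1) - 1 entries in h levels. The sketch below (the hypothetical `HeightBound.maxLevels`) finds the largest h that fits. Inserts and removes walk the same root-to-leaf path, though splits, borrows, and merges may also touch a sibling or write back a page at each level.

```java
public class HeightBound {
    /**
     * Most levels (root = level 1) a BTree of order m can have while holding
     * n >= 1 entries. The sparsest tree is the tallest: a root with 1 entry and
     * every other node with only k = m / 2 entries holds 2(k + 1)^(h - 1) - 1
     * entries in h levels. Assumes n is small enough that the products below
     * do not overflow a long.
     */
    static int maxLevels(long n, int m) {
        if (m < 2) {
            throw new IllegalArgumentException("order must be at least 2");
        }
        long k = m / 2;      // fewest entries allowed in a non-root node
        int levels = 1;      // a non-empty tree has at least the root
        long power = 1;      // (k + 1)^(levels - 1)
        // Add a level while even the sparsest tree with one more level fits in n entries.
        while (2 * power * (k + 1) - 1 <= n) {
            power *= k + 1;
            levels++;
        }
        return levels;
    }

    public static void main(String[] args) {
        // A hypothetical order of 204 and a billion entries:
        System.out.println(maxLevels(1_000_000_000L, 204)); // 5
        System.out.println(maxLevels(7, 2));                // 3
    }
}
```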

Using BTrees

One very common place to find BTrees is in databases. Databases often have too much data to fit in RAM and need a simple, efficient organization.

But databases also need data to be stored permanently. That does not interact well with heap objects, since the heap is stored in RAM.

Database BTrees

Maintain the BTree in memory… but keep the records on disk. Entries include the ID and where in the file to find the record.

Immediately write changes back to the disk, so we know that all updates will be saved.

This also means we do not need to keep the file in any specific order.
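A minimal sketch of this scheme, assuming string records and int IDs. A `java.util.TreeMap` stands in for the in-memory BTree here, and `RecordStore`, `put`, and `get` are hypothetical names. Every `put` appends the record at the end of the file and stores its byte position in the index, so an update just writes a new copy and re-points the ID, and the file itself stays unordered. The `"rwd"` mode asks Java to write each update to the file's content synchronously to the storage device.

```java
import java.io.Closeable;
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.util.TreeMap;

public class RecordStore implements Closeable {
    private final RandomAccessFile file;
    // In-memory index from record ID to byte position (a TreeMap stands in for the BTree).
    private final TreeMap<Integer, Long> index = new TreeMap<>();

    RecordStore(File f) throws IOException {
        // "rwd": every content update is written through to the storage device
        file = new RandomAccessFile(f, "rwd");
    }

    /** Appends the record at the end of the file and remembers where it went. */
    void put(int id, String record) throws IOException {
        long pos = file.length();
        file.seek(pos);
        file.writeUTF(record);
        index.put(id, pos);        // an update simply points the ID at the newer copy
    }

    /** Looks up the record's position in memory, then reads it from disk. */
    String get(int id) throws IOException {
        Long pos = index.get(id);
        if (pos == null) {
            return null;           // no such record
        }
        file.seek(pos);
        return file.readUTF();
    }

    @Override
    public void close() throws IOException {
        file.close();
    }

    public static void main(String[] args) throws IOException {
        File f = File.createTempFile("records", ".dat");
        f.deleteOnExit();
        try (RecordStore store = new RecordStore(f)) {
            store.put(7, "alice");
            store.put(3, "bob");
            store.put(7, "alice v2");          // appended; the index now points here
            System.out.println(store.get(7));  // alice v2
            System.out.println(store.get(3));  // bob
        }
    }
}
```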

RandomAccessFile

For this scheme to work, we cannot read and write the file sequentially. Instead, we must be able to jump around throughout the entire file, and we also need a way of specifying locations in the file.

Java’s solution: the RandomAccessFile class

RandomAccessFile

Instances can create new files or work with already existing files:

  RandomAccessFile raf = new RandomAccessFile("file.txt", "rw");

Creates file.txt if it does not already exist.

Allows read and write access to the file.

Throws an IOException if a problem arises (specifically a FileNotFoundException, a subclass of IOException, if the file cannot be opened or created).

Can now use the variable raf to access/modify the file.
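A short sketch of opening a file, using the `File` overload of the constructor and try-with-resources so the file is always closed. `RafOpenDemo` and `openAndMeasure` are hypothetical names. The failure case relies on the constructor throwing `FileNotFoundException` when it cannot create the file, here because the parent "directory" is actually a regular file.

```java
import java.io.File;
import java.io.FileNotFoundException;
import java.io.IOException;
import java.io.RandomAccessFile;

public class RafOpenDemo {
    /** Opens (creating if needed) the file in "rw" mode and returns its length in bytes. */
    static long openAndMeasure(File f) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(f, "rw")) {
            return raf.length();
        }
    }

    public static void main(String[] args) throws IOException {
        File f = new File(System.getProperty("java.io.tmpdir"), "raf-open-demo.txt");
        f.delete();                              // make sure the constructor has to create it
        System.out.println(openAndMeasure(f));   // 0: a brand-new, empty file
        System.out.println(f.exists());          // true
        try {
            openAndMeasure(new File(f, "child.txt")); // parent is a file, not a directory
        } catch (FileNotFoundException e) {
            System.out.println("could not open: " + e.getMessage());
        }
    }
}
```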

Reading RandomAccessFile

Read from a RandomAccessFile using the normal file input methods:

  boolean readBoolean(), int readInt(), double readDouble(), … read and return the appropriate value
  int read(byte[] b) reads up to b.length bytes and stores them in b; returns the number of bytes read, or -1 at the end of the file

Writing to RandomAccessFile

Write to a RandomAccessFile using the normal file output methods:

  void writeInt(int i), void writeDouble(double d), … write the appropriate value to the file
  void write(byte[] b) writes the contents of the array b to the file
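A round-trip sketch using the methods above: it writes an int, a double, and a raw byte array, jumps back to the start with `seek(0)` (covered in the slides below), and reads the values back in the same order they were written. The names `RafReadWriteDemo` and `roundTrip` are hypothetical.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.util.Arrays;

public class RafReadWriteDemo {
    /** Writes three values, rewinds, and reads them back, describing what came out. */
    static String roundTrip(File f) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(f, "rw")) {
            raf.setLength(0);                   // start from an empty file
            raf.writeInt(213);                  // 4 bytes
            raf.writeDouble(2.5);               // 8 bytes
            raf.write(new byte[] {1, 2, 3});    // 3 raw bytes
            raf.seek(0);                        // back to byte 0 to read
            int i = raf.readInt();              // values must be read in the order written
            double d = raf.readDouble();
            byte[] b = new byte[3];
            int n = raf.read(b);                // "up to" b.length bytes; returns the count
            return i + " " + d + " " + n + " " + Arrays.toString(b) + " " + raf.length();
        }
    }

    public static void main(String[] args) throws IOException {
        File f = File.createTempFile("raf-rw", ".bin");
        f.deleteOnExit();
        System.out.println(roundTrip(f)); // 213 2.5 3 [1, 2, 3] 15
    }
}
```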

Typical File I/O

Ordinarily we read and write files sequentially:

  RandomAccessFile raf = new …;
  char c = 0;
  while (c != 's') {
    c = raf.readChar();   // read a char; the file position moves past it
    raf.writeChar(c);     // write it over the char that follows
  }

Contents of the file raf refers to, as the loop runs:

  This is an example file we access
  TTis is an example file we access
  TTii is an example file we access
  TTii  s an example file we access
  TTii  ssan example file we access
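A runnable version of the loop above, assuming the file holds two-byte Java chars (written with `writeChars`), so that each `readChar`/`writeChar` moves exactly one character. `SequentialCopyDemo` and `run` are hypothetical names. It reproduces the final state from the slide, including the two spaces left where a space was copied over the `i` of "is".

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;

public class SequentialCopyDemo {
    static String run(File f) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(f, "rw")) {
            raf.setLength(0);
            raf.writeChars("This is an example file we access"); // 2 bytes per char
            raf.seek(0);                       // rewind to the first char

            char c = 0;
            while (c != 's') {
                c = raf.readChar();            // read a char; position moves past it
                raf.writeChar(c);              // overwrite the following char with it
            }

            // Read the whole file back, one two-byte char at a time.
            raf.seek(0);
            StringBuilder sb = new StringBuilder();
            while (raf.getFilePointer() < raf.length()) {
                sb.append(raf.readChar());
            }
            return sb.toString();
        }
    }

    public static void main(String[] args) throws IOException {
        File f = File.createTempFile("seq", ".txt");
        f.deleteOnExit();
        System.out.println(run(f)); // TTii  ssan example file we access
    }
}
```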

RandomAccessFile

RandomAccessFile includes the ability to position where we next read from/write to anywhere in the file:

  void seek(long pos) moves anywhere in the file
  Positions are specified as the number of bytes from the beginning of the file

RandomAccessFile I/O

With seek, we no longer have to work sequentially:

  RandomAccessFile raf = new …;
  raf.seek(raf.length() - 2);   // a char is 2 bytes, so the last char starts 2 bytes before the end
  char c = raf.readChar();
  raf.seek(0);
  raf.writeChar(c);

Contents of the file before and after:

  This is an example file we access
  shis is an example file we access
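The same experiment as a runnable sketch, again assuming two-byte chars written by `writeChars`; `SeekDemo` and `run` are hypothetical names. Because the last character occupies the final two bytes, it seeks to `length() - 2` before calling `readChar`.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;

public class SeekDemo {
    static String run(File f) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(f, "rw")) {
            raf.setLength(0);
            raf.writeChars("This is an example file we access"); // 2 bytes per char
            raf.seek(raf.length() - 2);        // the last char occupies the final 2 bytes
            char c = raf.readChar();           // 's'
            raf.seek(0);                       // jump straight back to the start
            raf.writeChar(c);                  // overwrite 'T'

            // Read the whole file back to see the result.
            raf.seek(0);
            StringBuilder sb = new StringBuilder();
            while (raf.getFilePointer() < raf.length()) {
                sb.append(raf.readChar());
            }
            return sb.toString();
        }
    }

    public static void main(String[] args) throws IOException {
        File f = File.createTempFile("seek", ".txt");
        f.deleteOnExit();
        System.out.println(run(f)); // shis is an example file we access
    }
}
```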

So, how do we use this?

We use these file positions to simplify our BTrees and entries: each Entry contains the ID number and the position of the record within the file.

We can even use this to simplify building our BTree whenever we start the program:

  Record the contents of the BTree at the end of the file.
  Store the number of elements in the BTree at the start of the file.
  We can then find and read the BTree at startup.
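A sketch of saving and reloading the index under an assumed layout: an int count at byte 0, then the records, then `count` fixed-size (int ID, long position) pairs at the end of the file. Fixed-size pairs are what let startup compute where the saved index begins from the count alone. `IndexPersistenceDemo`, `saveIndex`, and `loadIndex` are hypothetical names, a `TreeMap` stands in for the BTree, and the sketch ignores crash safety.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.util.Map;
import java.util.TreeMap;

public class IndexPersistenceDemo {
    // Assumed on-disk layout (hypothetical):
    //   [int count][records ...][count pairs of (int ID, long position)]
    static final int ENTRY_BYTES = 4 + 8;

    /** At shutdown: append every (ID, position) pair and store the count at offset 0. */
    static void saveIndex(RandomAccessFile raf, Map<Integer, Long> index) throws IOException {
        raf.seek(raf.length());
        for (Map.Entry<Integer, Long> e : index.entrySet()) {
            raf.writeInt(e.getKey());
            raf.writeLong(e.getValue());
        }
        raf.seek(0);
        raf.writeInt(index.size());
    }

    /** At startup: read the count, jump to the saved index, and rebuild the in-memory map. */
    static TreeMap<Integer, Long> loadIndex(RandomAccessFile raf) throws IOException {
        raf.seek(0);
        int count = raf.readInt();
        long start = raf.length() - (long) count * ENTRY_BYTES;
        raf.seek(start);
        TreeMap<Integer, Long> index = new TreeMap<>();
        for (int i = 0; i < count; i++) {
            int id = raf.readInt();
            index.put(id, raf.readLong());
        }
        raf.setLength(start); // drop the saved copy so new records append after the old ones
        return index;
    }

    public static void main(String[] args) throws IOException {
        File f = File.createTempFile("btree", ".dat");
        f.deleteOnExit();
        try (RandomAccessFile raf = new RandomAccessFile(f, "rw")) {
            raf.writeInt(0);                  // count placeholder; records start at byte 4
            TreeMap<Integer, Long> index = new TreeMap<>();
            index.put(7, raf.getFilePointer());
            raf.writeUTF("alice");
            index.put(3, raf.getFilePointer());
            raf.writeUTF("bob");
            saveIndex(raf, index);
        }
        try (RandomAccessFile raf = new RandomAccessFile(f, "rw")) {
            TreeMap<Integer, Long> index = loadIndex(raf);
            raf.seek(index.get(3));
            System.out.println(index.keySet() + " " + raf.readUTF()); // [3, 7] bob
        }
    }
}
```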

Daily Quiz

Write the Entry class that we would use with a disk-based BTree