13. Indexing MTrees - Data Structures Using C++ by Varsha Patil

Oxford University Press © 2012, Data Structures Using C++ by Dr Varsha Patil: 13. Indexing and Multiway Trees

Upload: widespreadpromotion

Post on 12-Jan-2017



Objectives

Indexing techniques

B-trees which prove invaluable for problems of external information retrieval

A class of trees called tries, which share some properties of table lookup

Important uses of trees in many search techniques


Introduction

A file is a collection of records, each record having one or more fields

The fields used to distinguish among the records are known as keys

File organization describes how the records are stored in a file

File organization is concerned with representing data records on external storage media


The file organization breaks down into two more aspects:

Directory—for collection of indices

File organization—for the physical organization of records


File organization is the way records are organized on physical storage

One such organization is sequential (ordered and unordered)

In this general framework, processing a query or an update request proceeds in two steps:

The indices are interrogated to determine the parts of the physical file to be searched

These parts of the physical file are then searched


Indexing

An index, whether it is a book index or a data file index (in computer memory), is based on basic concepts such as keys and reference fields

The index to a book provides a way to find a topic quickly
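The key and reference-field idea can be sketched as follows. This is only an illustration under stated assumptions: the names `IndexEntry`, `buildIndex` and `lookup` are hypothetical, and the reference field is assumed to be the position of a record in an unsorted data file.

```cpp
#include <algorithm>
#include <cstddef>
#include <optional>
#include <string>
#include <utility>
#include <vector>

// One index entry: a key plus a reference field (assumed here to be the
// position of the record in the data file).
using IndexEntry = std::pair<std::string, std::size_t>;

// Build a sorted index over records stored in arbitrary order.
// keys[i] is the key of the record at position i (hypothetical layout).
std::vector<IndexEntry> buildIndex(const std::vector<std::string>& keys) {
    std::vector<IndexEntry> index;
    index.reserve(keys.size());
    for (std::size_t pos = 0; pos < keys.size(); ++pos) {
        index.emplace_back(keys[pos], pos);
    }
    std::sort(index.begin(), index.end());  // the index is ordered, the file is not
    return index;
}

// Binary-search the index and return the record position, if the key exists.
std::optional<std::size_t> lookup(const std::vector<IndexEntry>& index,
                                  const std::string& key) {
    auto it = std::lower_bound(
        index.begin(), index.end(), key,
        [](const IndexEntry& e, const std::string& k) { return e.first < k; });
    if (it != index.end() && it->first == key) return it->second;
    return std::nullopt;
}
```

The data file itself is never reordered; only the small index is sorted, which is what makes it quick to search.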


Cylinder-Surface Indexing

This is the simplest type of index organization

It is useful only for the primary key index of a sequentially ordered file

In a sequentially ordered file, the physical sequence of records is ordered by the key, called the primary key

The cylinder-surface index consists of a cylinder index and several surface indexes


For each cylinder, there is a surface index. If the disk has S usable surfaces, then each surface index has S entries. With C cylinders, the total number of surface index entries is C × S

Emp. No.   Emp. Name   Cylinder   Surface
1          Abolee      1          1
2          Anand       1          1
3          Amit        1          2
4          Amol        1          2
5          Rohit       2          1
6          Santosh     2          1
7          Saurabh     2          2
8          Shila       2          2


Let there be two surfaces and two records stored per track. The file is organized sequentially on the field ‘Emp. name’

The cylinder index is shown in the following table

Cylinder   Highest Key Value
1          Amol
2          Shila
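A lookup through a cylinder index followed by a surface index might look like the sketch below. The type and function names are hypothetical. The cylinder high keys follow the table above, while the per-surface high keys used in the usage note are illustrative assumptions.

```cpp
#include <cstddef>
#include <optional>
#include <string>
#include <utility>
#include <vector>

// Highest key stored on each surface of one cylinder, in surface order.
using SurfaceIndex = std::vector<std::string>;

struct CylinderSurfaceIndex {
    std::vector<std::string> cylinderHighKeys;  // one entry per cylinder
    std::vector<SurfaceIndex> surfaceIndexes;   // one surface index per cylinder
};

// 0-based position of the first entry whose high key is >= key, if any.
std::optional<std::size_t> firstCovering(const std::vector<std::string>& highKeys,
                                         const std::string& key) {
    for (std::size_t i = 0; i < highKeys.size(); ++i) {
        if (key <= highKeys[i]) return i;
    }
    return std::nullopt;
}

// Returns the 1-based (cylinder, surface) whose track may hold the key.
// Step 1 consults the cylinder index, step 2 the surface index of that cylinder.
std::optional<std::pair<std::size_t, std::size_t>> locate(
        const CylinderSurfaceIndex& idx, const std::string& key) {
    auto cyl = firstCovering(idx.cylinderHighKeys, key);
    if (!cyl) return std::nullopt;
    auto surf = firstCovering(idx.surfaceIndexes[*cyl], key);
    if (!surf) return std::nullopt;
    return std::make_pair(*cyl + 1, *surf + 1);
}
```

With cylinder high keys `Amol` and `Shila`, and assumed surface high keys `Santosh` and `Shila` on cylinder 2, a search for `Rohit` is directed to cylinder 2, surface 1; only that track then needs to be read.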


This method of maintaining a file and index is referred to as ISAM (indexed sequential access method)

It is the simplest file organization for single key files but not useful for multiple key files


Hashed Indexes

The operations related to hashed indexes are the same as those for hash tables
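Since the operations mirror those of a hash table, a hashed index can be sketched on top of `std::unordered_map`. The class name `HashedIndex` and the choice of a record position as the stored value are assumptions made for illustration.

```cpp
#include <cstddef>
#include <optional>
#include <string>
#include <unordered_map>

// A hashed index: maps a key to the position of its record in the data file.
class HashedIndex {
public:
    // Adds an entry; returns false if the key is already indexed.
    bool insert(const std::string& key, std::size_t recordPos) {
        return table_.emplace(key, recordPos).second;
    }

    // Returns the record position for the key, if present.
    std::optional<std::size_t> find(const std::string& key) const {
        auto it = table_.find(key);
        if (it == table_.end()) return std::nullopt;
        return it->second;
    }

    // Removes an entry; returns false if the key was not indexed.
    bool erase(const std::string& key) { return table_.erase(key) > 0; }

private:
    std::unordered_map<std::string, std::size_t> table_;
};
```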


Multiway Search Trees

A multiway search tree is a tree of order m, where each node has at most m children

[Figure: an example multiway search tree storing the keys a to z]


B-trees

A B-tree is a balanced M-way tree. A node of the tree contains many records or keys of records and pointers to children

To reduce disk access, the following points are applicable:

Height is kept to a minimum

All leaves are kept at the same level

All nodes other than leaves must have at least a minimum number of children


B-trees

Definition: A B-tree of order m is an m-way tree with the following properties:

The number of keys in each internal node is one less than the number of its non-empty children, and these keys partition the keys in the children in the fashion of a search tree

All leaves are on the same level

All internal nodes except the root have at most m non-empty children and at least ⌈m/2⌉ non-empty children

The root is either a leaf node, or it has from two to m children

A leaf node contains no more than m − 1 keys
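The occupancy limits in this definition can be worked out for a concrete order. The helper below is a small sketch with hypothetical names; the minimum key count is derived from the rule that an internal node holds one key fewer than it has children.

```cpp
// Occupancy bounds for a B-tree of order m, following the definition above.
struct BTreeBounds {
    int maxChildren;  // m
    int minChildren;  // ceil(m / 2), for internal nodes other than the root
    int maxKeys;      // m - 1
    int minKeys;      // ceil(m / 2) - 1, for internal nodes other than the root
};

constexpr BTreeBounds boundsForOrder(int m) {
    // (m + 1) / 2 is ceil(m / 2) in integer arithmetic for m >= 0.
    return BTreeBounds{m, (m + 1) / 2, m - 1, (m + 1) / 2 - 1};
}
```

For order 5, for example, a non-root internal node has between 3 and 5 children and therefore between 2 and 4 keys.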


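Node structure

Ptr_1 | Key_1 | Ptr_2 | Key_2 | … | Ptr_i | Key_i | … | Key_{n-1} | Ptr_n

Ptr_1 leads to keys X with X < Key_1

Ptr_i leads to keys X with Key_{i-1} < X < Key_i

Ptr_n leads to keys X with X > Key_{n-1}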
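Searching such a node layout can be sketched as below: within a node, find the first key not less than X; if it is not X itself, follow the pointer in the same position. The names `MNode`, `makeLeaf` and `contains` are hypothetical, and integer keys are assumed for brevity.

```cpp
#include <algorithm>
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

struct MNode {
    std::vector<int> keys;                         // sorted keys of the node
    std::vector<std::unique_ptr<MNode>> children;  // one more than keys, empty in a leaf
};

// Convenience helper for building leaf nodes.
std::unique_ptr<MNode> makeLeaf(std::vector<int> keys) {
    auto node = std::make_unique<MNode>();
    node->keys = std::move(keys);
    return node;
}

// Returns true if x is stored somewhere in the tree rooted at node.
bool contains(const MNode* node, int x) {
    while (node != nullptr) {
        auto it = std::lower_bound(node->keys.begin(), node->keys.end(), x);
        if (it != node->keys.end() && *it == x) return true;
        if (node->children.empty()) return false;  // reached a leaf
        // i keys are smaller than x, so child i covers the range containing x.
        auto i = static_cast<std::size_t>(it - node->keys.begin());
        node = node->children[i].get();
    }
    return false;
}
```

Each step down the tree corresponds to reading one node, which is why keeping the height small reduces disk accesses.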


Operations on B-tree

Search a node

Insertion of a key into a B-tree

Deletion from a B-tree


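B+ Tree

B-trees are internal data structures. That is, the nodes contain whatever information is associated with the key as well as the key values

A B+ tree, in contrast, keeps data pointers only in its leaf nodes; internal nodes hold search values and tree pointers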


B+ Tree Structure

The structure of a B+ tree can be understood from the following points:

A B+ tree is in the form of a balanced tree where every path from the root of the tree to a leaf of the tree is of the same length

Each non-leaf node (internal node) in the tree has between ⌈n/2⌉ and n children, where n is fixed

The pointer (Ptr) can point to either a file record or a bucket of pointers, each of which points to a file record

Search time is low in B+ trees, but they have some problem of wasted space


Nodes of B+ Tree

Internal node of a B+ tree with q − 1 search values

Leaf node of a B+ tree with q − 1 search values and q − 1 data pointers


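Node structure

Ptr_1 | Key_1 | Ptr_2 | Key_2 | … | Ptr_i | Key_i | … | Key_{n-1} | Ptr_n

Each Ptr is a tree pointer

Ptr_1 leads to keys X with X < Key_1

Ptr_i leads to keys X with Key_{i-1} < X < Key_i

Ptr_n leads to keys X with X > Key_{n-1}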


Advantages of B+ trees over Indexed Sequential Access Method

A dynamic index structure that adjusts gracefully to inserts and deletes

A balanced tree

Leaf pages are not allocated sequentially. They are linked together through pointers (a doubly linked list)
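The linked leaf level is what makes range scans cheap: once the first leaf is found, the scan follows sibling pointers instead of returning to the root. A minimal sketch, assuming integer keys and a hypothetical `LeafPage` type that omits the data pointers:

```cpp
#include <vector>

// Leaf page of a B+ tree, reduced to what a range scan needs.
struct LeafPage {
    std::vector<int> keys;     // sorted keys stored in this leaf
    LeafPage* prev = nullptr;  // left sibling
    LeafPage* next = nullptr;  // right sibling
};

// Connect two neighbouring leaves in both directions (doubly linked list).
void link(LeafPage& left, LeafPage& right) {
    left.next = &right;
    right.prev = &left;
}

// Collect all keys in [lo, hi], starting from the leaf where lo would be found.
std::vector<int> rangeScan(const LeafPage* start, int lo, int hi) {
    std::vector<int> out;
    for (const LeafPage* page = start; page != nullptr; page = page->next) {
        for (int k : page->keys) {
            if (k > hi) return out;  // keys are sorted, so the scan can stop
            if (k >= lo) out.push_back(k);
        }
    }
    return out;
}
```

In a full B+ tree the starting leaf would be located by a root-to-leaf search for `lo`; that step is left out here.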


Trie Tree

One solution is to prune from the tree all the branches that do not lead to any key

The resulting tree is called a trie (short for reTRIEval and pronounced ‘try’)

The number of steps needed to search a trie is proportional to the number of characters in a key
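A small trie makes the cost claim concrete: both insertion and search take one step per character of the key, regardless of how many keys are stored. The sketch below assumes keys made of lowercase letters 'a' to 'z' only; the class and member names are hypothetical.

```cpp
#include <array>
#include <memory>
#include <string>

// Trie over lowercase keys (assumption: characters 'a'..'z' only).
class Trie {
public:
    // Inserts the key; returns false if it contains an unsupported character.
    bool insert(const std::string& key) {
        for (char c : key) {
            if (c < 'a' || c > 'z') return false;
        }
        Node* node = &root_;
        for (char c : key) {
            auto& child = node->next[c - 'a'];
            if (!child) child = std::make_unique<Node>();  // branch created on demand
            node = child.get();
        }
        node->isKey = true;
        return true;
    }

    // One step per character: follow the branch for each letter of the key.
    bool contains(const std::string& key) const {
        const Node* node = &root_;
        for (char c : key) {
            if (c < 'a' || c > 'z') return false;
            node = node->next[c - 'a'].get();
            if (node == nullptr) return false;  // no branch leads to this key
        }
        return node->isKey;
    }

private:
    struct Node {
        std::array<std::unique_ptr<Node>, 26> next;  // only existing branches are non-null
        bool isKey = false;                          // true if a key ends here
    };
    Node root_;
};
```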


Splay Trees

Splay trees are a form of BST. A splay tree maintains balance without any explicit balance condition such as color

Instead, ‘splay operations’, which involve rotations, are performed within the tree every time an access is made


If we use a BST or even an AVL tree, then the records of a newly admitted patient will go to a leaf position, far from the root, and the access will be slower

Instead, we want to keep the records that are newly inserted or frequently accessed very near to the root, while the inactive records stay far off, in the leaf positions

However, we do not want to rebuild the tree into the desired shape. Instead, we need to make the tree a self-adjusting data structure that automatically changes its shape to bring records closer to the root as they are used frequently, allowing inactive records to drift slowly down towards the leaves. Such trees are called splay trees
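The self-adjusting behaviour can be sketched with a recursive splay routine that rotates the accessed key up to the root. This is one common formulation, not necessarily the one used in the book; the class layout, the `std::deque` node pool and all names are assumptions made for illustration.

```cpp
#include <deque>

struct SplayNode {
    int key;
    SplayNode* left = nullptr;
    SplayNode* right = nullptr;
    explicit SplayNode(int k) : key(k) {}
};

class SplayTree {
public:
    SplayTree() = default;
    SplayTree(const SplayTree&) = delete;             // nodes point into pool_
    SplayTree& operator=(const SplayTree&) = delete;

    // Plain BST insertion followed by a splay: the new key ends up at the root.
    void insert(int key) {
        root_ = insertBst(root_, key);
        root_ = splay(root_, key);
    }
    // Every access splays, so recently used keys stay near the root.
    bool access(int key) {
        root_ = splay(root_, key);
        return root_ != nullptr && root_->key == key;
    }
    const SplayNode* root() const { return root_; }

private:
    std::deque<SplayNode> pool_;  // owns the nodes; element addresses stay stable
    SplayNode* root_ = nullptr;

    SplayNode* insertBst(SplayNode* n, int key) {
        if (n == nullptr) { pool_.emplace_back(key); return &pool_.back(); }
        if (key < n->key) n->left = insertBst(n->left, key);
        else if (key > n->key) n->right = insertBst(n->right, key);
        return n;
    }
    static SplayNode* rotateRight(SplayNode* x) {
        SplayNode* y = x->left; x->left = y->right; y->right = x; return y;
    }
    static SplayNode* rotateLeft(SplayNode* x) {
        SplayNode* y = x->right; x->right = y->left; y->left = x; return y;
    }
    // Brings key (or the last node on its search path) to the root.
    static SplayNode* splay(SplayNode* root, int key) {
        if (root == nullptr || root->key == key) return root;
        if (key < root->key) {
            if (root->left == nullptr) return root;
            if (key < root->left->key) {              // left-left case
                root->left->left = splay(root->left->left, key);
                root = rotateRight(root);
            } else if (key > root->left->key) {       // left-right case
                root->left->right = splay(root->left->right, key);
                if (root->left->right) root->left = rotateLeft(root->left);
            }
            return root->left ? rotateRight(root) : root;
        }
        if (root->right == nullptr) return root;
        if (key > root->right->key) {                 // right-right case
            root->right->right = splay(root->right->right, key);
            root = rotateLeft(root);
        } else if (key < root->right->key) {          // right-left case
            root->right->left = splay(root->right->left, key);
            if (root->right->left) root->right = rotateRight(root->right);
        }
        return root->right ? rotateLeft(root) : root;
    }
};
```

After inserting 10, 20 and 30, the root is 30; accessing 10 then moves 10 to the root, which is the behaviour wanted for frequently used records.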


Red-black Trees

A red-black tree is a BST with one extra bit of storage per node: its colour, which can be either red or black


Properties of red-black trees

Every node is either red or black

All the external nodes (leaf nodes) are black

The rank in a tree goes from zero up to the maximum rank, which occurs at the root. The ranks of two consecutive nodes differ by at most 1. Each leaf node has rank 0

If a node is red, then both its children are black. In other words, consecutive red nodes are disallowed. This means every red node is followed by a black node; on the other hand, a black node may be followed by a black or a red node

This implies that at most 50% of the nodes on any path from an external node to the root are red


The number of black nodes on any path from, but not including, the node x to a leaf is called the black height of the node x, denoted bh(x)

Every simple path from the root to a leaf contains the same number of black nodes

In addition, every simple path from a node to a descendant leaf contains the same number of black nodes

If a black node has rank r, then its parent has rank r + 1

If a red node has rank r, then its parent has rank r as well
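Two of these properties, no consecutive red nodes and equal black counts on all paths, can be checked with a short recursive routine. The sketch below uses hypothetical names, represents external nodes as null pointers, and does not check key ordering.

```cpp
enum class Color { Red, Black };

struct RbNode {
    int key;
    Color color;
    const RbNode* left;   // nullptr stands for a black external (leaf) node
    const RbNode* right;
};

// Number of black nodes on every path from n (inclusive) down to an external
// node (inclusive), or -1 if a property is violated below n.
// The black height bh(n) from the text excludes n itself, so
// bh(n) = blackCount(n) - 1 when n is black and blackCount(n) when n is red.
int blackCount(const RbNode* n) {
    if (n == nullptr) return 1;  // external node, counted as black
    if (n->color == Color::Red) {
        bool redLeft = n->left != nullptr && n->left->color == Color::Red;
        bool redRight = n->right != nullptr && n->right->color == Color::Red;
        if (redLeft || redRight) return -1;  // consecutive red nodes
    }
    int lh = blackCount(n->left);
    int rh = blackCount(n->right);
    if (lh < 0 || rh < 0 || lh != rh) return -1;  // unequal black counts
    return lh + (n->color == Color::Black ? 1 : 0);
}

bool isValidRedBlack(const RbNode* root) { return blackCount(root) >= 0; }
```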


KD-Trees

A KD-tree is a data structure used in computer science for orthogonal range searching, for instance, to find the set of points that fall into a given rectangle
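A rectangle query on a two-dimensional KD-tree can be sketched as follows. The construction by median split with alternating axes is one common choice and an assumption here, as are the integer coordinates and all names.

```cpp
#include <algorithm>
#include <array>
#include <cstddef>
#include <memory>
#include <vector>

using Point = std::array<int, 2>;  // (x, y)

struct KdNode {
    Point pt;
    std::unique_ptr<KdNode> left, right;
};

// Build a balanced tree over pts[lo, hi) by splitting on the median,
// using x at even depths and y at odd depths.
std::unique_ptr<KdNode> build(std::vector<Point>& pts, std::size_t lo,
                              std::size_t hi, int depth) {
    if (lo >= hi) return nullptr;
    const int axis = depth % 2;
    const std::size_t mid = lo + (hi - lo) / 2;
    auto first = pts.begin() + static_cast<std::ptrdiff_t>(lo);
    auto nth = pts.begin() + static_cast<std::ptrdiff_t>(mid);
    auto last = pts.begin() + static_cast<std::ptrdiff_t>(hi);
    std::nth_element(first, nth, last, [axis](const Point& a, const Point& b) {
        return a[axis] < b[axis];
    });
    auto node = std::make_unique<KdNode>();
    node->pt = pts[mid];
    node->left = build(pts, lo, mid, depth + 1);
    node->right = build(pts, mid + 1, hi, depth + 1);
    return node;
}

std::unique_ptr<KdNode> buildKdTree(std::vector<Point> pts) {
    return build(pts, 0, pts.size(), 0);
}

// Append to out every point p with lo[d] <= p[d] <= hi[d] in both dimensions.
void rangeSearch(const KdNode* n, const Point& lo, const Point& hi, int depth,
                 std::vector<Point>& out) {
    if (n == nullptr) return;
    const int axis = depth % 2;
    if (lo[0] <= n->pt[0] && n->pt[0] <= hi[0] &&
        lo[1] <= n->pt[1] && n->pt[1] <= hi[1]) {
        out.push_back(n->pt);
    }
    // Left subtree holds values <= pt[axis], right subtree values >= pt[axis],
    // so a side is skipped when the rectangle cannot reach it.
    if (lo[axis] <= n->pt[axis]) rangeSearch(n->left.get(), lo, hi, depth + 1, out);
    if (n->pt[axis] <= hi[axis]) rangeSearch(n->right.get(), lo, hi, depth + 1, out);
}
```

The pruning tests at the end are what distinguish this from a linear scan: whole subtrees that lie entirely outside the rectangle along the current axis are never visited.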


AA Tree

An AA tree is a balanced BST with the following properties:

Every node is colored either red or black

The root is black

If a node is red, both of its children are black

Every path from a node to a null reference has the same number of black nodes

Left children may not be red


Advantages of AA Trees

They eliminate half the restructuring cases

They simplify deletion by removing an annoying case:

If an internal node has only one child, that child must be a red child

We can always replace a node with the smallest child in the right subtree; it will either be a leaf node or have a red child

An AA tree, being a balanced BST, supports efficient operations, since most operations only have to traverse one or two root-to-leaf paths


Representing Balance Information in AA Tree

In each node of an AA tree, we store a level. The level is defined by the following rules:

If a node is a leaf, its level is one

If a node is red, its level is the level of its parent

If a node is black, its level is one less than the level of its parent

Here, the level is the number of left links to a null reference


Links in an AA tree

A horizontal link is a connection between a node and a child with equal levels

The properties of such horizontal links are as follows:

Horizontal links are right references

There cannot be two consecutive horizontal links

Nodes at level two or higher must have two children

If a node has no right horizontal link, its two children are at the same level
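These link rules are usually restored after an insertion by two small operations, commonly called skew and split. The sketch below shows one way to write them; the names, the `std::deque` node pool and the integer keys are assumptions, and deletion is not covered.

```cpp
#include <deque>

struct AaNode {
    int key;
    int level = 1;  // a new node is a leaf, so its level is one
    AaNode* left = nullptr;
    AaNode* right = nullptr;
    explicit AaNode(int k) : key(k) {}
};

// Skew: a left child on the same level is a left horizontal link, which is
// not allowed; rotate right to turn it into a right horizontal link.
AaNode* skew(AaNode* t) {
    if (t && t->left && t->left->level == t->level) {
        AaNode* l = t->left;
        t->left = l->right;
        l->right = t;
        return l;
    }
    return t;
}

// Split: two consecutive right horizontal links are not allowed; rotate left
// and raise the level of the middle node.
AaNode* split(AaNode* t) {
    if (t && t->right && t->right->right && t->right->right->level == t->level) {
        AaNode* r = t->right;
        t->right = r->left;
        r->left = t;
        ++r->level;
        return r;
    }
    return t;
}

class AaTree {
public:
    AaTree() = default;
    AaTree(const AaTree&) = delete;             // nodes point into pool_
    AaTree& operator=(const AaTree&) = delete;

    void insert(int key) { root_ = insertAt(root_, key); }
    const AaNode* root() const { return root_; }

private:
    std::deque<AaNode> pool_;  // owns the nodes; element addresses stay stable
    AaNode* root_ = nullptr;

    AaNode* insertAt(AaNode* t, int key) {
        if (t == nullptr) { pool_.emplace_back(key); return &pool_.back(); }
        if (key < t->key) t->left = insertAt(t->left, key);
        else if (key > t->key) t->right = insertAt(t->right, key);
        else return t;       // duplicate key: nothing to do
        return split(skew(t));  // repair link rules on the way back up
    }
};
```

Inserting the keys 1 to 7 in increasing order, which would degenerate a plain BST into a list, yields a tree with 4 at the root on level 3 and 2 and 6 as its children on level 2.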


Summary

A node of a BST has only one key value entry stored in it. A multiway tree has many key values stored in each node and thus each node may have multiple subtrees

Different indexing techniques are used to search for a record quickly, in O(1) expected time with hashed indexing. The index is a pair of key value and address. It is a form of indirect addressing that imposes order on a file without rearranging the file

Indexing techniques are classified as hashed indexing, tree indexing, B-tree, B+ tree, and trie tree

Splay trees are self-adjusting trees
