database indexes

43
DATABASE INDEXES #SperasoftTalks

Upload: sperasoft

Post on 14-Jan-2015

203 views

Category:

Technology


2 download

DESCRIPTION

Sperasoft Talks about outlining the principles of Database Management System based on RDMC.

TRANSCRIPT

Page 1: Database Indexes

DATABASE INDEXES

#SperasoftTalks

Page 2: Database Indexes

Who We Are?

We are a team of professionals specializing in game development, art production, online engineering and

creation of amazing products.

Our technology competencies include solid experience and background in delivering scalable platforms and

online solutions. That serve millions of players all over the world and run beyond amazing games.

Read more: http://www.sperasoft.com

Page 3: Database Indexes

•  RDBMS  store  data  only  in  Trees •  Index  is  a  tree  in  terms  of  data  structure •  a  Table  is  an  Index •  a  Clustered  Index  is  a  Table  itself •  a  Non-­‐clustered  Index  is  a  copy  of  data •  all  Non-­‐clustered  Indexes  refer  to  Clustered  one •  all  keys  in  Tree  Nodes  are  always  unique

The Simple Truth

Page 4: Database Indexes

•  Oracle  Database •  SQL  Server •  IBM  DB2 •  MySQL •  PostgreSQL •  Sybase •  Informix

What’s Common Between

Page 5: Database Indexes

RDBMS  is  a  type  of  Database  Management  System  that  stores  data  in  the  form  of  related  tables RDBMS  is  a  Database  Management  System  that  is  based  on  the  relaJonal  model  introduced  by  E.F.  Codd Data  is  stored  in  tables  and  the  relaJonships  among  the  data  are  also  stored  in  tables

Relational Database Management Systems

Page 6: Database Indexes

•  Born  on  the  Isle  of  Portland  in  England  in  1923

•  Died  in  Florida  US  in  2003,  aged  79 •  MathemaJc •  Worked  for  IBM

Edgar Frank “Ted” Codd

Page 7: Database Indexes

•  Introduced  “A  RelaJonal  Model  of  Data  for  Large  Shared  Data  Banks”  and  Alpha  database  language

•  IBM  started  implemenJng  the  RelaJonal  model  and  introduced  another  language  named  SEQUEL

Edgar Frank “Ted” Codd

Page 8: Database Indexes

•  Larry  Ellison  came  up  in  Jme  with  his  implementaJon  of  RelaJonal  model  and  the  language  –  Oracle  Database  and  SQL •  ANSI  started  making  SQL  standard

Birth of Oracle

Page 9: Database Indexes

•  It’s  all  about  Table  RelaJons

Relation Model Briefly

Page 10: Database Indexes

•  Database  contains  tables  (two  dimensional  arrays)

•  Tables  have  relaJonships  enforced  by  Foreign  Key  constraints  (1-­‐to-­‐Many  relaJonship)

•  NormalizaJon  of  tables  is  a  key  concept

•  That’s  why  RDBMS  are  called  RelaJonal

Relation Model

Page 11: Database Indexes

What’s Database Physically

Page 12: Database Indexes

•  Files  are  flat  in  nature

FILE

READING CURSOR

0 OFFSET

All Tables Are Stored in a File

Page 13: Database Indexes

•  What’s  the  value  behind  relaJons? •  What  is  a  database  table? •  What  is  a  table  index?

•  RelaJons  vs  How  data  is  stored

What’s Actually Matter

Page 14: Database Indexes

Id User  Name Country City Age

1 Michael USA Boston 30

2 Jane USA Boston 24

3 Scoe USA NYC 18

4 Bob UK London 41

5 Prescoe UK London 35

•  Such  array  seems  to  be  a  table •  How  to  find  Users  from  Boston  faster?

ArrayList<User>  users  =  new  ArrayList<User>();

How to Handle Millions of Users

Page 15: Database Indexes

Boston 1,  Michael,  USA,  Boston,  30 2,  Jane,  USA,  Boston,  24

NYC 3,  Scoe,  USA,  NYC,  18  

London 4,  Bob,  UK,  London,  41 5,  Prescoe,  UK,  London,  35  

Index is a Tree

Page 16: Database Indexes

ID  =  2 2,  Jane,  USA,  Boston,  24

ID  =  1 1,  Michael,  USA,  Boston,  30

ID  =  3 3,  Scoe,  USA,  NYC,  18  

Can replace an initial array with Index

Page 17: Database Indexes

•  Key  values  in  a  Key  node  should  be  unique •  Otherwise  Trees  do  not  work

What’s important to note

Page 18: Database Indexes

•  Indexes  are  Trees  in  terms  of  data  structure •  Trees  are  suitable  to  store  any  array  of  data  to  make  search  faster

Returning to our sheep

Page 19: Database Indexes

•  All  RDBMS  store  data  as  Balanced  Trees •  The  concrete  implementaJon  of  B-­‐Tree  could  differ  from  vendor  to  vendor

•  It  means  the  only  way  to  store  data  is  Tree •  No  excepJons  here  -­‐  table  is  a  tree,  index  is  a  tree

Balanced Trees

Page 20: Database Indexes

What’s a Clustered Index

Page 21: Database Indexes

•  The  next  record  in  Clustered  Index  is  always  stored  aoer  the  previous  one

RECORD  1 RECORD  2

1    |  Michael  |  USA  |  Boston  |  30 2    |  Jane  |  USA  |  Boston  |  24

The clustered index storage

Page 22: Database Indexes

Have  a  quesKon?  Like  this  deck?  

Tweet  us  @SperasoR Like  deck  on  SlideShare.com/sperasoR

Page 23: Database Indexes

•  Clustered  Indexes •  Non-­‐clustered  indexes •  Both  could  be  unique  and  non-­‐unique •  Table  can  be  without  any  indexes

•  How  is  that  comply  with  how  data  is  actually  stored?

What SQL allows us to do

Page 24: Database Indexes

•  Unique  and  non-­‐unique

•  CREATE  CLUSTERED  INDEX  [name]  ON  [table_name]  ([column1],  [column2])

•  CREATE  UNIQUE  CLUSTERED  INDEX  [name]  ON  [table_name]  ([column1],  [column2])

Clustered Indexes

Page 25: Database Indexes

•  Unique  and  non-­‐unique

•  CREATE  NONCLUSTERED  INDEX  [name]  ON  [table_name]  ([column1],  [column2])

•  CREATE  UNIQUE  NONCLUSTERED  INDEX  [name]  ON  [table_name]  ([column1],  [column2])

None Clustered Indexes

Page 26: Database Indexes

ID  =  2 Jane,  USA,  Boston,  24

ID  =  1 Michael,  USA,  Boston,  30

ID  =  3 Scoe,  USA,  NYC,  18  

Unique Clustered Index

Page 27: Database Indexes

•  We  know  Key  values  should  be  unique

•  How  RDBMS  resolves  this  problem?

Non-unique Clustered Index

Page 28: Database Indexes

•  SQL  Server  adds  4-­‐byte  uniquifier  to  each  duplicated  key  value

•  Algorithms  could  differ  from  vendor  to  vendor •  But  the  principle  is  the  same  –  add  something  to  make  them  unique

Non-unique Clustered Index

Page 29: Database Indexes

•  Just  omitng  Unique  keyword  makes  Key  values  bigger  (why  it’s  bad  realize  later)

•  The  simple  truth  is  that  Each  table  should  have  Clustered  Index

•  The  Clustered  Index  should  be  always  Unique

•  The  situaJons  when  its  not  so  should  be  excepJonal

Clustered Indexes

Page 30: Database Indexes

•  Such  tables  are  called  Heap  Tables •  How  are  they  stored  in  database  if  they  do  not  have  a  Key  value  specified?

Tables without Clustered Index

Page 31: Database Indexes

•  Heap  Tables  are  also  stored  in  Trees

•  What’s  in  a  Key  value  for  Tables  without  Clustered  Index?

•  The  value  called  RID •  the  unique  idenJfier  which  refers  to  the  physical  locaJon  of  the  record  in  a  file

No magic over here

Page 32: Database Indexes

•  There  is  no  meaningful  data  in  Keys •  Table  records  are  not  stored  physically  in  Keys’  order

Why Heap Tables are so bad

Page 33: Database Indexes

•  Clustered  Index  has  the  actual  data  columns  in  Leaf-­‐nodes

•  What’s  in  Leaf-­‐node  of  Non-­‐clustered  index?

•  Remember  that  Non-­‐clustered  Indexes  are  duplicated  data

Non-clustered Indexes

Page 34: Database Indexes

Jane Lookup  value:  ID=2

Michael Lookup  value:  ID=1

Scoe Lookup  value:  ID=3

•  Leaf-­‐nodes  contain  the  lookup  values •  Lookup  value  is  Clustered  Index’s  Key

Non-clustered Index

Page 35: Database Indexes

•  We  know  Key  values  should  be  unique

•  How  non-­‐clustered  index’s  key  becomes  unique?

Non-unique Non-clustered Index

Page 36: Database Indexes

•  SQL  Server  adds  Clustered  Index  Key  value  to  Non-­‐clustered  Index  Key  value  to  make  it  unique

Jane,  2 Lookup  value:  ID=2

Michael,  1 Lookup  value:  ID=1

Scoe,  3 Lookup  value:  ID=3

Non-unique Non-clustered Index

Page 37: Database Indexes

•  from  SELECT  statement  the  WHERE  condiJon  is  taken

•  based  on  the  Columns  in  WHERE  we  know  what  columns  we  search  by

•  look  through  available  indexes  trying  to  find  the  appropriate  one,  starJng  from  Clustered

•  found  out  non-­‐clustered  index  which  fits  best

How indexes are used (1)

Page 38: Database Indexes

•  get  the  needed  Node  in  Non-­‐clustered  index •  get  the  Lookup  value  from  that  Node •  use  that  lookup  value  to  find  a  record  in  Clustered  index

•  get  selected  columns  from  Clustered  index  (table  itself)

How indexes are used (2)

Page 39: Database Indexes

•  Unique  Clustered  Index  on  Id  column •  Non-­‐unique  Non-­‐clustered  Index  on  City  column

•  Select  UserName  from  tbl  where  City  =  ‘Boston’

Sample 1

Page 40: Database Indexes

•  Unique  Clustered  Index  on  Id  column •  Non-­‐unique  Non-­‐clustered  Index  on  City  column

•  Select  Id  from  tbl  where  City  =  ‘Boston’

Sample 2

Page 41: Database Indexes

•  Unique  Clustered  Index  on  Id  column •  Non-­‐unique  Non-­‐clustered  Index  on  City  column

•  Select  UserName  from  tbl  where  City  =  ‘Boston’  select  should  not  go  to  Clustered  Index

Sample 3

Page 42: Database Indexes

•  Unique  Clustered  Index  on  Id,  UserName  column •  Select  Id  from  tbl  where  City  =  ‘Boston’  and  UserName  =  ‘Michael’

•  What  columns  Non-­‐unique  Non-­‐clustered  Index  would  include?

Sample 1

Page 43: Database Indexes

WE  ARE  SPERASOFT  

DELIVERING  AMAZING  PRODUCTS

Follow  us:    @SperasoA  

 

hCp://www.sperasoA.com