The Beauty of Informix Disk Structures Presented by Frédéric Delest Written by Andreas Legner

Post on 27-May-2015

What to expect

• On-disk persistence of an Informix Server instance

• Touch on layout of spaces and chunks

• Pages and page types

• How’s your data stored in partitions

• Ways to look at what’s on disk

• Hands-on

– Finding a specific row of data in your server instance

• Your questions answered

– Many things are only vaguely documented nowadays, so I wonder what you still know ;-)

• Hope there’s something new for everyone!

We’ll be talking…

• Partitions

– What all your tables and indices consist of

– Even things like sequences or timeseries

• Pages

– What a whole instance is based upon

– Changed heavily over time, and still remained the same

• Dbspaces & Chunks

– Two very persistent species as well

– Through all evolution since the earliest versions of Informix

• Physical & Logical Logs

– How old is your oldest physical or logical log file?

• All this supports an ever-growing, heavily expanding set of functionality

– Allowing for extremely seamless, reliable, inexpensive and fast migration from v7 through v11.7 (and back)

Ain’t this Beauty? Simplicity designed for sustainability.

Test Environment

• Informix Virtual Appliance

– Same as used for other sessions

• The main demo instance:

– INFORMIXDIR=/opt/IBM/informix

– INFORMIXSERVER=demo_on

– ONCONFIG=onconfig.demo_on

– ROOTPATH /data/IBM/informix/demo/demo_on/online_root

Jump right into it

• What makes up an Informix server instance when it’s down?

– $INFORMIXDIR & $ONCONFIG

– onconfig: root chunk info

– Chunks

• A chunk

– A device (“raw”)

– A file (“cooked”)

– Actually a contiguous portion of either

  • Starting at an offset

  • Reaching <size> kilobytes further

  • NOT initialized as a whole

  • Unless newly created as a (cooked) file: filled up with zero bytes. Can’t use ‘sparse files’, for obvious reasons

  • Only the first and third pages (0 + 2) are initialized

The Root Chunk

• The Root Chunk is the only chunk initially

– Making up the root dbspace (usually “rootdbs”)

– Holding everything required

– In a specific order

• … and will remain the key entry point to e.g. all other chunks

• Begins with the so-called “root reserved pages”

– Starting from here, anything else can be found

• Followed by a single chunk free list page

– Every chunk logically begins with a chunk free list page recording its free space

– Only blob chunks (chunks of a blobspace) don’t have these; they are a totally different kind

• Followed by the dbspace’s master partition

– “partition partition” or “TBLSpace TBLSpace”

• (Almost) anything beyond this can change

– Database partition <– this would never move

– The physical log

– Initial logical logs

– System and user databases …

Dbspaces/Sb(lob)spaces

• Up to m logical collections of 1 – n chunks each

– We’ll see what m and n can be

• Home of

– Partitions in case of dbspaces and, partially, sbspaces

– Sblobs in case of sbspaces

– Blobs in case of blobspaces

• Minimum entity of a backup or restore

• ‘Critical’

– Rootdbs or

– Dbspace containing the physical or any logical log

– Must be contained in any dbspace backup or level-0 restore

A Fresh Instance

For newbies

(or others still wishing to know – do this whenever you want to test something):

• Let’s create a new baby instance:

– INFORMIXSERVER=baby

– ONCONFIG=onconfig.$INFORMIXSERVER

– Copy $INFORMIXDIR/etc/onconfig.std to $INFORMIXDIR/etc/$ONCONFIG

– Edit the new config file:

  • ROOTPATH /tmp/root_chunk.baby

  • Lower ROOTSIZE, PHYSFILE and LOGSIZE by a factor of 10

  • MSGPATH $INFORMIXDIR/online.baby.log

  • SERVERNUM 123

  • DBSERVERNAME baby

– Add an entry to $INFORMIXDIR/etc/sqlhosts (unset INFORMIXSQLHOSTS):

  • baby onsoctcp localhost 9876

– oninit -ivy to initialize the new instance on disk

– onstat -d to see the chunks and dbspaces we have: one only

– onstat -m -r 2 to see when system database creation is done

oncheck -p…

• oncheck’s -p option adds printing to checking

– DBA’s first choice for looking at disk objects

– -pr|R for printing reserved pages

– -pP for locating pages physically, taking chunk# and page offset (base pages)

– -pp for locating pages logically within a partition, taking partnum and logical page#

– -pe for extent listing

– -pt|T for printing partition pages

– -pd|D|k|K|l|L for data and index pages

• Some options only work when the server is up

– Especially when needing more detail than just a chunk

• Others first attempt a connection

– Might have to wait up to $INFORMIXCONTIME seconds (default: 60) when the server is down

• When the server is up, oncheck will always go through the server

– Hence it shows you buffer cache content rather than reading from disk

First Peek at a Chunk

• Do an ‘oncheck -pe [rootdbs]’

– Extent listing

  • We’ll clarify “extents” later

– Can limit output to a specific space

  • Not any further … so it can be big

– Only available online (or quiescent)

– And with all the space’s chunks online (!)

  • Won’t work if one chunk in the space is down

• Try and locate the objects mentioned so far

oncheck -pe

DBspace Usage Report: rootdbs          Owner: informix  Created: 01/26/2011

Chunk Pathname                Pagesize(k)  Size(p)  Used(p)  Free(p)
    1 /tmp/rootchunk.baby               2   100000    52256    47744

Description                           Offset(p)  Size(p)
------------------------------------- --------- --------
RESERVED PAGES                                0       12
CHUNK FREELIST PAGE                          12        1
rootdbs:'informix'.TBLSpace                  13      250
PHYSICAL LOG                                263    15000
LOGICAL LOG: Log file 1                   15263      500
LOGICAL LOG: Log file 2                   15763      500
...

• ‘p’ is pages – base unit of a chunk

• First 3 items are always the same

– Root reserved pages

– Chunk’s first chunk free list page

– TBLSpace TBLSpace’s first extent

• All 3 can have an “extension”
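The offsets and sizes in the listing above line up without gaps: each extent starts where the previous one ends, and used plus free pages add up to the chunk size. A minimal Python sketch of that arithmetic follows; the numbers are copied from the listing, while the names `EXTENTS` and `find_gaps` are made up for this illustration and are not part of any Informix tool.

```python
# (description, offset, size) in pages, copied from the oncheck -pe listing above.
EXTENTS = [
    ("RESERVED PAGES", 0, 12),
    ("CHUNK FREELIST PAGE", 12, 1),
    ("rootdbs:'informix'.TBLSpace", 13, 250),
    ("PHYSICAL LOG", 263, 15000),
    ("LOGICAL LOG: Log file 1", 15263, 500),
    ("LOGICAL LOG: Log file 2", 15763, 500),
]

# Chunk totals from the same listing, in pages.
CHUNK_SIZE, USED, FREE = 100000, 52256, 47744


def find_gaps(rows):
    """Return (expected_offset, actual_offset) pairs where extents are not adjacent."""
    gaps = []
    expected = rows[0][1]
    for _desc, offset, size in rows:
        if offset != expected:
            gaps.append((expected, offset))
        expected = offset + size  # the next extent should start right here
    return gaps
```

Calling `find_gaps(EXTENTS)` on the rows shown reports no gaps, and `USED + FREE == CHUNK_SIZE` holds for this chunk.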

Pages and Page Sizes

• A chunk is made up of pages

• Base I/O unit is a page

– Data and index buffering also occurs in pages

• 2 kB entities (4 kB on AIX and Windows) by default

– Mandatory page size on “critical dbspaces”: the root dbspace or any dbspace holding the physical or any logical log

• Configurable page size for other, non-critical dbspaces

– Per dbspace

– At dbspace creation time

– In multiples of the default page size, up to 16k

• Different game in blobspaces and sbspaces

– Blobspaces always had freely choosable page sizes (multiples of the base page size)

– Sbspaces use the default (base) page size … no matter what people (or Informix installers) keep telling you ;-)

How to look at a page?

• oncheck -pP <chunk_no> <page_offset> [#pgs] [-h]

– Prints the page header

– Prints the page slot table and slots if applicable

  • Unless -h (headers only) is specified

– <#pgs> to see multiple pages

  • (not working yet with non-default page size)

– Requires <page_offset> specified in base (default) pages!

• SMI:

– sysrawdsk: look at pages as raw space

– syspaghdr: look at page headers only

– Both indexed, but not very smart, e.g. can’t make good use of <=/</>/>=

– Use base pages for the offset!

– Use carefully: not too safe, especially with non-default page size!

• onstat: when pages are in memory

• dd / od / …

– The latter two provide a more ‘natural’ image of a page
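Since oncheck -pP and the SMI tables want offsets in base pages, a page number counted in a larger page size has to be converted first. The Python sketch below shows that conversion and the list of page sizes implied by "multiples of the default, up to 16k". It assumes pages in a chunk are laid out back to back from the chunk start; both function names are invented for this example.

```python
def valid_page_sizes_kb(base_kb=2, max_kb=16):
    """Page sizes (in kB) that are multiples of the base page size, up to max_kb."""
    return list(range(base_kb, max_kb + 1, base_kb))


def to_base_pages(page_no, page_size_kb, base_kb=2):
    """Convert a page number counted in page_size_kb pages into a base-page offset.

    Assumption: pages are stored back to back, so page N of an 8k-page chunk
    starts N * 4 base pages (of 2k) into the chunk.
    """
    if page_size_kb % base_kb != 0:
        raise ValueError("page size must be a multiple of the base page size")
    return page_no * (page_size_kb // base_kb)
```

For example, page 10 of a chunk in an 8k dbspace on a 2k platform would be addressed as base page `to_base_pages(10, 8)`.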

Page Structure

• (Almost) every used page has

– a 24-byte page header

– a trailing stamp (last 4 bytes)

• When header and stamp match, the page is considered consistent in itself

– At least it has been written completely

– A checksum mechanism is used nowadays; it used to be two stamps that needed to match

• Page content is usually organized in slots

• Slot table

– Growing from the page end

– Entries describing slots

• Unused pages

– No structure or consistency assumed

• What is ‘unused’?

– Not allocated to any object, so FREE in the chunk

– Or beyond its object’s “npused” (# pages used)

Some Pages Now

• Try this now:

– oncheck -pP 1 0 12 > first12.pgs

• Find

– Page headers

– Slot tables and entries

– Slots

• What is it that we’re looking at?

• Try to dump the same using ‘dd’ and/or ‘od’

– dd if=$ROOTCHUNK bs=2k count=12 | od -A x -t x > first12.hex
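The same raw dump can be sketched in Python, for those who prefer to post-process pages programmatically. This is only an illustration of what dd and od do above: it reads whole pages straight from the chunk file, assuming a 2k base page size and a cooked-file chunk you are allowed to read. The function names and the `chunk_offset_kb` parameter are made up for this sketch.

```python
def read_pages(path, first_page, count, page_size=2048, chunk_offset_kb=0):
    """Read `count` raw pages from a chunk file, starting at page `first_page`.

    chunk_offset_kb is the chunk's offset into the file or device in kilobytes
    (0 for a chunk starting at the beginning of the file).
    """
    start = chunk_offset_kb * 1024 + first_page * page_size
    with open(path, "rb") as chunk:
        chunk.seek(start)
        data = chunk.read(count * page_size)
    # Split the buffer into page-sized pieces.
    return [data[i:i + page_size] for i in range(0, len(data), page_size)]


def hexdump(page, width=16):
    """Format a page as lines of 'offset  hex bytes', similar in spirit to od."""
    lines = []
    for off in range(0, len(page), width):
        row = page[off:off + width]
        lines.append(f"{off:06x}  " + " ".join(f"{b:02x}" for b in row))
    return "\n".join(lines)
```

Something like `read_pages("/tmp/root_chunk.baby", 0, 12)` would then return the same 12 pages that the dd command extracts.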

Page Header Fields

• Page header size: 24 bytes

• Fields (no longer documented), with their size in bytes:

– Chunk:Offset (OOOOOOOO CCCC): 4 + 2

– Checksum (ssss): 2

– N2k (n:5): 2:5 (5 bits of a 2-byte field)

– Nslots (ssss:11): 2:11 (the other 11 bits of that field)

– Flags/Type (FFFF): 2

– Free Pointer (ffff): 2

– Free Counter (cccc): 2

– Next Page (NNNNNNNN): 4

– Previous Page (PPPPPPPP): 4
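The field list adds up to exactly 24 bytes if N2k and Nslots share one 2-byte field. Below is a hedged Python sketch of how such a header could be unpacked from a raw page. It rests on several assumptions not confirmed by the slides: little-endian byte order (Linux on x86), fields stored in the order listed, and Nslots in the low 11 bits with N2k in the high 5 bits. Treat it as a starting point for experiments and compare against oncheck -pP output.

```python
import struct
from collections import namedtuple

PageHeader = namedtuple(
    "PageHeader",
    "offset chunk checksum n2k nslots flags free_ptr free_cnt next_page prev_page",
)

# Assumed layout: 4-byte offset, 2-byte chunk, 2-byte checksum,
# 2 bytes shared by n2k/nslots, 2-byte flags, 2-byte free pointer,
# 2-byte free counter, 4-byte next page, 4-byte previous page.
HEADER_FMT = "<IHHHHHHII"
HEADER_SIZE = struct.calcsize(HEADER_FMT)  # 24 bytes


def parse_page_header(page):
    """Unpack the first 24 bytes of a raw page under the assumptions above."""
    (offset, chunk, checksum, packed, flags,
     free_ptr, free_cnt, next_page, prev_page) = struct.unpack_from(HEADER_FMT, page, 0)
    nslots = packed & 0x07FF  # assumption: low 11 bits
    n2k = packed >> 11        # assumption: high 5 bits
    return PageHeader(offset, chunk, checksum, n2k, nslots, flags,
                      free_ptr, free_cnt, next_page, prev_page)
```

If the decoded chunk and offset do not match the page you asked for, one of the layout assumptions is wrong for your platform.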

Page Types

• Many different page types

– oncheck -pp|P names them in the page header portion of its output

– Encoded in the lower bits of the page flags

• ROOTRSV: root (and extended) reserved pages, recording system configuration

• CHUNK: chunk free list pages, recording FREE extents; first one always at fixed position 2 in a chunk; chained if one doesn’t suffice

• FREE: partition free bitmap, recording pages’ use state within a partition; at fixed intervals within a partition; first one always logical page 0

• PARTN/SECPARTN: partition pages and secondary partition pages; a partition’s details, incl. in-place alter history

• DATA/REMAIN: table data row and overflow (remainder) pages

• BTREE: btree index page (root/twig/leaf node)

• PBLOB: partition blob page

• BLOB/BMAP/BBIT: blobspace pages

Slots

• Page content is normally organized in slots

– Only a few page types don’t need real slots (chunk FREE list, bitmap, plog marker, any sort of blobspace pages …)

• Slot

– A contiguous range of bytes within a page

– With a 2*2-byte slot table entry describing it

  • Slot begin and slot size, optional slot flags

– Space consumption of a slot: slot size + 4

– Slot size can be zero: deleted slot

– Slot table size, growing from page end: page’s #slots * 4

• A page can have up to 2k slots

– E.g. large index pages can have this many

– Certain pages have much lower limits, for various reasons

  • DATA, REMAINDER, PBLOB: max. 255 slots; reason: ROWIDs (we’ll see later)

  • Reserved pages: only a few (tens); reason: slot vs. page sizes
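The slot arithmetic above can be put into a few lines of Python. The sketch uses only figures from the slides (24-byte header, 4-byte trailing stamp, 4 bytes of slot table per slot, 255 slots at most on a DATA page) and assumes nothing else occupies the page. The function names are invented for this example, and the results are estimates, not what the server guarantees.

```python
HEADER_BYTES = 24      # page header
STAMP_BYTES = 4        # trailing stamp
SLOT_ENTRY_BYTES = 4   # one 2*2-byte slot table entry per slot
MAX_DATA_SLOTS = 255   # limit for DATA, REMAINDER and PBLOB pages


def free_bytes(page_size, slot_sizes):
    """Bytes left on a page holding slots of the given sizes (0 = deleted slot)."""
    used = sum(size + SLOT_ENTRY_BYTES for size in slot_sizes)
    return page_size - HEADER_BYTES - STAMP_BYTES - used


def max_rows_per_data_page(page_size, row_size):
    """Estimate how many fixed-size rows fit on one DATA page."""
    by_space = (page_size - HEADER_BYTES - STAMP_BYTES) // (row_size + SLOT_ENTRY_BYTES)
    return min(by_space, MAX_DATA_SLOTS)
```

With very short rows the 255-slot limit, not the page size, becomes the bound, which is one reason larger page sizes do not help such tables much.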

Reserved Pages

• Try this:

– oncheck -pr > first12.txt

• Compare to what we’ve dumped earlier

– Formatting those 12 “reserved pages”

• We’re seeing:

– Page Zero: version information primarily

– Onconfig params and values (not all)

– Physical/Logical log definitions, and last Checkpoint details

– Dbspace definitions

– Chunk definitions

– Archive details and Data Replication status

– Yet not all of them are displayed

  • Some are paired, for recoverability reasons

  • Only the more recent of a pair is taken

– In a larger instance many more are displayed …

  • But not mentioned individually, as extra (extended) reserved pages

  • The initial 12 can only hold a very limited amount of detail

Reserved Pages Extension

• Logical logs, dbspaces and chunks can be many

• To accommodate their definitions, reserved pages can be extended

• Extensions for each sort are always in contiguous blocks

– Within “rootdbs” chunks

• Root reserved page pointing to its extension

– pg_next: start page

– pg_prev: extension size

[Figure: Root Reserved Pages (Zero, Config, Ckpt1, Ckpt2, Dbsp1, Dbsp2, PChunk1, PChunk2, MChunk1, MChunk2, Arch1, Arch2) and their Extended Reserved Pages (more logical logs, more space specs, more pchunk specs, more mchunk specs)]

Extents

• Contiguous sets of pages allocated to a certain purpose

– E.g. to a partition, or forming a log file

• Within one chunk

• Arbitrary size: 1 page up to (almost) chunk size

• oncheck -pe: lists all extents of a dbspace (or the whole instance)

• See also the sysextents SMI table

Sorts of Extents

• Possible extents:

– Reserved pages: root and extensions

– Chunk free list pages: single-page extents

– Physical log: 1 large extent

– Logical logs: 1 extent each

– Partition extents: data/index partitions consist of 0 to many extents

– Unused areas of a chunk: FREE extents

• So what needs to be read to compile a complete extent list?

– Reserved pages (for log files)

– Chunk free lists

– Partition pages

Partitions

• Partitions form the containers for database objects, recorded by their Partnum or Fragid in the database catalogs

– Tables (and their fragments)

– Indices (and their fragments)

– Sequences, relying on a partition’s ability to generate serial values

– Even external tables possess a (dummy) partition, for having a partnum

– Sbspace metadata

• Thinking of a partition as a ‘file’ (containing the partition data)

– The partition (header) page would be the ‘inode’

– Partition extents would be ‘blocks’

– The dbspace would be the ‘file system’

Partitions (cont.)

• A partition (“tablespace”) consists of

– Its partition header page

  • Holding the details that describe the partition

  • Potentially extending to secondary partition pages

– A collection of allocated extents

• Partitions reside in a (db-/sb-)space, one abstraction level above chunks

– Their extents reside in the space’s chunks

• All partitions of a space are recorded, by their partition header pages, in the space’s Partition Partition

– aka “TBLSpace TBLSpace”

– The space’s master partition, the very first one

– Holding the space’s partition pages

What’s a Partnum?

• Visualizing a dbspace first:

Dbspace:
DbsNo rp off flags 1.chk #chks flags (b)pg_sz name
4     0  354 60001 4     3     N--BA 1        datadbs

Primary chunks:
chkno rp off dbsno nxchk offset fpage #bpages #freepgs ovhd  flags pg_sz path
4     0  39c 4     5     0      -     1000    0        30040 PO-B  2048  /data/IBM/informix/demo/demo_on/datadbs_1
5     0  4c8 4     6     0      -     2500    2        30040 PO-B  2048  /data/IBM/informix/demo/demo_on/datadbs_2
6     0  5f4 4     0     0      -     4000    270      10040 PO-B  2048  /data/IBM/informix/demo/demo_on/datadbs_3

[Figure: dbspace datadbs spanning three chunks (…/datadbs_1, …/datadbs_2, …/datadbs_3), showing the reserved pages, FREE space and free list, the Tblspace tblspace with its logical pages (0, 99, 100, 199, …) and the partitions Table_1 to Table_4; partnums shown: 0x00400001, 0x00400005, 0x0040005b, 0x00400062, 0x004000c2]

So … What’s a partnum?

• A partnum is a 4-byte integer

– Uniquely identifying a partition

– Split into a 1.5-byte “dbspace number”

– And a 2.5-byte “logical page number”

– Hex representation: 0xdddlllll

• What does this mean?

– Each dbspace can hold partitions (TBLSpaces)

– It always holds a master partition (TBLSpace TBLSpace)

– All other partitions are recorded in this master partition

– The master partition only contains partition header and secondary pages

– Each partition header page describes one partition

– The ‘lllll’ fraction of a partition’s partnum is the number (position) of its partition header page within the dbspace’s (‘ddd’) TBLSpace TBLSpace

• What special partnum then is 0xN00001?

– TBLSpace TBLSpace’s own partnum for dbspace ‘N’
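The 0xdddlllll layout translates directly into a shift and a mask. A minimal Python sketch, with function names invented for this illustration:

```python
def split_partnum(partnum):
    """Split a partnum into (dbspace number, logical page in TBLSpace TBLSpace)."""
    dbspace = partnum >> 20        # top 12 bits: 'ddd'
    logical_page = partnum & 0xFFFFF  # low 20 bits: 'lllll'
    return dbspace, logical_page


def make_partnum(dbspace, logical_page):
    """Build a partnum from its two parts (the inverse of split_partnum)."""
    return (dbspace << 20) | (logical_page & 0xFFFFF)
```

So `split_partnum(0x400079)` yields dbspace 4 and partition header page 0x79, and `make_partnum(4, 1)` gives 0x00400001, the TBLSpace TBLSpace of dbspace 4.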

Looking at a Partition Page

• oncheck -pt|T db:owner.table[,dbs] | partnum

• Finds the desired partition header page(s)

• Tells you the following, as recorded in those pages

– General partition info: slot 1

  • Partnum, date, flags, rowsize, …

– Extents allocated to this partition: slot 5

– Possibly a pointer to the partition’s current compression dictionary: slot 7

– The partition name printed is NOT taken from the partition page; it is determined from the catalogs instead

• Specifying a partnum will target only this one partition page

– Will attempt to resolve the partition name by querying systables

• Otherwise all partitions of the specified table are targeted

– A single data partition, or multiple in case of a fragmented table

– Index partitions: each index normally has its own partition (detached)

• -pT: will scan an entire (set of) partition(s) to gather page statistics

– Index/Data/Bitmap page types and usage

– Index usage reports

– In-place alter versions

– Only works with the server running

Partition Page ‘raw’

• oncheck -pp 0x<N>00001 <L>

– What’s the difference?

– Not formatted as a partition page, but “complete” instead ;-)

• Try and compare the following:

– oncheck -pt 0x100001

– oncheck -pp 0x100001 1

– To what extent are these the same?

– To what extent are they different?

Find a specific Data Row now

• Given a specific row in a fragmented table

– dbname:[owner.]tabname[,fragdbs|%partition]:rowid

– or a partnum:rowid combination, e.g. from a log record

– What would it take to get to that row manually?

• First let’s learn what’s to be done under the hood

• Let’s assume the partnum is known already

– Can be obtained from systables (systables.partnum) or sysfragments

– Let’s say: partnum 0x400079, rowid 0x00000a01

So what’s a Rowid ?

• A rowid describes the precise location of a row within a partition/fragment:

– 0xppppppss: 4-byte integer

– High 3 bytes: logical page number within the partition

– Low byte: slot number within the page

• Not to be confused with the “WITH ROWID” shadow column (fragmented table)

– A real number assigned to a row

[Figure: a partition made up of extents (1st to 5th, …) with logical pages 0, 4, 8, … and a bitmap page; one page is enlarged to show its page header and slots 1 … n, with rowid 0xa01 pointing into it]
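Decoding a rowid of the form 0xppppppss is again a shift and a mask. A minimal Python sketch, with function names made up for this illustration:

```python
def split_rowid(rowid):
    """Split a rowid into (logical page number within the partition, slot number)."""
    logical_page = rowid >> 8  # high 3 bytes: 'pppppp'
    slot = rowid & 0xFF        # low byte: 'ss'
    return logical_page, slot


def make_rowid(logical_page, slot):
    """Build a rowid from a logical page number and a slot number (1..255)."""
    if not 0 < slot <= 0xFF:
        raise ValueError("slot number must fit into one byte and be non-zero")
    return (logical_page << 8) | slot
```

For the example rowid 0x00000a01 this gives logical page 10 (0xa) and slot 1, which also shows why DATA pages are limited to 255 slots.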

Paths to Our Row’s Page (1)

So we need extent info for our partition (identified by partnum)

– We want to physically locate the page containing our row

– Either walk all the way on foot, via the partition pages

– Or pick from a formatted extent list

• Crawling: find the partition page for the partnum and use its extent list for translation

  • Dump the Tblspace Tblspace partition page: 4th page in the space’s first chunk; this is fixed

  • Slot 5 has the extent list (we’re on Linux, sorry for the wrong endianness)

  • Take the partnum’s “logical page” portion

  • Convert it to a physical address using the raw extent list found

  • Determine the location of the target partition page and dump it as well

  • Use that page’s raw extent list for translating your rowid into a physical page

Paths to Our Row’s Page (2)

• Walking: using a formatted extent list

  • Obtain an extent list (oncheck -pe)

  • Determine the table name (from the system catalog)

  • Find the extent matching your table (can be confusing if the table is fragmented)

  • OR: use the extent list in the ‘oncheck -pt <partnum>’ output

  • Calculate the precise physical location (extent start plus logical page difference)

• Driving:

– oncheck -pp <partnum> <logical_page>
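The "extent start plus logical page difference" calculation can be sketched as follows in Python. The extent list representation, (first logical page, (chunk, physical start page), size in pages), is a hypothetical one chosen for this example; in practice you would fill it from the extent list you obtained with oncheck. The numbers in the usage note are purely illustrative.

```python
def logical_to_physical(extents, logical_page):
    """Translate a partition's logical page number into (chunk, physical page offset).

    extents: list of (first_logical_page, (chunk, physical_start), size_in_pages)
    """
    for first_logical, (chunk, physical_start), size in extents:
        if first_logical <= logical_page < first_logical + size:
            # extent start plus logical page difference
            return chunk, physical_start + (logical_page - first_logical)
    raise ValueError(f"logical page {logical_page} is not covered by any extent")
```

With an illustrative list such as `[(0, (4, 53), 8), (8, (5, 200), 16)]`, logical page 10 falls into the second extent and maps to chunk 5, page 202; that pair is what oncheck -pP takes as arguments.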

The Row Finally

• oncheck will dump the page’s slots in raw hex format

– Pick the one your rowid is pointing to

• What’s easy to determine

– Does the row exist? No, if the slot is missing or zero length.

– Does the slot length fit the partition’s row length?

  • Might be shorter in case of variable-length data types.

• If you need to know what’s in this row

– E.g. the page can’t be read any more (inconsistent)

– No way around applying the table’s schema byte by byte

– Way beyond this 1-hour talk ;-)

Indirect / Incomplete Rows

• Row not fitting your schema?

– Too short somehow?

• Strange-looking slot length, way too large?

• High bit set in a DATA page slot length means

– The first 4 bytes in the slot are no DATA

– Instead they’re a forward pointer

– In the form of another 4-byte rowid (0xppppppss)

• Obviously an indirect row, or an initial piece of a row

– Need to look up its next/remainder piece

– Located on so-called REMAINDER pages

– A row can consist of multiple such pieces (32k max row length)

• What fun looking at such rows in their entirety!

Watch out for IPA!

• Row still not fitting our schema??

• DATA page header having strange value in its ‘page next’ field??

• Then we’re on an old version page!

– What’s that again?

– And can this even be combined with row indirection (multi-piece rows)? Sure it can!

• All rows on such a page don’t fit the table’s current schema

– Instead they’re in the shape of a previous schema this table had

– Potentially before a whole series of ALTER TABLE statements

– These ALTERs have been performed in place, with no real changes to the pages yet

• Some real dirty work starts here, again at our partition page

– There we learn about a series of secondary partition pages

– Keeping a memory of all outstanding in-place ALTERs

– The partition page’s pg_next field has the TBLSpace TBLSpace logical page# of the first such ALTER page

Compression

• Neither row indirection nor IPA can explain what my row looks like?

– Moreover, it looks like real garbage!

– And that slot length is an oddity: way too big

• Is this partition compressed?

– Consult the ‘oncheck -pt’ output, it would tell

• Is this row compressed?

– The slot length field would have its second-highest bit set

• Again, the next step would be our partition page

– Slot 7 has the pointer to the current compression dictionary

– oncheck -pt should also show this information

• Then uncompress the row using the compression dictionary

– Not here, not now …
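Both slot length oddities, the high bit for a forward pointer and the second-highest bit for a compressed row, can be tested with simple masks. The Python sketch below assumes the slot length is the 2-byte field from the slot table entry, that the remaining 14 bits carry the actual length, and that the forward pointer is stored little-endian (as on Linux x86). These are assumptions for illustration, not documented behavior, and the names are made up.

```python
import struct

FORWARD_POINTER_BIT = 0x8000  # high bit of the 2-byte slot length
COMPRESSED_BIT = 0x4000       # second-highest bit
LENGTH_MASK = 0x3FFF          # assumption: the remaining 14 bits are the length


def classify_slot_length(raw_length):
    """Decode the flag bits and the plain length from a raw slot length value."""
    return {
        "forward_pointer": bool(raw_length & FORWARD_POINTER_BIT),
        "compressed": bool(raw_length & COMPRESSED_BIT),
        "length": raw_length & LENGTH_MASK,
    }


def forward_rowid(slot_bytes):
    """Return the 4-byte rowid stored at the start of an indirect slot.

    Assumption: little-endian storage; on a big-endian platform use '>I'.
    """
    (rowid,) = struct.unpack_from("<I", slot_bytes, 0)
    return rowid
```

A raw slot length of 0x8010, for instance, would mean a 16-byte slot whose first 4 bytes are a forward pointer rather than row data.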

Questions?!?

11/16/2012 Template Presentation - Session Z99 37

Beauty of

Informix Disk Structures

Andreas Legner

[email protected]