
Database Recovery Mechanism For Android Devices

M.Tech Project Stage I Report
Submitted in partial fulfillment of the requirements
for the degree of
Master of Technology

by

Pratik Patodi
Roll No: 10305917

Under the guidance of
Prof. Deepak B. Phatak

Department of Computer Science and Engineering
Indian Institute of Technology, Bombay

Contents

1 Introduction
1.1 Overview

2 Android
2.1 The Android Architecture
2.2 Need For Database

3 SQLite File Structure
3.1 SQLite database file
3.2 Schema table
3.3 Table b-tree
3.4 Page structure and unallocated area
3.4.1 Page header
3.4.2 Cell

4 Recovery Mechanism
4.1 Log Based Recovery
4.1.1 Log
4.1.2 Shadow Paging
4.2 Recovery mechanism of SQLite
4.2.1 Using Rollback Journal
4.2.2 Using Write Ahead logging
4.3 Problem Statement
4.4 Solution

5 Adaptive Logging
5.1 Overview
5.2 Problem statement
5.3 Solution

6 Conclusion and Future Work
6.1 Conclusion
6.2 Looking Ahead


List of Figures

2.1 Android Architecture[1]

3.1 SQLite file structure and header first 32 bytes
3.2 SQLite b-tree structure
3.3 SQLite page structure[2]
3.4 Internal Page Header Contents
3.5 Internal Page Header Contents
3.6 Leaf cell structure

4.1 Rollback Journal of SQLite[3]

5.1 Two Update patterns[4]


Abstract

Android devices are becoming popular due to their lower cost and increased integration with Google services. The open-source Android SDK encourages the development of a variety of applications, and a considerable amount of data needs to be stored and organized for these applications. Flash memory is well adapted to storing confidential data and provides secure services in mobile devices. However, many cases of application failure can corrupt data in flash memory. This report compares the two most popular recovery algorithms used in database management systems, i.e. shadow paging and log-based recovery. The report also highlights the benefits of using shadow paging for flash memory devices, and provides an overview of the adaptive logging approach.

Chapter 1

Introduction

Android is a Linux-based open-source operating system developed by Google for mobile devices such as smartphones and tablets. Launched in 2007, Android held a 68% share of the global smartphone market by Q2 2012 [5]. Developers can build applications for Android, either free or on a proprietary basis, using the Android SDK. Developers can use public APIs, without any knowledge of Android internals, to make use of the device features and services provided by Android. Google has also established the "Google Play" store, which allows developers to distribute their applications to Android users.

Almost all of these applications need to store and update a considerable amount of data. The internal memory of Android devices is very limited; for example, the Aakash tablet has 512 MB. Hence only a small number of applications can reside in the device itself, and the rest reside on the SD card (flash memory), which serves as external storage. Flash memory is non-volatile, shock-resistant, and uses little power. Although flash memory is not as fast as RAM, it is hundreds of times faster than a hard disk in read operations. These attractive features make flash memory one of the best choices for portable information systems.

Android comes with SQLite as its native database engine. SQLite stores the data of every application in a separate file, and this file cannot be accessed by any other application.

Storing and retrieving data is not enough; recovery is one of the most important parts of any database. The recovery component records information about every update to the database and uses that information to return to a consistent state if something goes wrong. An important question that arises here is: "which recovery mechanism should be used?" Selecting an appropriate recovery mechanism helps preserve the attractive features of the device, or of the memory, on which the database engine runs. It can not only save space but also speed up both the commit operation and the recovery process.


1.1 Overview

Chapter 2 gives an overview of the architecture of the Android system and then highlights the need for a database in Android and the requirements that such a database should possess. Chapter 3 explains the file structure of SQLite (the native database engine of Android). Chapter 4 discusses various recovery mechanisms, including the recovery mechanism of SQLite (a log-based recovery mechanism). We also present the idea of implementing shadow paging in SQLite instead of the log-based recovery mechanism. Chapter 5 explains the concept of adaptive logging and its benefits.


Chapter 2

Android

2.1 The Android Architecture

Android is an open-source software stack for mobile devices, introduced by Google, that comprises an operating system, middleware and key applications, and is capable of running multiple application programs. It is a complete operating environment based upon the Linux 2.6 kernel that provides a universal set of powerful operating system services, a comprehensive library set, a rich multimedia user interface and phone applications.

The Android platform was created to enable developers to build new and innovative mobile application programs that make full use of all the functions of an Internet-connected handset. The Android SDK provides the tools and APIs necessary to begin developing applications on the Android platform using the Java programming language.

Developing applications on Linux requires several standard libraries such as libc. Android has rewritten some essential libraries; for example, "libc" is replaced with a new library called "bionic" [6].

Android is a multi-process system in which each application (and parts of the system) runs in its own process. It consists of four layers, as shown in Figure 2.1 [7].

Linux Kernel - Android relies on Linux version 2.6 for core system services such as security, memory management, process management, the network stack, and the driver model. The kernel also acts as an abstraction layer between the hardware and the rest of the software stack.

Libraries - Android includes a set of C/C++ libraries used by various components of the Android system. These capabilities are exposed to developers through the Android application framework.

Android Runtime - Android includes a set of core libraries that provides most of the functionality available in the core libraries of the Java programming language. Every Android application runs in its own process, with its own instance of the Dalvik virtual machine (VM). Dalvik has been written so that a device can run multiple VMs efficiently. The Dalvik VM executes files in the Dalvik Executable (.dex) format, which is optimized for a minimal memory footprint.

Figure 2.1: Android Architecture[1]

Application Framework - By providing an open development platform, Android offers developers the ability to build extremely rich and innovative applications. Developers are free to take advantage of the device hardware, access location information, run background services, set alarms, add notifications to the status bar, and much more.

Applications - Android ships with a set of core applications including an email client, SMS program, calendar, maps, browser, contacts, etc. All applications are written using the Java programming language.

2.2 Need For Database

Recent advances in processors, memory, storage, and connectivity have paved the way for next-generation applications that are data-driven, whose data can reside anywhere (i.e. on the server, desktop, devices, or embedded in applications), and that support access from anywhere (i.e. local, remote, over the network, in connected and disconnected fashion). Memory sizes have gone up and prices have come down significantly, increasing the amount of data and the number of applications that can reside on a device.

Computation platforms have extended to small intelligent devices like cellphones, sensors, smartcards, PDAs, etc. As new functionalities and features are added to these devices, the number of applications being developed for them has increased.

With advances in flash memory technology, large flash drives are available at reasonable prices, and computers with 32 GB flash drives are making their way into the market. Flash drives not only eliminate seek time and rotational latency but also consume significantly less power than conventional disk drives, making them ideal for portable devices. All this naturally leads to mobile, disconnected, and specialized application architectures. Application components can run on a mobile platform when the mobile device is provided with a database.

These new applications must be able to run on multiple tiers, ranging from devices to servers to the web, and would benefit from various existing database mechanisms. However, these database mechanisms (like query, indexing, and persistence) must be unlocked from traditional monolithic DBMSs and made available as embeddable components (e.g. DLLs) that can be embedded within applications, thereby enabling them to meet requirements like [8]:

• Embeddable in applications

• Small footprint

• Run on mobile devices

• Componentized DBMS

• Self-managed DBMS

• In-memory DBMS

• Portable databases

• No code in the database

• Synchronize with back-end data sources


Chapter 3

SQLite File Structure

Data-centric applications on Android devices use SQLite as the native database engine. SQLite is a small database engine, widely used in embedded devices and local application software. The spread of portable devices such as smartphones over recent years has contributed to the growing adoption of SQLite. SQLite is ANSI-C-based, open-source database software. Unlike other types of database systems, SQLite is usually used as a "local-only" database that saves all the results of database usage in a single file. When embedded in an application program, the SQLite engine performs direct read/write operations and manages them in a single file consisting of table, index, trigger, and view entries.

3.1 SQLite database file

The overall structure of an SQLite file is shown in Fig. 3.1. The file signature is 0x53 51 4c 69 74 65 20 66 6f 72 6d 61 74 20 33 00 (16 bytes), i.e. the ASCII string "SQLite format 3" followed by a NUL byte. SQLite files do not have a specific filename extension; therefore, an SQLite file is identified by the signature of the target file.
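The signature check can be illustrated with a short Python sketch that uses only the standard library. The function names `is_sqlite_file` and `header_page_size` are hypothetical. Besides the signature, the sketch reads the page size, which the SQLite file format stores as a 2-byte big-endian integer at offset 16 of the 100-byte database header.

```python
import struct

# 16-byte signature at offset 0: the ASCII text "SQLite format 3" plus a NUL byte.
SQLITE_SIGNATURE = bytes.fromhex("53514c69746520666f726d6174203300")


def is_sqlite_file(path: str) -> bool:
    """Identify an SQLite database by its signature, whatever its file name."""
    with open(path, "rb") as f:
        return f.read(16) == SQLITE_SIGNATURE


def header_page_size(path: str) -> int:
    """Page size: 2-byte big-endian integer at offset 16 of the file header."""
    with open(path, "rb") as f:
        header = f.read(100)  # the database file header occupies 100 bytes
    (size,) = struct.unpack(">H", header[16:18])
    return 65536 if size == 1 else size  # the stored value 1 stands for 65536
```

Because identification relies only on the first 16 bytes, the same check works for an application database whatever name or extension the application gave the file.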

The page that appears at the beginning of the file is called the header page. The upper part of the header page comprises the database file's header and its signature. The lower part is the schema table, which contains the table information of the database.

The following pages consist of b-trees, which are largely separated into index b-trees and table b-trees; the latter store the data contents.

In SQLite, the elementary unit of storage is a cell, and both the cell structure and the stored data content differ between internal and leaf pages. An internal page is found in the middle section of the tree, and its cells store pointers used to locate lower pages. A leaf page is located at the bottom of the tree, with no lower page below it. The cells in leaf pages contain the database records.


Figure 3.1: SQLite file structure and header first 32 bytes

3.2 Schema table

The schema table in the header page contains information about the tables, indexes, and triggers comprised in the SQLite file of an application. The schema type indicates whether a specific schema entry is a table, an index, or a trigger. The schema name section contains the generated schema name and then a copy of each name, followed by the root page number (1-byte integer), followed by a string holding the SQL statement that was sent to SQLite when the schema object was created.
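Because the schema table is itself an ordinary table named `sqlite_master`, its entries can be inspected with a query. The sketch below assumes Python's built-in `sqlite3` module; `schema_entries` is a hypothetical helper. The columns `type`, `name`, `tbl_name`, `rootpage` and `sql` correspond to the fields described above; triggers have no b-tree of their own, so their root page number is 0.

```python
import sqlite3


def schema_entries(conn: sqlite3.Connection):
    """Rows of the schema table: (type, name, tbl_name, rootpage, sql)."""
    return conn.execute(
        "SELECT type, name, tbl_name, rootpage, sql FROM sqlite_master"
    ).fetchall()
```

For a freshly created database, the first table's b-tree normally starts on page 2, right after the header page that holds the schema table.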

SQLite assigns each column a type affinity, one of INTEGER, TEXT, BLOB, REAL, and NUMERIC. Datatypes such as VARCHAR(125), BIG INT, and DATE that exist in standard SQL are mapped to one of these affinities according to fixed rules applied to the declared type name: VARCHAR(125) receives TEXT affinity, BIG INT receives INTEGER affinity, and DATE receives NUMERIC affinity.
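The mapping from a declared type to an affinity follows a fixed order of substring tests on the declared type name. The function below is an illustrative re-implementation of those rules, not code taken from SQLite; the name `column_affinity` is hypothetical.

```python
def column_affinity(declared_type: str) -> str:
    """Derive a column's type affinity from its declared type (case-insensitive)."""
    t = declared_type.upper()
    if "INT" in t:                                   # INT, BIGINT, BIG INT, ...
        return "INTEGER"
    if "CHAR" in t or "CLOB" in t or "TEXT" in t:    # VARCHAR(125), TEXT, ...
        return "TEXT"
    if "BLOB" in t or t.strip() == "":               # BLOB or no declared type
        return "BLOB"
    if "REAL" in t or "FLOA" in t or "DOUB" in t:    # REAL, FLOAT, DOUBLE, ...
        return "REAL"
    return "NUMERIC"                                 # everything else, e.g. DATE
```

For example, DATE falls through every test and receives NUMERIC affinity, so a value such as '2012' inserted into a DATE column is stored as an integer.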

3.3 Table b-tree

The purpose of a table b-tree is to store the actual data. The structure of a table b-tree is shown in Fig. 3.2. An internal page contains pointers to child pages, while the actual data are stored in leaf pages.

3.4 Page structure and unallocated area

All pages in an SQLite file, both internal and leaf pages, share the structure illustrated in Fig. 3.3. The page header is followed by a list of 2-byte big-endian integers (the cell offsets), each of which specifies the location of a cell that contains actual data. Cells are allocated from the bottom of the page upward, toward the header.


Figure 3.2: SQLite b-tree structure

The area between the cell offset list and the cells is referred to as free space, and it is filled with zeros when initially created. Besides the free space, there can be unallocated space within a leaf page called a free block: space that held a cell until the cell was deleted, which is maintained so that it can be reused later for a newly created cell.

3.4.1 Page header

3.4.1.1 Internal page header

An internal page header, shown in Fig. 3.4, consists of 12 bytes, where the first byte is a page flag with the value 0x05 (an internal page of a table b-tree). Bytes 1-2 are a 2-byte big-endian integer giving the offset of the first free block. Bytes 3-4 give the number of cells on the page, also as a 2-byte big-endian integer; a cell count of zero means the page holds no cells. Bytes 5-6 give the offset of the first cell, i.e. the start of the cell content area. Byte 7 gives the number of fragmented free bytes (free fragments of 3 bytes or smaller) as a 1-byte integer. The final bytes, 8 to 11, hold the number of the rightmost child page below the current page, as a 4-byte big-endian integer.


Figure 3.3: SQLite page structure[2]

3.4.1.2 Leaf page header

A leaf page header has a structure similar to that of an internal page header. The leaf page header consists of 8 bytes, where the first byte is a flag with the value 0x0D (a leaf page of a table b-tree). The rest of the content is identical to that of an internal page header; the only difference is that the last 4 bytes of an internal page header are omitted, as a leaf page has no child page.
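Both header layouts can be decoded with Python's `struct` module. The sketch below assumes the page has already been read into memory as `bytes`; the class and function names are hypothetical. It also computes the unallocated area described in Section 3.4 as the gap between the end of the cell offset list and the start of the cell content area. On page 1 the b-tree header begins at offset 100, after the database file header.

```python
import struct
from dataclasses import dataclass
from typing import Optional

# Page-type flag values (first byte of every b-tree page header).
PAGE_TYPES = {0x02: "interior index", 0x05: "interior table",
              0x0A: "leaf index", 0x0D: "leaf table"}


@dataclass
class PageHeader:
    flag: int
    first_freeblock: int            # bytes 1-2: offset of the first free block (0 = none)
    cell_count: int                 # bytes 3-4: number of cells on the page
    cell_content_start: int         # bytes 5-6: offset of the first cell
    fragmented_bytes: int           # byte 7: free fragments of 3 bytes or fewer
    rightmost_child: Optional[int]  # bytes 8-11: internal pages only

    @property
    def size(self) -> int:
        return 8 if self.rightmost_child is None else 12


def parse_page_header(page: bytes, offset: int = 0) -> PageHeader:
    """Decode a b-tree page header; pass offset=100 for page 1 of the file."""
    flag, freeblock, ncells, content, frag = struct.unpack_from(">BHHHB", page, offset)
    if flag not in PAGE_TYPES:
        raise ValueError(f"not a b-tree page (flag 0x{flag:02x})")
    rightmost = None
    if flag in (0x02, 0x05):  # internal pages carry the 4-byte rightmost pointer
        (rightmost,) = struct.unpack_from(">I", page, offset + 8)
    # A stored content offset of 0 is interpreted as 65536.
    return PageHeader(flag, freeblock, ncells, content or 65536, frag, rightmost)


def unallocated_bytes(page: bytes, offset: int = 0) -> int:
    """Free space between the end of the cell offset list and the first cell."""
    h = parse_page_header(page, offset)
    pointer_array_end = offset + h.size + 2 * h.cell_count
    return h.cell_content_start - pointer_array_end
```

A page is located in the file at byte `(page_number - 1) * page_size`, where the page size comes from the database header.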

3.4.2 Cell

SQLite uses a cell as the elementary unit of stored information. Cells within a leaf page of a table b-tree contain database records.

3.4.2.1 Internal cell

The cells in an internal page, as described in Fig. 3.5, are used to maintain the b-tree and hold pointers to child pages (child page numbers). The first four bytes of a cell are the child page number in big-endian format, followed by a key, the identification value of the cell, stored as a variable-length integer.
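Reading an internal cell requires decoding SQLite's variable-length integers: 1 to 9 bytes, big-endian, where the high bit of each of the first eight bytes signals that another byte follows and a ninth byte contributes all 8 of its bits. A minimal Python sketch follows; the function names are hypothetical, and for simplicity the decoder returns the value as an unsigned number, which is sufficient for page numbers and ordinary row IDs.

```python
import struct
from typing import Tuple


def read_varint(buf: bytes, pos: int) -> Tuple[int, int]:
    """Decode an SQLite variable-length integer; return (value, bytes used)."""
    value = 0
    for i in range(8):
        byte = buf[pos + i]
        value = (value << 7) | (byte & 0x7F)  # low 7 bits carry data
        if byte < 0x80:                       # high bit clear: last byte
            return value, i + 1
    return (value << 8) | buf[pos + 8], 9     # 9th byte: all 8 bits


def parse_interior_table_cell(page: bytes, cell_offset: int) -> Tuple[int, int]:
    """Internal table cell: 4-byte big-endian child page number, then a varint key."""
    (child_page,) = struct.unpack_from(">I", page, cell_offset)
    key, _ = read_varint(page, cell_offset + 4)
    return child_page, key
```

The key stored in an internal table cell is a row ID, so the cell effectively says "rows with IDs up to this key are found below this child page".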


Figure 3.4: Internal Page Header Contents

Figure 3.5: Internal Page Header Contents

Figure 3.6: Leaf cell structure

3.4.2.2 Leaf cell

In order to describe the records stored in a leaf cell, records are managed in three areas: the cell header, the record header, and the record data area (see Fig. 3.6). The cell header holds the length of the cell excluding the cell header, and the rowID, both as variable-length integers. The record header begins with the record header length in variable-length integer format, followed by one entry per field of the record that encodes the field's type and byte length. The record data area then holds the field values themselves.
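The three areas of a leaf cell can be walked with the sketch below. It assumes a table b-tree leaf page held in memory, a UTF-8 database encoding, and a record small enough to fit on the page without overflow pages; all names are hypothetical. Each record header entry is a "serial type", from which both the field's type and its byte length in the record data area are derived.

```python
import struct
from typing import Any, List, Tuple


def read_varint(buf: bytes, pos: int) -> Tuple[int, int]:
    """SQLite varint: up to 8 bytes of 7 bits (high bit = more), 9th byte 8 bits."""
    value = 0
    for i in range(8):
        byte = buf[pos + i]
        value = (value << 7) | (byte & 0x7F)
        if byte < 0x80:
            return value, i + 1
    return (value << 8) | buf[pos + 8], 9


def content_size(serial_type: int) -> int:
    """Number of bytes a field occupies in the record data area."""
    fixed = {0: 0, 1: 1, 2: 2, 3: 3, 4: 4, 5: 6, 6: 8, 7: 8, 8: 0, 9: 0}
    if serial_type in fixed:
        return fixed[serial_type]
    if serial_type >= 12:
        return (serial_type - 12) // 2  # even: BLOB, odd: TEXT
    raise ValueError(f"reserved serial type {serial_type}")


def decode(serial_type: int, raw: bytes) -> Any:
    if serial_type == 0:
        return None                                   # NULL
    if 1 <= serial_type <= 6:
        return int.from_bytes(raw, "big", signed=True)  # big-endian integer
    if serial_type == 7:
        return struct.unpack(">d", raw)[0]           # IEEE 754 double
    if serial_type in (8, 9):
        return serial_type - 8                       # the constants 0 and 1
    if serial_type % 2 == 0:
        return bytes(raw)                            # BLOB
    return raw.decode("utf-8")                       # TEXT


def parse_leaf_table_cell(page: bytes, cell_offset: int) -> Tuple[int, List[Any]]:
    """Return (rowid, field values) for one leaf cell."""
    pos = cell_offset
    _payload_len, n = read_varint(page, pos)   # cell header: length of the record
    pos += n
    rowid, n = read_varint(page, pos)          # cell header: rowID
    pos += n
    record_start = pos
    header_len, n = read_varint(page, pos)     # record header length (includes itself)
    pos += n
    serial_types = []
    while pos < record_start + header_len:     # one serial type per field
        t, n = read_varint(page, pos)
        pos += n
        serial_types.append(t)
    values = []
    for t in serial_types:                     # record data area
        size = content_size(t)
        values.append(decode(t, page[pos:pos + size]))
        pos += size
    return rowid, values
```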


Chapter 4

Recovery Mechanism

4.1 Log Based Recovery

Log-based recovery is one of the most widely used mechanisms. It stores log records for every update to the database and uses those records to return to a consistent state if something goes wrong.

4.1.1 Log

A log is a sequence of log records that records all the activities of the database. Log records can be of different types: a record marking the start of a transaction is represented as <Ti start>, where Ti is the id of the ith transaction; <Ti commit> says that transaction Ti has committed; and <Ti abort> says that transaction Ti has aborted [9]. Similarly, an update log record consists of:

• The id of the transaction that performed the update.

• The attribute that is being updated.

• The before-image and after-image of the attribute.

• Example: <Ti, Xj, V1, V2>, where V1 and V2 are the before and after values of attribute Xj.

Whenever a transaction updates the database, it is essential that log records for all its operations are generated and added to the log before the database is modified. Once the log record has been written, the modification can be performed on the database. We also have the ability to undo a modification that has already been applied to the database, by using the old-image (before-image) field in the log record.

For log records to be useful for recovery from system and disk failures, the log must reside in stable storage.
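The log records and the write-ahead rule described above can be modelled in a few lines of Python. This is a toy sketch with hypothetical names: the database and the log are in-memory structures standing in for disk and stable storage, every updated item is assumed to exist beforehand, and <Ti abort> records are not modelled. Recovery redoes committed updates from their after-images and undoes uncommitted ones from their before-images.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Union


@dataclass(frozen=True)
class Start:            # <Ti start>
    txn: str


@dataclass(frozen=True)
class Update:           # <Ti, Xj, V1, V2>
    txn: str
    item: str
    before: Any         # V1, the before-image
    after: Any          # V2, the after-image


@dataclass(frozen=True)
class Commit:           # <Ti commit>
    txn: str


LogRecord = Union[Start, Update, Commit]


class LoggedDatabase:
    def __init__(self, data: Dict[str, Any]):
        self.data = dict(data)            # stands in for the database on disk
        self.log: List[LogRecord] = []    # stands in for the log on stable storage

    def begin(self, txn: str) -> None:
        self.log.append(Start(txn))

    def write(self, txn: str, item: str, value: Any) -> None:
        # Write-ahead rule: append the log record BEFORE modifying the database.
        self.log.append(Update(txn, item, self.data[item], value))
        self.data[item] = value

    def commit(self, txn: str) -> None:
        self.log.append(Commit(txn))


def recover(data: Dict[str, Any], log: List[LogRecord]) -> Dict[str, Any]:
    """Redo committed transactions, then undo uncommitted ones from before-images."""
    committed = {r.txn for r in log if isinstance(r, Commit)}
    db = dict(data)
    for r in log:                          # redo pass, oldest record first
        if isinstance(r, Update) and r.txn in committed:
            db[r.item] = r.after
    for r in reversed(log):                # undo pass, newest record first
        if isinstance(r, Update) and r.txn not in committed:
            db[r.item] = r.before
    return db
```

If a transaction's update reached the database but its commit record did not, recovery restores the item from the before-image stored in the log.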


4.1.2 Shadow Paging

This recovery scheme does not require the use of a log. Shadow paging considers the database to be made up of a number of fixed-size disk pages (or disk blocks), say n, for recovery purposes. A page table with n entries is constructed, where the ith entry points to the ith database page on disk. The page table is kept in main memory if it is not too large, and all references (reads or writes) to database pages on disk go through it. When a transaction begins executing, the current page table, whose entries point to the most recent or current database pages on disk, is copied into a shadow page table. The shadow page table is then saved on disk while the current page table is used by the transaction.

During transaction execution, the shadow page table is never modified. When a write-item operation is performed, a new copy of the modified database page is created, but the old copy of that page is not overwritten. Instead, the new page is written elsewhere, on some previously unused disk block. The current page table entry is modified to point to the new disk block, whereas the shadow page table is not modified and continues to point to the old, unmodified disk block. For pages updated by the transaction, two versions are kept: the old version is referenced by the shadow page table, and the new version by the current page table.

To recover from a failure during transaction execution, it is sufficient to free the modified database pages and to discard the current page table. The state of the database before transaction execution is available through the shadow page table, and that state is recovered by reinstating the shadow page table. The database is thus returned to its state prior to the transaction that was executing when the crash occurred, and any modified pages are discarded. Committing a transaction corresponds to discarding the previous shadow page table. Since recovery involves neither undoing nor redoing data items, this technique can be categorized as a NO-UNDO/NO-REDO technique for recovery.
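A minimal Python sketch of the scheme follows, with a Python list standing in for the disk blocks; the class and method names are hypothetical. Writes never touch a block referenced by the shadow page table, aborting simply discards the current page table, and committing installs the current page table as the new shadow page table. On a real device, installing the new table must itself be a single atomic write (for example of a pointer to the page table); the sketch does not model that step.

```python
from typing import List, Optional


class ShadowPagedStore:
    """Toy shadow paging: page tables map page numbers to disk block numbers."""

    def __init__(self, pages: List[bytes]):
        self.disk: List[bytes] = list(pages)               # disk blocks
        self.shadow: List[int] = list(range(len(pages)))   # shadow page table
        self.current: Optional[List[int]] = None           # current page table
        self.new_blocks: List[int] = []                    # blocks written by the transaction
        self.free: List[int] = []                          # reusable disk blocks

    def begin(self) -> None:
        self.current = list(self.shadow)  # copy; the shadow table is never modified

    def read(self, page_no: int) -> bytes:
        table = self.current if self.current is not None else self.shadow
        return self.disk[table[page_no]]

    def write(self, page_no: int, data: bytes) -> None:
        if self.current is None:
            raise RuntimeError("no active transaction")
        block = self.current[page_no]
        if block in self.new_blocks:      # page already copied in this transaction
            self.disk[block] = data
            return
        if self.free:                     # reuse a freed block, never a shadow block
            block = self.free.pop()
            self.disk[block] = data
        else:                             # or take a previously unused block
            self.disk.append(data)
            block = len(self.disk) - 1
        self.current[page_no] = block
        self.new_blocks.append(block)

    def commit(self) -> None:
        if self.current is None:
            raise RuntimeError("no active transaction")
        # The current table becomes the new shadow table; replaced blocks are freed.
        replaced = [s for s, c in zip(self.shadow, self.current) if s != c]
        self.shadow, self.current, self.new_blocks = self.current, None, []
        self.free.extend(replaced)

    def abort(self) -> None:
        # NO-UNDO recovery: drop the current table and free the blocks it wrote.
        self.free.extend(self.new_blocks)
        self.current, self.new_blocks = None, []
```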

4.2 Recovery mechanism of SQLite

SQLite possesses two recovery mechanisms: the rollback journal and write-ahead logging. Only one of the two is used at a time, and the rollback journal is the default recovery mechanism.


4.2.1 Using Rollback Journal

4.2.1.1 Updating disk content

• Initially all the data reside on disk, and both the OS cache and main memory are empty.

• Acquiring A Read Lock: When a transaction wants to read data from the disk, it first has to obtain a shared lock on the database file. A "shared" lock allows two or more database connections to read from the database file at the same time, but it prevents another database connection from writing to the database file while we are reading it.

• Reading Information Out Of The Database: The data is first transferred from the disk to the OS cache, and user space (main memory) then reads it from the OS cache.

• Obtaining A Reserved Lock: A reserved lock is similar to a shared lock in that both allow other processes to read from the database file. A single reserved lock can coexist with multiple shared locks from other processes. However, there can only be a single reserved lock on the database file. Hence only a single process can be attempting to write to the database at one time.

• Creating A Rollback Journal File: Prior to making any changes to the database file, SQLite first creates a separate rollback journal file and writes into it the original content of the database pages that are to be altered. The rollback journal contains a small header that records the original size of the database file, so if a change causes the database file to grow, the original size is still known. The page number is stored together with each database page that is written into the rollback journal.

• Changing Database Pages In User Space: After the original page content has been saved in the rollback journal, the pages can be modified in user memory.

• Flushing The Rollback Journal File To Mass Storage: The next step is to flush the content of the rollback journal file to nonvolatile storage. The first flush writes out the base rollback journal content. Then the header of the rollback journal is modified to show the number of pages in the rollback journal, and the header is flushed to disk.

• Obtaining An Exclusive Lock: Prior to making changes to the database file itself, we must obtain an exclusive lock on the database file. Obtaining an exclusive lock is really a two-step process. First SQLite obtains a "pending" lock. Then it escalates the pending lock to an exclusive lock. A pending lock allows other processes that already have a shared lock to continue reading the database file, but it prevents new shared locks from being established. The idea behind a pending lock is to prevent writer starvation caused by a large pool of readers.

• Writing Changes To The Database File: Once an exclusive lock is held, we know that no other processes are reading from the database file, and it is safe to write changes into it. Usually those changes only go as far as the operating system's disk cache and do not make it all the way to mass storage.

• Flushing Changes To Mass Storage: Another flush must occur to make sure that all the database changes are written to nonvolatile storage.

• Deleting The Rollback Journal: After the database changes are all safely on the mass storage device, the rollback journal file is deleted, as shown in Fig. 4.1. This is the instant at which the transaction commits.

Figure 4.1: Rollback Journal of SQLite[3]

• Releasing The Lock: The last step in the commit process is to release the exclusive lock so that other processes can once again start accessing the database file.


4.2.1.2 When something goes wrong: Rollback

• Suppose a power loss occurred while the database changes were being written to disk. After power is restored, only part of the changes might have been written to the disk. Hence a rollback is required.

• Hot Rollback Journals: A hot journal exists only when an earlier process was in the middle of committing a transaction when it crashed or lost power. A rollback journal is a "hot" journal if all of the following are true:

– The rollback journal exists.

– The rollback journal is not an empty file.

– There is no reserved lock on the main database file.

– The header of the rollback journal is well-formed and in particularhas not been zeroed out.

– The rollback journal does not contain the name of a master journal file, or if it does contain the name of a master journal, then that master journal file exists.

• Obtaining An Exclusive Lock On The Database: The first step toward dealing with a hot journal is to obtain an exclusive lock on the database file. This prevents two or more processes from trying to roll back the same hot journal at the same time.

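• Rolling Back Incomplete Changes: The original content of the pages is read out of the rollback journal and written back to where it came from in the database file. If the incomplete transaction caused the database file to grow, the file is also truncated back to the original size recorded in the journal header.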

• Deleting The Hot Journal: After all information in the rollback journal has been played back into the database file (and flushed to disk in case we encounter yet another power failure), the hot rollback journal can be deleted.
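The commit and rollback sequences above can be simulated on an ordinary file. The sketch below is hypothetical and much simplified: it uses a tiny fixed page size and its own journal layout (original file size followed by page-number/page-image records), and it omits locking, the journal header's page count and directory syncing. It does preserve the essential ordering: the journal is flushed before the database is written, deleting the journal is the commit point, and a journal left behind by a crash is played back into the database.

```python
import os
import struct
from typing import Dict

PAGE_SIZE = 16  # hypothetical tiny page size to keep the example readable


def _write_durably(f) -> None:
    f.flush()
    os.fsync(f.fileno())  # push the OS cache down to mass storage


def journal_commit(db_path: str, changes: Dict[int, bytes],
                   crash_before_delete: bool = False) -> None:
    """Commit {page_no (1-based): new page bytes} using a rollback journal."""
    for data in changes.values():
        if len(data) != PAGE_SIZE:
            raise ValueError("pages must be exactly PAGE_SIZE bytes")
    journal_path = db_path + "-journal"
    with open(db_path, "rb") as f:
        original = f.read()
    # 1. Journal = original file size + (page number, original page image) records.
    with open(journal_path, "wb") as j:
        j.write(struct.pack(">Q", len(original)))
        for page_no in sorted(changes):
            start = (page_no - 1) * PAGE_SIZE
            old = original[start:start + PAGE_SIZE].ljust(PAGE_SIZE, b"\x00")
            j.write(struct.pack(">I", page_no) + old)
        _write_durably(j)  # 2. the journal reaches disk before the database changes
    # 3. Overwrite the pages in the database file itself, then flush.
    with open(db_path, "r+b") as f:
        for page_no, data in changes.items():
            f.seek((page_no - 1) * PAGE_SIZE)
            f.write(data)
        _write_durably(f)
    if crash_before_delete:
        return  # simulated power loss: the journal is left behind, "hot"
    os.remove(journal_path)  # 4. deleting the journal is the commit point


def rollback_hot_journal(db_path: str) -> bool:
    """Play a leftover journal back into the database; return True if one was found."""
    journal_path = db_path + "-journal"
    if not os.path.exists(journal_path) or os.path.getsize(journal_path) == 0:
        return False
    with open(journal_path, "rb") as j:
        blob = j.read()
    (original_size,) = struct.unpack_from(">Q", blob, 0)
    pos = 8
    with open(db_path, "r+b") as f:
        while pos + 4 + PAGE_SIZE <= len(blob):
            (page_no,) = struct.unpack_from(">I", blob, pos)
            f.seek((page_no - 1) * PAGE_SIZE)
            f.write(blob[pos + 4:pos + 4 + PAGE_SIZE])
            pos += 4 + PAGE_SIZE
        f.truncate(original_size)  # undo any growth of the database file
        _write_durably(f)          # flush before the journal disappears
    os.remove(journal_path)
    return True
```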

4.2.2 Using Write Ahead logging

The default method by which SQLite implements atomic commit and rollback is a rollback journal. Alternatively, a newer "Write-Ahead Log" option (hereafter referred to as "WAL") is available. There are advantages and disadvantages to using WAL instead of a rollback journal.
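The journal mode is selected per database with the `journal_mode` pragma, which replies with the mode actually in effect. The sketch below assumes Python's `sqlite3` module and a file-backed database; `set_journal_mode` is a hypothetical helper. On Android, `SQLiteDatabase.enableWriteAheadLogging()` serves the same purpose.

```python
import sqlite3

JOURNAL_MODES = {"DELETE", "TRUNCATE", "PERSIST", "MEMORY", "WAL", "OFF"}


def set_journal_mode(path: str, mode: str) -> str:
    """Switch the database at `path` to `mode`; return the mode now in effect."""
    mode = mode.upper()
    if mode not in JOURNAL_MODES:  # never splice arbitrary text into a PRAGMA
        raise ValueError(f"unknown journal mode: {mode}")
    conn = sqlite3.connect(path, isolation_level=None)
    try:
        return conn.execute(f"PRAGMA journal_mode={mode}").fetchone()[0]
    finally:
        conn.close()
```

WAL mode is persistent: once set, later connections to the same file open it in WAL mode until it is switched back to a rollback journal mode such as DELETE.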


4.2.2.1 Advantages:

• WAL is significantly faster in most scenarios.

• WAL provides more concurrency, as readers do not block writers and a writer does not block readers. Reading and writing can proceed concurrently.

• Disk I/O operations tend to be more sequential using WAL.

• WAL uses many fewer fsync() operations and is thus less vulnerable to problems on systems where the fsync() system call is broken.

4.2.2.2 Disadvantages:

• WAL normally requires that the VFS support shared-memory primitives. The built-in unix and windows VFSes support this, but third-party extension VFSes for custom operating systems might not.

• All processes using a database must be on the same host computer; WAL does not work over a network file system.

• Transactions that involve changes against multiple ATTACHed databases are atomic for each individual database, but are not atomic across all databases as a set.

• It is not possible to change the database page size after entering WAL mode, either on an empty database or by using VACUUM or by restoring from a backup using the backup API. You must be in a rollback journal mode to change the page size.

• It is not possible to open read-only WAL databases. The opening process must have write privileges for the "-shm" wal-index shared memory file associated with the database, if that file exists, or else write access on the directory containing the database file if the "-shm" file does not exist.

• WAL might be very slightly slower than the traditional rollback-journal approach in applications that do mostly reads and seldom write.

• There is the extra operation of checkpointing.

• WAL works best with smaller transactions and does not work well for very large transactions. For transactions larger than about 100 MB, traditional rollback journal modes will likely be faster. For transactions in excess of a GB, WAL mode may fail with an I/O or disk-full error. It is recommended that one of the rollback journal modes be used for transactions larger than about 40 MB.


4.2.2.3 How WAL Works

The traditional rollback journal works by writing a copy of the original, unchanged database content into a separate rollback journal file and then writing changes directly into the database file. In the event of a crash or ROLLBACK, the original content contained in the rollback journal is played back into the database file to revert it to its original state. The COMMIT occurs when the rollback journal is deleted.

The WAL approach inverts this. The original content is preserved in the database file, and the changes are appended to a separate WAL file. A COMMIT occurs when a special record indicating a commit is appended to the WAL. Thus a COMMIT can happen without ever writing to the original database, which allows readers to continue operating from the original, unaltered database while changes are simultaneously being committed into the WAL. Multiple transactions can be appended to the end of a single WAL file.
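The inversion can be illustrated with a toy in-memory model. `ToyWAL` is hypothetical and does not follow SQLite's WAL file format; it only shows that a commit appends frames (the last one carrying a commit marker) without touching the database, and that each reader sees the database plus the WAL frames up to the last commit that existed when the reader started. Its `checkpoint` method anticipates the operation described in the next subsection.

```python
from typing import Dict, List, Tuple


class ToyWAL:
    """Hypothetical model of write-ahead logging (not SQLite's file format)."""

    def __init__(self, db_pages: Dict[int, bytes]):
        self.db = dict(db_pages)                         # the original database file
        self.frames: List[Tuple[int, bytes, bool]] = []  # (page_no, data, is_commit)

    def write_transaction(self, changes: Dict[int, bytes]) -> None:
        items = list(changes.items())
        for i, (page_no, data) in enumerate(items):
            # The last frame of a transaction carries the commit marker.
            self.frames.append((page_no, data, i == len(items) - 1))

    def end_mark(self) -> int:
        """Index just past the last commit frame; a reader pins this when it starts."""
        mark = 0
        for i, (_, _, is_commit) in enumerate(self.frames, start=1):
            if is_commit:
                mark = i
        return mark

    def read(self, page_no: int, end_mark: int) -> bytes:
        # The newest copy visible to this reader wins; otherwise use the database file.
        for p, data, _ in reversed(self.frames[:end_mark]):
            if p == page_no:
                return data
        return self.db[page_no]

    def checkpoint(self) -> None:
        # Copy committed frames back into the database file and drop them from the log.
        mark = self.end_mark()
        for p, data, _ in self.frames[:mark]:
            self.db[p] = data
        self.frames = self.frames[mark:]
```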

4.2.2.4 Checkpointing

Of course, one eventually wants to transfer all the transactions appended to the WAL file back into the original database. Moving the WAL file transactions back into the database is called a "checkpoint".

Another way to think about the difference between the rollback journal and the write-ahead log is that in the rollback-journal approach there are two primitive operations, reading and writing, whereas with a write-ahead log there are three primitive operations: reading, writing, and checkpointing.

By default, SQLite performs a checkpoint automatically when the WAL file reaches a threshold size of 1000 pages [10]. Applications using WAL do not have to do anything for these checkpoints to occur. If they want to, however, applications can adjust the automatic checkpoint threshold, or they can turn off automatic checkpoints and run checkpoints during idle moments or in a separate thread or process.
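Both policies can be expressed with pragmas through Python's `sqlite3` module: `wal_autocheckpoint` sets the threshold in pages (0 disables automatic checkpoints), and `wal_checkpoint` runs a checkpoint on demand, returning the busy flag, the number of frames in the WAL, and the number of frames checkpointed. The helper names `open_wal` and `checkpoint_now` are hypothetical.

```python
import sqlite3


def open_wal(path: str, autocheckpoint_pages: int = 1000) -> sqlite3.Connection:
    """Open a database in WAL mode with a chosen automatic checkpoint threshold."""
    conn = sqlite3.connect(path, isolation_level=None)  # we issue BEGIN/COMMIT ourselves
    conn.execute("PRAGMA journal_mode=WAL")
    # A threshold of 0 turns automatic checkpoints off entirely.
    conn.execute(f"PRAGMA wal_autocheckpoint={int(autocheckpoint_pages)}")
    return conn


def checkpoint_now(conn: sqlite3.Connection):
    """Run a checkpoint (e.g. at an idle moment): (busy, wal_frames, checkpointed)."""
    return conn.execute("PRAGMA wal_checkpoint(FULL)").fetchone()
```

With automatic checkpoints disabled, the WAL keeps growing until the application calls `checkpoint_now`, which is the "idle moment" policy mentioned above.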

4.3 Problem Statement

Recently, flash memory has become a critical component in building embedded systems and portable devices because it is non-volatile, shock-resistant, and uses little power. Although flash memory is not as fast as RAM, it is hundreds of times faster than a hard disk in read operations. These attractive features make flash memory one of the best choices for portable information systems. However, flash memory has two critical drawbacks:

• A segment (block) of flash memory needs to be erased before it can be rewritten. This is because flash memory technology only allows individual bits to be toggled in one direction for writes. The erase operation writes ones or zeros into all the bits in a segment, and it takes much longer than a read or write operation.

• The life of each memory block is limited to 1,000,000 writes [11].

SQLite uses rollback journaling or write-ahead logging, which are update-in-place approaches. In update-in-place approaches, both undo and redo logs must be saved in a log file before the transaction commits in order to make failure recovery possible. As the name suggests, update-in-place rewrites the same memory block, which does not overcome the two problems of flash memory mentioned above.

4.4 Solution

In shadow paging approaches, a database maintains two images per page during the lifetime of a transaction: a current page and a shadow page. When a transaction starts, both pages are identical. The shadow page is never changed over the duration of the transaction, while the current page is changed when the transaction performs a write operation. To undo a modification, the database frees the current page. To commit a modification, it changes all pointers to the old (shadow) page to point to the new (current) page, and frees the shadow page. If the system fails, the shadow pages are used to recover and restore the system to the state it was in before the failure [11]. Implementing shadow paging can be beneficial in the following ways:

• Shadow paging helps in overcoming the above-mentioned constraints of flash memory.

• A log-based mechanism will not be needed, which eliminates the overhead of creating and storing logs and thus saves storage space.

• At the time of failure there is no need to first read from a log file and then update the database; instead, only the next-page and previous-page pointers need to be changed.

• Changing pointers, rather than reading from a file and then updating the database, is much simpler and faster, resulting in a speedier recovery mechanism.
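The difference in wear between the two approaches can be shown with a toy flash model in which every block must be erased before it is programmed again. The model and all its names are hypothetical and simplified: one page per block, no garbage-collection granularity, and erases performed immediately rather than deferred. With update-in-place every change erases the same block, while the out-of-place (shadow-paging-style) store writes to a spare block, remaps the page table, and recycles the old block, spreading the erases across the chip.

```python
from collections import deque
from typing import List, Optional


class FlashChip:
    """Toy flash: a block must be erased before it can be programmed again."""

    def __init__(self, n_blocks: int):
        self.erased = [True] * n_blocks
        self.erase_count = [0] * n_blocks
        self.data: List[Optional[bytes]] = [None] * n_blocks

    def erase(self, block: int) -> None:
        self.erased[block] = True
        self.erase_count[block] += 1
        self.data[block] = None

    def program(self, block: int, data: bytes) -> None:
        if not self.erased[block]:
            raise RuntimeError("block must be erased before it is rewritten")
        self.data[block] = data
        self.erased[block] = False


def update_in_place(chip: FlashChip, block: int, data: bytes) -> None:
    chip.erase(block)          # every update erases the same block
    chip.program(block, data)


class OutOfPlaceStore:
    """Write a fresh block, remap the page, recycle the old block (needs >= 1 spare)."""

    def __init__(self, chip: FlashChip, n_pages: int):
        self.chip = chip
        self.table = list(range(n_pages))                     # logical page -> block
        self.spare = deque(range(n_pages, len(chip.erased)))  # erased, unused blocks
        for block in self.table:
            chip.program(block, b"")

    def update(self, page_no: int, data: bytes) -> None:
        new_block = self.spare.popleft()
        self.chip.program(new_block, data)   # no erase on the write path
        old_block, self.table[page_no] = self.table[page_no], new_block
        self.chip.erase(old_block)           # could be deferred to an idle moment
        self.spare.append(old_block)

    def read(self, page_no: int) -> Optional[bytes]:
        return self.chip.data[self.table[page_no]]
```

For example, 80 updates of a single page on an eight-block chip cost block 0 eighty erases under update-in-place, whereas the out-of-place store performs the same 80 erases spread evenly, ten per block.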


Chapter 5

Adaptive Logging

Adaptive logging is a novel recovery method that switches adaptively from a log-based method (referred to below as ARIES) to shadow paging at the page level, according to the update state of each page at run time.

Its goal is to reduce the size of the update log: instead of using one logging method for the whole database, it chooses the logging method for each page dynamically at run time.

Since the page is the basic unit of recovery, the authors of [4] define two page-level update patterns: the A-pattern (ARIES-favorable update pattern) and the S-pattern (shadow-paging-favorable update pattern). Figure 5.1 shows these two patterns.

The A-pattern updates a small area of a data page within a transaction, which generates less log than the size of the data page. A page updated by an A-pattern operation benefits from the no-force policy of ARIES, because at commit time ARIES writes only the log records and does not flush the data page to disk. A transaction composed mostly of A-pattern update operations is called a small update transaction. Small update transactions are typical of traditional OLTP applications [12].

The S-pattern updates a large area of a data page, or repeatedly updates the same area of a data page, within a transaction, which generates more log than the size of the data page itself. An S-pattern transaction cannot benefit from the no-force policy, because the log it must write at commit is larger than the data page. In this situation shadow paging is better: it generates no log records, and its force policy writes fewer pages, since the updated data pages are fewer than the log pages ARIES would write. Such a transaction is called a large update transaction; updating large objects is a typical example [13].
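The page-count argument above can be checked with a small calculation. The sketch below assumes ARIES-style physical log records that carry a before-image, an after-image and a fixed header, a 4096-byte page and a 24-byte header; these constants and the function names are illustrative assumptions, not values from [4]. It also ignores the fact that a real log page is shared with other transactions.

```python
from __future__ import annotations

import math

PAGE_SIZE = 4096   # assumed page size in bytes
LOG_HEADER = 24    # hypothetical fixed header size of one log record, in bytes


def log_bytes(updates: list[tuple[int, int]]) -> int:
    """Log generated for one page; each update is (offset, length) and logs two images."""
    return sum(LOG_HEADER + 2 * length for _offset, length in updates)


def classify(updates: list[tuple[int, int]]) -> str:
    """A-pattern if the page's log stays smaller than the page, else S-pattern."""
    return "A-pattern" if log_bytes(updates) < PAGE_SIZE else "S-pattern"


def pages_flushed_at_commit(txn: dict[int, list[tuple[int, int]]]) -> tuple[int, int]:
    """Pages written at commit: (ARIES with no-force, shadow paging with force)."""
    total_log = sum(log_bytes(updates) for updates in txn.values())
    aries = math.ceil(total_log / PAGE_SIZE)  # only log pages are flushed
    shadow = len(txn)                         # each updated data page is written once
    return aries, shadow


# Small update transaction: one 8-byte field changed on each of 10 pages.
small = {page: [(0, 8)] for page in range(10)}
print(classify(small[0]), pages_flushed_at_commit(small))  # A-pattern (1, 10)

# Large update transaction: the same 8-byte field of page 0 updated 200 times.
large = {0: [(0, 8)] * 200}
print(classify(large[0]), pages_flushed_at_commit(large))  # S-pattern (2, 1)
```

Under these assumptions the small update transaction flushes one log page instead of ten data pages, while the large update transaction would flush two log pages for a single data page, which is why shadow paging wins there.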


Figure 5.1: Two update patterns [4]

5.1 Overview

ARIES is applied as the default recovery mechanism. When a transaction starts and performs updates, the corresponding log records are generated in the ARIES way. Each data page tracks its update state by counting the total size of the log records generated for that page. If this size exceeds a predefined threshold value, the logging mechanism of the page is switched to shadow paging. However, if the page lock cannot be acquired because of another transaction's operations, the page stays with ARIES. When the switch occurs, a new copy of the page is created in the buffer (selectively) and on disk, and every subsequent update to the page is applied to the new copy without generating log records until the transaction commits. Within the same transaction, updates to other pages whose threshold has not been exceeded still generate log records. When the transaction commits, all new copies of data pages created by the transaction, as well as the log records, are flushed to stable storage. When the next transaction starts, every data page again begins with ARIES logging; shadow paging is applied only to pages whose total log size exceeds the threshold value.
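The per-page decision described above can be sketched as follows. This is a minimal in-memory illustration under stated assumptions, not the implementation from [4]: buffer and disk copies are reduced to bookkeeping, log records are represented only by their size, and `AdaptiveLoggingTxn` and the `try_lock` hook are hypothetical names. It also assumes that a page whose lock could not be acquired retries the switch on its next update.

```python
from __future__ import annotations

from typing import Callable


class AdaptiveLoggingTxn:
    """One transaction: ARIES-style logging by default, shadow paging past a threshold."""

    def __init__(self, threshold: int, try_lock: Callable[[int], bool]) -> None:
        self.threshold = threshold      # log bytes per page allowed before switching
        self.try_lock = try_lock        # hypothetical hook: True if the page lock was acquired
        self.log: list[tuple[int, int]] = []  # ARIES log records as (page, record size)
        self.log_size: dict[int, int] = {}    # total log bytes generated for each page
        self.shadowed: set[int] = set()       # pages that now have a new copy

    def update(self, page: int, record_size: int) -> str:
        """Apply one update to `page` and return the mechanism that handled it."""
        if page in self.shadowed:
            # The update goes to the page's new copy; no log record is generated.
            return "shadow"
        self.log.append((page, record_size))
        self.log_size[page] = self.log_size.get(page, 0) + record_size
        # Switch only when the threshold is exceeded AND the page lock is acquired;
        # otherwise the page stays with ARIES (and retries on its next update).
        if self.log_size[page] > self.threshold and self.try_lock(page):
            # Stands in for creating the new page copy in the buffer and on disk.
            self.shadowed.add(page)
        return "aries"

    def commit(self) -> tuple[list[tuple[int, int]], set[int]]:
        """Return what is flushed to stable storage: log records and new page copies."""
        flushed = (list(self.log), set(self.shadowed))
        # The next transaction starts again with ARIES for every page.
        self.log.clear()
        self.log_size.clear()
        self.shadowed.clear()
        return flushed


PAGE_SIZE = 4096  # assumed page size; threshold of half a page as suggested in Section 5.3
txn = AdaptiveLoggingTxn(threshold=PAGE_SIZE // 2, try_lock=lambda page: page != 7)
print([txn.update(0, 600) for _ in range(5)])  # ['aries', 'aries', 'aries', 'aries', 'shadow']
print([txn.update(7, 600) for _ in range(5)])  # page 7 is locked elsewhere: all 'aries'
log, copies = txn.commit()
print(len(log), copies)  # 9 {0}
```

Note that the update which pushes a page over the threshold is still logged; only the following updates to that page skip logging, matching the order of events in the description above.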


5.2 Problem statement

When SQLite is used on Android, each database is stored in a single database file containing a limited number of pages. When an update request modifies some attributes, one of the following two scenarios can occur:

• The request updates more than one attribute, and these attributes are stored in the same page, so the request updates a large area of that data page.

• The request updates a single attribute, and repeated requests update the same area of a data page again and again.

In both scenarios, a large amount of log is generated, and the log for a page can grow larger than the data page itself.
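A short worked bound shows when each scenario makes the log larger than the page. It assumes, for illustration only, that every update of a $b$-byte region is logged as one record holding a before-image, an after-image and an $h$-byte header, so one record is $2b + h$ bytes; $P$ is the page size. For the first scenario, a single request that updates $B$ bytes of attributes in one page, and for the second scenario, $n$ repeated updates of one $b$-byte attribute:

$$
L_1 = 2B + h > P \iff B > \frac{P - h}{2},
\qquad
L_2(n) = n\,(2b + h) > P \iff n > \frac{P}{2b + h}.
$$

With assumed values $P = 4096$, $h = 24$ and $b = 8$, the first bound is $B > 2036$ bytes, and the second is $n > 4096/40 = 102.4$, so from the 103rd update of that attribute onwards the log for the page is larger than the page itself.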

Shadow paging, on the other hand, has a space drawback: it requires additional space to maintain the shadow pages.

5.3 Solution

Once shadow paging is implemented, the recovery mechanism can be enhanced further by implementing adaptive logging, which dynamically switches a page from log-based recovery to shadow paging once the size of the log generated for that page exceeds a predefined threshold value. The threshold can initially be set to half of the page size; experiments can then vary it to observe its effect and find an optimal value. The benefits of using adaptive logging are:

• After a page switches to shadow paging, no further log records are written for it.

• Since the recovery method is selected at the page level, rollback to the original state after a failure can be achieved more quickly.


Chapter 6

Conclusion and Future Work

6.1 Conclusion

The Android operating system provides a native database engine, SQLite. In this stage, we studied the file structure of SQLite. We also surveyed different recovery techniques, including the mechanisms SQLite uses, namely the rollback journal and write-ahead logging. The study showed that shadow paging is better suited to flash memory. We also studied the adaptive logging technique, in which the recovery mechanism switches from log-based recovery to shadow paging once the threshold value is reached.

6.2 Looking Ahead

In the next stage, we plan to carry out the following tasks:

• Implement shadow paging in SQLite.

• Implement the adaptive logging technique to further optimize the recovery system.


Acknowledgements

I would like to express my deepest gratitude to my guide, Prof. D. B. Phatak, for his patience and guidance throughout the project. I would also like to thank Nagesh Karmali for his continuous input on my work, and everyone who supported me in this work.


Bibliography

[1] “Android Architecture.” http://www.elinux.org/Android_Architecture. [Online; accessed 3-Oct-2012].

[2] S. Jeon, J. Bang, K. Byun, and S. Lee, “A recovery method of deleted record for SQLite database,” Personal and Ubiquitous Computing, vol. 16, pp. 707–715, 2012.

[3] “Rollback Journal.” http://www.sqlite.org/draft/atomiccommit.html. [Online; accessed 3-Oct-2012].

[4] Y.-S. Kim, H. Jin, and K.-G. Woo, “Adaptive logging for mobile device,” Proc. VLDB Endow., vol. 3, pp. 1481–1492, Sept. 2010.

[5] J. Pepitone, “Android races past Apple in smartphone market share.” http://money.cnn.com/2012/08/08/technology/smartphone-market-share/index.html, 2012. [Online; accessed 12-Oct-2012].

[6] B. S. Mongia and V. K. Madisetti, “Reliable real-time applications on Android OS.” http://www.ece.gatech.edu/~vkm/Android_Real_Time.pdf, 2010.

[7] S. Lee, “Creating and using databases for Android applications,” Academic Journal, vol. 5, p. 99, June 2012.

[8] A. Nori, “Mobile and embedded databases,” in Proceedings of the 2007 ACM SIGMOD International Conference on Management of Data, SIGMOD ’07, (New York, NY, USA), pp. 1175–1177, ACM, 2007.

[9] Silberschatz, Korth, and Sudarshan, Database System Concepts, Sixth Edition. McGraw Hill, 2006.

[10] “Write-Ahead Logging.” http://www.sqlite.org/draft/wal.html. [Online; accessed 3-Oct-2012].


[11] S. Byun, S. Cho, and M. Huh, “Flash memory shadow paging scheme for portable computers: Design and performance evaluation,” ACM Trans. Database Syst., vol. 39, Oct. 2005.

[12] C. Mohan, D. Haderle, B. Lindsay, H. Pirahesh, and P. Schwarz, “ARIES: a transaction recovery method supporting fine-granularity locking and partial rollbacks using write-ahead logging,” ACM Trans. Database Syst., vol. 17, pp. 94–162, Mar. 1992.

[13] R. A. Lorie, “Physical integrity in a large segmented database,” ACM Trans. Database Syst., vol. 2, pp. 91–104, Mar. 1977.
