stage1ash.ppt

Memory Constrained DBMS with Updates. Ashwini G. Rao. Guide: Prof. Krithi Ramamritham

Upload: flashdomain

Post on 17-Jun-2015


TRANSCRIPT

Page 1: Stage1Ash.ppt

Memory Constrained DBMS with Updates

Ashwini G. Rao

Guide: Prof. Krithi Ramamritham

Page 2: Stage1Ash.ppt

April 13, 2023 Memory Constrained DBMSs with Updates

2

Outline of the talk

Need for Handheld DBMS
New Issues in Implementation
Project Goals
Review of Existing Work
Compression in Storage
Transaction Management
Synchronization
Current Implementation Status
Conclusions and Future work

Page 3: Stage1Ash.ppt

Handhelds

Small, convenient, carry anywhere
Powerful
  E.g., Simputer: 206 MHz, 32 MB SDRAM, 24 MB flash memory, LCD display, smart card
Applications
  Personal information management
    E-diary
  Enterprise applications
    Health care, micro-banking

Page 4: Stage1Ash.ppt

Need for Handheld DBMS

Handheld applications
  Volume of data is high
  Simple and complex queries
    select, project, aggregate
  ACID properties of transactions
  Require data privacy
  Need synchronization

Database management techniques are needed to meet the above requirements

Page 5: Stage1Ash.ppt

New Issues in Implementation

Handheld DBMS vs. disk DBMS
  Handheld DB is flash memory based
    Read time is very small compared with a disk
  Storage model should consider small memory and computation power
  Transaction management and synchronization have to consider disconnections, mobility and communication cost
Handheld operating system provides fewer facilities
  E.g., no multi-threading support in PalmOS
Better security measures are required, as handhelds are easily stolen, damaged and lost

Page 6: Stage1Ash.ppt

Project Goals

Existing work
  Storage models
  Query processing & optimization
  Executor
My work
  Compression in storage
  Transaction management
  Synchronization

Page 7: Stage1Ash.ppt

Existing Work – Review

Storage Management
  Aim at compactness in representation of data
  Limited storage could preclude any additional index
    Data model should try to incorporate some index information
Query Processing
  Minimize writes to secondary storage
  Efficient usage of limited main memory

Page 8: Stage1Ash.ppt

Storage Management

Existing storage models
  Flat Storage
    Tuples are stored sequentially. Duplicates are not eliminated
  Pointer-based Domain Storage
    Values are partitioned into domains, which are sets of unique values
    Tuples reference the attribute value by means of pointers
    One domain can be shared among multiple attributes

Page 9: Stage1Ash.ppt

Storage Management (cont)

[Figure: Flat Storage vs. Domain Storage. In the flat relation each tuple holds its attribute values (e.g., IIT, CSE) directly; in the domain relation each tuple holds a 4-byte pointer into a domain of unique values.]

In Domain Storage, a pointer of size p (typically 4 bytes) points to the domain value. Can we further reduce the storage cost?

Page 10: Stage1Ash.ppt

ID Based Storage

[Figure: ID Based Storage. Relation R stores ID values (0 … n) that index by position into the table of domain values (v0, v1, …, vn): positional indexing.]

Page 11: Stage1Ash.ppt

ID Based Storage

ID Storage
  An identifier for each of the domain values
  Store the smaller identifier instead of the pointer
  The identifier is the positional value in the domain table. Use it as an offset into the domain table
D domain values can be distinguished by identifiers of length $\lceil \log_2 D / 8 \rceil$ bytes.
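As a worked instance of the identifier-length formula above, rounding up to whole bytes (the domain sizes are chosen purely for illustration):

```latex
\[
\ell(D) = \left\lceil \frac{\log_2 D}{8} \right\rceil \text{ bytes}, \qquad
\ell(200) = \left\lceil \tfrac{7.64}{8} \right\rceil = 1, \quad
\ell(256) = \left\lceil \tfrac{8}{8} \right\rceil = 1, \quad
\ell(257) = \left\lceil \tfrac{8.01}{8} \right\rceil = 2, \quad
\ell(65\,536) = \left\lceil \tfrac{16}{8} \right\rceil = 2 .
\]
```

So a domain of up to 256 values needs only 1-byte identifiers, against a typical 4-byte pointer.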

Page 12: Stage1Ash.ppt

ID Storage (cont)

Extendable IDs are used. The length of the identifier grows and shrinks depending on the number of domain values
Identifiers start at 1 byte; the length then grows and shrinks as needed.
To reduce reorganization of data, ID values are projected out from the rest of the relation and stored separately, maintaining positional indexing.

Page 13: Stage1Ash.ppt

ID Storage (cont)

Ping Pong Effect
  At the boundaries, ID values are reorganized when the identifier length changes
  Frequent insertions and deletions at the boundaries might result in a lot of reorganization
  This phenomenon should be avoided
No deletion of domain values
  Domain structure means a future insertion might reference the deleted value
  Do not delete a domain value even if it is not referenced
Setting a threshold for deletion of domain values
  Delete only if the number of deletions exceeds a threshold
  Increase the threshold when boundaries are being crossed, to reduce the ping pong effect
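The threshold idea above can be sketched roughly as follows. This is only an illustration under assumptions: the class `DomainIds`, its method names, the starting threshold of 4 and the doubling rule are all hypothetical, since the slides do not say how the threshold is raised.

```python
def needed_width(count):
    """Smallest number of bytes whose ID space covers `count` domain values."""
    width = 1
    while 256 ** width < count:
        width += 1
    return width


class DomainIds:
    """Hypothetical bookkeeping for one domain: ID width plus deferred deletions."""

    def __init__(self, delete_threshold=4):
        self.stored = 0                  # domain values physically stored
        self.pending_deletes = 0         # unreferenced values not yet removed
        self.delete_threshold = delete_threshold
        self.width = 1                   # start with 1-byte identifiers
        self.reorganizations = 0         # how often the ID column was rewritten

    def _resize(self, new_width):
        # Changing the ID length rewrites the projected ID column.
        self.width = new_width
        self.reorganizations += 1
        # Assumption: double the threshold each time a boundary is crossed.
        self.delete_threshold *= 2

    def insert_value(self):
        self.stored += 1
        if needed_width(self.stored) > self.width:
            self._resize(needed_width(self.stored))

    def unreference_value(self):
        # The value stays in the domain until enough deletions accumulate.
        self.pending_deletes += 1
        if self.pending_deletes < self.delete_threshold:
            return
        self.stored -= self.pending_deletes
        self.pending_deletes = 0
        if needed_width(self.stored) < self.width:
            self._resize(needed_width(self.stored))
```

With immediate deletion, alternating one insert and one delete around 256 values would rewrite the ID column on every operation; here the width stays at 2 bytes until a full batch of deletions has accumulated.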

Page 14: Stage1Ash.ppt

ID Storage (cont)

Primary Key-Foreign Key relationship
  Primary key is a domain in itself
  IDs for primary key values
  Values present in the child table are the corresponding primary key IDs
  Projected foreign key column forms a Join Index

[Figure: Primary Key-Foreign Key Join Index. The child table's foreign key column stores IDs (0 … n) that index by position into the parent table's values (v0, v1, …, vn).]

Page 15: Stage1Ash.ppt

ID Storage (cont)

ID based storage wins over Domain Storage when pointer size > $\lceil \log_2 D / 8 \rceil$ bytes
  Relations in a small device do not have a very high cardinality
  The above condition is true for most of the data.
Advantages of ID storage
  Considerable saving in storage cost.
  Efficient join between parent table and child table
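To make the storage comparison concrete, here is a small sketch that estimates the bytes needed for one attribute column under the three models. The function names, the example sizes (10,000 tuples, 200 distinct 20-byte values) and the cost formulas are assumptions for illustration; per-page and per-tuple overheads are ignored.

```python
def id_width_bytes(domain_size):
    """Bytes needed for positional IDs 0 .. domain_size-1 (at least 1 byte)."""
    bits = (domain_size - 1).bit_length()
    return max(1, (bits + 7) // 8)


def column_cost_bytes(n_tuples, domain_size, value_bytes, pointer_bytes=4):
    """Estimated storage for one column under flat, domain and ID storage."""
    domain_table = domain_size * value_bytes       # unique values stored once
    return {
        "flat": n_tuples * value_bytes,            # every tuple repeats the value
        "domain": n_tuples * pointer_bytes + domain_table,
        "id": n_tuples * id_width_bytes(domain_size) + domain_table,
    }


# Hypothetical column: 10,000 tuples, 200 distinct values of 20 bytes each.
costs = column_cost_bytes(n_tuples=10_000, domain_size=200, value_bytes=20)
print(costs)
```

In this example the ID column needs 1 byte per tuple instead of a 4-byte pointer, which is where the saving over Domain Storage comes from.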

Page 16: Stage1Ash.ppt

Query Processing

Considerations
  Minimize writes to secondary storage
  Use main memory as write buffer
Need for left-deep query plan
  Reduce materialization in flash memory. If absolutely necessary, use main memory
  Bushy trees use materialization
  Left-deep tree is most suited for pipelined evaluation
  Right operand in a left-deep tree is always a stored relation

Page 17: Stage1Ash.ppt

Query Processing (cont)

Need for optimal memory allocation
  Using nested loop algorithms for every operator ensures that the minimum amount of memory is used to execute the plan
  Nested loop algorithms are inefficient
  Different devices come with different memory sizes
  Query plans should make efficient use of memory.
  Memory must be optimally allocated among all operators
Need to generate the best query execution plan depending on the available memory

Page 18: Stage1Ash.ppt

Query Processing (cont)

Operator evaluation schemes
  Different schemes for an operator
  Schemes conform to left-deep tree query plan
  All have different memory usage and cost
  Cost of a scheme is the computation time

Page 19: Stage1Ash.ppt

Query Processing (cont)

2-Phase optimizer
  Phase 1: Query is first optimized to get a query plan
  Phase 2: Division of memory among the operators
  The scheme for every operator is determined in phase 1 and remains unchanged after phase 2; memory allocation in phase 2 is based on the cost functions of the schemes
  Memory is assumed to be available for all the schemes; this may not be true for a resource-constrained device
Traditional 2-phase optimization cannot be used

Page 20: Stage1Ash.ppt

Query Processing (cont)

1-Phase optimizer
  Query optimizer is made memory cognizant
  Modified optimizer takes into account division of memory among operators while choosing between plans
Ideally, 1-phase optimization should be done, but the optimizer becomes complex.

Page 21: Stage1Ash.ppt

Query Processing (cont)

Modified 2-phase optimizer
  Optimal division of memory involves the decision of selecting the best scheme for every operator
  Phase 1: Determine the optimal left-deep join order using a dynamic programming approach
  Phase 2: Divide memory among the operators
    Choose the scheme for every operator depending on the memory allocated
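The last step of phase 2, choosing a scheme once an operator's memory share is known, might look like the sketch below. The scheme names and their (cost, memory) figures are invented for illustration, and "cheapest scheme that fits" is an assumed selection rule, not necessarily the one used in the work reviewed here.

```python
def pick_scheme(schemes, memory):
    """Return the cheapest scheme that fits in `memory`, or None if none fits.

    `schemes` maps a scheme name to a (cost, memory_needed) pair.
    """
    feasible = [(cost, name)
                for name, (cost, needed) in schemes.items()
                if needed <= memory]
    return min(feasible)[1] if feasible else None


# Hypothetical schemes for one join operator: (cost, memory needed).
join_schemes = {
    "nested_loop": (100, 2),
    "block_nested_loop": (60, 6),
    "hash_join": (20, 22),
}

print(pick_scheme(join_schemes, memory=10))
```

With 10 units of memory the hash join does not fit, so the block nested loop scheme is chosen; with only 2 units the operator falls back to the plain nested loop.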

Page 22: Stage1Ash.ppt

Query Processing (cont)

Memory allocation algorithms
  Exact memory allocation
  Heuristic memory allocation
Conclusions
  Response times are highest with minimum memory and lowest with maximum memory
  Computing power of the handheld affects the response time in a big way
  Heuristic memory allocation differed from the exact algorithm at a few points only

Page 23: Stage1Ash.ppt

Compression in DB

Advantages
  Saves space
  Reduces read time and write time as less data is processed
  Logging consumes less space and time
Disadvantages
  CPU intensive
  Competes with other CPU-intensive DBMS tasks. May slow down the DBMS

Page 24: Stage1Ash.ppt

Compression in Disk DB

Main assumption
  The high disk read time compensates for the extra time required for compression and decompression
  E.g., let the time taken to read 10 blocks of data from the disk be 10 ms. Let the time taken for compression and decompression be 5 ms. After compression, the 10 blocks occupy only 1 block.
  Processing time with compression/decompression = (1 ms + 5 ms) = 6 ms, compared with 10 ms without compression
Handheld DB is flash memory based
  Read time is very low. The above assumption is no longer valid!
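The arithmetic in the disk example, and why it stops working on flash, can be checked with a few lines. The disk figures come from the example above; the flash read time of 0.1 ms per block is a made-up value used only to show the effect.

```python
def processing_time_ms(blocks, read_ms_per_block, compression_ratio=1, codec_ms=0.0):
    """Time to read `blocks` of data, optionally stored compressed."""
    blocks_read = blocks / compression_ratio
    return blocks_read * read_ms_per_block + codec_ms


# Disk: 10 blocks take 10 ms to read, i.e. 1 ms per block.
disk_plain = processing_time_ms(10, 1.0)
disk_compressed = processing_time_ms(10, 1.0, compression_ratio=10, codec_ms=5.0)

# Flash: hypothetical read time, much smaller than the 5 ms codec cost.
FLASH_READ_MS = 0.1
flash_plain = processing_time_ms(10, FLASH_READ_MS)
flash_compressed = processing_time_ms(10, FLASH_READ_MS, compression_ratio=10, codec_ms=5.0)

print(disk_plain, disk_compressed)
print(flash_plain, flash_compressed)
```

On the disk the compressed read is cheaper; on flash the fixed compression and decompression time dominates, so the compressed read is slower than the plain one.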

Page 25: Stage1Ash.ppt

Compression in Handhelds

Techniques can exploit the high write time of flash memory
Logging
  Compressed records consume less log space
  Writing time is reduced
  Decompression is done when recovery is initiated
    Highly beneficial if failures are rare
  Saves communication cost when log records have to be sent over the network, e.g., in transaction management

Page 26: Stage1Ash.ppt

Compression in Handhelds (cont)

Data compression in smart cards
  Consider a handheld with smart card support
  Data stored in smart cards is accessed and updated
    E.g., personal database
  Memory in smart cards is limited
  Compression will save space
  Data can be decompressed and processed in the handheld

Page 27: Stage1Ash.ppt

Transaction Management

Ensure ACID properties of local and global transactions
  Local transaction: update an address book entry in the Simputer
  Global transaction: transfer money from a bank account to an epurse in a smart card attached to a Simputer
Issues
  Frequent disconnections, resource constraints, mobility, loss of or damage to the handheld

Page 28: Stage1Ash.ppt

Transaction Management (cont)

We will look into
  Concurrency control
  Atomicity
    Local
    Global
  Consistency
  Durability

Page 29: Stage1Ash.ppt

Concurrency control

Concurrency in handhelds depends on
  Multi-tasking support from the handheld OS
    E.g., Linux in Simputer, PalmOS
  User requirements
    Several tasks may have to execute concurrently
    E.g., a periodic synchronization task, address book access and an aggregation operation may run concurrently.
Strict 2PL with table-level locks can be used
  Small number of concurrent processes
  Very few data conflicts
  Table-level locking has small overhead and allows non-conflicting processes to continue execution
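A minimal sketch of strict 2PL with table-level locks, assuming shared ("S") and exclusive ("X") modes and a non-blocking `acquire` that simply reports whether the lock was granted. The class, the mode letters and the task and table names are illustrative; a real lock manager would also queue waiters and handle deadlocks.

```python
class TableLockManager:
    """Table-level S/X locks, released only when the transaction ends."""

    def __init__(self):
        self._locks = {}  # table name -> (mode, set of transaction ids)

    def acquire(self, txn, table, mode):
        """Try to lock `table` in mode 'S' or 'X'; return True if granted."""
        held = self._locks.get(table)
        if held is None:
            self._locks[table] = (mode, {txn})
            return True
        held_mode, holders = held
        if holders == {txn}:
            # The sole holder may re-acquire or upgrade S to X.
            new_mode = "X" if "X" in (mode, held_mode) else "S"
            self._locks[table] = (new_mode, holders)
            return True
        if mode == "S" and held_mode == "S":
            holders.add(txn)  # shared locks are compatible
            return True
        return False          # conflict: the caller must wait or abort

    def release_all(self, txn):
        """Strict 2PL: every lock is held until commit or abort."""
        for table in list(self._locks):
            _, holders = self._locks[table]
            holders.discard(txn)
            if not holders:
                del self._locks[table]
```

For example, while a synchronization task holds an exclusive lock on the address book table, an address book reader is refused, but an aggregation over a different table proceeds untouched.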

Page 30: Stage1Ash.ppt

Atomicity

Ensure the all-or-nothing property
Local atomicity
  E.g., enter name, email, phone number in the address book of the Simputer
  Shadow-based update vs. in-place update
Global atomicity
  E.g., in an epurse application the updates are made at the bank's server, the Simputer and the smart card
  2PC, optimizations to 2PC, 1PC

Page 31: Stage1Ash.ppt

Local atomicity

Shadow-based update
  Advantages
    No disk locality problem in handheld DB
    Simplifies recovery
  Disadvantages
    Poorly adapted to pointer-based storage models
    Cost increases with increase in size of flash memory
In-place update
  Uses write-ahead logging (WAL)
  Accommodates pointer-based storage models
  Cost does not increase with size of flash memory
  Buffer replacement policy is Steal
    Dirty blocks can be written to smart card storage to avoid Undo

Page 32: Stage1Ash.ppt

Global atomicity

Two Phase Commit (2PC)
  Most commonly used atomic commit protocol
  Shortcomings in handheld scenario
    Two rounds (voting and decision) of messages impose high communication overhead
    Requires the handheld to be connected during the voting and decision phases
    Large number of forced writes
Optimizations to 2PC
  Presumed commit
  Presumed abort
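To show where the two message rounds and the forced writes of basic 2PC come from, here is a simplified in-process sketch. The `Participant` class, the message counting and the participant names are assumptions for illustration; failures, timeouts and the coordinator's own log are left out.

```python
class Participant:
    """Hypothetical participant: votes in round 1, applies the decision in round 2."""

    def __init__(self, name, can_commit=True):
        self.name = name
        self.can_commit = can_commit
        self.state = "active"
        self.forced_writes = 0

    def prepare(self):
        # A yes vote must survive a crash, so the prepared state is forced to the log.
        if self.can_commit:
            self.forced_writes += 1
            self.state = "prepared"
        return self.can_commit

    def finish(self, decision):
        # The outcome is forced to the log before it is acknowledged.
        self.forced_writes += 1
        self.state = "committed" if decision == "commit" else "aborted"


def two_phase_commit(participants):
    """Run both rounds; return the decision and the number of messages exchanged."""
    messages = 0
    votes = []
    for p in participants:      # round 1: voting
        votes.append(p.prepare())
        messages += 2           # prepare request + vote
    decision = "commit" if all(votes) else "abort"
    for p in participants:      # round 2: decision
        p.finish(decision)
        messages += 2           # decision + acknowledgement
    return decision, messages
```

With the bank server, the Simputer and the smart card as participants, a commit costs 12 messages in this model, and the handheld has to stay reachable for both rounds.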

Page 33: Stage1Ash.ppt

Global atomicity (cont)

One Phase Commit (1PC)
  Advantages
    Only one round of messages: no voting phase
    Handheld can disconnect as soon as log records are transferred to the fixed server
    Fewer forced writes
    Transactions involving smart card and handheld can use 1PC
  Disadvantages
    Requires participants to enforce 2PL. Will work with weak levels of consistency under certain conditions.
    In a heterogeneous environment it is difficult to control the local DBMS concurrency control policies.

Page 34: Stage1Ash.ppt

Consistency and Durability

Consistency
  Local consistency can be ensured by defining integrity constraints
Durability
  Either the changes of the transaction or enough information about the changes are written to stable storage before the transaction commits
  Network durability: transfer log records to a server on the fixed network.
    1PC ensures network durability
  Pointer based logging
  Extended ephemeral logging

Page 35: Stage1Ash.ppt

Synchronization

Access data anytime and anywhere using the handheld
  Mobile salesperson, wireless warehouse
Problem: not possible to remain connected always
Solution: replicate data in the handheld
  Download a copy of the data into the handheld from the remote server and process it offline. Periodically merge the changes with the server

Page 36: Stage1Ash.ppt

Synchronization – Issues

Data replication can lead to conflicts
  Update-update, update-delete, unique key violation, integrity constraint violation
Maintain global consistency between replicated copies
  Strict consistency with data partitioning
  Strict consistency with reservation protocols or leases
    Efficient when data is rarely shared
    Leases are restrictive when data is shared between many copies
  Weak consistency with eventual consistency
    Independently access and update data
    Only tentative commits are possible
    Actual commit when the transaction is executed at the server

Page 37: Stage1Ash.ppt

Synchronization – Issues (cont)

Application-specific conflict detection and resolution
  Maximum flexibility
Device, network and backend agnostic
  XML, Unicode
Incremental maintenance
  Save communication cost
Download parts of relations, i.e., views

Page 38: Stage1Ash.ppt

Synchronization – Existing Models

Publish Subscribe Model
  Three tier
  Enterprise applications
  Independent updates
  Eventual consistency
  Conflict detection, resolution and merge
PC to Handheld Model
  Two tier
  Personal information

Page 39: Stage1Ash.ppt

Publish Subscribe Model

Eventual consistency model
  Merge replication in Win SQL CE, Oracle Lite
Publish Subscribe process
  Publication and article
  Publishing
  Subscribing
  Subscription
  Synchronization
  Merging

Page 40: Stage1Ash.ppt

Publish Subscribe Architecture

[Figure: Publish Subscribe architecture. Components: Application, SQL DB Engine, SQL Database, Client Agent, Server Agent, Merge Agent (Conflict Detection, Conflict Resolution), Replication Provider, SQL Server Database, Communication Link.]

Page 41: Stage1Ash.ppt

Conflict Detection and Resolution

Conflict detection
  Row level tracking
  Associate RowID and Version with each row
  RowID is used to uniquely identify each row
  Version is used to check whether a given row has changed in the server
Conflict resolution
  A conflict resolution procedure is invoked when a conflict is detected. The resolution procedure is created when the article is published
  Output can be server wins or handheld wins. Here the server always wins
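A minimal sketch of version-based conflict detection with a "server wins" resolver, under the assumption that a handheld sends back the version it originally downloaded together with its new value. The function name and the dictionary layout are hypothetical, not the actual SQL CE mechanism.

```python
def merge_row(server, row_id, base_version, new_value):
    """Merge one handheld update into `server` (row_id -> (version, value)).

    Returns the (version, value) pair the handheld should hold afterwards.
    """
    version, _ = server[row_id]
    if base_version == version:
        # The row is unchanged on the server since download: accept the update.
        server[row_id] = (version + 1, new_value)
    # Otherwise the versions differ: a conflict, and the server's row wins.
    return server[row_id]


server = {0: (0, "IIT"), 1: (0, "CSE")}

# Handheld 1 changed CSE to EE, Handheld 2 changed CSE to ME, both from version 0.
handheld1_row = merge_row(server, 1, base_version=0, new_value="EE")
handheld2_row = merge_row(server, 1, base_version=0, new_value="ME")
print(handheld1_row, handheld2_row)
```

The first merge is accepted and bumps the version to 1; the second arrives with a stale base version, so it is detected as a conflict and Handheld 2 receives the server's row.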

Page 42: Stage1Ash.ppt

Row level tracking

Step    Copy        Row ID 0 (VER, TID)  Row ID 1 (VER, TID)  Note
STEP 1  SERVER      0, IIT               0, CSE
STEP 1  HANDHELD 1  0, IIT               0, CSE
STEP 1  HANDHELD 2  0, IIT               0, CSE
STEP 2  SERVER      0, IIT               0, CSE
STEP 2  HANDHELD 1  0, IIT               0, EE                Handheld 1 changes CSE to EE
STEP 2  HANDHELD 2  0, IIT               0, ME                Handheld 2 changes CSE to ME

Page 43: Stage1Ash.ppt

Row level tracking (cont)

Step    Copy        Row ID 0 (VER, TID)  Row ID 1 (VER, TID)  Note
STEP 3  SERVER      0, IIT               1, EE                Server merges with Handheld 1
STEP 3  HANDHELD 1  0, IIT               1, EE
STEP 3  HANDHELD 2  0, IIT               0, ME
STEP 4  SERVER      0, IIT               1, EE                Server merges with Handheld 2
STEP 4  HANDHELD 1  0, IIT               1, EE
STEP 4  HANDHELD 2  0, IIT               1, EE

Page 44: Stage1Ash.ppt

Current Implementation Status

Two synchronization tools have been implemented for the Simputer
  The first sync tool assumes that no updates are done in the handheld database
  The second sync tool is based on merge replication in Windows SQL CE. It allows independent updates in the handhelds.

Page 45: Stage1Ash.ppt

Conclusions

Handheld DBMS techniques have to consider the resource constraints, mobility, frequent disconnections, and security aspects of the handheld

The techniques used for one component will influence the choice of the technique used in another component. There is a very strong interdependence between the components of the handheld DBMS

Techniques rejected for the disk environment may be explored in the handheld environment

Page 46: Stage1Ash.ppt

Future work

Enhance the sync tool
Transaction management component
Recovery management component
Concurrency control component
Performance analysis of existing compression techniques in handheld environment


Page 51: Stage1Ash.ppt

Thank You

Page 52: Stage1Ash.ppt

Query Processing (cont)

Benefit/Size of a scheme
  Every scheme is characterized by a benefit/size ratio, which represents its benefit per unit memory allocation
  The minimum scheme for an operator is the scheme that has max. cost and min. memory
  Assume n schemes $s_1, s_2, \ldots, s_n$ to implement an operator $o$
    $\min(o) = s_{\min}$
    $\forall i,\ 1 \le i \le n:\ \mathrm{Cost}(s_i) \le \mathrm{Cost}(s_{\min}),\ \mathrm{Memory}(s_i) \ge \mathrm{Memory}(s_{\min})$
    $s_{\min}$ is the minimum scheme for operator $o$
    $\mathrm{Benefit}(s_i) = \mathrm{Cost}(s_{\min}) - \mathrm{Cost}(s_i)$
    $\mathrm{Size}(s_i) = \mathrm{Memory}(s_i) - \mathrm{Memory}(s_{\min})$
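The definitions above translate directly into a few lines of code. The scheme names and their (cost, memory) numbers below are invented for illustration, and the helper `benefit_size` is a hypothetical name.

```python
def benefit_size(schemes):
    """Compute (size, benefit, benefit/size) for every scheme of one operator.

    `schemes` maps a scheme name to a (cost, memory) pair. The minimum scheme
    is the one with the least memory (ties broken by highest cost).
    """
    min_name = min(schemes, key=lambda n: (schemes[n][1], -schemes[n][0]))
    min_cost, min_memory = schemes[min_name]
    points = {}
    for name, (cost, memory) in schemes.items():
        benefit = min_cost - cost        # Benefit(si) = Cost(smin) - Cost(si)
        size = memory - min_memory       # Size(si) = Memory(si) - Memory(smin)
        ratio = benefit / size if size > 0 else None
        points[name] = (size, benefit, ratio)
    return min_name, points


# Hypothetical schemes for one operator: (cost, memory).
schemes = {
    "nested_loop": (100, 2),
    "block_nested_loop": (60, 6),
    "hash_join": (20, 22),
}
minimum, points = benefit_size(schemes)
print(minimum, points)
```

Here the nested loop is the minimum scheme at the origin (0, 0); the block nested loop gains 40 units of cost for 4 extra units of memory (ratio 10), while the hash join gains 80 for 20 (ratio 4).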

Page 53: Stage1Ash.ppt

Query Processing (cont)

Every operator is a collection of (size, benefit) points, n points for n schemes
Operator cost function is the collection of (cost, memory) points of its schemes

[Figure: (Size, Benefit) points for an operator. Axes: Size (x), Benefit (y); points (0,0), (s1,b1), (s2,b2).]

[Figure: Operator cost function. Axes: Memory (x), Cost (y); points (0,c1), (m2,c2), (m3,c3).]