Memory Constrained DBMS with Updates
Ashwini G. Rao
Guide
Prof. Krithi Ramamritham
April 13, 2023 Memory Constrained DBMSs with Updates
Outline of the talk
Need for Handheld DBMS
New Issues in Implementation
Project Goals
Review of Existing Work
Compression in Storage
Transaction Management
Synchronization
Current Implementation Status
Conclusions and Future Work
Handhelds
Small, convenient, can be carried anywhere, yet powerful
E.g., Simputer: 206 MHz CPU, 32 MB SDRAM, 24 MB flash memory, LCD display, smart card support
Applications:
Personal information management, e.g., e-diary
Enterprise applications, e.g., health care, micro-banking
Need for Handheld DBMS
Handheld applications:
Handle a high volume of data
Pose simple and complex queries (select, project, aggregate)
Require the ACID properties of transactions
Require data privacy
Need synchronization
Database management techniques are needed to meet the above requirements
New Issues in Implementation
Handheld DBMS vs. disk DBMS:
The handheld DB is flash memory based, so its read time is very small compared with a disk
The storage model should consider the small memory and limited computation power
Transaction management and synchronization have to consider disconnections, mobility and communication cost
Handheld operating systems provide fewer facilities, e.g., no multi-threading support in PalmOS
Better security measures are required, as handhelds are easily stolen, damaged or lost
Project Goals
Existing work: storage models; query processing and optimization; executor
My work: compression in storage; transaction management; synchronization
Existing Work – Review
Storage management:
Aim at compactness in the representation of data
Limited storage could preclude any additional index, so the data model should incorporate some index information
Query processing:
Minimize writes to secondary storage
Use the limited main memory efficiently
Storage Management
Existing storage models:
Flat storage: tuples are stored sequentially; duplicates are not eliminated
Pointer-based domain storage: values are partitioned into domains, which are sets of unique values; tuples reference attribute values by means of pointers; one domain can be shared among multiple attributes
Storage Management (cont)
Figure: Flat Storage vs. Domain Storage (the flat relation repeats values such as CSE in every tuple; in domain storage each tuple holds 4-byte pointers to shared domain values)
In domain storage, each tuple holds a pointer of size p (typically 4 bytes) to the domain value. Can the storage cost be reduced further?
ID Based Storage
Figure: ID based storage. Relation R stores identifiers in the range 0 to n; ID i is the position of value vi in the domain table (v0 to vn), i.e., positional indexing
ID Based Storage
ID storage:
Assign an identifier to each domain value
Store the smaller identifier instead of the pointer
The identifier is the position of the value in the domain table, so it is used directly as an offset into the domain table
D domain values can be distinguished by identifiers of length $\lceil \log_2 D / 8 \rceil$ bytes
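A minimal Python sketch of the identifier width and of positional indexing follows. It is illustrative only: the names `id_width_bytes`, `encode_column` and `decode` are hypothetical, and rounding tiny domains up to 1 byte is an assumption that matches the 1-byte starting length of extendable IDs.

```python
from typing import Hashable, List, Tuple


def id_width_bytes(domain_size: int) -> int:
    """Bytes needed to tell domain_size values apart: ceil(log2(D) / 8).

    (D - 1).bit_length() equals ceil(log2(D)) for D >= 2 and avoids
    floating-point rounding. Assumption: never fewer than 1 byte.
    """
    bits = max(1, (domain_size - 1).bit_length())
    return (bits + 7) // 8


def encode_column(values: List[Hashable]) -> Tuple[List[Hashable], List[int]]:
    """Build a domain table and replace each value by its position (its ID)."""
    domain: List[Hashable] = []   # position in this list == ID
    position = {}                 # value -> ID, used only while encoding
    ids: List[int] = []
    for v in values:
        if v not in position:
            position[v] = len(domain)
            domain.append(v)
        ids.append(position[v])
    return domain, ids


def decode(domain: List[Hashable], ident: int) -> Hashable:
    """Positional indexing: the ID is used directly as an offset."""
    return domain[ident]


domain, ids = encode_column(["CSE", "IIT", "CSE", "EE"])
print(domain, ids, id_width_bytes(len(domain)))
```

Only the ID list is stored with the relation; the value-to-ID dictionary exists only to build it.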
ID Storage (cont)
Extendable IDs are used: the length of the identifier grows and shrinks with the number of domain values, starting with 1-byte identifiers
To reduce reorganization of data, the ID values are projected out of the rest of the relation and stored separately, maintaining positional indexing
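The extendable, projected ID column can be sketched as follows, assuming IDs are packed little-endian at the current width in a separate byte array (the class `ProjectedIdColumn` and its methods are hypothetical). Changing the width forces a repack of the whole column; this is the reorganization that projecting the IDs out keeps away from the rest of the relation.

```python
class ProjectedIdColumn:
    """Hypothetical projected ID column with extendable identifiers.

    Entry i holds the ID of tuple i (positional indexing), packed in
    little-endian order at the current width, starting at 1 byte."""

    def __init__(self) -> None:
        self.width = 1
        self.data = bytearray()
        self.reorganizations = 0

    def __len__(self) -> int:
        return len(self.data) // self.width

    def get(self, pos: int) -> int:
        start = pos * self.width
        return int.from_bytes(self.data[start:start + self.width], "little")

    def resize_for_domain(self, domain_size: int) -> None:
        """Grow or shrink the ID width to ceil(log2(D) / 8) bytes (at least 1).
        When shrinking, every stored ID must already fit in the new width."""
        bits = max(1, (domain_size - 1).bit_length())
        new_width = (bits + 7) // 8
        if new_width != self.width:
            ids = [self.get(i) for i in range(len(self))]
            self.width = new_width
            self.data = bytearray(b"".join(x.to_bytes(new_width, "little") for x in ids))
            self.reorganizations += 1   # only this column is rewritten

    def append(self, ident: int) -> None:
        # Raises OverflowError if the ID does not fit; call resize_for_domain first.
        self.data += ident.to_bytes(self.width, "little")


col = ProjectedIdColumn()
col.resize_for_domain(3)        # 3 domain values: 1-byte IDs, no repack
for ident in (0, 2, 1):
    col.append(ident)
col.resize_for_domain(300)      # more than 256 values: 2-byte IDs, one repack
col.append(299)
print(col.width, col.reorganizations, [col.get(i) for i in range(len(col))])
```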
ID Storage (cont): Ping-Pong Effect
When the identifier length changes at a boundary, the ID values must be reorganized. Frequent insertions and deletions around a boundary might therefore cause a lot of reorganization; this phenomenon should be avoided.
No deletion of domain values: because of the domain structure, a future insertion might reference a deleted value, so a domain value is not deleted even if it is no longer referenced
Threshold for deleting domain values: delete only if the number of deletions exceeds a threshold, and increase the threshold when boundaries are being crossed, to reduce the ping-pong effect
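One way to realize the deletion threshold is sketched below. The policy details (marking unreferenced values, purging once their count exceeds the threshold, doubling the threshold whenever a purge changes the ID width) are assumptions for illustration, and `DomainDeletionPolicy` is a hypothetical name. Setting the threshold to infinity gives the "no deletion of domain values" option.

```python
class DomainDeletionPolicy:
    """Hypothetical threshold policy for deleting unreferenced domain values."""

    def __init__(self, threshold: float = 4) -> None:
        self.threshold = threshold  # float("inf") means: never delete domain values
        self.live = 0               # referenced domain values
        self.pending = 0            # unreferenced values not yet removed

    @staticmethod
    def id_width(domain_size: int) -> int:
        bits = max(1, (domain_size - 1).bit_length())
        return (bits + 7) // 8

    def insert(self) -> None:
        self.live += 1

    def delete(self) -> bool:
        """Mark one value unreferenced; return True if a purge was performed."""
        self.live -= 1
        self.pending += 1
        if self.pending <= self.threshold:
            return False            # keep it: a later insert may reference it again
        old_width = self.id_width(self.live + self.pending)
        new_width = self.id_width(self.live)
        self.pending = 0            # physically remove the marked values
        if new_width != old_width:
            self.threshold *= 2     # a boundary was crossed: damp the ping-pong
        return True


policy = DomainDeletionPolicy(threshold=2)
for _ in range(257):
    policy.insert()                 # 257 values need 2-byte IDs
purged = [policy.delete() for _ in range(3)]
print(purged, policy.live, policy.threshold)
```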
ID Storage (cont): Primary Key-Foreign Key Relationship
The primary key is a domain in itself, so IDs are assigned to the primary key values
The values stored in the child table are the corresponding primary key IDs
The projected foreign key column forms a join index
Figure: Primary Key-Foreign Key Join Index (the parent table holds values v0 to vn; the child table stores the corresponding IDs 0 to n)
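The join index can be illustrated with a small Python sketch, assuming a parent table whose rows are (primary key, attribute) pairs; `build_parent`, `encode_child_fk` and `join_via_index` are hypothetical helpers. Because the child stores parent IDs, the join needs no search or hash table: each ID is an offset into the parent.

```python
from typing import Dict, List, Sequence, Tuple

Row = Tuple[str, str]  # (primary key, attribute): a hypothetical parent schema


def build_parent(rows: Sequence[Row]) -> Tuple[List[Row], Dict[str, int]]:
    """The parent's primary key is its own domain: position == ID."""
    table = list(rows)
    pk_to_id = {row[0]: i for i, row in enumerate(table)}
    return table, pk_to_id


def encode_child_fk(fk_values: Sequence[str], pk_to_id: Dict[str, int]) -> List[int]:
    """Store parent IDs instead of foreign-key values; this column is the join index.
    A KeyError here signals a referential-integrity violation."""
    return [pk_to_id[v] for v in fk_values]


def join_via_index(parent: List[Row], fk_ids: List[int]) -> List[Tuple[int, Row]]:
    """Child-parent equi-join with no search: each ID is an offset into the parent."""
    return [(child_pos, parent[pid]) for child_pos, pid in enumerate(fk_ids)]


parent, pk_to_id = build_parent(
    [("CSE", "Computer Science"), ("EE", "Electrical"), ("ME", "Mechanical")]
)
fk_ids = encode_child_fk(["EE", "CSE", "EE"], pk_to_id)
print(fk_ids, join_via_index(parent, fk_ids))
```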
ID Storage (cont)
ID based storage wins over domain storage when the pointer size $p > \lceil \log_2 D / 8 \rceil$
Relations in a small device do not have a very high cardinality, so the above condition holds for most of the data
Advantages of ID storage:
Considerable saving in storage cost
Efficient join between the parent table and the child table
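The storage comparison can be checked with a little arithmetic. The sketch below assumes 4-byte pointers and counts only the per-tuple reference column, since the domain table is the same under both models; the function names are hypothetical.

```python
from typing import Tuple


def id_width_bytes(domain_size: int) -> int:
    """ceil(log2(D) / 8) bytes, at least 1 (assumption for tiny domains)."""
    bits = max(1, (domain_size - 1).bit_length())
    return (bits + 7) // 8


def reference_column_bytes(num_tuples: int, domain_size: int,
                           pointer_size: int = 4) -> Tuple[int, int]:
    """Bytes for one attribute's references: (domain storage, ID storage)."""
    return num_tuples * pointer_size, num_tuples * id_width_bytes(domain_size)


def id_storage_wins(domain_size: int, pointer_size: int = 4) -> bool:
    """The slide's condition: p > ceil(log2(D) / 8)."""
    return pointer_size > id_width_bytes(domain_size)


print(reference_column_bytes(10_000, 200))   # (40000, 10000)
```

With 4-byte pointers, ID storage only loses once a domain exceeds 2^32 values, which is far beyond the cardinalities expected on a handheld.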
Query Processing
Considerations:
Minimize writes to secondary storage; use main memory as a write buffer
Need for a left-deep query plan:
Reduce materialization in flash memory; if materialization is absolutely necessary, use main memory
Bushy trees use materialization of intermediate results
A left-deep tree is best suited for pipelined evaluation
The right operand in a left-deep tree is always a stored relation
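Pipelined evaluation of a left-deep plan maps naturally onto Python generators. In this sketch (hypothetical operators `scan` and `nl_join`, toy relations), each join consumes the stream produced below it as its outer input and rescans a stored relation as its inner input, so no intermediate result is materialized.

```python
from typing import Callable, Dict, Iterator, List

Row = Dict[str, object]


def scan(relation: List[Row]) -> Iterator[Row]:
    """Leaf of the plan: stream a stored relation."""
    yield from relation


def nl_join(outer: Iterator[Row], inner: List[Row],
            pred: Callable[[Row, Row], bool]) -> Iterator[Row]:
    """Nested-loop join. The outer input is pulled one tuple at a time from the
    operator below; the inner input is a stored relation (the right operand of a
    left-deep tree), so it can be rescanned without materializing anything."""
    for o in outer:
        for i in inner:
            if pred(o, i):
                yield {**o, **i}


# Hypothetical toy relations.
STUDENTS = [{"sid": 1, "dept": "CSE"}, {"sid": 2, "dept": "EE"}]
DEPTS = [{"dname": "CSE", "inst": "IIT"}, {"dname": "EE", "inst": "IIT"}]
INSTS = [{"iname": "IIT", "city": "Mumbai"}]


def example_plan() -> Iterator[Row]:
    # ((STUDENTS join DEPTS) join INSTS): a left-deep tree.
    first = nl_join(scan(STUDENTS), DEPTS, lambda o, i: o["dept"] == i["dname"])
    return nl_join(first, INSTS, lambda o, i: o["inst"] == i["iname"])


rows = list(example_plan())
print(rows)
```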
Query Processing (cont)
Need for optimal memory allocation:
Using nested loop algorithms for every operator ensures that the minimum amount of memory is used to execute the plan, but nested loop algorithms are inefficient
Different devices come with different memory sizes, so query plans should make efficient use of the available memory
Memory must be optimally allocated among all operators
The best query execution plan must be generated depending on the available memory
Query Processing (cont)
Operator evaluation schemes:
Each operator has several evaluation schemes
All schemes conform to a left-deep query plan
The schemes differ in memory usage and cost; the cost of a scheme is its computation time
Query Processing (cont)
2-phase optimizer:
Phase 1: the query is optimized to obtain a query plan
Phase 2: memory is divided among the operators
The scheme for every operator is fixed in phase 1 and remains unchanged in phase 2; the memory allocation in phase 2 is based on the cost functions of the schemes
Memory is assumed to be available for all the chosen schemes, which may not be true for a resource-constrained device
Hence, traditional 2-phase optimization cannot be used
Query Processing (cont)
1-phase optimizer:
The query optimizer is made memory cognizant: it takes the division of memory among operators into account while choosing between plans
Ideally, 1-phase optimization should be done, but the optimizer becomes complex
Query Processing (cont)
Modified 2-phase optimizer:
Optimal division of memory involves deciding the best scheme for every operator
Phase 1: determine the optimal left-deep join order using a dynamic programming approach
Phase 2: divide memory among the operators, and choose the scheme for every operator depending on the memory allocated to it
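Phase 1 can be sketched as a textbook dynamic program over subsets of relations, restricted to left-deep orders. The cost model (sum of intermediate result sizes, independent join selectivities, cross products allowed), the function name `best_left_deep_order` and the example cardinalities are assumptions, not the model used in the project.

```python
from itertools import combinations
from typing import Dict, FrozenSet, Tuple

Plan = Tuple[float, float, Tuple[str, ...]]  # (cost, result size, join order)


def best_left_deep_order(card: Dict[str, int],
                         sel: Dict[FrozenSet[str], float]) -> Plan:
    """card: relation -> cardinality; sel: {a, b} -> join selectivity
    (a missing pair means a cross product, selectivity 1.0)."""
    rels = list(card)
    best: Dict[FrozenSet[str], Plan] = {
        frozenset([r]): (0.0, float(card[r]), (r,)) for r in rels
    }
    for k in range(2, len(rels) + 1):
        for combo in combinations(rels, k):
            subset = frozenset(combo)
            for r in combo:  # r is the stored relation joined last (right operand)
                cost, size, order = best[subset - {r}]
                factor = 1.0
                for other in subset - {r}:
                    factor *= sel.get(frozenset((other, r)), 1.0)
                out = size * card[r] * factor
                cand = (cost + out, out, order + (r,))
                if subset not in best or cand[0] < best[subset][0]:
                    best[subset] = cand
    return best[frozenset(rels)]


CARD = {"R": 1000, "S": 100, "T": 10}
SEL = {frozenset(("R", "S")): 0.01, frozenset(("S", "T")): 0.1}
cost, size, order = best_left_deep_order(CARD, SEL)
print(order, cost)
```

Only the best plan per subset is kept, which is what makes the search polynomial in the number of plans per subset rather than factorial in the number of relations.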
Query Processing (cont)
Memory allocation algorithms: exact memory allocation and heuristic memory allocation
Conclusions:
Response times are highest with minimum memory and lowest with maximum memory
The computing power of the handheld strongly affects the response time
The heuristic memory allocation differed from the exact algorithm at only a few points
Compression in DB
Advantages:
Saves space
Reduces read and write time, as less data is processed
Logging consumes less space and time
Disadvantages:
CPU intensive; competes with other CPU-intensive DBMS tasks and may slow down the DBMS
Compression in Disk DB
Main assumption: the high disk read time compensates for the extra time required for compression and decompression
E.g., let reading 10 blocks of data from the disk take 10 ms, and let compression and decompression take 5 ms. After compression, the 10 blocks occupy only 1 block, which takes 1 ms to read.
Processing time with compression/decompression = 1 ms + 5 ms = 6 ms, compared with 10 ms without compression
The handheld DB is flash memory based and its read time is very small, so the above assumption is no longer valid
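The arithmetic of the disk example, and why it breaks down on flash, can be reproduced in a few lines. The disk figures are the slide's; the 0.05 ms per flash block is a hypothetical value chosen only to illustrate a fast read path, and the function names are illustrative.

```python
def plain_read_ms(blocks: float, ms_per_block: float) -> float:
    return blocks * ms_per_block


def compressed_read_ms(blocks: float, ms_per_block: float,
                       ratio: float, codec_ms: float) -> float:
    """Read blocks/ratio compressed blocks, then pay the codec time."""
    return plain_read_ms(blocks / ratio, ms_per_block) + codec_ms


def compression_pays_off(blocks: float, ms_per_block: float,
                         ratio: float, codec_ms: float) -> bool:
    """True when the read time saved exceeds the compression/decompression time."""
    saved = plain_read_ms(blocks, ms_per_block) * (1 - 1 / ratio)
    return saved > codec_ms


# Slide's disk example: 10 blocks at 1 ms each, 10:1 compression, 5 ms codec time.
print(plain_read_ms(10, 1.0), compressed_read_ms(10, 1.0, 10, 5.0))    # 10.0 6.0
# Hypothetical flash read cost of 0.05 ms per block (illustrative only).
print(plain_read_ms(10, 0.05), compressed_read_ms(10, 0.05, 10, 5.0))
```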
Compression in Handhelds
Techniques can exploit the high write time of flash memory
Logging:
Compressed records consume less log space and take less time to write
Decompression is done only when recovery is initiated, so this is highly beneficial if failures are rare
Saves communication cost when log records have to be sent over the network, e.g., for transaction management
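Compressed logging can be sketched with Python's standard `json` and `zlib` modules. The record format, the helper names and the choice of zlib are assumptions; the slides do not name a codec. Compressing tiny records one at a time may save little, so batching several records per compressed unit is a reasonable variation.

```python
import json
import zlib
from typing import Any, Dict, List


def write_log_record(log: List[bytes], record: Dict[str, Any]) -> int:
    """Serialize, compress and append one log record; return bytes written."""
    raw = json.dumps(record, sort_keys=True).encode("utf-8")
    packed = zlib.compress(raw, 9)
    log.append(packed)   # stands in for a (slow) flash write or a network send
    return len(packed)


def recover(log: List[bytes]) -> List[Dict[str, Any]]:
    """Decompression cost is paid only here, when recovery is initiated."""
    return [json.loads(zlib.decompress(p).decode("utf-8")) for p in log]


log: List[bytes] = []
record = {"txn": 7, "table": "address_book", "before": "A" * 200, "after": "B" * 200}
written = write_log_record(log, record)
raw_size = len(json.dumps(record, sort_keys=True).encode("utf-8"))
print(raw_size, written)
```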
Compression in Handhelds (cont)
Data compression in smart cards:
Consider a handheld with smart card support; data stored in the smart card is accessed and updated, e.g., a personal database
Memory in smart cards is limited, so compression saves space
Data can be decompressed and processed in the handheld
Transaction Management
Ensure the ACID properties of local and global transactions:
Local transaction, e.g., update an address book entry in the Simputer
Global transaction, e.g., transfer money from a bank account to an e-purse in a smart card attached to a Simputer
Issues: frequent disconnections, resource constraints, mobility, loss of or damage to the handheld
Transaction Management (cont)
We will look into:
Concurrency control
Atomicity (local and global)
Consistency
Durability
Concurrency Control
Concurrency in handhelds depends on:
Multi-tasking support from the handheld OS, e.g., Linux in the Simputer, PalmOS
User requirements: several tasks may have to execute concurrently, e.g., a periodic synchronization task, address book access and an aggregation operation
Strict 2PL with table-level locks can be used:
The number of concurrent processes is small and data conflicts are very few
Table-level locking has a small overhead and allows non-conflicting processes to continue execution
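Strict 2PL with table-level locks needs only a small lock table. The sketch below is hypothetical (`TableLockManager` is not from the project): it grants shared (S) and exclusive (X) table locks, refuses conflicting requests instead of blocking, and releases all of a transaction's locks together at commit or abort.

```python
from collections import defaultdict
from typing import Dict, Set


class TableLockManager:
    """Hypothetical strict-2PL lock table with table-level S/X locks."""

    def __init__(self) -> None:
        self.shared: Dict[str, Set[str]] = defaultdict(set)  # table -> S holders
        self.exclusive: Dict[str, str] = {}                   # table -> X holder

    def lock(self, txn: str, table: str, mode: str) -> bool:
        """Grant the lock and return True, or return False on a conflict
        (a real system would make the requester wait)."""
        if mode not in ("S", "X"):
            raise ValueError("mode must be 'S' or 'X'")
        holder = self.exclusive.get(table)
        if holder is not None and holder != txn:
            return False                       # another txn holds X
        if mode == "S":
            self.shared[table].add(txn)
            return True
        if self.shared[table] - {txn}:
            return False                       # other readers present
        self.exclusive[table] = txn
        return True

    def release_all(self, txn: str) -> None:
        """Strict 2PL: every lock is held until commit/abort, then released."""
        for holders in self.shared.values():
            holders.discard(txn)
        for table in [t for t, h in self.exclusive.items() if h == txn]:
            del self.exclusive[table]


lm = TableLockManager()
granted = [
    lm.lock("sync", "address_book", "X"),
    lm.lock("aggregate", "address_book", "S"),   # conflicts with sync
    lm.lock("aggregate", "sales", "S"),          # different table: no conflict
]
lm.release_all("sync")                           # sync commits
granted.append(lm.lock("aggregate", "address_book", "S"))
print(granted)
```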
Atomicity
Ensure the all-or-nothing property
Local atomicity, e.g., entering a name, e-mail address and phone number in the address book of the Simputer:
Shadow-based update vs. in-place update
Global atomicity, e.g., in an e-purse application, the updates are made at the bank's server, the Simputer and the smart card:
2PC, optimizations to 2PC, 1PC
Local Atomicity
Shadow-based update:
Advantages: no disk locality problem in a handheld DB; simplifies recovery
Disadvantages: poorly adapted to pointer-based storage models; cost increases with the size of flash memory
In-place update:
Uses WAL
Accommodates pointer-based storage models
Cost does not increase with the size of flash memory
The buffer replacement policy is steal; dirty blocks can be written to smart card storage to avoid undo
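In-place update with WAL under a steal buffer policy can be sketched as below. The in-memory dict and list stand in for flash storage and the stable log (an assumption for illustration), and `InPlaceStore` is a hypothetical name. The before-image is logged before the in-place write, so an aborted transaction can be undone even if its dirty data already reached storage.

```python
from typing import Dict, List, Optional, Tuple

LogRecord = Tuple  # ("update", txn, key, before, after) | ("commit", txn) | ("abort", txn)


class InPlaceStore:
    """Hypothetical in-place update store with an undo-capable WAL."""

    def __init__(self, data: Dict[str, str]) -> None:
        self.data: Dict[str, str] = dict(data)  # stands in for flash storage
        self.log: List[LogRecord] = []          # stands in for the stable log

    def write(self, txn: str, key: str, value: str) -> None:
        before: Optional[str] = self.data.get(key)            # None: key did not exist
        self.log.append(("update", txn, key, before, value))  # WAL: log first
        self.data[key] = value                                # then update in place

    def commit(self, txn: str) -> None:
        self.log.append(("commit", txn))

    def abort(self, txn: str) -> None:
        # Undo in reverse log order using the before-images.
        for rec in reversed(self.log):
            if rec[0] == "update" and rec[1] == txn:
                _, _, key, before, _ = rec
                if before is None:
                    self.data.pop(key, None)
                else:
                    self.data[key] = before
        self.log.append(("abort", txn))


store = InPlaceStore({"name": "Ravi"})
store.write("t1", "name", "Ravi K")
store.write("t1", "email", "ravi@example.com")
store.abort("t1")                      # all or nothing: both writes undone
store.write("t2", "phone", "12345")
store.commit("t2")
print(store.data)
```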
Global Atomicity
Two Phase Commit (2PC): the most commonly used atomic commit protocol
Shortcomings in the handheld scenario:
Two rounds of messages (voting and decision) impose a high communication overhead
The handheld must remain connected during the voting and decision phases
Large number of forced writes
Optimizations to 2PC: presumed commit, presumed abort
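The message cost of basic 2PC can be made concrete with a small simulation. The sketch is hypothetical and simplified (no timeouts, no coordinator log, no presumed-commit or presumed-abort optimization); it counts four messages per participant across the voting and decision rounds, during which the handheld must stay connected.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Participant:
    name: str
    can_commit: bool = True
    forced_log: List[str] = field(default_factory=list)

    def prepare(self) -> bool:
        # Force-write the vote before replying.
        self.forced_log.append("prepared" if self.can_commit else "abort")
        return self.can_commit

    def finish(self, decision: str) -> None:
        # Force-write the final decision.
        self.forced_log.append(decision)


def two_phase_commit(participants: List[Participant]) -> Tuple[str, int]:
    """Return (decision, number of messages exchanged)."""
    messages = 0
    votes = []
    for p in participants:            # round 1: voting
        messages += 2                 # PREPARE request + vote
        votes.append(p.prepare())
    decision = "commit" if all(votes) else "abort"
    for p in participants:            # round 2: decision
        messages += 2                 # decision + acknowledgement
        p.finish(decision)
    return decision, messages


sites = [Participant("bank_server"), Participant("simputer"), Participant("smart_card")]
print(two_phase_commit(sites))  # ('commit', 12)
```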
Global Atomicity (cont)
One Phase Commit (1PC)
Advantages:
Only one round of messages, as there is no voting phase
The handheld can disconnect as soon as its log records are transferred to the fixed server
Fewer forced writes
Transactions involving the smart card and the handheld can use 1PC
Disadvantages:
Requires participants to enforce 2PL; it can work with weaker levels of consistency under certain conditions
In a heterogeneous environment, it is difficult to control the concurrency control policies of the local DBMSs
Consistency and Durability
Consistency: local consistency can be ensured by defining integrity constraints
Durability: either the changes of the transaction or enough information about the changes are written to stable storage before the transaction commits
Network durability: transfer log records to a server on the fixed network
1PC ensures network durability
Pointer-based logging
Extended ephemeral logging
Synchronization
Access data anytime and anywhere using the handheld, e.g., a mobile salesperson or a wireless warehouse
Problem: it is not possible to remain connected all the time
Solution: replicate data in the handheld
Download a copy of the data from the remote server into the handheld and process it offline
Periodically merge the changes with the server
Synchronization – Issues
Data replication can lead to conflicts: update-update, update-delete, unique key violation, integrity constraint violation
Maintain global consistency between replicated copies:
Strict consistency with data partitioning, or with reservation protocols or leases: efficient when data is rarely shared; leases are restrictive when data is shared between many copies
Weak consistency with eventual consistency: copies are accessed and updated independently; only tentative commits are possible at the handheld, and the actual commit happens when the transaction is executed at the server
Synchronization – Issues (cont)
Application-specific conflict detection and resolution, for maximum flexibility
Device, network and backend agnostic, e.g., using XML and Unicode
Incremental maintenance, to save communication cost
Download only parts of relations, i.e., views
Synchronization – Existing Models
Publish-subscribe model: three tier; enterprise applications; independent updates; eventual consistency; conflict detection, resolution and merge
PC-to-handheld model: two tier; personal information
Publish-Subscribe Model
Eventual consistency model, e.g., merge replication in Windows SQL CE and Oracle Lite
Publish-subscribe process: publication and article, publishing, subscribing, subscription, synchronization, merging
Publish-Subscribe Architecture
Figure: Publish-subscribe architecture (components: application, SQL DB engine, SQL database, client agent, communication link, server agent, merge agent with conflict detection and conflict resolution, replication provider, SQL Server database)
Conflict Detection and Resolution
Conflict detection by row-level tracking:
A RowID and a version are associated with each row
The RowID uniquely identifies each row
The version is used to check whether a given row has changed at the server
Conflict resolution:
A conflict resolution procedure is invoked when a conflict is detected; the procedure is created when the article is published
The outcome can be "server wins" or "handheld wins"; here, the server always wins
Row Level Tracking
Example (columns RowID, VER, TID; row 0 = (VER 0, IIT) never changes, so only row 1 is traced):
STEP 1: The server, Handheld 1 and Handheld 2 all start with row 1 = (VER 0, CSE)
STEP 2: Handheld 1 changes CSE to EE; Handheld 2 changes CSE to ME; the server still holds (VER 0, CSE)
Row Level Tracking (cont)
STEP 3: The server merges with Handheld 1; the server and Handheld 1 now hold (VER 1, EE), while Handheld 2 still holds (VER 0, ME)
STEP 4: The server merges with Handheld 2; the version mismatch (0 vs. 1) reveals a conflict, the server wins, and the server, Handheld 1 and Handheld 2 all hold (VER 1, EE)
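The row-level tracking example can be replayed in code. This is a sketch under assumptions: each replica is a dict from RowID to (version, value), a `dirty` flag marks rows changed since the last sync, and the server always wins, as stated on the conflict resolution slide. `Row`, `local_update` and `merge` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class Row:
    version: int
    value: str
    dirty: bool = False  # changed locally since the last sync (assumption)


Replica = Dict[int, Row]  # RowID -> row


def local_update(replica: Replica, row_id: int, value: str) -> None:
    replica[row_id] = Row(replica[row_id].version, value, dirty=True)


def merge(server: Replica, handheld: Replica) -> None:
    """Merge one handheld into the server; the server always wins conflicts."""
    for row_id, h in list(handheld.items()):
        s = server[row_id]
        if h.dirty and h.version == s.version:
            # Unchanged at the server since the last sync: accept the update.
            server[row_id] = Row(s.version + 1, h.value)
        # Otherwise nothing changed locally, or the server's version is newer
        # (a conflict); either way the server's copy is kept.
        handheld[row_id] = Row(server[row_id].version, server[row_id].value)


server: Replica = {0: Row(0, "IIT"), 1: Row(0, "CSE")}
hh1, hh2 = dict(server), dict(server)
local_update(hh1, 1, "EE")   # step 2
local_update(hh2, 1, "ME")
merge(server, hh1)           # step 3: accepted, version becomes 1
merge(server, hh2)           # step 4: conflict, server wins
print(server[1], hh1[1], hh2[1])
```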
Current Implementation Status
Two synchronization tools have been implemented for the Simputer:
The first sync tool assumes that no updates are done in the handheld database
The second sync tool is based on merge replication in Windows SQL CE; it allows independent updates in the handhelds
Conclusions
Handheld DBMS techniques have to consider the resource constraints, mobility, frequent disconnections, and security aspects of the handheld
The techniques used for one component will influence the choice of the technique used in another component. There is a very strong interdependence between the components of the handheld DBMS
Techniques rejected for the disk environment may be explored in the handheld environment
Future Work
Enhance the sync tool
Transaction management component
Recovery management component
Concurrency control component
Performance analysis of existing compression techniques in the handheld environment
Thank You
Query Processing (cont): Benefit/Size of a Scheme
Every scheme is characterized by a benefit/size ratio, which represents its benefit per unit of memory allocated
The minimum scheme for an operator is the scheme with the maximum cost and the minimum memory
Assume n schemes $s_1, s_2, \ldots, s_n$ implement an operator $o$, and let $\min(o) = s_{\min}$ with
$\forall i,\ 1 \le i \le n:\ \mathrm{Cost}(s_i) \le \mathrm{Cost}(s_{\min}),\ \mathrm{Memory}(s_i) \ge \mathrm{Memory}(s_{\min})$
Then $s_{\min}$ is the minimum scheme for operator $o$, and
$\mathrm{Benefit}(s_i) = \mathrm{Cost}(s_{\min}) - \mathrm{Cost}(s_i)$
$\mathrm{Size}(s_i) = \mathrm{Memory}(s_i) - \mathrm{Memory}(s_{\min})$
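The benefit and size definitions translate directly into code. The helper names, the (name, cost, memory) tuple encoding and the scheme numbers below are hypothetical.

```python
from typing import Dict, List, Tuple

Scheme = Tuple[str, float, float]  # (name, cost, memory): a hypothetical encoding


def minimum_scheme(schemes: List[Scheme]) -> Scheme:
    """The scheme with minimum memory (ties broken by maximum cost)."""
    return min(schemes, key=lambda s: (s[2], -s[1]))


def benefit_size(schemes: List[Scheme]) -> Dict[str, Tuple[float, float]]:
    """Benefit(s_i) = Cost(s_min) - Cost(s_i); Size(s_i) = Memory(s_i) - Memory(s_min)."""
    _, cost_min, mem_min = minimum_scheme(schemes)
    return {name: (cost_min - cost, mem - mem_min) for name, cost, mem in schemes}


def benefit_per_size(schemes: List[Scheme]) -> Dict[str, float]:
    """Benefit/size ratio; the minimum scheme itself (size 0) is left out."""
    return {name: b / s for name, (b, s) in benefit_size(schemes).items() if s > 0}


# Hypothetical schemes for one join operator: (name, cost in ms, memory in KB).
JOIN_SCHEMES = [("nested_loop", 100.0, 1.0),
                ("block_nested_loop", 40.0, 5.0),
                ("hash", 10.0, 20.0)]
print(benefit_size(JOIN_SCHEMES))
print(benefit_per_size(JOIN_SCHEMES))
```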
Query Processing (cont)
Every operator is a collection of (size, benefit) points, n points for n schemes
The operator cost function is the collection of (memory, cost) points of its schemes
Figure: (Size, Benefit) points for an operator, e.g., (0,0), (s1,b1), (s2,b2)
Figure: Operator cost function, cost vs. memory, e.g., points (0,c1), (m2,c2), (m3,c3)
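The slides do not spell out the heuristic memory allocation algorithm. One plausible greedy heuristic built on the benefit/size ratio is sketched below as an assumption (`greedy_allocate` is a hypothetical name): start every operator at its minimum scheme, then repeatedly take the upgrade with the highest extra benefit per extra unit of memory that still fits the budget. Like any greedy knapsack-style heuristic, it can miss the exact optimum.

```python
from typing import Dict, List, Optional, Tuple

Scheme = Tuple[str, float, float]  # (name, cost, memory): hypothetical encoding


def greedy_allocate(operators: Dict[str, List[Scheme]],
                    budget: float) -> Optional[Dict[str, str]]:
    """Return operator -> chosen scheme name, or None if even the
    minimum schemes do not fit in the memory budget."""
    chosen = {op: min(s, key=lambda x: (x[2], -x[1])) for op, s in operators.items()}
    used = sum(c[2] for c in chosen.values())
    if used > budget:
        return None
    while True:
        best = None  # (ratio, operator, scheme, extra memory)
        for op, schemes in operators.items():
            cur = chosen[op]
            for cand in schemes:
                extra_mem = cand[2] - cur[2]
                gain = cur[1] - cand[1]
                if extra_mem <= 0 or gain <= 0 or used + extra_mem > budget:
                    continue
                ratio = gain / extra_mem
                if best is None or ratio > best[0]:
                    best = (ratio, op, cand, extra_mem)
        if best is None:
            return {op: c[0] for op, c in chosen.items()}
        _, op, cand, extra_mem = best
        chosen[op] = cand
        used += extra_mem


# Hypothetical operators: (scheme, cost in ms, memory in KB).
OPS = {
    "join1": [("nl", 100.0, 1.0), ("bnl", 40.0, 5.0), ("hash", 10.0, 20.0)],
    "sort": [("external", 50.0, 2.0), ("in_memory", 20.0, 10.0)],
}
print(greedy_allocate(OPS, 11.0))
```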