normalization denormalization adaptive...

1
daslab.seas.harvard.edu lab Adaptive Denormalization Zezhou (Alex) Liu, Stratos Idreos Normalization Denormalization Adaptive Denormalization less storage/update costs slow queries (joins) more storage/update costs fast queries (scans, no joins) less storage/update costs fast queries (scans) base data lies in a normalized state hot data is adaptively and partially denormalized on-demand enables the advantages of both normalization and denormalization future queries can benefit from faster query processing over the denormalized data still maintains the efficient space utilization, updates, and loading time characteristics found in normalized schemas In the time it takes to join inputs of 100 million rows in a normalized schema, we can perform a (logical) join by scanning over 10 billion rows in denormalized schema. The disparity is larger when a higher percentage of rows are selected. Queries enrich existing partial universal tables vertically and/or horizontally. Future queries replace joins over these data regions with scans over the partial universal tables Amortizes overhead & cost of denormalization T1 T2 Partial Universal Table Operates within the given memory budget by dropping regions of the partial universal table in response to memory pressures. Adaptively denormalizes only regions of the data as they are queried and only data that has not yet been denormalized by previous queries. Q1 Q2 Q3 Adaptive denormalization (AD) achieves significant benefits even when the required data is only partially denormalized. Adaptive denormalization (AD) improves significantly over repeated join patterns without penalizing the first join queries.

Upload: others

Post on 24-Jul-2020

14 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Normalization Denormalization Adaptive …daslab.seas.harvard.edu/doc/undergrad-poster/SIGMOD-UG...Adaptive denormalization (AD) achieves significant benefits even when the required

daslab.seas.harvard.edu lab

AdaptiveDenormalizationZezhou (Alex)Liu,Stratos Idreos

Normalization Denormalization AdaptiveDenormalization

lessstorage/updatecostsslowqueries(joins)

morestorage/updatecostsfastqueries(scans,nojoins)

lessstorage/updatecostsfastqueries(scans)

• basedataliesinanormalizedstate• hotdataisadaptivelyandpartiallydenormalized on-demand

• enablestheadvantagesofbothnormalizationanddenormalization

• futurequeriescanbenefitfromfasterqueryprocessingoverthedenormalized data

• stillmaintainstheefficientspaceutilization,updates,andloadingtimecharacteristicsfoundinnormalizedschemas

Inthetimeittakestojoininputsof100millionrowsinanormalizedschema,wecanperforma(logical)joinbyscanningover10billionrowsindenormalizedschema.Thedisparityislargerwhenahigherpercentageofrowsareselected.

• Queriesenrichexistingpartialuniversaltablesverticallyand/orhorizontally.

• Futurequeriesreplacejoinsoverthesedataregionswithscansoverthepartialuniversaltables

• Amortizesoverhead&costofdenormalization

T1 T2 PartialUniversalTable

Operateswithinthegivenmemorybudgetbydroppingregionsofthepartialuniversaltableinresponsetomemorypressures.

Adaptivelydenormalizes onlyregionsofthedataastheyarequeriedandonlydatathathasnotyetbeendenormalized bypreviousqueries.

Q1 Q2 Q3

Adaptivedenormalization (AD)achievessignificantbenefitsevenwhentherequireddataisonlypartiallydenormalized.

Adaptivedenormalization (AD)improvessignificantlyoverrepeatedjoinpatternswithoutpenalizingthefirstjoinqueries.