Distributed Query Processing

Post on 30-Dec-2015


  • Distributed Query Processing

  • Agenda
    Recap of query optimization
    Transformation rules for P&D systems
    Memoization
    Queries in heterogeneous systems
    Query evaluation strategies
    Eddies
    Open-ended and stream-based queries

  • Introduction
    Alternative ways of evaluating a given query:
      Equivalent expressions
      Different algorithms for each operation (Chapter 13)

    The cost difference between a good and a bad way of evaluating a query can be enormous.
      Example: performing $r \times s$ followed by the selection $\sigma_{r.A = s.B}$ is much slower than performing a join on the same condition.

    Need to estimate the cost of operations:
      Depends critically on statistical information about relations, which the database must maintain
      Need to estimate statistics for intermediate results to compute the cost of complex expressions
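The cost gap between the two evaluation orders can be made concrete with a minimal sketch. It assumes relations are lists of dicts with attributes `A` and `B`; the function names and the use of a hash join as "the join" are illustrative choices, not taken from the slides.

```python
from itertools import product

def cross_then_select(r, s):
    """sigma_{r.A = s.B}(r x s): enumerate every pair, then filter."""
    pairs_examined = 0
    out = []
    for tr, ts in product(r, s):
        pairs_examined += 1
        if tr["A"] == ts["B"]:
            out.append((tr, ts))
    return out, pairs_examined

def hash_join(r, s):
    """Join on r.A = s.B: index s on B once, then probe once per r tuple."""
    index = {}
    for ts in s:
        index.setdefault(ts["B"], []).append(ts)
    probes = 0
    out = []
    for tr in r:
        probes += 1
        for ts in index.get(tr["A"], []):
            out.append((tr, ts))
    return out, probes

r = [{"A": i} for i in range(100)]
s = [{"B": i} for i in range(0, 200, 2)]
slow, pairs = cross_then_select(r, s)   # examines |r| * |s| pairs
fast, probes = hash_join(r, s)          # probes the index |r| times
```

Both functions return the same tuples; only the amount of work differs, which is what the optimizer's cost estimate has to capture.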

  • Introduction (Cont.)
    Relations generated by two equivalent expressions have the same set of attributes and contain the same set of tuples, although their attributes may be ordered differently.

  • Introduction (Cont.)
    Generation of query-evaluation plans for an expression involves several steps:
      1. Generating logically equivalent expressions: use equivalence rules to transform an expression into an equivalent one.
      2. Annotating the resulting expressions to get alternative query plans.
      3. Choosing the cheapest plan based on estimated cost.
    The overall process is called cost-based optimization.

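  • Equivalence Rules
    1. Conjunctive selection operations can be deconstructed into a sequence of individual selections: $\sigma_{\theta_1 \wedge \theta_2}(E) = \sigma_{\theta_1}(\sigma_{\theta_2}(E))$
    2. Selection operations are commutative: $\sigma_{\theta_1}(\sigma_{\theta_2}(E)) = \sigma_{\theta_2}(\sigma_{\theta_1}(E))$
    3. Only the last in a sequence of projection operations is needed; the others can be omitted: $\Pi_{L_1}(\Pi_{L_2}(\ldots(\Pi_{L_n}(E))\ldots)) = \Pi_{L_1}(E)$
    4. Selections can be combined with Cartesian products and theta joins:
       (a) $\sigma_{\theta}(E_1 \times E_2) = E_1 \bowtie_{\theta} E_2$
       (b) $\sigma_{\theta_1}(E_1 \bowtie_{\theta_2} E_2) = E_1 \bowtie_{\theta_1 \wedge \theta_2} E_2$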

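  • Equivalence Rules (Cont.)
    5. Theta-join operations (and natural joins) are commutative: $E_1 \bowtie_{\theta} E_2 = E_2 \bowtie_{\theta} E_1$
    6. (a) Natural join operations are associative: $(E_1 \bowtie E_2) \bowtie E_3 = E_1 \bowtie (E_2 \bowtie E_3)$
       (b) Theta joins are associative in the following manner: $(E_1 \bowtie_{\theta_1} E_2) \bowtie_{\theta_2 \wedge \theta_3} E_3 = E_1 \bowtie_{\theta_1 \wedge \theta_3} (E_2 \bowtie_{\theta_2} E_3)$, where $\theta_2$ involves attributes from only $E_2$ and $E_3$.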

  • Pictorial Depiction of Equivalence Rules

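  • Equivalence Rules (Cont.)
    7. The selection operation distributes over the theta join operation under the following two conditions:
       (a) When all the attributes in $\theta_0$ involve only the attributes of one of the expressions ($E_1$) being joined: $\sigma_{\theta_0}(E_1 \bowtie_{\theta} E_2) = (\sigma_{\theta_0}(E_1)) \bowtie_{\theta} E_2$
       (b) When $\theta_1$ involves only the attributes of $E_1$ and $\theta_2$ involves only the attributes of $E_2$: $\sigma_{\theta_1 \wedge \theta_2}(E_1 \bowtie_{\theta} E_2) = (\sigma_{\theta_1}(E_1)) \bowtie_{\theta} (\sigma_{\theta_2}(E_2))$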

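  • Equivalence Rules (Cont.)
    8. The projection operation distributes over the theta join operation as follows:
       (a) If $\theta$ involves only attributes from $L_1 \cup L_2$: $\Pi_{L_1 \cup L_2}(E_1 \bowtie_{\theta} E_2) = (\Pi_{L_1}(E_1)) \bowtie_{\theta} (\Pi_{L_2}(E_2))$
       (b) Consider a join $E_1 \bowtie_{\theta} E_2$. Let $L_1$ and $L_2$ be sets of attributes from $E_1$ and $E_2$, respectively. Let $L_3$ be attributes of $E_1$ that are involved in join condition $\theta$ but are not in $L_1 \cup L_2$, and let $L_4$ be attributes of $E_2$ that are involved in join condition $\theta$ but are not in $L_1 \cup L_2$. Then $\Pi_{L_1 \cup L_2}(E_1 \bowtie_{\theta} E_2) = \Pi_{L_1 \cup L_2}((\Pi_{L_1 \cup L_3}(E_1)) \bowtie_{\theta} (\Pi_{L_2 \cup L_4}(E_2)))$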

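  • Equivalence Rules (Cont.)
    9. The set operations union and intersection are commutative: $E_1 \cup E_2 = E_2 \cup E_1$ and $E_1 \cap E_2 = E_2 \cap E_1$ (set difference is not commutative).
    10. Set union and intersection are associative: $(E_1 \cup E_2) \cup E_3 = E_1 \cup (E_2 \cup E_3)$ and $(E_1 \cap E_2) \cap E_3 = E_1 \cap (E_2 \cap E_3)$
    11. The selection operation distributes over $\cup$, $\cap$ and $-$: $\sigma_{\theta}(E_1 - E_2) = \sigma_{\theta}(E_1) - \sigma_{\theta}(E_2)$, and similarly for $\cup$ and $\cap$ in place of $-$. Also $\sigma_{\theta}(E_1 - E_2) = \sigma_{\theta}(E_1) - E_2$, and similarly for $\cap$ in place of $-$, but not for $\cup$.
    12. The projection operation distributes over union: $\Pi_L(E_1 \cup E_2) = (\Pi_L(E_1)) \cup (\Pi_L(E_2))$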

  • Multiple Transformations (Cont.)

  • Optimizer strategies
    Heuristic
      Apply the transformation rules in a specific order such that the cost converges to a minimum

    Cost based
      Simulated annealing
      Randomized generation of candidate QEPs
      Problem: how to guarantee randomness

  • Memoization Techniques
    How to generate alternative Query Evaluation Plans (QEPs)?
      Early-generation systems centred around a tree representation of the plan
      Hardwired tree rewriting rules are deployed to enumerate part of the space of possible QEPs
      For each alternative the total cost is determined
      The best alternatives are retained for execution

    Problems: very large space to explore, duplicate plans, local maxima, expensive query cost evaluation.

    SQL Server optimizer contains about 300 rules to be deployed.

  • Memoization Techniques
    How to generate alternative Query Evaluation Plans?
      Keep a memo of partial QEPs and their cost.
      Use the heuristic rules to generate alternatives to build more complex QEPs.
    (Figure: memo structure with level 1 plans for the single relations r1, r2, r3, r4, level 2 plans for pairs such as r1 r2, r2 r3, r3 r4 and r1 r4, up to level n plans.)
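The memo idea can be sketched as a dynamic program over subsets of relations, where the best plan for each subset is computed once and reused. This is only an illustration: the cardinalities in `CARD`, the uniform `SELECTIVITY`, and the cost model (sum of intermediate result sizes) are assumptions invented for the example, and `functools.lru_cache` stands in for the memo table.

```python
from functools import lru_cache
from itertools import combinations

# Hypothetical statistics, invented for this sketch.
CARD = {"r1": 1000, "r2": 100, "r3": 10, "r4": 10000}
SELECTIVITY = 0.01  # assumed to be the same for every join

def size(rels):
    """Estimated cardinality of joining all relations in rels."""
    n = SELECTIVITY ** (len(rels) - 1)
    for r in rels:
        n *= CARD[r]
    return n

@lru_cache(maxsize=None)          # the memo: one entry per subset of relations
def best_plan(rels):
    """Return (cost, plan) for a frozenset of relation names."""
    if len(rels) == 1:
        (r,) = rels
        return 0.0, r              # level 1 plan: a base relation
    best = None
    members = sorted(rels)
    for k in range(1, len(members)):
        for left in combinations(members, k):
            left = frozenset(left)
            right = rels - left
            left_cost, left_plan = best_plan(left)
            right_cost, right_plan = best_plan(right)
            # Assumed cost model: pay for the size of each intermediate result.
            cost = left_cost + right_cost + size(rels)
            if best is None or cost < best[0]:
                best = (cost, (left_plan, right_plan))
    return best

cost, plan = best_plan(frozenset(CARD))
```

Because every subset is solved once, duplicate partial plans are never re-costed, which addresses the "duplicate plans" problem of plain tree rewriting.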

  • Distributed Query Processing
    For centralized systems, the primary criterion for measuring the cost of a particular strategy is the number of disk accesses.
    In a distributed system, other issues must be taken into account:
      The cost of data transmission over the network.
      The potential gain in performance from having several sites process parts of the query in parallel.

  • Par & dist query processing
    The world of parallel and distributed query optimization
    Parallel world: invent parallel versions of well-known algorithms, mostly based on broadcasting tuples and dataflow-driven computations

    Distributed world: use plan modification and coarse-grain processing, exchanging large chunks

  • Transformation rules for distributed systems
    Primary horizontally fragmented table:
      Rule 9: The union is commutative: $E_1 \cup E_2 = E_2 \cup E_1$
      Rule 10: Set union is associative: $(E_1 \cup E_2) \cup E_3 = E_1 \cup (E_2 \cup E_3)$
      Rule 12: The projection operation distributes over union: $\Pi_L(E_1 \cup E_2) = (\Pi_L(E_1)) \cup (\Pi_L(E_2))$

    Derived horizontally fragmented table:
      The join through foreign-key dependency is already reflected in the fragmentation criteria

  • Transformation rules for distributed systems
    Vertically fragmented tables:
      Rules: hint, look at the projection rules

  • Optimization in Par & Distr
    The cost model is changed!
      Network transport is a dominant cost factor

    The facilities for query processing are not homogeneously distributed
      Light-resource systems form a bottleneck
      Need for dynamic load scheduling

  • Simple Distributed Join Processing
    Consider the following relational algebra expression, in which the three relations are neither replicated nor fragmented: $account \bowtie depositor \bowtie branch$

    account is stored at site S1
    depositor at S2
    branch at S3
    For a query issued at site SI, the system needs to produce the result at site SI

  • Possible Query Processing Strategies
    Ship copies of all three relations to site SI and choose a strategy for processing the entire query locally at site SI.
    Ship a copy of the account relation to site S2 and compute $temp_1 = account \bowtie depositor$ at S2. Ship $temp_1$ from S2 to S3, and compute $temp_2 = temp_1 \bowtie branch$ at S3. Ship the result $temp_2$ to SI.
    Devise similar strategies, exchanging the roles of S1, S2, S3.
    Must consider the following factors:
      amount of data being shipped
      cost of transmitting a data block between sites
      relative processing speed at each site

  • Semijoin Strategy
    Let $r_1$ be a relation with schema $R_1$ stored at site S1.
    Let $r_2$ be a relation with schema $R_2$ stored at site S2.
    Evaluate the expression $r_1 \bowtie r_2$ and obtain the result at S1.

    1. Compute $temp_1 \leftarrow \Pi_{R_1 \cap R_2}(r_1)$ at S1.
    2. Ship $temp_1$ from S1 to S2.
    3. Compute $temp_2 \leftarrow r_2 \bowtie temp_1$ at S2.
    4. Ship $temp_2$ from S2 to S1.
    5. Compute $r_1 \bowtie temp_2$ at S1. This is the same as $r_1 \bowtie r_2$.

  • Formal Definition
    The semijoin of $r_1$ with $r_2$ is denoted by $r_1 \ltimes r_2$ and is defined by $\Pi_{R_1}(r_1 \bowtie r_2)$.
    Thus, $r_1 \ltimes r_2$ selects those tuples of $r_1$ that contributed to $r_1 \bowtie r_2$.
    In step 3 of the semijoin strategy, $temp_2 = r_2 \ltimes r_1$.
    For joins of several relations, the above strategy can be extended to a series of semijoin steps.
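The five steps of the semijoin strategy can be traced in a small single-process sketch. Sites are only simulated here: "shipping" is represented by counting the tuples that would cross the network, and the function name, row layout (dicts), and sample data are assumptions made for illustration.

```python
def semijoin_strategy(r1, r2, common):
    """Compute r1 join r2 'at S1' following the five semijoin steps."""
    # Step 1 (at S1): temp1 = projection of r1 on the shared attributes.
    temp1 = {tuple(t[a] for a in common) for t in r1}
    # Step 2: ship temp1 from S1 to S2.
    shipped_to_s2 = len(temp1)
    # Step 3 (at S2): temp2 = r2 join temp1, i.e. the semijoin of r2 with r1.
    temp2 = [t for t in r2 if tuple(t[a] for a in common) in temp1]
    # Step 4: ship temp2 from S2 to S1.
    shipped_to_s1 = len(temp2)
    # Step 5 (at S1): r1 join temp2, which equals r1 join r2.
    result = [
        {**t1, **t2}
        for t1 in r1
        for t2 in temp2
        if all(t1[a] == t2[a] for a in common)
    ]
    return result, shipped_to_s2, shipped_to_s1

# Hypothetical sample data.
r1 = [{"acct": 1, "bal": 10}, {"acct": 2, "bal": 20}]
r2 = [
    {"acct": 1, "cust": "a"},
    {"acct": 3, "cust": "b"},
    {"acct": 4, "cust": "c"},
    {"acct": 1, "cust": "d"},
]
result, to_s2, to_s1 = semijoin_strategy(r1, r2, ["acct"])
```

The strategy pays off when `temp1` plus `temp2` is smaller than shipping all of `r2`; in the sample data only two of the four `r2` tuples travel back to S1.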

  • Join Strategies that Exploit Parallelism
    Consider $r_1 \bowtie r_2 \bowtie r_3 \bowtie r_4$, where relation $r_i$ is stored at site $S_i$. The result must be presented at site S1.
    $r_1$ is shipped to S2 and $r_1 \bowtie r_2$ is computed at S2; simultaneously $r_3$ is shipped to S4 and $r_3 \bowtie r_4$ is computed at S4.
    S2 ships tuples of $(r_1 \bowtie r_2)$ to S1 as they are produced; S4 ships tuples of $(r_3 \bowtie r_4)$ to S1.
    Once tuples of $(r_1 \bowtie r_2)$ and $(r_3 \bowtie r_4)$ arrive at S1, $(r_1 \bowtie r_2) \bowtie (r_3 \bowtie r_4)$ is computed in parallel with the computation of $(r_1 \bowtie r_2)$ at S2 and the computation of $(r_3 \bowtie r_4)$ at S4.

  • Query plan generation
    Apers-Aho-Hopcroft
      Hill-climber: repeatedly split the multi-join query into fragments and optimize its subqueries independently

    Apply centralized algorithms and rely on the cost model to avoid expensive query execution plans.

  • Query evaluators

  • Query evaluation strategy
    Pipeline query evaluation strategy
      Called the Volcano query processing model
      Standard in commercial systems and MySQL
    Basic algorithm:
      Demand-driven evaluation of the query tree.
      Operators exchange data in units such as records.
      Each operator supports the following interface: open, next, close.
      open() at the top of the tree results in a cascade of opens down the tree.
      An operator getting a next() call may recursively make next() calls from within to produce its next answer.
      close() at the top of the tree results in a cascade of closes down the tree.
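The open/next/close interface can be sketched with three small operators. The class names, the use of `None` as the end-of-stream marker, and the sample records are assumptions for illustration; real systems differ in detail.

```python
class Scan:
    """Leaf operator: iterates over an in-memory list of records."""
    def __init__(self, rows):
        self.rows = rows
        self.pos = 0
    def open(self):
        self.pos = 0
    def next(self):
        if self.pos >= len(self.rows):
            return None                      # end of stream
        row = self.rows[self.pos]
        self.pos += 1
        return row
    def close(self):
        self.pos = 0

class Select:
    """Passes on only the records that satisfy the predicate."""
    def __init__(self, child, predicate):
        self.child, self.predicate = child, predicate
    def open(self):
        self.child.open()                    # opens cascade down the tree
    def next(self):
        # Pull from the child until a record qualifies.
        while (row := self.child.next()) is not None:
            if self.predicate(row):
                return row
        return None
    def close(self):
        self.child.close()

class Project:
    """Keeps only the listed columns of each record."""
    def __init__(self, child, columns):
        self.child, self.columns = child, columns
    def open(self):
        self.child.open()
    def next(self):
        row = self.child.next()
        return None if row is None else {c: row[c] for c in self.columns}
    def close(self):
        self.child.close()

def run(root):
    """Drive the plan from the top: one record per next() call."""
    root.open()
    results = []
    while (row := root.next()) is not None:
        results.append(row)
    root.close()
    return results

ticker = [
    {"name": "KPN", "price": 5.5},
    {"name": "ING", "price": 12.0},
    {"name": "KPN", "price": 6.5},
]
plan = Project(Select(Scan(ticker), lambda r: r["price"] < 6), ["name"])
answer = run(plan)
```

No operator materializes its input: each `next()` call at the root pulls exactly as many records through the tree as are needed to produce one answer.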

  • Query evaluation strategy
    Pipeline query evaluation strategy
    Evaluation:
      Oriented towards OLTP applications
      Granule size of data interchange
        Items produced one at a time
        No temporary files
        Choice of intermediate buffer size allocations
      Query executed as one process
      Generic interface, sufficient to add the iterator primitives for the new containers
      CPU intensive
      Amenable to parallelization

  • Query evaluation strategy
    Materialized evaluation strategy
      Used in MonetDB
    Basic algorithm: for each relational operator, produce the complete intermediate result using materialized operands
    Evaluation:
      Oriented towards decision support queries
      Limited internal administration and dependencies
      Basis for multi-query optimization strategy
      Memory intensive
      Amenable to distributed/parallel processing

  • Heterogeneous Distributed Databases
    Many database applications require data from a variety of preexisting databases located in a heterogeneous collection of hardware and software platforms
      Data models may differ (hierarchical, relational, etc.)
      Transaction commit protocols may be incompatible
      Concurrency control may be based on different techniques (locking, timestamping, etc.)
      System-level details almost certainly are totally incompatible
    A multidatabase system is a software layer on top of existing database systems, which is designed to manipulate information in heterogeneous databases
      Creates an illusion of logical database integration without any physical database integration

  • Advantages
    Preservation of investment in existing
      Hardware, system software, applications
    Local autonomy and administrative control
    Allows use of special-purpose DBMSs
    Step towards a unified homogeneous DBMS
      Full integration into a homogeneous DBMS faces
        Technical difficulties and cost of conversion
        Organizational/political difficulties
          Organizations do not want to give up control of their data
          Local databases wish to retain a great deal of autonomy

  • Unified View of Data
    Agreement on a common data model
      Typically the relational model
    Agreement on a common conceptual schema
      Different names for same relation/attribute
      Same relation/attribute name means different things
    Agreement on a single representation of shared data
      E.g. data types, precision
      Character sets: ASCII vs EBCDIC
      Sort order variations
    Agreement on units of measure
    Variations in names
      E.g. Köln vs Cologne, Mumbai vs Bombay

  • Query Processing
    Schema translation
      Write a wrapper for each data source to translate data to a global schema
      Wrappers must also translate updates on the global schema to updates on the local schema

    Limited query capabilities
      Some data sources allow only restricted forms of selections
        E.g. web forms, flat file data sources
      Queries have to be broken up and processed partly at the source and partly at a different site
    Removal of duplicate information when sites have overlapping information
      Decide which sites execute the query
    Global query optimization is limited

  • Eddies: Continuously Adaptive Query Processing
    R. Avnur, J.M. Hellerstein
    UCB
    ACM SIGMOD 2000

  • Problem Statement
    Context: large federated and shared-nothing databases

    Problem: assumptions made at query optimization rarely hold during execution

    Hypothesis: do away with traditional optimizers, solve it through adaptation

    Focus: scheduling in a tuple-based pipeline query execution model

  • Problem Statement Refinement
    Large-scale systems are unpredictable because of:
    Hardware and workload complexity
      Bursty servers and networks, heterogeneity, hardware characteristics

    Data complexity
      Federated databases often come without proper statistical summaries

    User interface complexity
      Online aggregation may involve user control

  • Research Laboratory setting
    Telegraph, a system designed to query all data available online

    River, a low level distributed record management system for shared-nothing databases

    Eddies, a scheduler for dispatching work over operators in a query graph

  • The Idea
    Relational algebra operators consume a stream from multiple sources to produce a new stream

    A priori you don't know how selective the operators are, or how fast tuples are consumed and produced

    You have to adapt continuously and learn this information on the fly

    Adapt the order of processing based on these lessons

  • The Idea
    Standard method: derive a spanning tree over the query graph
      Pre-optimize a query plan to determine operator pairs and their algorithm, e.g. to exploit access paths

    Re-optimizing a query pipeline on the fly requires careful state management, coupled with:
      Synchronization barriers
        Operators have widely differing arrival rates for their operands
        This limits concurrency, e.g. in the merge-join algorithm
      Moments of symmetry
        The algorithm provides the option to exchange the roles of the operands without too many complications
        E.g. switching the roles of R and S in a nested-loop join

  • Nested-loop (figure: nested-loop join over R and S)

  • Join and sorting
    Index-joins are asymmetric; you cannot easily change the roles of their operands
      Combine index-join + operands as a unit in the process

    Sorting requires look-ahead
      Merge-joins are combined into a unit

    Ripple joins
      Break the space into smaller pieces and solve the join operation for each piece individually
      The piece crossings are moments of symmetry

  • Rivers and Eddies
    Eddies are tuple routers that distribute arriving tuples to interested operators
      What are efficient scheduling policies?
      Fixed strategy? Random? Learning?

    Static eddies
      Delivery of tuples to operators can be hardwired in the eddy to reflect a traditional query execution plan

    Naïve eddy
      Operators are delivered tuples based on a priority queue
      Intermediate results get highest priority to avoid buffer congestion

  • Observations for selections
    Extended priority queue for the operators
      Receiving a tuple leads to a credit increment
      Returning a tuple leads to a credit decrement
      Priority is determined by weighted lottery

    Naïve eddies exhibit back pressure in the tuple flow; production is limited by the rate of consumption at the output

    Lottery eddies approach the cost of optimal ordering, without a need to determine the order a priori

    Lottery eddies outperform heuristics
      Hash-use first, index-use first, or naïve
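The credit scheme can be illustrated with a toy router for selection operators. This is a simplified sketch, not the scheduler from the Eddies paper: operators are plain predicates, each starts with one ticket so it can always be drawn, and the seed, names, and sample predicates are assumptions made for the example.

```python
import random

def lottery_eddy(tuples, operators, seed=0):
    """Route each tuple through all selection operators in a lottery-chosen order."""
    rng = random.Random(seed)
    tickets = {name: 1 for name in operators}   # 1 initial ticket each (assumption)
    calls = {name: 0 for name in operators}
    output = []
    for t in tuples:
        pending = list(operators)               # operators this tuple still has to visit
        alive = True
        while pending and alive:
            weights = [tickets[name] for name in pending]
            name = rng.choices(pending, weights=weights)[0]   # weighted lottery
            pending.remove(name)
            tickets[name] += 1                  # credit: operator receives a tuple
            calls[name] += 1
            if operators[name](t):
                tickets[name] -= 1              # debit: operator returns the tuple
            else:
                alive = False                   # tuple dropped, operator keeps the ticket
        if alive:
            output.append(t)
    return output, tickets, calls

operators = {
    "selective": lambda x: x % 10 == 0,   # passes 10% of its input
    "loose": lambda x: x % 2 == 0,        # passes 50% of its input
}
output, tickets, calls = lottery_eddy(range(1000), operators)
```

Operators that drop many tuples accumulate tickets and therefore tend to be scheduled first, which is the behaviour a static optimizer would aim for if it knew the selectivities in advance.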

  • Observations
    The dynamics during a run can be controlled by a learning scheme
      Split the processing into steps (windows) to re-adjust the weights during tuple delivery

    Initial delays cannot be handled efficiently

    Research challenges:
      Better learning algorithms to adjust flow
      Aggressive adjustments
      Remove pre-optimization
      Balance hostile parallel environment
      Deploy eddies to control the degree of partitioning (and replication)

  • Database streams: You only get one chance to look
    Prof. Dr. Martin Kersten
    CWI
    Amsterdam
    March 2003

  • Database research topic list
    Indexing, access methods, data structures
    Query/transaction processing and optimization
    Distributed, heterogeneous, mobile databases
    View maintenance/materialisation
    Mining data, text, and web
    Semi-structured data, metadata and XML
    Temporal, spatial, scientific, statistical, biological DB
    Data warehousing and OLAP
    Middleware, workflow and security
    HOT: XML, Semantic Web, P2P, streams, biological

  • Outline
    Introduction to Data Stream Management Systems (DSMS)
    A reference architecture for a DSMS
    Grouping thousands of user queries
    Merging and abstraction of streams
    Conclusions

  • The tranquil database scene
    Traditional DBMS: data stored in finite, persistent data sets, with SQL-based applications to manage and access it
    (Figure: an OLTP web application, ad-hoc reporting, and a data entry application around an RDBMS)

  • The tranquil database scene
    The user community grows and MANY want up-to-the-second (aggregate) information from the database

  • The tranquil database scene
    Database entry is taken over by a remote device which issues a high volume of update transactions

  • The tranquil database scene
    Database entry is taken over by MANY remote devices which issue a high volume of update transactions

  • The tranquil database scene
    Database solutions cannot carry the weight

  • Application domains
    Personalized financial tickers
    Personalized information delivery
    Personalized environment control

    Business-to-business middleware
    Web-services applications based on XML exchange

    Monitoring the real-world environment (pollution, traffic)
    Monitoring the data flow in an ISP
    Monitoring web-traffic behaviour
    Monitoring the load on a telecom switch
    Monitoring external news feeds

  • Application vision
    Re-define the role of a DBMS in the complete application support line
      It manages a persistent store
      It handles and coordinates updates
      It supports ad-hoc querying

    Application servers carry the load
      J2EE, JBoss, WebSphere, BEA, ...

    Or partly redesign the DBMS

  • Application domains (figure: the application domains grouped under the labels QUERYING, STREAM UPDATE, and WEB SERVICES)

  • Continuous queries
    Continuous query: the user observes the changes made to the database through a query
      Query registration once
      Continuously up-to-date answers
    (Figure: continuous queries registered at an RDBMS)

  • Data Streams
    Data streams: the database is in constant bulk load mode
      The update rate is often non-uniform
      The entries are time-stamped
      The source could be a web service, sensor, or wrapped source
    (Figure: data entry applications feeding a DSMS)

  • DSMS
    Data Stream Management Systems (DSMS) support high-volume update streams and real-time response to ad-hoc complex queries.

    What can be salvaged from the DBMS core technology?
    What should be re-designed from scratch?

  • DBMS versus DSMS
    DBMS                                               DSMS
    Persistent relations                               Transient streams
    Transaction oriented                               Query orientation
    One-time queries                                   Continuous queries
    Precise query answering                            Best-effort query answering
    Access plan determines physical database design    Unpredictable data characteristics

  • Old technology to the rescue?
    Many stream-based applications are low-volume with simple queries
      Thus we can live with automatic query refresh

    Triggers are available for notification of changes
      They are hooked up to simple changes to the datastore
      There is no technology to merge/optimize trigger groups

  • Outline of remainder
    Query processing over multiple streams

    Organizing hundreds of ad-hoc queries

    Sensor-network based querying

  • A stream application
    [Widom] Consider a network traffic system for an ISP with a customer link and a backbone link, and two streams keeping track of the IP traffic

    PTc(saddr, daddr, id, length, timestamp)
    PTb(saddr, daddr, id, length, timestamp)

  • A stream application
    Q1: Compute the load on the backbone link averaged over a one-minute period and notify the operator when the load exceeds a threshold T

    Select notifyoperator(sum(length))
    From PTb
    Group By getminute(timestamp)
    Having sum(length) > T

    With low stream flow it could be handled with a DBMS trigger;
    otherwise sample the stream to get an approximate answer
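Q1 can be evaluated incrementally over the stream with one running sum per minute. The sketch below assumes packets arrive ordered by timestamp as `(timestamp_seconds, length)` pairs; the function name and sample data are hypothetical.

```python
def backbone_alerts(packets, threshold):
    """Yield (minute, load) for every one-minute window whose load exceeds threshold.

    Assumes packets is an iterable of (timestamp_seconds, length) pairs
    ordered by timestamp, so only one running sum has to be kept.
    """
    current_minute = None
    load = 0
    for ts, length in packets:
        minute = ts // 60
        if minute != current_minute:
            # The previous window is complete: report it if it was overloaded.
            if current_minute is not None and load > threshold:
                yield current_minute, load
            current_minute, load = minute, 0
        load += length
    # Flush the last (possibly still open) window at end of input.
    if current_minute is not None and load > threshold:
        yield current_minute, load

sample = [(0, 500), (30, 700), (61, 100), (130, 900), (150, 400)]
alerts = list(backbone_alerts(sample, threshold=1000))
```

The only state carried between tuples is the current minute and its running sum, so the memory footprint is independent of the stream length.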

  • A stream application
    Q2: Find the fraction of traffic on the backbone link coming from the customer network, to check the cause of congestion.

    ( Select count(*)
      From PTc as C, PTb as B
      Where C.saddr = B.saddr and C.daddr = B.daddr and C.id = B.id )
    / ( Select count(*) From PTb )

    Both streams might require an unbounded resource to perform the join, which could be avoided with an approximate answer and a synopsis

  • A stream application
    Q3: Monitor the top 5% source-to-destination pairs in terms of traffic on the backbone.

    With Load As (Select saddr, daddr, sum(length) as traffic From PTb Group By saddr, daddr)
    Select saddr, daddr, traffic
    From Load as l1
    Where (Select count(*) From Load as l2 Where l2.traffic < l1.traffic) > (Select 0.95*count(*) From Load)
    Order By traffic

    This query contains blocking operators

  • STREAM architecture (figure with an Answer store)

  • Q1: Compute the load on the backbone link averaged over a one-minute period and notify the operator when the load exceeds a threshold T

    Select notifyoperator(sum(length))
    From PTb
    Group By getminute(timestamp)
    Having sum(length) > T

    The answer store area simply needs an integer

  • Q2: Find the fraction of traffic on the backbone link coming from the customer network, to check the cause of congestion.

    ( Select count(*)
      From PTc as C, PTb as B
      Where C.saddr = B.saddr and C.daddr = B.daddr and C.id = B.id )
    / ( Select count(*) From PTb )

    The scratch area should maintain part of the two streams to implement the join, or a complete list of saddr and daddr.

  • Joining two tables (figure: nested-loop join of RelA and RelB)

  • Joining two streams
    Nested-loop join of PTa and PTb: an unbounded store would be required

  • Joining two streams
    Merge join of PTa and PTb: if the streams are ordered, a simple merge join is possible with limited resource requirements
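A merge join over two ordered streams needs to hold only the current tuple of each input. The sketch below assumes both streams are sorted on the join key and that keys are unique within each stream (for example a packet id); handling duplicate keys would require buffering and is left out. The function and variable names are illustrative.

```python
def merge_join(left, right, key):
    """Join two streams that are both sorted on key, one tuple of state per side.

    Assumes keys are strictly increasing within each stream.
    """
    left, right = iter(left), iter(right)
    l = next(left, None)
    r = next(right, None)
    while l is not None and r is not None:
        kl, kr = key(l), key(r)
        if kl == kr:
            yield l, r
            l = next(left, None)
            r = next(right, None)
        elif kl < kr:
            l = next(left, None)      # left tuple has no partner, drop it
        else:
            r = next(right, None)     # right tuple has no partner, drop it

# Hypothetical packet streams as (id, payload) pairs, ordered by id.
stream_a = [(1, "a"), (3, "b"), (4, "c")]
stream_b = [(2, "x"), (3, "y"), (4, "z"), (7, "w")]
matches = list(merge_join(stream_a, stream_b, key=lambda t: t[0]))
```

Because the function is a generator over iterators, it also works on unbounded inputs and emits matches as soon as both sides have advanced far enough.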

  • Joining two streams
    Join synopsis of PTa and PTb: a statistical summary (histogram, window) could provide an approximate answer

  • Q3: Monitor the top 5% source-to-destination pairs in terms of traffic on the backbone.

    With Load As (Select saddr, daddr, sum(length) as traffic From PTb Group By saddr, daddr)
    Select saddr, daddr, traffic
    From Load as l1
    Where (Select count(*) From Load as l2 Where l2.traffic < l1.traffic) > (Select 0.95*count(*) From Load)
    Order By traffic

    The scratch area should maintain part of the two streams to implement the join.

  • Finance
    [DeWitt] Consider a financial feed where thousands of clients can register arbitrarily complex continuous queries.
    (Figure: XML stream querying)

  • Finance
    Q5: Notify me whenever the price of KPN stock drops below 6 euro

    Select notifyUser(name, price)
    From ticker t1
    Where t1.name = 'KPN' and t1.price < 6

  • FinanceQ5 Notify me whenever the price of KPN stock drops by 5% over the last hour

    Select notifyUser(name, price)From ticker t1,t2Where t1.name = KPN and t2.name= t1.name and getminutes(t1.timestamp-t2.timestamp)

  • FinanceQ6 Notify me whenever the price of KPN stock drops by 5% over the last hour and T-mobile remains constant

    Select notifyUser(name, price)From ticker t1,t2, t3,t4Where t1.name = KPN and t2.name= t1.name and getminutes(t1.timestamp-t2.timestamp)

  • Query signatures
    Traditional SQL applications already use the notion of parameterised queries, i.e. some constants are replaced by a program variable.
      Subsequent calls use the same query evaluation plan

    In a DSMS we should recognize such queries as quickly as possible
      Organize similar queries into a group
      Decompose complex queries into smaller queries
      Manage the amount of intermediate store

  • Finance
    Queries can be organized in groups using a signature, and evaluation can be replaced by a single multi-user request.

    Select notifyUser(name, price)
    From ticker t1
    Where t1.name = 'KPN' and t1.price < 6

    Client          Name    Threshold price
    192.871.12.1    KPN     6
    192.777.021     ING     12

  • Finance
    Queries can be organized in groups using a signature, and evaluation can be replaced by a single multi-user request.

    Select notifyUser(c.client, t1.name, t1.price)
    From ticker t1, clients c
    Where t1.name = c.name and t1.price < c.price

    Client          Name    Threshold price
    192.871.12.1    KPN     6
    192.777.021     ING     12
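The grouped evaluation amounts to joining each incoming ticker tuple with a table of registered constants. A minimal sketch follows, using the two client rows from the slide; the function names and the in-memory index are assumptions for illustration.

```python
from collections import defaultdict

# Registered queries that share one signature: (client, stock name, threshold price).
clients = [
    ("192.871.12.1", "KPN", 6),
    ("192.777.021", "ING", 12),
]

def build_index(clients):
    """Group the registered thresholds by stock name."""
    by_name = defaultdict(list)
    for client, name, limit in clients:
        by_name[name].append((client, limit))
    return by_name

def on_tick(by_name, name, price):
    """Evaluate every query in the group against one ticker tuple."""
    return [
        (client, name, price)
        for client, limit in by_name.get(name, [])
        if price < limit
    ]

index = build_index(clients)
notifications = on_tick(index, "KPN", 5.5)
```

Registering another client only adds a row to the table; the evaluation plan for the group stays the same.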

  • Finance
    Timer-based queries call for a stream window with incremental evaluation
    Multiple requests can be organized by time-table and event detection methods provided by database triggers.

    Select notifyUser(name, price)From ticker t1,t2Where t1.name = KPN and t2.name= t1.name and getminutes(t1.timestamp-t2.timestamp)

  • Finance
    Complex queries can be broken down into independent components

    Select notifyUser(name, price)From ticker t1,t2, t3,t4Where t1.name = KPN and t2.name= t1.name and getminutes(t1.timestamp-t2.timestamp)

  • Finance
    Intermediate results should be materialized. This can be integrated in traditional query evaluation schemes.
    t1.timestamp = t3.timestamp and t2.timestamp = t4.timestamp

  • Sensor networks
    [Madden] Sensor networks are composed of thousands of small devices, interconnected through radio links. This network can be queried.
      Sensors have limited energy
      Sensors have limited reachability
      Sensors can be crushed

  • Aggregate Queries Over Ad-Hoc Wireless Sensor Networks

  • Sensor networks
    Q7: Give me the traffic density on the A1 for the last hour

    Select avg(t.car)
    From traffic t
    Where t.segment in (Select segment From roads Where name = 'A1')
    Group By gethour(t.timestamp)

  • Sensor networks
    The sensors should organize themselves into a P2P infrastructure
      An aggregate query is broadcast through the network
      Each mote calculates a partial answer and sends it to its peers
      Peers aggregate the information to produce the final answer
    Problems
      The energy needed to broadcast information is high
      Tuples and partial results may be dropped
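In-network aggregation of an average can be sketched with partial states of the form (sum, count) that are merged on the way to the root. The routing tree, node names, and readings below are invented for the example, and message loss is not modelled.

```python
def aggregate(node, children, readings):
    """Return the partial state (sum, count) for the subtree rooted at node."""
    total = readings.get(node, 0)
    count = 1 if node in readings else 0
    for child in children.get(node, []):
        child_sum, child_count = aggregate(child, children, readings)
        total += child_sum          # merge the child's partial answer
        count += child_count
    return total, count

# Hypothetical routing tree: the root has no sensor reading of its own.
children = {"root": ["a", "b"], "a": ["c"]}
readings = {"a": 10, "b": 20, "c": 30}

total, count = aggregate("root", children, readings)
average = total / count
```

Each mote forwards a single (sum, count) pair instead of every reading from its subtree, which is what keeps the radio traffic, and hence the energy use, low.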

  • Conclusions and outlook
    Data stream management technology requires changes in our expectations of DBMS functionality

    Queries do not necessarily provide a precise answer
    Queries continue as long as we are interested in their approximate result

    The persistent store does not necessarily contain a consistent and timeless view of the state of the database.

  • Conclusions and outlook
    Data stream management technology capitalizes upon proven DBMS technology

    DSMSs provide a basis for ambient home settings, sensor networks, and globe-spanning information systems

    It is realistic to expect that some of the properties to support efficient data stream management will become part of the major products
    Multi-query optimization techniques should be added.

  • Literature
    NiagaraCQ: A Scalable Continuous Query System for Internet Databases, J. Chen, D.J. DeWitt, F. Tian, Y. Wang, Wisconsin Univ.

    Streaming Queries over Streaming Data, Sirish Chandrasekaran, Michael J. Franklin, Univ. Berkeley

    Continuous Queries over Data Streams, S. Babu, J. Widom, Stanford University