OGSA-DQP: Service-Based Distributed Query Processing on the Grid
M. Nedim Alpdemir
Department of Computer Science
University of Manchester
People, Places, Funding
Manchester: M. Nedim Alpdemir, Anastasios Gounaris, Norman W. Paton, Alvaro A. A. Fernandes, Rizos Sakellariou
Newcastle upon Tyne: Arijit Mukherjee, Jim Smith, Paul Watson
Outline
  Requirements, Motivation
  Background
  OGSA-DQP Overview
  OGSA-DQP Details: How does it work
  OGSA-DQP Details: A query Example
  Summary
Story: Ad-hoc data integration
[Figure: a Client Application sends queries to Location 1, Location 2 and Location 3 and processes the results itself (labels: query, process).]
Story: Middleware to the rescue
[Figure: a Client Application issues a single query through a stack of middleware layers that sit over three data source wrappers. Layers as labelled:]
  Grid Infrastructure (Web Services, OGSA, Unicore, Condor)
  Data Access and Integration Frameworks (e.g. OGSA-DAI)
  High-level Data Integration Systems (e.g. OGSA-DQP)
  Semantic Search & Discovery / Schema Integration / Ontologies
Motivation
Pull by applications:
  overwhelming amounts of semantically complex data in very diverse, structurally dissimilar, autonomous and geographically dispersed data sources, requiring computationally demanding analysis.
Push from technologies and infrastructure:
  Web service impetus
  Grid services and protocols that enable:
    dynamic resource discovery
    dynamic resource allocation and use
  Data access standards
Mutual Benefits
The Grid needs DQP:
  Declarative, high-level resource integration with implicit parallelism.
  DQP-based solutions should in principle run faster than those manually coded.
DQP needs the Grid:
  Systematic access to remote data and computational resources.
  Dynamic resource discovery and allocation.
Background: mediator/wrapper middleware
A well-known approach to data integration is to use mediator/wrapper middleware.
Wrappers reconcile differences and impose a global schema.
Often there is only one mediator in one fixed location, which is limiting.
[Figure: a Query enters a mediator and Results come back from it; the mediator sits over two wrappers, each in front of a DBMS and its data.]
OGSA-DQP adopts mediator/wrapper
OGSA-DQP uses a middleware approach.
It can be seen as a mediator over OGSA-DAI wrappers.
It promises three bottom-line benefits:
  efficiency: “leave it to schedule in parallel”;
  effectiveness: “leave it to orchestrate your services”;
  usability: “use it as a Grid data service”.
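[Figure: the same layout with OGSA-DQP in the mediator position and OGSA-DAI in the two wrapper positions, each in front of a DBMS and its data; Query in, Results out.]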
Background: Service-based approaches
[Figure: Service Registry, Service Requester and Service Provider, linked by Find, Publish and Bind. Infrastructure layer: credentials, roles; QoS negotiation tokens; monitoring, accounting, notification. Arrows labelled "facilitate" and "leads to" connect the diagram to the two statements below.]
Virtualisation of Resources
A convenient cooperation model for Distributed Systems (e.g. Grid)
Background: Grid Database Service (GDS)
Databases are made available on the Grid through integration with other Grid services, and provision of standard interfaces
Build upon OGSA to deliver high-level data management functionality for the Grid.
Seek to provide two classes of components:
  Data access components
  Data integration components
Innovations
OGSA-DQP dynamically allocates evaluators to do work on behalf of the mediator.
This allows for runtime circumstances to be taken into account when the optimiser decides how to partition and schedule.
OGSA-DQP uses a parallel physical algebra: most mediator-based query processors do not.
OGSA-DQP overview: A Service-based framework
Service-based in two orthogonal senses:
  Supports querying over data storage and analysis resources made available as services
  Construction of distributed query plans and their execution over the Grid are factored out as services
Uses the emerging standard for GDSs to provide consistent access to database metadata and to interact with databases on the Grid.
A query may refer to database (GDS) and computational services.
OGSA-DQP Overview: Two Separate Grid Services
Grid Distributed Query Services (GDQSs) that:
  interact with clients;
  find and retrieve service descriptions;
  parse, compile, partition and schedule the query execution over a union of distributed data sources.
The query plan is an orchestration of GQESs.
Grid Query Evaluation Services (GQESs) that:
  implement the physical query algebra;
  implement the query execution model and semantics;
  run a partition of a query execution plan generated by a GDQS;
  interact with other GQESs/GDSs/WSs but not with clients.
OGSA-DQP Overview: An illustration
[Figure: a Client passes a resource list and an OQL query to the GDQS. The GDQS obtains DB schemas from GDS1 and GDS2 and WSDL from Web Services, and contains the Polar* Query Optimiser Engine (OQL Parser, Logical Optimiser, Physical Optimiser, Partitioner, Scheduler). The resulting plan (print, exchange, hash join and scan operators) is split into partitions P1, P2 and P3, which are sent as sub-plans to GQES1, GQES2 and GQES3 in the Distributed Query Execution Engine. Other labels: GDS Query Request Doc., sub-query, data blocks, operation call. Three XML documents from the slide follow: the resource list, a DB schema and a partitioned plan.]
<?xml version="1.0" encoding="UTF-8"?>
<GDQDataSourceList xmlns="http://dqp.ogsadai.org.uk/schema/gdqs">
  <importedDataSource>
    <GDSFactoryHandle>http://phoebus.cs.man.ac.uk:8080/ogsa/services/ogsadai/GridDataServiceFactory</GDSFactoryHandle>
    <GDSFactoryHandle>http://rpc676.cs.man.ac.uk:8080/ogsa/services/ogsadai/GridDataServiceFactory</GDSFactoryHandle>
    <GDSFactoryHandle>http://mygrid.ncl.cs.ac.uk:8080/ogsa/services/ogsadai/GridDataServiceFactory</GDSFactoryHandle>
  </importedDataSource>
  <importedService>
    <wsdlURL>http://phoebus.cs.man.ac.uk:9090/axis/services/EntropyAnalyserService?WSDL</wsdlURL>
  </importedService>
</GDQDataSourceList>
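As a rough illustration of how a client or GDQS might read such a resource list, the following Python sketch extracts the GDS factory handles and WSDL URLs with the standard library. The function name `read_resource_list` is hypothetical and not part of OGSA-DQP; the XML is copied from the slide with the XML declaration dropped.

```python
import xml.etree.ElementTree as ET

# Resource list copied from the slide (XML declaration omitted).
RESOURCE_LIST = """<GDQDataSourceList xmlns="http://dqp.ogsadai.org.uk/schema/gdqs">
  <importedDataSource>
    <GDSFactoryHandle>http://phoebus.cs.man.ac.uk:8080/ogsa/services/ogsadai/GridDataServiceFactory</GDSFactoryHandle>
    <GDSFactoryHandle>http://rpc676.cs.man.ac.uk:8080/ogsa/services/ogsadai/GridDataServiceFactory</GDSFactoryHandle>
    <GDSFactoryHandle>http://mygrid.ncl.cs.ac.uk:8080/ogsa/services/ogsadai/GridDataServiceFactory</GDSFactoryHandle>
  </importedDataSource>
  <importedService>
    <wsdlURL>http://phoebus.cs.man.ac.uk:9090/axis/services/EntropyAnalyserService?WSDL</wsdlURL>
  </importedService>
</GDQDataSourceList>"""

# Namespace prefix chosen for this sketch; the URI is the one in the document.
NS = {"gdqs": "http://dqp.ogsadai.org.uk/schema/gdqs"}


def read_resource_list(xml_text):
    """Return (GDS factory handles, WSDL URLs) found in a resource list."""
    root = ET.fromstring(xml_text)
    factories = [
        e.text.strip()
        for e in root.findall("gdqs:importedDataSource/gdqs:GDSFactoryHandle", NS)
    ]
    wsdls = [
        e.text.strip()
        for e in root.findall("gdqs:importedService/gdqs:wsdlURL", NS)
    ]
    return factories, wsdls


factories, wsdls = read_resource_list(RESOURCE_LIST)
print(len(factories), "data sources;", len(wsdls), "analysis service(s)")
```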
<?xml version="1.0" encoding="UTF-8"?>
<databaseSchema xmlns="">
  <logicalSchema>
    <table name="goterm">
      <column fullName="goterm_id" length="32" name="id">
        <sqlTypeName>varchar</sqlTypeName>
        <sqlJavaTypeID>12</sqlJavaTypeID>
      </column>
      <column fullName="goterm_type" length="55" name="type">
        <sqlTypeName>varchar</sqlTypeName>
        <sqlJavaTypeID>12</sqlJavaTypeID>
      </column>
      <column fullName="goterm_name" length="255" name="name">
        <sqlTypeName>varchar</sqlTypeName>
        <sqlJavaTypeID>12</sqlJavaTypeID>
      </column>
      <primaryKey>
        <columnFullName>id</columnFullName>
      </primaryKey>
    </table>
  </logicalSchema>
  <physicalSchema>
    <hostMachine>130.88.192.230</hostMachine>
    <database join_buffer_size="131072" max_join_size="4294967295">
      <physTable avgRowLength="67" dataLength="766784" indexLength="126976" name="goterm" rowFormat="Dynamic" rows="11369"/>
    </database>
  </physicalSchema>
  <GDSFHandle>http://phoebus.cs.man.ac.uk:9090/ogsa/services/ogsadai/GridDataServiceFactory</GDSFHandle>
</databaseSchema>
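The imported schema carries both logical metadata (tables, columns, keys) and physical statistics (row counts, average row length). The sketch below shows one way such a document could be summarised in Python. It works on a trimmed copy of the schema above (one column removed, `xmlns=""` dropped); the function name `summarise_schema` and the size estimate `rows * avgRowLength` are assumptions made for illustration, not a description of how the OGSA-DQP optimiser actually uses these figures.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the schema document from the slide.
SCHEMA = """<databaseSchema>
  <logicalSchema>
    <table name="goterm">
      <column fullName="goterm_id" length="32" name="id">
        <sqlTypeName>varchar</sqlTypeName>
      </column>
      <column fullName="goterm_name" length="255" name="name">
        <sqlTypeName>varchar</sqlTypeName>
      </column>
      <primaryKey>
        <columnFullName>id</columnFullName>
      </primaryKey>
    </table>
  </logicalSchema>
  <physicalSchema>
    <database>
      <physTable avgRowLength="67" name="goterm" rows="11369"/>
    </database>
  </physicalSchema>
</databaseSchema>"""


def summarise_schema(xml_text):
    """Map each table name to its columns, key and physical statistics."""
    root = ET.fromstring(xml_text)
    phys = {p.get("name"): p for p in root.iter("physTable")}
    summary = {}
    for table in root.iter("table"):
        name = table.get("name")
        stats = phys.get(name)
        rows = int(stats.get("rows")) if stats is not None else None
        avg = int(stats.get("avgRowLength")) if stats is not None else None
        summary[name] = {
            "columns": [(c.get("name"), c.findtext("sqlTypeName"))
                        for c in table.findall("column")],
            "primary_key": [k.text for k in table.findall("primaryKey/columnFullName")],
            "rows": rows,
            # Illustrative size estimate in bytes (an assumption of this sketch).
            "estimated_bytes": rows * avg if rows is not None else None,
        }
    return summary


summary = summarise_schema(SCHEMA)
print(summary["goterm"])
```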
<?xml version="1.0" encoding="UTF-8"?>
<Partitions>
  <Partition>
    <evaluatorURI>http://130.88.198.195:9090/ogsa/services/ogsadai/dqp/GridQueryEvaluationFactory/hash-11025450-1076603541049</evaluatorURI>
    <Operator operatorID="0" operatorType="TABLE_SCAN">
      <tupleType>
        <type>goterm</type>
        <name>goterm.OID</name>
        <type>string</type>
        <name>goterm.id</name>
        <type>string</type>
        <name>goterm.type</name>
        <type>string</type>
        <name>goterm.name</name>
      </tupleType>
      <TABLE_SCAN>
        <dataResourceName> goterms </dataResourceName>
        <GDSHandle> http://130.88.192.230:9090/ogsa/services/ogsadai/GridDataServiceFactory/hash-31056514-1076603576481</GDSHandle>
        <tableName> goterms </tableName>
        <predicateExpr>
          <predicate>
            <comparativeOperator>LIKE</comparativeOperator>
            <leftOperand name=" goterm.id" type="13"/>
            <rightOperand name=" GO:0000%" type="16"/>
          </predicate>
        </predicateExpr>
      </TABLE_SCAN>
    </Operator> . . .
  </Partition> . . .
</Partitions>
Details: Setting up GDQS
Setup phase involves:
  Importing schemas of participating data sources
    Results in a global schema (currently no schema integration support)
  Importing WSDL documents of participating analysis services
    Service metadata obtained (e.g. port types, operations, types, etc.)
  Collecting computational resource metadata (implicit)
Set-up strategy depends on the life-time model of GDQS and GDSs:
  A GDQS instance is created per client
  But it can serve multiple queries
  This model avoids the complexity of multi-user interactions while ensuring that the set-up cost is not high
Details: A query Example
Given two DBMSs and one analysis tool (e.g., a WS):
  proteinTerm, held in the GO Gene Ontology running as a remote MySQL DB,
  protein, held in the GIMS Genome Warehouse running as a remote ODMG-compliant DB,
  Blast (sequence alignment scoring);
We can obtain alignment scores for a sequence against proteins of a certain kind:
  select p.proteinId, Blast(p.sequence)
  from protein p, proteinTerm t
  where t.termId = 'GO:0005942' and
        p.proteinId = t.proteinId
• Then, OGSA-DQP acts as an enactor of a declarative orchestration of services on the Grid:
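[Figure: query plan. index_scan[termId=GO:0005942](proteinTerm) and table_scan(protein) each feed a reduce and then an exchange into hash_join(proteinId); the join output passes through a further exchange to op_call(Blast), followed by a reduce. Parts of the plan carry the numeric labels 1, 2, "3,4" and 5.]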
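To make the plan concrete, here is a minimal single-process Python sketch of the same operator pipeline, written with generators so that tuples flow from one operator to the next as they are produced. Everything in it is illustrative: the rows are invented toy data, `blast_stub` is a hypothetical stand-in that simply returns the sequence length rather than a real alignment score, and the exchange operators are omitted because nothing is moved between machines here.

```python
def scan(rows, predicate=lambda row: True):
    """Table/index scan: yield the rows that satisfy the predicate."""
    for row in rows:
        if predicate(row):
            yield row


def reduce_op(rows, fields):
    """Reduce: keep only the named fields of each tuple (projection)."""
    for row in rows:
        yield {f: row[f] for f in fields}


def hash_join(build, probe, key):
    """Hash join: build a table on one input, then stream the other past it."""
    table = {}
    for row in build:
        table.setdefault(row[key], []).append(row)
    for row in probe:
        for match in table.get(row[key], []):
            yield {**match, **row}


def op_call(rows, func, arg, out):
    """Operation call: invoke an external function once per tuple."""
    for row in rows:
        yield {**row, out: func(row[arg])}


def blast_stub(sequence):
    # Hypothetical placeholder for the Blast web service.
    return len(sequence)


# Invented toy data standing in for the two remote databases.
protein = [
    {"proteinId": "P1", "sequence": "MKT"},
    {"proteinId": "P2", "sequence": "MKTAY"},
    {"proteinId": "P3", "sequence": "GA"},
]
protein_term = [
    {"proteinId": "P1", "termId": "GO:0005942"},
    {"proteinId": "P3", "termId": "GO:0005942"},
    {"proteinId": "P2", "termId": "GO:0000001"},
]

terms = reduce_op(scan(protein_term, lambda t: t["termId"] == "GO:0005942"),
                  ["proteinId"])
proteins = reduce_op(scan(protein), ["proteinId", "sequence"])
joined = hash_join(terms, proteins, "proteinId")
scored = op_call(joined, blast_stub, "sequence", "blast")
result = list(reduce_op(scored, ["proteinId", "blast"]))
print(result)
```

In the distributed setting described on the slides, each group of operators would run on a different evaluator and the exchange operators would carry tuples between them.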
Details: what parallelisms do I benefit from?
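[Figure: the same plan, here with table_scan[termId=GO:0005942](proteinTerm) and table_scan(protein), each followed by reduce and exchange, then hash_join(proteinId), exchange, op_call(Blast) and reduce. Two annotations are added: partitioned parallelism, attached to labels 3 and 4, and pipelined parallelism. Labels 1, 2, "3,4" and 5 appear as before.]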
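Partitioned parallelism can be sketched as follows: an exchange step hash-partitions the incoming tuples on a key, and each partition is then processed by its own evaluator at the same time. The Python below is only a local analogy using threads. `exchange`, `evaluate_partition`, `run_partitioned` and `blast_stub` are hypothetical names, the rows are invented, and a CRC32 hash is used so that the partitioning is deterministic between runs.

```python
import zlib
from concurrent.futures import ThreadPoolExecutor


def blast_stub(sequence):
    # Hypothetical placeholder for the Blast web service.
    return len(sequence)


def exchange(rows, key, n):
    """Hash-partition rows on `key` into n lists, one per evaluator."""
    partitions = [[] for _ in range(n)]
    for row in rows:
        index = zlib.crc32(row[key].encode("utf-8")) % n
        partitions[index].append(row)
    return partitions


def evaluate_partition(rows):
    """Work done by one evaluator: op_call(Blast) followed by reduce."""
    return [{"proteinId": r["proteinId"], "blast": blast_stub(r["sequence"])}
            for r in rows]


def run_partitioned(rows, n=2):
    """Run the op_call stage over n partitions concurrently."""
    partitions = exchange(rows, "proteinId", n)
    with ThreadPoolExecutor(max_workers=n) as pool:
        results = list(pool.map(evaluate_partition, partitions))
    return [row for part in results for row in part]


# Invented join output: six proteins with sequences of length 1 to 6.
rows = [{"proteinId": f"P{i}", "sequence": "A" * i} for i in range(1, 7)]
scores = run_partitioned(rows, n=2)
print(sorted(scores, key=lambda r: r["proteinId"]))
```

The order of the combined output depends on how the keys hash, so callers that need a particular order should sort the result.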
Details: What goes on behind the scenes?
[Figure: a Client, a GDQS on node N0, GDSs, and GQES factories (GQESF) on nodes N1 to N4. Numbered interactions:
  1. the Client calls perform(Query) on the GDQS;
  2. the GDQS calls createService on the GQES factories, creating GQES instances;
  3. the GDQS calls perform(QuerySubplan) on each GQES instance;
  4. results flow back.
Sub-plans shown on the evaluators:
  sequential_scan, reduce(proteinID, sequence)
  sequential_scan(term = 8372), reduce(proteinID)
  hash_join(p.proteinID = t.proteinID)
  operation_call blast(p.sequence), reduce(p.proteinID, blast), shown on two evaluators, each calling the Web Services (BLAST)]
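The numbered interactions can be mimicked with a few toy classes. This is a sketch under loud assumptions: the class and method names (`GDQS`, `GQESFactory`, `GQES`, `perform`, `create_service`) only echo the labels in the figure and are not the real OGSA-DQP interfaces, no optimisation or network communication takes place, and the schedule (which sub-plan goes to which node) is passed in by hand in place of the optimiser's output.

```python
class GQES:
    """Toy evaluator: 'runs' one sub-plan by reporting what it was given."""

    def __init__(self, node):
        self.node = node

    def perform(self, subplan):
        return f"{self.node}: results of {subplan}"


class GQESFactory:
    """Toy factory living on one node; creates evaluator instances there."""

    def __init__(self, node):
        self.node = node

    def create_service(self):
        return GQES(self.node)


class GDQS:
    """Toy coordinator following steps 1 to 4 of the figure."""

    def __init__(self, factories):
        self.factories = factories  # node name -> GQESFactory
        self.trace = []

    def perform(self, query, schedule):
        # schedule: node name -> sub-plan, standing in for the optimiser output.
        self.trace.append(f"1 perform({query})")
        evaluators = {}
        for node in schedule:
            evaluators[node] = self.factories[node].create_service()
            self.trace.append(f"2 createService on {node}")
        results = []
        for node, subplan in schedule.items():
            self.trace.append(f"3 perform({subplan}) on {node}")
            results.append(evaluators[node].perform(subplan))
        self.trace.append("4 results")
        return results


factories = {node: GQESFactory(node) for node in ("N1", "N2", "N3")}
gdqs = GDQS(factories)
schedule = {
    "N1": "hash_join",
    "N2": "operation_call blast",
    "N3": "operation_call blast",
}
results = gdqs.perform("select ...", schedule)
print("\n".join(gdqs.trace))
```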
Summary
1. High-level data access and integration services are needed if applications that have data with complex structure and complex semantics are to benefit from the Grid.
2. Standards for data access are emerging, and middleware products that are reference implementations of such standards are already available.
3. Distributed query processing technology is one approach to delivering (1.) given the availability of (2.).
4. OGSA-DQP is a service-based distributed query processor for the Grid that:
  is exposed as a service;
  is implemented as an orchestration of services;
  improves on Grid portals when only retrieval and analysis are involved.
Where to find out more
OGSA-DQP: Grid middleware to query distributed data sources
  www.ogsadai.org.uk/dqp
OGSA-DAI: Grid middleware to interface with data(bases)
  www.ogsadai.org.uk/