APSCC 2010, Hangzhou 9 Dec 2010 1
On Evaluating and Publishing Data Concerns for Data as a Service
Hong-Linh Truong and Schahram Dustdar
Distributed Systems Group, Vienna University of Technology
[email protected]
http://www.infosys.tuwien.ac.at/Staff/truong
Overview
Motivation and background
Data concern-aware service engineering process
A framework for evaluating and publishing QoD of DaaS
Experiments
Conclusions and future work
The rise of DaaS
Web services technologies and the cloud computing model foster the concept of data/information as a service (DaaS)
  Provide data capabilities rather than computation or software
Providing DaaS is an increasing trend
  In both business and e-science environments
  Bio data, weather data, company balance sheets, etc., via Web services
But data is associated with many data concerns
  Quality of data, privacy, licensing, etc.
Examples of DaaS
Source: http://www.undata-api.org/
Source: http://www.strikeiron.com/Catalog/StrikeIronServices.aspx
Source: http://docs.gnip.com/w/page/23722723/Introduction-to-Gnip
Motivation: the role of data concerns
Data consumers/data integrators need "data concerns"
  to use data in the right way: Is the data good? Is it free?
  to filter irrelevant results: avoid information overload
  to save processing time/energy and storage
Both DaaS service providers and data providers need to evaluate and provide data concerns
Should we perform data composition?
Motivation: service provider versus data provider
The DaaS service provider is separated from the data provider
[Figure: diagram with the labels Consumer, DaaS, Sensor, Service provider, and Data provider; attached concerns: privacy1, quality1, quality2, privacy2]
There is a lack of techniques and tools for evaluating and publishing data concerns for DaaS
Example: DaaS provider != data provider
Source: http://www.infochimps.org
Background: data resources
Data items → data resources → DaaS APIs → consumers
DaaS and data providers have the right to publish the data
[Figure: diagram with the labels Data items, Data resource, DaaS, Service APIs (SOAP/REST), and Consumer]
Background: diverse concerns associated with service and data
Hong-Linh Truong, Schahram Dustdar, "On Analyzing and Specifying Concerns for Data as a Service", The 2009 Asia-Pacific Services Computing Conference (IEEE APSCC 2009), (c) IEEE Computer Society, December 7-11, 2009, Biopolis, Singapore.
Data concern-aware service engineering process
Typical activities for data wrapping and publishing
Typical activities for data updating and retrieval
Wrapping, selecting, and updating data in DaaS
Typically different strategies for structured data and unstructured data – not our main work
We reuse existing techniques and plug our data concern evaluation and publishing techniques into them
Evaluating data concerns (1)
Based on three concepts: evaluation scope, evaluation modes and integration model
Evaluation scopes – enable fine-grained evaluation
  Three scopes: data resource, service operation, and service as a whole
Evaluation modes – suitable for different types of data
  Off-line (before the access to data) and on-the-fly (when the data is requested)
Integration models – suitable for different tool integration strategies
  Push and pull data concerns
  Pass-by-value versus pass-by-reference to data concerns evaluation tools
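The scopes and modes listed above could be modeled roughly as follows. This is a minimal sketch, not the implementation presented in the talk: the names `Scope`, `Mode`, `ConcernEvaluator`, and the toy `completeness` metric are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Callable, Dict


class Scope(Enum):
    DATA_RESOURCE = "data resource"
    SERVICE_OPERATION = "service operation"
    SERVICE = "service as a whole"


class Mode(Enum):
    OFF_LINE = "off-line"      # evaluated before the data is accessed
    ON_THE_FLY = "on-the-fly"  # evaluated when the data is requested


@dataclass
class ConcernEvaluator:
    """Hypothetical helper: evaluates one metric for targets in one scope."""
    metric: str
    scope: Scope
    mode: Mode
    evaluate: Callable[[object], float]
    _cache: Dict[str, float] = field(default_factory=dict)

    def precompute(self, target_id: str, target: object) -> None:
        # Off-line mode: evaluate ahead of time and store the result.
        self._cache[target_id] = self.evaluate(target)

    def value_for(self, target_id: str, target: object) -> float:
        if self.mode is Mode.OFF_LINE:
            return self._cache[target_id]  # nothing is evaluated per request
        return self.evaluate(target)       # on-the-fly: evaluate now


def completeness(values) -> float:
    # Toy metric (assumption): share of values that are not missing.
    return sum(v is not None for v in values) / len(values)


offline = ConcernEvaluator("completeness", Scope.DATA_RESOURCE, Mode.OFF_LINE, completeness)
offline.precompute("r1", [1, None, 3, 4])
on_the_fly = ConcernEvaluator("completeness", Scope.DATA_RESOURCE, Mode.ON_THE_FLY, completeness)
```

In this sketch the off-line evaluator keeps returning the stored value even after the data changes, while the on-the-fly evaluator reflects the data at request time, which is why the two modes suit different types of data.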
Evaluating data concerns (2)
[Figures: three integration models: pull with pass-by-reference; pull with pass-by-value; push with pass-by-value]
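The three combinations named on this slide might look like this in code. The sketch rests on an interpretation that the slides do not spell out: "pull" is taken to mean that the DaaS calls the evaluation tool and receives the concerns as the return value, and "push" that the tool delivers the concerns to a sink registered by the DaaS. All function names, the `STORE` dictionary, and the example URI are hypothetical.

```python
from typing import Callable, Dict, List, Optional

Concerns = Dict[str, float]
Values = List[Optional[float]]

# Hypothetical in-memory stand-in for data resources addressable by URI.
STORE: Dict[str, Values] = {
    "urn:example:resource/1": [1.0, None, 3.0, 4.0],
}


def evaluate(values: Values) -> Concerns:
    """Toy evaluation tool (assumption): share of non-missing values."""
    return {"completeness": sum(v is not None for v in values) / len(values)}


def pull_by_value(values: Values) -> Concerns:
    # The DaaS hands over the data itself and gets the concerns back.
    return evaluate(values)


def pull_by_reference(uri: str) -> Concerns:
    # The DaaS hands over only a URI; the tool fetches the data on its own.
    return evaluate(STORE[uri])


def push_by_value(values: Values, sink: Callable[[Concerns], None]) -> None:
    # The tool delivers the concerns to a sink instead of returning them.
    sink(evaluate(values))


received: List[Concerns] = []
push_by_value(STORE["urn:example:resource/1"], received.append)
```

Pass-by-reference avoids shipping large data sets to the tool but requires that the tool can resolve the URI, whereas pass-by-value works with any object the DaaS already holds.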
Publishing data concern information
Off-line publishing of data concerns
  suitable for static data concerns
  the publishing of data concerns of a data resource is separated from the service operation which provides the access to the data resource
On-the-fly publishing of data concerns by associating concerns with retrieved data resources
  the resulting data resources (e.g., via queries) are annotated with data concerns evaluated by data concerns evaluation tools
  suitable for providing dynamic data concerns
On-the-fly publishing of data concerns through queries
  the use of different service operation parameters to query data concerns of data resources
  suitable for validating data concerns before accessing data resources
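As an illustration of the first two publishing styles, the following sketch keeps static concerns in a registry that is separate from data access and annotates retrieved rows with freshly evaluated concerns. The response layout (`dataConcerns` next to `data`), the registry, the example URI, and the metric are assumptions made for this example, not a format defined in the talk.

```python
from typing import Callable, Dict, List

Row = Dict[str, object]
Evaluator = Callable[[List[Row]], float]

# Off-line publishing: static concerns live in their own registry, separate
# from the operation that returns the data. URI and values are made up.
PUBLISHED_CONCERNS: Dict[str, Dict[str, object]] = {
    "urn:example:resource/1": {"license": "free"},
}


def element_completeness(rows: List[Row]) -> float:
    """Toy metric (assumption): share of non-missing cells; rows must be non-empty."""
    cells = [value for row in rows for value in row.values()]
    return sum(value is not None for value in cells) / len(cells)


def annotate(rows: List[Row], evaluators: Dict[str, Evaluator]) -> dict:
    # On-the-fly publishing: evaluate at retrieval time and attach the
    # resulting concerns to the returned data.
    concerns = {name: evaluate(rows) for name, evaluate in evaluators.items()}
    return {"dataConcerns": concerns, "data": rows}


rows = [{"country": "AT", "value": 8.1}, {"country": "DE", "value": None}]
result = annotate(rows, {"completeness": element_completeness})
```

A consumer can read `PUBLISHED_CONCERNS` without touching the data at all, while the annotated result always reflects the rows that were actually returned.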
How do we utilize the data concern-aware service engineering process?
Using this model we can determine and publish several concerns
Our proof of concept
  A framework for evaluating and publishing QoD of DaaS
  A proof-of-concept implementation of the data concern-aware service engineering process
Another example: model and publish privacy concerns for DaaS [ECOWS 2010]
Michael Mrissa, Salah-Eddine Tbahriti, Hong-Linh Truong, "Privacy model and annotation for DaaS", The 8th European Conference on Web Services (ECOWS 2010), (c)IEEE Computer Society, 1-3 December, 2010, Ayia Napa, Cyprus
QoD framework: pull QoD evaluation models for DaaS
Pull QoD Evaluation Models for DaaS
Pass-by-references and pass-by-value
References of data resources: URI
Values: any object
Third-party data evaluation tools
QoD framework: publishing concerns (1)
Off-line data concern publishing
  a common data concern publication specification
  a tool for providing data concerns according to the specification
  supported by external service information systems
QoD framework: publishing concerns (2)
On-the-fly querying of data concerns associated with data resources
  Using our proposed REST parameter convention in [Composable Web 2010]
  Based on metric names in the data concern specification
  Specifying requests by using query parameters in the form of metricName=value
GET /resource?accuracy="0.5"&location="Europe"
Hong Linh Truong, Schahram Dustdar, Andrea Maurino, Marco Comerio: Context, Quality and Relevance: Dependencies and Impacts on RESTful Web Services Design. ICWE Workshops 2010: 347-359
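On the service side, a request that follows the metricName=value convention could be handled along these lines. The slides do not define how a value is matched, so this sketch assumes that numeric values act as lower bounds and all other values must match exactly; `parse_concern_query` and `matches` are hypothetical helper names.

```python
from typing import Dict
from urllib.parse import parse_qs, urlsplit


def parse_concern_query(url: str) -> Dict[str, str]:
    """Extract metricName=value pairs and drop surrounding double quotes."""
    params = parse_qs(urlsplit(url).query)
    return {name: values[0].strip('"') for name, values in params.items()}


def matches(concerns: Dict[str, object], requested: Dict[str, str]) -> bool:
    # Assumption: numeric values are minimum thresholds, others are equality.
    for name, wanted in requested.items():
        actual = concerns.get(name)
        if actual is None:
            return False
        try:
            if float(actual) < float(wanted):
                return False
        except (TypeError, ValueError):
            if str(actual) != wanted:
                return False
    return True


requested = parse_concern_query('/resource?accuracy="0.5"&location="Europe"')
```

With this in place, a resource annotated with `{"accuracy": 0.8, "location": "Europe"}` satisfies the request, and one with an accuracy of 0.3 does not.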
QoD framework: QoD monitoring and composition
QoD concerns monitoring and composition are useful for the evaluation of aggregated data resources
Our approach
  Utilizing monitoring rules
  QoD metrics of data resources are passed to a rule engine
  Rules are user-defined for monitoring and composing QoD metrics
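The idea of user-defined rules for monitoring and composing QoD metrics can be sketched without a rule engine. The example below uses plain Python functions instead of a rule engine and is not the rule set used in the experiments; the composition rule (take the minimum of each metric across the parts), the thresholds, and all function names are assumptions.

```python
from typing import Callable, Dict, List

Metrics = Dict[str, float]
Rule = Callable[[Metrics], List[str]]


def compose_min(parts: List[Metrics]) -> Metrics:
    """Composition rule (assumed): an aggregate is as good as its weakest part."""
    if not parts:
        return {}
    shared = set.intersection(*(set(part) for part in parts))
    return {name: min(part[name] for part in parts) for name in sorted(shared)}


def below(metric: str, threshold: float) -> Rule:
    """Monitoring rule factory: report a metric that falls under a threshold."""
    def rule(metrics: Metrics) -> List[str]:
        if metrics.get(metric, 0.0) < threshold:
            return [f"{metric} below {threshold}"]
        return []
    return rule


def run_rules(metrics: Metrics, rules: List[Rule]) -> List[str]:
    alerts: List[str] = []
    for rule in rules:
        alerts.extend(rule(metrics))
    return alerts


parts = [
    {"datasetcompleteness": 0.9, "dataelementcompleteness": 0.7},
    {"datasetcompleteness": 0.6, "dataelementcompleteness": 0.95},
]
composed = compose_min(parts)
alerts = run_rules(composed, [below("datasetcompleteness", 0.8),
                              below("dataelementcompleteness", 0.5)])
```

Keeping rules as small independent functions mirrors what a rule engine offers: users can add or swap rules without changing the code that evaluates the data resources.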
Experiments
Implementation
  Java, JAX-RS/Jersey
  Drools
Utilizing UNDataAPI - www.undata-api.org
  XML data sets without QoD
Illustrating examples: check data from 1990-2009
  datasetcompleteness: the completeness of the list of countries
  dataelementcompleteness: the completeness of data elements in the list metrics
RESTful services wrapping UNDataAPI
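The two metrics named above could be computed as simple ratios. The slides give only one-line descriptions, so the formulas below are assumptions: datasetcompleteness is taken as the share of expected countries present in a data set, and dataelementcompleteness as the share of (country, year) cells that carry a value. The country list and values are invented sample data, not UNDataAPI output.

```python
from typing import Dict, Iterable, Optional


def dataset_completeness(found: Iterable[str], expected: Iterable[str]) -> float:
    """Assumed definition: share of expected countries present in the data set."""
    expected_set = set(expected)
    return len(expected_set & set(found)) / len(expected_set)


def data_element_completeness(
    series: Dict[str, Dict[int, Optional[float]]], years: Iterable[int]
) -> float:
    """Assumed definition: share of (country, year) cells that hold a value."""
    year_list = list(years)
    total = len(series) * len(year_list)
    present = sum(
        1
        for values in series.values()
        for year in year_list
        if values.get(year) is not None
    )
    return present / total


years = range(1990, 2010)  # 1990-2009 inclusive, as in the experiment
series = {
    "Austria": {year: 1.0 for year in years},
    "Vietnam": {year: 1.0 for year in range(1990, 2000)},  # second decade missing
}
expected_countries = ["Austria", "Vietnam", "China", "Italy"]

ds = dataset_completeness(series.keys(), expected_countries)
de = data_element_completeness(series, years)
```

Both functions assume non-empty inputs; a real wrapper would also need to decide how to score an empty data set.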
Experiment: evaluating and annotating QoD metrics
http://www.infosys.tuwien.ac.at/prototyp/SOD1/dataconcerns/
Experiments: publishing QoD with data resources
Experiments: simple rules for monitoring and composing QoD
Conclusions and future work
A novel, generic data concern-aware service engineering process for DaaS
A proof-of-concept implementation for evaluating quality of data in REST-based DaaS
  but in principle other concerns can be supported
  more evaluation is needed
Open research questions:
  how to deal with other concerns?
  what are the trade-offs between on-line and off-line evaluation?
  how to utilize evaluated data concerns for optimizing data compositions?
Thanks for your attention!
Hong-Linh Truong
Distributed Systems Group
Vienna University of Technology
Austria
[email protected]
http://www.infosys.tuwien.ac.at