Monday, 7 December 2015
Saturday, April 22, 2023
Lavoisier
Motivations
data sources provided by many partners
– heterogeneity of the technologies used
objectives
– reduce complexity / increase maintainability
– factorize development efforts
– enable data access
  • efficiently
  • reliably
What is Lavoisier?
an extensible service providing a unified view of data collected from multiple heterogeneous sources
– data is represented in XML
– the query language is XSLT
  • cross-data-source queries are easy to write
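Once two sources are exposed as XML data views, a cross-data-source query is just a query over two XML documents. Lavoisier writes such queries in XSLT; the Python sketch below uses ElementTree's limited XPath support as a stand-in (the standard library has no XSLT processor). The view contents, element names and `contacts_by_country` are invented for illustration.

```python
import xml.etree.ElementTree as ET

# Two invented data views, as plug-ins might expose an RDBMS table
# and an LDAP directory once both are converted to XML.
SITES_VIEW = """<sites>
  <site name="SITE-A" country="FR"/>
  <site name="SITE-B" country="IT"/>
</sites>"""

CONTACTS_VIEW = """<contacts>
  <contact site="SITE-A" email="admin@site-a.example"/>
  <contact site="SITE-B" email="admin@site-b.example"/>
</contacts>"""


def contacts_by_country(sites_xml: str, contacts_xml: str, country: str) -> list[str]:
    """Cross-source query: e-mails of the contacts of all sites in `country`."""
    sites = ET.fromstring(sites_xml)
    contacts = ET.fromstring(contacts_xml)
    # ElementTree understands simple XPath predicates such as [@attr='value'].
    names = {s.get("name") for s in sites.findall(f"site[@country='{country}']")}
    # Join the second view on the site name.
    return [c.get("email") for c in contacts.findall("contact") if c.get("site") in names]


print(contacts_by_country(SITES_VIEW, CONTACTS_VIEW, "FR"))  # ['admin@site-a.example']
```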
Overview
[Figure: architecture overview. The data view manager (engine) uses plug-ins (XSLT, WS, SQL, flat file) to build data views from heterogeneous data sources (flat files, a web service, an RDBMS). Operations: getDataView, processXSL and the WSRF resource-property operations GetRP, GetMultipleRP, QueryRP, SetRP. Data view triggers: startup, notified, refreshed, about to expire. Roles: administrator (configuration), user (e.g. CIC-portal), developer. Legend: plug-ins, engine, operations, triggers, data views, existing components.]
Role: plug-in developer
reusable plug-ins
– RDBMS
– LDAP
– Web Services (WSRF)
– command-line execution
– local XML file
– remote XML file
  • HTTP, HTTPS
– HTML file (in progress)
– flat file (in progress)
specific plug-ins
– GGUS
– get server public certificate
– any Java code that builds an XML document
other plug-ins
– index of data views
– status of data views
– XSL transform
– XML filter (SAX-based)
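Lavoisier plug-ins are written in Java, and the slide only states that a plug-in is code that builds an XML document. As a language-neutral illustration, here is a minimal Python sketch of that contract, assuming a plug-in exposes a single `build()` method; `DataViewPlugin`, `KeyValueFlatFilePlugin` and the `key = value` file format are hypothetical, not Lavoisier's actual API.

```python
import xml.etree.ElementTree as ET
from abc import ABC, abstractmethod


class DataViewPlugin(ABC):
    """Hypothetical plug-in contract: build one XML document from some source."""

    @abstractmethod
    def build(self) -> ET.Element:
        """Return the root element of the data view."""


class KeyValueFlatFilePlugin(DataViewPlugin):
    """Hypothetical flat-file plug-in for lines of the form 'key = value'."""

    def __init__(self, text: str, root_tag: str = "entries") -> None:
        self.text = text
        self.root_tag = root_tag

    def build(self) -> ET.Element:
        root = ET.Element(self.root_tag)
        for raw in self.text.splitlines():
            line = raw.strip()
            if not line or line.startswith("#"):
                continue  # skip blank lines and comments
            key, sep, value = line.partition("=")
            if not sep:
                raise ValueError(f"malformed line: {line!r}")
            ET.SubElement(root, "entry", key=key.strip()).text = value.strip()
        return root


plugin = KeyValueFlatFilePlugin("# site status\nSITE-A = up\nSITE-B = down\n")
print(ET.tostring(plugin.build(), encoding="unicode"))
# <entries><entry key="SITE-A">up</entry><entry key="SITE-B">down</entry></entries>
```

In this sketch the engine only ever sees the returned XML tree, so supporting a new source means writing a new `build()`; caching and querying stay in shared code.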
Role: administrator (1/2)
configure plug-ins
– validation of data views
– retry rules
  • one rule per exception type
  • java.lang.Exception is the catch-all exception
– argument values
  • static values
  • extracted from another data view (XPath + regex)
– plug-in-specific configuration
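The retry rules and argument extraction can be sketched as follows. This is a minimal Python illustration, not Lavoisier's configuration format: it assumes that the most specific matching exception type wins (with `Exception` standing in for `java.lang.Exception` as the catch-all), and that an argument is taken from the first node matched by an XPath expression, then narrowed to the first capture group of a regular expression. `RETRY_RULES`, `retries_for`, `extract_argument` and the view content are hypothetical.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical retry rules: exception type -> number of retries.
# `Exception` plays the role of java.lang.Exception, the catch-all rule.
RETRY_RULES = {
    TimeoutError: 3,
    PermissionError: 0,
    Exception: 1,
}


def retries_for(exc: BaseException, rules: dict = RETRY_RULES) -> int:
    """Return the retry count of the most specific rule matching `exc`."""
    for cls in type(exc).__mro__:  # from the exact class up to `object`
        if cls in rules:
            return rules[cls]
    return 0  # not an Exception subclass (e.g. KeyboardInterrupt): never retry


def extract_argument(view_xml: str, path: str, pattern: str) -> str:
    """Take a plug-in argument from another data view (XPath, then regex)."""
    node = ET.fromstring(view_xml).find(path)  # ElementTree's XPath subset
    if node is None or node.text is None:
        raise LookupError(f"no text at {path!r}")
    match = re.search(pattern, node.text)
    if match is None:
        raise LookupError(f"{pattern!r} does not match {node.text!r}")
    return match.group(1)  # first capture group becomes the argument value


SERVICES_VIEW = (
    "<services><service name='registry'>"
    "<endpoint>https://registry.example.org:8443/api</endpoint>"
    "</service></services>"
)

print(retries_for(PermissionError()))  # 0, more specific than the catch-all
print(retries_for(ValueError()))  # 1, via the catch-all rule
print(extract_argument(SERVICES_VIEW, "service[@name='registry']/endpoint",
                       r"https?://([^:/]+)"))  # registry.example.org
```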
configure data view cache management, depending on the characteristics and usage profile of
– the data source
  • total amount of data, update frequency, effective latency
– the generated view
  • amount of generated data, time-to-live, tolerable latency
Role: administrator (2/2)
configure data view cache management
– cache type
  • in-memory
  • on-disk
  • no cache
– cache validity period
  • in case of data source unavailability
– set of rules triggering cache update
  • startup
  • time-based
  • notification
  • view access (write, read)
  • cache expiration
  • cache dependencies
  • combinations of these rules
– with or without enforcement of data view consistency
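A minimal sketch of how some of these options fit together for an in-memory cache: a time-to-live drives time-based expiration, a read (view access) refreshes an expired view, and a validity period bounds how long the previous view may still be served while the data source is unavailable. `CachedDataView` and `SourceUnavailable` are hypothetical names, and the exact meaning given to the validity period here is an assumption; startup and notification triggers would simply call `refresh()`.

```python
import time
from typing import Callable, Optional


class SourceUnavailable(Exception):
    """Raised by a loader when its data source cannot be reached."""


class CachedDataView:
    """In-memory cache of one data view (hypothetical sketch).

    ttl:      age after which the next read triggers a refresh.
    validity: maximum age of a copy that may still be served when the
              data source is unavailable (assumed to be >= ttl).
    """

    def __init__(self, loader: Callable[[], str], ttl: float, validity: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.loader = loader
        self.ttl = ttl
        self.validity = validity
        self.clock = clock
        self.value: Optional[str] = None
        self.loaded_at = float("-inf")  # nothing loaded yet

    def refresh(self) -> None:
        """Rebuild the view; keep the previous copy if the source is down."""
        try:
            self.value = self.loader()
            self.loaded_at = self.clock()
        except SourceUnavailable:
            if self.value is None:
                raise  # no previous copy to fall back on

    def get(self) -> str:
        """View-access trigger: refresh an expired copy before returning it."""
        if self.clock() - self.loaded_at >= self.ttl:
            self.refresh()
        if self.clock() - self.loaded_at >= self.validity:
            raise SourceUnavailable("cached view is older than its validity period")
        return self.value


# Usage with a fake clock and a source that goes down at t = 20.
now = [0.0]


def load_sites() -> str:
    if now[0] >= 20:
        raise SourceUnavailable("RDBMS unreachable")
    return f"<sites at='{now[0]}'/>"


view = CachedDataView(load_sites, ttl=10, validity=30, clock=lambda: now[0])
view.refresh()  # startup trigger
now[0] = 25.0
print(view.get())  # <sites at='0.0'/> (previous copy, source is down)
```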
Reload configuration on the fly (restart only plug-ins with a modified configuration) => minimal service interruption
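Restarting only the plug-ins whose configuration changed amounts to diffing the old and new configurations. A minimal Python sketch, assuming the configuration can be read as a mapping from plug-in name to its settings; this shape, the `plan_reload` name and the sample settings are hypothetical.

```python
# Hypothetical configuration shape: plug-in name -> that plug-in's settings.
Config = dict[str, dict]


def plan_reload(old: Config, new: Config) -> tuple[set[str], set[str], set[str]]:
    """Return (to_stop, to_start, to_restart); all other plug-ins keep running."""
    to_stop = set(old.keys() - new.keys())   # removed from the configuration
    to_start = set(new.keys() - old.keys())  # newly configured
    to_restart = {name for name in old.keys() & new.keys() if old[name] != new[name]}
    return to_stop, to_start, to_restart


old = {
    "sites": {"plugin": "sql", "query": "SELECT * FROM site"},
    "certs": {"plugin": "command", "cmd": "./get_cert.sh"},
    "tickets": {"plugin": "ws", "url": "https://tickets.example.org"},
}
new = {
    "sites": {"plugin": "sql", "query": "SELECT name, status FROM site"},
    "certs": {"plugin": "command", "cmd": "./get_cert.sh"},
    "status": {"plugin": "xslt", "input": "sites"},
}
print(plan_reload(old, new))  # ({'tickets'}, {'status'}, {'sites'})
```

Here "certs" is untouched, so its plug-in keeps serving its data view during the reload.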
Role: user
query data views
– through standard WSRF operations
  • with any WSRF-compliant client (e.g. Globus Toolkit 4)
– through server-side XSLT processing
  • only the result of the processing is transferred
  • the result can be XML, HTML or text
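Server-side processing means the transformation runs next to the data view and only its result crosses the network. The sketch below shows that split in Python; a plain function stands in for the XSLT stylesheet, since the standard library has no XSLT processor, and `process_view`, `down_sites` and the view content are hypothetical. `ElementTree.tostring` happens to accept the same three output methods: `"xml"`, `"html"` and `"text"`.

```python
import xml.etree.ElementTree as ET
from typing import Callable

# Stand-in for an XSLT stylesheet: a function from the view's root to a result tree.
Transform = Callable[[ET.Element], ET.Element]


def process_view(view_xml: str, transform: Transform, method: str = "xml") -> str:
    """Apply `transform` where the data view lives and serialize only the result.

    `method` is one of "xml", "html" or "text".
    """
    result = transform(ET.fromstring(view_xml))
    return ET.tostring(result, encoding="unicode", method=method)


def down_sites(view: ET.Element) -> ET.Element:
    """Hypothetical query: list the names of sites whose status is 'down'."""
    out = ET.Element("ul")
    for site in view.iterfind("site[@status='down']"):
        item = ET.SubElement(out, "li")
        item.text = site.get("name")
        item.tail = "\n"  # one name per line in the text output
    return out


VIEW = ("<sites><site name='SITE-A' status='up'/>"
        "<site name='SITE-B' status='down'/>"
        "<site name='SITE-C' status='down'/></sites>")

print(process_view(VIEW, down_sites, method="text"))  # SITE-B and SITE-C, one per line
```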
Conclusion
Maintainability
– thanks to the unified view of data
Factorization of efforts
– thanks to the separation of roles
  • plug-in developer, administrator, user
Data access
– efficiency
  • thanks to caching of data views
– robustness
  • by keeping the previous data view when the data source is unavailable
(Some of the) perspectives
Move to Apache Maven (improve the build process)
Schedule plug-in execution according to memory consumption
Add new configuration features
– rules to ignore some partial failures, new triggers…
Develop new plug-ins
– XQuery, rewrite of the remote XML file plug-in (with JSAGA)
…