Damia: Data Mashups for Intranet Applications
David E. Simmen, et al., IBM Almaden Research Center
Presented by John Nielsen
Terms and Acronyms (in order of appearance)
Damia: DAta Mashups for Intranet Applications
Web 2.0: Focus on applications, collaboration and interaction (rather than pages and browsing)
AJAX: Asynchronous JavaScript and XML. Umbrella term for web development techniques that utilize background data transfer and scripted client-side applications
ATOM: XML-based web syndication (feed) format standard nominally intended to replace RSS
XML: eXtensible Markup Language
JSON: JavaScript Object Notation. Lightweight data structure transmission format
RSS: Really Simple Syndication. XML-based web syndication (feed) format
REST: REpresentational State Transfer. Umbrella term for simple interfaces used to transfer data via (e.g.) HTTP without another messaging layer
PHP: PHP Hypertext Preprocessor. Server-side scripting language that can be embedded in HTML
Ruby on Rails: complete framework for building database-backed web applications with (intended) relative ease
API: Application Programming Interface. Abstraction of the functions, classes, etc. in a program or library that are available to other programs
GUI: Graphical User Interface
URL: Uniform Resource Locator
LAMP stack: Originally Linux, Apache, MySQL, PHP. Generalized to mean any solution stack composed of free/open source software used to run a web application server
Zend Framework: Open source, object-oriented web application framework written in PHP
DB2: a family of IBM relational database products
Terms and Acronyms (continued)
MySQL: a freely available relational database
Dojo toolkit: Tools and utilities for creating AJAX/JavaScript applications.
LOB: Line of business(?)
ADM: Augmentation-level Data Model
XQuery: query language for extracting data from XML documents
XDM: XQuery Data Model
FLWOR expression: For, Let, Where, Order by, Return. Style of XQuery query that performs projections and joins on one or more XML sources and returns a sorted list of results
Closed operator: Given an operator (or function or transformation) and a domain (or set of inputs), the operator is closed under the domain if for every member of the domain the result of the operation is also a member of the domain
MIME types: Multipurpose Internet Mail Extensions. A standard and extensible set of document types used to identify the content of e-mail and HTML documents
CSV: Comma-Separated Values. A simple text format for tabular data where fields (or columns) are delimited by a comma and records (or rows) are delimited by a newline character
DOM: Document Object Model. Standard model for representing XML documents as objects
Curl: Open source tool and libraries for retrieving remote files and documents via URL using HTTP, FTP and other protocols
GNR: IBM Global Name Recognizer (phonetic similarity)
EII: Enterprise Information Integration
ETL: Extract, Transform, Load. A method of pulling data into a warehouse
RDF: Resource Description Framework. Simple model for describing metadata for e.g. the semantic web
Motivation
Business leaders want “situational” applications that use data from many sources, some of them nontraditional
Web technologies have evolved to allow information exchange and collaboration using lightweight standards—web services, Web 2.0, etc.
Damia uses modern web technology to allow business users to create “mashups” using whatever data sources they choose
How does it work?
User specifies sources either as existing RSS/ATOM feeds or as custom feeds
Custom feeds can be created by uploading documents of a known type (CSV) or through a data-source-specific connector
User specifies filters, join conditions and other transformations using a fixed set of operators
Damia converts the user input to an XML-formatted Mashup specification
On execution the Augmentation engine reads the sources, performs the mashup operations, and publishes the result as a new RSS/ATOM feed
Mashup results intended to be requested and consumed by other applications
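The flow above (custom feed from CSV, a filter, publish as a feed) can be sketched in a few lines of Python. This is only an illustration, not Damia's implementation: the function names, the sample data, and the choice of which CSV column becomes the entry title are all assumptions, and the output is a bare-bones Atom-style document (a valid Atom feed also needs elements such as id and updated).

```python
import csv
import io
import xml.etree.ElementTree as ET

ATOM_NS = "http://www.w3.org/2005/Atom"


def csv_to_entries(text):
    """Custom feed (hypothetical helper): one feed entry (dict) per CSV row."""
    return list(csv.DictReader(io.StringIO(text)))


def publish_atom(entries, feed_title, title_field):
    """Publish (hypothetical helper): serialize entries as minimal Atom-style XML."""
    ET.register_namespace("", ATOM_NS)
    feed = ET.Element(f"{{{ATOM_NS}}}feed")
    ET.SubElement(feed, f"{{{ATOM_NS}}}title").text = feed_title
    for row in entries:
        entry = ET.SubElement(feed, f"{{{ATOM_NS}}}entry")
        ET.SubElement(entry, f"{{{ATOM_NS}}}title").text = row[title_field]
        # Put the remaining fields into the summary as key=value pairs.
        summary = ", ".join(f"{k}={v}" for k, v in row.items())
        ET.SubElement(entry, f"{{{ATOM_NS}}}summary").text = summary
    return ET.tostring(feed, encoding="unicode")


# Made-up sample data standing in for an uploaded spreadsheet.
SAMPLE_CSV = "name,zip\nAlice,95120\nBob,10001\n"

entries = csv_to_entries(SAMPLE_CSV)                         # source
selected = [e for e in entries if e["zip"].startswith("9")]  # filter
atom_xml = publish_atom(selected, "Selected customers", "name")  # publish
```

Another application could then request atom_xml over HTTP and parse it like any other feed.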
Available Operators (from MashupHub)
Source: Import data to the mashup
Combine: Create one feed from two or more inputs
Filter: Output only entries from the input satisfying certain conditions
For Each: Place the values from one operator into the URL parameter for another operator, return results
Group: Organize entries into categories based on a specified element
Merge: Join two inputs based on certain match conditions
Publish: Specify output format of the mashup
Sort: Re-order entries based on a specified field value
Transform: Modify entries from the input based on specified text or math functions
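To make the operator list concrete, here is a rough Python sketch that treats a feed as a list of dicts and each operator as a function from feeds to a feed, so operators can be chained freely (the "closed operator" idea from the glossary). The function names, signatures, and sample data are assumptions for illustration; they are not the MashupHub API, and fetch in for_each_op is a stand-in for a real HTTP call.

```python
from collections import defaultdict
from urllib.parse import urlencode


def filter_op(feed, predicate):
    """Filter: keep only entries satisfying the predicate."""
    return [e for e in feed if predicate(e)]


def sort_op(feed, field):
    """Sort: re-order entries by the value of one field."""
    return sorted(feed, key=lambda e: e[field])


def combine_op(*feeds):
    """Combine: concatenate two or more feeds into one."""
    return [e for feed in feeds for e in feed]


def group_op(feed, field):
    """Group: one output entry per distinct value of `field`."""
    groups = defaultdict(list)
    for e in feed:
        groups[e[field]].append(e)
    return [{field: key, "entries": members} for key, members in groups.items()]


def merge_op(left, right, key):
    """Merge: join two feeds where the `key` values are equal."""
    index = defaultdict(list)
    for r in right:
        index[r[key]].append(r)
    return [{**l, **r} for l in left for r in index.get(l[key], [])]


def for_each_op(feed, field, base_url, fetch):
    """For Each: put each entry's `field` value into a URL parameter,
    call fetch(url) (caller-supplied), and return all results as one feed."""
    results = []
    for e in feed:
        url = f"{base_url}?{urlencode({field: e[field]})}"
        results.extend(fetch(url))
    return results


# Made-up sample feeds.
orders = [
    {"order": 1, "sku": "A", "region": "west"},
    {"order": 2, "sku": "B", "region": "east"},
    {"order": 3, "sku": "A", "region": "west"},
]
products = [{"sku": "A", "product": "Widget"}]

merged = merge_op(orders, products, "sku")
west = filter_op(merged, lambda e: e["region"] == "west")
grouped = group_op(orders, "region")
# Stub fetch that just echoes the URL it was asked for.
looked_up = for_each_op(products, "sku", "http://example.invalid/stock",
                        lambda url: [{"url": url}])
```

Because every function returns a plain list of entries, the output of one operator can feed any other, which is what lets a mashup be wired together as a graph of operators.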
Usage Scenarios (from paper)
Customer Service
– Receive name suggestions from phonetic similarity matcher (source, transform)
– Look up matches in customer service DB (source, merge (augment))
– Adjust output to desired format (transform)
– Publish
Usage Scenarios (continued)
Weather Alerts for Insurance Agent
– Upload insurance data spreadsheet (source, via custom feed)
– Identify zip codes from spreadsheet (filter)
– Import weather alerts from NWS (source)
– Compare/match zip codes from spreadsheet w/ zip codes from weather alerts (merge (augment))
– Publish formatted list of customers likely to be affected by severe weather (transform, publish)
Future Work
Data Standardization
Continuous mashups: true mashup subscription rather than just on-demand
Additional data import connectors
In-depth search
Data quality (or pedigree)