Page 1: Introduction to Apache OODT

Introduction to Apache OODT

Yang Li

Mar 9, 2012

Page 2: Introduction to Apache OODT

What is OODT

• Object Oriented Data Technology

• Science data management

• Archiving Systems that span scientific disciplines

• Enables interoperability among data-agnostic systems (astrophysics, planetary, space science data systems, open source web analytics)

Page 3: Introduction to Apache OODT

History

• 2001

– Deployed to make a virtual specimen bank for the Early Detection Research Network (oncology)

• 2004

– Core architectural software of Planetary Data System Data Distribution deployed by NASA (planetary science)

• 2007

– Deployed for the Orbiting Carbon Observatory and SeaWinds missions (earth science)

• 2008

– Deployed for the National Polar-orbiting Operational Environmental Satellite System (atmospheric science)

Page 4: Introduction to Apache OODT

Framework

• Catalog & Archive

• Utilities

• Grid

• Agility

Page 5: Introduction to Apache OODT

Catalog & Archive

• Deal with large-scale ingest of data, metadata extraction of data, post-processing of data into derived and higher-order products, cataloging of data, searching of catalogs, versioning, and retrieval

• Components:

– Catalog, Crawling framework, Curation, File manager, Metadata, PCS, Push/Pull framework, Resource management, Workflow, CAS install, Web apps

Page 6: Introduction to Apache OODT

Catalog

• Virtualize underlying catalogs for use in the CAS system

• Heterogeneous catalog models are mapped to a common dictionary, and then integrated locally so that they may be queried across and ingested into

Page 7: Introduction to Apache OODT

CAS Crawler

• Standardize the common ingestion activities

– identification of files and directories to crawl

– satisfaction of ingestion pre-conditions

– metadata extraction

• Ingestion
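
The crawl steps listed on this slide can be sketched in Python roughly as follows. This is an illustration only, not the CAS Crawler API: the names `crawl`, `is_ready`, and `extract_metadata`, the `ingest` callback, and the "non-empty file" pre-condition are all assumptions made for the example.

```python
import os
from typing import Callable, Dict, List


def is_ready(path: str) -> bool:
    """Hypothetical pre-condition: only non-empty regular files are ingested."""
    return os.path.isfile(path) and os.path.getsize(path) > 0


def extract_metadata(path: str) -> Dict[str, List[str]]:
    """Hypothetical extractor: derives a few keys from the file itself."""
    return {
        "Filename": [os.path.basename(path)],
        "FileSize": [str(os.path.getsize(path))],
    }


def crawl(root: str, ingest: Callable[[str, Dict[str, List[str]]], None]) -> int:
    """Identify files under root, check pre-conditions, extract metadata, ingest."""
    count = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            if not is_ready(path):
                continue  # pre-condition not satisfied, skip this file
            ingest(path, extract_metadata(path))
            count += 1
    return count
```

Passing `ingest` as a callback keeps the crawl loop independent of where the data ends up, which mirrors the idea of standardizing the steps before ingestion.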

Page 8: Introduction to Apache OODT

CAS Crawler

Page 9: Introduction to Apache OODT

Curation

• A web application for managing policy for products, files, and metadata that have been ingested via the CAS component

– Use a servlet container to deploy the web app

– Staging area

• Directories on local machine holding data products

– Metadata generation area

• Create metadata files to associate with data products

Page 10: Introduction to Apache OODT

File Manager

• Provides everything needed to catalog, archive, and manage files and directories, along with their associated metadata

• Separate data stores and metadata stores as standard interfaces
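
One way to picture the split between data stores and metadata stores is the Python sketch below. It is not the File Manager API: `DataStore`, `MetadataStore`, the in-memory implementations, and `FileManager` are hypothetical names chosen for illustration.

```python
from abc import ABC, abstractmethod
from typing import Dict, List


class DataStore(ABC):
    """Hypothetical interface: where the file bytes are archived."""

    @abstractmethod
    def put(self, product_id: str, data: bytes) -> None: ...

    @abstractmethod
    def get(self, product_id: str) -> bytes: ...


class MetadataStore(ABC):
    """Hypothetical interface: where the catalog entries live."""

    @abstractmethod
    def put(self, product_id: str, metadata: Dict[str, List[str]]) -> None: ...

    @abstractmethod
    def find(self, key: str, value: str) -> List[str]: ...


class InMemoryDataStore(DataStore):
    def __init__(self) -> None:
        self._files: Dict[str, bytes] = {}

    def put(self, product_id: str, data: bytes) -> None:
        self._files[product_id] = data

    def get(self, product_id: str) -> bytes:
        return self._files[product_id]


class InMemoryMetadataStore(MetadataStore):
    def __init__(self) -> None:
        self._entries: Dict[str, Dict[str, List[str]]] = {}

    def put(self, product_id: str, metadata: Dict[str, List[str]]) -> None:
        self._entries[product_id] = metadata

    def find(self, key: str, value: str) -> List[str]:
        # Return ids of products whose metadata lists `value` under `key`.
        return [pid for pid, md in self._entries.items() if value in md.get(key, [])]


class FileManager:
    """Depends only on the two interfaces, so either store can be swapped."""

    def __init__(self, data_store: DataStore, metadata_store: MetadataStore) -> None:
        self._data = data_store
        self._metadata = metadata_store

    def ingest(self, product_id: str, data: bytes, metadata: Dict[str, List[str]]) -> None:
        self._data.put(product_id, data)
        self._metadata.put(product_id, metadata)

    def search(self, key: str, value: str) -> List[str]:
        return self._metadata.find(key, value)

    def retrieve(self, product_id: str) -> bytes:
        return self._data.get(product_id)
```

Because `FileManager` only sees the two abstract classes, a disk-backed data store or a database-backed metadata store could replace the in-memory versions without changing the calling code.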

Page 11: Introduction to Apache OODT

Workflow

• Provides everything needed to execute workflows and science processing pipelines.

• Separate workflow repositories and workflow engines as standard interfaces
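
The separation of repository and engine can be illustrated with a minimal Python sketch. The class names `InMemoryWorkflowRepository` and `SequentialWorkflowEngine`, and the idea of a task as a function over a shared context dictionary, are assumptions for this example rather than the CAS Workflow API.

```python
from typing import Callable, Dict, List

# A task is modeled here as a function that updates a shared context.
Task = Callable[[Dict[str, str]], None]


class InMemoryWorkflowRepository:
    """Hypothetical repository: maps a workflow name to its ordered tasks."""

    def __init__(self) -> None:
        self._workflows: Dict[str, List[Task]] = {}

    def add(self, name: str, tasks: List[Task]) -> None:
        self._workflows[name] = list(tasks)

    def get(self, name: str) -> List[Task]:
        return self._workflows[name]


class SequentialWorkflowEngine:
    """Hypothetical engine: runs a workflow's tasks one after another."""

    def __init__(self, repository: InMemoryWorkflowRepository) -> None:
        self._repository = repository

    def run(self, name: str, context: Dict[str, str]) -> Dict[str, str]:
        for task in self._repository.get(name):
            task(context)
        return context


def calibrate(context: Dict[str, str]) -> None:
    context["level"] = "L1"


def regrid(context: Dict[str, str]) -> None:
    context["level"] = context["level"] + "->L2"


repository = InMemoryWorkflowRepository()
repository.add("pipeline", [calibrate, regrid])
result = SequentialWorkflowEngine(repository).run("pipeline", {"granule": "g1"})
```

The engine knows how to run tasks but not where workflow definitions come from, so a different repository (for example, one reading definitions from files) could be used with the same engine.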

Page 12: Introduction to Apache OODT

Resource Management

• Job management

– Execution, monitoring, tracking

• Underlying software system and hardware resources

– e.g. disk space, computational resources, and shared identity

Page 13: Introduction to Apache OODT

Resource Management (Cont)

• Critical objects

– Job, Job Input, Job Spec, Job Instance, Resource Node
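
To make the relationship between these objects more concrete, here is a small Python sketch. The field names, the integer `load` and `capacity` units, and the first-fit `assign` function are assumptions made for illustration and do not reflect the actual Resource Manager classes; Job Instance is left out for brevity.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class JobInput:
    """Parameters handed to the job when it runs (hypothetical shape)."""
    parameters: Dict[str, str] = field(default_factory=dict)


@dataclass
class Job:
    job_id: str
    name: str
    load: int  # hypothetical unit of capacity the job needs


@dataclass
class JobSpec:
    """Pairs a job with the input it should run on."""
    job: Job
    job_input: JobInput


@dataclass
class ResourceNode:
    node_id: str
    capacity: int
    used: int = 0

    def can_accept(self, load: int) -> bool:
        return self.used + load <= self.capacity


def assign(spec: JobSpec, nodes: List[ResourceNode]) -> Optional[ResourceNode]:
    """First-fit placement: pick the first node with enough free capacity."""
    for node in nodes:
        if node.can_accept(spec.job.load):
            node.used += spec.job.load
            return node
    return None  # no node can take the job right now
```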

Page 14: Introduction to Apache OODT

Metadata

• A Multi-valued, generic Metadata container class

• Internal map of string keys pointing to vectors of strings

– [std::string key] ⇒ std::vector of std::strings
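
The "string key to list of strings" structure described above can be sketched in Python as follows. This is a minimal illustration, not the OODT Metadata class; the method names `add`, `replace`, `get`, `get_all`, and `keys` are chosen for the example.

```python
from typing import Dict, List, Optional


class Metadata:
    """Multi-valued container: each string key maps to a list of strings."""

    def __init__(self) -> None:
        self._map: Dict[str, List[str]] = {}

    def add(self, key: str, value: str) -> None:
        """Append a value, keeping any values already stored under the key."""
        self._map.setdefault(key, []).append(value)

    def replace(self, key: str, values: List[str]) -> None:
        """Overwrite all values stored under the key."""
        self._map[key] = list(values)

    def get_all(self, key: str) -> List[str]:
        """Return a copy of every value for the key (empty list if absent)."""
        return list(self._map.get(key, []))

    def get(self, key: str) -> Optional[str]:
        """Return the first value for the key, or None if absent."""
        values = self._map.get(key)
        return values[0] if values else None

    def keys(self) -> List[str]:
        return list(self._map.keys())
```

Storing a list under every key, even for single values, means callers never need to distinguish single-valued from multi-valued fields.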

Page 15: Introduction to Apache OODT

Framework

• Catalog & Archive

• Common Utilities

• Grid

• Agility

Page 16: Introduction to Apache OODT

Common Utilities

• Provide needed support for catalogs, archives, and grids

• Query Expression

– Platform neutral and extensible way of posing questions

• Single Sign On

• Commons

– Lots of miscellaneous utilities, including I/O streams, logging, XML, and more

Page 17: Introduction to Apache OODT

Query Expression

• Provide a way to express queries in a generic manner

• Use boolean postfix expressions to capture the domain, range, and constraint of a query, regardless of the source of the query

• Encapsulate the results of a query

– standard way to pass a query and its results between servers, clients, nodes, and other components.
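
A boolean postfix expression can be evaluated with a simple stack, as in the Python sketch below. The token layout (a `(key, operator, value)` tuple for comparisons, plus the strings `"AND"`, `"OR"`, `"NOT"`) and the operator names `EQ` and `NE` are assumptions for this example, not the OODT query format.

```python
from typing import Dict, List, Tuple, Union

Comparison = Tuple[str, str, str]  # (key, operator, value)
Token = Union[Comparison, str]     # a comparison, or "AND" / "OR" / "NOT"


def matches(postfix: List[Token], record: Dict[str, str]) -> bool:
    """Evaluate a postfix boolean expression against one record."""
    stack: List[bool] = []
    for token in postfix:
        if token == "NOT":
            stack.append(not stack.pop())
        elif token in ("AND", "OR"):
            right = stack.pop()
            left = stack.pop()
            stack.append((left and right) if token == "AND" else (left or right))
        else:
            key, op, value = token
            actual = record.get(key)
            if op == "EQ":
                stack.append(actual == value)
            elif op == "NE":
                stack.append(actual != value)
            else:
                raise ValueError(f"unsupported operator: {op}")
    if len(stack) != 1:
        raise ValueError("malformed postfix expression")
    return stack[0]


# mission = OCO AND NOT (level = L1), written in postfix order
query: List[Token] = [("mission", "EQ", "OCO"), ("level", "EQ", "L1"), "NOT", "AND"]
```

Postfix order removes the need for parentheses or precedence rules, which is one reason it suits a format meant to be passed between different components.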

Page 18: Introduction to Apache OODT

Framework

• Catalog & Archive

• Utilities

• Grid

• Agility

Page 19: Introduction to Apache OODT

Grid

• Profile (metadata) and Product (data) services

• Product

– Retrieves resources (products) in platform-neutral formats

• Profile

– Describes and discovers resources using extensible metadata called "profiles"

• Web Grid

– Provides profile and product services over a RESTful interface.

• XML Product/Profile handlers

– Provides XML-configurable, database profile and product handlers.

Page 20: Introduction to Apache OODT

Product

• Provide access to data products

– datasets, images, documents, or anything with an electronic representation

• Accept standard query expressions and return zero or more matching products

• Transform products from proprietary formats into Internet standard formats without impacting local stores or operations.
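
The behavior described on this slide (a query goes in, zero or more products come out, converted to a standard format while the local store is left alone) can be sketched in Python. Everything here is hypothetical: the `Product` class, the `LOCAL_STORE` contents, the made-up `k=v;k=v` record format, and the handler functions are illustrative and are not the OODT product service API. The query is simplified to a plain record id.

```python
import json
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class Product:
    mime_type: str
    data: bytes


Handler = Callable[[str], Optional[Product]]

# Hypothetical local store holding records in a made-up "k=v;k=v" format.
LOCAL_STORE: Dict[str, str] = {
    "granule-1": "mission=OCO;level=L2",
}


def json_handler(query: str) -> Optional[Product]:
    """Look up a record by id and return it as JSON; the store is only read."""
    raw = LOCAL_STORE.get(query)
    if raw is None:
        return None
    fields = dict(pair.split("=", 1) for pair in raw.split(";"))
    payload = json.dumps(fields, sort_keys=True).encode("utf-8")
    return Product("application/json", payload)


def query_products(query: str, handlers: List[Handler]) -> List[Product]:
    """Ask every handler and collect zero or more matching products."""
    results: List[Product] = []
    for handler in handlers:
        product = handler(query)
        if product is not None:
            results.append(product)
    return results
```

The conversion happens on the way out, so the record in `LOCAL_STORE` keeps its original format and local operations are unaffected.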

Page 21: Introduction to Apache OODT

Profile

• Describes and locates resources using metadata descriptions

– resource's inception, composition, and location

• Catalogs metadata descriptions and provides creating, updating, and querying capabilities.

Page 22: Introduction to Apache OODT

Framework

• Catalog & Archive

• Utilities

• Grid

• Agility

Page 23: Introduction to Apache OODT

Agility

• Re-implementation of Grid in Python with a focus on high performance in the face of gargantuan data sets as well as accelerated development and integration into existing systems.

Page 24: Introduction to Apache OODT

Questions