Having Your Cake and Eating It Too

Post on 14-Jan-2016

TRANSCRIPT

  • Having Your Cake and Eating It Too
    - With Apache OODT and Apache Solr
    - Andrew F. Hart, Paul M. Ramirez

  • About Myself
    - Software Engineer
    - NASA Jet Propulsion Laboratory
    - Data Management
    - Committer: OODT, SIS, Gora, Streams (Incubating)
    - Mentor: Streams (Incubating)

  • What We'll Cover
    - Overview of OODT & Solr Projects
    - Strategies for Combining OODT and Solr
    - Detailed Deployment/Config. Example
    - Where to Learn More & Participate

  • Apache OODT
    - Object Oriented Data Technology
    - Origin in NASA mission data systems
    - Components for:
      - Information integration
      - Data cataloging and archiving
      - Configurable workflow processing

  • Apache OODT
    - OODT @ Apache
      - Incubation: 2010, Graduation: 2011
      - 29 Committers
      - Latest Release: 0.5 (Dec. 26, 2012)

  • Apache OODT
    - Karoo Array Telescope (KAT-7)

  • Apache OODT
    - Virtual Pediatric Intensive Care Unit

  • Apache OODT
    - Regional Climate Model Evaluation System

  • Apache OODT
    - Commonalities between systems
      - Lots of data
      - Defined processing steps / algorithms
      - Archives important (search important)

  • Apache OODT
    - Strengths of OODT for the above use cases
      - Loosely coupled components
      - Standard protocols, well-defined interfaces
      - Highly configurable
      - Vetted, reliable code

  • Apache Solr
    - Search + Web Services
    - Powerful features
    - Flexible formats
    - Highly configurable

  • Apache Solr
    - The White House

  • Apache Solr
    - Netflix

  • Apache Solr
    - NASA Planetary Data System

  • OODT & Solr
    - Why use these projects together?
      - Archives often need search capability
      - Similarities / Compatibilities
        - XML-based configuration
        - Environment (Java, Tomcat)

  • Example Integration
    - Standard Data Archive Pipeline

  • Example Integration
    - Standard Data Archive Pipeline + Search

  • OODT Products
    - Typically 1-1 with Files
    - Each uniquely identifiable (GUID)
    - Support for higher-level ProductType
      - A way to define collections

  • OODT Metadata
    - Annotations for products
    - Key:{Val|Multival}
    - Common across all OODT components
    - Two general classes:
      - System
      - User

  • OODT Metadata
    - System Metadata
      - Added automatically by OODT Components
      - Used to track state
      - Used to encode relationships between data

  • OODT Metadata
    - User Metadata
      - Specified as policy
      - Can be product-level or productType-level
      - Used to extract & persist information from files as they are ingested (become products)

  • OODT Metadata
    - Metadata (Policy) Example

    (external)

  • Solr Schema
    - XML document
    - Defines what will be indexed (Fields)
    - Provides high-level context hints
      - Data type, behavior, pre-processing
    - Extremely flexible, extensible

  • Solr Schema
    - Solr Schema Example

    (external)

  • Making the Connection
    - SolrIndexer Tool
      - Part of the File Manager component tools
      - Maps OODT Metadata to Solr Fields
      - Creates Solr documents from OODT products
      - Note: only talking about metadata
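
The mapping step on this slide can be sketched in Python. This is a minimal illustration under stated assumptions, not the SolrIndexer implementation: the metadata keys, the Solr field names, and the `FIELD_MAP` dictionary are hypothetical, and the real tool reads its mapping from indexer.properties.

```python
from typing import Dict, List, Union

# Hypothetical mapping from OODT metadata keys to Solr field names.
FIELD_MAP: Dict[str, str] = {
    "CAS.ProductId": "id",
    "CAS.ProductName": "title",
    "Instrument": "instrument",
}

SolrValue = Union[str, List[str]]


def to_solr_doc(metadata: Dict[str, List[str]],
                field_map: Dict[str, str] = FIELD_MAP) -> Dict[str, SolrValue]:
    """Build a Solr document (as a plain dict) from OODT-style metadata.

    Values are lists because a key may be single- or multi-valued.
    Keys without a mapping are skipped; only metadata is indexed,
    never the file content itself.
    """
    doc: Dict[str, SolrValue] = {}
    for key, values in metadata.items():
        field = field_map.get(key)
        if field is None or not values:
            continue
        # Single-valued keys become scalars; multi-valued keys stay lists.
        doc[field] = values[0] if len(values) == 1 else list(values)
    return doc


# Illustrative product metadata (all values are made up).
product_metadata = {
    "CAS.ProductId": ["19bcb4b8-7999-11e1-b581-8b771498975d"],
    "CAS.ProductName": ["example_file.dat"],
    "Instrument": ["A", "B"],
    "Unmapped": ["ignored"],
}
solr_doc = to_solr_doc(product_metadata)
print(solr_doc)
```

Keeping the mapping in data rather than in code mirrors the role indexer.properties plays for the real tool: the set of indexed fields can change without touching the indexing logic.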

  • SolrIndexer Tool
    - org.apache.oodt.cas.filemgr.tools
    - Available since the 0.4 release
    - Version 0.5+ recommended, as it added some stability improvements
    - Several modes of operation

  • SolrIndexer Tool

  • SolrIndexer Tool
    - Invocation Examples: Ingest all products from the specified File Manager instance

    java -DSOLR_INDEXER_CONFIG=/path/to/indexer.properties \
         -Djava.ext.dirs=/path/to/cas/filemgr/lib/ \
         org.apache.oodt.cas.filemgr.tools.SolrIndexer \
         --all \
         --fmUrl http://localhost:9000 \
         --solrUrl http://localhost:8080/solr

  • SolrIndexer Tool
    - Invocation Examples: Ingest all products from the specified ProductType(s)

    java -DSOLR_INDEXER_CONFIG=/path/to/indexer.properties \
         -Djava.ext.dirs=/path/to/cas/filemgr/lib/ \
         org.apache.oodt.cas.filemgr.tools.SolrIndexer \
         --types urn:some:ProductType \
         --fmUrl http://localhost:9000 \
         --solrUrl http://localhost:8080/solr

  • SolrIndexer Tool
    - Invocation Examples: Ingest a single product by its unique product id

    java -DSOLR_INDEXER_CONFIG=/path/to/indexer.properties \
         -Djava.ext.dirs=/path/to/cas/filemgr/lib/ \
         org.apache.oodt.cas.filemgr.tools.SolrIndexer \
         --product 19bcb4b8-7999-11e1-b581-8b771498975d \
         [--delete] \
         --fmUrl http://localhost:9000 \
         --solrUrl http://localhost:8080/solr

  • SolrIndexer Tool
    - Invocation Examples: Force optimization of the Solr index

    java -DSOLR_INDEXER_CONFIG=/path/to/indexer.properties \
         -Djava.ext.dirs=/path/to/cas/filemgr/lib/ \
         org.apache.oodt.cas.filemgr.tools.SolrIndexer \
         --optimize \
         --solrUrl http://localhost:8080/solr
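
For scripted use, the invocations above can be assembled programmatically. The following Python helper is a sketch, assuming only the flags shown on these slides; the function name `solr_indexer_command` and its parameters are hypothetical and not part of OODT. It builds the argument list without running anything.

```python
from typing import List, Optional


def solr_indexer_command(
    config_path: str,
    lib_dir: str,
    solr_url: str,
    fm_url: Optional[str] = None,
    index_all: bool = False,
    product_type: Optional[str] = None,
    product_id: Optional[str] = None,
    delete: bool = False,
    optimize: bool = False,
) -> List[str]:
    """Assemble a SolrIndexer command line using the flags from the slides."""
    cmd = [
        "java",
        f"-DSOLR_INDEXER_CONFIG={config_path}",
        f"-Djava.ext.dirs={lib_dir}",
        "org.apache.oodt.cas.filemgr.tools.SolrIndexer",
    ]
    if index_all:
        cmd.append("--all")
    if product_type is not None:
        cmd += ["--types", product_type]
    if product_id is not None:
        cmd += ["--product", product_id]
        if delete:  # shown on the slide as optional, next to --product
            cmd.append("--delete")
    if optimize:
        cmd.append("--optimize")
    if fm_url is not None:
        cmd += ["--fmUrl", fm_url]
    cmd += ["--solrUrl", solr_url]
    return cmd


reindex_one = solr_indexer_command(
    config_path="/path/to/indexer.properties",
    lib_dir="/path/to/cas/filemgr/lib/",
    solr_url="http://localhost:8080/solr",
    fm_url="http://localhost:9000",
    product_id="19bcb4b8-7999-11e1-b581-8b771498975d",
)
print(" ".join(reindex_one))
# To actually run it: subprocess.run(reindex_one, check=True)
```

Returning a list rather than a shell string lets the caller pass it straight to `subprocess.run` without quoting concerns.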

  • indexer.properties
    - Configuration file for the SolrIndexer
    - Specifies the mapping between OODT product metadata and Solr fields
    - Additional pre-processing features

  • indexer.properties
    - Example indexer.properties file

    (external)

  • Use Case I
    - Building a searchable data archive
      - Long-term / lights-out archive
      - Products & metadata immutable
      - Many NASA mission data systems use this model
      - Want to make it easily searchable

  • Use Case I
    - Standard Data Archive Pipeline + Search

  • Use Case II
    - Building an interactively editable, searchable data archive
      - Data and metadata mutable
      - Want to dynamically select product(s) to edit based on metadata

  • Use Case II
    - Interactively Editable Data Archive Pipeline + Search

  • Use Case II
    - Interactively Editable Data Archive Pipeline + Search
    - Solr catalog out of sync!

  • Synchronization
    - Two ways (at least) to solve this:
      - Modify the OODT Curator Services
      - Treat OODT Curator Services as a black box and write a wrapper service to invoke Curator Services AND update Solr (via scripted call to SolrIndexer, for example)

  • Modify Curator Services
    - Services implemented in JAX-RS
      - /curator/src/main/java/org/apache/oodt/cas/curation/service
      - [curator_url]/services/metadata/update
    - Options:
      - Utilize Solr Java API
      - Wrap call to OODT SolrIndexer tool
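
The slide names the Solr Java API as one option. As a language-neutral illustration of the same idea, the sketch below uses Solr's HTTP JSON update format from Python instead of the Java client. It assumes a Solr version whose `/update` handler accepts JSON (older releases may expose it at `/update/json`); the function name and the example document are hypothetical. The request is only built, not sent.

```python
import json
import urllib.request
from typing import Any, Dict


def build_solr_update_request(solr_url: str,
                              doc: Dict[str, Any]) -> urllib.request.Request:
    """Prepare a POST that re-adds one document to Solr and commits.

    Re-adding a document with an existing unique key replaces the old copy,
    which is what a metadata edit needs.
    """
    body = json.dumps({"add": {"doc": doc}, "commit": {}}).encode("utf-8")
    return urllib.request.Request(
        url=solr_url.rstrip("/") + "/update",  # assumption: JSON-capable /update
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_solr_update_request(
    "http://localhost:8080/solr",
    {"id": "19bcb4b8-7999-11e1-b581-8b771498975d", "title": "edited title"},
)
print(request.get_method(), request.full_url)
# To send it: urllib.request.urlopen(request)
```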

  • Use Case II-A
    - Modified Curator Services to simultaneously update Solr

  • Example
    - Interactive event tagging

  • Wrap Curator Services
    - Curator Service/API is a black box
    - Develop a custom service that:
      - Issues POST request to Curator service
      - Updates Solr index via, e.g.:
        - Utilize Solr Java API
        - Wrap call to OODT SolrIndexer tool
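
The wrapper approach can be sketched as a small Python function. This is an illustration under stated assumptions rather than the authors' service: `update_and_reindex` and both callables are hypothetical names. In a deployment, `update_curator` would issue the POST to the Curator service and `reindex_product` would update Solr, for example by invoking the SolrIndexer for that product; here they are stand-ins so the control flow can be run on its own.

```python
from typing import Callable, Dict, List

Metadata = Dict[str, List[str]]


def update_and_reindex(
    product_id: str,
    metadata: Metadata,
    update_curator: Callable[[str, Metadata], bool],
    reindex_product: Callable[[str], None],
) -> bool:
    """Update metadata via the black-box Curator service, then refresh Solr.

    Solr is touched only when the Curator update reports success, so the
    search index never gets ahead of the archive.
    """
    if not update_curator(product_id, metadata):
        return False
    reindex_product(product_id)
    return True


# Stand-ins that record calls instead of talking to real services.
calls: List[str] = []


def fake_curator_ok(product_id: str, metadata: Metadata) -> bool:
    calls.append(f"curator:{product_id}")
    return True


def fake_curator_fail(product_id: str, metadata: Metadata) -> bool:
    calls.append(f"curator-failed:{product_id}")
    return False


def fake_reindex(product_id: str) -> None:
    calls.append(f"solr:{product_id}")


ok = update_and_reindex("p1", {"Event": ["flare"]}, fake_curator_ok, fake_reindex)
failed = update_and_reindex("p2", {"Event": ["flare"]}, fake_curator_fail, fake_reindex)
print(ok, failed, calls)
```

Passing the two operations in as callables keeps the Curator service a black box, as the slide requires, and makes the ordering rule easy to test.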

  • Use Case II-B
    - Wrapping OODT Curation Services with Custom UI & Services

  • Example

  • Lessons
    - Solr complements OODT File Manager
    - RESTful interfaces (Solr + OODT Curator) allow for great flexibility in designing services and UI
    - Best approach depends on situation

  • Next Steps
    - Develop SolrCatalog for OODT File Manager?
      - Pros: Reduction in moving parts
      - Cons: Restrictive?
    - Implement Use Case II-A as optional mode for Curator web service layer

  • Learning More
    - Solr
      - http://lucene.apache.org/solr
      - solr-user@lucene.apache.org
    - OODT
      - http://oodt.apache.org
      - https://cwiki.apache.org/confluence/display/OODT/Home
      - oodt-user@apache.org

  • Thanks!
    - Questions?

    - Post-ingest success action (crawler action) that runs a Python script
    - JAX-RS framework flexibility
      - JAX-RS version of a ServletFilter (intercepts incoming request and outgoing response)
      - ResponseHandler: decouples updates (second call to Solr) from the original black-box Curator metadata update call
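
The decoupling idea in these notes can be illustrated with a queue between the metadata update and the Solr update. The sketch below swaps the JAX-RS handler mentioned in the notes for a plain Python worker thread, purely for illustration; `start_reindex_worker` and the product ids are hypothetical, and `reindex_product` stands in for whatever actually updates Solr.

```python
import queue
import threading
from typing import Callable, List, Optional


def start_reindex_worker(
    pending: "queue.Queue[Optional[str]]",
    reindex_product: Callable[[str], None],
) -> threading.Thread:
    """Start a background thread that reindexes queued product ids in order.

    A None entry is the shutdown signal.
    """
    def run() -> None:
        while True:
            product_id = pending.get()
            if product_id is None:
                break
            reindex_product(product_id)

    worker = threading.Thread(target=run, daemon=True)
    worker.start()
    return worker


pending: "queue.Queue[Optional[str]]" = queue.Queue()
reindexed: List[str] = []
worker = start_reindex_worker(pending, reindexed.append)

# The metadata update path only enqueues ids and returns immediately.
pending.put("product-a")
pending.put("product-b")
pending.put(None)  # stop the worker once the queue drains
worker.join()
print(reindexed)
```

With this arrangement a slow or failing Solr update does not delay the response to the original metadata update call, at the cost of a short window in which the index lags the archive.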
