The Power of MarkLogic

Upload: marklogic

Post on 30-Jul-2015

Category:

Software


TRANSCRIPT

1. COPYRIGHT 2015 MARKLOGIC CORPORATION. ALL RIGHTS RESERVED. The Power of Combining Documents, Semantic Triples, and Values with Enterprise NoSQL. Dave Cassel, Developer Community Manager, MarkLogic. April 28, 2015.
2. Agenda: Document Stores, Semantic Triple Stores, What Do I Mean By Values?, Combining, Use Cases.
3. Agenda: Document Stores, Semantic Triple Stores, What Do I Mean By Values?, Combining, Use Cases.
4. NoSQL Document Store Level Set: A) I regularly work with Document Stores. B) I have read about or have a basic understanding of Document Stores. C) Is that in Copley Place?
5. Let's play a game.
6. A book document looks like this. [Tree diagram: Book; Info (Title = I Love Penguins, Author = C. Lion); Section; Chapter (Paragraph = I love penguins because, Paragraph = On the subject of food); further Chapter and Paragraph nodes. Table: columns title, author, Section; values I Love Penguins, C. Lion.] How should I store a bibliography? Is a Foreword different from a Chapter? What about an index? What if a book has multiple authors?
7. What if we store the book like this? [Tree diagram: Book; Info (Title = I Love Penguins, Author = C. Lion); Section; Chapter (Paragraph = I love penguins because, Paragraph = On the subject of food); further Chapter and Paragraph nodes.]
8. ...and add a little structure. [XML markup lost in extraction; surviving text values: I Love Penguins; C. Lion; I love penguins because; Blah blah.]
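The book example on slides 6 to 8 can be sketched as one nested document. This is a minimal illustration, not the markup from the slide (which did not survive): the field names `info`, `authors`, `sections`, `chapters`, and `paragraphs`, the second author, and the empty bibliography are all assumptions made for the example.

```python
import json

# Hypothetical field names; the slides only name Title, Author, Section,
# Chapter, and Paragraph.
book = {
    "info": {
        "title": "I Love Penguins",
        "authors": ["C. Lion"],  # a list, so a co-author needs no schema change
    },
    "sections": [
        {
            "chapters": [
                {
                    "paragraphs": [
                        "I love penguins because",
                        "On the subject of food",
                    ]
                }
            ]
        }
    ],
}


def count_paragraphs(doc):
    """Walk the section/chapter hierarchy and count the paragraphs."""
    return sum(
        len(chapter["paragraphs"])
        for section in doc["sections"]
        for chapter in section["chapters"]
    )


# The open questions from slide 6 become local changes to one document.
book["info"]["authors"].append("A. Seal")  # hypothetical co-author
book["bibliography"] = []                  # hypothetical new part of the book

print(json.dumps(book, indent=2))
print(count_paragraphs(book))
```

The point of the sketch is that a second author or a bibliography is added to this one document without touching any other book.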
9. NoSQL Works for Relational. Document Store: supports any-structured data via a hierarchical data model; stores compressed trees. [Tree diagrams: Employees (Full-time, Name, Address, Road, City, Post Code, Apt, First, Last, Start Date); Trade (Cashflows, Party Identifier, Net Payment, Payment Date, Party Reference, Payer Party, Trade ID, Payment Amount, Receiver Party).]
10. Documents and Data Types. [Sample document, markup lost in extraction; surviving values: Suspicious vehicle; 2012-11-12Z; suspicious activity; suspicious vehicle; 37.497075 -122.363319; A blue van; IRIID; IRIID; isa; value; license-plate; ABC 123; observation/surveillance.]
11. Denormalizing. Table: Book (id, title, author, pub_date, chapter).
12. Denormalizing. Tables: Book (id, title, author, pub_date); Chapter (id, book_id, chap_title); Paragraph (id, chapter_id, text); Author (id, first, last); BookAuthor (book_id, author_id).
13. Denormalizing. [XML markup lost in extraction; surviving text values: I Love Penguins; C. Lion; Bob's Books; I love penguins because; Blah blah; Bob's Books; 123 Main St.; Somewhere; PA.]
14. Expansion: Scale Up.
15. Expansion: Scale Out.
16. Sharding. Tables: Book (id, title, author, pub_date); Chapter (id, book_id, chap_title); Paragraph (id, chapter_id, text); Author (id, first, last); BookAuthor (book_id, author_id).
17. Sharding. [Diagram: the Employees tree (Full-time, Name, Address, Road, City, Post Code, Apt, First, Last, Start Date), shown twice.]
18. Document Modeling: Progressive Enhancement.
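The denormalizing step on slides 11 to 13 can be sketched as a join of the five tables into one document per book. The column names follow slide 12; the row values, the ids, the chapter title, and the publication date are illustrative assumptions, and the output field names are hypothetical.

```python
# Rows shaped like the Book, Chapter, Paragraph, Author, and BookAuthor
# tables from slide 12. All values and ids are illustrative.
books = [{"id": 1, "title": "I Love Penguins", "pub_date": "2015-01-01"}]
authors = [{"id": 10, "first": "C.", "last": "Lion"}]
book_authors = [{"book_id": 1, "author_id": 10}]
chapters = [{"id": 100, "book_id": 1, "chap_title": "Chapter 1"}]
paragraphs = [
    {"id": 1000, "chapter_id": 100, "text": "I love penguins because"},
    {"id": 1001, "chapter_id": 100, "text": "Blah blah"},
]


def denormalize(book_id):
    """Fold every row that belongs to one book into a single nested document."""
    book = next(b for b in books if b["id"] == book_id)
    author_ids = {
        ba["author_id"] for ba in book_authors if ba["book_id"] == book_id
    }
    return {
        "title": book["title"],
        "pub_date": book["pub_date"],
        "authors": [
            {"first": a["first"], "last": a["last"]}
            for a in authors
            if a["id"] in author_ids
        ],
        "chapters": [
            {
                "title": ch["chap_title"],
                "paragraphs": [
                    p["text"] for p in paragraphs if p["chapter_id"] == ch["id"]
                ],
            }
            for ch in chapters
            if ch["book_id"] == book_id
        ],
    }


doc = denormalize(1)
print(doc)
```

Because the resulting document carries everything about its book, it can be placed on any shard without a cross-shard join, which is the contrast slides 16 and 17 draw.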
19. Beauty of Document Stores: No tables; Ingest as is; Index Everything!; Elastic.
20. Agenda: Document Stores, Semantic Triple Stores, What Do I Mean By Values?, Combining, Use Cases.
21. Semantic Triple Store Level Set: A) I SPARQL. B) I know the concepts. C) the most exciting play in baseball.
22. Semantics Is: A New Way to Organize Data. Data is stored in triples, expressed as Subject : Predicate : Object. John Smith : livesIn : London. London : isIn : England. Querying with SPARQL gives us simple lookup... and more! Find people who live in (a place that's in) England. [Graph diagram labeled "RDF Triples": nodes "John Smith", "London", "England"; edge labels livesIn, isIn, livesIn.]
23. Triple Store. A simple triple store is good when you want to: look up facts (model atomic facts, relationships; reference data); explore a graph (model relationships/links); combine sources (triples are easy to share, easy to combine); update some triples (easy to insert/delete/update a single fact; easy to insert/delete/update any part of the ontology, i.e. facts about the data); use the magic of inference (simpler data modeling, data integration).
24. Triple Store as Graph: modeling facts, relationships, links. [Graph diagram: node labels Biology, Neuro-Biology, Neurology, London, New York; edge labels Works at, Located in, Lives in, Lived in, Studies, type of, Related To, Specializes In, Funded By, Specializes In; annotations Reference Data, Metadata, Provenance.]
25. [Diagram labels: DOMAIN, WORLD AT LARGE, DOCUMENTS.]
26. Facts from the World at Large. Linking Open Data cloud diagram, by Richard Cyganiak and Anja Jentzsch.
http://lod-cloud.net/ Linked Open Data: freely available, easily consumed.
27. Facts from your domain: Proprietary Semantic Facts (facts and taxonomies in your organization or industry). Like Open Data, but domain specific. Might be proprietary within a company, or shared across an industry. Includes data and ontologies. Some examples: a bank's proprietary reference data; a pharmaceutical company's drug ontology; an industry-wide ontology such as FIBO.
28. Facts from Documents.
29. Facts About Data.
30. SPARQL: W3C standard query language for RDF. [Some IRIs lost in extraction.] query: select (count(distinct ?s) as ?count) where { ?s http://dbpedia.org/resource/Ukraine } discovery: describe ?person where { ?person "Weird Al"@en } transitive: SELECT ?member WHERE { ?member :hasParent* :Joe . }
31. SPARQL. [Some IRIs lost in extraction.] inference: PREFIX ex: http://example.com/ rule "bornInCountry" CONSTRUCT { ?person ?place2 } { ?person ?place1 . ?place1 ?place2 } update: INSERT DATA { http://dbpedia.org/resource/Ukraine }
32. Agenda: Document Stores, Semantic Triple Stores, What Do I Mean By Values?, Combining, Use Cases.
33. Values in a Document Store. [Tree diagrams: Article (Title, Dates, Abstract, Body, Section, Section, Para, Para, Para, Para, Para); Trade (Type, Dates, Amount, Parties, Seller, Buyer, Name, Affiliation, Name, Affiliation, Paid By).]
34. Range indexes map document IDs to values, and vice versa, in a compact in-memory representation.
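The two-way mapping described here can be sketched with a pair of dictionaries. This is a toy illustration of the idea only, not how MarkLogic implements range indexes; the class name, method names, document IDs, and amounts are all hypothetical.

```python
from collections import defaultdict


class RangeIndex:
    """Toy two-way index: value -> document IDs and document ID -> values."""

    def __init__(self):
        self.value_to_docs = defaultdict(set)
        self.doc_to_values = defaultdict(set)

    def add(self, doc_id, value):
        """Record that the document holds this value, in both directions."""
        self.value_to_docs[value].add(doc_id)
        self.doc_to_values[doc_id].add(value)

    def docs_in_range(self, low, high):
        """Return IDs of documents with at least one value in [low, high]."""
        return sorted(
            {
                doc_id
                for value, doc_ids in self.value_to_docs.items()
                if low <= value <= high
                for doc_id in doc_ids
            }
        )

    def facets(self):
        """Count documents per distinct value, as a faceted search would."""
        return {
            value: len(doc_ids)
            for value, doc_ids in sorted(self.value_to_docs.items())
        }


# Hypothetical trade amounts keyed by hypothetical document IDs.
amounts = RangeIndex()
amounts.add("trade-1", 100)
amounts.add("trade-2", 250)
amounts.add("trade-3", 250)

print(amounts.docs_in_range(200, 300))
print(amounts.facets())
```

Both queries are answered from the index alone, without opening any document, which is what makes the column-store comparison apt.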
Range indexes work like a built-in in-memory column store. Real-Time Analytics: range indexes can be used for faceted search; aggregation and visualization.
35. Real-Time Analytics: range indexes can be used for faceted search; aggregation and visualization; analytics, including custom user-defined functions.
36. Real-Time Analytics: range indexes can be used for faceted search; aggregation and visualization; analytics, including custom user-defined functions; co-occurrence.
37. Agenda: Document Stores, Semantic Triple Stores, What Do I Mean By Values?, Combining, Use Cases.
38. Triples + Documents + Data: Complementary.
39. Triples + Documents + Data: Intertwingled. [Sample document, markup lost in extraction; surviving values: file:///Users/dcassel/git/why-learn/triples/concepts.ttl; (http://whylearn.org/subject#Arithmetic, http://whylearn.org/branch/, http://whylearn.org/subject#Mathematics); (http://whylearn.org/subject#GameTheory, http://whylearn.org/branch/, http://whylearn.org/subject#Mathematics).]
40. Triples + Documents + Data: Intertwingled. Documents can contain triples. [Sample document, markup lost in extraction; surviving values: Man bites dog; http://example.org/news/42; http://example.org/published; 2013-09-10.]
41. Triples + Documents + Data: Intertwingled. Triples can be annotated in documents. [Sample document, markup lost in extraction; surviving values: AP Newswire; http://example.org/news/Nixon; http://example.org/wentToChina.]
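The idea that documents can contain triples (slide 40) can be sketched by parsing a document and recording each triple together with the document it came from. The slide's original markup was lost, so the element names (`article`, `title`, `triple`, `subject`, `predicate`, `object`) and the document URI below are assumptions; only the four values come from the slide.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup and document URI; only the values are from slide 40.
DOCS = {
    "/news/42.xml": """
<article>
  <title>Man bites dog</title>
  <triple>
    <subject>http://example.org/news/42</subject>
    <predicate>http://example.org/published</predicate>
    <object>2013-09-10</object>
  </triple>
</article>
""",
}


def index_triples(docs):
    """Return (subject, predicate, object, doc_uri) for every embedded triple."""
    index = []
    for uri, xml_text in docs.items():
        root = ET.fromstring(xml_text.strip())
        for triple in root.iter("triple"):
            index.append(
                (
                    triple.findtext("subject"),
                    triple.findtext("predicate"),
                    triple.findtext("object"),
                    uri,
                )
            )
    return index


def docs_with(index, predicate, obj):
    """Find the documents that contain a triple with this predicate and object."""
    return sorted({uri for _, p, o, uri in index if p == predicate and o == obj})


triples = index_triples(DOCS)
print(triples)
print(docs_with(triples, "http://example.org/published", "2013-09-10"))
```

Keeping the document URI next to each triple is what lets a graph query lead back to the document that states the fact.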
42. Complementary: Better Search, Better Answers. Feature: call SPARQL from server-side XQuery or JavaScript. Benefit: expand search terms using SPARQL; look up synonyms, related terms/entities, nicknames, city-country-region, etc. Example: user types in "La Verde" (a nickname for the Mexico national soccer team); SPARQL expands the term to "Mexico"; looks up players for "recommendation"; looks up Mexico flag, games in current championship, previous scores to add to results.
43. Complementary: graph contains documents. Feature: the subject or object of a triple can be a document or data in the database. Benefit: query using SPARQL, return a document or data as a result; look up synonyms, related terms/entities, provenance, ownership, etc.; return a document or data. Example: user types in "La Verde" (a nickname for the Mexico national soccer team); SPARQL expands the term to "Mexico"; returns a document or data.
44. Intertwingled[1]: Triples annotated in a generalized way. Feature: triple storage can be annotated by XML or JSON metadata; triples metadata can be added in a completely generalized way. Benefit: query the triples with SPARQL, restrict by the context of the document; find facts, but only where the metadata matches some criteria (provenance; dates; bitemporal; security; etc.). Example: Show me the earnings and earningsrank of every sportsperson, but only where the facts are from a reliable source, where we have at least 70% confidence, and they were published this year.
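The example on slide 44 (facts restricted by source, confidence, and publication year) can be sketched as a filter over triples that carry their own metadata. This is plain Python rather than SPARQL, and every name and value in it is an assumption made for illustration: the `AnnotatedTriple` fields, the set of reliable sources, and the three sample facts.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class AnnotatedTriple:
    """A triple plus the metadata that annotates it (hypothetical fields)."""
    subject: str
    predicate: str
    obj: str
    source: str
    confidence: float
    published: date


# Hypothetical notion of which sources count as reliable.
RELIABLE_SOURCES = {"AP Newswire"}

# Illustrative facts only; none of these values come from the slides.
FACTS = [
    AnnotatedTriple("ex:playerA", "ex:earnings", "1000000",
                    "AP Newswire", 0.90, date(2015, 3, 1)),
    AnnotatedTriple("ex:playerB", "ex:earnings", "2000000",
                    "AP Newswire", 0.50, date(2015, 3, 1)),
    AnnotatedTriple("ex:playerC", "ex:earnings", "3000000",
                    "Rumor blog", 0.95, date(2015, 3, 1)),
]


def trusted_facts(triples, predicate, min_confidence, year):
    """Keep matching triples whose annotations satisfy all three restrictions."""
    return [
        t
        for t in triples
        if t.predicate == predicate
        and t.source in RELIABLE_SOURCES
        and t.confidence >= min_confidence
        and t.published.year == year
    ]


for fact in trusted_facts(FACTS, "ex:earnings", 0.70, 2015):
    print(fact.subject, fact.obj)
```

Only the first fact passes: the second fails the 70% confidence threshold and the third comes from a source outside the reliable set.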
45. Intertwingled[2]: Triples embedded in document, data. Feature: triples can be embedded in an XML or JSON document; the triple index stores the DocID. Benefit: query the triples with SPARQL, restrict by the context of the rest of the document; find facts, but only where they appear in some context; find the document where those facts appear. Show me all the people that John met, but only where that fact was found in a police report; within the last 6 months; that mentions a place within 5 miles of a training camp; and the interview notes mention an explosive device. Example: Show me the injuries that occurred in high-scoring games in the 2010 World Cup where the text mentions a hat-trick. Now show me the match report: did the injuries affect the match? You can only abstract some structured information from a document!
46. Documents and Triples: Complementary Features. [Two-column table, Documents vs. Triples; cells in extraction order: Tree data organization; Graph data organization; Hierarchical relationships; Granular, multi-dimension relationships; Multiple, integrated hierarchies; XML inline markup for structured data; Data properties for structured data; Manage documents unstructured content; Reference metadata; Cross-document links; URI pointers from triples to related triples and documents; Data provenance, including bitemporal annotation; Linked Data, connecting data across federated databases.]
47. Unifying Documents and Triples. Annotate documents: relate documents over selected concepts; extract entities from content; abstract concepts to support semantic search; connect documents to external data resources. Annotate triples: provenance; confidence; enable bitemporal queries.
48. What goes where?
[Diagram labels, in extraction order: Frequently-accessed records; Metadata and annotations about triples; Documents; Document content structure; Facts; Simple data lookups; Relationships; Find an entity and its related entities; Metadata and annotations about documents; Taxonomies; Data links to external data resources.]
49. What goes where? Embedded: triples that describe the document (facts, relationships and annotations about the contents of the document; links to related documents and external data). Triples that describe the semantic model and other data: definitions of shared data concepts, rules; data that is not specific to a particular document.
50. Agenda: Document Stores, Semantic Triple Stores, What Do I Mean By Values?, Combining, Use Cases.
51. Who is doing this?
52. Agriculture Company: semantics intelligence for R&D. Data and Research (90+ data sources). Search App. Classification, Publishing, Ontology Mgmt, Semantic Enrichment. RDF Triple Store, Search and Query, Indexing. Semantics Intelligence Platform. Question: What is the corn yield and the underlying soil type for this set of states? Sources: corn yield data (state_50yr_mean_corn yld.xlsx); geospatial boundaries; SSURGO soil type data.
53. APA: sophisticated analysis for academic publishing. (1) Defining Relationships in the Data: designing an ontology (vocabulary) to manage the structure and relationships of content. (2) Doing Sophisticated Data Analysis: leveraging semantic data for efficient big data analytics (e.g. who cited APA, who cited those citations, and so on). [Graph diagram: nodes Author, Subject, University, Company; edge labels Is an expert in, Went to school at, Is sponsored by.]
54. Entertainment Company: semantic metadata hub. Ontology Mgmt, Semantic Enrichment; RDF Triple Store, Search and Query. [Architecture diagram labels: Assets, Metadata Hub, Downstream Systems, RDF Outputs, Search App.] Data model using Documents + Data + Triples: Title; HD Master; Dates (Production Date, Editing Date, Release Date, International Date); is Asset; Title; Character; Film; Series; Animated; Actress; City.
55. Mitchell1: knowledge graph for car parts. (1) Link different terms that mean the same or similar things. (2) Compositional hierarchy to relate each part to the whole (partonomy). [Diagram labels: Engine; Engine cooling; Conditioner compressor gasket; oil pan gasket; 196,000+ Unique Vehicles; Vocabulary 1; Vocabulary 2; Vocabulary 3; Vocabulary 4; Searchable Knowledge Graph.]
56. MarkLogic: Powerful, Agile & Trusted. Geospatial Support; Full-text Search; Flexible Indexes; Native JSON Store; Native XML Store; Real-time Alerting; Native RDF Triple Store; Bitemporal; Tiered Storage; Fully Transactional; Server-side JavaScript; Hadoop and HDFS; Cloud Ready (AWS); SQL Support; Scalable and Elastic; MarkLogic Content Pump; REST API; Samplestack; Ad-hoc Queries; Schema Agnostic; XA Transactions; 24/7 Engineering Support; LDAP and Kerberos Security; Security Certifications; Configuration Management; Monitoring and Management; Performance at scale; Customizable Failover; Customizable Backup; Atomic Forests; Point-in-time Recovery; ACID Transactions; Index Across Data Types; Flexible Replication; Semantic Inference; Multi-OS Support. [Column headings: POWERFUL, AGILE, TRUSTED.]
57. Learn More about NoSQL & MarkLogic. Read nosqlfordummies.com; See It!; Meet marklogic.com/training; [email protected]; Comment marklogic.com/blog.
58. MarkLogic World Tour 2015: San Francisco | April 13-17; Washington, DC | May 7; Amsterdam | May 12; London | May 14; Boston | May 19; Tokyo | May 21; Chicago | June 2; New York City | June 18; Houston | July 16. Join us! Register today: world.marklogic.com. Listen online! mlwonline.marklogic.com
59. Dave Cassel. Twitter: @dmcassel. http://developer.marklogic.com http://davidcassel.net. Powerful, Agile, Trusted. Triples, Documents, Values.