© 2008 Palantir Technologies Inc. All rights reserved.
Palantir Open Platform
Brian Schimpf
Forward Deployed Engineer
Presentation Overview
Palantir is an open platform
– Designed from the ground up to be open and extensible
– Rich set of APIs spanning the product
Palantir works with your IT infrastructure
In this talk
– Integrating with existing software ecosystem
– Palantir extensibility
Existing IT Ecosystem
Your existing IT infrastructure
– Authentication
– Information Extractors
– Legacy data stores
– Rapidly changing data sources
Authentication
Already have an existing authentication and authorization infrastructure
May have multiple authentication sources
Want to provide a unified access control solution across information sources
Authentication WS provides a common interface
Provides users, groups and group memberships
Allows multiple sources to be registered
[Diagram labels: LDAP Credentials, Public Key Credentials, LDAP Auth Source, PKI Auth Source, Authentication Web Service, Dispatch Server]
Sample User A
– Username: jdoe@source1
– Name: Jane Doe
– UID: ABCD-EFGH-IJKL
– Groups: 1234, 5678
Sample User B
– Username: jsmith@source2
– Name: John Smith
– UID: ZXYW-VUTS-RQPO
– Groups: 9876, 5432
Auth: jdoe@source1
Auth: jsmith@source2
Authentication Web Service
Prebuilt implementation for LDAP
– Compatible with Microsoft Active Directory
Implemented via SOAP-RPC
– Can be arbitrarily complex
Works seamlessly with Palantir Access Control Model
– ACLs can span authentication sources
Can be leveraged by other applications for authentication
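The idea of one interface in front of several registered authentication sources can be sketched as follows. This is a minimal illustration, not the Palantir Authentication Web Service: the class names (`User`, `InMemoryAuthSource`, `AuthenticationService`), the plaintext secrets, and the rule of routing on the part after `@` are all assumptions made for the example. Only the sample users come from the slide.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass(frozen=True)
class User:
    username: str            # qualified as local-name@source
    name: str
    uid: str
    groups: Tuple[str, ...]


class InMemoryAuthSource:
    """Hypothetical stand-in for an LDAP or PKI auth source."""

    def __init__(self, users: Dict[str, Tuple[str, User]]):
        # Maps local username -> (secret, user record). A real source
        # would check a directory or a certificate, not a plaintext secret.
        self._users = users

    def authenticate(self, local_name: str, secret: str) -> Optional[User]:
        entry = self._users.get(local_name)
        if entry is None or entry[0] != secret:
            return None
        return entry[1]


class AuthenticationService:
    """Routes each login to the source named after the '@' (an assumption)."""

    def __init__(self) -> None:
        self._sources: Dict[str, InMemoryAuthSource] = {}

    def register(self, source_id: str, source: InMemoryAuthSource) -> None:
        self._sources[source_id] = source

    def authenticate(self, username: str, secret: str) -> Optional[User]:
        local_name, sep, source_id = username.rpartition("@")
        source = self._sources.get(source_id) if sep else None
        return source.authenticate(local_name, secret) if source else None


service = AuthenticationService()
service.register("source1", InMemoryAuthSource({
    "jdoe": ("secret-a", User("jdoe@source1", "Jane Doe",
                              "ABCD-EFGH-IJKL", ("1234", "5678"))),
}))
service.register("source2", InMemoryAuthSource({
    "jsmith": ("secret-b", User("jsmith@source2", "John Smith",
                                "ZXYW-VUTS-RQPO", ("9876", "5432"))),
}))

jane = service.authenticate("jdoe@source1", "secret-a")
```

Because every source returns the same `User` shape, group-based access control can treat users from different sources uniformly.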
Information Extractors
Large repositories of unstructured text
Multiple information extractors have been run across the text
Provide different types of extraction
– Entities
– Relationships
– Metadata
– Geotagging
Siloed view of each entity extractor's output
Want to combine these views alongside structured data into one interface
Entity Extractor SDK
Palantir provides excellent visualization and integration of entity extracted documents
Entity Extractor SDK provides common interface to all major extractors
– Command line interface
– SOA Web Service
Entity Extractor SDK
Leverages DocXML format to represent data
– Can combine multiple extractor outputs into one representation
– See Palantir XML Formats Presentation for more information
Standard SOAP-RPC and XML allow custom implementations in any language on any platform
Open interface and format allow the platform to be leveraged by other applications
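Combining several extractors' outputs into one representation can be illustrated with a small sketch. It does not use DocXML, whose schema is not shown in this talk. The `Annotation` record and the `merge_annotations` function are hypothetical, and collapsing tags that agree on span and label is just one possible merge policy.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Annotation:
    start: int       # character offset, inclusive
    end: int         # character offset, exclusive
    label: str       # e.g. "PERSON", "LOCATION"
    extractor: str   # which tool produced the tag


def merge_annotations(*outputs):
    """Combine several extractors' tags into one list, sorted by position.

    Tags that agree on span and label are collapsed into a single entry
    that records every extractor that produced it.
    """
    merged = {}
    for output in outputs:
        for ann in output:
            key = (ann.start, ann.end, ann.label)
            merged.setdefault(key, set()).add(ann.extractor)
    return [
        (start, end, label, sorted(sources))
        for (start, end, label), sources in sorted(merged.items())
    ]


text = "Jane Doe flew to Vienna."
extractor_a = [Annotation(0, 8, "PERSON", "A"),
               Annotation(17, 23, "LOCATION", "A")]
extractor_b = [Annotation(0, 8, "PERSON", "B")]
combined = merge_annotations(extractor_a, extractor_b)
```

Keeping the list of contributing extractors on each merged tag preserves where a tag came from, which matters when the tools disagree.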
Legacy Data Systems
Multiple stovepiped sources of information
No common schema
No common interface
No common access control
Want to provide common interface for analysis and data access
XML APIs
Palantir XML provides a serialized form of the Palantir Object Model
Can exactly control the representation of data in Palantir
– Fine grained access control
– Tracking of pedigree and lineage
See Palantir XML Formats Presentation for more information
Rapidly Changing Data Sources
New data sources come on line all the time
Want to easily integrate this content with existing data to discover new information
Palantir has flexible user interface and backend data import utilities
– Easy to quickly map new datasets
– Handles popular unstructured document formats
– Rapidly transforms structured import sources
• Flat files
• Excel spreadsheets
• Relational databases
Data Quality
This looks great, but…
The quality of analysis is only as good as the data that goes into it
Palantir handles dirty data
– Attempts to parse and validate attribute values
– Unparseable, incomplete or invalid data is still allowed and indexed but does not clutter the system
Data parsing and validation framework is extensible through the Palantir Ontology APIs
Object Model
Property Ontology APIs
The Palantir Ontology APIs enable developers to extend the functionality of the Property Ontology
Structure of a Property
Two types of property values are supported in Palantir
– Simple
• Used for single, unparsed values
• e.g. Nationality, Organization Name
– Composite
• Used for values composed of discrete, semantic units
• e.g. Name (first & last), Address (city, state, zip, etc.)
Lifecycle of a Property
Extract
– Extracted Value: raw property value
Transform
– Maker: transforms raw string
– Validator: validates components
– Approx Gen: generates approxes
Load
– Palantir Data Store: store to database and index
Property Maker
Data parsing interface
Transform tool that can be leveraged by both XML APIs and standard import interface
In: String with value “John Smith”
Out: Name Property with
– First Name: John
– Last Name: Smith
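The name example can be sketched as a small parsing function. The names `NameProperty` and `make_name` are hypothetical, not part of the Palantir Ontology APIs, and the rule used here (last whitespace-separated token is the last name) is a deliberately naive assumption. Returning `None` for input that cannot be split lets a caller keep the raw string instead of rejecting it.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class NameProperty:
    first_name: str
    last_name: str


def make_name(raw: str) -> Optional[NameProperty]:
    """Turn a raw string into a composite name, or None if it cannot be split."""
    parts = raw.split()
    if len(parts) < 2:
        return None  # caller can fall back to storing the raw value
    # Naive assumption: everything before the final token is the first name.
    return NameProperty(first_name=" ".join(parts[:-1]), last_name=parts[-1])


name = make_name("John Smith")
```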
Property Parser API
In: Lindengasse 24-9, A-1020 Vienna
Out: Address Property with
– Address 1: Lindengasse 24-9
– City: Vienna
– Postal Code: 1020
– Country: AT
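A parser for addresses in this particular shape might look like the sketch below. It assumes the layout "street, PREFIX-postcode city" and a small illustrative table mapping the country prefix to a two-letter code. Both the pattern and the `COUNTRY_PREFIXES` table are assumptions for this example and cover only a tiny fraction of real address formats.

```python
import re
from typing import Dict, Optional

# Illustrative mapping from the prefix before the postal code to a
# two-letter country code; a real parser would need a complete table.
COUNTRY_PREFIXES = {"A": "AT", "D": "DE", "CH": "CH"}

ADDRESS_PATTERN = re.compile(
    r"^\s*(?P<address1>[^,]+?)\s*,\s*"                 # street and number
    r"(?P<prefix>[A-Z]{1,3})-(?P<postal>\d{4,5})\s+"   # e.g. A-1020
    r"(?P<city>.+?)\s*$"                               # city name
)


def parse_address(raw: str) -> Optional[Dict[str, str]]:
    """Split a raw address into components, or return None if it does not fit."""
    match = ADDRESS_PATTERN.match(raw)
    if match is None or match["prefix"] not in COUNTRY_PREFIXES:
        return None
    return {
        "Address 1": match["address1"],
        "City": match["city"],
        "Postal Code": match["postal"],
        "Country": COUNTRY_PREFIXES[match["prefix"]],
    }


address = parse_address("Lindengasse 24-9, A-1020 Vienna")
```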
Property Validator
Data validation interface
Presents notification to the user if the property does not pass validation
Passport Machine Readable Zone
– Encoding of passport information in standardized form
– Includes checksum after each field
Validator can verify checksum digits
P<UTOERIKSSON<<ANNA<MARIA<<<<<<<<<<<<<<<<<<<
L898902C<3UTO6908061F9406236ZE184226B<<<<<14
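Checksum verification for a passport machine readable zone can be sketched as follows, using the standard check digit scheme for machine readable travel documents: digits keep their value, letters A to Z map to 10 to 35, the filler `<` counts as 0, and values are weighted 7, 3, 1 repeating, with the sum taken modulo 10. The function names are illustrative and not a Palantir API. The field positions are those of the second line of a 44-character passport MRZ.

```python
def char_value(ch: str) -> int:
    """Numeric value of one MRZ character."""
    if "0" <= ch <= "9":
        return int(ch)
    if "A" <= ch <= "Z":
        return ord(ch) - ord("A") + 10
    if ch == "<":
        return 0
    raise ValueError(f"invalid MRZ character: {ch!r}")


def check_digit(field: str) -> int:
    """Weighted sum with repeating weights 7, 3, 1, modulo 10."""
    weights = (7, 3, 1)
    return sum(char_value(c) * weights[i % 3] for i, c in enumerate(field)) % 10


def validate_line2(line: str) -> dict:
    """Check every check digit on the second line of a passport MRZ."""
    if len(line) != 44:
        raise ValueError("expected a 44-character line")
    checks = {
        "document_number": (line[0:9], line[9]),
        "birth_date": (line[13:19], line[19]),
        "expiry_date": (line[21:27], line[27]),
        "personal_number": (line[28:42], line[42]),
        # The composite digit covers the fields above plus their check digits.
        "composite": (line[0:10] + line[13:20] + line[21:43], line[43]),
    }
    return {
        name: check_digit(field) == char_value(digit)
        for name, (field, digit) in checks.items()
    }


result = validate_line2("L898902C<3UTO6908061F9406236ZE184226B<<<<<14")
```

A failed check identifies which field is suspect, which is the information a validator needs in order to notify the user about a specific property.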
Approx Generator
Fuzzy searching interface
Property can support multiple types of approxes
Approxes are indexed for fast searching
Example: Arabic Name Normalization
Transliterate Name to Arabic
فكري:محمد
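The general idea of generating approxes and indexing them can be illustrated with a much simpler technique than Arabic transliteration. The sketch below derives two approximate forms from a Latin-script name (an accent-stripped, case-folded form and a consonant skeleton) and stores them in an in-memory index. All names here are hypothetical and this is a stand-in for the concept, not the normalization shown on the slide.

```python
import unicodedata
from collections import defaultdict

VOWELS = set("aeiou")


def strip_accents(text: str) -> str:
    """Drop combining marks after Unicode decomposition."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def consonant_skeleton(text: str) -> str:
    """Keep letters that are not vowels, collapsing doubled consonants."""
    out = []
    for ch in text:
        if not ch.isalpha() or ch in VOWELS:
            continue
        if out and out[-1] == ch:
            continue
        out.append(ch)
    return "".join(out)


def generate_approxes(value: str) -> set:
    """Return the approximate forms under which a value is indexed."""
    folded = strip_accents(value).casefold()
    return {approx for approx in (folded, consonant_skeleton(folded)) if approx}


class ApproxIndex:
    """Maps each approx to the original values that produced it."""

    def __init__(self) -> None:
        self._index = defaultdict(set)

    def add(self, value: str) -> None:
        for approx in generate_approxes(value):
            self._index[approx].add(value)

    def search(self, query: str) -> set:
        hits = set()
        for approx in generate_approxes(query):
            hits |= self._index.get(approx, set())
        return hits


index = ApproxIndex()
for stored in ("Muhammad", "José", "Smith"):
    index.add(stored)
matches = index.search("Mohammed")
```

Since the approxes are computed once at load time, a fuzzy search reduces to exact lookups on the indexed forms.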
ETL Tools
The Palantir Ontology APIs allow you to customize Palantir’s data handling
Extensions are leveraged across all imports
– Data integration without a complex ETL toolchain
Works with manually entered or tagged data as well
Palantir Extensibility
I have all this data in Palantir, now what?
I need to extract the information for some other tool
I need to present the information to the user in a different way
Client Connection API allows for all these operations
Client Connection API
Used by Palantir Workspace
Proxies all requests to Dispatch
Written in Java
Get started coding in 5 minutes
Provides abstraction for
– Object Model
– Revisioning DB
– Access Control Model
[Diagram labels: Client, Spring HTTP RPC, Dispatch Server]
SOA Data Interface
Client Connection API used to provide SOA WS interface to Palantir
Examples
– Searching
• Works across all data sources
• SearchQuery and SearchAroundQuery classes
– Entity Extractor tuning
• Retrieve manually edited tagging to train entity extractor
• getAppEventObjectInfo for begin and end date
– Revisioning database
• Extract history of changes to objects
• DBEvent class
Custom Presentation
Simple access to data for custom presentation
Searching and storing objects requires a few API calls
Examples
– Data entry forms
• Standard border crossing forms
• createBlankObject, Property.attemptToCreate
– Report generation
• Report on changes in activity
• SearchQuery and HGBin class
– Thin client graph presentation
• Transform graph to HTML
• Graph class
Client Connection API
Provides simple and powerful access to Palantir data
Functionality of application plus more
Complete web-based viewer application written in under 6 hours
Summary
Palantir integrates with and becomes a part of your infrastructure
Can unify your authentication, information extraction and data resources in one environment
Provides a rich platform that can be leveraged in other projects