
Extending Solr: Packaging Common Sense

Carlos Valcarcel

Solutions Consultant, Lucidworks


Agenda

• In the beginning: Solr (Classic)
• Solr Cloud
• Solr
• Security
• Content Ingestion
• Fusion
• Architecture
• Connectors
• Pipelines
• Signals
• Visualization


In The Beginning…


Solr (Classic)

Solr was (un)officially born August 2004

It became an official Apache project January 3, 2006

Implemented on top of Lucene

The initial architecture was Master/Slave

Blog post: https://lucidworks.com/blog/2016/02/02/happy-10th-birthday-apache-solr/


Solr Cloud

A natural evolution of Solr

• Zookeeper for property administration and synchronization
• Transparent scaling (just add more replicas)
• Replication rules
• Major pro: New Solr features are distributed!
• Major con: Not all Solr features are distributed! More testing!
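As a rough illustration of "just add more replicas", the sketch below builds request URLs for the Solr Collections API (the CREATE and ADDREPLICA actions). The base URL, the collection name "demo" and the shard name "shard1" are assumptions made for the example, and nothing is sent over the network.

```python
from urllib.parse import urlencode


def create_collection_url(base_url, name, num_shards, replication_factor):
    """Build a Collections API CREATE URL for a SolrCloud cluster."""
    params = {
        "action": "CREATE",
        "name": name,
        "numShards": num_shards,
        "replicationFactor": replication_factor,
        "wt": "json",
    }
    return f"{base_url.rstrip('/')}/admin/collections?{urlencode(params)}"


def add_replica_url(base_url, collection, shard):
    """Build a Collections API ADDREPLICA URL to scale out one shard."""
    params = {
        "action": "ADDREPLICA",
        "collection": collection,
        "shard": shard,
        "wt": "json",
    }
    return f"{base_url.rstrip('/')}/admin/collections?{urlencode(params)}"


if __name__ == "__main__":
    # Hypothetical local cluster; adjust host, port and names for a real one.
    base = "http://localhost:8983/solr"
    print(create_collection_url(base, "demo", num_shards=2, replication_factor=2))
    print(add_replica_url(base, "demo", "shard1"))
```

Issuing an HTTP GET against either URL on a running SolrCloud cluster would perform the action; the cluster state itself is kept in Zookeeper.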


Awesomeness of Solr

- Search for the masses!
- Easy to use
- Full control over how documents are indexed and queried
- DIY
- Open source: you can extend it in any way you like
- Mature search technology, strong underlying libraries
- Used by…well, almost everyone


Awesomeness of Solr…at a price

- Solr is not trivial to use
- Consultants in high demand
- DIY == Fix it Yourself
- Full control == Lots of responsibilities
- Open source == you can extend it, but so can everybody else
- Mature search technology == higher-level search abstractions aren’t always implemented
- Large audience == Harder to implement custom features that don’t break in between releases


Solr Key Features

• Apache Lucene
• Grouping and Joins
• Stats, expressions, transformations and more
• Lang. Detection
• Extensible
• Massive Scale/Fault tolerance
• Full text search (Info Retr.)
• Facets/Guided Nav galore!
• Lots of data types
• Spelling, auto-complete, highlighting
• Cursors
• More Like This
• De-duplication
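Of the features listed, cursors are the easiest to show in a few lines. The sketch below follows Solr's cursorMark protocol: start with cursorMark=*, pass back each nextCursorMark, and stop when the mark stops changing. The fetch_page callable and the in-memory make_fake_solr stand-in are hypothetical helpers so the loop can run without a server; a real fetch_page would query /select with a sort that includes the uniqueKey field.

```python
def iterate_with_cursor(fetch_page, rows=100):
    """Yield every document by following the cursorMark protocol.

    fetch_page(cursor_mark, rows) must return a dict shaped like a Solr
    response: {"response": {"docs": [...]}, "nextCursorMark": "..."}.
    """
    cursor = "*"  # Solr's starting cursor value
    while True:
        page = fetch_page(cursor, rows)
        yield from page["response"]["docs"]
        next_cursor = page["nextCursorMark"]
        if next_cursor == cursor:
            # An unchanged cursor means there are no more results.
            break
        cursor = next_cursor


def make_fake_solr(ids):
    """Hypothetical in-memory stand-in for a Solr /select call (demo only)."""
    ordered = sorted(ids)

    def fetch_page(cursor_mark, rows):
        start = 0 if cursor_mark == "*" else int(cursor_mark)
        docs = [{"id": i} for i in ordered[start:start + rows]]
        next_mark = str(start + len(docs)) if docs else cursor_mark
        return {"response": {"docs": docs}, "nextCursorMark": next_mark}

    return fetch_page


if __name__ == "__main__":
    for doc in iterate_with_cursor(make_fake_solr(["c", "a", "b"]), rows=2):
        print(doc["id"])
```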


Solr Security

• SSL
• Jetty
• Kerberos
• Solr plug-in
• Document-level security
• Sharepoint
• Windows Shares

Reference:
https://wiki.apache.org/solr/SolrSecurity
https://cwiki.apache.org/confluence/display/solr/Enabling+SSL
https://cwiki.apache.org/confluence/display/solr/Kerberos+Authentication+Plugin


Solr and Content Ingestion

Pro: Lots of choices!
Con: Lots of choices!

Basic: Request Handlers
- Structured File types
  - CSV
  - XML
  - JSON
- Binary File types
  - PDF
  - MS Office format

Advanced: External Repositories
- ManifoldCF
- Commercial products
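A minimal sketch of the "Basic" route, assuming a stock Solr setup: structured files go to the /update handler with a matching Content-Type, while binary files go to /update/extract (the extracting request handler, which must be enabled in the collection's configuration). The extension tables and the function name are illustrative choices, not part of Solr.

```python
import os

# Content types accepted by the /update handler for structured files.
STRUCTURED = {
    ".csv": "text/csv",
    ".xml": "application/xml",
    ".json": "application/json",
}

# Binary formats routed to the extracting handler (illustrative subset).
BINARY = {".pdf", ".doc", ".docx", ".xls", ".xlsx", ".ppt", ".pptx"}


def choose_update_handler(collection, filename):
    """Return (handler path, Content-Type) for posting a file to Solr."""
    ext = os.path.splitext(filename)[1].lower()
    if ext in STRUCTURED:
        return f"/solr/{collection}/update", STRUCTURED[ext]
    if ext in BINARY:
        return f"/solr/{collection}/update/extract", "application/octet-stream"
    raise ValueError(f"no request handler chosen for {filename!r}")


if __name__ == "__main__":
    for name in ("products.csv", "feed.json", "manual.pdf"):
        print(name, choose_update_handler("demo", name))
```

XML and JSON files must already be in Solr's update format for /update to index them as documents.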


And then along came: Fusion


Fusion and the search for World Domination

Solr is so powerful that it needs a front-end

- Administration
- Care and feeding
- Development: REST API w/security
- Allows for control of multiple external Solr Clouds
- Integrates with other OS projects
  - Spark
  - Banana (SiLK)
- First generation


Lucidworks Fusion Is Search-Driven Everything

Fusion is built on three core principles:

• Drive next generation relevance via Content, Collaboration and Context
• Harness best in class Open Source: Apache Solr + Spark
• Simplify application development and reduce ongoing maintenance


Fusion Processes

Developer Fusion Deployment

[Diagram: Fusion processes and their ports]

• zk (9983)
• ui: admin UI, authentication
• api (“backend”): aggregator, index pipelines, query pipelines, scheduler, collection management, recommender, system metrics, spark jobserver
• Ports 8764 and 8765: ui and api
• 8983: 1 replica SolrCloud, embedded Zookeeper (shared with other components)
• connectors (8984): data sources, index pipelines
• spark master (8766)
• spark worker (8769)


Fusion Architecture

[Diagram: Fusion architecture, SECURITY BUILT-IN]

• REST API
• Admin UI
• Core Services: Connectors, Alerting/Messaging, NLP, Pipelines, Blob Storage, Scheduling, Recommenders/
• Connectors: DATABASE, WEB, FILE, LOGS, HADOOP, CLOUD
• Apache Spark: Worker, Worker, Cluster Mgr.
• Apache Solr: Shards, Shards
• HDFS (Optional)
• Apache Zookeeper (ZK 1 to ZK N): Shared Config Mgmt, Leader Election, Load Balancing


Fusion Connectors

• Database: JDBC, CouchDB, MongoDB
• Filesystem: Box, Dropbox, FTP, GoogleDrive, HDFS, Local, S3, S3H, SolrXML, Windows Share
• Hadoop: Apache Hadoop, Cloudera, Hortonworks, Mapr, Pivotal
• Logging: Logstash
• Social Media: Jive, Slack, Twitter search, Twitter streaming
• Web Websphere
• Repository: Alfresco, Azure blob, Azure table, Drupal, GitHub, JIRA, Salesforce, SharePoint, ServiceNow, Solr, Subversion, Zendesk
• Push: Content to a port
• Script: roll your own


Fusion Pipelines and Stages

Pipelines: preprocess incoming information in a predictable way

• Index Pipelines/stages: Aggregating, Apache Camel, Apache Tika Parser, CSV Parser, Date Parsing, Exclusion Filter, Field Mapper, Find Replace, Fusion Pipeline, Gazetteer Lookup Extractor, HTML Transform, Indexing RPC, Javascript, JDBC, and others
• Query Pipelines/stages: Active Directory Security Trimming, Advanced Boosting, Aggregating, Block Documents, Boost Documents, Facet, Javascript, JDBC, Landing Pages, Logging Query Stage, Recommendation Boosting, Return QueryParams Query Stage, Rollup Aggregator Query Stage, Search Fields Query Stage, and others
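The idea of a pipeline as an ordered list of stages can be sketched in plain Python. This is only an analogy for the concept, not Fusion's implementation or API: the three stage factories borrow names from the stage list (Field Mapper, Find Replace, Exclusion Filter), and their signatures are invented for the example.

```python
def field_mapper(mapping):
    """Stage that renames fields according to `mapping` (old name -> new name)."""
    def stage(doc):
        return {mapping.get(key, key): value for key, value in doc.items()}
    return stage


def find_replace(field, old, new):
    """Stage that replaces `old` with `new` in one string field."""
    def stage(doc):
        updated = dict(doc)
        if isinstance(updated.get(field), str):
            updated[field] = updated[field].replace(old, new)
        return updated
    return stage


def exclusion_filter(should_drop):
    """Stage that drops a document (returns None) when `should_drop(doc)` is true."""
    def stage(doc):
        return None if should_drop(doc) else doc
    return stage


def run_pipeline(stages, doc):
    """Pass one document through every stage in order; None means dropped."""
    for stage in stages:
        doc = stage(doc)
        if doc is None:
            return None
    return doc


if __name__ == "__main__":
    index_pipeline = [
        field_mapper({"body": "text"}),
        find_replace("text", "colour", "color"),
        exclusion_filter(lambda d: not d.get("text")),
    ]
    print(run_pipeline(index_pipeline, {"id": "1", "body": "colour chart"}))
    print(run_pipeline(index_pipeline, {"id": "2", "body": ""}))
```

Because every document passes through the same stages in the same order, the result of indexing is predictable, which is the point the slide makes.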


Fusion Signals

Signals are captured user events that tell us something about what the user is doing
- page views
- page pings
- clicked links
- custom configured events

Can be used to equate user behavior:
- at different times of day
- in different geographic locations
- during different weather conditions

Reference: http://www.slideshare.net/lucidworks/events-processing-and-data-analysis-with-lucidworks-fusion-presented-by-kiran-chitturi-lucidworks


Fusion Signals

Signals - data flow

[Diagram: JSON payloads and Snowplow payloads go to the Signals Service, which stores them in Solr]

• Primary collection: test
• Raw signals collection: test_signals
• Aggregated signals collection: test_signals_aggr


Fusion Signals

2. Aggregations - data flow

[Diagram: an aggregation job runs through the Aggregator Spark Agent and a Spark Driver on Spark (Cluster Mgr. and Workers); it fetches raw signals for processing and stores aggregated results]

• Primary collection: test
• Raw signals collection: test_signals
• Aggregated signals collection: test_signals_aggr
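To make the aggregation step concrete, here is a small sketch that rolls raw click signals up into one aggregated record per (query, document) pair. It runs in plain Python rather than Spark, and the field names (type, query, doc_id, count) are assumptions for the example, not Fusion's signal schema.

```python
from collections import Counter


def aggregate_clicks(raw_signals):
    """Count click signals per (query, doc_id) pair.

    Returns one aggregated record per pair, sorted by query and then doc_id.
    """
    counts = Counter(
        (signal["query"], signal["doc_id"])
        for signal in raw_signals
        if signal.get("type") == "click"
    )
    return [
        {"query": query, "doc_id": doc_id, "count": count}
        for (query, doc_id), count in sorted(counts.items())
    ]


if __name__ == "__main__":
    raw = [
        {"type": "click", "query": "ipad", "doc_id": "d1"},
        {"type": "click", "query": "ipad", "doc_id": "d1"},
        {"type": "page_view", "query": "ipad", "doc_id": "d2"},
        {"type": "click", "query": "tv", "doc_id": "d3"},
    ]
    for record in aggregate_clicks(raw):
        print(record)
```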


Fusion Signals

3. Boosting search results using aggregated documents

[Diagram: the User App sends a search query through the query-pipeline stages (Set Params, Query Solr, Recommendation Stages) and receives the search response]

1. Query aggregated documents
2. Process results
3. Add parameters to the request

• Primary collection: test
• Raw signals collection: test_signals
• Aggregated signals collection: test_signals_aggr
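The three steps above (query aggregated documents, process results, add parameters to the request) can be sketched as a single function. This is an assumption-laden illustration: the aggregated records use hypothetical field names (query, doc_id, count), and the boost is expressed with Solr's bq parameter, which may differ from what Fusion's recommendation stages actually emit.

```python
def add_boosts(params, aggregated_docs, query, top_n=3):
    """Return a copy of `params` with boost queries for the most-clicked docs."""
    # Step 1: "query" the aggregated documents for this search query.
    matches = [agg for agg in aggregated_docs if agg["query"] == query]
    # Step 2: process the results, keeping the strongest signals first.
    matches.sort(key=lambda agg: agg["count"], reverse=True)
    # Step 3: add parameters to the request without mutating the original.
    boosted = dict(params)
    boosted["bq"] = [
        'id:"{}"^{}'.format(agg["doc_id"], agg["count"]) for agg in matches[:top_n]
    ]
    return boosted


if __name__ == "__main__":
    aggregated = [
        {"query": "ipad", "doc_id": "d1", "count": 2},
        {"query": "ipad", "doc_id": "d2", "count": 5},
        {"query": "tv", "doc_id": "d3", "count": 9},
    ]
    request = {"q": "ipad", "defType": "edismax"}
    print(add_boosts(request, aggregated, "ipad"))
```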


Fusion Visualization


The Day After Tomorrow…


Major Themes

Apache Solr (6.0)

• Ease of Use
  - Modern, consistent, “introspectable” APIs
• Scalability
  - Cross Data Center Replication
  - Performance improvements
• Analytics and Relevance
  - SQL
  - Graphs

Fusion

• Ease of Use
  - Point and click Time Series indexing
  - Relevance and Taxonomy Mgmt tools
  - Indexing Previews
• Analytics and Relevance
  - Query intent and related machine learning
  - ZoomData integration


Simplicity on Top, Power Under the Hood

• Improved Spark-Solr data locality integration
• 10x performance improvement!
• Lucene analyzers for Spark data processing
• Easily and simply build and deploy Spark-based Machine Learning with minimal coding
• Leverage best in class libraries like MLLib, Mahout and DL4J


“Too much of a good thing can be wonderful.” - Mae West

• Standalone Reference Search UI showcasing Fusion best practices (April/May)
  - Signals, pipelines, auto-suggest, faceting, search, did you mean
  - Built on AngularJS
• Performance improvements in pipelines (Fusion 2.3)
  - 30-50% overall increase for all pipelines
  - 3x improvement for pipelines using Javascript stages
• Improved Devops support (plugins, distributed coordination; 2.3 and 2.4)
  - Monitoring, Server Management, Deployment


You Have Questions…I might have a few answers


2016

OCTOBER 11-14
BOSTON, MA

CALL FOR PAPERS OPEN THROUGH APRIL 30!

lucenerevolution.org


2016

OCTOBER 11-14
BOSTON, MA

Meetup Discount: 20% off Super Early Bird registration through April 30

- OR -

www.lucenesolr-revolution-2016.eventbrite.com/

Code: NESTMeetup0316


Time Series Done Right (Fusion 2.4)

• Greatly simplify the care and feeding of time-based indexes
• Point and click (or single API call) creation of time series shards
• Total control over number of shards and replication
• Easily defined retention and archiving policies (e.g. 30 day retention)
• Intelligent query parsing optimizes shard access
• Ideal for log data and signals
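The two ideas behind time-based indexes, touching only the partitions a query's date range needs and retiring partitions older than the retention window, can be sketched as follows. The daily granularity, the "prefix_YYYYMMDD" naming scheme and the function names are assumptions for illustration; the slides do not describe how Fusion 2.4 names or selects its time series shards.

```python
from datetime import date, timedelta


def partition_name(prefix, day):
    """Name of the (hypothetical) daily partition holding data for `day`."""
    return f"{prefix}_{day:%Y%m%d}"


def partitions_for_range(prefix, start, end):
    """Partitions a query over [start, end] has to touch, oldest first."""
    days = (end - start).days
    return [partition_name(prefix, start + timedelta(days=i)) for i in range(days + 1)]


def expired_partitions(prefix, existing_days, today, retention_days=30):
    """Partitions older than the retention window, oldest first."""
    cutoff = today - timedelta(days=retention_days)
    return [partition_name(prefix, day) for day in sorted(existing_days) if day < cutoff]


if __name__ == "__main__":
    print(partitions_for_range("logs", date(2016, 3, 30), date(2016, 4, 1)))
    print(expired_partitions("logs", [date(2016, 2, 1), date(2016, 3, 1)], today=date(2016, 3, 15)))
```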


Experiments >> Rules

Fusion 2.4 will support:

Experiment management framework for large scale, multi-variate testing

Bandit algorithms for high volume experimentation

Capture and reporting of search (and other) metrics all from within Fusion
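For readers unfamiliar with bandit algorithms, here is a minimal epsilon-greedy sketch, one common member of that family. It is not taken from Fusion, and the slides do not say which bandit algorithms Fusion 2.4 uses. The arm names and the reward definition (1.0 for a click, 0.0 otherwise) are assumptions for the example.

```python
import random


class EpsilonGreedy:
    """Epsilon-greedy bandit: mostly exploit the best arm, sometimes explore."""

    def __init__(self, arms, epsilon=0.1, rng=None):
        self.epsilon = epsilon
        self.rng = rng or random.Random()
        self.pulls = {arm: 0 for arm in arms}
        self.rewards = {arm: 0.0 for arm in arms}

    def mean(self, arm):
        """Average reward observed so far for `arm` (0.0 if never tried)."""
        pulls = self.pulls[arm]
        return self.rewards[arm] / pulls if pulls else 0.0

    def choose(self):
        """Pick a random arm with probability epsilon, else the best so far."""
        if self.rng.random() < self.epsilon:
            return self.rng.choice(list(self.pulls))
        return max(self.pulls, key=self.mean)

    def record(self, arm, reward):
        """Store the outcome of showing `arm` (e.g. 1.0 for a click)."""
        self.pulls[arm] += 1
        self.rewards[arm] += reward


if __name__ == "__main__":
    # Hypothetical experiment: two ranking variants with different click rates.
    click_rate = {"ranking_a": 0.10, "ranking_b": 0.15}
    rng = random.Random(42)
    bandit = EpsilonGreedy(click_rate, epsilon=0.1, rng=rng)
    for _ in range(5000):
        arm = bandit.choose()
        bandit.record(arm, 1.0 if rng.random() < click_rate[arm] else 0.0)
    print(bandit.pulls)
```

Unlike a fixed-split test, traffic shifts toward the better-performing variant while the experiment is still running, which is what makes this style of experimentation practical at high volume.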