Leveraging the Power of SOLR with SPARK
Johannes Weigend, QAware GmbH, Germany
Apache Big Data Europe
September 2015
Apache Big Data Europe | Leveraging the power of SOLR with SPARK | Johannes Weigend | QAware GmbH Germany
Welcome
• Johannes Weigend
  - CTO, QAware GmbH
  - Software architect / developer
  - 25 years of experience
  - Custom enterprise solutions (Java, JS, …)
  - Lecturer for UI development at the University of Applied Sciences Rosenheim
  - Focus on performance and scalability
  - SOLR user since 2011
Brute Force Data Analysis
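Dataflow on non-indexed data: every node reads, filters and maps its entire share of the data in parallel (Read → Filter → Map), and a final Reduce step combines the partial results. A foreach() over all records takes minutes to hours.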
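The brute-force dataflow can be sketched in plain Python. The log format and the three partitions are hypothetical; the point is that every record has to be read before the filter can discard it.

```python
from functools import reduce

# Hypothetical log lines ("METHOD PATH STATUS MILLIS"), split into three
# partitions as if each one lived on a different node.
partitions = [
    ["GET /a 200 12", "GET /b 500 40"],
    ["GET /a 200 8", "POST /c 200 95"],
    ["GET /b 500 33", "GET /a 404 5"],
]

def scan_partition(lines):
    """Read every line, keep server errors, map each one to its duration."""
    durations = []
    for line in lines:                      # Read: every record is touched
        _method, _path, status, millis = line.split()
        if status == "500":                 # Filter: only possible after reading
            durations.append(int(millis))   # Map: keep just the field we need
    return durations

# Each partition is scanned independently (in Spark this would run in
# parallel on the workers); one Reduce step combines the partial results.
partials = [scan_partition(p) for p in partitions]
total_error_ms = reduce(lambda acc, xs: acc + sum(xs), partials, 0)
records_read = sum(len(p) for p in partitions)
```

All six records are read to find the two that match.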
Search-Based Data Analysis
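Dataflow on indexed data: every node runs a search (the filter is executed by the index) and maps only the matching records (Search → Map), and a final Reduce step combines the partial results. The data has to be indexed first (there's no free lunch), but a foreach() over the search results takes seconds to minutes.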
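The same query on indexed data can be sketched with a toy inverted index per partition. This is a simplification for illustration (a dictionary from status code to line positions), not how Lucene stores its index; it shows the trade-off of paying once at index time so that query time touches only matching records.

```python
from collections import defaultdict

# Same hypothetical partitions as in the brute-force case.
partitions = [
    ["GET /a 200 12", "GET /b 500 40"],
    ["GET /a 200 8", "POST /c 200 95"],
    ["GET /b 500 33", "GET /a 404 5"],
]

def build_index(lines):
    """Index time (paid once): map each status code to matching line positions."""
    index = defaultdict(list)
    for pos, line in enumerate(lines):
        index[line.split()[2]].append(pos)
    return index

indexes = [build_index(p) for p in partitions]

def search_partition(lines, index, status):
    """Query time: visit only the lines the index points to, then map to durations."""
    return [int(lines[pos].split()[3]) for pos in index.get(status, [])]

partials = [search_partition(p, idx, "500") for p, idx in zip(partitions, indexes)]
total_error_ms = sum(sum(xs) for xs in partials)
records_visited = sum(len(idx.get("500", [])) for idx in indexes)
```

The result is the same, but only the two matching records are visited at query time.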
Agenda
1. SOLR Cloud (demo)
2. SPARK cluster (demo)
3. Importing data into SOLR with SPARK (demo)
4. Analysis with SOLR and SPARK (demo)
Apache SOLR
• Horizontally scalable, distributed NoSQL (index) database
• Document-oriented
• A document is a collection of fields (string, number, date, …)
• Single-valued and multi-valued fields (similar to arrays)
• Schema-based and schemaless operation
• Powerful query language (Lucene)
• Data distributed across shards
• Replication
• Powerful full-text search capabilities
• Aggregation functions (aka facets)
• Stable (version 5.3)
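To make "a document is a collection of fields" concrete, here is a sketch of two documents serialized in SOLR's JSON update format, where a multi-valued field is simply a JSON array. The field names and the collection are hypothetical; whether SOLR accepts these fields depends on the schema (or on schemaless mode).

```python
import json

# Two hypothetical log documents. "tags" is a multi-valued field, expressed
# as a JSON array; "timestamp" uses the ISO-8601 UTC format SOLR expects.
docs = [
    {"id": "log-1", "host": "node1", "status": 500, "duration_ms": 40,
     "timestamp": "2015-09-28T10:15:00Z", "tags": ["checkout", "error"]},
    {"id": "log-2", "host": "node2", "status": 200, "duration_ms": 12,
     "timestamp": "2015-09-28T10:15:02Z", "tags": ["search"]},
]

# A JSON array of documents can be POSTed to /solr/<collection>/update
payload = json.dumps(docs)
```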
SOLR@QAware
• AIR: Aftersales Information Research
• ZEBRA: Part explosion for complex products
• EKG: Software electrocardiogram
• QAsearch: Enterprise search across all repositories, including history
Apache SOLR for Big Data Analysis?
• Text search engine?
• Aggregations?
• Slice and dice?
• Pivots?
Demo: SOLR Cloud
• Installing and configuring SOLR Cloud
• Searching, sorting and filtering
• Facets
  - Terms (count by term)
  - Ranges (count in range)
  - Functions (avg, sum, …)
  - Sub-facets (pivot)
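All four facet types can be expressed in one request with SOLR's JSON Facet API (the `json.facet` parameter, available in SOLR 5.x). The sketch below only builds the query string; the field names `host`, `status` and `duration_ms` and the collection are assumptions for illustration.

```python
import json
from urllib.parse import urlencode

facets = {
    # Terms facet: count documents per distinct host value
    "by_host": {"type": "terms", "field": "host", "limit": 10,
                # Sub-facet: status counts nested inside each host bucket (pivot)
                "facet": {"by_status": {"type": "terms", "field": "status"}}},
    # Range facet: count documents per 100 ms duration bucket
    "by_duration": {"type": "range", "field": "duration_ms",
                    "start": 0, "end": 500, "gap": 100},
    # Function facets: statistics over all matching documents
    "avg_duration": "avg(duration_ms)",
    "total_duration": "sum(duration_ms)",
}

# rows=0: we only want the facet results, not the documents themselves
params = {"q": "*:*", "rows": 0, "json.facet": json.dumps(facets)}
query_string = urlencode(params)   # append to /solr/<collection>/select?
```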
Counting as Term Facet
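Conceptually, a terms facet counts how often each value of a field occurs in the matching documents and returns the values sorted by count. A local equivalent with hypothetical data, producing buckets shaped like the JSON Facet API's `{"val": …, "count": …}` entries:

```python
from collections import Counter

# Hypothetical "host" values of the documents matching a query.
hosts = ["node1", "node2", "node1", "node3", "node1", "node2"]

# One bucket per distinct value, most frequent first.
buckets = [{"val": v, "count": c} for v, c in Counter(hosts).most_common()]
```

In SOLR the counting happens on each shard and the per-shard counts are merged, so the client never sees the individual documents.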
Statistics as Function Facet
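Function facets such as `avg(duration_ms)` or `sum(duration_ms)` reduce a numeric field over all matching documents to a single value. The values below are hypothetical; the sketch shows the local equivalent of the statistics SOLR computes server-side.

```python
# Hypothetical duration_ms values of the matching documents.
durations = [40, 12, 8, 95, 33, 5]

stats = {
    "count": len(durations),
    "sum": sum(durations),
    "avg": sum(durations) / len(durations),
    "min": min(durations),
    "max": max(durations),
}
```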
Pivots as Sub Facets
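A pivot is a facet nested inside the buckets of another facet: first count per host, then, inside each host bucket, count per status. A local sketch with hypothetical `(host, status)` pairs:

```python
from collections import Counter, defaultdict

# Hypothetical (host, status) pairs taken from the matching documents.
records = [("node1", 200), ("node1", 500), ("node2", 200),
           ("node1", 200), ("node2", 404)]

# Outer facet over host; inner sub-facet over status within each host.
pivot = defaultdict(Counter)
for host, status in records:
    pivot[host][status] += 1

result = [
    {"val": host, "count": sum(statuses.values()),
     "by_status": [{"val": s, "count": c} for s, c in statuses.most_common()]}
    for host, statuses in sorted(pivot.items(), key=lambda kv: -sum(kv[1].values()))
]
```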
careerbuilder.com
Banana
What’s Missing?
• Client-side processing of SOLR results does not scale
• No built-in Map/Reduce support
• Where to store really big data?
  - Images
  - Videos
  - Binaries / large text documents
• No interfaces to R / ML
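Why client-side processing does not scale: even with SOLR's cursor-based deep paging (`cursorMark`), every result document has to flow through one client process. The sketch below shows that loop. `fetch_page` is a hypothetical stand-in for an HTTP call to `/select`, `id` is assumed to be the uniqueKey, and `fake_fetch` is an in-memory fake used only to make the example runnable.

```python
from typing import Callable, Dict, Iterator

def iterate_all(fetch_page: Callable[[Dict[str, str]], Dict],
                rows: int = 2) -> Iterator[Dict]:
    """Stream every matching document through a single client with cursorMark."""
    cursor = "*"                                  # "*" starts a new cursor
    while True:
        params = {"q": "*:*", "rows": str(rows),
                  "sort": "id asc",               # sort must include the uniqueKey
                  "cursorMark": cursor}
        response = fetch_page(params)
        yield from response["response"]["docs"]
        next_cursor = response["nextCursorMark"]
        if next_cursor == cursor:                 # unchanged cursor: no more results
            return
        cursor = next_cursor

def fake_fetch(params: Dict[str, str]) -> Dict:
    """In-memory fake with five documents; real cursor marks are opaque strings."""
    docs = [{"id": f"doc{i}"} for i in range(5)]
    start = 0 if params["cursorMark"] == "*" else int(params["cursorMark"])
    page = docs[start:start + int(params["rows"])]
    next_mark = str(start + len(page)) if page else params["cursorMark"]
    return {"response": {"docs": page}, "nextCursorMark": next_mark}

ids = [doc["id"] for doc in iterate_all(fake_fetch)]
```

However fast each page is, total throughput is bounded by this single loop, which is the gap a distributed engine like Spark fills.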
Apache SPARK
• Distributed job execution engine
• Map/Reduce framework
• Scala-based (runs on the JVM)
• Java/Scala/Python APIs
• Processes data from various data sources
  - Text files (accessible from all nodes)
  - Hadoop Distributed File System (HDFS)
  - Databases (JDBC)
  - SOLR!
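Must read: Zaharia et al., "Resilient Distributed Datasets: A Fault-Tolerant Abstraction for In-Memory Cluster Computing", NSDI 2012: https://www.cs.berkeley.edu/~matei/papers/2012/nsdi_spark.pdf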
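The Map/Reduce model behind Spark's key-based aggregations can be sketched in plain Python: each partition maps its records to keys and pre-aggregates locally, and the partial results are then merged. This mirrors the semantics of Spark's `reduceByKey`, which combines values per key inside each partition before the shuffle; the partitions and log format here are hypothetical.

```python
from collections import Counter
from functools import reduce

# Hypothetical partitions of log lines, one list per worker.
partitions = [
    ["GET /a", "GET /b", "GET /a"],
    ["POST /c", "GET /a"],
]

def map_and_combine(lines):
    """Map each line to (path, 1) and pre-aggregate within the partition."""
    local = Counter()
    for line in lines:
        _method, path = line.split()
        local[path] += 1
    return local

# Shuffle + reduce: merge the per-partition counts into the final result.
partials = [map_and_combine(p) for p in partitions]
totals = reduce(lambda a, b: a + b, partials, Counter())
```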
Combining Spark with SOLR
• Use cases
  - Distributed ETL: importing data into SOLR Cloud
    - Our use case: importing N logfiles into SOLR
  - Distributed processing: data analysis
    - Statistics on binary data
    - Map/Reduce
Four Ways to Import Data into SOLR
1. Using built-in functions: the post script, the Data Import Handler, the Admin UI
2. Writing custom parallel code using the SolrJ API
3. Using and customizing Apache Nutch (Hadoop!)
4. Using and customizing Apache Spark
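Option 2 boils down to splitting the documents into batches and sending the batches concurrently. In Java this would typically be done with SolrJ (for example its `CloudSolrClient`); the Python sketch below only shows the batching and parallelism. `send_batch` is a hypothetical function that would POST one batch to `/solr/<collection>/update` and return the number of documents sent.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Iterator, List

def batches(docs: List[Dict], size: int) -> Iterator[List[Dict]]:
    """Split the documents into consecutive batches of at most `size`."""
    for i in range(0, len(docs), size):
        yield docs[i:i + size]

def parallel_import(docs: List[Dict],
                    send_batch: Callable[[List[Dict]], int],
                    size: int = 100, workers: int = 4) -> int:
    """Send batches concurrently and return the total number of documents sent."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(send_batch, batches(docs, size)))

# Usage with a dummy sender that just reports the batch size.
docs = [{"id": str(i)} for i in range(250)]
sent = parallel_import(docs, lambda batch: len(batch), size=100)
```

Batching amortizes the per-request overhead; the thread pool keeps several requests in flight. Both knobs, along with error handling and retries, are left to the caller in this sketch.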
Demo: Import Logfiles with Spark
• Writing a Spark job that imports all logfiles in one directory
• Using Lucidworks' spark-solr library
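The heart of such an import job is the map step that turns one log line into one SOLR document; Spark applies it to every line in parallel before the documents are indexed. The log format, the field names and the `to_solr_doc` helper below are hypothetical, and the spark-solr indexing API itself is not reproduced here.

```python
import re
from typing import Dict, Optional

# Hypothetical format: "2015-09-28 10:15:00,123 INFO [node1] checkout took 40 ms"
LINE = re.compile(
    r"(?P<date>\d{4}-\d{2}-\d{2}) (?P<time>\d{2}:\d{2}:\d{2}),(?P<ms>\d{3}) "
    r"(?P<level>[A-Z]+) \[(?P<host>[^\]]+)\] (?P<message>.*)"
)

def to_solr_doc(file_name: str, line_no: int, line: str) -> Optional[Dict]:
    """Turn one log line into a SOLR document; return None if it does not parse."""
    m = LINE.match(line)
    if m is None:
        return None
    return {
        "id": f"{file_name}:{line_no}",  # unique key derived from the line's origin
        # SOLR date format; the log timestamps are assumed to be UTC
        "timestamp": f"{m['date']}T{m['time']}.{m['ms']}Z",
        "level": m["level"],
        "host": m["host"],
        "message": m["message"],
    }
```

Deriving the `id` from file name and line number makes the import idempotent: re-running the job overwrites documents instead of duplicating them.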
Demo: Distributed Analysis with Spark
• Write a Spark job that calculates the duration of business actions
• Use Spark to access SOLR via SQL / JDBC
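Calculating the duration of business actions is a group-by-key problem: collect the start and end events of each action, then subtract. A local sketch with hypothetical `(action_id, event_type, timestamp)` events; in a Spark job the grouping would be the shuffle and the subtraction the reduce step.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical events read from SOLR: (action_id, event_type, ISO timestamp)
events = [
    ("order-1", "start", "2015-09-28T10:15:00"),
    ("order-2", "start", "2015-09-28T10:15:01"),
    ("order-1", "end",   "2015-09-28T10:15:03"),
    ("order-2", "end",   "2015-09-28T10:15:08"),
    ("order-3", "start", "2015-09-28T10:15:09"),
]

def durations(evts):
    """Group events by action id, then compute end minus start in seconds."""
    by_action = defaultdict(dict)
    for action_id, kind, ts in evts:
        by_action[action_id][kind] = datetime.fromisoformat(ts)
    return {
        action_id: (times["end"] - times["start"]).total_seconds()
        for action_id, times in by_action.items()
        if "start" in times and "end" in times   # skip incomplete actions
    }
```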
SolrRDD: the Spark abstraction for processing SOLR results
https://github.com/LucidWorks/spark-solr
SPARK Supports Parallel SQL
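The kind of aggregate query one might run with Spark SQL over log documents is illustrated below. As a single-process stand-in, the sketch uses Python's built-in `sqlite3` with an in-memory table instead of Spark SQL over SOLR; table and column names are hypothetical. Spark would execute the same `GROUP BY` in parallel across partitions.

```python
import sqlite3

# Stand-in: an in-memory sqlite table instead of a SOLR-backed Spark table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE logs (host TEXT, status INTEGER, duration_ms INTEGER)")
conn.executemany("INSERT INTO logs VALUES (?, ?, ?)", [
    ("node1", 200, 12), ("node1", 500, 40), ("node1", 500, 60),
    ("node2", 200, 8), ("node2", 500, 33),
])

# Errors per host and their average duration.
rows = conn.execute(
    "SELECT host, COUNT(*) AS errors, AVG(duration_ms) AS avg_ms "
    "FROM logs WHERE status = 500 GROUP BY host ORDER BY host"
).fetchall()
conn.close()
```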
DataFrame API
Cluster setup: five Odroid XU4 boards, each with 2 GB RAM, a 64 GB eMMC disk and Ubuntu Linux ($70 per board)
• Board 0: SPARK master, SPARK worker, SOLR 5.3 shard #0, ZooKeeper, NFS
• Boards 1 to 4: SPARK worker and SOLR 5.3, one shard each (#1 to #4)
• Total: 40 cores, 10 GB RAM, 320 GB eMMC disk
Summary
Any questions?