SFBay Area Solr Meetup - June 18th: Box + Solr = Content Search for Business
DESCRIPTION
"Box + Solr = Content Search for Business" - Wei Zhao, Box
TRANSCRIPT
![Page 1: SFBay Area Solr Meetup - June 18th: Box + Solr = Content Search for Business](https://reader036.vdocuments.mx/reader036/viewer/2022062703/55509845b4c90590208b46f4/html5/thumbnails/1.jpg)
1
June 2014
Box + Solr = Content Search for Business
![Page 3: SFBay Area Solr Meetup - June 18th: Box + Solr = Content Search for Business](https://reader036.vdocuments.mx/reader036/viewer/2022062703/55509845b4c90590208b46f4/html5/thumbnails/3.jpg)
3
Box mission
To make organizations more productive, competitive and collaborative by connecting people and their most important information
![Page 4: SFBay Area Solr Meetup - June 18th: Box + Solr = Content Search for Business](https://reader036.vdocuments.mx/reader036/viewer/2022062703/55509845b4c90590208b46f4/html5/thumbnails/4.jpg)
4
25MM+ Users
225K+ Businesses
99% Fortune 500
![Page 5: SFBay Area Solr Meetup - June 18th: Box + Solr = Content Search for Business](https://reader036.vdocuments.mx/reader036/viewer/2022062703/55509845b4c90590208b46f4/html5/thumbnails/5.jpg)
5
Box search mission is to make user content easy to discover.
![Page 6: SFBay Area Solr Meetup - June 18th: Box + Solr = Content Search for Business](https://reader036.vdocuments.mx/reader036/viewer/2022062703/55509845b4c90590208b46f4/html5/thumbnails/6.jpg)
6
Box uses Solr for search
10 Billion+ Documents
10 TB+ Index size
100M+ Daily requests
![Page 7: SFBay Area Solr Meetup - June 18th: Box + Solr = Content Search for Business](https://reader036.vdocuments.mx/reader036/viewer/2022062703/55509845b4c90590208b46f4/html5/thumbnails/7.jpg)
7
Quick Search
![Page 8: SFBay Area Solr Meetup - June 18th: Box + Solr = Content Search for Business](https://reader036.vdocuments.mx/reader036/viewer/2022062703/55509845b4c90590208b46f4/html5/thumbnails/8.jpg)
8
Quick Search
![Page 9: SFBay Area Solr Meetup - June 18th: Box + Solr = Content Search for Business](https://reader036.vdocuments.mx/reader036/viewer/2022062703/55509845b4c90590208b46f4/html5/thumbnails/9.jpg)
9
Full Search
![Page 10: SFBay Area Solr Meetup - June 18th: Box + Solr = Content Search for Business](https://reader036.vdocuments.mx/reader036/viewer/2022062703/55509845b4c90590208b46f4/html5/thumbnails/10.jpg)
10
Agenda
1 Sharding – splitting the index
2 Highly available search
3 A few more things
4 Currently working on
5 Q&A
![Page 11: SFBay Area Solr Meetup - June 18th: Box + Solr = Content Search for Business](https://reader036.vdocuments.mx/reader036/viewer/2022062703/55509845b4c90590208b46f4/html5/thumbnails/11.jpg)
11
We shard things
![Page 12: SFBay Area Solr Meetup - June 18th: Box + Solr = Content Search for Business](https://reader036.vdocuments.mx/reader036/viewer/2022062703/55509845b4c90590208b46f4/html5/thumbnails/12.jpg)
12
Shard ID = File ID % Total Shards
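A minimal sketch of the modulo routing on this slide. The function name `shard_for` and the shard count of 8 are illustrative assumptions; the slide gives only the formula.

```python
def shard_for(file_id: int, total_shards: int) -> int:
    """Return the shard index for a file using modulo routing."""
    if total_shards <= 0:
        raise ValueError("total_shards must be positive")
    return file_id % total_shards


# File ID 12345 is the example ID used later in the deck; 8 shards is assumed.
print(shard_for(12345, 8))  # 1
```

With plain modulo routing, changing the total shard count changes the shard of most files, so resizing the cluster generally means moving or reindexing documents.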
![Page 13: SFBay Area Solr Meetup - June 18th: Box + Solr = Content Search for Business](https://reader036.vdocuments.mx/reader036/viewer/2022062703/55509845b4c90590208b46f4/html5/thumbnails/13.jpg)
13
Multi-tenant – One big logical index for all users
Solr index
Shard1 Shard2 Shard3 ShardN
![Page 14: SFBay Area Solr Meetup - June 18th: Box + Solr = Content Search for Business](https://reader036.vdocuments.mx/reader036/viewer/2022062703/55509845b4c90590208b46f4/html5/thumbnails/14.jpg)
14
Search scope
![Page 15: SFBay Area Solr Meetup - June 18th: Box + Solr = Content Search for Business](https://reader036.vdocuments.mx/reader036/viewer/2022062703/55509845b4c90590208b46f4/html5/thumbnails/15.jpg)
15
A typical Solr Document
File ID: 12345
Owner ID: user1
Parent Folder IDs: folder1, folder2
File Name: Solr.ppt
File Content: blah......
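As a rough illustration, the document above could be represented as follows before being sent to Solr. The field names (`file_id`, `owner_id`, `parent_folder_ids`, and so on) are hypothetical; the slide shows only the human-readable labels.

```python
import json

# Hypothetical field names; values are taken from the slide.
doc = {
    "file_id": 12345,
    "owner_id": "user1",
    "parent_folder_ids": ["folder1", "folder2"],  # multi-valued field
    "file_name": "Solr.ppt",
    "file_content": "blah......",
}

# Solr's JSON update handler accepts a JSON array of documents.
payload = json.dumps([doc])
print(payload)
```

Keeping the parent folder IDs on the document itself is what allows a search to be restricted to particular folders at query time.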
![Page 16: SFBay Area Solr Meetup - June 18th: Box + Solr = Content Search for Business](https://reader036.vdocuments.mx/reader036/viewer/2022062703/55509845b4c90590208b46f4/html5/thumbnails/16.jpg)
16
Owner: User1, Parent: Folder1
Owner: User2, Parent: Folder3
Owner: User2, Parent: Folder2
Owner: User1, Parent: Folder1, Folder4
File 1 File 2
File 3 File 4
![Page 17: SFBay Area Solr Meetup - June 18th: Box + Solr = Content Search for Business](https://reader036.vdocuments.mx/reader036/viewer/2022062703/55509845b4c90590208b46f4/html5/thumbnails/17.jpg)
17
User1 with no shared folder
Owner: User1, Parent: Folder1
Owner: User2, Parent: Folder3
Owner: User2, Parent: Folder2
Owner: User1, Parent: Folder1, Folder4
Filter: User1
File 1 File 2
File 3 File 4
![Page 18: SFBay Area Solr Meetup - June 18th: Box + Solr = Content Search for Business](https://reader036.vdocuments.mx/reader036/viewer/2022062703/55509845b4c90590208b46f4/html5/thumbnails/18.jpg)
18
User2 shares Folder2 with User1
Owner: User1, Parent: Folder1
Owner: User2, Parent: Folder3
Owner: User2, Parent: Folder2
Owner: User1, Parent: Folder1, Folder4
File 1 File 2
File 3 File 4
![Page 19: SFBay Area Solr Meetup - June 18th: Box + Solr = Content Search for Business](https://reader036.vdocuments.mx/reader036/viewer/2022062703/55509845b4c90590208b46f4/html5/thumbnails/19.jpg)
19
User2 shares Folder2 with User1
Owner: User1, Parent: Folder1
Owner: User2, Parent: Folder3
Owner: User2, Parent: Folder2
Owner: User1, Parent: Folder1, Folder4
Filter: User1 + Folder2
File 1 File 2
File 3 File 4
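The "Filter: User1 + Folder2" scope can be sketched as a Solr filter query string, reading the "+" as a union (files owned by the user, or located in a folder shared with the user). The field names and the helper `build_scope_filter` are assumptions, not Box's actual schema or code.

```python
def build_scope_filter(user_id, shared_folder_ids):
    """Build a filter string: owned by the user OR inside a shared folder."""
    clauses = [f'owner_id:"{user_id}"']
    clauses += [f'parent_folder_ids:"{fid}"' for fid in shared_folder_ids]
    return " OR ".join(clauses)


# No shared folders: the scope is just the user's own files.
print(build_scope_filter("User1", []))
# Folder2 shared with User1: the scope widens to include that folder.
print(build_scope_filter("User1", ["Folder2"]))
```

A string like this would typically be passed as the `fq` parameter alongside the user's query, so the permission scope is applied independently of the search terms.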
![Page 20: SFBay Area Solr Meetup - June 18th: Box + Solr = Content Search for Business](https://reader036.vdocuments.mx/reader036/viewer/2022062703/55509845b4c90590208b46f4/html5/thumbnails/20.jpg)
20
User2 shares Folder2 with User1
Owner: User1, Parent: Folder1
Owner: User2, Parent: Folder3
Owner: User2, Parent: Folder5
Owner: User1, Parent: Folder1, Folder4
File 1 File 2
File 3 File 4
Moved out of Folder2
![Page 21: SFBay Area Solr Meetup - June 18th: Box + Solr = Content Search for Business](https://reader036.vdocuments.mx/reader036/viewer/2022062703/55509845b4c90590208b46f4/html5/thumbnails/21.jpg)
21
User2 shares Folder2 with User1
Owner: User1, Parent: Folder1
Owner: User2, Parent: Folder3
Owner: User2, Parent: Folder5
Owner: User1, Parent: Folder1, Folder4
Filter: User1 + Folder2
File 1 File 2
File 3 File 4
Moved out of Folder2
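A small in-memory sketch of the last few slides: the same filter (User1 + Folder2) is evaluated before and after one of User2's files moves from Folder2 to Folder5. The document names `a` to `d` are placeholders, since the transcript does not make clear which owner/parent pair belongs to which file.

```python
from dataclasses import dataclass, field


@dataclass
class Doc:
    name: str
    owner: str
    parents: set = field(default_factory=set)


def visible(doc, user, shared_folders):
    """True if the user owns the doc or it sits in a folder shared with them."""
    return doc.owner == user or bool(doc.parents & set(shared_folders))


docs = [
    Doc("a", "User1", {"Folder1"}),
    Doc("b", "User2", {"Folder3"}),
    Doc("c", "User2", {"Folder2"}),
    Doc("d", "User1", {"Folder1", "Folder4"}),
]

before = [d.name for d in docs if visible(d, "User1", ["Folder2"])]
docs[2].parents = {"Folder5"}  # the file is moved out of Folder2
after = [d.name for d in docs if visible(d, "User1", ["Folder2"])]

print(before)  # ['a', 'c', 'd']
print(after)   # ['a', 'd']
```

The filter itself does not change; the moved file drops out of User1's results once its parent folder field has been updated in the index.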
![Page 22: SFBay Area Solr Meetup - June 18th: Box + Solr = Content Search for Business](https://reader036.vdocuments.mx/reader036/viewer/2022062703/55509845b4c90590208b46f4/html5/thumbnails/22.jpg)
22
Highly Available Search
![Page 23: SFBay Area Solr Meetup - June 18th: Box + Solr = Content Search for Business](https://reader036.vdocuments.mx/reader036/viewer/2022062703/55509845b4c90590208b46f4/html5/thumbnails/23.jpg)
23
• Index is highly available
• Search functionality is highly available
![Page 24: SFBay Area Solr Meetup - June 18th: Box + Solr = Content Search for Business](https://reader036.vdocuments.mx/reader036/viewer/2022062703/55509845b4c90590208b46f4/html5/thumbnails/24.jpg)
24
Index workflow
![Page 25: SFBay Area Solr Meetup - June 18th: Box + Solr = Content Search for Business](https://reader036.vdocuments.mx/reader036/viewer/2022062703/55509845b4c90590208b46f4/html5/thumbnails/25.jpg)
25
Box Front End
Upload
Index Queue
Queue 1
Queue 2
Queue 3
Indexer 1
Indexer 2
Indexer 3
MySQL
Index1
Index2
Index2
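One possible reading of this diagram is that each upload event is fanned out to a separate queue per index copy, with a dedicated indexer draining each queue, so an unavailable copy simply accumulates a backlog and catches up later. The sketch below illustrates that reading only; the class and function names are hypothetical, and the role of MySQL in the diagram is not modeled.

```python
from collections import deque


class IndexCopy:
    """One index copy with its own queue; `docs` stands in for a Solr index."""

    def __init__(self, name):
        self.name = name
        self.queue = deque()
        self.docs = {}
        self.available = True

    def drain(self):
        """Indexer step: apply queued updates while the copy is available."""
        while self.available and self.queue:
            doc = self.queue.popleft()
            self.docs[doc["file_id"]] = doc


def enqueue(copies, doc):
    """Fan one upload event out to every copy's queue."""
    for copy in copies:
        copy.queue.append(doc)


copies = [IndexCopy("index1"), IndexCopy("index2"), IndexCopy("index3")]
copies[1].available = False  # simulate one copy being down

enqueue(copies, {"file_id": 12345, "file_name": "Solr.ppt"})
for copy in copies:
    copy.drain()
print([len(c.docs) for c in copies])  # [1, 0, 1]

copies[1].available = True  # the copy recovers and works off its backlog
copies[1].drain()
print([len(c.docs) for c in copies])  # [1, 1, 1]
```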
![Page 26: SFBay Area Solr Meetup - June 18th: Box + Solr = Content Search for Business](https://reader036.vdocuments.mx/reader036/viewer/2022062703/55509845b4c90590208b46f4/html5/thumbnails/26.jpg)
26
Search workflow
![Page 27: SFBay Area Solr Meetup - June 18th: Box + Solr = Content Search for Business](https://reader036.vdocuments.mx/reader036/viewer/2022062703/55509845b4c90590208b46f4/html5/thumbnails/27.jpg)
27
Box Front End
query
HAProxy
Head node
HAProxy
1 2 3 N
Box Front End
query
HAProxy
Head node
HAProxy
1 2 3 N
Data center boundary
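The diagram shows the same search stack in two data centers. As a hedged illustration of why that helps availability, the sketch below tries a list of search backends in order and falls back to the next one on a connection failure. How Box actually routes queries across the data center boundary is not stated in the slides; the function and backend names here are hypothetical.

```python
def search_with_failover(query, backends):
    """Try each (name, callable) backend in order; return the first success."""
    errors = []
    for name, backend in backends:
        try:
            return name, backend(query)
        except ConnectionError as exc:
            errors.append((name, str(exc)))
    raise RuntimeError(f"all backends failed: {errors}")


def local_dc(query):
    # Simulate the local data center being unreachable.
    raise ConnectionError("local data center unreachable")


def remote_dc(query):
    return [f"hit for {query}"]


print(search_with_failover("solr", [("local", local_dc), ("remote", remote_dc)]))
# ('remote', ['hit for solr'])
```

In practice a load balancer such as HAProxy performs this kind of health-based routing itself, so application code like this would usually not be needed.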
![Page 28: SFBay Area Solr Meetup - June 18th: Box + Solr = Content Search for Business](https://reader036.vdocuments.mx/reader036/viewer/2022062703/55509845b4c90590208b46f4/html5/thumbnails/28.jpg)
28
A few more things
![Page 29: SFBay Area Solr Meetup - June 18th: Box + Solr = Content Search for Business](https://reader036.vdocuments.mx/reader036/viewer/2022062703/55509845b4c90590208b46f4/html5/thumbnails/29.jpg)
29
File Content Search
![Page 30: SFBay Area Solr Meetup - June 18th: Box + Solr = Content Search for Business](https://reader036.vdocuments.mx/reader036/viewer/2022062703/55509845b4c90590208b46f4/html5/thumbnails/30.jpg)
30
Box Front End
Upload
MySQL
Box File Storage
Indexer
Solr Index
Text Extraction
Extracted Text
![Page 31: SFBay Area Solr Meetup - June 18th: Box + Solr = Content Search for Business](https://reader036.vdocuments.mx/reader036/viewer/2022062703/55509845b4c90590208b46f4/html5/thumbnails/31.jpg)
31
Multi-language support
![Page 32: SFBay Area Solr Meetup - June 18th: Box + Solr = Content Search for Business](https://reader036.vdocuments.mx/reader036/viewer/2022062703/55509845b4c90590208b46f4/html5/thumbnails/32.jpg)
32
Raw file content
Language detector
English tokenizer
Spanish tokenizer
Japanese tokenizer
German tokenizer
file_content_en
file_content_es {hola}
file_content_ja....
file_content_de
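The routing on this slide can be sketched as: detect the language of the raw content, then store the text in the matching per-language field, which would be analyzed with that language's tokenizer. The `detect_language` function below is a toy stand-in (a few hard-coded hints), not a real detector and not what Box uses.

```python
SUPPORTED = {"en", "es", "ja", "de"}


def detect_language(text):
    """Toy detector for illustration only; a real system would use a library."""
    if any("\u3040" <= ch <= "\u30ff" for ch in text):  # hiragana or katakana
        return "ja"
    words = text.lower().split()
    if "hola" in words:
        return "es"
    if "und" in words:
        return "de"
    return "en"


def content_field(text):
    """Return the per-language field that should receive the raw content."""
    lang = detect_language(text)
    if lang not in SUPPORTED:
        lang = "en"  # assumed fallback; the slides do not specify one
    return {f"file_content_{lang}": text}


print(content_field("hola"))  # {'file_content_es': 'hola'}
```

Because each document lands in exactly one field here, a document with mixed languages would be tokenized entirely as its detected language.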
![Page 33: SFBay Area Solr Meetup - June 18th: Box + Solr = Content Search for Business](https://reader036.vdocuments.mx/reader036/viewer/2022062703/55509845b4c90590208b46f4/html5/thumbnails/33.jpg)
33
To Dos
• Scale language support
• Support documents with mixed languages
![Page 34: SFBay Area Solr Meetup - June 18th: Box + Solr = Content Search for Business](https://reader036.vdocuments.mx/reader036/viewer/2022062703/55509845b4c90590208b46f4/html5/thumbnails/34.jpg)
34
Search Warm-up
![Page 35: SFBay Area Solr Meetup - June 18th: Box + Solr = Content Search for Business](https://reader036.vdocuments.mx/reader036/viewer/2022062703/55509845b4c90590208b46f4/html5/thumbnails/35.jpg)
35
• Front end informs backend to warm up on keyboard focus
• Backend prepares the search filter and caches it in a search session
• Backend sends a warm-up query to Solr
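The three steps above can be sketched as follows. Everything here is illustrative: the class name, the field names in the filter, and the choice of a match-all warm-up query are assumptions, and `fake_solr` stands in for a real Solr client.

```python
class SearchBackend:
    """Caches each session's search filter and pre-warms Solr with it."""

    def __init__(self, solr_query):
        self.solr_query = solr_query  # callable(q, fq) -> results
        self.sessions = {}            # session_id -> cached filter string

    def warm_up(self, session_id, user_id, shared_folder_ids):
        """Called when the search box gains keyboard focus."""
        clauses = [f'owner_id:"{user_id}"']
        clauses += [f'parent_folder_ids:"{fid}"' for fid in shared_folder_ids]
        fq = " OR ".join(clauses)
        self.sessions[session_id] = fq  # cache the filter in the session
        self.solr_query("*:*", fq)      # warm-up query; results are ignored

    def search(self, session_id, text):
        """The real query reuses the cached filter."""
        return self.solr_query(text, self.sessions[session_id])


calls = []


def fake_solr(q, fq):
    calls.append((q, fq))
    return []


backend = SearchBackend(fake_solr)
backend.warm_up("s1", "User1", ["Folder2"])
backend.search("s1", "solr")
print(calls)
```

The idea is that by the time the user finishes typing, the filter has already been computed and Solr has already evaluated it once, so its caches can serve the real query faster.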
![Page 36: SFBay Area Solr Meetup - June 18th: Box + Solr = Content Search for Business](https://reader036.vdocuments.mx/reader036/viewer/2022062703/55509845b4c90590208b46f4/html5/thumbnails/36.jpg)
36
What we are working on
![Page 37: SFBay Area Solr Meetup - June 18th: Box + Solr = Content Search for Business](https://reader036.vdocuments.mx/reader036/viewer/2022062703/55509845b4c90590208b46f4/html5/thumbnails/37.jpg)
37
Things we are working on
• Search suggestions
• Search operators
• Use machine learning to influence ranking
• Logical sharding
![Page 38: SFBay Area Solr Meetup - June 18th: Box + Solr = Content Search for Business](https://reader036.vdocuments.mx/reader036/viewer/2022062703/55509845b4c90590208b46f4/html5/thumbnails/38.jpg)
38
Questions?