Search at Twitter: Presented by Michael Busch, Twitter

Search @twitter

Michael Busch




Agenda

‣ Introduction

- Search Architecture

- Lucene Extensions

- Outlook


Introduction

• Twitter has more than 284 million monthly active users.

• 500 million tweets are sent per day.

• More than 300 billion tweets have been sent since the company was founded in 2006.

• Tweets-per-second record: a one-second peak of 143,199 TPS.

• More than 2 billion search queries per day.


Agenda

- Introduction

‣ Search Architecture

- Lucene Extensions

- Outlook

Search Architecture

[Diagram: Raw tweets from the RT stream pass through the Analyzer/Partitioner, and the analyzed tweets are written to the RT index (Earlybird). Raw tweets from the Tweet archive on HDFS pass through a Mapreduce Analyzer, and the analyzed tweets are written to the Archive index. Blender receives search requests and sends searches to both indexes.]

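Search Architecture

[Diagram: Tweets pass through the Analyzer/Partitioner and a queue before being written to the RT index (Earlybird). Further inputs are labelled "Updates" and "Deletes/Engagement (e.g. retweets/favs)". HDFS and the Mapreduce Analyzer are also shown. Blender receives search requests and sends searches to the index.]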

Search Architecture

[Diagram: Blender receives search requests and sends searches to the RT index (Earlybird), which also receives writes]

• Blender is our Thrift service aggregator

• Queries multiple Earlybirds, merges results
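
The merge step can be pictured with a small sketch. This is not Blender's actual code: the `Hit` type, the `mergeTopK` method, and the idea that every Earlybird partition returns scored hits are assumptions made purely for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

/** Hypothetical sketch of a scatter-gather merge; not Twitter's Blender code. */
final class BlenderMergeSketch {

    /** One search hit as a partition might return it (illustrative type). */
    static final class Hit {
        final long tweetId;
        final double score;

        Hit(long tweetId, double score) {
            this.tweetId = tweetId;
            this.score = score;
        }
    }

    /** Merges per-partition hit lists and keeps the k highest-scoring hits, best first. */
    static List<Hit> mergeTopK(List<List<Hit>> perPartition, int k) {
        // Min-heap on score: the weakest of the current top k sits at the head.
        PriorityQueue<Hit> heap =
            new PriorityQueue<>(Comparator.comparingDouble((Hit h) -> h.score));
        for (List<Hit> hits : perPartition) {
            for (Hit h : hits) {
                heap.add(h);
                if (heap.size() > k) {
                    heap.poll(); // drop the lowest-scoring hit
                }
            }
        }
        List<Hit> merged = new ArrayList<>(heap);
        merged.sort(Comparator.comparingDouble((Hit h) -> h.score).reversed());
        return merged;
    }

    public static void main(String[] args) {
        List<List<Hit>> results = Arrays.asList(
            Arrays.asList(new Hit(1L, 0.9), new Hit(2L, 0.4)),
            Arrays.asList(new Hit(3L, 0.7), new Hit(4L, 0.1)));
        for (Hit h : mergeTopK(results, 3)) {
            System.out.println(h.tweetId + " " + h.score);
        }
    }
}
```

A bounded min-heap keeps memory proportional to k rather than to the total number of hits returned by all partitions.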

[Diagram labels: Social graph (three boxes), User search]

Search Architecture

[Diagram: three search indexes: RT index (Earlybird), Archive index, User search]

Search Architecture

[Diagram: RT index (Earlybird), Archive index, and User search, all built on Lucene]

• For historical reasons, these were entirely separate codebases, although they had similar features and technologies

• Over time, cross-dependencies were introduced to share code

Search Architecture

[Diagram: RT index (Earlybird), Archive index, and User search, with a Lucene Extensions layer and Lucene]

• New Lucene extension package

• This package is truly generic and has no dependency on an actual product/index

• It contains Twitter’s extensions for real-time search, a thin segment management layer, and other features


Agenda

- Introduction

- Search Architecture

‣ Lucene Extensions

- Outlook

Lucene Extensions

Lucene Extension Library

• Abstraction layer for Lucene index segments

• Real-time writer for in-memory index segments

• Schema-based Lucene document factory

• Real-time faceting

• API layer for Lucene segments

  • *IndexSegmentWriter

  • *IndexSegmentAtomicReader

• Two implementations

  • In-memory: RealtimeIndexSegmentWriter (and reader)

  • On-disk: LuceneIndexSegmentWriter (and reader)


• IndexSegments can be built ...

  • in real time

  • on Mesos or Hadoop (Mapreduce)

  • locally on serving machines

• Cluster-management code that deals with IndexSegments

  • Share segments across serving machines using HDFS

  • Can rebuild segments (e.g. to upgrade Lucene version, change data schema, etc.)


[Diagram: index segments built by the RT pipeline, on Mesos, or on Hadoop (MR) are shared through HDFS with multiple Earlybird machines]


RealtimeIndexSegmentWriter

• Modified Lucene index implementation optimized for realtime search

• IndexWriter buffer is searchable (no need to flush to allow searching)

• In-memory

• Lock-free concurrency model for best performance

Concurrency - Definitions

• Pessimistic locking

  • A thread holds an exclusive lock on a resource while an action is performed [mutual exclusion]

  • Usually used when conflicts are expected to be likely

• Optimistic locking

  • Operations are attempted atomically without holding a lock; conflicts are detected, and retry logic is often used when they occur

  • Usually used when conflicts are expected to be the exception
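
The two styles can be contrasted with a pair of counters. This is a generic Java illustration, not code from the Twitter index; the class names are made up for the example.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.concurrent.locks.ReentrantLock;

/** Illustrative counters contrasting the two locking styles defined above. */
final class LockingStylesSketch {

    /** Pessimistic: take an exclusive lock before touching the value. */
    static final class PessimisticCounter {
        private final ReentrantLock lock = new ReentrantLock();
        private long value;

        long increment() {
            lock.lock();
            try {
                return ++value;
            } finally {
                lock.unlock();
            }
        }
    }

    /** Optimistic: attempt the update atomically, detect conflicts, retry. */
    static final class OptimisticCounter {
        private final AtomicLong value = new AtomicLong();

        long increment() {
            while (true) {
                long current = value.get();
                long next = current + 1;
                // compareAndSet fails if another thread changed the value in between.
                if (value.compareAndSet(current, next)) {
                    return next;
                }
            }
        }
    }

    public static void main(String[] args) throws InterruptedException {
        OptimisticCounter counter = new OptimisticCounter();
        Runnable work = () -> {
            for (int i = 0; i < 10_000; i++) {
                counter.increment();
            }
        };
        Thread a = new Thread(work);
        Thread b = new Thread(work);
        a.start();
        b.start();
        a.join();
        b.join();
        System.out.println(counter.increment()); // 20001 after both threads finish
    }
}
```

In the optimistic version a failed `compareAndSet` means some other thread succeeded, so the system as a whole still makes progress even when one thread has to retry.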

Concurrency - Definitions

• Non-blocking algorithm

  Ensures that threads competing for a shared resource do not have their execution indefinitely postponed by mutual exclusion.

• Lock-free algorithm

  A non-blocking algorithm is lock-free if there is guaranteed system-wide progress.

• Wait-free algorithm

  A non-blocking algorithm is wait-free if there is guaranteed per-thread progress.

* Source: Wikipedia

Concurrency

• Having a single writer thread simplifies our problem: no locks have to be used to protect data structures from corruption (only one thread modifies data)

• But: we have to make sure that all readers always see a consistent state of all data structures -> this is much harder than it sounds!

• In Java, it is not guaranteed that one thread will see changes that another thread makes in program execution order, unless the same memory barrier is crossed by both threads -> safe publication

• Safe publication can be achieved in different, subtle ways. Read the great book “Java Concurrency in Practice” by Brian Goetz for more information!

Java Memory Model

• Program order rule

Each action in a thread happens-before every action in that thread that comes later in the program order.

• Volatile variable rule

A write to a volatile field happens-before every subsequent read of that same field.

• Transitivity

If A happens-before B, and B happens-before C, then A happens-before C.

* Source: Brian Goetz: Java Concurrency in Practice

Concurrency

[Diagram: RAM (int x, value 0), Cache, Thread 1, Thread 2, time axis]

Thread 1: x = 5;
Thread 2: while(x != 5);

• Thread 1 writes x = 5 to the cache; RAM still holds 0

• Thread 2’s loop condition will likely never become false!

Concurrency

[Diagram: RAM (int x, value 0; volatile int b), Cache, Thread 1, Thread 2, time axis]

Thread 1: x = 5; b = 1;
Thread 2: int dummy = b; while(x != 5);

• Thread 1 writes x = 5 to the cache and b = 1 to RAM, because b is volatile

• Thread 2 then reads the volatile b

• Program order rule: Each action in a thread happens-before every action in that thread that comes later in the program order. Here: x = 5 happens-before b = 1, and int dummy = b happens-before while(x != 5).

• Volatile variable rule: A write to a volatile field happens-before every subsequent read of that same field. Here: b = 1 happens-before int dummy = b.

• Transitivity: If A happens-before B, and B happens-before C, then A happens-before C. Here: x = 5 happens-before while(x != 5).

• The loop condition will therefore be false, i.e. x == 5

• The volatile write and the volatile read form the memory barrier

• Note: x itself doesn’t have to be volatile. There can be many variables like x, but we need only a single volatile field.
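
The x / b example can be written as a small runnable program. One deliberate difference from the slide, flagged as an assumption of this sketch: the reader spins until it observes b == 1 instead of reading b once, because the volatile variable rule only helps when the read actually comes after the write. The class and method names are invented for the example.

```java
/** Sketch of the x / b example: a plain field published through a volatile flag. */
final class VolatilePublicationSketch {
    private int x;            // plain field, deliberately not volatile
    private volatile int b;   // the single volatile field acting as the barrier

    /** Thread 1: write the data first, then the volatile flag. */
    void write() {
        x = 5;
        b = 1; // volatile write: everything written before it is published
    }

    /** Thread 2: wait until the volatile write is observed, then read the data. */
    int read() {
        while (b != 1) {
            Thread.yield(); // each iteration performs a volatile read of b
        }
        return x; // 5, by program order + volatile variable rule + transitivity
    }

    /** Runs one writer and one reader thread and returns what the reader saw. */
    static int runOnce() {
        VolatilePublicationSketch s = new VolatilePublicationSketch();
        int[] seen = new int[1];
        Thread reader = new Thread(() -> seen[0] = s.read());
        Thread writer = new Thread(s::write);
        reader.start();
        writer.start();
        try {
            writer.join();
            reader.join(); // join makes seen[0] visible to the calling thread
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return seen[0];
    }

    public static void main(String[] args) {
        System.out.println(runOnce()); // prints 5
    }
}
```

If the flag b were a plain int as well, the loop in `read()` would have no visibility guarantee and could spin forever, which is the failure shown in the first version of the example.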

Concurrency

[Timeline diagram: IndexWriter (IW) and IndexReader (IR)]

IndexWriter: write 100 docs; maxDoc = 100; write more docs
IndexReader: in IR.open(): read maxDoc; search up to maxDoc

• maxDoc is volatile

• The write maxDoc = 100 happens-before the read of maxDoc in IR.open()

• Only maxDoc is volatile. All other fields that IW writes to and IR reads from don’t need to be!

• Not a single exclusive lock

• Writer thread can always make progress

• Optimistic locking (retry-logic) in a few places for searcher thread

• Retry logic very simple and guaranteed to always make progress

Wait-free
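
The maxDoc pattern can be sketched as follows. This is a simplified, hypothetical stand-in for the real index: documents are plain strings in a fixed-size array, there is exactly one writer thread, and the names `MaxDocSketch`, `addDocument`, and `countMatches` are assumptions of the example.

```java
/** Hypothetical single-writer segment: documents are published via a volatile maxDoc. */
final class MaxDocSketch {
    private final String[] docs;   // plain array, written only by the writer thread
    private volatile int maxDoc;   // number of documents visible to readers

    MaxDocSketch(int capacity) {
        docs = new String[capacity]; // fixed capacity keeps the sketch short
    }

    /** Called by the single writer thread only. */
    void addDocument(String text) {
        int docId = maxDoc;        // only the writer changes maxDoc, so no race here
        docs[docId] = text;        // 1. write the document data
        maxDoc = docId + 1;        // 2. volatile write publishes it
    }

    /** Called by any reader thread: snapshot maxDoc once, then search below it. */
    int countMatches(String term) {
        int limit = maxDoc;        // volatile read, like reading maxDoc in IR.open()
        int matches = 0;
        for (int docId = 0; docId < limit; docId++) {
            if (docs[docId].contains(term)) {
                matches++;
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        MaxDocSketch segment = new MaxDocSketch(16);
        segment.addDocument("lucene at twitter");
        segment.addDocument("real-time search");
        System.out.println(segment.countMatches("search")); // prints 1
    }
}
```

A reader never looks at slots at or above the maxDoc value it read, so documents the writer is still filling in stay invisible to it, and neither side ever takes a lock.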

In-memory Real-time Index

• Highly optimized for GC - all data is stored in blocked native arrays

• v1: Optimized for tweets with a term position limit of 255

• v2: Support for 32 bit positions without performance degradation

• v2: Basic support for out-of-order posting list inserts
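
One way to picture "blocked native arrays" is a pool of fixed-size int[] blocks addressed by a single integer. The sketch below is an assumption about the general technique, not Twitter's implementation: the block size, the class name, and the methods are invented, and it ignores concurrency entirely.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical pool of fixed-size int[] blocks; names and sizes are illustrative. */
final class IntBlockPoolSketch {
    private static final int BLOCK_BITS = 10;              // 1024 ints per block (arbitrary)
    private static final int BLOCK_SIZE = 1 << BLOCK_BITS;
    private static final int BLOCK_MASK = BLOCK_SIZE - 1;

    private final List<int[]> blocks = new ArrayList<>();
    private int size;                                       // number of ints stored so far

    /** Appends a value and returns its address within the pool. */
    int add(int value) {
        if ((size & BLOCK_MASK) == 0) {
            blocks.add(new int[BLOCK_SIZE]);                // previous block is full (or none exists yet)
        }
        int address = size++;
        blocks.get(address >>> BLOCK_BITS)[address & BLOCK_MASK] = value;
        return address;
    }

    /** Reads the value stored at an address returned by add(). */
    int get(int address) {
        return blocks.get(address >>> BLOCK_BITS)[address & BLOCK_MASK];
    }

    int blockCount() {
        return blocks.size();
    }

    public static void main(String[] args) {
        IntBlockPoolSketch pool = new IntBlockPoolSketch();
        int address = pool.add(42);
        System.out.println(pool.get(address)); // prints 42
    }
}
```

Storing postings in a few large primitive arrays instead of many small objects leaves the garbage collector with far fewer objects to trace.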


• RT term dictionary

  • Term lookups using a lock-free hashtable in O(1)

  • v2: Additional probabilistic, lock-free skip list maintains ordering on terms

  • Perfect skip list not an option: out-of-order inserts would require rebalancing, which is impractical with our lock-free index

  • In a probabilistic skip list, the tower height of a new (out-of-order) item can be determined without knowing its insert position, simply by rolling a die


• Perfect skip list

Inserting a new element in the middle of this skip list requires re-balancing the towers.

• Probabilistic skip list

Tower height is determined by rolling a die BEFORE knowing the insert location; the tower height never has to change for an element, simplifying memory allocation and concurrency.
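
The "rolling a die" step is commonly implemented as repeated coin flips. The sketch below shows that standard technique; the height cap of 16 and the names are assumptions of the example, not values from the talk.

```java
import java.util.Random;

/** Sketch of choosing a tower height for a probabilistic skip list. */
final class TowerHeightSketch {
    static final int MAX_HEIGHT = 16; // illustrative cap, not from the talk

    /** Each heads adds one level; the first tails (or the cap) stops the tower. */
    static int randomHeight(Random random) {
        int height = 1;
        while (height < MAX_HEIGHT && random.nextBoolean()) {
            height++;
        }
        return height;
    }

    public static void main(String[] args) {
        Random random = new Random();
        int[] histogram = new int[MAX_HEIGHT + 1];
        for (int i = 0; i < 100_000; i++) {
            histogram[randomHeight(random)]++;
        }
        // Roughly half of all towers have height 1, a quarter height 2, and so on.
        for (int h = 1; h <= 5; h++) {
            System.out.println("height " + h + ": " + histogram[h]);
        }
    }
}
```

The height depends only on the random source, never on the element or its neighbours, so it can be fixed before the insert position is known.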

Schema-based Document factory

• Apps provide one ThriftSchema per index and create a ThriftDocument for each document

• SchemaDocumentFactory translates ThriftDocument -> Lucene Document using the Schema

  • Default field values

  • Extended field settings

  • Type-system on top of DocValues

  • Validation


[Diagram: a Thrift Document and the Schema go into the SchemaDocumentFactory, which produces a Lucene Document]

SchemaDocumentFactory:

• Validation

• Fill in default values

• Apply correct Lucene field settings

Decouples core package from specific product/index. Similar to Solr/Elasticsearch.
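
The three factory steps can be sketched without any Thrift or Lucene dependency. Everything in this example is hypothetical: plain maps stand in for the ThriftDocument and the Lucene Document, and `FieldSpec`, `newDocument`, and the field names are invented to show the idea only.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical, dependency-free sketch of a schema-driven document factory. */
final class SchemaFactorySketch {

    /** Per-field settings a schema might carry (illustrative subset). */
    static final class FieldSpec {
        final boolean required;
        final String defaultValue; // null means "no default"

        FieldSpec(boolean required, String defaultValue) {
            this.required = required;
            this.defaultValue = defaultValue;
        }
    }

    private final Map<String, FieldSpec> schema;

    SchemaFactorySketch(Map<String, FieldSpec> schema) {
        this.schema = schema;
    }

    /** Validates the input, fills in defaults, and returns the fields to index. */
    Map<String, String> newDocument(Map<String, String> input) {
        for (String name : input.keySet()) {
            if (!schema.containsKey(name)) {
                throw new IllegalArgumentException("unknown field: " + name);
            }
        }
        Map<String, String> out = new LinkedHashMap<>();
        for (Map.Entry<String, FieldSpec> e : schema.entrySet()) {
            String value = input.get(e.getKey());
            if (value == null) {
                value = e.getValue().defaultValue;   // fill in the default value
            }
            if (value == null && e.getValue().required) {
                throw new IllegalArgumentException("missing required field: " + e.getKey());
            }
            if (value != null) {
                out.put(e.getKey(), value);          // a real factory would build a Lucene field here
            }
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, FieldSpec> schema = new LinkedHashMap<>();
        schema.put("text", new FieldSpec(true, null));
        schema.put("lang", new FieldSpec(false, "en"));
        Map<String, String> doc = new HashMap<>();
        doc.put("text", "hello");
        System.out.println(new SchemaFactorySketch(schema).newDocument(doc)); // {text=hello, lang=en}
    }
}
```

Because the factory only consults the schema, the same code can serve any index whose fields are described that way.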


Agenda

- Introduction

- Search Architecture

- Lucene Extensions

‣ Outlook

Outlook

• Support for parallel (sliced) segments to support partial segment rebuilds and other cool posting list update patterns

• Add remaining missing Lucene features to RT index

  • Index term statistics for ranking

  • Term vectors

  • Stored fields

Questions?

Michael Busch





Backup Slides

Searching for top entities within Tweets

• Task: Find the best photos in a subset of tweets

• We could use a Lucene index, where each photo is a document

• Problem: How to update existing documents when the same photos are tweeted again?

• In-place posting list updates are hard

• Lucene’s updateDocument() is a delete/add operation - expensive and not order-preserving


• Task: Find the best photos in a subset of tweets

• Could we use our existing time-ordered tweet index?

• Facets!

[Diagram: index structures and their mappings]

• Inverted index: Query -> Doc ids

• Forward index: Doc id -> Document metadata

• Facet index: Doc id -> Term ids

• Term id -> Term label

Storing tweet metadata

• Facet index: Doc id -> Term ids

[Diagram: for each doc id matched by the query, the facet index yields term ids, which are counted in a top-k heap]

Matching doc ids: 5, 15, 9000, 9002, 100000, 100090

Top-k heap (early in the scan):

Id      Count
48239   8
31241   2

Top-k heap (later in the scan):

Id      Count
48239   15
31241   12
85932   8
6748    3

• Weighted counts (from engagement features) used for relevance scoring

• All query operators can be used. E.g. find best photos in San Francisco tweeted by people I follow
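
The counting loop can be sketched as follows. It is a minimal illustration under stated assumptions, not the production code: the facet index is a plain map from doc id to term ids, every occurrence counts as 1 (the talk uses weighted counts), and `topFacets` is an invented name.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;

/** Hypothetical sketch of facet counting over matching doc ids; not Twitter's code. */
final class FacetCountSketch {

    /** Returns up to k {termId, count} pairs, highest count first. */
    static List<long[]> topFacets(int[] matchingDocIds, Map<Integer, long[]> facetIndex, int k) {
        // 1. Count how often each facet term id occurs in the matching documents.
        Map<Long, Long> counts = new HashMap<>();
        for (int docId : matchingDocIds) {
            long[] termIds = facetIndex.get(docId);
            if (termIds == null) {
                continue; // document has no facets
            }
            for (long termId : termIds) {
                counts.merge(termId, 1L, Long::sum);
            }
        }
        // 2. Keep the k most frequent term ids in a min-heap ordered by count.
        PriorityQueue<long[]> heap = new PriorityQueue<>((a, b) -> Long.compare(a[1], b[1]));
        for (Map.Entry<Long, Long> e : counts.entrySet()) {
            heap.add(new long[] {e.getKey(), e.getValue()});
            if (heap.size() > k) {
                heap.poll();
            }
        }
        List<long[]> top = new ArrayList<>(heap);
        top.sort((a, b) -> Long.compare(b[1], a[1]));
        return top;
    }

    public static void main(String[] args) {
        Map<Integer, long[]> facetIndex = new HashMap<>();
        facetIndex.put(5, new long[] {48239L, 31241L});
        facetIndex.put(15, new long[] {48239L});
        facetIndex.put(9000, new long[] {48239L, 85932L});
        for (long[] entry : topFacets(new int[] {5, 15, 9000}, facetIndex, 2)) {
            System.out.println(entry[0] + " " + entry[1]);
        }
    }
}
```

Since the doc ids come from an ordinary query, any query operator can narrow the set before counting, and nothing has to be reindexed when the same photo is tweeted again.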


[Diagram: term ids in the top-k heap are resolved to term labels through the inverted index (Term id -> Term label)]

Id      Label                       Count
48239   pic.twitter.com/jknui4w     45
31241   pic.twitter.com/dslkfj83    23
85932   pic.twitter.com/acm3ps      15
6748    pic.twitter.com/948jdsd     11
74294   pic.twitter.com/dsjkf15h    8
3728    pic.twitter.com/irnsoa32    5


Summary

• Indexing tweet entities (e.g. photos) as facets makes it possible to search and rank top entities using a tweet index

• All query operators supported

• Documents don’t need to be reindexed

• Approach reusable for different use cases, e.g.: best vines, hashtags, @mentions, etc.
