© 2013 Amazon.com, Inc. and its affiliates. All rights reserved. May not be copied, modified or distributed in whole or in part without the express consent of Amazon.com, Inc.
Searching for Success
Amazon CloudSearch and Relational Databases
Agenda
Finding things
• Types of Databases
Making Choices
What is CloudSearch?
Combining CloudSearch with Relational
Sample Code
Finding Things
So Many Databases
Finding Your Information
Your users need to find things
• What do you use?
A Database!
• What Kind?
It's a Big World Out There!
"Database" != "Relational Database"
Tons of relational databases
• Amazon RDS
• MySQL
• MSSQL
• Oracle
but…
Many Other Types
NoSQL databases
• Dynamo, Cassandra, CouchDB…
Graph databases
• Neo4J, Titan, …
Column oriented databases
• Redshift, Bigtable…
Text Search Engine
• CloudSearch, Lucene, Autonomy...
Text Search Engine
Good at text queries
• "Harry Potter and the Philosopher's Stone"
Harry Potter and the Philosopher's Stone
harry potter and the philosopher's stone
harry potter and the philosopher stone
harry potter philosopher stone
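The progression above (quoted phrase, then lowercase, then no possessive, then no stopwords) can be sketched as a small normalization step. This is only an illustration of the idea, not how CloudSearch itself is implemented; the class name `QueryNormalizer` and the tiny stopword list are assumptions made for this example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

final class QueryNormalizer {
    // Hypothetical stopword list for this sketch; real engines use longer, configurable lists.
    private static final Set<String> STOPWORDS =
            new HashSet<String>(Arrays.asList("and", "the", "a", "of"));

    /** Lowercases, strips possessives and punctuation, and drops stopwords (ASCII text only). */
    static List<String> normalize(String query) {
        List<String> terms = new ArrayList<String>();
        String lowered = query.toLowerCase(Locale.ROOT);
        for (String raw : lowered.split("\\s+")) {
            // Remove a trailing 's, then any remaining non-alphanumeric characters.
            String term = raw.replaceAll("'s\\b", "").replaceAll("[^a-z0-9]", "");
            if (!term.isEmpty() && !STOPWORDS.contains(term)) {
                terms.add(term);
            }
        }
        return terms;
    }

    public static void main(String[] args) {
        // Prints [harry, potter, philosopher, stone]
        System.out.println(normalize("\"Harry Potter and the Philosopher's Stone\""));
    }
}
```

All five spellings of the query on this slide reduce to the same four terms, which is why a text search engine can treat them as equivalent.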
Text Search Engine
Basic element is the document
Documents are made of fields
"title" => "star wars"
Fields can be
• Missing
• Multi-valued
• Variable length
Text Search Engine
Documents are not "normalized"
• In a relational database
  • A movie table
  • A director table
  • An actor table
• In CloudSearch
  • One document per movie
Text Search Engine
Relational

ID  Title
1   Star Wars
2   Star Trek
3   Dark Star

ID  Actor
1   Zachary Quinto
2   Chris Pine
3   Zoë Saldana

ID  Director
1   J.J. Abrams
2   George Lucas
3   John Carpenter

Document

ID  Document
1   title: star trek actor: chris pine zachary quinto zoe saldana director: j j abrams
Relevance
Key differentiator for text search
Not "does this match?"
• "How WELL does this match?"
Includes multiple factors
• Term Frequency, Document Frequency, Proximity
Users can customize this
• Distance
• Popularity
• Field Weighting
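To make "how well does this match?" concrete, here is a minimal TF-IDF style sketch that combines term frequency and document frequency. It is a textbook formula chosen for illustration, not CloudSearch's actual ranking function; proximity and the custom factors listed above are left out, and the class name `RelevanceSketch` is hypothetical.

```java
import java.util.Arrays;
import java.util.List;

final class RelevanceSketch {
    /** Term frequency: how often the term occurs in one tokenized document. */
    static int termFrequency(String term, List<String> doc) {
        int count = 0;
        for (String token : doc) {
            if (token.equals(term)) {
                count++;
            }
        }
        return count;
    }

    /** Document frequency: how many documents in the corpus contain the term. */
    static int documentFrequency(String term, List<List<String>> corpus) {
        int count = 0;
        for (List<String> doc : corpus) {
            if (doc.contains(term)) {
                count++;
            }
        }
        return count;
    }

    /** Sums tf * log(1 + N/df) over the query terms; rare terms contribute more. */
    static double score(List<String> queryTerms, List<String> doc, List<List<String>> corpus) {
        double score = 0.0;
        for (String term : queryTerms) {
            int tf = termFrequency(term, doc);
            int df = documentFrequency(term, corpus);
            if (tf == 0 || df == 0) {
                continue; // the term does not occur in this document
            }
            score += tf * Math.log(1.0 + (double) corpus.size() / df);
        }
        return score;
    }

    public static void main(String[] args) {
        List<String> starWars = Arrays.asList("star", "wars");
        List<String> starTrek = Arrays.asList("star", "trek");
        List<String> darkStar = Arrays.asList("dark", "star");
        List<List<String>> corpus = Arrays.asList(starWars, starTrek, darkStar);
        List<String> query = Arrays.asList("star", "trek");
        for (List<String> doc : corpus) {
            System.out.println(doc + " -> " + score(query, doc, corpus));
        }
    }
}
```

For the query "star trek", every title matches on "star", but only one also matches the rarer term "trek", so it scores highest. A relational `WHERE` clause cannot express that ordering on its own.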
Text is more than "War & Peace"
It's not just books & blog posts
Meta-data
• Author, Title, Category, Tags
• Can include numbers: counts, dates, latitude,…
Making Choices
Relational? CloudSearch?
Relational Database
Good at
• Exact matches
• Joins
• Atomic Transactions
Not so good at
• Relevance
  • How well does this match?
• Handling words
Text Search Engines
Good at finding
• Words, Phrases
• Relevance
Not so good at
• Joins
• Transactions
Options for Search
Can I just use a relational database?
• Yes.
Do I want to just use a relational database?
• Probably not
Simple Approach
Widely supported, easy
SELECT id, title FROM books WHERE title LIKE '%amazon%'
Does not perform well
Doesn't deal with multiple words
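A small in-memory sketch shows why substring matching falls short with several words. `likeMatch` imitates `LIKE '%…%'` under a case-insensitive collation, and `allTermsMatch` imitates the term-based matching a text search engine performs. Both method names and the class name are made up for this illustration.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

final class LikeVersusTerms {
    /** Imitates: title LIKE '%query%' (case-insensitive substring test). */
    static boolean likeMatch(String title, String query) {
        return title.toLowerCase(Locale.ROOT).contains(query.toLowerCase(Locale.ROOT));
    }

    /** Term matching: every query word must appear as a whole word in the title. */
    static boolean allTermsMatch(String title, String query) {
        Set<String> titleTerms = new HashSet<String>(
                Arrays.asList(title.toLowerCase(Locale.ROOT).split("\\W+")));
        for (String term : query.toLowerCase(Locale.ROOT).split("\\W+")) {
            if (!term.isEmpty() && !titleTerms.contains(term)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        String title = "Harry Potter and the Philosopher's Stone";
        System.out.println(likeMatch(title, "potter harry"));      // word order defeats LIKE
        System.out.println(allTermsMatch(title, "potter harry"));  // term matching still finds it
        System.out.println(likeMatch("The Amazonian Trail", "amazon")); // substring false positive
    }
}
```

The substring test misses the reordered query and also matches inside unrelated words, while the term-based test handles both cases.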
Text Extensions for Relational Databases
Vendor specific
SELECT id, title FROM books WHERE MATCH(title) AGAINST('Harry Potter' IN NATURAL LANGUAGE MODE)
• Use different index structures
• Typically MUCH less mature than relational code
• More manual processes
  • Scaling (if possible)
  • Managing
• Minimal relevance, no control
Appropriate Tools
VS
Options
Relational database
• Weak relevance
• Scaling & performance limits
Text Search Engine
• No transactions & locking
• No Joins
Both
• Some extra effort, then best of both worlds
What is Amazon CloudSearch?
CloudSearch
Fully-managed text search engine
High Performance
Automatically Scaling
Reliable, Resilient
Based on Amazon Product Search
Search Features
Faceting
Complex queries
• (and 'potter harry' (not author:'rowling'))
Configurable synonyms, stemming & stopwords
Custom Sorting/Ranking
Scaling
CloudSearch scales automatically
• Handle your spikes
• Plan for success, but don't spend until you need it
• Handle more data
• Scaling is seamless – no downtime
Automatic Scaling
[Diagram: a grid of search instances, each holding one index partition (1, 2, … n) in one copy (1, 2, … n). The two scaling dimensions are DATA (document quantity and size) and TRAFFIC (search request volume and complexity).]
Easy to Use
REST API
Simple to add
• HTTP POST
Simple to query
• q=star trek
Simple to integrate
• JSON
[Diagram: Documents → HTTP → CloudSearch ← HTTP ← Queries]
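A query such as `q=star trek` has to be URL-encoded before it goes on the wire. The sketch below builds such a request URL. The host name is a made-up placeholder, and the `/2011-02-01/search` path and the `bq` parameter for boolean queries are assumptions about the API version current when this deck was written; check your domain's search endpoint and the API reference before relying on them.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

final class SearchUrlBuilder {
    /** Builds a search URL; param is the query parameter name, e.g. "q" (or "bq", assumed). */
    static String build(String endpoint, String param, String query) {
        try {
            // URLEncoder turns spaces into '+' and percent-encodes quotes, colons and parentheses.
            return "http://" + endpoint + "/2011-02-01/search?"
                    + param + "=" + URLEncoder.encode(query, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is guaranteed to exist on every JVM, so this should never happen.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Hypothetical endpoint; substitute the search endpoint of your own domain.
        String endpoint = "search-movies-example.us-east-1.cloudsearch.amazonaws.com";
        System.out.println(build(endpoint, "q", "star trek"));
        System.out.println(build(endpoint, "bq", "(and 'potter harry' (not author:'rowling'))"));
    }
}
```

The response comes back as JSON, so any HTTP client plus a JSON parser is enough to integrate search into an application.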
Amazon CloudSearch Architecture
[Diagram: a Search Domain containing three services: SEARCH SERVICE, DOCUMENT SERVICE and CONFIG SERVICE. Entry points shown: DNS / Load Balancing and AWS Query. Interfaces shown: Search API, Doc Svc API, Config API, Console, Command Line Tools.]
What Can You Search For With CloudSearch?
Wine
Your college buddies
Curly hair products
Downton Abbey episodes
News in Bermuda
Playoff tickets
Online courses
Cat memes
Furniture
Doctor reviews
Take out food
Vacation rentals
Trademarks
African safaris
Kids arts & crafts
French dating/marriage
Online videos
Recipes
Weather insurance
Fashion news
Bollywood music
Stock art
And more!
Combining CloudSearch + Relational Database
Combining the Two
Best of both worlds
• Relational queries run on relational database
• Text queries run on CloudSearch
Downside: Complexity
• More moving parts
• Synchronization
Synchronization
Which one is the master?
• Usually the relational database
Updates
• All at once
• At regular intervals
• When data is available
Deletes
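With the relational database as master, a periodic sync has to work out two things: which rows changed since the last run, and which indexed documents no longer exist in the master. The sketch below shows one way to plan that, using in-memory collections as stand-ins for the database and the index. `SyncPlanner`, the last-modified map and the ID sets are all hypothetical names for this illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

final class SyncPlanner {
    /** IDs whose last-modified time (epoch millis) is newer than the previous sync, sorted. */
    static List<String> changedSince(Map<String, Long> lastModifiedById, long lastSyncMillis) {
        List<String> changed = new ArrayList<String>();
        // TreeMap gives a stable, sorted iteration order for the result.
        for (Map.Entry<String, Long> entry : new TreeMap<String, Long>(lastModifiedById).entrySet()) {
            if (entry.getValue() > lastSyncMillis) {
                changed.add(entry.getKey());
            }
        }
        return changed;
    }

    /** IDs that are still in the search index but have disappeared from the master, sorted. */
    static List<String> deletedFromMaster(Set<String> indexedIds, Set<String> masterIds) {
        Set<String> gone = new TreeSet<String>(indexedIds);
        gone.removeAll(masterIds);
        return new ArrayList<String>(gone);
    }

    public static void main(String[] args) {
        Map<String, Long> rows = new TreeMap<String, Long>();
        rows.put("a", 100L);
        rows.put("b", 200L);
        rows.put("c", 300L);
        System.out.println(changedSince(rows, 150L)); // rows to re-send as adds
    }
}
```

In a real deployment the first method would typically be a `WHERE last_modified > ?` query. Deletes are the harder case, because a row that is gone leaves nothing to select, which is why they get their own line on the slide.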
Dataflow
One source
Simultaneous updates
[Diagram: Source → Loader → RDBMS and CloudSearch]
Dataflow
One source
Two loaders
[Diagram: Source → Loader → RDBMS → Loader → CloudSearch]
Dataflow
One source
Log updates
Two loaders
[Diagram components: Source, Loader, RDBMS, Log, Loader, CloudSearch]
Dataflow
[Diagram components: three Sources, Loader, RDBMS, Log, Loader, CloudSearch]
Sample Code
Dataflow
One source
Two loaders
[Diagram: Source → Loader → RDBMS → Loader → CloudSearch]
Java Example
Read from MySQL
• JDBC – Nothing special
Post to CloudSearch
• Apache HTTP Client
Libraries
Apache
• HTTP Client
• HTTP Core
• Commons Logging
AWS Java SDK
MySQL connector
Source Files
CloudSearchRDS
• Just does the setup for the demo
ExtractAndUpload
• Does the main work
Batcher
• Groups documents into batches
PosterHttp
• Posts to CloudSearch
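The Batcher source is not shown in the deck, so the following is only a guess at its general shape: it wraps each document in an "add" operation and closes a batch when the next document would push it past a size limit. The JSON layout (`type`, `id`, `version`, `lang`, `fields`) assumes the 2011-02-01 document batch format, and `BatcherSketch` with its size parameter is a hypothetical stand-in; consult the service documentation for the real batch size limit.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

final class BatcherSketch {
    private final int maxBatchBytes;
    private final List<String> batches = new ArrayList<String>();
    private StringBuilder current = new StringBuilder();

    BatcherSketch(int maxBatchBytes) {
        this.maxBatchBytes = maxBatchBytes;
    }

    /**
     * Adds one document. fieldsJson is an already serialized JSON object of fields;
     * id is assumed to contain only characters that need no JSON escaping.
     */
    void addDocument(String fieldsJson, int version, String id) {
        String op = "{\"type\":\"add\",\"id\":\"" + id + "\",\"version\":" + version
                + ",\"lang\":\"en\",\"fields\":" + fieldsJson + "}";
        int opBytes = op.getBytes(StandardCharsets.UTF_8).length;
        int currentBytes = current.toString().getBytes(StandardCharsets.UTF_8).length;
        // Reserve 3 bytes for the enclosing brackets and the separating comma.
        if (current.length() > 0 && currentBytes + opBytes + 3 > maxBatchBytes) {
            flush();
        }
        if (current.length() > 0) {
            current.append(',');
        }
        current.append(op);
    }

    /** Closes the batch in progress, if any, as a JSON array of operations. */
    void flush() {
        if (current.length() == 0) {
            return;
        }
        batches.add("[" + current + "]");
        current = new StringBuilder();
    }

    /** Completed batches, each ready to be sent as one HTTP POST body. */
    List<String> batches() {
        return batches;
    }

    public static void main(String[] args) {
        BatcherSketch batcher = new BatcherSketch(200);
        batcher.addDocument("{\"title\":\"star wars\"}", 1, "1");
        batcher.addDocument("{\"title\":\"star trek\"}", 1, "2");
        batcher.addDocument("{\"title\":\"dark star\"}", 1, "3");
        batcher.flush();
        for (String batch : batcher.batches()) {
            System.out.println(batch);
        }
    }
}
```

Grouping documents this way keeps the number of HTTP requests low. A poster class such as PosterHttp would then send each completed batch to the domain's document endpoint.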
Main Loop
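// Collect the column names once so each row can be copied field by field.
List<String> names = new ArrayList<String>();
ResultSet rs = stmt.executeQuery("select * from movies");
ResultSetMetaData meta = rs.getMetaData();
for (int col = 1; col <= meta.getColumnCount(); col++) {
    // getColumnLabel honors SQL aliases (AS ...), unlike getColumnName.
    names.add(meta.getColumnLabel(col));
}
while (rs.next()) {
    // lastModified is assumed to be a java.util.Date set earlier (not shown on the slide);
    // its value in seconds serves as the document version.
    int version = (int) (lastModified.getTime() / 1000);
    JSONObject doc = new JSONObject();
    for (String name : names) {
        doc.put(name, rs.getString(name));
    }
    String id = rs.getString("id");
    if (batcher != null) {
        batcher.addDocument(doc, version, id);
    }
}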
SQL
select * from movies;
select `key` as id, title as name from movies
Denormalizing may require multiple queries
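One common way to denormalize is to run a join (or several queries) and fold the rows that share a movie ID into a single document with multi-valued fields. The sketch below does that fold on plain arrays standing in for JDBC rows. The row layout `{movieId, title, actor}` and the class name `Denormalizer` are assumptions for this example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class Denormalizer {
    /**
     * Folds joined rows of the form {movieId, title, actor} into one document per movie.
     * A null actor (as a LEFT JOIN yields for a movie without actors) adds no value.
     */
    static Map<String, Map<String, List<String>>> toDocuments(List<String[]> joinedRows) {
        Map<String, Map<String, List<String>>> docs =
                new LinkedHashMap<String, Map<String, List<String>>>();
        for (String[] row : joinedRows) {
            String id = row[0];
            Map<String, List<String>> doc = docs.get(id);
            if (doc == null) {
                // First row for this movie: create the document and set single-valued fields.
                doc = new LinkedHashMap<String, List<String>>();
                doc.put("title", new ArrayList<String>(Arrays.asList(row[1])));
                doc.put("actor", new ArrayList<String>());
                docs.put(id, doc);
            }
            if (row[2] != null) {
                doc.get("actor").add(row[2]); // multi-valued field grows with each joined row
            }
        }
        return docs;
    }

    public static void main(String[] args) {
        List<String[]> rows = new ArrayList<String[]>();
        rows.add(new String[] {"2", "Star Trek", "Chris Pine"});
        rows.add(new String[] {"2", "Star Trek", "Zachary Quinto"});
        rows.add(new String[] {"3", "Dark Star", null});
        System.out.println(toDocuments(rows));
    }
}
```

The same fold can be repeated for directors or any other one-to-many relation, which is where the additional queries mentioned on the slide come in.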
Demo
Search: It's not just for Relational Data
You can pull data from
• S3
• Redshift
• Web
• Internal Documents
• And more…
And make it searchable
Indexing S3
ListObjectsRequest listObjectsRequest = new ListObjectsRequest()
        .withBucketName(bucketName);
ObjectListing objectListing;
do {
    // Each call returns one page of results; the marker advances to the next page.
    objectListing = s3client.listObjects(listObjectsRequest);
    for (S3ObjectSummary objectSummary : objectListing.getObjectSummaries()) {
        processObject(objectSummary);
    }
    listObjectsRequest.setMarker(objectListing.getNextMarker());
} while (objectListing.isTruncated());
Summary
Use the right tool!
• Text Search for Searching Text
CloudSearch is fully managed text search
Easy to get data from relational DB
Easy to load data into CloudSearch
Next Step: Free Trial
One month (750 hours) free.
Set up an account
Give it a try!
Questions?
• [email protected]