Building a Recommendation Engine with Spring and Hadoop
DESCRIPTION
Speaker: Michael Minella
Big Data Track
The Amazons and Googles of the world have had Ph.D.s locked up in back rooms for years, creating algorithms to get you to click on things and subsequently buy stuff. One of the big things those smart people have been working on is the recommendation engine. Today, a recommendation engine isn’t something that only the Amazons of the world can have. With an hour and a handful of open source tools, we’ll build a recommendation engine based on data from the website we probably spend the most time on: StackOverflow. We’ll use Spring XD and Spring Batch to orchestrate the full lifecycle of Hadoop processing (ingest, process, export) and Apache Mahout to provide the recommendation processing. A basic understanding of Hadoop concepts (what Map/Reduce is) and Spring (basic D/I configuration) is expected for this talk.
TRANSCRIPT
BUILDING RECOMMENDATION ENGINES
WITH SPRING AND HADOOP
MICHAEL MINELLA
TWITTER: @MICHAELMINELLA
HOME PAGE: SPRING.IO/TEAM/MMINELLA
WHAT I’M NOT
https://github.com/SpringOne2GX-2014/
THANK YOU
SEBASTIAN SCHELTER
PAT FERREL
RECOMMENDATION
ALGORITHMS
LET’S SET SOME
EXPECTATIONS
SCALE OF THE PROBLEM
MILLIONS OF
USERS
100,000’s OF
ITEMS
TOOLS AND
TECHNOLOGIES
1. SPRING BOOT
2. MYSQL
3. HADOOP
4. SPRING XD
5. MAHOUT
SPRING XD
EXTREME DATA
APPLICATION COMPLEXITY
LOTS OF
BOILERPLATE
MANY DOMAINS TO
BRIDGE
INCONSISTENT
APIS
SOURCE, CHANNEL, SINK (DATA FLOW MODEL) = ADAPTER, CHANNEL, FILTER, TRANSFORMER, ETC. (EIP PATTERNS)
JOB, CONNECTOR (IMPORT/EXPORT) = JOB, ITEMREADER/ITEMWRITER (BATCH PROCESSING)
WORKFLOW, ACTION (WORKFLOW ORCHESTRATION) = JOB, STEP (BATCH PROCESSING)
SPRING XD
EXTREME DATA
SPRING
Ingestion
Orchestration
Extraction
Real-time Analytics
DISTRIBUTED
RUNTIME
STREAMING & BATCH
http --port=8181 | filter --expression="payload?.price > 3.00" | hdfs --directory=/xd/dir1
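To make the filter stage of the stream definition above concrete, here is a minimal plain-Java sketch of what the expression payload?.price > 3.00 selects. It does not use Spring XD or SpEL; the class name FilterSketch, the method keepExpensive, and the map-based payloads are hypothetical, and dropping payloads that are null or have no numeric price is an assumption about the intended behaviour.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class FilterSketch {

    /** Keeps only payloads whose "price" entry is a number greater than 3.00. */
    public static List<Map<String, Object>> keepExpensive(List<Map<String, Object>> payloads) {
        return payloads.stream()
                // Assumption: null payloads and payloads without a numeric price are dropped.
                .filter(p -> p != null
                        && p.get("price") instanceof Number
                        && ((Number) p.get("price")).doubleValue() > 3.00)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Made-up payloads standing in for messages posted to the http source.
        List<Map<String, Object>> incoming = Arrays.asList(
                Map.<String, Object>of("item", "book", "price", 12.50),
                Map.<String, Object>of("item", "pen", "price", 1.25),
                null);
        System.out.println(keepExpensive(incoming));
    }
}
```

In the stream itself, whatever passes the filter would then be handed to the hdfs sink; the sketch stops at the filtering decision.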
BATCH PROCESSING FOR
HEAVY LIFTING
JOB
STEP
TASKLET
CHUNK
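The chunk idea named above (read items one at a time, process each, write them out in groups) can be sketched in plain Java. This is an illustration of the pattern only, not the Spring Batch API; ChunkSketch, runChunkStep, and the sample rows are hypothetical names and data chosen for the example.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.Function;

public class ChunkSketch {

    /**
     * Reads items one at a time, transforms each, and hands them to the
     * writer in groups of chunkSize. Returns the number of chunks written.
     */
    public static <I, O> int runChunkStep(Iterator<I> reader,
                                          Function<I, O> processor,
                                          Consumer<List<O>> writer,
                                          int chunkSize) {
        int chunks = 0;
        List<O> buffer = new ArrayList<>();
        while (reader.hasNext()) {
            buffer.add(processor.apply(reader.next()));
            if (buffer.size() == chunkSize) {
                writer.accept(new ArrayList<>(buffer)); // one write per full chunk
                buffer.clear();
                chunks++;
            }
        }
        if (!buffer.isEmpty()) { // flush the final, partial chunk
            writer.accept(new ArrayList<>(buffer));
            chunks++;
        }
        return chunks;
    }

    public static void main(String[] args) {
        // Made-up input rows in USER_ID,TAG_ID,VOTES form.
        List<String> rows = List.of("1,10,3", "1,11,5", "2,10,1", "2,12,4", "3,11,2");
        List<List<String>> written = new ArrayList<>();
        int chunks = ChunkSketch.<String, String>runChunkStep(
                rows.iterator(), String::trim, written::add, 2);
        System.out.println(chunks + " chunks: " + written);
    }
}
```

A tasklet, by contrast, would be a single callback that does all of its work in one call, which suits steps such as launching a Hadoop job.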
SPRING FOR
APACHE HADOOP
TOTAL LINES OF CUSTOM CODE
47 Lines of Java
29 Lines of XML
6 Spring XD Shell Commands
RECOMMENDATION
ALGORITHMS
PREDICTING THE
FUTURE
COLLABORATIVE
FILTERING
TWO OPTIONS
USER BASED
USER    ITEM 1    ITEM 2    ITEM 3    ITEM 4    ITEM 5
DEREK
MICHAEL
PHIL
DARREL ?
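The question mark in the table above is the value a user-based recommender tries to fill in: find users who rated things like DARREL did, then average their ratings for the missing item, weighted by similarity. The plain-Java sketch below shows one common way to do that (cosine similarity plus a weighted average). It is not Mahout's implementation, the ratings are invented because the slide's cell values did not survive, and UserBasedSketch, cosine, and predict are hypothetical names. It also assumes the target user is present in the ratings map.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class UserBasedSketch {

    /** Cosine similarity between two sparse rating vectors keyed by item. */
    public static double cosine(Map<String, Double> a, Map<String, Double> b) {
        double dot = 0, normA = 0, normB = 0;
        for (Map.Entry<String, Double> e : a.entrySet()) {
            normA += e.getValue() * e.getValue();
            Double other = b.get(e.getKey());
            if (other != null) dot += e.getValue() * other;
        }
        for (double v : b.values()) normB += v * v;
        return (normA == 0 || normB == 0) ? 0 : dot / Math.sqrt(normA * normB);
    }

    /** Similarity-weighted average of other users' ratings for the item; NaN if nobody rated it. */
    public static double predict(String user, String item,
                                 Map<String, Map<String, Double>> ratings) {
        double weighted = 0, totalSim = 0;
        for (Map.Entry<String, Map<String, Double>> e : ratings.entrySet()) {
            Double rating = e.getValue().get(item);
            if (e.getKey().equals(user) || rating == null) continue;
            double sim = cosine(ratings.get(user), e.getValue());
            weighted += sim * rating;
            totalSim += Math.abs(sim);
        }
        return totalSim == 0 ? Double.NaN : weighted / totalSim;
    }

    public static void main(String[] args) {
        // Invented ratings; only the user names come from the slide.
        Map<String, Map<String, Double>> ratings = new LinkedHashMap<>();
        ratings.put("DEREK", Map.of("ITEM 1", 5.0, "ITEM 2", 3.0, "ITEM 5", 4.0));
        ratings.put("MICHAEL", Map.of("ITEM 1", 4.0, "ITEM 3", 2.0, "ITEM 5", 5.0));
        ratings.put("PHIL", Map.of("ITEM 2", 1.0, "ITEM 4", 5.0, "ITEM 5", 2.0));
        ratings.put("DARREL", Map.of("ITEM 1", 5.0, "ITEM 2", 2.0));
        System.out.println("Predicted ITEM 5 for DARREL: "
                + predict("DARREL", "ITEM 5", ratings));
    }
}
```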
ITEM BASED
?
ITEM    DEREK    MICHAEL    PHIL    DARREL
ITEM 1
ITEM 2
ITEM 3
ITEM 4
ITEM 5
PEOPLE ARE
FUNNY
USER_ID, TAG_ID, VOTES
TAG_ID, TAG_ID, SCORE
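The two record shapes above are the input and output of the item-based step: rows of USER_ID, TAG_ID, VOTES go in, and pairs of tags with a similarity SCORE come out. As a rough illustration, the plain-Java sketch below scores a pair of tags by counting how many users have votes on both. Mahout offers several similarity measures; simple co-occurrence counting is used here only because it is the easiest to follow, and it is not claimed to be the measure used in the talk. TagCooccurrenceSketch, cooccurrence, and the sample rows are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.TreeSet;

public class TagCooccurrenceSketch {

    /**
     * rows: {userId, tagId, votes}.
     * Returns "tagA,tagB" (tagA < tagB) mapped to the number of users with votes on both tags.
     */
    public static Map<String, Integer> cooccurrence(long[][] rows) {
        // Group the tags each user has voted on.
        Map<Long, TreeSet<Long>> tagsByUser = new HashMap<>();
        for (long[] row : rows) {
            if (row[2] > 0) { // assumption: rows with zero votes carry no signal
                tagsByUser.computeIfAbsent(row[0], k -> new TreeSet<>()).add(row[1]);
            }
        }
        // Count every unordered pair of tags once per user.
        Map<String, Integer> scores = new TreeMap<>();
        for (TreeSet<Long> tags : tagsByUser.values()) {
            List<Long> sorted = new ArrayList<>(tags);
            for (int i = 0; i < sorted.size(); i++) {
                for (int j = i + 1; j < sorted.size(); j++) {
                    scores.merge(sorted.get(i) + "," + sorted.get(j), 1, Integer::sum);
                }
            }
        }
        return scores;
    }

    public static void main(String[] args) {
        // Made-up USER_ID, TAG_ID, VOTES rows.
        long[][] rows = {
            {1, 10, 3}, {1, 11, 5},
            {2, 10, 1}, {2, 11, 2}, {2, 12, 4},
            {3, 12, 0}
        };
        cooccurrence(rows).forEach((pair, score) -> System.out.println(pair + "," + score));
    }
}
```

At the scale the talk describes (millions of users), this pairwise counting is the part that gets distributed across the cluster rather than run in a single loop.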
LOOKING INTO THE
FUTURE
SNAPSHOTS AHEAD!
MAP REDUCE
MAP REDUCE
PROBLEMS
API IS VERY
LOW LEVEL
HIGH
LATENCY
NOT ALWAYS A
GOOD FIT
POTENTIALLY
FASTER
HIGHER LEVEL
APIS
scala> textFile.count()
res0: Long = 126
USER_ID, TAG_ID, VOTES
TAGID,TAGID:RANK…
1. USE A SEARCH ENGINE
2. DATA NORMALIZATION
Learn More. Stay Connected.
Spring Batch
Project: spring.io/spring-batch
Github: github.com/spring-projects/spring-batch
Jira: jira.spring.io/browse/BATCH
Spring Boot
Project: spring.io/spring-boot
Github: github.com/spring-projects/spring-boot
Spring XD
Project: spring.io/spring-xd
Github: github.com/spring-projects/spring-xd
Jira: jira.spring.io/browse/XD
Twitter: twitter.com/springcentral
YouTube: spring.io/video
LinkedIn: spring.io/linkedin
Google Plus: spring.io/gplus
Servers by Jaime Carrion
from The Noun Project
Question by Jessica Lock
from The Noun Project
Check Box by Hrag Chanchanian
from The Noun Project
Crane by Kenneth Von Alt
from The Noun Project
Nut by Naomi Atkinson
from The Noun Project
Funnel by Volodin Anton
from The Noun Project
Circuit by Piotrek Chuchla
from The Noun Project
Puzzle by Matthew Hall
from The Noun Project
Database by Anton Outkine
from The Noun Project
Network by Mister Pixel
from The Noun Project
Puzzle by Eric M. Ellis
from The Noun Project
People by Wilson Joseph
from The Noun Project
Maze by Gilbert Bages
from The Noun Project
Fork by Dmitry Baranovskiy
from The Noun Project
Algebra by Ilsur Aptukov
from The Noun Project
Thumbs Up by Jørgen Bovolden
from The Noun Project
Scale by Edward Boatman
from The Noun Project
Users by Vittorio Maria Vecchi
from The Noun Project
Flow Chart by Michael Wohlwend
from The Noun Project
Running by Dimiter Petrov
from The Noun Project
Move by Dmitry Baranovskiy
from The Noun Project
Abacus by Alice Mortaro
from The Noun Project
Stopwatch by Scott Lewis
from The Noun Project
Lego by jon trillana
from The Noun Project
Lego by Jake Dunham
from The Noun Project
The End