Linked Data, Big Data, and User Science at globo.com
Ícaro Medeiros
I Encontro de Computação Semântica
@UFRJ 11/03/2015
LINKED DATA, BIG DATA, USER SCIENCE @ globo.com
Signals
( icaro, home_globoesporte, pageview@23:00 )
( icaro, materia_1, scroll+2min@14:00 )
Content description
( materia_1: [messi, neymar, barcelona] )
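A minimal sketch of how the two example structures above could be joined, assuming signals are plain (user, content, action, time) tuples and the content description is a mapping from content id to annotated entities. The names Signal and entities_seen are hypothetical and do not come from the slides.

```python
from collections import Counter
from typing import NamedTuple


class Signal(NamedTuple):
    """One user interaction, mirroring the tuples shown on the slide."""
    user: str
    content_id: str
    action: str
    time: str


# The two example signals and the content description from the slide.
signals = [
    Signal("icaro", "home_globoesporte", "pageview", "23:00"),
    Signal("icaro", "materia_1", "scroll+2min", "14:00"),
]
content_description = {"materia_1": ["messi", "neymar", "barcelona"]}


def entities_seen(user, signals, descriptions):
    """Count the annotated entities in the content a user interacted with."""
    counts = Counter()
    for signal in signals:
        if signal.user == user:
            # Content without annotations (e.g. a home page) adds nothing.
            counts.update(descriptions.get(signal.content_id, []))
    return counts
```

Calling entities_seen("icaro", signals, content_description) counts messi, neymar and barcelona once each; the home page pageview contributes nothing because it has no annotations in this toy mapping.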
LINKED DATA (content)
Ontologies
‣ 288 classes
‣ Person: 65K
‣ Place: 50K
‣ Athlete: 22K
‣ Politicians: 32K
Annotation tool
Interface follows the ontology
Fields
Search ranges
Suggest as you type
Triples stored in Virtuoso
Automatic entity extraction
Fast search in Elasticsearch
Contextual navigation
globoesporte.com
Automatic page generation
Intelligent Search
BIG DATA
Cluster Stats
‣ 10 machines
‣ 1 TB RAM
‣ 500 TB disk
‣ 338 VCores
Signal Capturing
Beyond clicks (engagement science)
‣ Attention-based metrics
‣ Scroll
‣ Time spent on page
‣ Dwell time
‣ Social Media Analytics
http://labs.yahoo.com/publication/beyond-clicks-dwell-time-for-personalization/
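A rough sketch of the attention-based metrics listed above, assuming each page view produces a stream of timestamped events that carry the current scroll depth. The PageEvent shape and the attention_metrics function are assumptions for illustration; the slides do not describe the actual event format.

```python
from typing import NamedTuple


class PageEvent(NamedTuple):
    """Hypothetical client-side event for a single page view."""
    t: float             # seconds since the page was loaded
    kind: str            # e.g. "load", "scroll", "unload"
    scroll_depth: float  # fraction of the page scrolled so far, 0.0 to 1.0


def attention_metrics(events):
    """Derive dwell time and maximum scroll depth from one page view."""
    if not events:
        return {"dwell_seconds": 0.0, "max_scroll_depth": 0.0}
    ordered = sorted(events, key=lambda e: e.t)
    return {
        # Dwell time: span between the first and last observed event.
        "dwell_seconds": ordered[-1].t - ordered[0].t,
        # Scroll: the deepest point the reader reached.
        "max_scroll_depth": max(e.scroll_depth for e in ordered),
    }
```

With a load at 0 s, a scroll to 40% at 30 s and an unload at 120 s with 90% depth, this gives a dwell time of 120 seconds and a maximum scroll depth of 0.9.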
Shares are noisy
http://time.com/12933/what-you-think-you-know-about-the-web-is-wrong/
Scroll
http://time.com/12933/what-you-think-you-know-about-the-web-is-wrong/
Recommendation
‣ TF-IDF
‣ Collaborative Filtering
‣ Users
‣ Content
‣ Latent Factor Analysis
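As an illustration of the first technique in the list, here is a minimal TF-IDF sketch over annotated content, assuming each article is reduced to a list of entity tokens. It uses the plain tf * log(N / df) weighting; the slides do not say which variant is used in production, and the function name tf_idf is hypothetical.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return {doc_id: {term: weight}} using tf * log(N / df).

    docs maps a document id to its list of tokens.
    """
    n_docs = len(docs)
    # Document frequency: in how many documents each term appears.
    df = Counter()
    for tokens in docs.values():
        df.update(set(tokens))

    weights = {}
    for doc_id, tokens in docs.items():
        tf = Counter(tokens)
        weights[doc_id] = {
            term: (count / len(tokens)) * math.log(n_docs / df[term])
            for term, count in tf.items()
        }
    return weights
```

A term that appears in every document gets weight zero, so entities shared by all articles stop dominating the comparison and rarer entities stand out.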
USER SCIENCE for news reading
User Modeling (for news reading)
‣ Dynamic profiling
‣ Explicit personal data
‣ Interests (implicit)
‣ Temporal constraints: periodicity
Signal Capturing
Excelsior
Signals
Semantic User Modeling
‣ Annotations from engaged content
‣ Profile can answer:
‣ My favourite team
‣ City I live in
‣ My hometown
Spreading Activation
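A minimal sketch of spreading activation over an entity graph, assuming the graph is a mapping from each node to weighted neighbours and that the seed activations come from entities in content the user engaged with. The decay, threshold and step limit are illustrative parameters, not values from the slides, and the slides do not describe how the technique is actually applied.

```python
def spread_activation(graph, seeds, decay=0.5, threshold=0.01, max_steps=3):
    """Propagate activation from seed nodes through a weighted graph.

    graph: {node: [(neighbour, weight), ...]}
    seeds: {node: initial_activation}
    """
    activation = dict(seeds)
    frontier = dict(seeds)
    for _ in range(max_steps):
        next_frontier = {}
        for node, value in frontier.items():
            for neighbour, weight in graph.get(node, []):
                # Each hop passes on a decayed share of the activation.
                delta = value * weight * decay
                if delta >= threshold:
                    next_frontier[neighbour] = (
                        next_frontier.get(neighbour, 0.0) + delta
                    )
        if not next_frontier:
            break
        for node, delta in next_frontier.items():
            activation[node] = activation.get(node, 0.0) + delta
        frontier = next_frontier
    return activation
```

For example, if messi and neymar each link to barcelona with weight 1.0 and both start with activation 1.0, barcelona ends up with 0.5 + 0.5 = 1.0 after one step, even though the user never interacted with the team entity directly.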
My profile on
City/State I live in
Hometown and State
Football team test (3.5 million users)
82% precision
95% precision@top3
* When the user has read at least one article that cites their team
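One plausible reading of the two figures above is the share of users whose true team is the top prediction, and the share whose true team appears among the top three predictions. The sketch below computes that under this assumption; the function name and the sample data are hypothetical and not taken from the test itself.

```python
def hit_rate_at_k(predictions, truth, k):
    """Fraction of users whose true label is among their top-k predictions.

    predictions: {user: [label, ...]} ranked from most to least likely
    truth: {user: label}
    """
    users = [u for u in truth if u in predictions]
    if not users:
        return 0.0
    hits = sum(1 for u in users if truth[u] in predictions[u][:k])
    return hits / len(users)


# Illustrative data only.
predictions = {
    "u1": ["flamengo", "vasco", "botafogo"],
    "u2": ["vasco", "flamengo", "fluminense"],
}
truth = {"u1": "flamengo", "u2": "flamengo"}
```

With this toy data, hit_rate_at_k(predictions, truth, 1) is 0.5 and hit_rate_at_k(predictions, truth, 3) is 1.0, which mirrors how the top-3 figure can be higher than the top-1 figure.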
How fast?
‣ 5 min between interaction and profile update
‣ 48 ms mean request time
Potential uses
‣ Personalized homepages
‣ Targeted advertising
‣ Granular user/content description
‣ Semantic Recommendation
‣ Clustering
‣ Demographic data
‣ Informed product creation/evolution
github.com/globocom/IWantToWorkAtGloboCom
Ícaro Medeiros
Semantic team
globo.com
Slides: icaromedeiros.com.br
slideshare.net/icaromedeiros