Machine Learning Techniques for the Semantic Web
Machine Learning
Semantic Web
What is Semantic Web?
Ontology
RDF
Machine Learning is about Data
actually...
Making Predictions Based on Data
FOAF
Simple Example
Marco Neumann
<http://www.marconeumann.org/foaf.rdf> <http://xmlns.com/foaf/0.1/knows> <http://community.linkeddata.org/dataspace/person/kidehen2/about.rdf> .
<http://www.marconeumann.org/foaf.rdf> <http://xmlns.com/foaf/0.1/knows> <http://www.johnbreslin.com/foaf/foaf.rdf> .
<http://www.marconeumann.org/foaf.rdf> <http://xmlns.com/foaf/0.1/knows> <http://swordfish.rdfweb.org/people/libby/rdfweb/webwho.xrdf> .
<http://www.marconeumann.org/foaf.rdf> <http://xmlns.com/foaf/0.1/knows> <http://danbri.org/foaf.rdf> .
Marco only knows 4 people?
Two Degrees Out
4 - <http://www.w3.org/People/Connolly/home-smart.rdf>
4 - <http://jibbering.com/foaf.rdf>
2 - <http://sw.deri.org/~haller/foaf.rdf>
2 - <http://sw.deri.org/~knud/knudfoaf.rdf>
2 - <http://www-cdr.stanford.edu/~petrie/foaf.rdf>
Three Degrees
9 - <http://sw.deri.org/~knud/knudfoaf.rdf>
8 - <http://www.w3.org/People/Connolly/home-smart.rdf>
7 - <http://jibbering.com/foaf.rdf>
6 - <http://www.aaronsw.com/about.xrdf>
5 - <http://sw.deri.org/~aharth/foaf.rdf>
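The slides do not say how the counts at two and three degrees were produced. One plausible reading is that each number is the count of distinct "knows" paths that reach a person. The sketch below works under that assumption; the toy triples and the names `build_graph` and `count_paths` are made up for illustration.

```python
from collections import Counter, defaultdict

def build_graph(triples):
    """Map each subject to the set of people it 'knows'."""
    graph = defaultdict(set)
    for subject, predicate, obj in triples:
        if predicate == "knows":
            graph[subject].add(obj)
    return graph

def count_paths(graph, start, degrees):
    """Count 'knows' paths of exactly `degrees` hops from `start` to each person."""
    frontier = Counter({start: 1})
    for _ in range(degrees):
        next_frontier = Counter()
        for person, paths in frontier.items():
            for friend in graph.get(person, ()):
                next_frontier[friend] += paths
        frontier = next_frontier
    frontier.pop(start, None)  # ignore paths that loop back to the start
    return frontier

# Toy data with hypothetical names, not the FOAF files from the slides.
triples = [
    ("marco", "knows", "a"),
    ("marco", "knows", "b"),
    ("a", "knows", "c"),
    ("b", "knows", "c"),
    ("b", "knows", "d"),
    ("c", "knows", "e"),
    ("d", "knows", "e"),
]
graph = build_graph(triples)
two_out = count_paths(graph, "marco", 2)
three_out = count_paths(graph, "marco", 3)
```

Sorting either result with `most_common()` gives a ranked list in the same "count - person" shape as the slides.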
but that’s not really machine learning
Short
Machine Learning is
• How you formulate the problem
• How you represent the data
• Graphical Models
• Vector Space Models
Back to FOAF
Convert RDF triples to vector space
We Want to Find Groups of People
To make predictions on their interests...
(subject) (predicate) (object)
Paul knows Jeff
Paul knows Joe
Paul knows Marco
Jeff knows Joe
Vector Space Representation
        Jeff  Joe  Marco  Paul
Jeff    0     1    0      1
Joe     1     0    0      1
Marco   0     0    0      1
Paul    1     1    1      0
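A minimal sketch of the conversion, assuming "knows" is treated as symmetric (the row counts in the matrix only match the four triples under that reading). The function name `knows_matrix` is illustrative.

```python
triples = [
    ("Paul", "knows", "Jeff"),
    ("Paul", "knows", "Joe"),
    ("Paul", "knows", "Marco"),
    ("Jeff", "knows", "Joe"),
]

def knows_matrix(triples, symmetric=True):
    """Build a people-by-people 0/1 matrix from 'knows' triples."""
    people = sorted({s for s, _, _ in triples} | {o for _, _, o in triples})
    index = {name: i for i, name in enumerate(people)}
    matrix = [[0] * len(people) for _ in people]
    for subject, predicate, obj in triples:
        if predicate != "knows":
            continue
        matrix[index[subject]][index[obj]] = 1
        if symmetric:
            # Assumption: if Paul knows Jeff, Jeff also knows Paul.
            matrix[index[obj]][index[subject]] = 1
    return people, matrix

people, matrix = knows_matrix(triples)
for name, row in zip(people, matrix):
    print(name, row)
```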
Latent Factors Analysis
• Used in Latent Semantic Indexing (LSI)
• Good for finding synonyms
• Good for finding “genres”
Latent Factors Methods
• Principal Component Analysis (PCA)
• Singular Value Decomposition (SVD)
• Restricted Boltzmann Machines (RBM)
Considerations for Semantic Web Data
• Large Data Sets
• Sparse Data Sets
Netflix Prize Research
• Movie Review Data set has similar problems
• Generalized Hebbian Algorithm for Dimensionality Reduction in NLP (Gorrell ’06)
Reduce Dimensions
• 1m x 1m matrix with 1m people
• Reduce to 1m x 100
100 Latent Factors
Represent different groups of people based on who they know.
What the Data Might Look Like
        Factor 1  Factor 2
Paul    0.678     0.311
Joe     0.455     0.432
Jeff    0.476     0.398
Marco   0.203     0.789
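To make the reduction concrete, here is a rough sketch that pulls the two strongest latent factors out of the small "knows" matrix using plain power iteration with deflation. This is a stand-in chosen for brevity, not the Generalized Hebbian Algorithm the slides cite, and it would not scale to a 1m x 1m matrix. The factor values in the slides are illustrative, so the numbers produced here will differ. All function names are hypothetical.

```python
import math
import random

def mat_vec(matrix, vector):
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]

def normalize(vector):
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector] if norm else vector

def top_factors(matrix, k, iterations=500, seed=0):
    """Return the k largest singular values and their right singular vectors."""
    rng = random.Random(seed)
    n = len(matrix[0])
    # Gram matrix A^T A; its eigenvectors are the right singular vectors of A.
    gram = [[sum(row[i] * row[j] for row in matrix) for j in range(n)]
            for i in range(n)]
    values, vectors = [], []
    for _ in range(k):
        v = normalize([rng.random() + 0.1 for _ in range(n)])
        for _ in range(iterations):
            v = normalize(mat_vec(gram, v))
        eigenvalue = sum(x * y for x, y in zip(v, mat_vec(gram, v)))
        values.append(math.sqrt(max(eigenvalue, 0.0)))
        vectors.append(v)
        # Deflate: remove the factor just found so the next pass finds a new one.
        for i in range(n):
            for j in range(n):
                gram[i][j] -= eigenvalue * v[i] * v[j]
    return values, vectors

def project(matrix, vectors):
    """Reduce each row to one coordinate per factor (rows x k)."""
    return [[sum(x * y for x, y in zip(row, v)) for v in vectors]
            for row in matrix]

people = ["Jeff", "Joe", "Marco", "Paul"]
knows = [
    [0, 1, 0, 1],
    [1, 0, 0, 1],
    [0, 0, 0, 1],
    [1, 1, 1, 0],
]
values, vectors = top_factors(knows, k=2)
factors = project(knows, vectors)  # 4 x 2 instead of 4 x 4
for name, row in zip(people, factors):
    print(name, [round(x, 3) for x in row])
```

The sign of each factor is arbitrary, so only the relative positions of people within a factor carry meaning.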
Find Similar People
k Nearest Neighbors
Pick a Similarity Metric
• Euclidean Distance
• Jaccard index
• Cosine Similarity
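As a rough illustration of two of these metrics, the sketch below computes the Jaccard index on the sets of people Jeff and Joe know, and cosine similarity on their rows of the "knows" matrix. It assumes the symmetric reading of "knows"; the function names are illustrative.

```python
import math

def jaccard(a, b):
    """Size of the intersection divided by size of the union."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def cosine(u, v):
    """Cosine of the angle between two vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0

# Who each person knows, assuming "knows" is symmetric.
jeff_knows = {"Joe", "Paul"}
joe_knows = {"Jeff", "Paul"}

# The same information as matrix rows (columns: Jeff, Joe, Marco, Paul).
jeff_row = [0, 1, 0, 1]
joe_row = [1, 0, 0, 1]

print(jaccard(jeff_knows, joe_knows))  # one shared contact out of three
print(cosine(jeff_row, joe_row))
```

Both metrics grow with similarity, while Euclidean distance shrinks, so rankings need to be sorted in the matching direction.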
Joe’s Similarity to Paul
((Paul(f1) - Joe(f1))^2 + (Paul(f2) - Joe(f2))^2)^(1/2)
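Plugging the example factor values from the slides into that formula gives a distance of roughly 0.254 between Paul and Joe. The sketch below does the arithmetic and then ranks neighbors by distance; `nearest_neighbors` is an illustrative helper, and a smaller distance means more similar.

```python
import math

# Example factor values taken from the slides.
factors = {
    "Paul": (0.678, 0.311),
    "Joe": (0.455, 0.432),
    "Jeff": (0.476, 0.398),
    "Marco": (0.203, 0.789),
}

def euclidean(u, v):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(u, v)))

def nearest_neighbors(name, factors, k):
    """Return the k people closest to `name`, nearest first."""
    distances = [
        (euclidean(factors[name], vector), other)
        for other, vector in factors.items()
        if other != name
    ]
    return [other for _, other in sorted(distances)[:k]]

print(round(euclidean(factors["Paul"], factors["Joe"]), 3))
print(nearest_neighbors("Paul", factors, k=2))
```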
Once We’ve Calculated Similarities
• Fill In Missing Interests
• Target Ads, Content, Products
• ???
• Profit!
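One way to fill in missing interests is to let a person's nearest neighbors vote. The sketch below assumes this simple voting scheme, which the slides do not spell out. The factor values come from the slides, but the interests are invented purely for illustration, as is the name `suggest_interests`.

```python
import math
from collections import Counter

# Example factor values taken from the slides.
factors = {
    "Paul": (0.678, 0.311),
    "Joe": (0.455, 0.432),
    "Jeff": (0.476, 0.398),
    "Marco": (0.203, 0.789),
}

# Hypothetical interests, made up for this example. Paul's are missing.
interests = {
    "Paul": set(),
    "Joe": {"ruby", "machine learning"},
    "Jeff": {"ruby", "semantic web"},
    "Marco": {"semantic web", "rdf"},
}

def distance(u, v):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(u, v)))

def suggest_interests(name, factors, interests, k=2):
    """Rank interests held by the k nearest neighbors that `name` lacks."""
    others = [p for p in factors if p != name]
    neighbors = sorted(others, key=lambda p: distance(factors[name], factors[p]))[:k]
    votes = Counter()
    for neighbor in neighbors:
        votes.update(interests[neighbor] - interests[name])
    # Most votes first; ties broken alphabetically for a stable order.
    return sorted(votes, key=lambda interest: (-votes[interest], interest))

print(suggest_interests("Paul", factors, interests))
```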
Generalizing RDF Triples to Vector Space
• Subjects are Rows
• Objects are Columns
• Predicates are Values
            Object 1    Object 2
Subject 1   Predicate
Subject 2
Predicates Should be Mutually Exclusive
• Paul likes Ruby
• Paul hates PHP
• Paul loves PHP
Assign Values to Predicates
• 1 = Hates
• 2 = Dislikes
• 3 = Neutral
• 4 = Likes
• 5 = Loves
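A minimal sketch of the generalization: subjects become rows, objects become columns, and each predicate is replaced by its numeric value. A dict of dicts keeps the matrix sparse. Rejecting conflicting predicates with an error is one possible policy, assumed here for illustration; `ratings_matrix` is a hypothetical name.

```python
# Predicate-to-value scale from the slides.
RATING = {"hates": 1, "dislikes": 2, "neutral": 3, "likes": 4, "loves": 5}

def ratings_matrix(triples):
    """Turn (subject, predicate, object) triples into {subject: {object: value}}."""
    matrix = {}
    for subject, predicate, obj in triples:
        value = RATING[predicate.lower()]
        row = matrix.setdefault(subject, {})
        if obj in row and row[obj] != value:
            # Predicates must be mutually exclusive for a given subject and object.
            raise ValueError(
                f"conflicting predicates for {subject} and {obj}: "
                f"{row[obj]} vs {value}"
            )
        row[obj] = value
    return matrix

consistent = [
    ("Paul", "likes", "Ruby"),
    ("Paul", "hates", "PHP"),
]
matrix = ratings_matrix(consistent)
print(matrix)

# Adding "Paul loves PHP" contradicts "Paul hates PHP" and is rejected.
try:
    ratings_matrix(consistent + [("Paul", "loves", "PHP")])
except ValueError as error:
    print("rejected:", error)
```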
More Applications
Supervised Learning
• Classifiers
• Ontology Mapping
• Assigning Instances to Concepts
Ontology Mapping
• Examples from Ontology A
• Examples from Ontology B
Train Classifiers
• One Classifier for each Concept in A
• One Classifier for each Concept in B
Classify Instances
• Use A Classifiers to predict which concepts B instances map to
• Use B Classifiers to predict which concepts A instances map to
Use Classified Instances
• Predict Concept Mappings
• Which Concepts in A Match Which Concepts in B
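The mapping procedure can be sketched end to end with a deliberately crude stand-in for a real classifier: each concept's "classifier" is just the set of words seen in its examples, and an instance goes to the concept whose vocabulary covers most of its words. The ontologies, instance strings, and function names below are all hypothetical.

```python
from collections import Counter

def train(ontology):
    """One toy 'classifier' per concept: the vocabulary of its examples."""
    return {
        concept: {token for example in examples for token in example.lower().split()}
        for concept, examples in ontology.items()
    }

def classify(classifiers, instance):
    """Assign an instance to the concept whose vocabulary covers it best."""
    tokens = instance.lower().split()

    def score(concept):
        vocabulary = classifiers[concept]
        return sum(token in vocabulary for token in tokens) / max(len(tokens), 1)

    return max(sorted(classifiers), key=score)

def predict_mappings(source, target):
    """Map each source concept to the target concept its instances land in most."""
    classifiers = train(target)
    mappings = {}
    for concept, examples in source.items():
        if not examples:
            continue
        votes = Counter(classify(classifiers, example) for example in examples)
        mappings[concept] = votes.most_common(1)[0][0]
    return mappings

# Hypothetical ontologies with made-up instance descriptions.
ontology_a = {
    "Person": ["alice software developer", "bob university researcher"],
    "Document": ["slides about machine learning", "paper about semantic web"],
}
ontology_b = {
    "Agent": ["carol developer and researcher", "dave software engineer"],
    "Publication": ["paper about ontology mapping", "book about machine learning"],
}

a_to_b = predict_mappings(ontology_a, ontology_b)  # B classifiers label A instances
b_to_a = predict_mappings(ontology_b, ontology_a)  # A classifiers label B instances
print(a_to_b)
print(b_to_a)
```

Concept pairs that agree in both directions are the strongest mapping candidates.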
Limitations
• One Classifier per Concept
• Large Ontologies Could be a Problem
• Ontologies Need to Be at Least Somewhat Similar
Unsupervised Learning
• Clustering
• Hierarchical Clustering
• Learning Ontologies from Text
Machine Learning as Triage
• Automatically tag or recommend examples the algorithm is certain about
• Send uncertain examples to a human for review
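A minimal sketch of the triage step, assuming each prediction arrives with a confidence score between 0 and 1. The 0.9 threshold and the sample predictions are arbitrary choices for illustration.

```python
def triage(predictions, threshold=0.9):
    """Split predictions into ones to apply automatically and ones for review."""
    automatic, needs_review = [], []
    for item, label, confidence in predictions:
        if confidence >= threshold:
            automatic.append((item, label))
        else:
            needs_review.append((item, label))
    return automatic, needs_review

# Hypothetical classifier output: (item, predicted label, confidence).
predictions = [
    ("doc-1", "Publication", 0.97),
    ("doc-2", "Agent", 0.55),
    ("doc-3", "Publication", 0.91),
]
automatic, needs_review = triage(predictions)
print("tag automatically:", automatic)
print("send to a human:", needs_review)
```

Raising the threshold sends more work to people and lowers the risk of wrong automatic tags.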
Thank You
Paul Dix
[email protected]
pauldix.net