PhD Viva - Disambiguating Identity Web References using Social Data

Upload: matthew-rowe

Post on 08-May-2015

TRANSCRIPT

Page 1: PhD Viva - Disambiguating Identity Web References using Social Data

Matthew Rowe - Disambiguating Identity Web References using Social Data

Disambiguating Identity Web References using Social Data

Matthew Rowe

Organisations, Information and Knowledge Group

Department of Computer Science

University of Sheffield

Page 2: PhD Viva - Disambiguating Identity Web References using Social Data

Outline

• Problem Setting
• Research Questions
• Claims of the Thesis
• State of the Art
• Requirements for Disambiguation and Seed Data
• Disambiguating Identity Web References
  – Leveraging Seed Data from the Social Web
  – Generating Metadata Models
  – Disambiguation Techniques
• Evaluation
• Conclusions
• Dissemination and Impact

Page 3: PhD Viva - Disambiguating Identity Web References using Social Data

Personal Information on the Web

• Personal information on the Web is disseminated:
  – Voluntarily
  – Involuntarily
• Increase in personal information:
  – Identity Theft
  – Lateral Surveillance
• Web users must discover their identity web references
  – 2 stage process
    • Finding
    • Disambiguating
  – Disambiguation = reduction of web reference ambiguity
• My thesis addresses disambiguation

Page 4: PhD Viva - Disambiguating Identity Web References using Social Data

Ambiguity!

Page 5: PhD Viva - Disambiguating Identity Web References using Social Data

Matthew Rowe: Composer

Page 6: PhD Viva - Disambiguating Identity Web References using Social Data

Matthew Rowe: Cyclist

Page 7: PhD Viva - Disambiguating Identity Web References using Social Data

Matthew Rowe: Gardener

Page 8: PhD Viva - Disambiguating Identity Web References using Social Data

Matthew Rowe: Song Writer

Page 9: PhD Viva - Disambiguating Identity Web References using Social Data

Matthew Rowe: PhD Student

Page 10: PhD Viva - Disambiguating Identity Web References using Social Data

Problem Setting

• Performing disambiguation manually:
  – Time consuming
  – Laborious
    • Handle masses of information
  – Repeated often
    • The Web keeps changing
• Solution = automated techniques
  – Alleviate the need for humans
  – Need background knowledge
    • Who am I searching for?
    • What makes them unique?

Page 11: PhD Viva - Disambiguating Identity Web References using Social Data

Research Questions

How can identity web references be disambiguated automatically?

1. Alleviate human processing:
  • Can automated techniques replace humans?
2. Supervision:
  • Can automated techniques function independently?
3. Seed Data:
  • How can this be gathered inexpensively?
4. Interpretation:
  • How can automated techniques interpret information?

Page 12: PhD Viva - Disambiguating Identity Web References using Social Data

Claims of the Thesis

• Automated disambiguation techniques are able to replace human processing
  – Retrieve and process information at large-scale
  – With high accuracy
• Data found on Social Web platforms is representative of real identity information
  – Platforms allow users to build a digital identity
• Social data provides the background knowledge required by automated disambiguation techniques
  – Overcoming the burden of seed data generation

Page 13: PhD Viva - Disambiguating Identity Web References using Social Data

State of the Art

• Disambiguation techniques are divisible into 2 types:
  – Seeded techniques
    • E.g. [Bekkerman and McCallum, 2005], Commercial Services
    • Pros:
      – Disambiguate web references for a single person
    • Cons:
      – Require seed data
      – No explanation of how seed data is acquired
  – Unseeded techniques
    • E.g. [Song et al, 2007]
    • Pros:
      – Require no background knowledge
    • Cons:
      – Group web references into clusters
      – Need to choose the correct cluster

Page 14: PhD Viva - Disambiguating Identity Web References using Social Data

Requirements

• Requirements for Seeded Disambiguation:
  – Bootstrap the disambiguation process with minimal supervision
  – Achieve disambiguation accuracy comparable to human processing
  – Cope with web resources not containing seed data features
  – Disambiguation must be effective for all individuals
• Requirements for Seed Data:
  – Produce seed data with minimal cost
  – Generate reliable seed data

Page 15: PhD Viva - Disambiguating Identity Web References using Social Data

Disambiguating Identity Web References

Page 16: PhD Viva - Disambiguating Identity Web References using Social Data

Harnessing the Social Web

• WWW has evolved into a web of participation
• Digital identity is important on the Social Web
• Digital identity is fragmented across the Social Web
• Data Portability from Social Web platforms is limited

http://www.economist.com/business/displaystory.cfm?story_id=10880936

Page 17: PhD Viva - Disambiguating Identity Web References using Social Data

Data found on Social Web platforms is representative of real identity information

Page 18: PhD Viva - Disambiguating Identity Web References using Social Data

User Study

• 50 participants from the University of Sheffield
• Consisted of 3 stages, each participant:
  1. List real world social network
  2. Extract digital social network
  3. Compare networks

M Rowe. The Credibility of Digital Identity Information on the Social Web: A User Study. In proceedings of 4th Workshop on Information Credibility on the Web, World Wide Web Conference 2010. Raleigh, USA. (2010)

Data found on Social Web platforms is representative of real identity information

Relevance: 0.23
Coverage: 0.77

Updates previous findings [Subrahmanyam et al, 2008]
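
The two figures can be read as set-overlap measures between a participant's real-world and digital social networks. The sketch below is only illustrative: it assumes coverage is the share of the real-world network that is also found online (in line with the 77% figure quoted in the conclusions) and assumes relevance is the share of the digital network that also appears in the real-world network. The relevance definition, the function names, and the toy data are assumptions, not taken from the user study.

```python
def coverage(real_world, digital):
    """Share of the real-world network that also appears online."""
    real_world, digital = set(real_world), set(digital)
    return len(real_world & digital) / len(real_world) if real_world else 0.0


def relevance(real_world, digital):
    """Share of the digital network also in the real-world network (assumed definition)."""
    real_world, digital = set(real_world), set(digital)
    return len(real_world & digital) / len(digital) if digital else 0.0


# Hypothetical participant: 4 real-world contacts, 10 online contacts, 3 shared.
real = {"alice", "bob", "carol", "dave"}
online = {"alice", "bob", "carol", "erin", "frank",
          "grace", "heidi", "ivan", "judy", "mallory"}

print(coverage(real, online), relevance(real, online))
```

Under these definitions, a high coverage with a low relevance means most real-world contacts are present online, but they make up only a small part of the online contact list.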

Page 19: PhD Viva - Disambiguating Identity Web References using Social Data

Disambiguating Identity Web References

Page 20: PhD Viva - Disambiguating Identity Web References using Social Data

Leveraging Seed Data from the Social Web

3. Seed Data:
  • How can this be gathered inexpensively?

Page 21: PhD Viva - Disambiguating Identity Web References using Social Data

Leveraging Seed Data from the Social Web

M Rowe and F Ciravegna. Getting to Me - Exporting Semantic Social Network Information from Facebook. In proceedings of Social Data on the Web Workshop, ISWC 2008, Karlsruhe, Germany. (2008)

Use Semantics!

http://www.dcs.shef.ac.uk/~mrowe/foafgenerator.html

Page 22: PhD Viva - Disambiguating Identity Web References using Social Data

Leveraging Seed Data from the Social Web

Link things together!

Page 23: PhD Viva - Disambiguating Identity Web References using Social Data

Leveraging Seed Data from the Social Web

1. Blocking Step
  • Only compare people with the same name
2. Compare values of Inverse Functional Properties
  • E.g. Homepage/Email
3. Compare Geo URIs
  • E.g. Matching locations
4. Compare Geo data
  • Using Linked Data sources

M Rowe. Interlinking Distributed Social Graphs. In proceedings of Linked Data on the Web Workshop, World Wide Web Conference, Madrid, Spain. (2009)
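
The first three interlinking steps can be sketched in a few lines of Python. This is a simplified illustration, not the published implementation: profiles are plain dictionaries rather than RDF, the field names (`id`, `name`, `homepage`, `mbox`, `geo_uri`) and the example.org URIs are hypothetical, and step 4 (comparing geo data via Linked Data sources) is left out because it needs remote lookups.

```python
from collections import defaultdict

# Inverse functional properties: a shared value implies the same person.
IFPS = ("homepage", "mbox")


def interlink(profiles_a, profiles_b):
    """Return pairs of profile ids judged to describe the same person."""
    # Step 1 (blocking): index the second platform by normalised name.
    by_name = defaultdict(list)
    for p in profiles_b:
        by_name[p["name"].strip().lower()].append(p)

    matches = []
    for a in profiles_a:
        for b in by_name.get(a["name"].strip().lower(), []):
            # Step 2: a shared inverse functional property value is a match.
            if any(a.get(k) and a.get(k) == b.get(k) for k in IFPS):
                matches.append((a["id"], b["id"]))
            # Step 3: otherwise fall back to comparing location URIs.
            elif a.get("geo_uri") and a.get("geo_uri") == b.get("geo_uri"):
                matches.append((a["id"], b["id"]))
    return matches


platform_a = [
    {"id": "a:1", "name": "Matthew Rowe", "homepage": "http://example.org/~mrowe"},
    {"id": "a:2", "name": "Matthew Rowe", "geo_uri": "http://example.org/geo/sheffield"},
]
platform_b = [
    {"id": "b:9", "name": "matthew rowe", "homepage": "http://example.org/~mrowe"},
    {"id": "b:7", "name": "Matthew Rowe", "geo_uri": "http://example.org/geo/london"},
]
links = interlink(platform_a, platform_b)
```

Blocking keeps the comparison cost down: only profiles that share a name are ever compared on their property values.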

Page 24: PhD Viva - Disambiguating Identity Web References using Social Data

Leveraging Seed Data from the Social Web

• Allows remote resource information to change
• Automated techniques:
  – Follow the links
  – Retrieve the instance information

Page 25: PhD Viva - Disambiguating Identity Web References using Social Data

Disambiguating Identity Web References

Page 26: PhD Viva - Disambiguating Identity Web References using Social Data

Generating Metadata Models

• Input to disambiguation techniques is a set of web resources
• Web resources come in many flavours:
  – Data models
  – XHTML documents containing embedded semantics
  – HTML documents

4. Interpretation:
  • How can automated techniques interpret information?

• Solution = Semantic Web technologies!
  – Convert web resources to RDF
  – Metadata descriptions = ontology concepts
• Information is
  – Consistent
  – Interpretable

Page 27: PhD Viva - Disambiguating Identity Web References using Social Data

Generating RDF Models from XHTML Documents

http://events.linkeddata.org/ldow2009/

Page 28: PhD Viva - Disambiguating Identity Web References using Social Data

Generating RDF Models from XHTML Documents

Page 29: PhD Viva - Disambiguating Identity Web References using Social Data

Generating RDF Models from HTML Documents

• Rise in use of lowercase semantics!
  – However only 2.6% of web documents contain semantics [Mika et al, 2009]
• Majority of the web is HTML
  – Bad for machines
• Must extract person information
  – Then build an RDF model
• Person information is structured
  – for legibility
  – for segmentation
    • i.e. logical distinction between elements

Page 30: PhD Viva - Disambiguating Identity Web References using Social Data

Generating RDF Models from HTML Documents

Page 31: PhD Viva - Disambiguating Identity Web References using Social Data

Generating RDF Models from HTML Documents

• HTML is often poorly structured
  – Need a Document Object Model
  – Therefore Tidy it!

Page 32: PhD Viva - Disambiguating Identity Web References using Social Data

Generating RDF Models from HTML Documents

• Identify document segments for extraction
  – 1 window = Info about 1 person
  – Get XPath expression to the window

Page 33: PhD Viva - Disambiguating Identity Web References using Social Data

Generating RDF Models from HTML Documents

• Extract information using a Hidden Markov Model
  – E.g. name, email, www, location
  – Train model parameters: transition probs, emission probs, start probs
  – Use Viterbi algorithm to label tokens with states
    • Returns most likely state sequence
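
The labelling step can be illustrated with a small Viterbi decoder. The sketch below works in log space and is a generic textbook version. The three states, the hand-set probabilities, the `floor` value for unseen tokens, and the example.org address are all illustrative assumptions; in the approach described above the parameters would be trained from data.

```python
import math


def viterbi(tokens, states, start_p, trans_p, emit_p, floor=1e-6):
    """Return the most likely state sequence for tokens under an HMM."""
    def emit(state, tok):
        # Unseen tokens get a small floor probability instead of zero.
        return math.log(emit_p[state].get(tok, floor))

    # best[t][s] = (log-prob of the best path ending in s at step t, previous state)
    best = [{s: (math.log(start_p[s]) + emit(s, tokens[0]), None) for s in states}]
    for tok in tokens[1:]:
        prev = best[-1]
        column = {}
        for s in states:
            score, back = max(
                (prev[r][0] + math.log(trans_p[r][s]), r) for r in states
            )
            column[s] = (score + emit(s, tok), back)
        best.append(column)

    # Backtrack from the best final state.
    state = max(states, key=lambda s: best[-1][s][0])
    path = [state]
    for column in reversed(best[1:]):
        state = column[state][1]
        path.append(state)
    return path[::-1]


states = ["name", "email", "other"]
start_p = {"name": 0.6, "email": 0.1, "other": 0.3}
trans_p = {
    "name": {"name": 0.6, "email": 0.1, "other": 0.3},
    "email": {"name": 0.1, "email": 0.2, "other": 0.7},
    "other": {"name": 0.2, "email": 0.4, "other": 0.4},
}
emit_p = {
    "name": {"matthew": 0.4, "rowe": 0.4},
    "email": {"m.rowe@example.org": 0.9},
    "other": {"email": 0.5, ":": 0.4},
}
tokens = ["matthew", "rowe", "email", ":", "m.rowe@example.org"]
labels = viterbi(tokens, states, start_p, trans_p, emit_p)
```

Tokens that share a label can then be grouped into attribute values (for example the two `name` tokens) before the RDF model is built.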

Page 34: PhD Viva - Disambiguating Identity Web References using Social Data

Generating RDF Models from HTML Documents

M Rowe. Data.dcs: Converting Legacy Data into Linked Data. In proceedings of Linked Data on the Web Workshop, World Wide Web Conference 2010. Raleigh, USA. (2010)

Page 35: PhD Viva - Disambiguating Identity Web References using Social Data

Disambiguating Identity Web References

Page 36: PhD Viva - Disambiguating Identity Web References using Social Data

Disambiguation 1: Inference Rules

1. Extract instances from Seed Data
2. For each instance, build a rule:
  • Build a skeleton rule
  • Add triples to the rule
  • Create a new rule if a triple’s predicate is Inverse Functional
3. Apply the rules to the web resources

Page 38: PhD Viva - Disambiguating Identity Web References using Social Data

Disambiguation 1: Inference Rules

1. Extract instances
2. For each instance, build a rule:
  • Build a skeleton rule
  • Add triples to the rule
  • Create a new rule if a triple’s predicate is Inverse Functional
3. Apply the rules to the web resources

PREFIX foaf: <http://xmlns.com/foaf/0.1/>
CONSTRUCT { <http://www.dcs.shef.ac.uk/~mrowe/foaf.rdf#me> foaf:page ?url }
WHERE {
  <http://www.dcs.shef.ac.uk/~mrowe/foaf.rdf#me> foaf:name ?n .
  ?url foaf:topic ?p .
  ?p foaf:name ?n .
  <http://www.dcs.shef.ac.uk/~mrowe/foaf.rdf#me> foaf:knows ?q .
  ?q foaf:name ?m .
  ?url foaf:topic ?r .
  ?r foaf:name ?m
}

Page 39: PhD Viva - Disambiguating Identity Web References using Social Data

Disambiguation 1: Inference Rules

1. Extract instances
2. For each instance, build a rule:
  • Build a skeleton rule
  • Add triples to the rule
  • Create a new rule if a triple’s predicate is Inverse Functional
3. Apply the rules to the web resources

PREFIX foaf: <http://xmlns.com/foaf/0.1/>
CONSTRUCT { <http://www.dcs.shef.ac.uk/~mrowe/foaf.rdf#me> foaf:page ?url }
WHERE {
  <http://www.dcs.shef.ac.uk/~mrowe/foaf.rdf#me> foaf:name ?n .
  ?url foaf:topic ?p .
  ?p foaf:name ?n .
  <http://www.dcs.shef.ac.uk/~mrowe/foaf.rdf#me> foaf:knows ?q .
  ?q foaf:homepage ?h .
  ?url foaf:topic ?r .
  ?r foaf:homepage ?h
}
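
The two rules shown on these slides share a skeleton and differ only in the property used to match an acquaintance. A minimal Python sketch of that templating idea follows. It only assembles the query text: the function name `build_rule` and the choice to pass the property as a FOAF local name are assumptions made for illustration, and the actual rule induction from seed data instances is more involved than this.

```python
def build_rule(person_uri, match_property):
    """Return a SPARQL CONSTRUCT rule linking person_uri to pages that mention
    the person by name and share match_property with someone they know."""
    lines = [
        "PREFIX foaf: <http://xmlns.com/foaf/0.1/>",
        f"CONSTRUCT {{ <{person_uri}> foaf:page ?url }}",
        "WHERE {",
        # Skeleton: the page has a topic with the seed person's name.
        f"  <{person_uri}> foaf:name ?n .",
        "  ?url foaf:topic ?p .",
        "  ?p foaf:name ?n .",
        # Added triples: a known person shares a property value with a page topic.
        f"  <{person_uri}> foaf:knows ?q .",
        f"  ?q foaf:{match_property} ?v .",
        "  ?url foaf:topic ?r .",
        f"  ?r foaf:{match_property} ?v",
        "}",
    ]
    return "\n".join(lines)


ME = "http://www.dcs.shef.ac.uk/~mrowe/foaf.rdf#me"
# One rule per matching property, mirroring the name and homepage rules above.
rules = [build_rule(ME, prop) for prop in ("name", "homepage")]
print(rules[1])
```

Each generated string could then be run against the RDF models of the candidate web resources with any SPARQL engine.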

Page 41: PhD Viva - Disambiguating Identity Web References using Social Data

Disambiguation 1: Inference Rules

Advantages:
• Highly precise
• Applies graph patterns

Disadvantages:
• Does not learn from past decisions (supervised)
• Strict matching: lack of generalisation

M Rowe. Inferring Web Citations using Social Data and SPARQL Rules. In proceedings of Linking of User Profiles and Applications in the Social Semantic Web, Extended Semantic Web Conference 2010. Heraklion, Crete. (2010)

Page 42: PhD Viva - Disambiguating Identity Web References using Social Data

Disambiguation 2: Random Walks

• Seed data and web resources are RDF
  – RDF has a graph structure:
    <subject, predicate, object>
    <source_node, edge, target_node>
• Graph-based disambiguation techniques:
  – E.g. [Jiang et al, 2009]
  – Build a graph-space
  – Partition data points in the graph-space
• Requires methods to:
  – Compile a graph-space
  – Compare nodes
  – Cluster nodes

Page 43: PhD Viva - Disambiguating Identity Web References using Social Data

Disambiguation 2: Random Walks

• Link the social graph with the web resources
  – Via common resources/literals

Page 44: PhD Viva - Disambiguating Identity Web References using Social Data

Disambiguation 2: Random Walks

Page 45: PhD Viva - Disambiguating Identity Web References using Social Data

Disambiguation 2: Random Walks

Page 46: PhD Viva - Disambiguating Identity Web References using Social Data

Disambiguation 2: Random Walks

• Graph space may contain islands of nodes
  – Inhibit transitions through the graph space
• Get the component containing the social graph

Page 47: PhD Viva - Disambiguating Identity Web References using Social Data

Disambiguation 2: Random Walks

• Perform Random Walks through the graph
  1. Derive Adjacency Matrix
  2. Derive Diagonal Degree Matrix
  3. Compute Transition Probability Matrix
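
The three matrix steps can be written out in plain Python. The sketch assumes an undirected, unweighted graph, so the transition matrix is the adjacency matrix with each row divided by the node's degree (P = D^-1 A). The node names and edges are a made-up toy graph, and the thesis may weight edges differently.

```python
def transition_matrix(nodes, edges):
    """Row-stochastic transition matrix P = D^-1 A for an undirected graph."""
    index = {node: i for i, node in enumerate(nodes)}
    n = len(nodes)
    # 1. Adjacency matrix A.
    A = [[0.0] * n for _ in range(n)]
    for u, v in edges:
        A[index[u]][index[v]] = 1.0
        A[index[v]][index[u]] = 1.0
    # 2. Diagonal of the degree matrix D.
    degree = [sum(row) for row in A]
    # 3. P[i][j] = A[i][j] / degree[i]; isolated nodes keep an all-zero row.
    return [[a / d if d else 0.0 for a in row] for row, d in zip(A, degree)]


nodes = ["me", "page1", "page2", "shared_name"]
edges = [
    ("me", "shared_name"),
    ("page1", "shared_name"),
    ("page2", "shared_name"),
    ("me", "page1"),
]
P = transition_matrix(nodes, edges)
```

Row i of `P` gives the probability of stepping from node i to each neighbour in a single move of the walk.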

Page 48: PhD Viva - Disambiguating Identity Web References using Social Data

Disambiguation 2: Random Walks

• Measure Distances:
  – Commute Time distance
    • Leave node i : reach node j : return to node i
  – Optimum Transitions
    • Move through the graph until probability peaks
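
Commute time is the expected number of steps a random walk needs to go from node i to node j and back again. The sketch below computes it as the sum of two hitting times, each found by simple fixed-point iteration over a transition matrix. This is a generic illustration that assumes a connected graph (the iteration does not converge otherwise); the fixed number of sweeps is an arbitrary choice, and it is not necessarily how the thesis computes the distance.

```python
def hitting_times(P, target, sweeps=2000):
    """Expected number of steps to first reach target from every node."""
    n = len(P)
    h = [0.0] * n
    for _ in range(sweeps):
        # h(i) = 1 + sum_k P[i][k] * h(k), with h(target) fixed at 0.
        h = [
            0.0 if i == target else 1.0 + sum(P[i][k] * h[k] for k in range(n))
            for i in range(n)
        ]
    return h


def commute_time(P, i, j):
    """Expected steps to leave i, reach j, and return to i."""
    return hitting_times(P, j)[i] + hitting_times(P, i)[j]


# Transition matrix of the path graph 0 - 1 - 2.
P = [
    [0.0, 1.0, 0.0],
    [0.5, 0.0, 0.5],
    [0.0, 1.0, 0.0],
]
print(commute_time(P, 0, 2), commute_time(P, 0, 1))
```

Nodes joined by many short paths end up with small commute times, which is what makes the measure usable as a distance for clustering.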

Page 50: PhD Viva - Disambiguating Identity Web References using Social Data

Disambiguation 2: Random Walks

• Group web resources with social graph
  – Via agglomerative clustering
    • Every point is in a cluster
    • Merge clusters until none can be merged
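
The clustering step can be sketched as follows. This is a minimal single-link version that stops once the closest pair of clusters is further apart than a threshold. The linkage criterion, the threshold value, and the one-dimensional toy points are assumptions for illustration; in the setting above the distance would be one of the random-walk measures and the points would be web resources plus the social graph.

```python
def agglomerative(points, distance, threshold):
    """Single-link agglomerative clustering with a distance threshold."""
    clusters = [[p] for p in points]  # every point starts in its own cluster

    def link(a, b):
        # Single link: distance between the closest members of two clusters.
        return min(distance(x, y) for x in a for y in b)

    while len(clusters) > 1:
        # Find the closest pair of clusters.
        d, i, j = min(
            (link(clusters[i], clusters[j]), i, j)
            for i in range(len(clusters))
            for j in range(i + 1, len(clusters))
        )
        if d > threshold:
            break  # no remaining pair is close enough to merge
        merged = clusters.pop(j)  # j > i, so index i is unaffected
        clusters[i].extend(merged)
    return clusters


points = [0.0, 0.1, 0.2, 5.0, 5.1]
result = agglomerative(points, lambda x, y: abs(x - y), threshold=0.5)
```

The web resources that end up in the same cluster as the social graph are the ones treated as referring to the person.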

Page 51: PhD Viva - Disambiguating Identity Web References using Social Data

Disambiguation 2: Random Walks

Advantages:
• Semi-supervised
• Exploits the graph structure of RDF

Disadvantages:
• Computationally heavy (Matrix powers!)
• Relies on tuning clustering threshold

M Rowe. Applying Semantic Social Graphs to Disambiguate Identity References. In proceedings of European Semantic Web Conference 2009, Heraklion, Crete. (2009)

Page 52: PhD Viva - Disambiguating Identity Web References using Social Data

Disambiguation 3: Self-training

• Classic ML scenario:
  – Lots of unlabelled data
  – Limited labelled data
• Disambiguating identity web references is just the same!
  – Possible web citations = large
  – Social data = small
• Semi-supervised learning is a solution
  – Train a classifier
  – Using labelled and unlabelled data!
• Classification task is binary
  – Does this web resource refer to person X or not?

Page 53: PhD Viva - Disambiguating Identity Web References using Social Data

Disambiguation 3: Self-training

• Positive training data = seed data
• Generate negative training data:
  – Via Rocchio classification:
    1. Build centroid vectors: positive set and negative set
      • Negative set = unlabelled data
    2. Compare possible web citations with vectors
    3. Choose strongest negatives
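
The negative-selection step can be illustrated with sparse term vectors and cosine similarity. In this sketch the positive centroid is built from the seed data, the negative centroid from all unlabelled items, and the "strongest negatives" are the items that favour the negative centroid by the largest margin. The margin-based ranking, the function names, and the toy vectors are assumptions made for illustration.

```python
import math
from collections import Counter


def centroid(vectors):
    """Average a list of sparse term-weight vectors."""
    total = Counter()
    for v in vectors:
        total.update(v)
    return {t: w / len(vectors) for t, w in total.items()}


def cosine(a, b):
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def strongest_negatives(seed, unlabelled, k):
    """Pick the k unlabelled items that most favour the negative centroid."""
    pos = centroid(seed)
    neg = centroid(list(unlabelled.values()))
    margin = {name: cosine(v, neg) - cosine(v, pos) for name, v in unlabelled.items()}
    candidates = [name for name in margin if margin[name] > 0]
    return sorted(candidates, key=margin.get, reverse=True)[:k]


seed = [
    {"sheffield": 1.0, "semantic": 1.0},
    {"sheffield": 1.0, "web": 1.0},
]
unlabelled = {
    "page_a": {"sheffield": 1.0, "semantic": 1.0},
    "page_b": {"cyclist": 1.0, "race": 1.0},
    "page_c": {"composer": 1.0, "music": 1.0},
    "page_d": {"web": 1.0, "cyclist": 1.0},
}
negatives = strongest_negatives(seed, unlabelled, k=2)
```

Here `page_a` resembles the seed data and is never chosen, while the two pages with no seed vocabulary at all are picked as negatives.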

Page 57: PhD Viva - Disambiguating Identity Web References using Social Data

Disambiguation 3: Self-training

• Begin Self-training:
  1. Train the Classifier
  2. Classify the web resources
  3. Rank classifications
  4. Enlarge training sets
  5. Repeat steps 1-4
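
The loop above can be sketched generically. The version below takes any `train` function that returns a scoring function, moves the most confidently classified items into the training sets each round, and stops when the pool is empty or a round limit is hit. The batch size, the stopping rule, and the toy one-dimensional "midpoint" classifier are assumptions for illustration; the talk uses Perceptron, SVM and Naïve Bayes classifiers over RDF features.

```python
def self_train(train, positives, negatives, unlabelled, batch=1, rounds=10):
    """Generic self-training loop.

    train(positives, negatives) must return a scoring function:
    score(item) > 0 means "refers to the person", otherwise it does not.
    """
    positives, negatives, pool = list(positives), list(negatives), list(unlabelled)
    for _ in range(rounds):  # 5. repeat steps 1-4
        if not pool:
            break
        score = train(positives, negatives)  # 1. train the classifier
        # 2-3. classify the pool and rank by confidence (distance from 0).
        ranked = sorted(pool, key=lambda x: abs(score(x)), reverse=True)
        for item in ranked[:batch]:  # 4. enlarge the training sets
            (positives if score(item) > 0 else negatives).append(item)
            pool.remove(item)
    return positives, negatives


def train_midpoint(positives, negatives):
    """Toy 1-D classifier: threshold halfway between the two class means."""
    mid = (sum(positives) / len(positives) + sum(negatives) / len(negatives)) / 2
    return lambda x: x - mid


pos, neg = self_train(train_midpoint, [0.9], [0.1], [0.8, 0.2, 0.6, 0.35])
```

Because each round retrains on its own earlier decisions, an early wrong label shifts the threshold for every later item, which is the self-reinforcement risk noted among the disadvantages.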

Page 58: PhD Viva - Disambiguating Identity Web References using Social Data

Disambiguation 3: Self-training

• Training/Testing data is RDF
• Convert to a machine learning dataset
  – Features = RDF instances
• Vary the feature similarity measure:
  – Jaccard Similarity
  – Inverse Functional Property Matching
  – RDF Entailment
• Tested three different classifiers:
  – Perceptron
  – Support Vector Machine
  – Naïve Bayes
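
Of the three similarity measures, Jaccard similarity is the simplest to show. The sketch treats an RDF instance as a set of (predicate, object) pairs and scores two instances by the size of their intersection over the size of their union. Representing an instance this way, and the example values, are assumptions made for illustration.

```python
def jaccard(a, b):
    """Jaccard similarity between two collections of (predicate, object) pairs."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


seed_instance = {
    ("foaf:name", "Matthew Rowe"),
    ("foaf:homepage", "http://example.org/~mrowe"),
    ("foaf:based_near", "Sheffield"),
}
page_instance = {
    ("foaf:name", "Matthew Rowe"),
    ("foaf:based_near", "Sheffield"),
    ("foaf:interest", "cycling"),
}
similarity = jaccard(seed_instance, page_instance)
```

Two of the four distinct pairs are shared here, so the score is 0.5; identical instances score 1.0 and disjoint ones 0.0.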

Page 59: PhD Viva - Disambiguating Identity Web References using Social Data

Disambiguation 3: Self-training

• Advantages
  – Directly learn from disambiguation decisions
  – Utilise abundance of unlabelled data
• Disadvantages
  – Requires reliable negatives
  – Mistakes can reinforce themselves

M Rowe and F Ciravegna. Harnessing the Social Web: The Science of Identity Disambiguation. In proceedings of Web Science Conference 2010. Raleigh, USA. (2010)

Page 60: PhD Viva - Disambiguating Identity Web References using Social Data

Evaluation

• Measures:
  – Precision, Recall, F-Measure
• Dataset
  – 50 participants from the Semantic Web and Web 2.0 communities
  – ~17300 web resources: 346 web resources for each participant
• Baselines
  – Baseline 1: Person name as positive classification
  – Baseline 2: Hierarchical Clustering using Person Names
    • [Malin, 2005]
  – Baseline 3: Human Processing
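
For one participant, the three measures can be computed from the set of web resources a technique labels as positive and the set that truly refer to the person. The sketch assumes the balanced F-measure (F1); the talk does not state which weighting was used, and the resource identifiers are made up.

```python
def precision_recall_f1(predicted, relevant):
    """Precision, recall and balanced F-measure for one participant."""
    predicted, relevant = set(predicted), set(relevant)
    tp = len(predicted & relevant)  # correctly identified references
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(relevant) if relevant else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


predicted = {"u1", "u2", "u3", "u4"}   # labelled positive by the technique
relevant = {"u1", "u2", "u5"}          # truly refer to the person
p, r, f = precision_recall_f1(predicted, relevant)
```

When scores are averaged over many participants, the mean F-measure generally differs from the F-measure computed from the mean precision and mean recall, so a reported triple need not satisfy the formula exactly.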

Page 61: PhD Viva - Disambiguating Identity Web References using Social Data

Evaluation: Inference Rules

• High precision
  – Better than humans
  – Precise graph pattern matching
• Low recall
  – Rules are strict
    • No room for variability
  – Hard to generalise
    • No learning from disambiguation decisions

                  Precision  Recall  F-Measure
Inference Rules   0.955      0.436   0.553
Baseline 1        0.191      0.998   0.294
Baseline 2        0.648      0.592   0.556
Baseline 3        0.765      0.725   0.719

Page 62: PhD Viva - Disambiguating Identity Web References using Social Data

Evaluation: Random Walks

• High recall
  – Higher than humans
  – Incorporates unlabelled data into random walks
    • Uses features not in the seed data
• Precision
  – Lower than humans and rules
  – Ambiguous name literals lead to false positives

                      Precision  Recall  F-Measure
Commute Time          0.707      0.798   0.705
Optimum Transitions   0.659      0.805   0.684
Baseline 1            0.191      0.998   0.294
Baseline 2            0.648      0.592   0.556
Baseline 3            0.765      0.725   0.719

Page 63: PhD Viva - Disambiguating Identity Web References using Social Data

Evaluation: Self-training

                          Precision  Recall  F-Measure
Perceptron + Entailment   0.629      0.905   0.728
Perceptron + IFP          0.630      0.878   0.715
Perceptron + Jaccard      0.651      0.820   0.700
SVM + Entailment          0.613      0.910   0.731
SVM + IFP                 0.628      0.864   0.711
SVM + Jaccard             0.755      0.695   0.691
Baseline 1                0.191      0.998   0.294
Baseline 2                0.648      0.592   0.556
Baseline 3                0.765      0.725   0.719

• High Recall
  – SVM + Entailment identifies 91% of the correct references (recall 0.910)
• High F-Measure
  – Higher than humans
    • Perceptron + Entailment and SVM + Entailment

Page 64: PhD Viva - Disambiguating Identity Web References using Social Data

Conclusions: Research Questions

1. Alleviate human processing:
  • Can automated techniques replace humans?
    – Performance is comparable to humans
    – Suited to low web presence
2. Supervision:
  • Can automated techniques function independently?
    – Inference Rules : Induce rules from seed data
    – Random Walks : Graph space built from models
    – Self-training : Learn + retrain a classifier
3. Seed Data:
  • How can this be gathered inexpensively?
    – Utilise Social Web platforms
    – Digital identities are similar to real world identities
4. Interpretation:
  • How can automated techniques interpret information?
    – Solution = Semantic Web technologies
    – Convert web resources into metadata models

Page 65: PhD Viva - Disambiguating Identity Web References using Social Data

Conclusions: Claims

• Automated disambiguation techniques are able to replace human processing
  – Techniques are comparable to humans
  – Overcome manual processing
• Data found on Social Web platforms is representative of real identity information
  – 77% of a real world social network is covered online
• Social data provides the background knowledge required by automated disambiguation techniques
  – Techniques function using social data
  – Biographical and social network data enable disambiguation

Page 66: PhD Viva - Disambiguating Identity Web References using Social Data

Dissemination and Impact

• Published 21 peer-reviewed publications
  – Paper in the Journal of Web Semantics (impact: 3.5)
  – Presented work at many international conferences
• Program committee member for 5 international workshops
• Invited Expert for the World Wide Web Consortium’s Social Web Incubator Group
• Listed as one of top 100 visionaries “discussing the future of the web”
  http://www.semanticweb.com/semanticweb100/
• Linked Data service for the DCS
  – Best Poster at the Extended Semantic Web Conference 2010
  http://data.dcs.shef.ac.uk
• Tools widely used by the Semantic Web community
  – FOAF Generator
  – Social Identity Schema Mapping (SISM) Vocabulary

Page 67: PhD Viva - Disambiguating Identity Web References using Social Data

Questions?

Twitter: @mattroweshow
Web: http://www.dcs.shef.ac.uk/~mrowe
Email: [email protected]

For a condensed version of my thesis:

M Rowe and F Ciravegna. Disambiguating Identity Web References using Web 2.0 Data and Semantics. In Press for special issue on "Web 2.0" in the Journal of Web Semantics. (2010)