Capturing the Ineffable:
Collecting, Analysing, and Automating
Web Document Quality Assessments
Davide Ceolin, Julia Noordegraaf, Lora Aroyo
• Introduction
• Nichesourcing Web Document Quality Assessments
• User studies
• Conclusion and Future Work
Outline
Introduction
Web Document Quality Assessment
• Source criticism
• Methodological practice from the humanities
• e.g., from the American Library Association:
• How was the source located?
• What type of source is it?
• Who is the author and what are the qualifications of the author in regard to the topic that is discussed?
• When was the information published?
• In which country was it published?
• What is the reputation of the publisher?
• Does the source show a particular cultural or political bias?
• How does it apply to Web sources?
Web Document Quality Assessment
What is the quality of each of these documents?
Authoritative source ✓
Accurate ✓
Precise ✓
Complete ✓
Neutral (?)
Blog Post (?)
Accurate (?)
Precise (?)
Complete (?)
Neutral ✗
• We adapt source criticism to Web documents and aim to automate quality estimation by:
• Gathering quality assessments (mostly from experts).
• Looking for markers (document features) that correlate with them.
Quality and Quality Markers
Objectives
• Analyse the consistency of quality assessments.
• Are quality assessments consistent among users, over time, etc.?
• Analyse user ability to interpret document features.
• Can users estimate the quality of a document from its sentiment or trustworthiness level?
• Analyse the predictability of quality assessments.
• Can we automatically estimate the quality of a document?
Nichesourcing Web Document Quality Assessments
• Dataset: documents about vaccinations
• Initially, 50 docs, various sources (blogs, authorities, etc.)
• Features
• Information (automatically) extracted from documents using AlchemyAPI & Web of Trust.
• Entities, Topics, Sentiment, Emotions, Trustworthiness.
• Quality dimensions
• Overall quality, accuracy, completeness, precision, trustworthiness, readability, neutrality.
Dataset, Features, and Quality Dimensions
• Setup:
• 6 documents per participant.
• Random selection.
• Even distribution of assessments.
• Scenario: Suppose you are asked to write an article about the debate on vaccinations triggered by the 2015 measles outbreak at Disneyland in California.
WebQ: Nichesourcing Web Quality Assessments
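The setup (6 documents per participant, random selection, even distribution of assessments) could be sketched as below. This is only an illustration under assumptions: the slides do not describe how WebQ actually assigns documents, and the function name `assign_documents`, its parameters, and the greedy strategy are all hypothetical.

```python
import random


def assign_documents(doc_ids, n_participants, per_participant=6, seed=0):
    """Hypothetical sketch: give every participant `per_participant` distinct
    documents, chosen at random among the least-assessed ones, so that the
    number of assessments per document stays as even as possible."""
    if per_participant > len(doc_ids):
        raise ValueError("not enough documents for one participant")
    rng = random.Random(seed)
    counts = {d: 0 for d in doc_ids}  # assessments planned per document
    assignments = []
    for _ in range(n_participants):
        pool = list(doc_ids)
        rng.shuffle(pool)                    # random order among equally assessed docs
        pool.sort(key=lambda d: counts[d])   # stable sort: least-assessed first
        chosen = pool[:per_participant]
        for d in chosen:
            counts[d] += 1
        assignments.append(chosen)
    return assignments, counts


# Illustrative numbers only: 50 documents and 20 participants.
docs = [f"doc{i}" for i in range(50)]
assignments, counts = assign_documents(docs, n_participants=20)
print(min(counts.values()), max(counts.values()))
```

Because each participant always takes the least-assessed documents first, the per-document counts never differ by more than one in this sketch.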
• Documents are anonymized.
• Users choose documents that meet their quality criteria based on features only.
• All feature values are shown, alone and together.
WebQ: Task 1
• Read each of the 6 articles.
• Assess it.
• Rate completeness, accuracy, etc.
• Likert scale 1-5.
• Annotate the article to explain the ratings.
• Articles are proxied & annotated through AnnotatorJS.
WebQ: Task 2
User Studies
• User Study 1
• Participants: 20 last-year UvA journalism students.
• Duration: 60’.
• User Study 2
• Participants: 20 RMA media scholars.
• Duration: 45’.
• Improvements (learnt from user study 1).
Setup
• Data collected:
• 104 (US1) + 47 (US2) assessments.
• 238 (US1) + 89 (US2) annotations.
• No significant difference between Use Cases (Wilcoxon signed-rank test).
• The assessments from the two studies are comparable and can be pooled.
• Assessment predictability (SVC)
• Up to 63% accuracy (5 classes)
• Up to 89% accuracy (2 classes)
• Promising predictability. We will try other algorithms.
Results
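The gap between the 5-class and 2-class accuracies can be illustrated with a small sketch. The slides do not say how the two classes were derived from the 1-5 Likert ratings, so the cut-off used here (ratings of 4 and 5 count as "high") is an assumption, and the ratings are made-up illustrative values, not study data.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("need two non-empty lists of equal length")
    hits = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return hits / len(y_true)


def to_two_classes(rating, high_from=4):
    """Assumed binarisation: ratings >= `high_from` are 'high', the rest 'low'."""
    return "high" if rating >= high_from else "low"


# Made-up Likert ratings (1-5): expert assessments vs. a classifier's output.
y_true = [5, 4, 2, 1, 3, 4, 2, 5]
y_pred = [4, 4, 1, 1, 3, 5, 3, 5]

acc_5 = accuracy(y_true, y_pred)
acc_2 = accuracy([to_two_classes(r) for r in y_true],
                 [to_two_classes(r) for r in y_pred])
print(acc_5, acc_2)
```

Near misses such as predicting 4 for a true 5 count as errors on the 5-point scale but fall into the same class once the scale is collapsed, which is one reason a 2-class score is expected to be higher.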
• Highest correlation with overall quality:
• Accuracy
• Trustworthiness
• Precision
• Completeness
• Given the task at hand, neutrality is not relevant.
• Weak correlation between task 1 choices and overall quality (task 2).
• Users were mostly unable to interpret those features.
Results
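A correlation between a quality dimension and overall quality could be computed along these lines. The slides do not name the coefficient that was used; Spearman's rank correlation is assumed here because Likert ratings are ordinal, and the ratings below are made-up illustrative values, not study data.

```python
from math import sqrt


def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length numeric lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman's rho: Pearson correlation of the tie-averaged ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Made-up Likert ratings for six assessments.
ratings = {
    "overall":    [5, 4, 2, 1, 3, 4],
    "accuracy":   [5, 4, 1, 2, 3, 4],
    "neutrality": [3, 3, 4, 2, 3, 3],
}
for dim in ("accuracy", "neutrality"):
    print(dim, round(spearman(ratings["overall"], ratings[dim]), 3))
```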
Conclusion
• We collected Web document quality assessments.
• WebQ – Nichesourcing application.
• 2 user studies with experts.
• Clearly defined task.
• Controlled dataset.
• We analysed the assessments and automated their prediction.
• The task matters more than subjectivity.
• Assessments are quite uniform and coherent.
• Features in isolation are not very meaningful.
• The application setup is important.
Conclusion
• We are currently working on, or plan to work on:
• Extending the dataset (currently ~1,500 documents).
• Scaling up the experiments and gathering more assessments.
• Involving laymen via crowdsourcing.
• Extending the analyses.
• Utilising other automated reasoning approaches.
(Current and) Future Work
https://qupid-project.net/
Thank you!