In Situ Evaluation of Entity Ranking and Opinion Summarization using findilike

Kavita Ganesan & ChengXiang Zhai
University of Illinois at Urbana-Champaign

www.findilike.com


What is findilike?

• Preference-driven search engine
– Currently works in the hotels domain
– Finds & ranks hotels based on user preferences:
   Structured: price, distance
   Unstructured: “friendly service”, “clean”, “good views” (based on existing user reviews) [unique feature]

• Beyond search: support for analysis of hotels
– Opinion summaries
– Tag cloud visualization of reviews

…What is findilike?

• Developed as part of Ph.D. work; a new system (Opinion-Driven Decision Support System, UIUC, 2013)

• Tracked ~1000 unique users from Jan to Aug ’13
– Working on speed & reaching out to more users

2 components that can be evaluated through natural user interaction

1. Ranking entities based on unstructured user preferences: Opinion-Based Entity Ranking (Ganesan & Zhai 2012)

2. Summarization of reviews: generating short phrases summarizing key opinions (Ganesan et al. 2010, 2012)

Evaluation of entity ranking

• Retrieval
– Interleave results from two ranking algorithms using balanced interleaving (T. Joachims, 2002)
– A click indicates preference…

[Figure: results from DirichletLM and Base interleaved into a single result list]
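The interleaving step can be sketched as follows. This is a minimal Python illustration of balanced interleaving and of the click-credit rule usually paired with it, not findilike's own code; the function names and hotel IDs are hypothetical, and whether findilike credits clicks in exactly this way is an assumption.

```python
import random


def balanced_interleave(ranking_a, ranking_b, a_first=None, rng=random):
    """Merge two rankings so that every prefix draws about equally from both.

    a_first decides which ranking wins ties; it is chosen at random if None.
    """
    if a_first is None:
        a_first = rng.random() < 0.5
    interleaved, seen = [], set()
    ka = kb = 0
    while ka < len(ranking_a) and kb < len(ranking_b):
        if ka < kb or (ka == kb and a_first):
            item = ranking_a[ka]
            ka += 1
        else:
            item = ranking_b[kb]
            kb += 1
        if item not in seen:  # skip results already shown
            seen.add(item)
            interleaved.append(item)
    return interleaved


def credit_clicks(ranking_a, ranking_b, interleaved, clicked):
    """Return (ca, cb): clicks credited to ranking A and to ranking B."""
    clicked_set = set(clicked)
    clicked_in_order = [x for x in interleaved if x in clicked_set]
    if not clicked_in_order:
        return 0, 0
    lowest = clicked_in_order[-1]  # lowest-ranked clicked result

    def rank_in(ranking):
        return ranking.index(lowest) + 1 if lowest in ranking else float("inf")

    # Smallest k such that the top k of A or of B contains that result.
    k = min(rank_in(ranking_a), rank_in(ranking_b))
    ca = sum(1 for x in clicked_in_order if x in ranking_a[:k])
    cb = sum(1 for x in clicked_in_order if x in ranking_b[:k])
    return ca, cb


# Hypothetical hotel IDs, for illustration only.
ranking_a = ["h1", "h2", "h3"]
ranking_b = ["h2", "h4", "h1"]
shown = balanced_interleave(ranking_a, ranking_b, a_first=True)
```

A click on a result that only one ranking placed near the top is credited to that ranking, which is the sense in which a click indicates preference.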

Snapshot of pairwise comparison results for entity ranking

A     B      CA > CB      CB > CA      CA = CB > 0   CA = CB = 0   Total
             (A better)   (B better)   (tie)
DLM   Base   30           35           2             5             72
PL2   Base   10           28           3             7             48
…     …      …            …            …             …             …

• Columns A, B: the algorithms being compared (DirichletLM, Base, PL2)
• CA > CB: # queries where A is better
• CB > CA: # queries where B is better


• PL2 vs. Base: Base model better & PL2 not too good

• DLM vs. Base: Base model better, but DLM not too far behind
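The four outcome columns of the comparison table can be derived from per-query click counts. The sketch below is a hypothetical illustration in Python: it assumes CA and CB are the clicks credited to algorithms A and B for one query, and the click data is invented, not taken from findilike logs.

```python
from collections import Counter

OUTCOMES = ("A better", "B better", "tie", "no clicks")


def classify(ca, cb):
    """Map one query's click counts to an outcome column."""
    if ca > cb:
        return "A better"   # CA > CB
    if cb > ca:
        return "B better"   # CB > CA
    if ca > 0:
        return "tie"        # CA = CB > 0
    return "no clicks"      # CA = CB = 0


def tally(per_query_clicks):
    """Count queries per outcome and add a total, like one table row."""
    counts = Counter(classify(ca, cb) for ca, cb in per_query_clicks)
    row = {name: counts.get(name, 0) for name in OUTCOMES}
    row["total"] = sum(row.values())
    return row


# Invented (CA, CB) pairs, one per query.
example_row = tally([(2, 0), (0, 1), (1, 1), (0, 0), (0, 3)])
```

Each row of the table is then one such tally for a pair of algorithms, with the total equal to the sum of the four outcome counts (for example 30 + 35 + 2 + 5 = 72 for DLM vs. Base).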

Evaluation of review summarization

• Randomly mix top N phrases from two algorithms (ALGO1, ALGO2)
• Monitor click-through on a per-entity basis
• More clicks on phrases from Algo1 vs. Algo2 → Algo1 better
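A minimal Python sketch of this mixing-and-counting procedure follows. The function names, the shape of the click log, and the sample phrases are all assumptions made for illustration; the slides do not describe how findilike stores clicks or handles a phrase produced by both algorithms.

```python
import random
from collections import defaultdict


def mix_phrases(phrases_algo1, phrases_algo2, n, rng):
    """Take the top n phrases from each algorithm and shuffle them together.

    Each phrase keeps a hidden source tag so clicks can be attributed later.
    """
    tagged = [(p, "algo1") for p in phrases_algo1[:n]]
    tagged += [(p, "algo2") for p in phrases_algo2[:n]]
    rng.shuffle(tagged)
    return tagged


def clicks_per_entity(click_log):
    """Aggregate (entity_id, source) click events into per-entity counts."""
    counts = defaultdict(lambda: {"algo1": 0, "algo2": 0})
    for entity_id, source in click_log:
        counts[entity_id][source] += 1
    return {entity: dict(c) for entity, c in counts.items()}


# Invented phrases and clicks, for illustration only.
mixed = mix_phrases(
    ["clean rooms", "friendly staff", "great view"],
    ["good location", "noisy at night"],
    n=2,
    rng=random.Random(0),
)
per_entity = clicks_per_entity(
    [("hotel_1", "algo1"), ("hotel_1", "algo1"),
     ("hotel_1", "algo2"), ("hotel_2", "algo2")]
)
```

Comparing the two counts for each entity gives the per-entity preference; here the invented log would favor algo1 for hotel_1 and algo2 for hotel_2.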

How to submit a new algorithm?

1. Implementation: write Java-based code, extending the existing code
2. Test on the mini testbed (sample code, test data & gold standard, evaluator: nDCG, ROUGE) to get local performance
3. Submit code
4. Online performance: a performance report with pairwise comparison results, e.g.

   A     B      CA > CB (A better)   CB > CA (B better)   CA = CB > 0 (tie)
   DLM   Base   30                   35                   2
   PL2   Base   10                   28                   3
   …     …      …                    …                    …

More information about evaluation…

eval.findilike.com

Thanks! Questions?

Links
• Evaluation: http://eval.findilike.com
• System: http://www.findilike.com
• Related Papers: kavita-ganesan.com

References
• Ganesan, K. A., C. X. Zhai, and E. Viegas. Micropinion Generation: An Unsupervised Approach to Generating Ultra-Concise Summaries of Opinions. Proceedings of the 21st International Conference on World Wide Web (WWW '12), 2012.

• Ganesan, K. A., and C. X. Zhai. Opinion-Based Entity Ranking. Information Retrieval, vol. 15, issue 2, 2012.

• Ganesan, K. A., C. X. Zhai, and J. Han. Opinosis: A Graph Based Approach to Abstractive Summarization of Highly Redundant Opinions. Proceedings of the 23rd International Conference on Computational Linguistics (COLING '10), 2010.

• Joachims, T. Optimizing Search Engines Using Clickthrough Data. Proceedings of the Eighth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD '02), NY, 2002.

Evaluating Review Summarization

Mini Test-bed
• Base code to extend
• Set of sample sentences
• Gold standard summary for those sentences
• ROUGE toolkit to evaluate the results
• Data set based on Ganesan et al. 2010
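To give a feel for what a ROUGE score measures, here is a simplified n-gram recall in Python. It is only a sketch under stated assumptions: it is not the ROUGE toolkit shipped with the test-bed, it uses plain whitespace tokenization, and the example sentences are invented.

```python
from collections import Counter


def rouge_n_recall(candidate, reference, n=1):
    """Fraction of the reference's n-grams that also appear in the candidate."""

    def ngrams(text):
        tokens = text.lower().split()
        return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

    cand, ref = ngrams(candidate), ngrams(reference)
    total = sum(ref.values())
    if total == 0:
        return 0.0
    # Clip each n-gram's match count by how often the candidate contains it.
    overlap = sum(min(count, cand[gram]) for gram, count in ref.items())
    return overlap / total


# Invented summary and gold standard, for illustration only.
unigram_recall = rouge_n_recall("very clean rooms", "clean rooms and friendly staff", n=1)
bigram_recall = rouge_n_recall("very clean rooms", "clean rooms and friendly staff", n=2)
```

The actual toolkit offers further options (stemming, stop-word removal, precision and F-measure), so local scores from the test-bed should be taken from the toolkit rather than from a sketch like this.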

Evaluating Entity Ranking

Mini Test-bed
• Base code to extend
• Terrier index of hotel reviews
• Gold standard ranking of hotels
• Code to generate nDCG scores
• Raw unindexed data set for reference
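The nDCG computation can be sketched as below. This Python version is a hypothetical illustration, not the test-bed's Java code: it assumes linear gains, a log2 rank discount, and a gold standard expressed as a graded relevance score per hotel; the test-bed's exact formulation may differ.

```python
import math


def dcg(gains):
    """Discounted cumulative gain with a log2(rank + 1) discount."""
    return sum(g / math.log2(rank + 1) for rank, g in enumerate(gains, start=1))


def ndcg_at_k(ranked_ids, relevance, k):
    """nDCG@k of a ranking against gold relevance scores (dict: id -> gain)."""
    gains = [relevance.get(item, 0) for item in ranked_ids[:k]]
    ideal = sorted(relevance.values(), reverse=True)[:k]
    ideal_dcg = dcg(ideal)
    return dcg(gains) / ideal_dcg if ideal_dcg > 0 else 0.0


# Invented gold standard and rankings, for illustration only.
gold = {"h1": 3, "h2": 2, "h3": 0}
perfect = ndcg_at_k(["h1", "h2", "h3"], gold, k=3)
reversed_score = ndcg_at_k(["h3", "h2", "h1"], gold, k=3)
```

A ranking that matches the gold standard order scores 1.0, and placing highly relevant hotels lower in the list reduces the score.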

Building a new ranking model

Extend Weighting Model