Extracting and Ranking Product Features in Opinion Documents


Posted on 11-May-2015







This is a presentation we made in the Spring 2012 Data Mining class at Tsinghua University. It covers the paper by Lei Zhang, Bing Liu, Suk Hwan Lim, and Eamonn O’Brien-Strain.


1. Extracting and Ranking Product Features in Opinion Documents

2. A Story: Retina Display, 3-axis gyro & accelerometer, A4 CPU, multitasking, FaceTime, iBooks, Antennagate.

3. Why mine product features? Clearly knowing the response from consumers helps a company win more market share, and consumers can also make better choices when shopping.

4. Recent Research: In recent years, opinion mining has been an active research area in NLP. The most important problem is extracting features from a corpus; HMM, ME, PMI, and CRF methods have been applied. Double Propagation is a state-of-the-art unsupervised technique for this problem, though it has its own significant limitations.

5. Double Propagation: Proposed by researchers from the University of Illinois and Zhejiang University. It mainly extracts noun features and works well for medium-size corpora. No additional resources are needed beyond an initial seed opinion lexicon.

6. DP Mechanism: Basic assumption: features are nouns/noun phrases and opinion words are adjectives. Dependency grammar describes the dependency relations between words in a sentence, including direct relations (a)(b) and indirect relations (c)(d). Example: "The camera has a good lens." (opinion word: "good"; feature: "lens").

7. DP Limitations: Non-opinion adjectives may be extracted as opinion words, which introduces more and more noise during the extraction process (e.g. "current" + noun, "entire" + noun). Also, some important features have no opinion words modifying them: in "There is a valley on my mattress.", no opinion word modifies the feature.

8. Proposed Methods: A two-step feature mining method. Feature extraction: Double Propagation plus a part-whole pattern and a "no" pattern. Feature ranking: a new angle on the noise problem, using relevance and frequency to rank features.

9. Ranking Principles: Three strong clues indicate a correct feature: it is modified by multiple opinion words; it can be extracted by multiple part-whole patterns; or a combination of part-whole patterns, the "no" pattern, and opinion-word modification applies. Frequent appearance indicates an important feature.

10.
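The DP mechanism and its noise problem can be sketched as a toy loop: starting from a seed opinion lexicon, each known opinion word reveals the noun it modifies as a feature, and each known feature reveals the adjectives modifying it as opinion words. The function name and the (adjective, noun) pairs below are invented for illustration; they are not the paper's implementation or data.

```python
# Toy sketch of Double Propagation, reduced to (adjective, noun)
# modification pairs. Names and data are illustrative assumptions.

def double_propagation(pairs, seed_opinions):
    """pairs: (adjective, noun) modification pairs from a corpus."""
    opinions = set(seed_opinions)
    features = set()
    changed = True
    while changed:                      # propagate until a fixed point
        changed = False
        for adj, noun in pairs:
            if adj in opinions and noun not in features:
                features.add(noun)      # opinion word reveals a feature
                changed = True
            if noun in features and adj not in opinions:
                opinions.add(adj)       # feature reveals an opinion word
                changed = True
    return features, opinions

pairs = [("good", "lens"), ("sharp", "lens"), ("sharp", "screen"),
         ("current", "screen"), ("current", "version")]
features, opinions = double_propagation(pairs, {"good"})
print(sorted(features))  # ['lens', 'screen', 'version']
```

Note how the non-opinion adjective "current" gets absorbed as an opinion word and then drags in "version" as a spurious feature: exactly the noise problem the DP Limitations slide points out.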
Process: feature extraction (part-whole relation, "no" pattern) followed by feature ranking (HITS algorithm, plus frequency).

11. Part-whole Relation: patterns may be ambiguous or unambiguous. Phrase patterns: NP + Prep + CP; CP + with + NP; NP CP; CP NP. Sentence pattern: CP Verb NP.

12. "no" Pattern: "no" + feature, as in "no noise", "no indentation". Exceptions such as "no problem" and "no offense" are handled with a manually compiled exception list.

13. Apply HITS Algorithm: HITS assigns each node a hub score and an authority score and iterates to optimize them. Here, features and feature indicators are split into two node sets of a directed graph, and HITS computes feature relevance.

14. Feature Ranking.

15. Data Sets & Evaluation Metrics: Cars (2223 sentences), Mattress (13233), Phone (15168), LCD (1783). Cars and Mattress come from product review sites; Phone and LCD come from forum sites. Precision@N metric: the percentage of correct features among the top N feature candidates in a ranked list.

16. Recall & Precision Comparison: results on 1000 sentences (figure: Our Recall, DP Recall, Our Precision, DP Precision per data set).

17. Recall & Precision Comparison: results on 2000 sentences (figure: same layout).

18. Recall & Precision Comparison: results on 3000 sentences, Mattress and Phone (figure: same layout).

19. Ranking Comparison: precision at top 50 (figure: Our Precision vs. DP Precision).

20. Ranking Comparison: precision at top 100 (figure: Our Precision vs. DP Precision).

21.
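The Precision@N metric defined above is simple to compute. This is a minimal sketch; the ranked list and gold feature set are invented for illustration.

```python
# Minimal sketch of Precision@N: the fraction of the top-N candidates
# in a ranked list that are correct features. Data below is made up.

def precision_at_n(ranked_candidates, gold_features, n):
    top = ranked_candidates[:n]
    correct = sum(1 for c in top if c in gold_features)
    return correct / n

ranked = ["lens", "screen", "version", "battery"]
gold = {"lens", "screen", "battery"}
print(precision_at_n(ranked, gold, 2))  # 1.0
print(precision_at_n(ranked, gold, 4))  # 0.75
```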
Ranking Comparison: precision at top 200 for Cars, Mattress, and Phone (figure: Our Precision vs. DP Precision).

22. Conclusion: use the part-whole and "no" patterns to increase recall; rank extracted feature candidates by feature importance, determined by two factors: feature relevance and feature frequency (HITS was applied).

23. Thank you
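The conclusion notes that HITS was applied to compute feature relevance. Below is a minimal sketch of the hub/authority iteration on a toy graph where feature indicators (opinion words, part-whole patterns) act as hubs and feature candidates as authorities; the edge data is invented, and the paper's actual graph construction differs in detail.

```python
# Sketch of HITS for feature relevance: indicators are hubs, feature
# candidates are authorities. Toy edges only; not the paper's data.
import math

def hits(edges, iters=50):
    """edges: (indicator, feature) pairs; returns authority scores."""
    hubs = {i for i, _ in edges}
    auths = {f for _, f in edges}
    h = {i: 1.0 for i in hubs}
    a = {f: 1.0 for f in auths}
    for _ in range(iters):
        # authority score: sum of hub scores of indicators pointing at it
        a = {f: sum(h[i] for i, g in edges if g == f) for f in auths}
        # hub score: sum of authority scores of features it points at
        h = {i: sum(a[f] for j, f in edges if j == i) for i in hubs}
        # normalize so the scores do not diverge
        na = math.sqrt(sum(v * v for v in a.values()))
        nh = math.sqrt(sum(v * v for v in h.values()))
        a = {f: v / na for f, v in a.items()}
        h = {i: v / nh for i, v in h.items()}
    return a  # feature relevance, to be combined with frequency

edges = [("good", "lens"), ("of the", "lens"), ("good", "screen")]
rel = hits(edges)
# "lens" is supported by two indicators, so it outranks "screen"
```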

