Intelligent Database Systems Lab
Presenter: WU, MIN-CONG
Authors: Yongzheng Zhang, Rajyashree Mukherjee, Benny Soetarman
2012, ACM
Concept Extraction for Online Shopping
Outline
• Motivation
• Objectives
• Methodology
• Experiments
• Conclusions
• Comments
Motivation
• To provide a more streamlined user experience in shopping-related research, it is critical for e-commerce sites to accurately identify what a Web page is talking about.
Objectives
• We investigate two concept extraction methods, ACE and KEA, in the online shopping context.
• We discuss how to upgrade ACE, with major improvements, into ICE.
Methodology - ACE
• Term frequency
• Components shown in the ACE diagram: HTML Scorer, TF Scorer, Tokenization, Concept Miner, Concept Derivation
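The roles of the TF Scorer and the HTML Scorer can be illustrated with a minimal sketch. The slides do not give ACE's formulas, so the tag weights in `TAG_WEIGHTS`, the `score_terms` function, and the additive combination of the two scores are all assumptions made for illustration only.

```python
import re
from collections import Counter
from html.parser import HTMLParser

# Hypothetical tag weights; the slides do not state ACE's actual values.
TAG_WEIGHTS = {"title": 3.0, "h1": 2.0, "b": 1.5}


class TokenCollector(HTMLParser):
    """Tokenization step: collect lowercase word tokens and their enclosing tags."""

    def __init__(self):
        super().__init__()
        self.open_tags = []
        self.tokens = []  # list of (token, tuple of currently open tags)

    def handle_starttag(self, tag, attrs):
        self.open_tags.append(tag)

    def handle_endtag(self, tag):
        if tag in self.open_tags:
            # Remove the most recently opened matching tag.
            idx = len(self.open_tags) - 1 - self.open_tags[::-1].index(tag)
            del self.open_tags[idx]

    def handle_data(self, data):
        for word in re.findall(r"[a-z0-9]+", data.lower()):
            self.tokens.append((word, tuple(self.open_tags)))


def score_terms(html):
    """Return {term: score}, where score = relative term frequency + best tag weight."""
    parser = TokenCollector()
    parser.feed(html)
    parser.close()
    counts = Counter(token for token, _ in parser.tokens)
    total = sum(counts.values())
    scores = {}
    for token, tags in parser.tokens:
        tf_score = counts[token] / total  # TF Scorer (assumed form)
        html_score = max((TAG_WEIGHTS.get(t, 0.0) for t in tags), default=0.0)  # HTML Scorer (assumed form)
        scores[token] = max(scores.get(token, 0.0), tf_score + html_score)
    return scores


page = (
    "<html><head><title>Dell laptop</title></head>"
    "<body><h1>Laptop deals</h1><p>Buy a laptop today</p></body></html>"
)
ranked = sorted(score_terms(page).items(), key=lambda item: item[1], reverse=True)
print(ranked[:3])
```

In this sketch a term that is both frequent and placed in a prominent tag ranks highest; a real system would score multi-word concepts rather than single tokens.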
Methodology - ICE
• Example: two adjacent links
<a href="http://buy.ebay.com/cell-phone">cell phone</a> <a href="http://www.ebay.com/">Home</a>
• Text variants shown on the slide: "cell phone home" and "cell phonehome"
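One reading of the two-link example is that stripping tags without respecting their boundaries runs the link texts "cell phone" and "Home" together into a single phrase. The sketch below contrasts that with treating every tag as a boundary. This interpretation, and the helper names `text_segments` and `naive_text`, are assumptions; the slides do not describe how ICE handles the case.

```python
from html.parser import HTMLParser


class SegmentExtractor(HTMLParser):
    """Collects text segments; each run of text between tags is its own segment."""

    def __init__(self):
        super().__init__()
        self.segments = []

    def handle_data(self, data):
        text = " ".join(data.split()).lower()
        if text:  # skip whitespace-only runs between tags
            self.segments.append(text)


def text_segments(html):
    """Tag-aware view: one segment per run of text, tags act as boundaries."""
    parser = SegmentExtractor()
    parser.feed(html)
    parser.close()
    return parser.segments


def naive_text(html):
    """Tag-blind view: all text joined into one string, boundaries lost."""
    return " ".join(text_segments(html))


links = (
    '<a href="http://buy.ebay.com/cell-phone">cell phone</a> '
    '<a href="http://www.ebay.com/">Home</a>'
)
print(naive_text(links))
print(text_segments(links))
```

With boundaries kept, "cell phone" and "home" remain separate candidates and no spurious phrase spanning both links is produced.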
• Example concepts: Baseball, Professional Baseball, Baseball Players, Professional Baseball Players
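The four baseball concepts are exactly the contiguous sub-phrases of "Professional Baseball Players" that contain the word "Baseball". A minimal sketch of enumerating such sub-phrases follows. The functions `contiguous_subphrases` and `subphrases_containing` are hypothetical helpers, and the slides do not say whether the method derives or filters concepts this way.

```python
def contiguous_subphrases(phrase):
    """Return all contiguous word sub-phrases, shortest first."""
    words = phrase.split()
    result = []
    for length in range(1, len(words) + 1):
        for start in range(len(words) - length + 1):
            result.append(" ".join(words[start:start + length]))
    return result


def subphrases_containing(phrase, word):
    """Keep only the sub-phrases that include the given word (assumed filter)."""
    return [p for p in contiguous_subphrases(phrase) if word in p.split()]


for concept in subphrases_containing("Professional Baseball Players", "Baseball"):
    print(concept)
```

Because the derived concepts overlap heavily, a ranking step has to decide which of them to keep or how to share scores among them.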
• Emphasis scorer: overlapping title
• ACE components shown alongside it: HTML Scorer, TF Scorer, Concept Miner
• Porter stemming: Baseball, baseballs, Baseballs → baseball
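The normalization in the baseball example can be sketched with case folding plus the plural-stripping rules of Porter's step 1a (SSES → SS, IES → I, SS → SS, S → removed). This is a deliberately reduced stand-in, not the full Porter algorithm: the complete stemmer applies further suffix rules, so its stems can differ from the ones produced here. The function name `normalize` is an assumption.

```python
def normalize(term):
    """Lowercase a single word and apply Porter step 1a plural rules only."""
    word = term.lower()
    if word.endswith("sses"):
        return word[:-2]  # e.g. "caresses" -> "caress"
    if word.endswith("ies"):
        return word[:-2]  # e.g. "ponies" -> "poni"
    if word.endswith("ss"):
        return word       # e.g. "caress" stays unchanged
    if word.endswith("s"):
        return word[:-1]  # e.g. "baseballs" -> "baseball"
    return word


variants = ["Baseball", "baseballs", "Baseballs"]
print({normalize(v) for v in variants})
```

Mapping surface variants to one form lets their frequencies accumulate on a single candidate concept instead of being split across three.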
Methodology - KEA
• Training data: human-authored keyphrases
• Features: TF-IDF and first appearance
• Classifier: Naïve Bayes model
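The two KEA features can be computed as in the following minimal sketch, which uses the commonly cited definitions: TF-IDF as the phrase's relative frequency in the document times the negative base-2 log of its document-frequency ratio, and first appearance as the fraction of the document preceding the phrase's first occurrence. The function `kea_features` and its parameters are hypothetical names; the Naïve Bayes step that consumes these features is omitted.

```python
import math


def kea_features(phrase, doc_tokens, doc_freq, num_docs):
    """Return (tf_idf, first_appearance) for a phrase, or None if it is absent.

    doc_freq: number of corpus documents containing the phrase (assumed >= 1).
    num_docs: total number of documents in the corpus.
    """
    phrase_tokens = phrase.split()
    n = len(phrase_tokens)
    positions = [
        i for i in range(len(doc_tokens) - n + 1)
        if doc_tokens[i:i + n] == phrase_tokens
    ]
    if not positions:
        return None
    tf = len(positions) / len(doc_tokens)
    idf = -math.log2(doc_freq / num_docs)
    first_appearance = positions[0] / len(doc_tokens)
    return tf * idf, first_appearance


doc = "dell laptop deals buy a dell laptop today".split()
print(kea_features("dell laptop", doc, doc_freq=2, num_docs=8))
print(kea_features("today", doc, doc_freq=2, num_docs=8))
```

A phrase that is frequent in the document, rare in the corpus, and appears early gets a high TF-IDF and a low first-appearance value, which is the pattern a trained model tends to associate with keyphrases.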
Experiment
• Evaluation framework
  – Documents: 100 shopping-related Web pages
  – Topics: Dell, HP, and Canon
• ACE vs. ICE
  – Parameters: T, B, λ
  – ACE vs. ICE[B,T]
• ICE vs. KEA
  – KEA: 50 Web pages for training

        Precision   Recall   F1-measure
ICE     0.7383      0.8300   0.7815
KEA     0.6583      0.7600   0.7056
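Precision, recall, and F1-measure for one page can be computed from the extracted concepts and a gold-standard list, as in this minimal sketch. Exact string matching and the function name `precision_recall_f1` are assumptions; the slides do not state how matches were counted or how per-page scores were aggregated, and the sample concepts below are made up for illustration.

```python
def precision_recall_f1(extracted, gold):
    """Compare two collections of concepts using exact set matching."""
    extracted, gold = set(extracted), set(gold)
    true_positives = len(extracted & gold)
    precision = true_positives / len(extracted) if extracted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Illustrative data only, not taken from the experiments.
extracted = ["dell laptop", "printer", "home"]
gold = ["dell laptop", "printer", "canon camera", "ink"]
print(precision_recall_f1(extracted, gold))
```

F1 is the harmonic mean of precision and recall, so a method must do well on both to score highly.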
Conclusions
• The experimental results demonstrate that ICE significantly outperforms KEA in concept extraction for online shopping.
Comments
• Advantages
  – ICE is an unsupervised method that does not need human-authored keyphrases.
• Applications
  – Online shopping, concept extraction, automatic keyphrase extraction.