Improving Health Question Classification by Word Location Weights
![Page 1: Improving Health Question Classification by Word Location Weights](https://reader033.vdocuments.mx/reader033/viewer/2022051623/568159a7550346895dc70b01/html5/thumbnails/1.jpg)
Improving Health Question Classification
by Word Location Weights
Rey-Long Liu
Dept. of Medical Informatics
Tzu Chi University
Taiwan
![Page 2: Improving Health Question Classification by Word Location Weights](https://reader033.vdocuments.mx/reader033/viewer/2022051623/568159a7550346895dc70b01/html5/thumbnails/2.jpg)
Outline
• Background
• Problem definition
• The proposed approach: WLW
• Empirical evaluation
• Conclusion
![Page 3: Improving Health Question Classification by Word Location Weights](https://reader033.vdocuments.mx/reader033/viewer/2022051623/568159a7550346895dc70b01/html5/thumbnails/3.jpg)
Background
![Page 4: Improving Health Question Classification by Word Location Weights](https://reader033.vdocuments.mx/reader033/viewer/2022051623/568159a7550346895dc70b01/html5/thumbnails/4.jpg)
Categories of Health Questions
![Page 5: Improving Health Question Classification by Word Location Weights](https://reader033.vdocuments.mx/reader033/viewer/2022051623/568159a7550346895dc70b01/html5/thumbnails/5.jpg)
Classification of Health Questions
• Why health questions?
  – Health questions provide both reliable and readable health information
• Why classification of health questions?
  – Given a health question q, retrieve related questions (and their answers)
![Page 6: Improving Health Question Classification by Word Location Weights](https://reader033.vdocuments.mx/reader033/viewer/2022051623/568159a7550346895dc70b01/html5/thumbnails/6.jpg)
Problem Definition
![Page 7: Improving Health Question Classification by Word Location Weights](https://reader033.vdocuments.mx/reader033/viewer/2022051623/568159a7550346895dc70b01/html5/thumbnails/7.jpg)
Goal & Motivation
• Goal
  – Target: Chinese Health Questions (CHQs)
  – Contribution: Developing a technique, WLW (Word Location Weight), that estimates the weights of words in a CHQ based on their locations
• Motivation
  – Location weights can be used by classifiers (e.g., SVM) to improve classification
    • Classifying in-space CHQs (cause, diagnosis, process)
    • Filtering out-space CHQs (which may be about anything)
![Page 8: Improving Health Question Classification by Word Location Weights](https://reader033.vdocuments.mx/reader033/viewer/2022051623/568159a7550346895dc70b01/html5/thumbnails/8.jpg)
Basic Idea
• Words that are more related to the category of a CHQ tend to appear at the beginning or the end of the CHQ
• Examples:
  – 如何 (how to) 克服 (deal with) 緊張 (nervous) 的情緒 (mood)? (category: process)
  – 嬰兒 (infant) 體溫 (body temperature) 太低 (too low) 怎麼辦 (what to do)? (category: process)
![Page 9: Improving Health Question Classification by Word Location Weights](https://reader033.vdocuments.mx/reader033/viewer/2022051623/568159a7550346895dc70b01/html5/thumbnails/9.jpg)
Related Work
• Recognition of question types (e.g., when, where)
  – Weakness: Question types ≠ intended categories of CHQs
• Classification by parsing
  – Weakness I: Parsing Chinese is still challenging
  – Weakness II: CHQs are NOT always well-formed
• Classification by pattern matching
  – Weakness: String patterns are difficult to construct
![Page 10: Improving Health Question Classification by Word Location Weights](https://reader033.vdocuments.mx/reader033/viewer/2022051623/568159a7550346895dc70b01/html5/thumbnails/10.jpg)
The Proposed Approach: WLW
![Page 11: Improving Health Question Classification by Word Location Weights](https://reader033.vdocuments.mx/reader033/viewer/2022051623/568159a7550346895dc70b01/html5/thumbnails/11.jpg)
Main Challenges
(1) Defining the two weights of a location p in a CHQ q
![Page 12: Improving Health Question Classification by Word Location Weights](https://reader033.vdocuments.mx/reader033/viewer/2022051623/568159a7550346895dc70b01/html5/thumbnails/12.jpg)
Main Challenges (cont.)
(2) Encoding the location weights of a word w into two features for the underlying classifier
![Page 13: Improving Health Question Classification by Word Location Weights](https://reader033.vdocuments.mx/reader033/viewer/2022051623/568159a7550346895dc70b01/html5/thumbnails/13.jpg)
Interesting Behaviors of WLW
• A word w in a question q has two features
  – Fvalue_front and Fvalue_rear
  – Applicable to different categories and languages (e.g., English)
• When w is far from the front and the rear
  – Both features reduce to the term frequency (TF) of w
  – WLW reduces to the traditional feature-encoding approach (using TF as the features)
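The slide text does not reproduce the WLW formulas, so the following Python sketch only illustrates the behavior listed above: two features per word, each reducing to TF when the word is far from both ends. The linear decay and the `window` and `boost` parameters are assumptions made for illustration, not the author's definitions.

```python
from collections import defaultdict


def location_features(tokens, window=3, boost=1.0):
    """Return {word: (front_value, rear_value)} for a tokenized question.

    Hypothetical weighting (NOT the formula from the slides): every
    occurrence contributes 1 plus a bonus that decays linearly to 0 once
    the word is `window` or more positions away from the front (or rear).
    """
    n = len(tokens)
    feats = defaultdict(lambda: [0.0, 0.0])
    for p, word in enumerate(tokens):
        dist_front = p            # distance from the first token
        dist_rear = n - 1 - p     # distance from the last token
        feats[word][0] += 1.0 + boost * max(0.0, 1.0 - dist_front / window)
        feats[word][1] += 1.0 + boost * max(0.0, 1.0 - dist_rear / window)
    return {word: tuple(values) for word, values in feats.items()}


# Tokenization taken from the second example on the "Basic Idea" slide.
question = ["嬰兒", "體溫", "太低", "怎麼辦"]
for word, (front, rear) in location_features(question).items():
    print(word, round(front, 3), round(rear, 3))
```

Under these assumptions, a word in the middle of a long question gets a contribution of exactly 1 per occurrence in both features, which is its term frequency.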
![Page 14: Improving Health Question Classification by Word Location Weights](https://reader033.vdocuments.mx/reader033/viewer/2022051623/568159a7550346895dc70b01/html5/thumbnails/14.jpg)
Empirical Evaluation
![Page 15: Improving Health Question Classification by Word Location Weights](https://reader033.vdocuments.mx/reader033/viewer/2022051623/568159a7550346895dc70b01/html5/thumbnails/15.jpg)
Experimental Design
• CHQs were downloaded from a health information provider
  – 864 in-space CHQs
    • cause (category 1): 313
    • diagnosis (category 2): 92
    • process (category 3): 459
  – 100 out-space CHQs
    • whatever (general description)
• Five-fold cross validation
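A five-fold split over the 864 in-space CHQs could be sketched as below. The slides do not say whether the folds were stratified by category, so the stratification, the seed, and the function name `stratified_folds` are assumptions for illustration only.

```python
import random
from collections import defaultdict


def stratified_folds(labels, k=5, seed=0):
    """Return k (train_indices, test_indices) pairs, balanced per category."""
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for i, label in enumerate(labels):
        by_label[label].append(i)

    folds = [[] for _ in range(k)]
    for indices in by_label.values():
        rng.shuffle(indices)
        # Deal each category's questions round-robin across the folds.
        for j, i in enumerate(indices):
            folds[j % k].append(i)

    splits = []
    for t in range(k):
        test = sorted(folds[t])
        train = sorted(i for f in range(k) if f != t for i in folds[f])
        splits.append((train, test))
    return splits


# Category sizes as reported on the slide.
labels = ["cause"] * 313 + ["diagnosis"] * 92 + ["process"] * 459
splits = stratified_folds(labels)
for train, test in splits:
    print(len(train), len(test))
```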
![Page 16: Improving Health Question Classification by Word Location Weights](https://reader033.vdocuments.mx/reader033/viewer/2022051623/568159a7550346895dc70b01/html5/thumbnails/16.jpg)
Underlying Classifiers
• Underlying classifier
  – The Support Vector Machine (SVM) classifier
![Page 17: Improving Health Question Classification by Word Location Weights](https://reader033.vdocuments.mx/reader033/viewer/2022051623/568159a7550346895dc70b01/html5/thumbnails/17.jpg)
Results: Classification of In-Space CHQs
• Evaluation criteria
  – Micro-averaged F1 (MicroF1)
  – Macro-averaged F1 (MacroF1)
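The two criteria follow the standard definitions: MicroF1 pools the true-positive, false-positive and false-negative counts over all categories before computing F1, while MacroF1 averages the per-category F1 scores. A minimal Python sketch, assuming each question carries a set of gold categories and a set of predicted categories (the function names and the toy data are hypothetical):

```python
def f1(tp, fp, fn):
    """F1 from raw counts; defined as 0 when there is nothing to score."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def micro_macro_f1(gold, pred, categories):
    """gold, pred: lists of sets of category names, one set per question."""
    counts = {c: [0, 0, 0] for c in categories}  # [tp, fp, fn] per category
    for g, p in zip(gold, pred):
        for c in categories:
            if c in p and c in g:
                counts[c][0] += 1
            elif c in p:
                counts[c][1] += 1
            elif c in g:
                counts[c][2] += 1
    macro = sum(f1(*counts[c]) for c in categories) / len(categories)
    tp = sum(v[0] for v in counts.values())
    fp = sum(v[1] for v in counts.values())
    fn = sum(v[2] for v in counts.values())
    return f1(tp, fp, fn), macro


# Toy data, not results from the slides.
gold = [{"cause"}, {"cause"}, {"diagnosis"}, {"process"}]
pred = [{"cause"}, {"process"}, {"diagnosis"}, {"process"}]
micro, macro = micro_macro_f1(gold, pred, ["cause", "diagnosis", "process"])
print(micro, macro)
```

Because MacroF1 weights every category equally, it is more sensitive than MicroF1 to performance on small categories such as diagnosis.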
![Page 18: Improving Health Question Classification by Word Location Weights](https://reader033.vdocuments.mx/reader033/viewer/2022051623/568159a7550346895dc70b01/html5/thumbnails/18.jpg)
SVM+WLW is significantly better than SVM
![Page 19: Improving Health Question Classification by Word Location Weights](https://reader033.vdocuments.mx/reader033/viewer/2022051623/568159a7550346895dc70b01/html5/thumbnails/19.jpg)
Results: Filtering of Out-Space CHQs
• Evaluation criteria
  – Filtering ratio (FR) = (# out-space CHQs successfully rejected by all categories) / (# out-space CHQs)
  – Average number of misclassifications (AM) = (# misclassifications for the out-space CHQs) / (# out-space CHQs)
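Both ratios can be computed directly from the per-category decisions on the out-space CHQs. The Python sketch below assumes one binary decision per category and counts every category that wrongly accepts an out-space CHQ as one misclassification. That reading of "misclassification", the function name, and the toy data are assumptions.

```python
def filtering_metrics(accepted_by):
    """accepted_by: one set per out-space CHQ, holding the categories
    whose classifier (wrongly) accepted that question."""
    total = len(accepted_by)
    rejected_by_all = sum(1 for cats in accepted_by if not cats)
    misclassifications = sum(len(cats) for cats in accepted_by)
    fr = rejected_by_all / total       # filtering ratio (higher is better)
    am = misclassifications / total    # average misclassifications (lower is better)
    return fr, am


# Toy data, not results from the slides: four out-space CHQs.
decisions = [set(), {"cause"}, {"cause", "process"}, set()]
fr, am = filtering_metrics(decisions)
print(fr, am)
```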
![Page 20: Improving Health Question Classification by Word Location Weights](https://reader033.vdocuments.mx/reader033/viewer/2022051623/568159a7550346895dc70b01/html5/thumbnails/20.jpg)
SVM+WLW achieves higher FR and lower AM
![Page 21: Improving Health Question Classification by Word Location Weights](https://reader033.vdocuments.mx/reader033/viewer/2022051623/568159a7550346895dc70b01/html5/thumbnails/21.jpg)
Conclusion
![Page 22: Improving Health Question Classification by Word Location Weights](https://reader033.vdocuments.mx/reader033/viewer/2022051623/568159a7550346895dc70b01/html5/thumbnails/22.jpg)
• Healthcare consumers often read health information on the Internet
• Health questions are a valuable resource for healthcare consumers
  – They provide both reliable and readable health information
• Classification of health questions is the basis for retrieving related questions
  – cause, diagnosis, process, whatever
• WLW can help SVM improve the classification of CHQs