Text Mining of Electronic News Content for Economic Research
![Page 1: Text Mining of Electronic News Content for Economic Research](https://reader035.vdocuments.mx/reader035/viewer/2022081420/568147dc550346895db51066/html5/thumbnails/1.jpg)
Panos Ipeirotis
Stern School of Business
New York University
Text Mining of Electronic News Content for Economic Research
“On the Record”: A Forum on Electronic Media and the Preservation of News
![Page 2: Text Mining of Electronic News Content for Economic Research](https://reader035.vdocuments.mx/reader035/viewer/2022081420/568147dc550346895db51066/html5/thumbnails/2.jpg)
Comparative Shopping
![Page 3: Text Mining of Electronic News Content for Economic Research](https://reader035.vdocuments.mx/reader035/viewer/2022081420/568147dc550346895db51066/html5/thumbnails/3.jpg)
Comparative Shopping
![Page 4: Text Mining of Electronic News Content for Economic Research](https://reader035.vdocuments.mx/reader035/viewer/2022081420/568147dc550346895db51066/html5/thumbnails/4.jpg)
Are Customers Irrational?
BuyDig.com gets a price premium of $11.04 (+1.5%)
Price premium: customers pay more than the minimum price
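A minimal sketch of the price-premium arithmetic, assuming the premium is simply the price paid minus the lowest competing price. The function name and the two prices are hypothetical; they were chosen only so that the result matches the $11.04 (+1.5%) figure on the slide.

```python
def price_premium(price_paid: float, min_price: float) -> tuple[float, float]:
    """Return (premium in dollars, premium as a fraction of the minimum price)."""
    if min_price <= 0:
        raise ValueError("min_price must be positive")
    absolute = price_paid - min_price
    return absolute, absolute / min_price


# Hypothetical prices (not from the talk), picked to reproduce the slide's figure.
abs_premium, rel_premium = price_premium(price_paid=747.04, min_price=736.00)
print(f"${abs_premium:.2f} ({rel_premium:+.1%})")
```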
![Page 5: Text Mining of Electronic News Content for Economic Research](https://reader035.vdocuments.mx/reader035/viewer/2022081420/568147dc550346895db51066/html5/thumbnails/5.jpg)
Price Premiums @ Amazon
[Chart: number of transactions (y-axis, 0 to 10,000) against price premium (x-axis, -100 to 100)]
Are Customers Irrational (?)
![Page 6: Text Mining of Electronic News Content for Economic Research](https://reader035.vdocuments.mx/reader035/viewer/2022081420/568147dc550346895db51066/html5/thumbnails/6.jpg)
Why Not Buy the Cheapest?
You buy more than a product
Customers do not pay only for the product
Customers also pay for a set of fulfillment characteristics
Delivery
Packaging
Responsiveness
…
Customers care about reputation of sellers!
![Page 7: Text Mining of Electronic News Content for Economic Research](https://reader035.vdocuments.mx/reader035/viewer/2022081420/568147dc550346895db51066/html5/thumbnails/7.jpg)
Example of a reputation profile
![Page 8: Text Mining of Electronic News Content for Economic Research](https://reader035.vdocuments.mx/reader035/viewer/2022081420/568147dc550346895db51066/html5/thumbnails/8.jpg)
![Page 9: Text Mining of Electronic News Content for Economic Research](https://reader035.vdocuments.mx/reader035/viewer/2022081420/568147dc550346895db51066/html5/thumbnails/9.jpg)
The Idea in a Single Slide
Conjecture: Price premiums measure reputation
Reputation is captured in text feedback
Our contribution: Examine how text affects price premiums
![Page 10: Text Mining of Electronic News Content for Economic Research](https://reader035.vdocuments.mx/reader035/viewer/2022081420/568147dc550346895db51066/html5/thumbnails/10.jpg)
Decomposing Reputation
Is reputation just a scalar metric?
Previous studies assumed a “monolithic” reputation
Decompose reputation into individual components
Sellers characterized by a set of fulfillment characteristics (packaging, delivery, and so on)
What are these characteristics (valued by consumers)?
We think of each characteristic as a dimension, represented by a noun, noun phrase, verb or verbal phrase (“shipping”, “packaging”, “delivery”, “arrived”)
We scan the textual feedback to discover these dimensions
![Page 11: Text Mining of Electronic News Content for Economic Research](https://reader035.vdocuments.mx/reader035/viewer/2022081420/568147dc550346895db51066/html5/thumbnails/11.jpg)
Decomposing and Scoring Reputation
We think of each characteristic as a dimension, represented by a noun or verb phrase (“shipping”, “packaging”, “delivery”, “arrived”)
The sellers are rated on these dimensions by buyers using modifiers (adjectives or adverbs), not numerical scores
“Fast shipping!”
“Great packaging”
“Awesome unresponsiveness”
“Unbelievable delays”
“Unbelievable price”
How can we find out the meaning of these adjectives?
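Before the modifiers can be valued, the modifier-dimension pairs have to be pulled out of the feedback text. The sketch below is only an illustration under strong assumptions: the dimension list is hand-picked and hypothetical (the talk discovers dimensions by scanning the feedback), and a modifier is taken to be whatever word directly precedes a dimension, where a fuller approach would more plausibly use a part-of-speech tagger.

```python
import re
from collections import Counter

# Hypothetical, hand-picked dimensions; not the list used in the talk.
DIMENSIONS = {"shipping", "packaging", "delivery", "price", "delays"}


def modifier_dimension_pairs(feedback: str) -> list[tuple[str, str]]:
    """Return (modifier, dimension) pairs: a word immediately followed by a known dimension."""
    words = re.findall(r"[a-z]+", feedback.lower())
    return [(prev, word) for prev, word in zip(words, words[1:]) if word in DIMENSIONS]


comments = ["Fast shipping!", "Great packaging", "Unbelievable delays", "Unbelievable price"]
# Count how often each pair occurs across all feedback comments.
counts = Counter(pair for c in comments for pair in modifier_dimension_pairs(c))
```

Note that "unbelievable" shows up with both "delays" and "price", which is exactly why the modifier cannot be scored in isolation from its dimension.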
![Page 12: Text Mining of Electronic News Content for Economic Research](https://reader035.vdocuments.mx/reader035/viewer/2022081420/568147dc550346895db51066/html5/thumbnails/12.jpg)
Measuring Reputation
• Regress textual reputation against price premiums
• Example for “delivery”:
– Fast delivery vs. slow delivery: +$7.95
– So “fast” is better than “slow” by a $7.95 margin
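To make the regression idea concrete, here is a one-variable sketch: ordinary least squares of price premium on a 0/1 indicator for "fast delivery" versus "slow delivery". The data are made up so that the slope comes out at $7.95, and the single-regressor setup is a simplifying assumption; the actual study presumably regresses on many modifier-dimension pairs at once.

```python
from statistics import mean


def ols_slope_intercept(x, y):
    """Simple OLS fit y = intercept + slope * x for a single regressor."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx


# Hypothetical transactions: 1 = feedback says "fast delivery", 0 = "slow delivery".
is_fast = [1, 1, 1, 0, 0, 0]
premium = [9.95, 7.95, 5.95, 2.00, 0.00, -2.00]  # price premium in dollars

slope, intercept = ols_slope_intercept(is_fast, premium)
print(f"'fast' vs. 'slow' delivery: {slope:+.2f} dollars")
```

With a single 0/1 regressor the slope equals the difference between the two group means, so the coefficient reads directly as the dollar value of "fast" over "slow".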
![Page 13: Text Mining of Electronic News Content for Economic Research](https://reader035.vdocuments.mx/reader035/viewer/2022081420/568147dc550346895db51066/html5/thumbnails/13.jpg)
Some Indicative Dollar Values
[Table: phrases and their dollar values in Positive and Negative columns; the only surviving entry is “good packaging”: -$0.56. Positive? Negative?]
Natural method for extracting sentiment strength and polarity
Naturally captures the pragmatic meaning within the given context
Captures misspellings as well
![Page 14: Text Mining of Electronic News Content for Economic Research](https://reader035.vdocuments.mx/reader035/viewer/2022081420/568147dc550346895db51066/html5/thumbnails/14.jpg)
Product Reviews and Product Sales
• Examine changes in demand based on published product reviews
“excellent lens”: +3%
“poor lens”: -1%
“excellent photos”: +6%
“poor photos”: -2%
Feature “photos” is two times more important than “lens”
“Excellent” is positive, “poor” is negative
“Excellent” is three times stronger than “poor”
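The three conclusions can be checked with simple ratios. This sketch assumes the pairing those conclusions imply (excellent lens +3%, poor lens -1%, excellent photos +6%, poor photos -2%) and assumes each effect factors into a feature weight times a modifier score; the variable names are illustrative only.

```python
# Percentage change in demand per phrase, paired as the slide's conclusions imply (an assumption).
effects = {
    ("excellent", "lens"): 3.0,
    ("poor", "lens"): -1.0,
    ("excellent", "photos"): 6.0,
    ("poor", "photos"): -2.0,
}

# Feature importance: compare the two features under the same modifier.
photos_vs_lens = {
    m: effects[(m, "photos")] / effects[(m, "lens")] for m in ("excellent", "poor")
}

# Modifier strength: compare the two modifiers on the same feature (magnitude only).
excellent_vs_poor = {
    f: abs(effects[("excellent", f)] / effects[("poor", f)]) for f in ("lens", "photos")
}

print(photos_vs_lens)
print(excellent_vs_poor)
```

Both feature ratios agree and both modifier ratios agree, which is what one would expect if the effects really are a product of a feature weight and a modifier score.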
![Page 15: Text Mining of Electronic News Content for Economic Research](https://reader035.vdocuments.mx/reader035/viewer/2022081420/568147dc550346895db51066/html5/thumbnails/15.jpg)
Feature Weights for Digital Cameras
[Chart: feature weights (axis 0 to 1.2) for SLR and Point & Shoot cameras]
![Page 16: Text Mining of Electronic News Content for Economic Research](https://reader035.vdocuments.mx/reader035/viewer/2022081420/568147dc550346895db51066/html5/thumbnails/16.jpg)
Show me the Money!
Applications with Electronic News
Political News and Prediction Markets
Financial News and Stock/Option Prices
Broader contribution
Economic data are affected in many contexts by text
Economic data are affected in many contexts by news
![Page 17: Text Mining of Electronic News Content for Economic Research](https://reader035.vdocuments.mx/reader035/viewer/2022081420/568147dc550346895db51066/html5/thumbnails/17.jpg)
Prediction Markets
A prediction market is a market for a contract that yields payments based on the outcome of a partially uncertain future event, such as an election.
A contract pays $100 if candidate X wins the election, and $0 otherwise.
When the market price of an X contract is $60, the prediction market believes that candidate X has a 60% chance of winning the election.
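The price-to-probability reading can be written out as a short sketch. It assumes a binary contract and ignores fees and risk preferences; the function names and the 70% belief in the usage lines are hypothetical.

```python
def implied_probability(price: float, payout: float = 100.0) -> float:
    """Market-implied probability of the event for a contract paying `payout` or $0."""
    if not 0 <= price <= payout:
        raise ValueError("price must lie between 0 and the payout")
    return price / payout


def expected_profit(price: float, believed_probability: float, payout: float = 100.0) -> float:
    """Expected profit per contract for a buyer who holds their own probability estimate."""
    return believed_probability * payout - price


p = implied_probability(60.0)  # the $60 contract from the slide
edge = expected_profit(60.0, believed_probability=0.70)  # hypothetical belief of 70%
print(f"implied probability: {p:.0%}, expected profit per contract: ${edge:.2f}")
```

A trader whose estimate differs from the implied probability expects to profit by trading, which is how new information, including news, moves the contract price.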
![Page 18: Text Mining of Electronic News Content for Economic Research](https://reader035.vdocuments.mx/reader035/viewer/2022081420/568147dc550346895db51066/html5/thumbnails/18.jpg)
Political News and Prediction Markets
Hillary Clinton
…To put our money where our mouth is, the signal from the last few days shows that Hillary's market price will edge lower in the next few days/weeks…
Dec 2, 2007
On my blog
![Page 19: Text Mining of Electronic News Content for Economic Research](https://reader035.vdocuments.mx/reader035/viewer/2022081420/568147dc550346895db51066/html5/thumbnails/19.jpg)
And suddenly…
We predicted decline here
Why stop here?
![Page 20: Text Mining of Electronic News Content for Economic Research](https://reader035.vdocuments.mx/reader035/viewer/2022081420/568147dc550346895db51066/html5/thumbnails/20.jpg)
An interesting sequence of emails…
Date: Mon, 14 Jan 2008 11:26:27 -0500
Subject: Excessive downloading from licensed database
We have received a complaint from ProQuest/Factiva about a massive number of articles (over 10,000 per session) being downloaded from their database to a system at Stern, using IP 128.122.130.34 at the times below (Eastern time).
Date: Mon, 14 Jan 2008 12:16:53 -0500
Subject: Excessive downloading from licensed database
Got a call from Jane this morning that Panos has downloaded bulk information from Proquest/Factiva last Thursday 10th (2GB download) and Friday 11th (2.5GB download). This is creating a big issue with NYU libraries and Proquest, with a threat for a bill of up to $250K…
Date: Tue, 15 Jan 2008 15:02:13 -0500
Subject: About Factiva…
…it is clear that the interface is meant only for humans, not to download articles for processing with computers…
![Page 21: Text Mining of Electronic News Content for Economic Research](https://reader035.vdocuments.mx/reader035/viewer/2022081420/568147dc550346895db51066/html5/thumbnails/21.jpg)
XML is for humans?
![Page 22: Text Mining of Electronic News Content for Economic Research](https://reader035.vdocuments.mx/reader035/viewer/2022081420/568147dc550346895db51066/html5/thumbnails/22.jpg)
![Page 23: Text Mining of Electronic News Content for Economic Research](https://reader035.vdocuments.mx/reader035/viewer/2022081420/568147dc550346895db51066/html5/thumbnails/23.jpg)
Some Lessons
• Cannot rely on a commercial for-profit service when research can lead to something competitive
• Need a public, comprehensive repository of archival news
• Allow annotation and tagging from multiple parties to be part of the repository
• Build reputational and usage statistics on contributed annotations (to pick the best)