Text Mining
Three Cases
Outline
Federalist Papers
SVDPDF
VAERS
http://zlin.ba.ttu.edu/sassrc.rar
http://zlin.ba.ttu.edu/DMTM9.rar
Federalist Papers
Who wrote The Federalist Papers?
Hamilton
Madison
STYLOMETRY: Uniquely identify an author based on the distribution of words in a document.
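The stylometric idea can be sketched as a comparison of word-usage rates. The following is a minimal illustration, not the method used in the slides: the marker-word list `FUNCTION_WORDS` and both sample sentences are made up for the example, and the function names are hypothetical.

```python
import re
from collections import Counter

# Hypothetical marker words, chosen for illustration only.
FUNCTION_WORDS = ["upon", "whilst", "while", "on", "by", "to"]


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def function_word_rates(text, words=FUNCTION_WORDS):
    """Return occurrences per 1,000 tokens for each marker word."""
    tokens = tokenize(text)
    counts = Counter(tokens)
    total = len(tokens)
    if total == 0:
        return {w: 0.0 for w in words}
    return {w: 1000.0 * counts[w] / total for w in words}


# Made-up snippets, not quotations from the essays.
sample_a = "Upon reflection, the power to tax rests upon the consent given by the people."
sample_b = "Whilst the states deliberate, the decision falls to the people on this question."

if __name__ == "__main__":
    for name, text in [("A", sample_a), ("B", sample_b)]:
        print(name, function_word_rates(text))
```

Two documents whose rate profiles differ consistently on such words are candidates for having different authors; a real analysis would use many more words and far longer texts.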
About the Data
Alexander Hamilton, James Madison, and John Jay wrote a series of essays in 1787 and 1788 to persuade the citizens of the state of New York to ratify the new Constitution of the United States. These essays are collectively called The Federalist Papers. Copies of the papers in a variety of formats can be found at
http://www.yale.edu/lawweb/avalon/federal/fed.htm, or http://www.constitution.org/fed/federa00.htm.
Of the 85 essays, 51 are attributed to Hamilton, 15 to Madison, 5 to Jay, and 3 to Hamilton and Madison jointly. The 11 remaining essays were written by either Hamilton or Madison, but which of the two is disputed. Mosteller and Wallace (1964) used Bayesian statistical techniques to provide evidence that Madison wrote all 11 of the essays of unknown authorship. (The essays in question are numbers 49, 50, 51, 52, 53, 54, 55, 56, 57, 62, and 63.)
Corpus: The Federalist Papers corpus is a collection of 85 essays.
Terms and Tokens: The Federalist Papers taken as a whole contain over 190,000 tokens and approximately 8,800 unique tokens.
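The difference between the total token count and the number of unique tokens can be illustrated with a short sketch. The tokenizer here is a simple assumption (lowercased alphabetic runs) and the two-line corpus is a toy example, so the counts will not match those produced by SAS Text Miner on the real essays.

```python
import re


def token_stats(documents):
    """Return (total token count, number of unique tokens) for a corpus."""
    total = 0
    vocabulary = set()
    for doc in documents:
        # Assumed tokenization: runs of letters, case-folded.
        tokens = re.findall(r"[a-z]+", doc.lower())
        total += len(tokens)
        vocabulary.update(tokens)
    return total, len(vocabulary)


# Toy corpus for illustration only.
corpus = ["The people of the State", "To the people of New York"]

if __name__ == "__main__":
    n_tokens, n_unique = token_stats(corpus)
    print(f"{n_tokens} tokens, {n_unique} unique")
```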
The Federalist Papers Diagram
EM Clustering
Logistic Regression
TARGET: 1 – Madison; 0 – Hamilton; missing – unknown
Federalist Papers Clusters
Cluster 1: Hamilton 24, Madison 1, Unknown 0
Cluster 2: Hamilton 27, Madison 14, Unknown 11
These clusters were obtained using numeric inputs derived from text mining. No author information was employed. Of interest is the fact that EM clustering placed all of the unknown essays into the same cluster that contains 14 of the 15 Madison essays.
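As a quick arithmetic check, the sketch below tabulates the cluster counts reported above (Cluster 1: 24, 1, 0; Cluster 2: 27, 14, 11 for Hamilton, Madison, and Unknown) and computes the share of each author's essays falling in each cluster. It only summarizes the reported counts; it does not perform EM clustering, and the function names are hypothetical.

```python
# Essays per author within each cluster, as reported on the slide.
clusters = {
    "Cluster 1": {"Hamilton": 24, "Madison": 1, "Unknown": 0},
    "Cluster 2": {"Hamilton": 27, "Madison": 14, "Unknown": 11},
}


def author_totals(table):
    """Sum the counts for each author across all clusters."""
    totals = {}
    for counts in table.values():
        for author, n in counts.items():
            totals[author] = totals.get(author, 0) + n
    return totals


def cluster_shares(table, author):
    """Fraction of one author's essays that landed in each cluster."""
    total = author_totals(table)[author]
    return {name: counts[author] / total for name, counts in table.items()}


if __name__ == "__main__":
    print(author_totals(table=clusters))
    for author in ("Hamilton", "Madison", "Unknown"):
        print(author, cluster_shares(clusters, author))
```

The totals agree with the attribution counts given earlier (51 Hamilton, 15 Madison, 11 unknown); the Jay and joint essays are not part of this table.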
Logistic Regression Classification of The Federalist Papers
Text Mining Results
Text mining reproduces the results of Mosteller and Wallace.
The predictions in the second column from the right show the strength of the decision.
The record with a predicted value of 0.709119 corresponds to essay 56, so the model gives this essay the weakest association with Madison of all of the unknown essays.
Essay 63, with a predicted value of 0.999691, has the strongest association with Madison.
All of the essays in question have a stronger association with Madison than Hamilton, hence the classification into the Madison category.
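The classification rule described here can be sketched as thresholding the predicted probability of TARGET = 1 (Madison). The 0.5 cutoff is an assumption, since the slides do not state the threshold used; the two probabilities are the ones quoted above, and the log-odds are shown only to make the difference in strength easier to compare.

```python
import math


def logit(p):
    """Log-odds of a probability strictly between 0 and 1."""
    return math.log(p / (1.0 - p))


def classify(p, threshold=0.5):
    """Label an essay from its predicted probability of TARGET = 1 (Madison).

    The 0.5 threshold is an assumed default, not taken from the slides.
    """
    return "Madison" if p >= threshold else "Hamilton"


# Predicted values quoted for the weakest and strongest unknown essays.
predictions = {56: 0.709119, 63: 0.999691}

if __name__ == "__main__":
    for essay, p in sorted(predictions.items()):
        print(f"Essay {essay}: p={p:.6f}, log-odds={logit(p):.2f}, class={classify(p)}")
```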
Characteristics of a Document
A document consists of letters, words, sentences, paragraphs, punctuation, and possibly structural items such as chapters and sections.
The elements of a document can be counted (for example, the number of characters, words, or sentences) and summarized (for example, mean, median, or kurtosis).
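Counting and summarizing document elements might look like the sketch below. The splitting rules (letters and apostrophes form a word; `.`, `!`, and `?` end a sentence) are simplifying assumptions, and only the mean and median are computed; kurtosis is left out because the Python standard library has no function for it.

```python
import re
import statistics


def describe(text):
    """Count basic elements of a document and summarize word lengths."""
    words = re.findall(r"[A-Za-z']+", text)
    # Assumed sentence rule: split on runs of terminal punctuation.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    word_lengths = [len(w) for w in words]
    return {
        "characters": len(text),
        "words": len(words),
        "sentences": len(sentences),
        "mean_word_length": statistics.mean(word_lengths) if words else 0.0,
        "median_word_length": statistics.median(word_lengths) if words else 0.0,
    }


if __name__ == "__main__":
    print(describe("We the people. Who decides?"))
```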
Comparing Two Documents
Table layout: rows Doc 1 and Doc 2; columns Word Size, Sentence Size, Paragraph Size, Word Freq, Sentence Freq, Paragraph Freq.
Contingency Table Comparing Essay 1 to Essay 37
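One common way to compare two documents through a contingency table is a Pearson chi-square statistic on their counts. The sketch below is an assumption about how such a table could be analyzed, not a reconstruction of the slide: the counts are hypothetical and are not taken from Essay 1 or Essay 37.

```python
def chi_square(table):
    """Pearson chi-square statistic for a rows-by-columns table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of row and column.
            expected = row_totals[i] * col_totals[j] / grand
            if expected > 0:
                stat += (observed - expected) ** 2 / expected
    return stat


# Hypothetical counts: rows are two documents, columns are word-length bands.
counts = [
    [120, 80, 40],
    [100, 90, 60],
]

if __name__ == "__main__":
    print(f"chi-square = {chi_square(counts):.3f}")
```

A larger statistic indicates that the two documents distribute their words across the categories less alike than chance alone would suggest.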
Text Miner Static Analysis
Text Miner Interactive Analysis
SVDPDF
SAS Education Course Descriptions
The data represents a collection of 130 course summaries obtained from http://support.sas.com.
The original 130 files were PDF files stored in one location on an HTTP server.
A SAS DATA step was used to read the files from the server and write them to a local directory.
The TMFILTER macro was used to process the PDF files and store the results as a text field in 130 document records in a SAS data set.
The final SAS data set was modified to accommodate this demonstration and can be found in DMTM9.SASPDF.
Static Analysis with SAS Text Miner
Text Miner Settings
Interactive Results
Applications of Concept Lists
A company can have specific conceptual goals. For example, are customers concerned about brand integrity; quality; price; features, styles, and selection; availability; or customer support?
Market Research for Quality
What terms are most similar to the term “quality”?
– Find Similar
– Filter
What documents address quality?
– Filter on synonyms and similar terms
– Find similar documents
What secondary concepts reflect information on quality?
– SVD coefficients
– Concept links
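Finding the terms most similar to "quality" can be illustrated with cosine similarity between term vectors. This is a simplified sketch on raw term-by-document counts; it is not a description of how SAS Text Miner's Find Similar works internally, which may operate on SVD-reduced coordinates. All counts and the helper names are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length numeric vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


# Hypothetical term-by-document counts (one column per document).
term_vectors = {
    "quality": [3, 0, 2, 0],
    "reliable": [2, 0, 1, 0],
    "price": [0, 4, 0, 1],
    "support": [1, 1, 1, 1],
}


def most_similar(term, vectors, top_n=2):
    """Rank the other terms by cosine similarity to the given term."""
    target = vectors[term]
    scores = [
        (other, cosine(target, vec))
        for other, vec in vectors.items()
        if other != term
    ]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)[:top_n]


if __name__ == "__main__":
    for other, score in most_similar("quality", term_vectors):
        print(f"{other}: {score:.3f}")
```

The same function applied to document vectors (columns instead of rows) gives a rough analogue of finding similar documents.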
VAERS
VAERS
VAERS (the Vaccine Adverse Event Reporting System) was created by the Food and Drug Administration (FDA) and Centers for Disease Control and Prevention (CDC) to receive reports about adverse events that might be associated with vaccines.
No prescription drug or biological product, such as a vaccine, is completely free from side effects. Vaccines protect many people from dangerous illnesses, but vaccines, like drugs, can cause side effects, a small percentage of which may be serious.
VAERS is used to continually monitor reports to determine whether any vaccine or vaccine lot has a higher than expected rate of events.
Department of Health and Human Services, Public Health Service
VAERS
Data was obtained from http://www.vaers.org/.
Data was downloaded in September 2002 as a series of CSV files.
A SAS DATA step was used to read and process the data.
The original data had 131,464 observations and 59 variables.
Cleaning and screening reduced the data set to 48,523 observations and 44 variables.
The data set has 6 text variables. The original data had 21, but 15 were sparsely populated.
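Dropping sparsely populated variables from CSV data can be sketched as follows. The slides used a SAS DATA step and do not say how "sparsely populated" was defined, so the fill-rate threshold here is an assumption. The tiny extract is made up; `SYMPTOM_TEXT` and `SYM01` are VAERS field names, while `ID` and `NOTES` are hypothetical.

```python
import csv
import io


def sparse_columns(rows, min_fill=0.05):
    """Return column names whose share of non-blank values is below min_fill."""
    if not rows:
        return []
    filled = {name: 0 for name in rows[0]}
    for row in rows:
        for name, value in row.items():
            if value is not None and value.strip():
                filled[name] += 1
    return [name for name, n in filled.items() if n / len(rows) < min_fill]


def drop_columns(rows, names):
    """Return copies of the rows without the named columns."""
    return [{k: v for k, v in row.items() if k not in names} for row in rows]


# Made-up extract for illustration; NOTES is empty in every row.
raw = io.StringIO(
    "ID,SYMPTOM_TEXT,SYM01,NOTES\n"
    "1,Fever after dose,FEVER,\n"
    "2,Headache and rash,HEADACHE,\n"
    "3,Swelling at site,EDEMA,\n"
)
rows = list(csv.DictReader(raw))
sparse = sparse_columns(rows, min_fill=0.5)
cleaned = drop_columns(rows, sparse)

if __name__ == "__main__":
    print("dropped:", sparse)
    print("kept:", list(cleaned[0].keys()))
```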
VAERS Sample Entries
15 mon. male w/ hx of recurrent ear infections & measles in Feb. 89'. 5Apr89 was given MMR. Within 24 hrs /p vaccine, parents noted hearing deficit, confirmed by physician exam. DEAF
Urticaria, wheezy, & periorbital edema which abated /p administration of subcut. epinephrine, Bendryl IV, Solumendrol IV ASTHMA
Pt experienced chicken pox from head to toe subsequent to receiving one dose of varicella virus vaccine live. INFECT
VAERS Text Fields
SYMPTOM_TEXT: Full text description of the adverse reaction entered by a medical professional
SYM01: Brief description of primary symptom
SYM02-SYM05: Additional symptoms in decreasing order of importance
VAERS Initial Diagram
Equivalent Terms for Patient
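Treating several surface forms as one term can be sketched as a lookup applied after tokenization. The synonym list below is hypothetical (the slide's actual list of equivalent terms is not reproduced in this transcript); "Pt" is used because it appears as an abbreviation in the VAERS sample entries.

```python
import re

# Hypothetical synonym list mapping variants to one canonical term.
EQUIVALENTS = {"pt": "patient", "pts": "patient", "patients": "patient"}


def normalize(text, equivalents=EQUIVALENTS):
    """Tokenize the text and replace each variant with its canonical term."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [equivalents.get(tok, tok) for tok in tokens]


if __name__ == "__main__":
    print(normalize("Pt experienced chicken pox"))
```

After this step, counts for "pt", "pts", and "patients" all accumulate under "patient", which keeps the abbreviations from appearing as separate terms.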
Property Panel for VAERS Text Miner Analysis
Interactive Results
Clusters Window
Why only one term when five were requested?
…
Cases with Fever
Last 16 entries out of 98
Headache Terms
Headache Documents
Terms Most Similar to Headache
Documents Most Similar to Headache
First 11 out of 65 Documents Filtered by Headache Terms
VAERS Predictive Modeling Diagram
Logistic Regression Model Effects Plot
Logistic Regression Lift Plot
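The quantity shown in a lift plot can be sketched as the event rate among the top-scored cases divided by the overall event rate. The scores and labels below are hypothetical; the transcript does not include the VAERS model's target definition or its predictions, so this illustrates the calculation only.

```python
def lift_at_depth(scores, labels, depth=0.1):
    """Lift of the top `depth` fraction of cases when ranked by score."""
    if not scores or len(scores) != len(labels):
        raise ValueError("scores and labels must be non-empty and equal length")
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    n_top = max(1, int(round(depth * len(ranked))))
    top_rate = sum(label for _, label in ranked[:n_top]) / n_top
    base_rate = sum(labels) / len(labels)
    if base_rate == 0:
        raise ValueError("lift is undefined when there are no positive cases")
    return top_rate / base_rate


# Hypothetical model scores and 0/1 outcomes for ten cases.
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
labels = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]

if __name__ == "__main__":
    for depth in (0.2, 0.5, 1.0):
        print(f"depth {depth:.0%}: lift = {lift_at_depth(scores, labels, depth):.2f}")
```

Lift at full depth is always 1, so a useful model shows values well above 1 at small depths that decline toward 1 as depth increases.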