Hoyle paper 019-31
SUGI 31
Text Mining SAS-L Topics
Larry Hoyle, Policy Research Institute, University of Kansas
SAS-L topics
• Read each weekly topic list from http://www.listserv.uga.edu/archives/sas-l.html
• Parse each topic and decode its HTML entities (HTMLDECODE)
• Strip “Re:” prefixes
/* strip variations of re: (Re:, RE:, re :, ...); the i flag makes the match case-insensitive */
topicRE = prxparse('/^ *re *: *(.*)/i');
if prxmatch(topicRE, topic) then do;
   topic = prxposn(topicRE, 1, topic);
end;
• Use PROC SQL to aggregate topic counts across weeks
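The topic-cleaning and aggregation steps above can be sketched end to end in SAS. This is a minimal sketch under stated assumptions, not the paper's code: the input layout (one row per topic line, with its week code and that week's message count), the data set names work.raw_topics, work.clean_topics, and work.topic_counts, and the sample rows are all hypothetical.

```sas
/* Hypothetical input: one row per topic line in a weekly list (week, topic, messages that week) */
data work.raw_topics;
   infile datalines dlm='|' truncover;
   length week $5 topic $200 nmsgs 8;
   input week $ topic $ nmsgs;
   datalines4;
0501a|PROC SQL question|3
0501b|Re: PROC SQL question|2
0501b|RE : PROC SQL question|1
0501a|Merge &amp; update|4
;;;;
run;

data work.clean_topics;
   set work.raw_topics;
   retain topicRE;
   if _n_ = 1 then topicRE = prxparse('/^ *re *: *(.*)/i');   /* compile the pattern once */
   topic = htmldecode(topic);                                 /* turn HTML entities back into characters */
   if prxmatch(topicRE, topic) then topic = prxposn(topicRE, 1, topic);   /* keep the text after re: */
   drop topicRE;
run;

/* merge the weekly lines into one row per topic */
proc sql;
   create table work.topic_counts as
   select topic,
          sum(nmsgs)           as total_msgs,
          count(distinct week) as n_weeks
   from work.clean_topics
   group by topic
   order by total_msgs desc;
quit;
```

DATALINES4 is used because the HTML entity in the last sample row contains a semicolon, which would otherwise end ordinary DATALINES input. Compiling the pattern once at _N_ = 1 and retaining its identifier is the usual PRX idiom in a data step.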
SAS-L 2005
• 35,324 thread/topic lines in the HTML files
• 7,081 threads after merging across weeks and a little cleaning
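The two figures above are a count of lines and a count of distinct threads; they differ presumably because the same thread shows up in several weekly lists or under variant subject lines. A minimal PROC SQL sketch of both counts, assuming a hypothetical table work.topic_lines with one row per topic line (the rows are made up, not the 2005 data):

```sas
/* Hypothetical sample: one row per thread/topic line across the weekly lists */
data work.topic_lines;
   infile datalines dlm='|' truncover;
   length week $5 topic $200;
   input week $ topic $;
   datalines;
0501a|PROC SQL question
0501b|PROC SQL question
0501b|Merge and update
0502a|Macro quoting
0502a|PROC SQL question
;
run;

proc sql;
   create table work.line_vs_thread as
   select count(*)              as n_lines,    /* every topic line in every weekly list */
          count(distinct topic) as n_threads   /* one per topic after merging across weeks */
   from work.topic_lines;
quit;
```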
SAS-L Top Threads by Number of Messages
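Ranking threads by message count comes down to a descending sort plus a row limit. A minimal sketch, assuming a hypothetical table of per-thread totals such as the PROC SQL aggregation step would produce; the thread names, counts, and the cutoff of three are invented for illustration:

```sas
/* Hypothetical per-thread totals */
data work.thread_totals;
   infile datalines dlm='|' truncover;
   length topic $200 total_msgs 8;
   input topic $ total_msgs;
   datalines;
PROC SQL question|6
Merge and update|4
Macro quoting|9
Array loop|1
;
run;

/* largest threads first */
proc sort data=work.thread_totals out=work.by_size;
   by descending total_msgs;
run;

/* keep only the top three */
data work.top_threads;
   set work.by_size(obs=3);
run;
```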
Text Miner on the SAS-L topics
Largest clusters
Smaller Clusters
Message Content
Web scraping with tmfilter
options noxwait;   /* X commands return to SAS without waiting for EXIT */
%macro aweek(week=0501a);
   /* create the folders for this week */
   x "md C:\ddrive\projects\sugs\sugi31\SASLBOF\posts\&week";
   x "md C:\ddrive\projects\sugs\sugi31\SASLBOF\filteredposts\&week";
   libname sugi31 'C:\ddrive\projects\sugs\sugi31\SASLBOF\datasets';
   /* crawl the weekly index page one link level deep; NRSTR masks the ampersand in L=sas-l */
   %tmfilter(dataset=sugi31.SL&week., dir=C:\ddrive\projects\sugs\sugi31\SASLBOF\posts\&week,
      destdir=C:\ddrive\projects\sugs\sugi31\SASLBOF\filteredPosts\&week,
      URL=http://listserv.uga.edu/cgi-bin/wa?A1=ind&week.%NRSTR(&L=sas-l),
      depth=1, links=sugi31.SL&week.L, norestrict=' ', numchars=2000)
%mend aweek;
%aweek(week=0501a);
%aweek(week=0501b);
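The URL= argument appends the week code to the LISTSERV index URL, with NRSTR keeping the L=sas-l parameter from being read as a macro variable reference. A data step sketch that builds the same URLs for a list of week codes, so they can be checked before crawling; the data set name work.week_urls is hypothetical, and the week codes beyond 0501a and 0501b are illustrative, extrapolated from the yymm-plus-letter pattern of the calls above.

```sas
data work.week_urls;
   length week $5 url $100;
   input week $;
   /* single quotes keep the macro processor away from the ampersand, as NRSTR does in the macro */
   url = cats('http://listserv.uga.edu/cgi-bin/wa?A1=ind', week, '&L=sas-l');
   datalines;
0501a
0501b
0501c
;
run;
```

A table like this could also drive the %aweek calls through CALL EXECUTE instead of listing each call by hand.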
Parse date and sender
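The slide names only the step, so the following is a hypothetical sketch rather than the author's code. It assumes each archived message has a header line of the form "From: Name <address>" followed by one like "Date: Tue, 4 Jan 2005 10:15:22 -0500", and uses PRX capture buffers to pull out the sender name and the calendar date. The data set name work.msg_meta, the sample headers, and the addresses are invented.

```sas
data work.msg_meta(keep=sender msgdate);
   length line $200 sender $100 mon $3;
   retain fromRE dateRE sender;
   if _n_ = 1 then do;
      fromRE = prxparse('/^From: *([^<]*?) *</');                              /* name before the address */
      dateRE = prxparse('/^Date: *\w+, *(\d{1,2}) +([A-Za-z]{3}) +(\d{4})/');  /* day, month, year */
   end;
   infile datalines truncover;
   input line $char200.;
   if prxmatch(fromRE, line) then sender = prxposn(fromRE, 1, line);
   else if prxmatch(dateRE, line) then do;
      day     = input(prxposn(dateRE, 1, line), 2.);
      mon     = upcase(prxposn(dateRE, 2, line));
      month   = (index('JANFEBMARAPRMAYJUNJULAUGSEPOCTNOVDEC', mon) + 2) / 3;  /* JAN=1 ... DEC=12 */
      year    = input(prxposn(dateRE, 3, line), 4.);
      msgdate = mdy(month, day, year);
      output;   /* one row per message, using the sender from the preceding From line */
   end;
   format msgdate date9.;
   datalines;
From: Jane Doe <jdoe@example.edu>
Date: Tue, 4 Jan 2005 10:15:22 -0500
From: John Roe <jroe@example.org>
Date: Wed, 5 Jan 2005 08:01:00 -0500
;
run;
```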
Using a 10% sample of message text
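The slides do not say how the 10% sample was drawn. One way to draw an exact 10% simple random sample in a base SAS data step is selection sampling (Knuth's Algorithm S), sketched here assuming a hypothetical work.messages table with one row per message; the seed 31 and the generated rows are arbitrary.

```sas
/* Hypothetical stand-in for the scraped message text */
data work.messages;
   length msgid 8 text $40;
   do msgid = 1 to 1000;
      text = catx(' ', 'message body', msgid);
      output;
   end;
run;

data work.msg_sample(drop=k);
   retain k;
   if _n_ = 1 then do;
      call streaminit(31);          /* fixed seed so the sample is reproducible */
      k = ceil(0.10 * nmsgs);       /* target sample size: 10% of the messages */
   end;
   set work.messages nobs=nmsgs;
   /* keep this row with probability (rows still needed) / (rows still left) */
   if rand('uniform') < k / (nmsgs - _n_ + 1) then do;
      output;
      k = k - 1;
   end;
run;
```

Because the selection probability rises to 1 when the rows still needed equal the rows left, the sample always contains exactly 10% of the rows (rounded up). PROC SURVEYSELECT with METHOD=SRS and SAMPRATE= in SAS/STAT is the more common route when that product is licensed.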
Filter out too-common terms, such as listserv
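SAS Text Miner offers its own stop lists and term filtering; the sketch below is a swapped-in base SAS illustration of the idea, not Text Miner's mechanism. It assumes a hypothetical work.docs table, drops any term that appears in more than half of the documents, and adds an explicit stop list containing listserv. The 50% cutoff, the table names, and the sample texts are arbitrary choices for illustration.

```sas
/* Hypothetical sample documents */
data work.docs;
   infile datalines dlm='|' truncover;
   length docid 8 text $200;
   input docid text $;
   datalines;
1|proc sql join question listserv
2|merge versus proc sql listserv
3|macro quoting question listserv
4|array loop listserv
;
run;

/* one row per (document, term) occurrence */
data work.terms(keep=docid term);
   set work.docs;
   length term $40;
   do i = 1 by 1 until (term = ' ');
      term = lowcase(scan(text, i, ' .,:!?()'));
      if term ne ' ' then output;
   end;
run;

proc sql;
   /* document frequency: number of documents containing each term */
   create table work.term_df as
   select term, count(distinct docid) as df
   from work.terms
   group by term;

   create table work.kept_terms as
   select term, df
   from work.term_df
   where df / (select count(*) from work.docs) <= 0.5   /* drop terms in over half the documents */
     and term not in ('listserv', 'sas-l');             /* plus an explicit stop list */
quit;
```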