

Descriptive Analysis of Donation Amount Data and Text Mining of Tweets on Presidential Primaries using SAS®

Aditya Jakkam, Swetha Nallamala & Dr. Goutam Chakraborty, Oklahoma State University

ABSTRACT

2016 will be a historic year for US presidential elections: Hillary Clinton is the first woman to be a major-party candidate, and the comments made by Donald Trump, among other things, all contribute to that. Twitter is one of the major social platforms that people use to comment on this election and its candidates. This paper summarizes our analysis of people’s reactions to the stands these candidates have taken on different issues. We also computed descriptive statistics on the donations people made to these candidates. Twitter feeds were collected for the text mining, and the donation data were collected from the Federal Election Commission database.

In this paper, we used SAS® Enterprise Miner™ to build models for analyzing these tweets, and SAS® Enterprise Guide™ for data cleaning and descriptive statistics. The Enterprise Miner™ nodes helped us analyze people’s tweets on different issues. The Text Cluster and Text Topic nodes enabled us to group the tweets into similar subjects such as issues and candidate names. Enterprise Guide™ enabled us to compute statistics on the donations made by people from different parts of the country.

INTRODUCTION

When a presidential candidate makes a bold statement on an issue, the response is seen all over the world: social media floods with messages, news channels repeat the telecast throughout the day, debate sessions are held, and so on. Such is the news created by the elections of the most powerful country. We observed and analyzed the public’s reactions on different issues facing the country. We also analyzed the statistics of donations made by the public to the candidates’ election campaigns. The objective of this paper is to use SAS® Enterprise Miner™ to analyze the public tweets and SAS® Enterprise Guide™ to analyze the donations made to the campaigns. Election campaigns can use this analysis to learn whether the public reaction to a candidate’s stand on a given issue is positive or negative.

DATA ACCESS

The donation data were collected from the Federal Election Commission database and contain about 1.1 million donations made to the two candidates. The Twitter data were scraped from Twitter for text analysis, using keyword pairs such as “Hillary” and “guns”, “Trump” and “guns”, and “Hillary” and “immigration”. We collected approximately 17,500 tweets for our analysis.


DATA DICTIONARY

| Variable | Level | Description |
| --- | --- | --- |
| ID | ID | Unique review number |
| Review | Text | The actual tweet posted by the tweeter |
| Contbr_nm | | Name of the contributor who donated to the campaign |
| Contbr_city | | City from which the donation was made |
| Contbr_st | | State from which the donation was made |
| Contbr_zip | | Zip code of the place from which the donor made the donation |
| Contbr_employer | | Name or type of the employer where the donor works |
| Contbr_occupation | | Occupation type of the donor |
| _TEMG001 | Integer | Amount of the donation made by the donor |

Table 1: Data Dictionary

DATA PREPARATION

The raw donor data has almost 1.1 million rows and 18 variables. It contains multiple donations made by the same donor, which we combined using SQL in SAS® Enterprise Guide™. After combining each donor’s individual donations into a single row, the data has 230,449 observations for Hillary and 26,159 observations for Trump. We removed the unnecessary variables, leaving a total of 7 variables in the final data. The Twitter data was cleaned by removing the URLs linked to the tweets. After cleaning, we have a total of 17,500 tweets on different issues and 2 variables.
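The two preparation steps described above can be illustrated outside SAS. The following is a minimal Python sketch, not the authors' Enterprise Guide query: the sample rows, the `amount` column name, and the choice to group by contributor name and state are assumptions made only for illustration, and the URL pattern is a simple approximation.

```python
import re
import sqlite3

# Hypothetical sample rows: (contributor name, state, donation amount).
rows = [
    ("DOE, JANE", "NY", 250.0),
    ("DOE, JANE", "NY", 100.0),
    ("ROE, RICHARD", "CA", 50.0),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE donations (contbr_nm TEXT, contbr_st TEXT, amount REAL)"
)
conn.executemany("INSERT INTO donations VALUES (?, ?, ?)", rows)

# Collapse multiple donations from the same donor into a single row.
# Grouping on name and state is an assumption; the paper does not say
# which key was used to identify a donor.
per_donor = conn.execute(
    """
    SELECT contbr_nm, contbr_st,
           SUM(amount) AS total_amount,
           COUNT(*)    AS n_donations
    FROM donations
    GROUP BY contbr_nm, contbr_st
    ORDER BY contbr_nm
    """
).fetchall()

# Simple pattern for http(s) links embedded in a tweet.
URL_PATTERN = re.compile(r"https?://\S+")


def strip_urls(tweet):
    """Remove every http(s) URL from a tweet and tidy leftover whitespace."""
    return " ".join(URL_PATTERN.sub(" ", tweet).split())
```

Here `per_donor` holds one row per donor with the summed amount, and `strip_urls` can be applied to each tweet before parsing.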

DESCRIPTIVE ANALYSIS

Descriptive statistics were computed on the donor data for both candidates, Hillary Clinton and Donald Trump, using SAS® Enterprise Guide™. This gives a clear picture of how donations were made to the candidates from different regions.

HILLARY STATISTICS

The donations received by the Hillary Clinton campaign come mainly from New York and California. Donations from New York total approximately 71 million dollars, and those from California 58 million dollars. The next highest states are Florida and Texas, with approximately 15 million dollars each. To take the analysis further, we examined the number of donors from each state.


Figure 1: State wise donations for the Hillary Clinton campaign

Figure 2: Number of Donors to the Hillary Clinton campaign (State wise)


This analysis shows that California has more donors than New York, even though New York contributed more in dollar terms. The number of donors from California is 45,000.

Figure 3: Summary statistics of Hillary Clinton donations

The maximum donation from a single donor is approximately 29 million dollars. This donation came from New York, which explains the large difference between the state-wise ranking by contribution sum and the ranking by number of contributors. If we remove this single donor from the New York list, California has both the highest sum of donations and the highest number of donors to the Hillary Clinton campaign.
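The effect of a single very large donation on a state-wise ranking can be shown with a small example. The sketch below is in Python rather than SAS, and the amounts are made-up toy values chosen only to reproduce the pattern described above; they are not the campaign figures.

```python
from collections import defaultdict

# Hypothetical per-donor totals: (state, total donated by one donor).
donors = [
    ("NY", 1000.0), ("NY", 50.0),
    ("CA", 100.0), ("CA", 200.0), ("CA", 150.0),
]


def summarize(records):
    """Return {state: (sum of donations, number of donors)}."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for state, amount in records:
        totals[state] += amount
        counts[state] += 1
    return {state: (totals[state], counts[state]) for state in totals}


def top_state_by_sum(records):
    """State with the largest total donation amount."""
    summary = summarize(records)
    return max(summary, key=lambda s: summary[s][0])


def top_state_by_donors(records):
    """State with the largest number of donors."""
    summary = summarize(records)
    return max(summary, key=lambda s: summary[s][1])


# Drop the single largest donation and look at the ranking again.
idx = max(range(len(donors)), key=lambda i: donors[i][1])
without_largest = donors[:idx] + donors[idx + 1:]
```

With these toy values, the top state by sum changes once the largest donation is removed, while the top state by donor count does not, which is why comparing both measures is useful.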

TRUMP STATISTICS

Figure 4: State wise donations for the Donald Trump campaign

The above graph shows the donations made to the Trump campaign by state. The major part of the donations came from Texas, California, Florida, Georgia and New York. Compared to Hillary Clinton, a larger percentage of the Trump campaign’s donations came from Texas, Florida and Georgia. New York’s percentage is smaller, whereas it is one of the top donating states for the Hillary campaign.


Figure 5: Number of Donors to the Donald Trump campaign (State wise)

Most of the donors are from California, New York, Florida and Texas. The number of donors from California is 3,313 and from Texas 2,998. The total number of donors to the Trump campaign is only 27,588, compared to 254,165 donors to the Hillary Clinton campaign.

Figure 6: Summary statistics of Donald Trump donations

The maximum donation received by the Trump campaign is $10,800. The total received by the Trump campaign is approximately $15.7 million, far less than Hillary’s. The total received by Hillary’s campaign is approximately $300 million, and the highest single donation to her is approximately $29 million. The Trump campaign’s total donations do not even come close to the largest single donation received by the Hillary campaign.

METHODOLOGY

Figure 7: Text Mining Process


FILE IMPORT

Because the Twitter data is available in Excel spreadsheets, we used the File Import node to bring the data into SAS® Enterprise Miner™. The source folder holds all the Excel sheets, each containing the tweets for one issue. We pointed the node to one of the sheets and ran the model. After obtaining the final results and drawing conclusions from them, we repeated the process with the sheets for the other issues.

TEXT PARSING

After importing the Excel file, we connected the Text Parsing node to the File Import node and kept the default settings in the properties panel. The Text Parsing node generates the default term-by-document frequency table, which helped us understand the data better. The results comprise the frequency of each term and the number of documents in which it appears, showing which words occur most often in these documents.
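The two quantities reported by the parsing step, the frequency of a term and the number of documents it appears in, can be illustrated with a short Python sketch. This is a simplification: the Text Parsing node does considerably more than the plain lowercase tokenizer assumed here, and the function names and sample tweets are hypothetical.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase a tweet and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def term_stats(documents):
    """Return (term frequency, document frequency) counters.

    Term frequency counts every occurrence of a term; document
    frequency counts each term at most once per document.
    """
    term_freq = Counter()
    doc_freq = Counter()
    for doc in documents:
        tokens = tokenize(doc)
        term_freq.update(tokens)
        doc_freq.update(set(tokens))
    return term_freq, doc_freq


# Hypothetical sample tweets.
tweets = [
    "Keep your guns, keep your rights",
    "Vote to keep guns at home",
]
tf, df = term_stats(tweets)
```

In this example the term "keep" occurs three times in total but in only two tweets, which is the distinction between the two counts.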

TEXT FILTER

The Text Filter node is attached to the Text Parsing node. We slightly altered the properties panel settings by setting the minimum number of documents in which a word must appear to be used in the model (we set it to 4). Spell check was also enabled to replace misspellings found in the data with the correct spellings, which gives a more accurate analysis.

Irrelevant words were eliminated and meaningful words retained. The synonyms we found in the data were also combined.
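The document-frequency cutoff and the merging of synonyms can be sketched as follows. This Python example only illustrates the idea and is not the Text Filter node's implementation: the `keep_terms` helper, the synonym map, and the assumption that the cutoff is inclusive (a term appearing in exactly 4 documents is kept) are all introduced here for illustration.

```python
from collections import Counter

MIN_DOCS = 4  # document-frequency limit mentioned in the text


def keep_terms(tokenized_docs, min_docs=MIN_DOCS, synonyms=None):
    """Return the terms that appear in at least `min_docs` documents.

    `synonyms` maps a variant to its parent term so that both are
    counted as the same term before the cutoff is applied.
    """
    synonyms = synonyms or {}
    doc_freq = Counter()
    for tokens in tokenized_docs:
        # A set ensures each term is counted once per document.
        doc_freq.update({synonyms.get(t, t) for t in tokens})
    return {term for term, n in doc_freq.items() if n >= min_docs}


# Hypothetical tokenized tweets.
docs = [
    ["gun", "vote"],
    ["guns", "ban"],
    ["gun", "keep"],
    ["gun", "vote"],
    ["firearm"],
]
kept = keep_terms(docs, synonyms={"guns": "gun"})
```

Merging "guns" into "gun" is what lifts the term to four documents in this toy data; without the synonym map no term would pass the cutoff.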

CONCEPT LINKS

The Text Filter properties panel provides access to the interactive filter viewer. In this window, concept links can be generated for any word by selecting the word, right-clicking, and choosing View Concept Links. In a concept link diagram, the word for which the links were derived stands in the middle, and all the words strongly associated with it are linked to it. The width of each link depicts the strength of the association. Selecting an associated word shows the number of times the association occurred.
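The idea behind a concept link, terms that tend to appear in the same tweets as a chosen word, can be approximated with raw co-occurrence counts. The Python sketch below is a deliberate simplification: SAS® Enterprise Miner™ computes link strength with its own statistic, whereas this example only counts shared documents. The `concept_links` function and the sample data are hypothetical.

```python
from collections import Counter


def concept_links(tokenized_docs, target, top_n=5):
    """Rank terms by how often they share a document with `target`.

    Returns a list of (term, co-occurrence count, share of the
    target's documents that also contain the term).
    """
    co_counts = Counter()
    target_docs = 0
    for tokens in tokenized_docs:
        terms = set(tokens)
        if target in terms:
            target_docs += 1
            co_counts.update(terms - {target})
    # Sort by count (descending), then alphabetically for stable output.
    ranked = sorted(co_counts.items(), key=lambda kv: (-kv[1], kv[0]))[:top_n]
    return [(term, n, n / target_docs) for term, n in ranked]


# Hypothetical tokenized tweets.
docs = [
    ["trump", "gun", "support"],
    ["trump", "gun", "vote"],
    ["trump", "support"],
    ["hillary", "gun", "ban"],
]
links = concept_links(docs, "trump", top_n=2)
```

The count in each tuple plays the role of the number shown when an associated word is selected, and the share is a rough stand-in for the link width.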


For each individual issue we drew concept links.

GUNS

Figure 8: Generated Concept Links for Hillary (Issue on Guns)

From the above concept links, we can observe that Hillary drew a lot of negativity for her stand on guns, which was misinterpreted as taking guns away entirely. She was repeatedly called a dictator and compared to Hitler and Stalin. People also started finding similarities between Obama and Hillary in their stands on guns. The term “resistance” is also strongly associated with Hillary, referring to the resistance that would occur if Hillary banned guns.

Trump, on the other hand, has support in the tweets, as seen from the strong link to the word “support”. People are calling on patriots to fight terrorism with guns, and their tweets are positive toward Trump on this issue. The other highly associated terms are general terms that come up when talking about guns, such as “keep” and “cop”: people relate the issue to the police and ask him to let them keep their guns. “Vote” is strongly associated because people are tweeting “vote for Trump to keep your guns at home.”

Figure 9: Generated Concept Links for Trump (Issue on Guns)


TAXES

Figure 10: Generated Concept Links for Trump (Issue on Taxes)

People are expressing satisfaction with Hillary Clinton for her stand on taxes, but she is heavily criticized on the grounds that, if she becomes President, middle-class people are more likely to pay more taxes. The words “right” and “tax” are also associated with Hillary.

Trump, on the other hand, is constantly challenged by people for his stand on taxes. Hillary Clinton and Tim Kaine have shown more of their tax returns, pressuring Donald Trump to do the same. This can be seen from the generated concept links “lower” and “tackle”. He has been called a “faker”, and he has been consistently asked to propose his running mate.

Figure 11: Generated Concept Links for Hillary (Issue on Taxes)


Figure 12: Concept Links for Trump

The term “bunch” found its way into the associations. When we searched the tweets for it, we found that people were using “Bunch of Steves” in reference to Hillary’s mockery of Donald’s advisors as six guys named Steve. People tend to pick up on the names the candidates call each other. People are expressing satisfaction with Hillary for showing her taxes, but she is heavily criticized on the grounds that, if she is elected, middle-class people are more likely to pay more taxes. Trump, on the other hand, is constantly challenged by people for his stand on taxes.

IMMIGRATION

Figure 13: Generated Concept Links for Trump (Issue on Immigration)


A tweet by Ezra Klein, a Washington Post columnist, about Trump’s supporters being less affected by trade and immigration was retweeted a lot, so the handle ezraklein found its way into the concept links. Our analysis indicates that Trump’s supporters do not care much about his policies, and that he is using racial distinction to gain more attention. Melania Trump’s immigration is also causing a stir, leading people to demand that Trump reveal her immigration papers. The word “Hindi” is strongly linked because a Hindu woman supported Donald Trump’s statement that “Muslims want to take over the world.” The word should be “Hindu”, but it became “Hindi” because of a mistake in the VICE columnist’s tweet. This tweet was retweeted and earned a strong link in our concept links.

ECONOMY

Figure 14: Generated Concept Links for Trump (Issue on Economy)

Figure 15: Generated Concept Links for Hillary (Issue on Economy)


Hillary’s recent statement that she is going to put a lot of coal miners and coal companies out of business has become highly controversial, and hence the word “coal” found its way into the concept links. Even after Hillary apologized, saying that it was a mistake and that she did not intend to say so, people did not forget the issue, and tweets about it are still circulating. The word “American” is strongly related to both Hillary and Trump because of Trump’s stand on producing goods in America rather than outsourcing production.

Trump, on the other hand, has the phrase “attracting angry white men” strongly associated with him. This shows that people who are angry about outsourcing American resources were attracted to Trump’s stand on discontinuing it. The other terms in the association are general terms such as “job” and “living”, which naturally come up when talking about the economy.

FOREIGN POLICY

Figure 16: Generated Concept Links for Trump (Issue on Foreign Policy)

Some tweets were retweeted widely, showing that people agree with the tweeter and the tweet. One of them is from sarahkendzior: Never forget the danger is not just Trump, but the paranoia and violence he cultivates.

Trump’s admiration of Putin is also causing havoc among voters, who are retweeting the statement. Lizwahl: Trump admires a man that routinely bombs hospitals and uses its media to deny those bombings happened #TrumpPutin

Putin has also expressed admiration for Trump’s policies and his stands on different issues.


GAY MARRIAGE

Figure 17: Generated Concept Links for Trump (Issue on Gay Marriage)

Terms like “ban” and “support” are associated, which makes sense because people tend either to support gay marriage or to call for banning it. Hillary is called the “gay marriage protector party”, and Trump’s stand on the issue is supported by some and opposed by others.

EDUCATION

People retweeted Hillary’s statement directed at Trump: "Hard to believe they spent so much time talking about me and no time talking about jobs or education or health care". This shows their agreement that Trump needs to address the education issue, which is of great importance in “MAKE AMERICA GREAT AGAIN”.

TEXT CLUSTERING

By connecting the Text Cluster node to the Text Filter node, we can group similar tweets, with each cluster described by its most characteristic terms. We used the Expectation-Maximization clustering algorithm to generate the clusters, and used them to make better sense of the associations we observed in the concept links. The clusters are explained below:


| Text Clusters | Explanation |
| --- | --- |
| +Isis, +Obama, +lie, +support, +weak, +hillary | Opposition to Hillary and comparison of her to Obama |
| +patriot, +protest, +trump, +gun | Trump and guns related cluster |
| +free, +healthcare, +job, +promise | Cluster with healthcare related terms |
| +gay marriage, +protection, +support, +openly | Gay marriage positive cluster |
| +gays, +ban, +vote | Gay marriage negative cluster |

Table 2: Explanations of clusters

TEXT TOPIC

The Text Topic node is connected to the Text Filter node in SAS® Enterprise Miner™. This node combines terms into topics for further analysis. We set its properties to generate 7 topics and used the results to better understand the associations observed in the concept links.

CONCLUSION

Overall, we found that people are using Twitter as a platform for expressing their dissatisfaction with the stands the candidates have taken on certain issues. After analyzing the Twitter data on each issue independently, we found that certain issues are being raised against the candidates that could take a toll on their numbers during the election. Hillary needs to address her stand on guns and the coal miners issue, because people are interpreting her statements in a way that hurts her campaign. Trump needs to address his stands on immigration and taxes, because people want him to be transparent on these issues. Overall, the gay marriage, veterans and education issues are neutral for both candidates, as people are not entirely opposing either candidate’s policies on these issues.

FUTURE SCOPE

In future work, we want to use the donation data and Twitter feeds to analyze and predict how the public responds, through donations to the campaigns, to the stands taken by the respective candidates.


ACKNOWLEDGEMENTS

We thank the WUSS committee for giving us this wonderful opportunity to present our paper. We also thank Dr. Goutam Chakraborty for his continuous guidance and support.


CONTACT INFORMATION

Your comments and questions are valued and encouraged. Please contact us at:

Aditya Jakkam Oklahoma State University Phone: 405-334-7573 Email: [email protected] Aditya Jakkam is a graduate student enrolled in Business Analytics at the Spears School of Business, Oklahoma State University. He has three years of work experience as a Programmer Analyst at Cognizant Technology Solutions, India. He is a SAS® Certified Base Programmer and SAS® Certified Statistical Business Analyst, and also holds Google Analytics certification. He has the SAS® and Oklahoma State University Data Mining Certificate. He co-authored a paper presented at the SCSUG conference in 2016 and has a poster presentation at the SAS® Analytics conference in 2016.

Swetha Nallamala Oklahoma State University Phone: 405-780-5437 Email: [email protected] Swetha Nallamala is a graduate student enrolled in the Masters in Business Analytics program at the Spears School of Business, Oklahoma State University. She worked as a Business Analyst Intern at MaxQ Research LLC for three months during summer 2016. She has over two and a half years of experience as a Senior Systems Engineer at Infosys Limited, India. She is a SAS® Certified Base Programmer, SAS® Certified Statistical Business Analyst and SAS® Certified Advanced Programmer, and also holds Google Analytics certification. She co-authored a paper presented at the SCSUG conference in 2016 and has a poster presentation at the SAS® Analytics conference in 2016.

Dr. Goutam Chakraborty Oklahoma State University Email: [email protected] Dr. Goutam Chakraborty is Ralph A. and Peggy A. Brenneman professor of marketing and founder of the SAS® and OSU data mining certificate and the SAS® and OSU marketing analytics certificate at Oklahoma State University. He has published in many journals, such as Journal of Interactive Marketing, Journal of Advertising Research, Journal of Advertising, Journal of Business Research, etc. He has over 25 years of experience in using SAS® for data analysis. He is also a Business Knowledge Series instructor for SAS®.

SAS® and all other SAS® Institute Inc. product or service names are registered trademarks or trademarks of SAS® Institute Inc. in the USA and other countries. ® indicates USA registration.

Other brand and product names are trademarks of their respective companies.