Revenue & Employment Analysis of International Students in USA using PyHive
Team Members: Priyanka Kale, Apekshit Bhingardive, Aditya Verma
Guide: Dr. Jongwook Woo
24th Annual Student Symposium, CSULA, 26th February 2016
What is Big Data?
Big data is a term that describes the large volume of data – both structured and unstructured – that inundates a business on a day-to-day basis.
It's not the amount of data that's important. It's what we do with the data that matters.
Machine Learning: big data often doesn't ask why and simply detects patterns.
Digital footprint: big data is often a cost-free byproduct of digital interaction.
Purpose of Analysis
To develop a system that helps determine the revenue generated by international students.
To examine the relationship between new international enrollments and institutional income at public colleges, universities and professional organizations in the US.
Purpose of Analysis (continued)
To understand the effects of increased international student enrollment on net revenue generation in the US
To find out universities' income
To predict the impact of international students on revenue generation
To predict employment opportunities for international students in the US
Basic formula for calculating economic benefit
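The formula itself is only on the slide image. As a hedged illustration, the sketch below assumes a simple net-contribution model: per-student tuition and fees plus living expenses, minus any support that comes from U.S. sources, multiplied by the number of students. The function name, its inputs and the figures are hypothetical.

```python
def net_economic_benefit(students, tuition_and_fees, living_expenses, us_support):
    """Hypothetical net benefit: (spending per student - U.S.-funded support) * headcount.

    All monetary arguments are assumed to be per-student annual amounts in dollars.
    """
    if students < 0:
        raise ValueError("students must be non-negative")
    per_student = tuition_and_fees + living_expenses - us_support
    return students * per_student


# Illustrative figures only, not data from the study.
print(net_economic_benefit(students=1_000, tuition_and_fees=25_000,
                           living_expenses=15_000, us_support=5_000))  # → 35000000
```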
The analysis uses:
Hadoop Distributed File System (HDFS) for the large dataset
A Hadoop environment on the Hortonworks Sandbox, hosted on Azure
Python and Hive (PyHive) in an IPython Notebook
HUE
Google Fusion Tables
WEKA framework
Loading data into HDFS: the data file was uploaded using the Hadoop command-line interface.
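A minimal sketch of the same upload driven from Python, assuming the Hadoop client (`hdfs`) is on the PATH. It uses the standard `hdfs dfs -mkdir -p` and `hdfs dfs -put -f` shell commands; the file name and HDFS directory are hypothetical, and by default the commands are only printed, not run.

```python
import shlex
import subprocess


def hdfs_upload_commands(local_path, hdfs_dir):
    """Build the commands that create the target directory and copy the file into it."""
    return [
        ["hdfs", "dfs", "-mkdir", "-p", hdfs_dir],              # create dir (no error if it exists)
        ["hdfs", "dfs", "-put", "-f", local_path, hdfs_dir],    # copy, overwriting an older copy
    ]


def upload_to_hdfs(local_path, hdfs_dir, dry_run=True):
    """Print each command; execute it only when dry_run is False (needs a Hadoop client)."""
    for cmd in hdfs_upload_commands(local_path, hdfs_dir):
        print(shlex.join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)


# Hypothetical local file and HDFS directory.
upload_to_hdfs("international_students.csv", "/user/sandbox_user/intl_students")
```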
Hortonworks Sandbox configuration
Number of nodes: 3; Size: Basic A4 (8 cores, 14 GB memory)
Creating tables in HUE from existing data
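HUE's table wizard generates the table definition behind the scenes. For reference, the sketch below builds roughly equivalent HiveQL for an external table over CSV files already in HDFS. The table name, columns and path are hypothetical, and the header-skipping property assumes the files start with a header row.

```python
def create_table_ddl(table, columns, hdfs_location, delimiter=","):
    """Return HiveQL DDL for an external table over delimited text files in HDFS.

    columns is a list of (name, hive_type) pairs; the first line of each file
    is treated as a header and skipped.
    """
    column_sql = ",\n  ".join(f"`{name}` {hive_type}" for name, hive_type in columns)
    return (
        f"CREATE EXTERNAL TABLE IF NOT EXISTS {table} (\n  {column_sql}\n)\n"
        f"ROW FORMAT DELIMITED FIELDS TERMINATED BY '{delimiter}'\n"
        "STORED AS TEXTFILE\n"
        f"LOCATION '{hdfs_location}'\n"
        "TBLPROPERTIES ('skip.header.line.count'='1')"
    )


# Hypothetical table, columns and HDFS directory.
print(create_table_ddl(
    "intl_fees",
    [("state", "STRING"), ("institution", "STRING"), ("tuition_fees", "DOUBLE")],
    "/user/sandbox_user/intl_students",
))
```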
Connecting to Hive through Python: the Python code is written in an IPython Notebook, with HiveQL embedded inside it.
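A minimal sketch of connecting with PyHive, assuming HiveServer2 listens on its default port 10000 and PyHive is installed (`pip install 'pyhive[hive]'`). The query helper relies only on the DB-API 2.0 cursor interface that PyHive implements, so it can also be exercised against `sqlite3`. Host, user, table and column names are hypothetical.

```python
def connect_hive(host="localhost", port=10000, username="hive", database="default"):
    """Open a HiveServer2 connection with PyHive."""
    from pyhive import hive  # imported lazily so the helpers below work without PyHive

    return hive.Connection(host=host, port=port, username=username, database=database)


def fetch_all(conn, sql):
    """Run one HiveQL statement on a DB-API 2.0 connection; return (column names, rows)."""
    cursor = conn.cursor()
    try:
        cursor.execute(sql)
        columns = [desc[0] for desc in cursor.description]
        return columns, cursor.fetchall()
    finally:
        cursor.close()


# HiveQL embedded in Python; table and column names are hypothetical.
QUERY = """
SELECT state, SUM(tuition_fees) AS total_fees
FROM intl_fees
GROUP BY state
ORDER BY total_fees DESC
"""

# conn = connect_hive(host="sandbox.example.com")  # hypothetical host
# columns, rows = fetch_all(conn, QUERY)
```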
Executing the Hive script from Python code:
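One way to run a whole `.hql` script through a DB-API connection is to split it into statements and execute them one by one. This is a sketch under stated assumptions: the hypothetical `split_hiveql` helper ignores semicolons inside quotes and drops `--` comments, but does not handle backslash escapes. Running `hive -f script.hql` or Beeline with `-f` from a shell would be an alternative.

```python
def split_hiveql(script):
    """Split a HiveQL script on ';', skipping quoted semicolons and '--' line comments."""
    statements, current, quote = [], [], None
    for line in script.splitlines():
        i = 0
        while i < len(line):
            ch = line[i]
            if quote:                       # inside a string literal: copy verbatim
                current.append(ch)
                if ch == quote:
                    quote = None
            elif ch in ("'", '"'):          # opening quote
                quote = ch
                current.append(ch)
            elif line.startswith("--", i):  # rest of the line is a comment
                break
            elif ch == ";":                 # statement boundary
                stmt = "".join(current).strip()
                if stmt:
                    statements.append(stmt)
                current = []
            else:
                current.append(ch)
            i += 1
        current.append("\n")
    tail = "".join(current).strip()
    if tail:
        statements.append(tail)
    return statements


def run_script(conn, path):
    """Execute every statement in a script file; return the rows of the last query."""
    with open(path, encoding="utf-8") as fh:
        statements = split_hiveql(fh.read())
    cursor = conn.cursor()
    rows = []
    try:
        for stmt in statements:
            cursor.execute(stmt)
            # DDL/DML statements have no result set, so description is None.
            rows = cursor.fetchall() if cursor.description else []
    finally:
        cursor.close()
    return rows
```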
Visualizing data with graphs
Figure: Total earning from fees by state (bar chart; x-axis: states and territories in alphabetical order, Alabama to Texas; y-axis: US dollars, $0 to $25,000,000,000)
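A hedged sketch of producing a similar chart in Python with matplotlib, assuming the per-state totals have already been fetched from Hive into a dictionary. States are sorted alphabetically to match the chart's x-axis, and values are scaled to billions for readability; the function names and output file name are hypothetical.

```python
def chart_series(totals):
    """Return (states sorted alphabetically, totals in billions of US dollars)."""
    states = sorted(totals)
    values = [totals[state] / 1e9 for state in states]
    return states, values


def plot_fee_totals(totals, out_path="fee_totals.png"):
    """Save a bar chart of total fee earnings per state (requires matplotlib)."""
    import matplotlib

    matplotlib.use("Agg")  # render straight to a file; no display needed
    import matplotlib.pyplot as plt

    states, values = chart_series(totals)
    fig, ax = plt.subplots(figsize=(14, 5))
    ax.bar(states, values)
    ax.set_title("Total earning from fees")
    ax.set_ylabel("Billions of US dollars")
    ax.tick_params(axis="x", labelrotation=90)  # long state names read vertically
    fig.tight_layout()
    fig.savefig(out_path)
    plt.close(fig)


# Usage (hypothetical): plot_fee_totals(dict(rows)) where rows are (state, total) pairs from Hive.
```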
Major earning states
Percentage of total income (pie chart):
California: 9.55%
New York: 10.84%
Pennsylvania: 7.36%
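The percentages are presumably each state's share of the national total of fee earnings. A minimal sketch of that calculation, assuming a dictionary of per-state totals; the helper names are hypothetical.

```python
def income_shares(totals):
    """Percentage of the overall total contributed by each state, rounded to 2 decimals."""
    grand_total = sum(totals.values())
    if grand_total == 0:
        raise ValueError("totals sum to zero")
    return {state: round(100 * amount / grand_total, 2) for state, amount in totals.items()}


def top_n(shares, n=3):
    """The n states with the largest share, largest first."""
    return sorted(shares.items(), key=lambda item: item[1], reverse=True)[:n]
```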
Visualizing Data in Google Fusion Tables
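Google Fusion Tables could import CSV files, so a simple way to hand the Hive results to it is to export them as CSV. A minimal sketch with the standard `csv` module; the column headers are hypothetical.

```python
import csv
import io


def to_csv(rows, header=("State", "TotalFees")):
    """Serialise (state, total) rows to CSV text for upload to a table or mapping tool."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)
    return buf.getvalue()


# Usage (hypothetical): open("state_totals.csv", "w").write(to_csv(rows))
```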
Supervised Learning using Classification:
The WEKA framework was used to classify the states by their total earnings.
WEKA's UserClassifier algorithm was used to generate the classification graph.
The final output of the Hive script executed in Python was processed with this algorithm.
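WEKA reads data in its ARFF format. The sketch below shows one way to turn the Hive output into an ARFF file, assuming each record is a (state, total earnings, class label) tuple. The relation name, attribute names and class labels are hypothetical; the state is declared nominal rather than STRING because many WEKA classifiers do not accept string attributes.

```python
def quote_arff(value):
    """Quote a nominal value if it contains spaces or ARFF special characters."""
    text = str(value)
    if any(c in text for c in " ,'\"{}%\t"):
        return "'" + text.replace("\\", "\\\\").replace("'", "\\'") + "'"
    return text


def to_arff(records, relation="state_earnings"):
    """records: list of (state, total_earnings, revenue_class) tuples."""
    states = sorted({r[0] for r in records})
    classes = sorted({r[2] for r in records})
    lines = [
        f"@RELATION {relation}",
        "",
        "@ATTRIBUTE state {" + ",".join(quote_arff(s) for s in states) + "}",
        "@ATTRIBUTE total_earnings NUMERIC",
        "@ATTRIBUTE revenue_class {" + ",".join(quote_arff(c) for c in classes) + "}",
        "",
        "@DATA",
    ]
    for state, total, cls in records:
        lines.append(f"{quote_arff(state)},{total},{quote_arff(cls)}")
    return "\n".join(lines) + "\n"
```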
Supervised Learning using Classification (continued): The class colors divide the states into categories. For instance, New York lies in the orange zone, being among the top revenue-generating states.
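With UserClassifier the split points are drawn interactively, so the exact boundaries are not recoverable from the slide. A minimal sketch of the resulting kind of rule, assuming two hypothetical dollar thresholds that divide states into low, medium and high revenue bands:

```python
import bisect


def revenue_band(total, thresholds=(1e9, 5e9), labels=("low", "medium", "high")):
    """Map a state's total earnings to a band; thresholds are hypothetical cut-offs in dollars.

    A total equal to a threshold falls into the higher band.
    """
    if len(labels) != len(thresholds) + 1:
        raise ValueError("need exactly one more label than thresholds")
    return labels[bisect.bisect_right(thresholds, total)]
```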
Value Proposition:
International student mobility trends: by 2017, the global middle class is projected to increase its spending on educational products and services by nearly 50 percent.
Institutions can take this growth into account!
Making the United States a more welcoming nation!
Predictive Modelling:
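The model itself appears only on the slide image. As a hedged starting point for predicting revenue from enrollment, the sketch below fits an ordinary least-squares line; this is a swapped-in technique, not necessarily the one used in the study, and the inputs are assumed to be hypothetical series such as yearly new international enrollments and fee revenue.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept; returns (slope, intercept)."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two paired observations")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def predict(model, x):
    """Predicted y for a new x under the fitted line."""
    slope, intercept = model
    return slope * x + intercept


# Usage (hypothetical series): model = fit_line(new_enrollments, fee_revenue)
```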
Employment Analysis: How? Finding data on where international students work after graduation
Based on the number of students employed in current and past years
Number of employers hiring international students in each field of graduate study (job positions)
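A minimal sketch of the two counts described above, assuming a hypothetical list of (year, field of study, employer) placement records: distinct employers per field, and placements per year for comparing current and past years.

```python
from collections import Counter, defaultdict


def employers_per_field(placements):
    """Number of distinct employers hiring in each field of study."""
    employers = defaultdict(set)
    for _year, field, employer in placements:
        employers[field].add(employer)
    return {field: len(names) for field, names in employers.items()}


def hires_per_year(placements):
    """Number of placements recorded in each year."""
    return Counter(year for year, _field, _employer in placements)
```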
References:
https://nces.ed.gov/ipeds/datacenter/
https://github.com/priya708/Project-528
https://gitlab.com/Addylad/Project528BigData/tree/47b3e6469bff4e9b7cbe0d743cb8ad9520dbb786/DataSource
https://cwiki.apache.org/confluence/display/Hive/Tutorial
https://hortonworks.com/tutorials
http://www.nafsa.org/
Thank You!