Leveraging Artificial Intelligence and Big Data to Create Value

Director, INSITE Center for Business Intelligence and Analytics

Anheuser-Busch Professor of MIS, Entrepreneurship & Innovation

Professor of Computer Science

Eller College of Management

Email: ram@eller.arizona.edu

Dr. Sudha Ram

August 19, 2020, EROSS-2020

BIG DATA: From Petabytes to Zettabytes

Meaning of “BIG”

Big Data – Traditionally Defined

• VOLUME
• VARIETY
• VELOCITY
• VERACITY
• VALUE

Diverse Sources of Data

Many Different Sources generating Data

An Internet Minute

PARADIGM SHIFT: “Datafication” of the world

Sensors embedded in Physical Objects

IP-based communication

Health Internet of Things

Paradigm Shift

Temporal and Spatial Dimensions

Billions of Users and Objects

Leaving Massive Traces of Activity

“Laboratory” for understanding the pulse of humanity

QUEST for the HOLY GRAIL

Predicting the Future

INSITE Center for Business Intelligence and Analytics

• Interdisciplinary Research Center at University of Arizona

• www.insiteua.org

Creating a Smarter/Better World
• Data Science and Network Science
• Visualizations Using Time and Space
• Scalable techniques for network analysis and graph mining
• Predictive Modeling
• Train students in data science
• Work on interesting research projects with industry partners to solve real-world problems

RESEARCH PROJECTS
• Health Care
• Education
• News Media/Journalism
• Crowdfunding
• Crowdsourcing
• Internet of Things and Wearable devices
• Social Media

SOCIAL IMPLICATIONS

Leveraging Data Science
• Define a problem/challenge
• Identify signals
• Use data science methods
• Solve the problem

Repurposing Data is Key

PREDICTION MODELS

Predict Emergency Department Visits in Near Real Time Using Big Data

Freshman Retention Prediction

COVID-19 Research

Leverage Big Data
Big Data is not just about volume

• Social media

• Internet search

• Environmental sensors

• Wearable sensors

• Spatial and Temporal Dimensions

• Fine-grained spatial/temporal data

Focus on Asthma
• 25 million people affected in the United States

• 2 million emergency department (ED) visits

• 0.5 million hospitalizations

• 3,500 deaths

• 50 billion dollars in medical costs annually

• 11 million missed school days every year

• 14 million missed work days every year

Source: CDC Reports (2011, 2012)

Pediatric asthma ER Visits, USA, 2011

Our Research Objective
Develop robust models to predict asthma-related emergency department visits in near real time using big data
Partner: Parkland Center for Clinical Innovation
Joint work with Wenli Zhang, Dr. Yolande Pengetenze, and Max Williams; funded in part by Parkland Center for Clinical Innovation

EXTRACTING SIGNAL from Noisy Data
• True asthma-related tweets
• Tweets not actually related to asthma

Asthma-Related Tweets

Asthma Keywords

Asthma

Inhaler

Sneezing

Runny Nose

Wheezing
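The keyword list suggests a first, deliberately broad pass over the tweet stream. A minimal Python sketch, assuming tweets arrive as plain strings and using only the five keywords shown on the slide; the function names are hypothetical. Note that it also keeps the rhetorical “so cute I got asthma” tweet, which is why a second relevance step is needed.

```python
import re

# The five keywords from the slide; a production list would likely be longer.
ASTHMA_KEYWORDS = ["asthma", "inhaler", "sneezing", "runny nose", "wheezing"]

# One case-insensitive pattern; \b restricts matches to whole words/phrases,
# so "asthma" does not match inside "asthmatic".
_KEYWORD_RE = re.compile(
    r"\b(?:" + "|".join(re.escape(k) for k in ASTHMA_KEYWORDS) + r")\b",
    re.IGNORECASE,
)


def mentions_asthma_keyword(text: str) -> bool:
    """Return True if the tweet contains at least one keyword."""
    return _KEYWORD_RE.search(text) is not None


def filter_stream(tweets):
    """First, noisy pass: keep only tweets that mention a keyword."""
    return [t for t in tweets if mentions_asthma_keyword(t)]
```

Matching whole words keeps the filter predictable, but whether variants such as “asthmatic” should count is a design choice the keyword list itself would have to settle.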

Asthma-Related Stream

Twitter Asthma Stream - United States

Asthma-related tweets, United States (asthma stream, 11 Oct 2013 – 31 Dec 2013)

Extracting Signals

1. Tweets indicating awareness of the disease, e.g., “Hope I don’t get an asthma attack again today..”

2. Tweets using the disease as rhetoric, e.g., “He is so cute I think I got asthma”

Goal: distinguish tweets that are relevant to asthma from tweets that mention asthma in an irrelevant context.
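The slides do not say which classifier separated the two kinds of tweets. As one possible approach, not necessarily the authors', a multinomial Naive Bayes classifier over bag-of-words features can do this. A minimal sketch with Laplace smoothing; the class name and the tiny labeled training set are hypothetical, built around the two example tweets above.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the tweet and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())


class NaiveBayesTweetClassifier:
    """Multinomial Naive Bayes with add-alpha (Laplace) smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha

    def fit(self, texts, labels):
        self.doc_counts = Counter(labels)  # number of training tweets per class
        self.word_counts = {c: Counter() for c in self.doc_counts}
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = set()
        for counts in self.word_counts.values():
            self.vocab.update(counts)
        self.total_words = {c: sum(wc.values()) for c, wc in self.word_counts.items()}
        return self

    def log_score(self, text, label):
        """Log prior plus the summed log likelihoods of known tokens."""
        n_docs = sum(self.doc_counts.values())
        score = math.log(self.doc_counts[label] / n_docs)
        denom = self.total_words[label] + self.alpha * len(self.vocab)
        for token in tokenize(text):
            if token in self.vocab:  # tokens never seen in training are ignored
                score += math.log((self.word_counts[label][token] + self.alpha) / denom)
        return score

    def predict(self, text):
        """Return the class with the highest log score."""
        return max(self.doc_counts, key=lambda c: self.log_score(text, c))


# Hypothetical toy training data; a real model needs many hand-labeled tweets.
TRAIN = [
    ("Hope I don't get an asthma attack again today..", "relevant"),
    ("need my inhaler my asthma is acting up", "relevant"),
    ("wheezing all night asthma attack", "relevant"),
    ("pollen making my asthma worse", "relevant"),
    ("He is so cute I think I got asthma", "irrelevant"),
    ("so cute I cant breathe lol", "irrelevant"),
    ("that song is so good I got asthma", "irrelevant"),
    ("cute puppy gave me asthma lol", "irrelevant"),
]

model = NaiveBayesTweetClassifier().fit([t for t, _ in TRAIN], [y for _, y in TRAIN])
```

On this toy data, `model.predict("asthma attack woke me up, where is my inhaler")` returns `"relevant"` and `model.predict("she is so cute lol")` returns `"irrelevant"`. Because “asthma” appears in both classes, the decision rests on the surrounding words, which is exactly the signal the keyword filter alone cannot use.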

Emergency Room Visits and Tweets

Air Quality Sensor Data
• Identify and include AQI data from a specific geographic region.
• Collected pollution data from 27 air quality sites around the Dallas area.
• Selected the sites closest to the zip codes of the asthma patients in our ED visits dataset and used their readings to calculate the daily average AQI for our model.
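The site-selection and averaging steps can be sketched as follows, assuming each monitoring site and each patient zip code is represented by a latitude/longitude point (e.g., a zip-code centroid) and that readings arrive as (date, site, AQI) tuples. Distances use the haversine great-circle formula. The function names, site IDs, coordinates, and readings are hypothetical, not data from the study.

```python
import math
from collections import defaultdict
from statistics import mean

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two (lat, lon) points in kilometres."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def nearest_site(point, sites):
    """ID of the monitoring site closest to point=(lat, lon)."""
    lat, lon = point
    return min(sites, key=lambda sid: haversine_km(lat, lon, *sites[sid]))


def select_sites(zip_centroids, sites):
    """Sites that are the nearest site for at least one patient zip code."""
    return {nearest_site(pt, sites) for pt in zip_centroids.values()}


def daily_average_aqi(readings, selected):
    """Average AQI per day over the selected sites; readings are (date, site_id, aqi)."""
    by_day = defaultdict(list)
    for day, site_id, aqi in readings:
        if site_id in selected:
            by_day[day].append(aqi)
    return {day: mean(values) for day, values in sorted(by_day.items())}


# Hypothetical example data.
sites = {"A": (32.78, -96.80), "B": (33.00, -96.70), "C": (32.60, -97.10)}
zips = {"zip_1": (32.79, -96.81), "zip_2": (33.01, -96.69)}
readings = [
    ("2013-10-11", "A", 40), ("2013-10-11", "B", 60), ("2013-10-11", "C", 100),
    ("2013-10-12", "A", 50), ("2013-10-12", "B", 70),
]
selected = select_sites(zips, sites)  # site C is nearest to no zip code
```

Here `daily_average_aqi(readings, selected)` averages only sites A and B, giving 50 for 2013-10-11 and 60 for 2013-10-12; site C's high reading is excluded because no patient zip code is nearest to it.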

Pollutants
• CO: Carbon monoxide
• NO2: Nitrogen dioxide
• O3: Ozone
• Pb: Lead
• PM2.5: Atmospheric particulate matter with a diameter of 2.5 micrometres or less
• PM10: Atmospheric particulate matter with a diameter of 10 micrometres or less
• SO2: Sulfur dioxide
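For reference, EPA converts each pollutant concentration C into an AQI sub-index by linear interpolation within a breakpoint table, I = (I_hi - I_lo) / (BP_hi - BP_lo) × (C - BP_lo) + I_lo, and the reported AQI is the highest sub-index across pollutants. A minimal Python sketch for 24-hour PM2.5, assuming the breakpoint table EPA adopted in 2012, which applied during the 2013 study period (EPA has since revised it, so check the current EPA AQI technical assistance document before reuse). The function names are hypothetical, and this is not necessarily how the study computed AQI.

```python
import math

# (C_lo, C_hi, I_lo, I_hi) for 24-hour PM2.5 in µg/m³ (EPA's 2012 table; since revised).
PM25_BREAKPOINTS = [
    (0.0, 12.0, 0, 50),
    (12.1, 35.4, 51, 100),
    (35.5, 55.4, 101, 150),
    (55.5, 150.4, 151, 200),
    (150.5, 250.4, 201, 300),
    (250.5, 350.4, 301, 400),
    (350.5, 500.4, 401, 500),
]


def pm25_aqi(concentration):
    """Interpolate a 24-hour PM2.5 concentration onto the AQI scale."""
    c = math.floor(concentration * 10) / 10  # truncate to 0.1 µg/m³
    for c_lo, c_hi, i_lo, i_hi in PM25_BREAKPOINTS:
        if c_lo <= c <= c_hi:
            index = (i_hi - i_lo) / (c_hi - c_lo) * (c - c_lo) + i_lo
            return math.floor(index + 0.5)  # round half up to an integer
    raise ValueError(f"concentration {concentration} is outside the AQI table")


def overall_aqi(sub_indices):
    """The reported AQI is the highest sub-index across pollutants."""
    return max(sub_indices.values())
```

For example, `pm25_aqi(20.0)` falls in the 12.1 to 35.4 band and evaluates to 68.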

EPA Pollution Sensor Data and Emergency Visits

Prediction Models Using Streaming Data
• Air quality sensor data streams
• Tweets
• Google Trends search data
• Machine learning techniques to predict the number of ED visits per day with high accuracy
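Combining these streams amounts to aligning them on a common daily key before any model sees them. A minimal sketch, assuming each source has already been aggregated to one value per day (tweet counts from the filtered stream, daily pollutant averages, a Google Trends index, observed ED visits) and that days missing any source are dropped; the function and field names are hypothetical.

```python
from datetime import date


def build_daily_features(tweet_counts, pollutants, trends, ed_visits):
    """Join per-day dictionaries into (day, features, target) rows.

    tweet_counts: {date: int}
    pollutants:   {date: {"CO": float, "NO2": float, "PM2.5": float}}
    trends:       {date: float}
    ed_visits:    {date: int}  (the prediction target)
    Only days present in every source are kept.
    """
    common_days = sorted(set(tweet_counts) & set(pollutants) & set(trends) & set(ed_visits))
    rows = []
    for day in common_days:
        features = {"tweets": tweet_counts[day], "trends": trends[day], **pollutants[day]}
        rows.append((day, features, ed_visits[day]))
    return rows


# Hypothetical example: the second day has no Google Trends value, so it is dropped.
d1, d2 = date(2013, 10, 11), date(2013, 10, 12)
rows = build_daily_features(
    {d1: 120, d2: 95},
    {d1: {"CO": 0.4, "NO2": 18.0, "PM2.5": 9.5}, d2: {"CO": 0.3, "NO2": 15.0, "PM2.5": 8.0}},
    {d1: 42.0},
    {d1: 31, d2: 27},
)
```

Dropping incomplete days is the simplest policy; imputing missing values is an alternative when one stream has frequent gaps.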

Best Predictors

Successfully predicted with 80% accuracy

• # of asthma tweets

• CO

• NO2

• PM2.5
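The slide does not define how the 80% accuracy was measured. Assuming daily visit counts are binned into levels (here low/medium/high by tertiles) and accuracy is the share of days whose level is predicted correctly, the evaluation can be sketched as below. The binning scheme and function names are hypothetical stand-ins, not the study's protocol.

```python
def tertile_cutoffs(counts):
    """Two cut points that split the counts into three roughly equal groups."""
    s = sorted(counts)
    n = len(s)
    return s[n // 3], s[(2 * n) // 3]


def to_level(count, cutoffs):
    """Map a daily ED visit count to a coarse level."""
    low_upper, mid_upper = cutoffs
    if count < low_upper:
        return "low"
    if count < mid_upper:
        return "medium"
    return "high"


def accuracy(predicted, actual):
    """Fraction of days whose predicted level matches the observed level."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("need two non-empty sequences of equal length")
    return sum(p == a for p, a in zip(predicted, actual)) / len(actual)
```

For instance, four correct levels out of five days gives `accuracy(...) == 0.8`. The cut points should be computed on training days only, so the test days do not leak into the labels.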

USEFUL for Public Health NOTIFICATION

I. Epidemiologic surveillance of asthma activity in the community, e.g., by the Department of Health and Human Services (DHHS)

II. Stakeholder notification of community-level asthma activity and risk factors

Hospital/ED Preparedness

Predicting asthma ED visits and staffing the ED accordingly

Targeted Patient Interventions
Targeted patient interventions using patient addresses and geolocation data from tweets, e.g., alerting patients about asthma risks and counseling them on preventive measures.

Contributions
• Promising results
• Demonstrate the utility and value of linking big data from diverse sources in developing predictive models for non-communicable diseases
• Specific focus on asthma
• Relevant for other chronic conditions: diabetes, cardiac problems, obesity

Internet of Things and Big Data

Big Data for Improving Education
Internet of Things: Smart Cards, WiFi Logs, Mobile Apps

BUILDING A SMARTER CAMPUS

Combining Network Science and Machine Learning

Societal Challenge: Student Retention
Proactive prediction is very important
Social science theories point to:
• Social interactions
• Regularity of routine

Objective
• Predict freshman retention at the individual level
• Make a proactive prediction before the first-term GPA is known
• Learn students’ behavioral patterns from their CatCard transactions
• Provide actionable suggestions for retention management
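One common way to quantify “regularity of routine” from card swipes is the Shannon entropy of a student's activity distribution, for example over hour-of-day bins: lower entropy means activity concentrated in fewer time slots, i.e., a more regular routine. A minimal sketch, assuming transaction timestamps are available as `datetime` values; the function name is hypothetical and this is an illustrative measure, not necessarily the one used in the study.

```python
import math
from collections import Counter
from datetime import datetime


def routine_entropy(timestamps):
    """Shannon entropy (bits) of the hour-of-day distribution of transactions."""
    hours = Counter(ts.hour for ts in timestamps)
    total = sum(hours.values())
    if total == 0:
        return 0.0
    # sum of p * log2(1/p) over occupied hour bins
    return sum((n / total) * math.log2(total / n) for n in hours.values())


# Hypothetical students: one always swipes at 8 am, one at four different hours.
regular = [datetime(2012, 9, d, 8, 15) for d in range(3, 8)]
irregular = [datetime(2012, 9, 3, h) for h in (8, 12, 17, 21)]
```

The regular student scores 0 bits and the irregular one 2 bits (four equally likely hours). The same function could be applied per location type or per weekday to capture other aspects of routine.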

BIG DATA
Institutional Student Dataset
• ~7,000 full-time registered freshmen; about 6,500 remain after removing international students, for whom SAT scores or high school GPAs were not available
• 479 (7.37%) dropped out after Fall and 843 (12.98%) dropped out at the end of Spring

SmartCard Transaction Dataset
• 1.8 million transactions made by freshmen from Aug 2012 through May 2013
• 271 different locations, including restaurants, vending machines, printers, parking, and labs

Behavior and Interactions

Patterns and Differences

Movement and Behavior

COMPUTATIONAL and NETWORK SCIENCE APPROACH

Fills gaps in behavioral and extant data-driven approaches
New prediction approach

CatCard transactions reveal implicit social networks and spatial sequences
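An implicit social network can be derived by linking students who repeatedly make transactions at the same location within a short time of each other. A minimal sketch, assuming transactions are (student_id, location, timestamp) tuples; the 2-minute window, the minimum of 2 co-occurrences, and the function name are hypothetical parameters, not values from the study.

```python
from collections import Counter, defaultdict
from datetime import datetime, timedelta


def co_occurrence_edges(transactions, window=timedelta(minutes=2), min_count=2):
    """Return {(student_a, student_b): count} for pairs co-occurring >= min_count times.

    Two transactions co-occur if they happen at the same location no more than
    `window` apart. Each qualifying pair of transactions adds one to the count.
    """
    by_location = defaultdict(list)
    for student, location, ts in transactions:
        by_location[location].append((ts, student))
    counts = Counter()
    for events in by_location.values():
        events.sort()
        for i, (ts_i, s_i) in enumerate(events):
            for ts_j, s_j in events[i + 1:]:
                if ts_j - ts_i > window:
                    break  # events are time-sorted, so later ones are even further apart
                if s_i != s_j:
                    counts[tuple(sorted((s_i, s_j)))] += 1
    return {pair: n for pair, n in counts.items() if n >= min_count}


# Hypothetical transactions: s1 and s2 buy lunch together on two days.
t0 = datetime(2012, 9, 10, 12, 0)
tx = [
    ("s1", "cafe", t0), ("s2", "cafe", t0 + timedelta(minutes=1)),
    ("s1", "cafe", t0 + timedelta(days=1)), ("s2", "cafe", t0 + timedelta(days=1, minutes=1)),
    ("s3", "cafe", t0 + timedelta(minutes=30)),
    ("s1", "library", t0 + timedelta(hours=2)), ("s3", "library", t0 + timedelta(hours=2, minutes=1)),
]
```

Here only the pair (s1, s2) survives the threshold; the single s1/s3 encounter at the library is treated as chance. Requiring repeated co-occurrence is what separates likely acquaintances from strangers in the same queue.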

Proactive prediction
Predicting retention before the end of the 1st semester with 90% recall
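Recall here is the share of students who actually dropped out that the model flagged in advance: recall = TP / (TP + FN). A minimal sketch, assuming “dropped out” is the positive class (an assumption about how the 90% figure was computed); the function name and labels are hypothetical.

```python
def recall(predicted, actual, positive="dropout"):
    """TP / (TP + FN) for the given positive class."""
    tp = sum(1 for p, a in zip(predicted, actual) if a == positive and p == positive)
    fn = sum(1 for p, a in zip(predicted, actual) if a == positive and p != positive)
    if tp + fn == 0:
        raise ValueError("no positive cases in `actual`")
    return tp / (tp + fn)


# Hypothetical cohort: 10 dropouts (9 flagged) and 5 retained students (2 flagged).
actual = ["dropout"] * 10 + ["retained"] * 5
predicted = ["dropout"] * 9 + ["retained"] + ["dropout"] * 2 + ["retained"] * 3
```

This cohort has a recall of 0.9 even though two retained students were also flagged. False positives do not lower recall, which is why it is usually reported alongside precision.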

COVID-19 Related Research Projects

What is Contact Tracing?
Digital vs. Manual Methods: Three Different Methods

a. Manual contact tracing
b. Manual with digital assistance from a Prompted Mobility Pathway, aka Memory Jogger
c. Digital: Bluetooth app for exposure notification

Memory Jogger using WiFi Logs

Working with Jeremy Frumkin, Research and Discovery Technologies

Using WiFi network logs with CatCard data to support strategic efforts related to tracking congestion and managing foot traffic on campus

Understanding Movement Patterns among Campus Spaces
• Complement app-based and manual contact tracing efforts with the additional insights that can be gained through the WiFi logs
• Design a Memory Jogger (prompted mobility pathway) tool to enhance manual contact tracing
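At its core, a WiFi-based memory jogger answers a co-location query: which other devices were connected in the same building during time intervals that overlap a given person's? A minimal sketch, assuming the logs have already been reduced to (device_id, building, start, end) sessions; the record layout, the 15-minute minimum overlap, and the function names are hypothetical, and a real deployment would need the privacy safeguards the slides raise.

```python
from datetime import datetime, timedelta


def overlap(a_start, a_end, b_start, b_end):
    """Length of the overlap between two time intervals (zero if disjoint)."""
    return max(timedelta(0), min(a_end, b_end) - max(a_start, b_start))


def co_located_devices(sessions, index_device, min_overlap=timedelta(minutes=15)):
    """Return {device: total overlap} for devices sharing a building with index_device."""
    index_sessions = [s for s in sessions if s[0] == index_device]
    exposure = {}
    for device, building, start, end in sessions:
        if device == index_device:
            continue
        for _, index_building, i_start, i_end in index_sessions:
            if index_building == building:
                shared = overlap(start, end, i_start, i_end)
                if shared > timedelta(0):
                    exposure[device] = exposure.get(device, timedelta(0)) + shared
    return {d: t for d, t in exposure.items() if t >= min_overlap}


# Hypothetical sessions for one morning.
base = datetime(2020, 2, 3, 8, 0)
sessions = [
    ("case", "Library", base, base + timedelta(hours=1)),
    ("d1", "Library", base + timedelta(minutes=30), base + timedelta(hours=2)),  # 30 min shared
    ("d2", "Library", base + timedelta(minutes=55), base + timedelta(hours=2)),  # 5 min shared
    ("d3", "Union", base, base + timedelta(hours=1)),  # different building
]
```

Only `d1` is returned, with 30 minutes of shared time. In a memory-jogger setting, the same output would be used to prompt the case about places and times rather than to notify devices directly.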

Traffic/Crowd Analysis

Dashboard filters: Select Date (Feb 3, 2020), Time (8 am-9 am), Building, User types

Traffic on campus between 8 am and 9 am: top ten traffic spots visualized and compared with the selected building (in red)

Comparison of hourly Traffic in selected building
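Both views in this dashboard can be approximated by counting distinct devices per building. A minimal sketch, assuming WiFi sessions have been reduced to (device_id, building, start, end) records and that a device counts as present if its session overlaps the time window; the record layout and function names are hypothetical.

```python
from collections import defaultdict
from datetime import date, datetime, timedelta


def top_buildings(sessions, window_start, window_end, k=10):
    """Rank buildings by the number of distinct devices present during the window."""
    devices = defaultdict(set)
    for device, building, start, end in sessions:
        if start < window_end and end > window_start:  # session overlaps the window
            devices[building].add(device)
    ranked = sorted(devices.items(), key=lambda item: (-len(item[1]), item[0]))
    return [(building, len(devs)) for building, devs in ranked[:k]]


def hourly_counts(sessions, building, day):
    """Distinct devices in one building for each of the 24 hours of `day`."""
    counts = []
    for hour in range(24):
        h_start = datetime(day.year, day.month, day.day, hour)
        h_end = h_start + timedelta(hours=1)
        present = {d for d, b, s, e in sessions if b == building and s < h_end and e > h_start}
        counts.append(len(present))
    return counts


# Hypothetical sessions on the morning of Feb 3, 2020.
sessions = [
    ("d1", "Library", datetime(2020, 2, 3, 7, 50), datetime(2020, 2, 3, 8, 20)),
    ("d2", "Library", datetime(2020, 2, 3, 8, 30), datetime(2020, 2, 3, 10, 0)),
    ("d1", "Union", datetime(2020, 2, 3, 8, 40), datetime(2020, 2, 3, 8, 50)),
    ("d3", "Gym", datetime(2020, 2, 3, 9, 30), datetime(2020, 2, 3, 10, 0)),
]
```

For the 8 am to 9 am window, `top_buildings` ranks the Library (2 devices) ahead of the Union (1); the Gym session starts after the window and is excluded. Counting distinct devices rather than sessions keeps a device that reconnects several times from inflating the totals.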

Objective: compare the three methods for contact tracing and exposure notification
• How do the three contact tracing approaches differ in outcomes such as timeliness, coverage of contacts, and other metrics?
• How do these methods complement each other, and what are their relative strengths and weaknesses?
• How well do these methods preserve privacy while allowing for comprehensive contact tracing? What are the tradeoffs?
• How acceptable are these three strategies to the community, and what is an effective path to deploying comprehensive contact tracing?

Some General Lessons

• Need for complex techniques?
• Is causality really necessary for prediction?
• What level of accuracy is good?
• Working with your stakeholders is important
• Research is very important in training the next generation of scientists, end users, students, and others

Some General Lessons
• Focus on defining the problem carefully
• Out-of-the-box thinking
• Big Data: don’t think of it as a single very large dataset
• Repurpose and combine different types of data
• Exploit the granularity of data, especially the spatial and temporal features: machine learning and network science
• Extract signal from noise

Good News

In 2015, McKinsey predicted that by 2020 the number of data science jobs in the United States alone would exceed 500,000, but there would be fewer than 200,000 available data scientists to fill these positions. Globally, demand for data scientists was projected to exceed supply by more than 50 percent by 2020.

IBM today: Annual demand for the fast-growing new roles of data scientist, data developers, and data engineers will reach nearly 700,000 openings by 2020.

CONCLUSION

• PARADIGM SHIFT
• BIG DATA HAS A LOT OF HIDDEN VALUE
• LET’S LEVERAGE IT USING AI TO CREATE A BETTER WORLD!

QUESTIONS??

TEDx Talk:

http://tedxtucson.com/portfolio/sudha-ram/

www.insiteua.org

Email: ram@eller.arizona.edu
Twitter: @sudharam