

Page 1: Big Data Challenges: Data Management, Analytics & Security - BigDataChallenges_AAAS_2014.pdf

Big Data Challenges: Data Management, Analytics & Security

Ivo D. Dinov

Statistics Online Computational Resource University of Michigan

www.SOCR.umich.edu


Big Data Challenges

• Availability, Sharing, Aggregation and Services
• Classical Data Science vs. Innovative Big Data Science
  – Amateur Scientists vs. “Experts”
  – Data Scientists vs. Practitioners
  – Domain-specific vs. Trans-disciplinary knowledge
• Commercial vs. Open-source Resourceome
• Rapid Big Data Evolution
• Big Data IT proliferation
• Big Data Security risks
• Centralization won’t work in Big Data Space
• Big Data is incredibly time, space, protocol, context dependent!


Big Data Characteristics

* Mixture of quantitative & qualitative estimates Dinov et al. (2013)


Availability, Sharing, Aggregation & Services

• Cisco: “By the end of 2012, the number of mobile-connected devices [exceeded] the number of people on Earth”
• There will be over 10 billion mobile-connected devices in 2016; i.e., there will be 1.3 mobile devices per capita

[Figure: bubble chart by industry sector. Axes: Percent Growth; Big Data Value Potential Index. Bubble size ~ relative size of GDP. Sources: U.S. Bureau of Labor Statistics; McKinsey Global Institute]

Industry sectors: Computer & Electronic Products; Information Services; Manufacturing; Admin, support & waste management; Transportation & Warehousing; Wholesale Trade; Professional Services; Healthcare Providers; Real Estate and Rental; Finance and Insurance; Utilities; Retail Trade; Government; Accommodation & Food; Arts & Entertainment; Corporate Management; Other Services; Construction; Education Services; Natural Resources


Amateur Scientists vs. “Experts”

• Democratization of Big Data Science
• Doctorate studies/certification is not mandatory nor does it guarantee appropriate Big Data expertise
• Lower barriers of entry
• Demand for constant “Continuing Education” and self-training
• Dichotomy between theoretical and empirical sciences
• Differences between fundamental knowledge and experimental skills (big data properties closely approximate core scientific principles)


Domain-specific vs. Trans-disciplinary knowledge

[Diagram: Big Data Science]
• Medical Sciences, Social Sciences, Environmental Sciences, ....
• Math/Stats, Physics, Biology, Chemistry, ....
• Engineering, Computer Science, Bioinformatics, Biomath/Biostats, ....


Commercial vs. Open-source Resourceome

• There is an explosion of open-data-science resources
  – www.data.gov
  – www.ncbi.nlm.nih.gov/gap
• Spawning of a number of industries and enterprises blending proprietary and open-source data, code, documentation, expert-support, infrastructure and services
• Big Data to Knowledge: www.BD2K.org
• Google Cloud Platform (GCP)
• Amazon Web Services (AWS)




Rapid Big Data Evolution

• Millions of Grass-Roots initiatives addressing Big Data Challenges
• Big Data complexities require truly innovative, collaborative, trans-disciplinary solutions
• Increase of Data complexity
  – Sources
  – Heterogeneity
  – Datum-elements
  – Incongruent sampling


Data Scientists vs. Practitioners

• Modelers, Engineers, (Applied) Users
• No one user completely understands the entire pipeline of data provenance, processing protocols, analytic strategies, or results interpretation
• Black-boxes ….
  – Accuracy
  – Privacy concerns
  – Consistency
  – Infrastructure


Big Data Security Risks

• Big Data Fusion provides enormous opportunities … and presents significant challenges
• Privacy, security and legal concerns, authenticity, accuracy, consistency, reliability, availability
• Healthcare
  – The cloud services enable sharing big data
  – Significant security and privacy concerns exist
  – Health Insurance Portability and Accountability Act (HIPAA)
  – EMR/EHR Federal, state and local regulations/policies (IRBMED)
• Genetics
• Viral - Dual-use research of concern (DURC), doi:10.1126/science.1223995
  – de novo synthesis of polio virus, the Australian mousepox experiment, the Penn State aerosolization study


Kryder’s law: Exponential Growth of Data

[Figure panel: Increase of Imaging Resolution. Series: Gryo_Byte, Cryo_Short, Cryo_Color; resolution scale from 1 µm to 1 cm]

[Figure panel: Data volume increases faster than computational power. Series: Neuroimaging (GB), Genomics_BP (GB), Moore’s Law (1000's Trans/CPU); periods from 1985-1989 to 2015-2019 (estimated)]

Dinov, et al., 2013
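The gap between data growth and compute growth can be illustrated with a small sketch of two exponential curves. The doubling times below are hypothetical placeholders chosen for illustration only; they are not values read from the figure or from Dinov et al. (2013).

```python
def growth_factor(years: float, doubling_time_years: float) -> float:
    """Multiplicative growth after `years` for a quantity that doubles
    every `doubling_time_years` (simple exponential model)."""
    if doubling_time_years <= 0:
        raise ValueError("doubling time must be positive")
    return 2.0 ** (years / doubling_time_years)


# Hypothetical doubling times, for illustration only.
DATA_DOUBLING_YEARS = 1.0      # assumed pace of data-volume growth
COMPUTE_DOUBLING_YEARS = 2.0   # assumed pace of transistor-count growth


def data_to_compute_gap(years: float) -> float:
    """How much further data volume has grown than compute after `years`."""
    data = growth_factor(years, DATA_DOUBLING_YEARS)
    compute = growth_factor(years, COMPUTE_DOUBLING_YEARS)
    return data / compute


if __name__ == "__main__":
    for span in (5, 10, 20):
        print(span, "years: gap factor", data_to_compute_gap(span))
```

Under these assumed rates the gap itself grows exponentially, which is the qualitative point of the slide; the actual rates depend on the data source and hardware generation.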


Alzheimer’s Case Study: Stable-MCI vs. MCI-Converters

Goals: Assess the predictive power of combinations of biomarkers and imaging-derived measures as reliable predictors of conversion from MCI to Alzheimer’s disease

Data: MCI converters to AD (24-month period) and stable non-converters, matched for age, gender, handedness, and education level; imaging (sMRI), behavioral, clinical, neuropsychiatric, and biological data

Approach: Qualitative Exploratory Data Analysis and Quantitative Statistical Analysis (morphometric imaging correlates with clinical and genetics markers)

MCI = Mild Cognitive Impairment (prelude to dementia of Alzheimer’s type)


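Alzheimer’s Case Study: Stable-MCI vs. MCI-Converters

Column groups:
• Subject: Index
• Demographics: Age, Kg, Sex
• Genetics: APOE A1, APOE A2
• Clinical: NPISCORE, MMSE, GDTOTAL, CDR, FAQTOTAL
• Neuroimaging: L Gyrus Rectus BL, L Superior Occipital Gyrus BL, R Fusiform Gyrus BL, L Caudate BL, R Caudate BL, L Putamen BL, R Putamen BL, …

Index | Age | Kg | Sex | APOE A1 | APOE A2 | NPISCORE | MMSE | GDTOTAL | CDR | FAQTOTAL | L Gyrus Rectus BL | L Superior Occipital Gyrus BL | R Fusiform Gyrus BL | L Caudate BL | R Caudate BL | L Putamen BL | R Putamen BL | …
1     | 65  | 59 | F   | 3       | 4       | 0        | 23   | 1       | 0.5 | 7        | 1695              | 3976                          | 8363                | 1296         | 1992         | 1749         | 2776         | …
2     | 73  | 93 | M   | 3       | 3       | 7        | 19   | 1       | 1   | 8        | 1333              | 6016                          | 13290               | 835          | 2137         | 2290         | 4327         | …
...   | ... | ...| ... | ...     | ...     | ...      | ...  | ...     | ... | ...      | …                 | …                             | …                   | …            | …            | …            | …            | …
N     | 64  | 63 | F   | 3       | 3       | 3        | 29   | 6       | 0.5 | 2        | 2237              | 6887                          | 16109               | 1223         | 2222         | 2525         | 4110         | …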
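One way to hold a row of this multi-source table in code is a typed record that keeps the demographic, genetic, clinical, and imaging groups distinct. The sketch below is illustrative only: the class name, field names, the reading of "Kg" as body weight, and the whitespace-separated row format are all assumptions, not part of the original study's software.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class SubjectRecord:
    """Hypothetical container for one subject row (names are assumptions)."""
    index: str
    age: int
    weight_kg: float               # assumes the "Kg" column is body weight
    sex: str
    apoe: Tuple[int, int]          # APOE A1, APOE A2
    npi_score: float
    mmse: float
    gd_total: float
    cdr: float
    faq_total: float
    imaging: Tuple[float, ...]     # baseline (BL) regional measures, in column order


def parse_subject(row: str) -> SubjectRecord:
    """Parse one whitespace-separated row in the column order of the table."""
    fields = row.split()
    if len(fields) < 11:
        raise ValueError("expected at least 11 non-imaging columns")
    return SubjectRecord(
        index=fields[0],
        age=int(fields[1]),
        weight_kg=float(fields[2]),
        sex=fields[3],
        apoe=(int(fields[4]), int(fields[5])),
        npi_score=float(fields[6]),
        mmse=float(fields[7]),
        gd_total=float(fields[8]),
        cdr=float(fields[9]),
        faq_total=float(fields[10]),
        imaging=tuple(float(v) for v in fields[11:]),
    )


# First row of the table, without the trailing ellipsis column.
first = parse_subject("1 65 59 F 3 4 0 23 1 0.5 7 1695 3976 8363 1296 1992 1749 2776")
```

Keeping the imaging measures as an open-ended tuple reflects the trailing "…" in the table: the number of regional measures is not fixed by the slide.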


Alzheimer’s Case Study: Stable-MCI vs. MCI-Converters

Classification Results Using Baseline Data

Hierarchical Clustering Prediction Ana (7 Regions) vs. True State (Dx at 24 month follow up)

Prediction \ True State | Converter | Stable | Total
Converter               | TP        | FP     | TP+FP
Stable                  | FN        | TN     | FN+TN
Total                   | TP+FN     | FP+TN  | N

Metric                     | Top 7 Regions | Top 20 Regions
Sensitivity                | 0.81          | 1.0
Specificity                | 0.61          | 0.87
Power to detect Converters | 0.91          | 1.0
Accuracy                   | 0.70          | 0.93
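The sensitivity, specificity, and accuracy rows follow from the TP/FP/FN/TN layout using the standard definitions. The sketch below uses made-up counts purely for illustration; they are not the study's counts, which the slides do not report. The "Power to detect Converters" row is not computed here because its definition is not given in the source.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionMatrix:
    """2x2 table: predicted class vs. true state (converter = positive)."""
    tp: int  # predicted converter, truly converter
    fp: int  # predicted converter, truly stable
    fn: int  # predicted stable, truly converter
    tn: int  # predicted stable, truly stable

    @property
    def n(self) -> int:
        return self.tp + self.fp + self.fn + self.tn

    def sensitivity(self) -> float:
        """TP / (TP + FN): share of true converters that were detected."""
        return self.tp / (self.tp + self.fn)

    def specificity(self) -> float:
        """TN / (TN + FP): share of stable subjects correctly labelled."""
        return self.tn / (self.tn + self.fp)

    def accuracy(self) -> float:
        """(TP + TN) / N: share of all subjects classified correctly."""
        return (self.tp + self.tn) / self.n


# Hypothetical counts, for illustration only (not taken from the study).
example = ConfusionMatrix(tp=8, fp=3, fn=2, tn=7)

if __name__ == "__main__":
    print("sensitivity:", example.sensitivity())
    print("specificity:", example.specificity())
    print("accuracy:", example.accuracy())
```

Sensitivity and specificity are computed down the columns of the table (conditioning on the true state), while accuracy pools both classes, so it depends on the class balance of the sample.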