The Outlook for Big Data
Chris Greer
Information Technology Laboratory
National Institute of Standards and Technology
NFAIS 2012 Annual Conference
Article I, Section 8: The Congress shall have the power to… fix the standard of weights and measures
• National Bureau of Standards established by Congress in 1901
• Designated the National Institute of Standards and Technology in 1988
Mission: To promote U.S. innovation and industrial competitiveness by advancing measurement science, standards, and technology in ways that enhance economic security and improve our quality of life.
Technology Development
IT Measurement and Testing
Mathematical and Statistical Analyses for Measurement Science
Modeling and Simulation for Measurement Science
IT Standards Development and Deployment
Big Data - Definition
• Data mass: volume, velocity, and/or complexity
• Data-enabled analytics: correlation and inference analyses enabled by data mass
Data mass and/or data analytics that are beyond the capacity of your current system
Big Data - Volume
[Figure: Information vs. available storage, petabytes worldwide, 2005 to 2010 (y-axis 0 to 1,000,000 petabytes)]
Source: John Gantz, IDC Corporation, The Expanding Digital Universe
Big Data - Volume
Source: IDC Corporation, Worldwide Information Growth Ticker, Feb 2012
Big Data - Velocity
• Sloan Digital Sky Survey: 140 terabytes, 2000 to present
• LSST (Large Synoptic Survey Telescope): expected 140 terabytes every 5 days
• Square Kilometre Array: expected 140 terabytes every 3 seconds
LSST: “Suspended between its vast mirrors will be a three billion-pixel sensor array, which on a clear winter night will produce 30 terabytes of data. In less than a week this remarkable telescope will map the whole night sky …. And then the next week it will do the same again … building up a database of billions of objects and millions of billions of bytes.”
Nature 440:383
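As a rough check on the scale of these figures, the stated volumes can be converted into average daily data rates. This is a minimal sketch under stated assumptions: "present" for the SDSS figure is taken as 2012 (the conference year), and each rate is a simple average that ignores weather, downtime, and processing. The function name `tb_per_day` is illustrative only.

```python
# Average data rates implied by the figures on the Velocity slide.
# Assumption: the SDSS span "2000 to present" is taken as 2000 to 2012,
# i.e. roughly 12 years of accumulation. All rates are simple averages.

SECONDS_PER_DAY = 86_400
DAYS_PER_YEAR = 365.25


def tb_per_day(terabytes: float, seconds: float) -> float:
    """Average rate in terabytes per day for `terabytes` collected over `seconds`."""
    return terabytes / seconds * SECONDS_PER_DAY


surveys = {
    "SDSS (2000 to 2012, assumed)": tb_per_day(140, 12 * DAYS_PER_YEAR * SECONDS_PER_DAY),
    "LSST (140 TB / 5 days)": tb_per_day(140, 5 * SECONDS_PER_DAY),
    "SKA (140 TB / 3 s)": tb_per_day(140, 3),
}

# Print each survey's average rate, right-aligned with thousands separators.
for name, rate in surveys.items():
    print(f"{name:30s} {rate:>16,.2f} TB/day")
```

On these assumptions LSST averages about 28 TB per day, in line with the 30 terabytes per clear night in the Nature quote, while the SKA figure corresponds to about 4 million TB (roughly 4 exabytes) per day.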
Big Data - Complexity
Combining Structured and Unstructured Data
The Department of Defense’s ARPANET project, launched in 1966 to explore methods for “resource sharing among computers”, initially connected four nodes. Today’s Internet links more than 2.2 billion users over more than 200,000 networks worldwide, with 14 new users added every second.
Big Data – Volume, Velocity, and Complexity
Big Data - Analytics
Source: Gary Anthes, Communications of the ACM, Vol. 52 No. 11
Writing in a recent issue of the journal Science, Hod Lipson and Michael Schmidt describe how they programmed a computer to take unstructured and imperfect lab measurements from swinging pendulums and mechanical oscillators and, with just the slightest initial direction - and no knowledge of physics, mechanics, or geometry - derive equations representing fundamental laws of nature.
The End of Theory: The Data Deluge Makes the Scientific Method Obsolete
Google's founding philosophy is that we don't know why this page is better than that one: If the statistics … say it is, that's good enough. No semantic or causal analysis is required. That's why Google can translate languages without actually "knowing" them (given equal corpus data, Google can translate Klingon into Farsi as easily as it can translate French into German).
- Chris Anderson, Wired Magazine, 06.23.08
Big Data - Analytics
Recommendations:
• Design and organize for data agility
• Treat data as assets
Over the next decade, the number of servers (virtual and physical) worldwide will grow by a factor of 10, the amount of information managed by enterprise [and cloud] datacenters will grow by a factor of 50, and the number of files the datacenter will have to deal with will grow by a factor of 75, at least.
Design and Organize for Data Agility
J. Gantz and D. Reinsel, Extracting Value from Chaos, IDC Corp., June 2011
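The decade-long growth factors in the IDC quote can be restated as compound annual growth rates: growing by a factor F over N years means (1 + r)^N = F, so r = F^(1/N) - 1. A minimal sketch, assuming N = 10 for "the next decade"; the function name `annual_growth_rate` is illustrative only.

```python
# Convert "grows by a factor of F over N years" into the equivalent
# compound annual growth rate r, using (1 + r) ** N == F  =>  r = F ** (1 / N) - 1.
# Factors are those quoted from IDC; N = 10 is the assumed "next decade".


def annual_growth_rate(factor: float, years: float) -> float:
    """Compound annual growth rate that multiplies a quantity by `factor` in `years`."""
    if factor <= 0 or years <= 0:
        raise ValueError("factor and years must be positive")
    return factor ** (1.0 / years) - 1.0


decade_factors = {
    "servers (virtual and physical)": 10,
    "information managed by datacenters": 50,
    "files managed by datacenters": 75,
}

# Print the implied yearly growth rate for each quoted factor.
for item, factor in decade_factors.items():
    rate = annual_growth_rate(factor, 10)
    print(f"{item:36s} x{factor:<3d} over 10 years ~ {rate:.1%} per year")
```

This works out to roughly 26% per year for servers, 48% for managed information, and 54% (at least) for files.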
Do the capabilities of your current, in-house IT systems meet the big data needs of your organization?
1. Yes
2. No
Source: Frontiers in Plant Science, SA Goff et al., 25 Jul 2011; www.iplantcollaborative.org
The iPlant platform helps researchers use tools and data more easily and efficiently. It provides sustainable access to high performance computing, interoperable software analysis, and large data sets.
Design and Organize for Data Agility
I.B.M., seeing an opportunity in data-hunting services,
created a Business Analytics and Optimization Services
group in April. The unit will tap the expertise of the more
than 200 mathematicians, statisticians and other data
analysts in its research labs — but that number is not
enough. I.B.M. plans to retrain or hire 4,000 more
analysts across the company.
Design and Organize for Data Agility
S. Lohr, New York Times, Aug. 5, 2009
Does your organization employ any mathematicians or statisticians?
1. Yes
2. No
Treat Data as Assets
• Organizational Data Policy
• Data Management Plans
• Risk Management Plans
• Designed-in Information Security
Does your organization have a formal data management plan describing preservation, access, and use policies?
1. Yes
2. No