
Posted on 19-Jan-2016


Hi-Tech Barbecue/Grilling @high mountain

Datafy Everything: What's Next in Digital Life

Daniel HaoTien Lee
danieleewww@gmail.com

http://danieleewww.yolasite.com/2015-mgb070.php

Datafication: a process of “taking all aspects of life and turning them into data”

• Google’s augmented-reality glasses datafy the gaze
• Twitter datafies stray thoughts
• LinkedIn datafies professional networks
• Facebook datafies social activities
• Pandora/Spotify datafy musical taste and sensibility
• Amazon datafies shopping

Once we datafy things, we can transform their purpose and turn the information into new forms of value.

What’s Behind Datafication

• Statistically: moving from N = small (a carefully curated random sample) toward N = all (accepting some messiness)

• Data: from some to all, from clean to messy, from causation to correlation. This represents a move away from always trying to understand the deeper reasons behind how the world works, toward simply learning about associations among phenomena and using them to get things done.
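The shift from causation to correlation can be made concrete with a small sketch: computing a Pearson correlation coefficient between two hypothetical series (all numbers below are invented for illustration):

```python
def pearson(xs, ys):
    # Pearson correlation coefficient between two equal-length series.
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

# Hypothetical weekly counts: flu-related searches vs. reported cases.
searches = [120, 150, 210, 300, 280, 190]
cases = [10, 14, 22, 31, 27, 18]
r = pearson(searches, cases)
```

A high r says only that the two series move together; it says nothing about why, which is exactly the trade-off the slide describes.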

Datafying through the Air: Go-and-Fly

• http://www.airware.com/aerial-information-platform

• https://www.youtube.com/watch?v=6ZjwgSwXfMQ

Benefit of Telematics

• Greenhouse gas reduction
• Telematics outputs drive UPS’s planning, training, and maintenance activities.
• Mileage reduction: millions of gallons of gasoline saved yearly.
• Fuel and emissions efficiency
• Operational improvement: even tiny operational improvements from telematics data can cut millions of miles from the total.

Small savings per vehicle add up to a big total.
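The "small per vehicle, big in total" point is simple arithmetic; every number below is an illustrative assumption, not an actual UPS figure:

```python
# Illustrative assumptions only -- not actual UPS figures.
vehicles = 90_000          # assumed fleet size
miles_saved_per_day = 1.0  # assumed per-vehicle saving from telematics-driven routing
working_days = 250         # assumed delivery days per year
fleet_mpg = 10.0           # assumed average fuel economy

miles_saved = vehicles * miles_saved_per_day * working_days
gallons_saved = miles_saved / fleet_mpg
print(f"{miles_saved:,.0f} miles and {gallons_saved:,.0f} gallons saved per year")
```

Even a single assumed mile per vehicle per day compounds into tens of millions of miles across a large fleet.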

Example: Outbreak Early Warning

http://www.google.org/

This serves as a reminder that predictions are only probabilities and are not always correct, especially when the basis for the prediction -- Internet searches -- is in a constant state of change and vulnerable to outside influences, such as media reports. Still, big data can hint at the general direction of an ongoing development, and Google’s system did just that.

The fact that Google decided not to update the model for 2012-13, and subsequently the model performed poorly in 2012-13, suggests that the procedure for deciding when an update is necessary may need to be reworked.

Datafication of Posteriors

• When a person is seated, the contours of the body, its posture, and its weight distribution can all be quantified and tabulated.

• Car Seat IDs Driver’s Rear End: Mr. Koshimizu, a mechanical engineering associate professor at the Advanced Institute of Industrial Technology in Tokyo, has developed an ultra-sensitive sheet that sometime down the line could make the contours of a driver’s rear end an integral part of a car’s security system.

FAST (Future Attribute Screening Technology)

Prediction Technology: Risk Prediction

• http://www.ubicna.com/en/technology/

• Although all frauds and misconduct cases differ, there is similarity in their progression from development to emergence.

• Risk prediction can be applied to all kinds of misconduct cases, e.g. cartels, FCPA violations (bribery), information leakage, research misconduct, etc.

Application of Prediction Technology:

Browser Fingerprinting

In the past, clearing cookies after each session or selecting your browser’s “Do Not Track” setting could prevent third-party tracking. But the advent of browser fingerprinting makes it very difficult to prevent others from monitoring your online activities. The diagram outlines how an online advertising network can track the sites you visit using fingerprinting.

Browser Fingerprinting

• Collecting identifying information about unique characteristics of the individual computers people use. Under the assumption that each user operates his or her own hardware, identifying a device is tantamount to identifying the person behind it.

• Unique characteristics include the user’s screen size, time zone, browser plug-ins, and set of installed system fonts.

• Users continue to be fingerprinted even if they have checked “Do Not Track” in their browser’s preferences.

http://spectrum.ieee.org/computing/software/browser-fingerprinting-and-the-onlinetracking-arms-race
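As a sketch of the idea (not any ad network's actual algorithm), a fingerprint can be derived by canonicalizing the collected attributes and hashing them; the attribute values below are invented:

```python
import hashlib

def fingerprint(attrs):
    # Canonicalize: fixed key order, then hash. Same attributes -> same ID,
    # with no cookie ever stored on the user's machine.
    canonical = "|".join(f"{key}={attrs[key]}" for key in sorted(attrs))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:16]

visitor = {
    "screen": "1920x1080",
    "timezone": "UTC+09:00",
    "plugins": "pdf-viewer,widevine",
    "fonts": "Arial,Helvetica,MS Gothic",
}
device_id = fingerprint(visitor)
```

Because the ID is recomputed from the device's own characteristics on every visit, clearing cookies or enabling "Do Not Track" does not change it.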

The future belongs to the companies and people that turn data into products/services.

The Historical S, T & A Co-evolution Process Perspective: Age of Data Science

Courtesy of Byeongwon Park 2007

NBIC: Nanotechnology, Biotechnology, Information Technology, Cognitive Science

More stories here!

Big Data

• A data set (or sets) with characteristics (e.g. volume, velocity, variety, variability, veracity, etc.) that, for a particular problem domain at a given point in time, cannot be efficiently processed using current/existing/established/traditional technologies and techniques in order to extract value.

The value drivers of big data for enterprise

Big Data Market Forecast

Roles in Big Data Ecosystem

• Data Provider: introduces new data or information feeds into the ecosystem

• Big Data Application Provider: executes a life cycle (collection, processing, dissemination) controlled by the System Orchestrator to implement specific vertical application requirements and meet security and privacy requirements

• Big Data Framework Provider: establishes a computing fabric (computation and storage resources, platforms, and processing frameworks) in which to execute certain transformation applications while protecting the privacy and integrity of data

• Data Consumer: includes end users or other systems who utilize the results of the Big Data Application Provider

• System Orchestrator: defines and integrates the required data application activities into an operational vertical system

• Security and Privacy: the role of managing and auditing access to and control of the system and the underlying data including management and tracking of data provenance

• Management: the overarching control of the execution of a system, the deployment of the system, and its operational maintenance

Data Engineering

Data Science Process

http://www.youtube.com/watch?v=xbecGJlODPg

The data scientist is involved in every part of this process

Big Data Paradigm

• Consists of the distribution of data systems across horizontally coupled, independent resources to achieve the scalability needed for the efficient processing of extensive data sets.

• With the new Big Data paradigm, analytical functions can be executed against the entire data set, or even in real time on a continuous stream of data. Analysis may even integrate multiple data sources from different organizations. For example, consider the question “What is the correlation between insect-borne diseases, temperature, precipitation, and changes in foliage?” To answer this question, an analysis would need to integrate data about the incidence and location of diseases, weather data, and aerial photography.

• The Big Data paradigm has other implications from these technical innovations. The changes are not only in the logical data storage, but in the parallel distribution of data and code in the physical file system and direct queries against this storage.

Ref. http://www.iso.org/iso/big_data_report-jtc1.pdf
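The multi-source integration in the example above amounts to joining data sets on a shared key; a toy sketch with invented region names and values:

```python
# Two hypothetical data sources keyed by region (all values invented).
disease_incidence = {"region_a": 120, "region_b": 80, "region_c": 45}
avg_precip_mm = {"region_a": 150.2, "region_b": 90.1, "region_d": 60.0}

# Inner join: keep only regions present in both sources. The merged
# records could then feed a correlation analysis.
merged = {
    region: (disease_incidence[region], avg_precip_mm[region])
    for region in disease_incidence.keys() & avg_precip_mm.keys()
}
```

At real scale the same join runs across distributed storage rather than two in-memory dictionaries, but the logical operation is identical.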

Big Data Engineering

• The storage and data manipulation technologies that leverage a collection of horizontally coupled resources to achieve nearly linear scalability in performance.

• New engineering techniques in the data layer have been driven by the growing prominence of data types that cannot be handled efficiently in a traditional relational model. The need for scalable access to structured and unstructured data has led to software built on name-value/key-value pairs and on columnar (big table), document-oriented, and graph (including triple-store) paradigms.

Data Lifecycle

• The shift in thinking changes the traditional data lifecycle. One description of the end-to-end data lifecycle categorizes the steps as collection, preparation, analysis, and action. Different big data use cases can be characterized in terms of the data set characteristics at rest or in motion, and in terms of the time window for the end-to-end lifecycle. Data set characteristics change the lifecycle processes in different ways, for example the point in the lifecycle at which the data are placed in persistent storage. In a traditional relational model, the data are stored after preparation (for example, after the extract-transform-load and cleansing processes). In a high-velocity use case, the data are prepared and analyzed for alerting, and only then are the data (or aggregates of the data) given persistent storage. In a volume use case, the data are often stored in the raw state in which they were produced, before the preparation processes cleanse and organize them. The consequence of persisting data in their raw state is that a schema or model is applied only when the data are retrieved, an approach known as schema-on-read.
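Schema-on-read can be sketched in a few lines: raw records are persisted exactly as produced, and a schema (field selection and type coercion) is applied only at retrieval. The records below are invented:

```python
import json

# Raw records persisted exactly as produced -- no schema enforced at write time.
RAW_STORE = [
    '{"id": "1", "temp": "21.5"}',
    '{"id": "2", "temp": "19.0", "note": "sensor recalibrated"}',
]

def read_with_schema(lines):
    # Schema applied only on read: pick fields and coerce types;
    # unexpected fields such as "note" are simply ignored.
    for line in lines:
        record = json.loads(line)
        yield {"id": int(record["id"]), "temp": float(record["temp"])}

rows = list(read_with_schema(RAW_STORE))
```

Note how the second record's extra "note" field costs nothing at write time and is dropped by the read-time schema.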

Data Engineering …more

• A third consequence of big data engineering is often referred to as “moving the processing to the data, not the data to the processing.” The implication is that the data are too extensive to be queried and transmitted to another resource for analysis, so the analysis program is instead distributed to the data-holding resources, with only the results being aggregated on a different resource. Since I/O bandwidth is frequently the limiting resource when moving data, another approach is to embed query/filter programs within the physical storage medium.
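"Moving the processing to the data" can be simulated with in-memory partitions standing in for data-holding nodes; only the small per-node results are aggregated centrally:

```python
# Two hypothetical data-holding nodes, simulated as in-memory partitions.
partitions = [list(range(0, 1000)), list(range(1000, 2000))]

def local_count(partition, predicate):
    # In a real cluster this function would execute on the node holding
    # the partition; only the small integer result crosses the network.
    return sum(1 for x in partition if predicate(x))

counts = [local_count(p, lambda x: x % 7 == 0) for p in partitions]
total = sum(counts)  # aggregate the per-node results, not the raw data
```

Shipping two integers instead of two thousand records is the whole point: the aggregation cost is independent of the data volume held on each node.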

Machine Learning and Feature Engineering

• http://www.slideshare.net/dato-inc/overview-of-machine-learning-and-feature-engineering
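As a minimal illustration of feature engineering (a generic sketch, not the linked deck's specific examples), raw timestamps can be turned into model-ready features:

```python
from datetime import datetime

def engineer_features(ts_iso):
    # Derive model-ready features from a raw ISO-8601 timestamp string.
    ts = datetime.fromisoformat(ts_iso)
    return {
        "hour": ts.hour,
        "weekday": ts.weekday(),      # Monday == 0
        "is_weekend": ts.weekday() >= 5,
    }

features = engineer_features("2016-01-19T14:30:00")
```

The raw string carries no signal a linear model can use directly; the derived hour/weekday/weekend fields do.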

Case Studies…

• TBD

Working with data at scale
Making data tell its story

The ability to take data—to be able to understand it, to process it, to extract value from it, to visualize it, to communicate it—that’s going to be……..

Assignment for next week

• Case study and discussion: “Data Science and Machine Learning.” Research the topic and prepare a 15-minute presentation with 10 minutes of Q&A.
