Fran Berman, Data and Society, CSCI 4967/6963

Data and Society Lecture 6: Big Data

3/20/15

Announcements

• Section 2 Paper / Mini-proposal due April 3 (details in syllabus on website)

– If you haven’t started yet, start now …

– Please send Fran your .pdf by the beginning of class on 4/3

• If you’re interested in your grade so far, come talk to Fran. (Office hours: 1-2 or by appt.)

• Ways to improve your grade:

– Do an extra credit op-ed (5 points, one per customer)

– Do a “do-over” Data Roundtable (time permitting, best 3 out of 4)

Today (3/20/15)

• Lecture 6: Big Data

• L5 Data Roundtable (Karl, Sumit, Yusri, Miguel, Oskari)

You are here

Date | First “half” | Second “half”

Section 1: The Data Ecosystem -- Fundamentals

January 30 | Class introduction; Digital data in the 21st Century (L1) | Data Roundtable / Fran

February 6 | Data Stewardship and Preservation (L2) | L1 Data Roundtable / 5 students

February 13 | Data and Computing (L3) | L2 Data Roundtable / 6 students

February 20 | Colin Bodel, Time Inc. CTO Guest Lecture and Q&A | L3 Data Roundtable / 5 students

Section 2: Data and Innovation – How has data transformed science and society?

February 27 | Section 1 Exam | Data and the Health Sciences (L4)

March 6 | Paper preparation / no class

March 13 | Data and Entertainment (L5) | L4 Data Roundtable / 6 students

March 20 | Big Data Applications (L6) | L5 Data Roundtable / 5 students (you are here)

Section 3: Data and Community – Social infrastructure for a data-driven world

April 3 | Data in the Global Landscape (L7); Section 2 paper due | L6 Data Roundtable / 6 students

April 10 | Bulent Yener Guest Lecture, Data Privacy / Bad guys on the Internet (L8) | L7 Data Roundtable / 5 students

April 17 | Data and the Workforce (L9) | L8 Data Roundtable / 6 students

April 24 | Mike Schroepfer, Facebook CTO Guest Lecture and Q&A

May 1 | Data Futures (L10) | L9 Data Roundtable / 5 students

May 8 | Section 3 Exam | L10 Data Roundtable / 5 students

Lecture 6: Big Data

Outline

• Big data defined

• Big data from the industry perspective

• Big data from the government perspective

• Big data examples:

– Predictive Analytics in Retail

– Public Health

• Big data challenges from the academic perspective

• Data Roundtable

What is big data?

• Wikipedia: “Broad term for data sets so large or complex that traditional data processing applications are inadequate.”

• McKinsey: “Datasets whose size is beyond the ability of typical database software tools to capture, store, manage and analyze”

• O’Reilly Radar: “Data that exceeds the processing capacity of conventional database systems. The data is too big, moves too fast, or doesn’t fit the strictures of your database architectures. To gain value from this data, you must choose an alternative way to process it.”

What does big data tell us?

• Big data is often noisy, dynamic, heterogeneous, inter-related and untrustworthy. Why do we find it useful?

– General statistics obtained from frequent patterns and correlation analysis can disclose more reliable hidden patterns and knowledge

– Interconnected big data forms large heterogeneous information networks, with which information redundancy can be explored to compensate for missing data, cross-check conflicting cases, validate trustworthy relationships, disclose inherent clusters, and uncover hidden relationships and models.
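
A minimal sketch of the redundancy idea above, assuming three hypothetical sources that report the same attribute for the same people. The source names, the record layout and the simple majority-vote rule are illustrative assumptions, not something taken from the lecture.

```python
from collections import Counter

# Hypothetical example data: three overlapping sources reporting a city for
# each person ID. None marks a missing value.
sources = {
    "source_a": {"p1": "Troy", "p2": None, "p3": "Albany"},
    "source_b": {"p1": "Troy", "p2": "Boston", "p3": "Albany"},
    "source_c": {"p1": "Tory", "p2": "Boston", "p3": None},
}


def reconcile(sources):
    """Fill gaps and flag conflicts by majority vote across redundant sources."""
    merged, conflicts = {}, {}
    keys = {key for records in sources.values() for key in records}
    for key in sorted(keys):
        # Collect every non-missing value reported for this key.
        votes = Counter(
            records[key]
            for records in sources.values()
            if records.get(key) is not None
        )
        if not votes:
            merged[key] = None  # no source has a value
            continue
        merged[key] = votes.most_common(1)[0][0]
        if len(votes) > 1:
            conflicts[key] = dict(votes)  # sources disagree; keep the evidence
    return merged, conflicts


merged, conflicts = reconcile(sources)
print(merged)
print(conflicts)
```

Here the missing values for p2 and p3 are filled from the other sources, and the likely typo "Tory" is outvoted but still recorded as a conflict for a human to review.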

About Big Data [Strata]

• Value of big data: analytical use, enabling new products

• Ways that big data impacts infrastructure

– Volume: big data calls for scalable storage and a distributed approach to querying

– Velocity: big data infrastructure must adapt to the speed of the input and the need for quick analysis and turnaround. Need for stream processing technologies

– Variety: Source data often “messy”, non-homogeneous, unstructured. Infrastructure must organize and find meaning from it.
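
The "distributed approach to querying" under Volume can be sketched as a partition-then-merge pattern. This is only an in-process illustration: the lists below stand in for data stored on separate machines, and the purchase records are made up.

```python
from collections import Counter

# Hypothetical purchase log, split into shards as if stored on different machines.
shards = [
    ["diapers", "lotion", "diapers"],
    ["wine", "diapers"],
    ["lotion", "lotion", "wine"],
]


def count_shard(shard):
    """'Map' step: each shard counts its own records independently."""
    return Counter(shard)


def merge_counts(partials):
    """'Reduce' step: combine the small per-shard summaries into one answer."""
    total = Counter()
    for partial in partials:
        total.update(partial)
    return total


totals = merge_counts(count_shard(shard) for shard in shards)
print(totals.most_common())
```

Only the small per-shard summaries travel to the merge step, which is what lets this style of query scale with the number of shards.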

McKinsey’s take on Big Data

How is big data useful?

• (Big) data is used by virtually every industry to boost and improve production

• Big data contributing to new ways of creating value:

– Creating transparency

– Enabling experimentation to discover needs, expose variability and improve performance

– Segmenting populations to customize actions

– Replacing / supporting human decision making with automated algorithms

– Supporting new business models, products, services

• Big data becoming a competitive advantage and means of industry growth

• Big data enabling substantial growth in productivity and customer satisfaction.

• Big data enabling new insights and discoveries

Capitalizing on big data – easy or hard?

Road blocks to capturing the value of big data

• Need for data policy – privacy, security, intellectual property and liability (ownership, rights, fair use, etc.)

• Need for new and evolving systems, technologies and techniques for managing and leveraging big data

• Need for new practice, policy and infrastructure to gain/provide access to data

• Need to evolve / change industry structure and culture

Government’s take on Big Data

• White House review on Big Data focused on broader implications of big data technologies – great opportunities and potential harms

• Key areas to monitor for opportunities and challenges:

– Preserving privacy values – protection of personal information in the marketplace

– Educating robustly and responsibly – opportunity to enhance learning opportunities, digital literacy and skills while protecting personal data usage

– Big data and discrimination – preventing new modes of discrimination that some uses of big data may enable

– Law enforcement and security – ensuring responsible use in law enforcement, public safety and national security

– Data as a public resource – using data to improve the delivery of public services and investing in research and technology that will further the power of big data

White House Big Data report policy recommendations

Not a magic bullet: big data has limits

• (From “Eight (No, Nine!) Problems with Big Data”, NY Times). Limitations of big data:

– “… although big data is very good at detecting correlations, …, it never tells us which correlations are meaningful”

– “… big data can work well as an adjunct to scientific inquiry but rarely succeeds as a wholesale replacement.”

– “… many tools that are based on big data can be easily gamed.”

– “… even when the results of a big data analysis aren’t intentionally gamed, they often turn out to be less robust than they initially seem.”

– “… whenever the source of information for a big data analysis is itself a product of big data, opportunities for vicious cycles abound [echo chamber effect].”

– “… risk of too many correlations.”

– “… big data is prone to giving scientific-sounding solutions to hopelessly imprecise questions.”

– “… big data is at its best when analyzing things that are extremely common, but often falls short when analyzing things that are less common.”

– “… the hype.”
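
The "too many correlations" risk in the list above can be demonstrated with a small simulation. This is a sketch under assumed parameters (40 variables, 20 samples each, a cutoff of |r| > 0.5); every variable is pure random noise, so any strong correlation it finds is spurious by construction.

```python
import math
import random
from itertools import combinations


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


random.seed(0)  # fixed seed so the run is repeatable
N_VARIABLES, N_SAMPLES, THRESHOLD = 40, 20, 0.5

# Every variable is independent random noise: no real relationship exists.
data = [[random.gauss(0, 1) for _ in range(N_SAMPLES)] for _ in range(N_VARIABLES)]

pairs = list(combinations(range(N_VARIABLES), 2))
strong = [(i, j) for i, j in pairs if abs(pearson(data[i], data[j])) > THRESHOLD]

print(f"{len(strong)} of {len(pairs)} pairs exceed |r| > {THRESHOLD}")
```

With 780 pairs to compare, some will clear the cutoff by chance alone, which is why a correlation found by searching many variables needs independent confirmation.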

Big Data in Action

Big data in Retail: Predictive Analytics

• Retailers highly interested in the buying habits of their customers: what you like, what you need, which coupons will help draw you to their store, etc.

• Retailers also use highly sophisticated models of human behavior: buying behavior, formation of habits, etc. to help determine how to best draw customers

• Many retailers, including Target, are hiring statisticians, mathematicians and data scientists to improve the bottom line through strategic marketing

Predictive Analytics at Target

• Target develops a profile of information for each customer

– Information indexed by a unique guest ID number: credit card information, name, email address, purchases, demographic information as available, etc.

– Information is collected by Target or bought from other sources (information available includes ethnicity, job history, magazines you read, whether you’ve declared bankruptcy or gotten divorced, what kinds of topics you talk about online, etc.)

• Retailers know that at major life events, old routines fall apart and usual brand loyalties and buying habits are in flux: graduating from college, birth of a child, moving to a new town, etc.

• Target wanted to focus on life event of having a child

– New parents will develop new buying routines for diapers, toys, lotion, baby food, clothes, etc.

– If Target can change the buying habits of new parents before the birth of the baby, they are pre-competitive and can win big

Marketing to Pregnant Women

• Target statistician Andrew Pole analyzed data from customers who had signed up in Target’s baby registry

• Analyses identified ~25 products that, when analyzed together, contributed to a “pregnancy prediction” score. Score also estimated due date.

• Target used the pregnancy prediction score and estimated due date to identify which Target customers to send baby product coupons to, and when

• Anecdote:
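
Returning to the prediction score described in the bullets above: the lecture does not say how the ~25 products were combined, so the following is only a sketch of one common way to turn purchase signals into a single score. The product names, weights, bias and the choice of a logistic function are all invented for illustration.

```python
import math

# Invented weights for illustration only; the lecture gives neither the
# ~25 products nor how they were weighted.
HYPOTHETICAL_WEIGHTS = {
    "unscented lotion": 1.2,
    "vitamin supplements": 0.9,
    "cotton balls": 0.6,
    "lawn mower": -0.4,
}
BIAS = -2.0  # assumed baseline: most baskets should score low


def prediction_score(basket):
    """Combine purchase signals into a 0-1 score with a logistic function."""
    total = BIAS + sum(HYPOTHETICAL_WEIGHTS.get(item, 0.0) for item in set(basket))
    return 1.0 / (1.0 + math.exp(-total))


low = prediction_score(["lawn mower", "wine glasses"])
high = prediction_score(["unscented lotion", "vitamin supplements", "cotton balls"])
print(round(low, 3), round(high, 3))
```

No single product decides the outcome; the score moves only when several signals appear together, which matches the "when analyzed together" point above.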

Minimizing the “creepiness factor”

• Target realized that focused coupons were less creepy when mixed with others.

– Also potentially in gray area with respect to health information and privacy

• Company began to mix baby products with other things (e.g. lawn mowers, wineglasses, etc.)

– Customers found this less creepy and used the baby coupons

Personalized marketing

• Soon after the new ad campaign, Target’s “Mom and Baby” sales greatly increased; Target’s overall revenue grew from $44B in 2002 to $67B in 2010

• Similar data mining approach being used in many, many stores and businesses: department stores, Facebook, Google, etc.

• Key issues about privacy remain and your rights within the burgeoning market for data about you are yet to be sorted out.

Big Data in Health: Should you vaccinate?

Project Tycho 1

• Project goal is to advance the use of public health data for the improvement of public health

• Focus areas:

– Acquisition of new data

• Developing partnerships with scientists, funding and public health agencies to add or connect new historical and current datasets to the system.

• Developing partnership agreements, data use agreements and engaging with contributors to minimize privacy and ownership concerns.

– Development of data infrastructure

• Developing data processing and warehouse infrastructure as well as new algorithms to digitize, standardize, integrate, and store public health data.

• Combining automated processes and manual verification.

Project Tycho 2

• Project goal is to advance the use of public health data for the improvement of public health

• Focus areas:

– Data analytics

• Focusing on spatial and temporal statistics and data mining methods for hypothesis-generating research, and on classical epidemiological and statistical methods.

• Creating data visualizations to reveal population level patterns of disease spread that help demonstrate disease causality.

– Advocacy

• Engaging in advocacy for better data availability and better tools for data use in public health training and education

• Target audiences: education, policy makers, general public

Big data, vaccinations, and disease: Project Tycho

https://www.youtube.com/watch?v=Kn9OJy1BPDo

How do they get results?

• NEJM: Vaccination programs have prevented more than 100 million cases of serious contagious disease in the U.S. since 1924*

– Focus of paper: polio, measles, rubella, mumps, hepatitis A, diphtheria, pertussis (whooping cough)

• Methodology:

– Ingest and data preparation: 88M reports of individual cases of disease from different sources.

• Most of the data entry was done by Digital Divide Data, a social enterprise that provides jobs and technology training to young people in Cambodia, Laos, and Kenya.

• Information put into spreadsheets for making tables, then sorted and standardized.

– Open Access: Data for 56 diseases available on the Project Tycho website. Searchable by disease, year, location.

* “Contagious Diseases in the United States from 1888 to the present”, New England Journal of Medicine, http://www.febrilnotropeni.net/newsfiles/3777NEJMms1215400.pdf

Project Tycho Data – counts and standardization

• Database represents 126 years of disease reporting.

• Data organized as counts (number of cases or deaths due to a disease in a specific location and time period).

• Available data categorized at 3 levels based on the type of counts

– Level 1 is recent data from NEJM

– Level 2 only includes counts reported in a standard format

– Level 3 includes all different types of counts

• Level 3 contains the largest number of counts but requires the most standardization and judgment to be useful.
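
A minimal sketch of what "data organized as counts" might look like in code, and of a search by disease, location and year. The field names and the records are assumptions made up for this example; they are not the actual Project Tycho schema or data.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Count:
    """One standardized count; field names are assumed for this sketch."""
    disease: str
    location: str
    year: int
    kind: str  # "cases" or "deaths"
    number: int


# Made-up records, not real Project Tycho data.
records = [
    Count("MEASLES", "NY", 1930, "cases", 120),
    Count("MEASLES", "NY", 1930, "cases", 80),
    Count("MEASLES", "NY", 1931, "deaths", 3),
    Count("PERTUSSIS", "CA", 1930, "cases", 45),
]


def total_by_year(records, disease, location, kind="cases"):
    """Sum counts per year for one disease, location and count type."""
    totals = defaultdict(int)
    for r in records:
        if (r.disease, r.location, r.kind) == (disease, location, kind):
            totals[r.year] += r.number
    return dict(totals)


print(total_by_year(records, "MEASLES", "NY"))
```

Keeping cases and deaths as an explicit field is one way to avoid adding together counts of different types, which is the kind of judgment the less standardized levels require.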

More Project Tycho Particulars

• Sources of current data

– Public Health Reports from PubMed Central

– Morbidity and Mortality Weekly Report from the Hathi Trust Digital Library

– U.S. Centers for Disease Control MMWR Past Volumes

– U.S. Centers for Disease Control Stacks

• Project funding: NIH, Bill and Melinda Gates Foundation

• Why “Project Tycho”? Tycho Brahe was a Danish nobleman whose careful, detailed astronomical observations were the basis for Johannes Kepler’s work on the laws of planetary motion

Big Data from an Academic perspective: Research challenges in optimizing the potential of the data analysis pipeline [big data whitepaper]

Challenges in the pipeline -- 1

• Data acquisition and recording challenges:

– Need to define data filters so that useful data is not discarded

– Need on-line analysis techniques that can process streaming data on the fly

– Need to automatically generate the right metadata to describe what data is recorded and how it is being measured

– Need to incorporate and update data provenance throughout the data analysis pipeline

• Information extraction and cleaning challenges:

– Processes and technologies must be developed to extract and appropriately structure data (e.g. health data from transcribed notes, images, sensors, etc.)

– Data must be cleaned to promote its validity. Need well-understood error models to account for presence of inaccurate data.
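
As one concrete instance of the "on-line analysis techniques" mentioned above, here is a sketch of Welford's algorithm, which keeps a running mean and variance while seeing each value only once. The lecture does not name a specific technique; this one is chosen for illustration, and the sensor readings are made up.

```python
class RunningStats:
    """Welford's online algorithm: mean and variance in one pass, O(1) memory."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self._m2 = 0.0  # running sum of squared deviations from the mean

    def update(self, x):
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self._m2 += delta * (x - self.mean)

    @property
    def variance(self):
        """Sample variance; returns 0.0 when fewer than 2 values were seen."""
        return self._m2 / (self.n - 1) if self.n > 1 else 0.0


def sensor_stream():
    """Stand-in for a live feed; values are made up."""
    yield from [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]


stats = RunningStats()
for reading in sensor_stream():
    stats.update(reading)  # each value is seen once and then discarded

print(stats.n, stats.mean, stats.variance)
```

Because nothing but three numbers is retained, the same code works whether the stream has eight readings or eight billion.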

Challenges in the pipeline -- 2

• Data integration, aggregation and representation challenges:

– Data integration needed to make sure that data structure and semantics can be expressed in forms that are computer understandable, and then robotically resolvable.

– Need to develop technologies that enable databases of different designs to interoperate, so that information can be used in concert

• Query processing, data modeling and analysis challenges:

– Data mining needs integrated, cleaned, trustworthy and efficiently accessible data, declarative query and mining interfaces, scalable mining algorithms, adequate computing environments.

– Need to scale complex query processing techniques to TBs while enabling interactive response times to support interactive analysis

– Need to coordinate between database systems (which host the data and provide SQL querying) and analytics packages (that perform various forms of non-SQL processing such as data mining and statistical analyses)
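
The coordination between a database system and an analytics package can be sketched with Python's standard library: `sqlite3` plays the database that hosts the data and answers SQL, and `statistics` plays the non-SQL analysis step. The table name, columns and rows are invented for this example.

```python
import sqlite3
import statistics

# In-memory database standing in for the system that hosts the data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE visits (patient TEXT, days INTEGER)")
conn.executemany(
    "INSERT INTO visits VALUES (?, ?)",
    [("a", 2), ("a", 5), ("b", 1), ("b", 3), ("c", 10)],  # made-up rows
)

# Step 1 (SQL): let the database filter and aggregate close to the data.
rows = conn.execute(
    "SELECT patient, SUM(days) FROM visits GROUP BY patient ORDER BY patient"
).fetchall()

# Step 2 (non-SQL): hand the much smaller result to an analytics routine.
totals = [total for _, total in rows]
median_days = statistics.median(totals)
conn.close()

print(rows, median_days)
```

The design question the slide raises is where to draw the line between the two steps: pushing more work into SQL moves less data, while pulling more into the analytics side allows methods SQL cannot express.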

Challenges in the pipeline -- 3

• Interpretation challenges:

– All interpretations involve underlying assumptions, and the technologies and analysis processes behind them are prone to error. Interpretation of the meaning of the results must be based on adequate supplementary information that describes assumptions, models, error handling, etc.

– Need data visualizations that convey results in a useful way, especially ones that let users drill down to understand areas or data points of interest.

• Heterogeneity challenges:

– Efficient representation, access and analysis of semi-structured data. Design must provide flexibility (e.g. medical records for each hospital stay, test and patient require different system approaches)

– Incompleteness and errors in data – what are the right probabilistic approaches to minimize their impact?

Challenges in the pipeline -- 4

• Scale challenges:

– Cloud computing aggregates multiple disparate workloads with varying performance goals into very large clusters. Scheduling algorithms need to adapt to optimization per program and per platform.

– Newer storage technologies do not have as large a gap between sequential and random I/O performance. This provides an opportunity to rethink how storage subsystems for data processing systems are designed and queried.

• Timeliness challenges:

– Need for timeliness of analysis. Some applications (e.g. detection of credit card fraud) ideally should be done in real time but can rely on staged partial results

– Flexible index structures needed to permit finding qualifying elements for common searches quickly with growing data volumes and tight time limits
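
A small sketch of why an index helps with the timeliness point above: keeping keys sorted lets a range search use binary search instead of scanning every record. The class name and the transaction data are hypothetical, and a production system would more likely use a B-tree or similar structure, since inserting into a Python list is itself linear in the list size.

```python
import bisect


class SortedIndex:
    """Keys kept in sorted order, with record IDs in a parallel list."""

    def __init__(self):
        self._keys = []
        self._ids = []

    def add(self, key, record_id):
        # Binary search finds the slot; the list insert keeps both lists aligned.
        pos = bisect.bisect_right(self._keys, key)
        self._keys.insert(pos, key)
        self._ids.insert(pos, record_id)

    def range(self, low, high):
        """Return record IDs whose key lies in [low, high], in key order."""
        start = bisect.bisect_left(self._keys, low)
        end = bisect.bisect_right(self._keys, high)
        return self._ids[start:end]


# Made-up transactions indexed by amount, e.g. to pull large charges for review.
index = SortedIndex()
for amount, txn in [(12.5, "t1"), (980.0, "t2"), (45.0, "t3"), (1500.0, "t4")]:
    index.add(amount, txn)

print(index.range(900, 2000))
```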

Challenges in the pipeline -- 5

• Privacy challenges:

– Technical and social infrastructure to protect privacy

– Approaches to allow sharing of private data while limiting disclosure and ensuring sufficient utility of the shared data

– Differential privacy

– Lack of clarity in rights, ownership, permission to disseminate or ability to share private data

• Human collaboration challenges:

– Many patterns easily detectable by humans but hard for computers. Many big data analyses will need a human in the loop.

– Big data analysis system must support input from multiple human experts and shared exploration of results. System has to integrate and coordinate multiple, diverse inputs.

– Crowd-sourcing approaches that facilitate correction of errors. System must also be able to derive summary meaning from disparate analyses (e.g. given restaurant reviews, do you go or not?) Issues of uncertainty and error become more pronounced in crowd-sourced environment.
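
The "Differential privacy" bullet under privacy challenges above can be illustrated with the Laplace mechanism applied to a counting query. This is a sketch only: the ages are made up, epsilon = 0.5 is an arbitrary choice, and the fixed random seed is there for repeatability, which a real deployment would not want.

```python
import random


def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) as the difference of two exponential draws."""
    return rng.expovariate(1 / scale) - rng.expovariate(1 / scale)


def private_count(values, predicate, epsilon, rng):
    """Noisy count of matching records.

    A counting query changes by at most 1 when one record is added or
    removed (sensitivity 1), so the noise scale is 1 / epsilon.
    """
    true_count = sum(1 for v in values if predicate(v))
    return true_count + laplace_noise(1 / epsilon, rng)


rng = random.Random(42)  # fixed seed only so the sketch is repeatable
ages = [23, 35, 41, 29, 62, 57, 33, 48]  # made-up records
noisy = private_count(ages, lambda age: age >= 40, epsilon=0.5, rng=rng)
print(noisy)
```

A smaller epsilon adds more noise and gives stronger privacy at the cost of accuracy, which is the "limiting disclosure while ensuring sufficient utility" trade-off named in the same list.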

Lecture 6 Sources

• “Challenges and Opportunities with Big Data” a community white paper developed by leading researchers across the U.S., http://www.cra.org/ccc/files/docs/init/bigdatawhitepaper.pdf

• “Jenny McCarthy’s Dangerous Views”, the New Yorker, http://www.newyorker.com/tech/elements/jenny-mccarthys-dangerous-views

• “Big data: The next frontier for innovation, competition and productivity”, Report from the McKinsey Global Institute, http://www.mckinsey.com/insights/business_technology/big_data_the_next_frontier_for_innovation

• “What is big data?” O’Reilly Radar, http://radar.oreilly.com/2012/01/what-is-big-data.html

• “Contagious Diseases in the United States from 1888 to the present”, New England Journal of Medicine, http://www.febrilnotropeni.net/newsfiles/3777NEJMms1215400.pdf

• Project Tycho website, http://www.tycho.pitt.edu/

• “The Vaccination Effect: 100 Million Cases of Contagious Disease Prevented”, The New York Times, http://bits.blogs.nytimes.com/2013/11/27/the-vaccination-effect-100-million-cases-of-contagious-disease-prevented/

• “Eight (No, Nine!) Problems with Big Data”, NY Times, http://www.nytimes.com/2014/04/07/opinion/eight-no-nine-problems-with-big-data.html

• “Big Data: Seizing Opportunities, Preserving Values”, White House Report, http://www.whitehouse.gov/sites/default/files/docs/big_data_privacy_report_may_1_2014.pdf

• “How Target figured out a teen girl was pregnant before her father did,” Forbes, http://www.paulding.k12.ga.us/userfiles/1795/Classes/17159/How%20Target%20Figured%20Out%20A%20Teen%20Girl%20Was%20Pregnant%20Before%20Her%20Father%20Did%20-%20Forb.pdf

• “How Companies Learn your secrets”, The New York Times, http://www.nytimes.com/2012/02/19/magazine/shopping-habits.html?pagewanted=all&_r=0

Data Roundtable

April 10: L7 Data Roundtable

• “Facebook’s privacy policy breaches European law, report finds”, The Guardian, http://www.theguardian.com/technology/2015/feb/23/facebooks-privacy-policy-breaches-european-law-report-finds (Juan Poma)

• “Australia Tops OECD’s Better Life Index”, Wall Street Journal, http://www.wsj.com/articles/SB10001424052702303610504577419320948930402 (Miguel Inoa-Lantigua)

• “Africa: Data Gaps Make Malnutrition Too Easy to Ignore”, SciDev.net, http://allafrica.com/stories/201503171418.html (Alex Karcher)

• “In China, an Open Data Movement is Starting to Take Off,” TechPresident, http://techpresident.com/news/wegov/24940/China-Open-Data-Movement-Starting-Take-Off (Kate McGuire)

April 3: Big Data (L6) Data Roundtable

• “Big data: Welcome to the petacentre”, Nature, http://www.nature.com/news/2008/080903/full/455016a.html (Lars Olsson)

• “Police Push the Limits of Big Data Technology”, Datanami, http://www.datanami.com/2014/07/31/police-push-limits-big-data-technology/ (Sumit Munshi)

• “Big Data: Big Obstacles”, Chronicle of Higher Education, http://chronicle.com/article/Big-Data-Big-Obstacles/151421/ (Karl Appel)

• “How pro teams are using data analytics to draft better players,” Financial Post, http://business.financialpost.com/2013/09/03/pro-sports-teams-turning-to-data-anlaytics-to-fill-seats/?__lsa=88c9-3dab (Dennis Fogerty)

• “The big deal about “big data” – your guide to what the heck it actually means”, Ars Technica, http://arstechnica.com/information-technology/2015/02/the-big-deal-about-big-data-your-guide-to-what-the-heck-it-actually-means/ (READ THIS)

Today: L5 (Data and Entertainment) Data Roundtable

• “Management Secrets of the Grateful Dead”, The Atlantic http://www.theatlantic.com/magazine/archive/2010/03/management-secrets-of-the-grateful-dead/307918/ (Karl Appel)

• “The Shazam Effect”, The Atlantic, http://www.theatlantic.com/magazine/archive/2014/12/the-shazam-effect/382237/?single_page=true (Sumit Munshi)

• “At Disney Parks, a Bracelet Meant to Build Loyalty (and Sales)”, The New York Times, http://www.nytimes.com/2013/01/07/business/media/at-disney-parks-a-bracelet-meant-to-build-loyalty-and-sales.html?pagewanted=all (Yusri Jamaluddin)

• “Dancing Data”, re/code, http://recode.net/2014/01/28/dancing-data/ (Miguel Inoa-Lantigua)

• “Here’s How Piracy Hurts Indie Film”, Indiewire, http://www.indiewire.com/article/guest-post-heres-how-piracy-hurts-indie-film-20140711 (Oskari Rautiainen)