
Page 1: Data and Society Lecture 9: Data and Ethics

Fran Berman, Data and Society, CSCI 4370/6370

Data and Society
Lecture 9: Data and Ethics

4/13/18

Page 2: Data and Society Lecture 9: Data and Ethics


Announcements 4/13

• Office hours today 1:30-2:00

• Wednesday class on 4/18 starts at 8:30

• Make sure you sign up and do 2 presentations by the end of the semester.

• Please make sure you attend final classes as needed for your full participation grade.

• Check what you think your grades are (attendance, op-ed, and presentation scores) with Fran during office hours. You are responsible for being sure that these are accurate.

Page 3: Data and Society Lecture 9: Data and Ethics


Discussion article for April 20

• “Senators propose legislation to protect the privacy of users’ online data after Facebook hearing” The Verge, https://www.theverge.com/2018/4/12/17231718/facebook-data-privacy-law-klobuchar-kennedy-mark-zuckerberg

Page 4: Data and Society Lecture 9: Data and Ethics


Wednesday section (first half of class) | Friday lecture (second half of class) | Assignments
January 17: NO class | January 19, L1: CLASS INTRO AND LOGISTICS | Presentation model / Op-Ed instructions
January 24: NO class | January 26, L2: BIG DATA 1 (4 presentations) |
January 31: NO class | February 2, L3: BIG DATA 2 -- IoT (4 presentations) |
February 7: NO class | February 9, L4: DATA AND SCIENCE (4 presentations) | Op-Ed due Feb. 9
February 14: 5 presentations | February 16, L5: DATA AND HEALTH / LESLIE McINTOSH GUEST SPEAKER (4 presentations) | Op-Ed drafts returned Feb. 21
February 21: 5 presentations | February 23, L6: DATA STEWARDSHIP AND PRESERVATION (4 presentations) | Research Paper instructions
February 28: 5 presentations | March 2: CLASS CANCELED DUE TO SNOW |
March 7: 5 presentations | March 9: NO CLASS / paper preparation | Op-Ed final due March 7
March 14: SPRING BREAK | March 16: SPRING BREAK |
March 21: NO class | March 23: NO CLASS / paper preparation |
March 28: 4 presentations | March 30, L7: INFRASTRUCTURE (4 presentations) | Research Paper due March 28
April 4: NO class | April 6, L8: DATA RIGHTS, POLICY, REGULATION (4 presentations) |
April 11: 4 presentations | April 13, L9: DATA AND ETHICS (4 presentations) |
April 18: 4 presentations | April 20, L10: DATA AND COMMUNICATION (4 presentations) |
April 25: NO class | April 27, L11: DATA FUTURES (4 presentations) |

Page 5: Data and Society Lecture 9: Data and Ethics


Lecture 9: Data and Ethics

Page 6: Data and Society Lecture 9: Data and Ethics


Data and Ethics

• How does ethics apply to data?

• Ethical dilemmas in today’s world

– Ethical use of data by people

• Icelandic Health Sector Database

• Havasupai Indians and informed consent

– Ethical use of data by machines

• Tay chatbot

• TED Talk

Page 7: Data and Society Lecture 9: Data and Ethics


Data Ethics

• Multiple areas for development of “data ethics”

– Data collection and handling (e.g. generation, recording, curation, processing, dissemination, sharing)

– Data algorithms (e.g. AI, artificial agents, machine learning, robots)

– Data practice (e.g. responsible innovation, programming, hacking, professional codes)

• (Note that these boundaries are somewhat artificial; most issues have aspects of all three …)


Page 8: Data and Society Lecture 9: Data and Ethics


Key issues: Collection and handling

• Protection / use of big data in biomedical research and social sciences

– What should be open and what should be private? Under what circumstances?

– When / how to de-identify; how to ensure that identification of groups doesn’t lead to identification of individuals (with respect to ageism, ethnic bias, sexism). One formal lens is k-anonymity; see the sketch after this list.

• Trust – benefits, opportunities, risks and challenges associated with data

– Role of transparency in fostering trust – when / what / who / how
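k-Anonymity is not named on the slide, but it is one standard formal lens for the group-vs-individual identification question above. The sketch below (my illustration; all record fields and values are invented) checks whether every combination of quasi-identifiers in a table is shared by at least k records, so that no individual stands out inside a published group.

```python
from collections import Counter

def is_k_anonymous(records, quasi_identifiers, k):
    """True iff every quasi-identifier combination occurs at least k times."""
    combos = Counter(tuple(rec[q] for q in quasi_identifiers) for rec in records)
    return all(count >= k for count in combos.values())

# Invented example: the third patient's (age band, ZIP prefix) combination is
# unique, so releasing these columns would single that person out.
patients = [
    {"age_band": "60-69", "zip3": "021", "diagnosis": "diabetes"},
    {"age_band": "60-69", "zip3": "021", "diagnosis": "asthma"},
    {"age_band": "30-39", "zip3": "021", "diagnosis": "lupus"},
]
print(is_k_anonymous(patients, ["age_band", "zip3"], k=2))  # False
```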

Page 9: Data and Society Lecture 9: Data and Ethics


Key issues: Data algorithms

• How to develop “smart applications” that are unbiased?

• How to guide the behavior of learning applications and autonomous systems?

• How to minimize the risk of unanticipated negative outcomes?

• Whose ethics should applications represent?

Page 10: Data and Society Lecture 9: Data and Ethics


Key issues: Data practice

• What is the moral responsibility, accountability, and liability of designers, programmers, companies, consumers?

• What should the guidelines, checks and balances of responsible innovation be?

• What should we consider ethical development and usage?

• How do we trade off the ethics of dealing with data about individuals and the ethics of dealing with data about groups?

• Who needs to consent to the use of data, when and how?

• What does it mean for data to be private and under what circumstances?

Page 11: Data and Society Lecture 9: Data and Ethics


Asimov’s Laws of Robotics

More than 75 years ago, Isaac Asimov introduced his laws of robotics in the 1942 short story "Runaround" (included in the 1950 collection I, Robot).

0. A robot may not harm humanity, or, by inaction, allow humanity to come to harm.

1. A robot may not injure a human being or, through inaction, allow a human being to come to harm.

2. A robot must obey the orders given it by human beings except where such orders would conflict with the First Law.

3. A robot must protect its own existence as long as such protection does not conflict with the First or Second Laws.

Image: https://en.wikipedia.org/wiki/File:I_Robot_-_Runaround.jpg

Page 12: Data and Society Lecture 9: Data and Ethics


Update needed for today’s AI

Information from https://thenextweb.com/artificial-intelligence/2018/02/23/are-asimovs-laws-of-robotics-still-good-enough-in-2018/

“According to a report by Cambridge Consultants, titled “AI: Understanding And Harnessing The Potential,” there are five key areas that rules for AI should address:

1. Responsibility: There needs to be a specific person responsible for the effects of an autonomous system’s behaviour. This is not just for legal redress but also for providing feedback, monitoring outcomes and implementing changes.

2. Explainability: It needs to be possible to explain to people impacted (often laypeople) why the behaviour is what it is.

3. Accuracy: Sources of error need to be identified, monitored, evaluated and if appropriate mitigated against or removed.

4. Transparency: It needs to be possible to test, review (publicly or privately), criticise and challenge the outcomes produced by an autonomous system. The results of audits and evaluation should be available publicly and explained.

5. Fairness: The way in which data is used should be reasonable and respect privacy. This will help remove biases and prevent other problematic behaviour becoming embedded. “

Page 13: Data and Society Lecture 9: Data and Ethics


Google guidelines for machines that learn (from Google Research Blog, https://research.googleblog.com/2016/06/bringing-precision-to-ai-safety.html, 6/21/16)

“We’ve outlined five problems we think will be very important as we apply AI in more general circumstances. These are all forward thinking, long-term research questions -- minor issues today, but important to address for future systems:

• Avoiding Negative Side Effects: How can we ensure that an AI system will not disturb its environment in negative ways while pursuing its goals, e.g. a cleaning robot knocking over a vase because it can clean faster by doing so?

• Avoiding Reward Hacking: How can we avoid gaming of the reward function? For example, we don’t want this cleaning robot simply covering over messes with materials it can’t see through.

• Scalable Oversight: How can we efficiently ensure that a given AI system respects aspects of the objective that are too expensive to be frequently evaluated during training? For example, if an AI system gets human feedback as it performs a task, it needs to use that feedback efficiently because asking too often would be annoying.

• Safe Exploration: How do we ensure that an AI system doesn’t make exploratory moves with very negative repercussions? For example, maybe a cleaning robot should experiment with mopping strategies, but clearly it shouldn’t try putting a wet mop in an electrical outlet.

• Robustness to Distributional Shift: How do we ensure that an AI system recognizes, and behaves robustly, when it’s in an environment very different from its training environment? For example, heuristics learned for a factory workfloor may not be safe enough for an office.”

Page 14: Data and Society Lecture 9: Data and Ethics


Ethics in Data Collection and Handling: the Icelandic Health Sector Database

Page 15: Data and Society Lecture 9: Data and Ethics


Icelanders

• Iceland has a population of ~349,000 and is the most sparsely populated country in Europe.

• Iceland provides universal health care to its citizens and spends a fair amount on health care, ranking 11th in health care expenditures as a percentage of GDP and 14th in spending per capita.

– Iceland’s health care system is ranked 15th in performance by the World Health Organization.

• Ethnically homogeneous: most Icelanders are descendants of Germanic and Gaelic (Celtic) settlers.

– 91% Icelandic
– 4% Polish
– 5% Other

• Iceland has extensive genealogical records dating back to the late 17th century and fragmentary records extending back to the 9th century.

Source: Wikipedia articles on Iceland, Icelanders

Page 16: Data and Society Lecture 9: Data and Ethics


Whole Country Health Data

• 1996: deCODE Genetics corporation (private) founded to identify human genes associated with common diseases using population studies, and to apply the knowledge gained to guide the development of candidate drug treatments.

• Company lobbied the government for the 1998 Health Sector Database Act, with the intention of creating a national biological database (the Icelandic Health Sector Database [HSD]) to store health information that could be used for research.

• deCODE won the bidding process to build the database.

• Act allowed company to use data for profit but required protection of privacy.

Page 17: Data and Society Lecture 9: Data and Ethics


HSD Act

• Act authorized transfer of all medical record data to the licensed company (deCODE) for commercial development without the express consent of individuals, under “presumed consent” (consent is assumed unless the individual opts out).

– Information on deceased individuals would be automatically included.

– Icelanders would have 6 months from the construction of the database to opt out unconditionally.

• Act specified the encryption architecture for health information and fees for non-commercial access to the database.

• Act provided virtual control, use, and ownership of (government-supported) health data to the government.

Page 18: Data and Society Lecture 9: Data and Ethics


deCODE Genetics

• Focus of deCODE was to use the HSD to identify human genes associated with common diseases using population studies, and apply the knowledge gained to guide the development of candidate drugs.

• deCODE made strong economic, political, cultural and health arguments to make their case.

• The HSD was never built, although data was collected from roughly 90,000 volunteers.

Page 19: Data and Society Lecture 9: Data and Ethics


HSD effectively killed by Icelandic Supreme Court in 2003

• The Gudmundsdóttir vs. Iceland case concerned a woman who asked that her deceased father’s information NOT be transferred to the HSD. The request was denied based on “presumed consent” and the case went to court.

• The woman wanted to keep her father’s genetic data out of the database because it could be used to infer hereditary characteristics that applied to her as well.

– Because the Icelandic population is so homogeneous, it is possible to “impute” the DNA makeup of other citizens, including those who never participated in the studies.

• The court found that the “vague limits” set by the HSD Act inadequately protected the woman’s constitutional right to privacy, rendering the HSD effort dead.

Page 20: Data and Society Lecture 9: Data and Ethics


Many problems with the HSD

• deCODE failed to reach deals with the key partners needed to build the HSD, including the Icelandic Data Protection Commission

• Issues around ethics of individual health privacy vs. open access to scientific data

• Issues around a for-profit commercial company stewarding the data and making money off it

• Issues around insufficient specification for infrastructure to support privacy

• Issues around lack of informed consent

– By June 2001, around 20,000 (7%) Icelanders had opted out of the HSD.

• deCODE still retained volunteer data and went through a series of corporate iterations.

Page 21: Data and Society Lecture 9: Data and Ethics


Scientific progress: Results from the data

• Although the HSD was never built, deCODE pursued traditional genome-wide association studies to try to identify genetic changes contributing to common diseases

• deCODE data used for discoveries about genes that increase risk for kidney disease, cancer, lupus, vascular disease, schizophrenia, osteoporosis, etc.

– One result identified a gene that protects against Alzheimer’s

– deCODE identified mutations in BRCA2 that convey sharply increased risk of breast and ovarian cancers.

Information from http://www.els.net/WileyCDA/ElsArticle/refId-a0005180.html

Page 22: Data and Society Lecture 9: Data and Ethics


Transition

• deCODE was a commercial failure; the company went bankrupt in 2009.

– It continued as a private company (NextCODE) and was bought by Amgen in 2012. No compensation was given to Icelanders.

• Services and assets of deCODE went through many transitions:

– deCODE founded in 1996; filed for bankruptcy in 2009

– Saga Investments LLC purchased deCODE services and assets in 2010

– Amgen purchased deCODE in 2012, spun off NextCODE Health in 2013

– NextCODE acquired by WuXi PharmaTech in 2015

Page 23: Data and Society Lecture 9: Data and Ethics


Ethics of informed consent in the U.S.: Havasupai Indians and Arizona State University

• Havasupai Indians gave informed consent for researchers from Arizona State University to take DNA samples to ascertain genetic clues to the tribe’s high rate of Type 2 diabetes.

– The broad consent gave permission to “study the causes of behavioral / medical disorders”; the tribe understood it to be focused on diabetes.

• Collected blood samples were used to study other things, including mental illness and theories of the tribe’s geographical origins that contradict the tribe’s own origin stories.

• The geneticist claimed that she had broader permission, but ASU agreed to pay $700,000 to the tribe’s 41 members and return the samples.

• Scientific benefit vs. individual control: Can data / samples donated for one purpose be legitimately used for another?

Page 24: Data and Society Lecture 9: Data and Ethics


Ethics in Data Algorithms

Page 25: Data and Society Lecture 9: Data and Ethics


How do algorithms become unethical?

• By default, algorithms don’t understand the context in which they act or the ethical consequences of their decisions.

• Predictions of a machine learning algorithm come from generalizing from training sets of example data.

• Ethics can be provided through an explicit mathematical formula, pruning of outputs, human monitoring, etc. (a minimal sketch of the formula approach follows below).

Based on information from http://theconversation.com/ethics-by-numbers-how-to-build-machine-learning-that-cares-85399
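To make the “explicit mathematical formula” route concrete, here is a minimal sketch (my construction, not from the lecture or the linked article): a logistic-regression loss with an added demographic-parity penalty, so the optimizer explicitly trades accuracy against equal treatment of two groups. All names and data are invented.

```python
import numpy as np

def fair_loss(w, X, y, group, lam=1.0):
    """Logistic loss plus a demographic-parity penalty between two groups."""
    p = 1.0 / (1.0 + np.exp(-X @ w))  # predicted probabilities
    log_loss = -np.mean(y * np.log(p + 1e-9) + (1 - y) * np.log(1 - p + 1e-9))
    # Penalize any gap between the two groups' average predicted rates.
    parity_gap = abs(p[group == 0].mean() - p[group == 1].mean())
    return log_loss + lam * parity_gap  # lam sets the accuracy/fairness trade-off

# Invented data: any optimizer minimizing fair_loss now "cares" about the gap.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = rng.integers(0, 2, size=200)
group = rng.integers(0, 2, size=200)
print(fair_loss(np.zeros(3), X, y, group))  # ~0.693 (ln 2), zero parity gap
```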

Page 26: Data and Society Lecture 9: Data and Ethics


How algorithms can produce unethical decisions

• Algorithm may try to minimize mistakes averaged over all the training data. This can produce different “inaccuracies” for different people / groups, e.g. minority groups (see the numeric sketch after this list).

• Algorithm may provide the “best guess” but may have varying levels of confidence in different options.

• Algorithm may learn to predict from historical data that reflects particular biases. New behaviors may then reproduce these biases.

• Ethical considerations may conflict. For example, algorithmic approaches that increase ethical considerations for one group may decrease ethical considerations for another.

Based on information from http://theconversation.com/ethics-by-numbers-how-to-build-machine-learning-that-cares-85399
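A tiny numeric sketch of the first bullet (all numbers invented): a classifier that is 70% accurate on average can still be 0% accurate for a small minority group, because that group contributes only a few terms to the averaged error.

```python
import numpy as np

# Invented labels/predictions: 8 majority-group examples, 2 minority-group ones.
y_true = np.array([1, 0, 1, 1, 0, 1, 0, 0, 1, 0])
y_pred = np.array([1, 0, 1, 1, 0, 1, 0, 1, 0, 1])
group = np.array(["maj"] * 8 + ["min"] * 2)

print("overall accuracy:", (y_true == y_pred).mean())  # 0.7
for g in ("maj", "min"):
    mask = group == g
    # The average hides the disparity: 0.875 for "maj" vs. 0.0 for "min".
    print(g, "accuracy:", (y_true[mask] == y_pred[mask]).mean())
```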

Page 27: Data and Society Lecture 9: Data and Ethics


Microsoft’s Tay

• Tay (“thinking about you”) – artificial intelligence chatbot originally released by Microsoft via Twitter on 3/23/16.

• Tay was developed to mimic the language patterns of a 19-year-old American girl and to learn by interacting with human users over Twitter.

• Tay “learned” to post inflammatory and offensive tweets through its Twitter account.

• Tay tweeted more than 96,000 times; the service was shut down 16 hours after its launch.

Page 28: Data and Society Lecture 9: Data and Ethics


Tay chatbot

• Tay used a combination of AI and editorial content written by a team of staff, including improvisational comedians.

– Relevant, publicly available data that has been anonymised and filtered was its primary source.

• Tay in most cases was only repeating other users’ inflammatory statements, but it also learned from those interactions.

• Need to set up language filters and an environment for both what to say and what not to say (a minimal filter sketch follows below) …

• “If you’re not asking yourself ‘how could this be used to hurt someone’ in your design/engineering process, you’ve failed.” (Zoe Quinn)

Tweets from https://gizmodo.com/here-are-the-microsoft-twitter-bot-s-craziest-racist-ra-1766820160, https://www.pcworld.com/article/3048157/data-center-cloud/the-internet-turns-tay-microsofts-millennial-ai-chatbot-into-a-racist-bigot.html
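A minimal sketch of output pruning for a chatbot (my illustration; this is not how Tay actually worked, and the blocklist terms are placeholders): screen each candidate reply before posting, and fall back to a canned response when the reply trips the filter.

```python
# Hypothetical blocklist; a real system would combine curated word lists,
# trained classifiers, and human review rather than exact token matching.
BLOCKLIST = {"badword1", "badword2"}
FALLBACK = "Let's talk about something else."

def safe_reply(candidate: str) -> str:
    """Return the candidate reply only if it passes the blocklist filter."""
    tokens = set(candidate.lower().split())
    return FALLBACK if tokens & BLOCKLIST else candidate

print(safe_reply("nice weather today"))  # passes through unchanged
print(safe_reply("you are a badword1"))  # pruned -> fallback response
```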

Page 29: Data and Society Lecture 9: Data and Ethics


Ethical responsibility

• What is a company’s ethical responsibility?

– How do you design / engineer / monitor to maximize positive outcomes and minimize negative outcomes?

– Which ethics? How do you decide who it’s OK to offend?

– What is your liability in the case of harm?

– When do you pull the plug?

• What is society’s ethical responsibility?

– Legal / policy framework so that business / technology protects / promotes the public good

– Development of a governance environment to enforce laws and policy and ensure accountability

– Development of a framework for trading off the ethics of promoting the individual over promoting society

Page 30: Data and Society Lecture 9: Data and Ethics


Data, Algorithms and Practice: Ethics of Algorithms (9 min): https://ed.ted.com/on/h01kSlpF

Page 31: Data and Society Lecture 9: Data and Ethics


Lecture 9 Sources

• “The Internet turns Tay, Microsoft’s millennial AI chatbot, into a racist bigot,” PCWorld, https://www.pcworld.com/article/3048157/data-center-cloud/the-internet-turns-tay-microsofts-millennial-ai-chatbot-into-a-racist-bigot.html

• “Tay AI chatbot gets a crash course in racism from Twitter,” The Guardian, https://www.theguardian.com/technology/2016/mar/24/tay-microsofts-ai-chatbot-gets-a-crash-course-in-racism-from-twitter

• Tay, Icelandic Health Database Act, and Iceland articles, Wikipedia

• “Ethics by numbers: how to build machine learning that cares,” The Conversation, http://theconversation.com/ethics-by-numbers-how-to-build-machine-learning-that-cares-85399

• “What is Data Ethics?”, Philosophical Transactions of the Royal Society A, http://rsta.royalsocietypublishing.org/content/374/2083/20160360

• Havasupai and informed consent, The New York Times, http://www.nytimes.com/2010/04/22/us/22dna.html?pagewanted=all

• “Genome and Nation: Iceland’s Health Sector Database and its Legacy,” Innovations, https://www.researchgate.net/publication/24089987_Genome_and_Nation_Iceland's_Health_Sector_Database_and_its_Legacy

Page 32: Data and Society Lecture 9: Data and Ethics


Presentations

Page 33: Data and Society Lecture 9: Data and Ethics


Discussion article for Today

• “Will Democracy Survive Big Data and Artificial Intelligence?” Scientific American, https://www.scientificamerican.com/article/will-democracy-survive-big-data-and-artificial-intelligence/

Page 34: Data and Society Lecture 9: Data and Ethics


Presentation articles for April 18

• “Self-driving Uber car kills pedestrian in Arizona, where robots roam,” NY Times, https://www.nytimes.com/2018/03/19/technology/uber-driverless-fatality.html [Tim W]

• “Should your driverless car hit a pedestrian to save your life?”, NY Times, https://www.nytimes.com/2016/06/24/technology/should-your-driverless-car-hit-a-pedestrian-to-save-your-life.html?action=click&contentCollection=Business%20Day&module=RelatedCoverage&region=EndOfArticle&pgtype=article [Sam S-F]

• “Think Facebook can manipulate you? Look out for Virtual Reality,” The Conversation, https://theconversation.com/think-facebook-can-manipulate-you-look-out-for-virtual-reality-93118 [Zimo X]

• “New initiative examines the ethics of research using pervasive data”, The National Law Review, https://www.natlawreview.com/article/new-initiative-examines-ethics-research-using-pervasive-data [Madison W]

Page 35: Data and Society Lecture 9: Data and Ethics


Presentation articles for April 20

• “The trouble with leaving Facebook is that we like Facebook,” FiveThirtyEight, https://fivethirtyeight.com/features/the-trouble-with-leaving-facebook-is-that-we-like-facebook/ [Tae P]

• “How does Internet use affect well-being?”, Phys. Org, https://phys.org/news/2018-02-internet-affect-well-being.html [Chandler M]

• “With this DNA dating app, you swab, then swipe for love”, Wired, https://www.wired.com/story/with-this-dna-dating-app-you-swab-then-swipe-for-love/ [John L]

• “A combination of personality traits might make you more addicted to social networks,” ScienceDaily, https://www.sciencedaily.com/releases/2018/03/180312084911.htm [Jie C]

Page 36: Data and Society Lecture 9: Data and Ethics


Presentation articles for April 27 – LAST OPPORTUNITY FOR PRESENTATIONS

• “How close are we to a Black Mirror-style digital afterlife?”, The Guardian, https://www.theguardian.com/tv-and-radio/2018/jan/09/how-close-are-we-black-mirror-style-digital-afterlife [Richard L]

• “The Internet of Things could drown our environment in gadgets”, Wired, https://www.wired.com/2014/06/green-iot/ [Matthew M]

• “To invade homes, tech is trying to get in your kitchen”, NY Times, https://www.nytimes.com/2018/03/25/technology/smart-homes-tech-kitchen.html?rref=collection%2Fsectioncollection%2Ftechnology&action=click&contentCollection=technology&region=rank&module=package&version=highlights&contentPlacement=2&pgtype=sectionfront [Nathalie P]

• “Network security in the age of the Internet of Things”, ComputerWeekly, http://www.computerweekly.com/feature/Network-security-in-the-age-of-the-internet-of-things [Justin T]

Page 37: Data and Society Lecture 9: Data and Ethics


Presentation articles for Today

• “The Follower Factory,” New York Times, https://www.nytimes.com/interactive/2018/01/27/technology/social-media-bots.html [Wei P.]

• “Is it too late for big data ethics?” Forbes, https://www.forbes.com/sites/kalevleetaru/2017/10/16/is-it-too-late-for-big-data-ethics/#4fd4e33f3a6d [Daniel C]

• “Your Roomba already maps your home. Now the CEO plans to sell that map,” USA Today, https://www.usatoday.com/story/tech/nation-now/2017/07/25/roomba-plans-sell-maps-users-homes/508578001/ [Michelle H]

• “Racist, sexist AI could be a bigger problem than lost jobs,” Forbes, https://www.forbes.com/sites/parmyolson/2018/02/26/artificial-intelligence-ai-bias-google/#fd91bbf1a015 [Halley F]