Fran Berman, Data and Society, CSCI 4370/6370 Data and Society Data and Research 1 – Lecture 15 10/26/20

Page 1: Data and Society Data and Research 1 –Lecture 15

Fran Berman, Data and Society, CSCI 4370/6370

Data and Society

Data and Research 1 – Lecture 15

10/26/20

Page 2: Data and Society Data and Research 1 –Lecture 15

Today (10/26/20)

• Personal Essay #3 Assignment

• No reading for next class

• Lecture

• Student Presentations

Page 3: Data and Society Data and Research 1 –Lecture 15

Personal Essay 3 – Data and Elections

• Social media (Twitter, Facebook, TikTok, etc.) is playing an increasingly important role as a means of providing information (and sometimes misinformation) to targeted groups of voters.

• If you were CEO of a social media company (feel free to choose one), what policy would you adopt to ensure that your platform is not being used to unduly manipulate voters? What do you think the broader repercussions of that policy would be?

• DUE: Send .docx to Fran by 11:59 p.m. on November 4

• WORDCOUNT: 550-650 words

Page 4: Data and Society Data and Research 1 –Lecture 15

Date | Topic | Speaker
8-31 | Introduction | Fran
9-3 | The Data-driven world | Fran
9-10 | Data and COVID-19 - models | Fran
9-14 | Data and COVID-19 – contact tracing | Fran
9-17 | Data and the Opioid Crisis | Liz Chiarello
9-21 | Data and Privacy - Intro | Fran
9-24 | Data and Privacy – Differential Privacy and the Census | Fran
9-28 | Data and Privacy – Anonymity | Fran
10-1 | Data and Privacy - Law | Fran
10-5 | Digital rights in the EU and China | Fran
10-8 | Data and Elections 1 | Fran
10-12 | NO CLASS – Columbus / Indigenous Peoples’ Day | Fran
10-15 | Data and Elections 2 | Fran
10-19 | Data and Elections 3 | Fran
10-22 | Data and Elections 4 | Todd Rogers
10-26 | Data and Research 1 / PERSONAL ESSAY 3 | Fran
10-29 | Data and Research 2 | Josh Greenberg
11-2 | Data and Discrimination 1 / PERSONAL ESSAY 3 DUE NOV 4 | Fran
11-5 | Data and Discrimination 2 / BRIEFING (TEAMS OF 2) | Fran
11-9 | Data and the IoT 1 | Fran
11-12 | Data and the IoT 2 | Fran
11-15 | Data and Ethics | Fran
11-23 | Cybersecurity / BRIEFING DUE / PERSONAL ESSAY 4 or OP-ED | Bruce Schneier
11-30 | Data Infrastructure | Fran
12-3 | Data Science / PERSONAL ESSAY OR OP-ED DUE | Fran
12-7 | Data Careers | Kathy Pham
12-10 | Wrap-up | Fran

Page 5: Data and Society Data and Research 1 –Lecture 15

Reading for today

• “The AI Gaydar Study and the Real Dangers of Big Data”, The New Yorker, https://www.newyorker.com/news/daily-comment/the-ai-gaydar-study-and-the-real-dangers-of-big-data

Page 6: Data and Society Data and Research 1 –Lecture 15

Lecture 15 – Data and Research 1

• What is good science? The Gaydar Study

Page 7: Data and Society Data and Research 1 –Lecture 15

Paper for Discussion Today

https://www.gsb.stanford.edu/faculty-research/publications/deep-neural-networks-are-more-accurate-humans-detecting-sexual

Page 8: Data and Society Data and Research 1 –Lecture 15

The “Gaydar” Study [Wang and Kosinski, 2017]

• Science Problem: Can a machine be used to detect sexual orientation, and how does it compare with human detection of sexual orientation?

• Approach: A deep neural network was used to extract features from 35K facial images of gay and heterosexual men and women. The features were processed and entered into a logistic regression to classify sexual orientation.

• Results: The algorithm correctly distinguished between gay and heterosexual men in 81% of cases and between gay and heterosexual women in 74% of cases. Human judges achieved lower accuracy: 61% for men and 54% for women. (These figures are for one facial image per person; the algorithm's accuracy increases when it is given five facial images per person.)

Page 9: Data and Society Data and Research 1 –Lecture 15

Physiognomy – science of judging one’s character from their facial characteristics

• Dates back to ancient China and Greece.

• Cesare Lombroso (founder of criminal anthropology) believed that criminals could be identified by their facial features.

• Physiognomy now universally rejected as a mix of superstition and racism disguised as science

• However, some scientific work on links between facial structure and character is still being pursued.

– In particular, work exploring how prenatal and postnatal hormone levels, facial appearance, and other physical characteristics correlate with behavior and character.

Page 10: Data and Society Data and Research 1 –Lecture 15

Theoretical basis for this study

• Approach based on Prenatal Hormone Theory (PHT) of sexual orientation: “same-gender orientation stems from the underexposure of male fetuses or the over-exposure of female fetuses to androgens that are responsible for sexual differentiation”.

– PHT predicts that gay men and women develop more gender atypical facial features than their heterosexual counterparts.

• Previous studies show mixed support for gender atypicality of facial features of gay men and women.

– Previous studies used relatively small sample sizes

– Also difficult to define what is “masculine” and what is “feminine”

Page 11: Data and Society Data and Research 1 –Lecture 15

Study Data Set

• Data came from public images on online dating sites

– 36+K men/130+K images and 38+K women/170+K images used in study

– Individuals classified as heterosexual or homosexual based on whether their profiles listed “men seeking women/men” or “women seeking men/women”

– Respondents were not asked for consent other than through T&A on sites

– Data from largely white, U.S. constituency, ages 18-40

• Data needed to be normalized: images vary in quality, facial expression, head orientation, background, etc.

– Facial recognition algorithm extracts key features

– Derived dataset contained 50%/50% gay/straight men and 53%/47% gay/straight women

• Additional data was later obtained from Facebook websites popular among gay men, identified using the “Facebook Audience Insights” platform and the “interested in” field of users’ Facebook profiles

Page 12: Data and Society Data and Research 1 –Lecture 15

Data preparation and normalization

• Raw data was taken from Facebook and dating sites (some faces had multiple images) and run through a data preparation algorithm to normalize it

– Faces normalized using a set of “landmarks” to record the contour and features of the face as well as parameters providing the orientation of the face and head in space

• The researchers removed from the data set any images containing multiple faces, partially hidden faces, overly small faces, or faces not facing the camera directly

• Amazon Mechanical Turk workers verified that faces were adult, Caucasian, fully visible, and of the gender reported
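
The exclusion rules on this slide can be pictured as a simple filter. The following is a minimal Python sketch, not the authors' pipeline: the `FaceImage` fields, the thresholds `MIN_FACE_WIDTH_PX` and `MAX_ABS_YAW_DEGREES`, and the sample records are all hypothetical, chosen only to illustrate the four rules (single face, fully visible, large enough, facing the camera).

```python
from dataclasses import dataclass
from typing import List


@dataclass
class FaceImage:
    """Hypothetical per-image metadata, as a face detector might report it."""
    image_id: str
    num_faces: int        # number of faces detected in the image
    face_width_px: int    # width of the detected face in pixels
    occluded: bool        # True if part of the face is hidden
    yaw_degrees: float    # left/right head rotation; 0 means facing the camera


# Illustrative thresholds only; the slides do not state the values used.
MIN_FACE_WIDTH_PX = 40
MAX_ABS_YAW_DEGREES = 15.0


def keep_image(img: FaceImage) -> bool:
    """Return True only if the image passes all four exclusion rules."""
    return (
        img.num_faces == 1
        and not img.occluded
        and img.face_width_px >= MIN_FACE_WIDTH_PX
        and abs(img.yaw_degrees) <= MAX_ABS_YAW_DEGREES
    )


def filter_images(images: List[FaceImage]) -> List[FaceImage]:
    """Keep the images that survive every rule, preserving order."""
    return [img for img in images if keep_image(img)]


# Made-up records: one clean image and one violation of each rule.
candidates = [
    FaceImage("a", 1, 120, False, 3.0),   # passes
    FaceImage("b", 2, 120, False, 0.0),   # multiple faces
    FaceImage("c", 1, 20, False, 0.0),    # face too small
    FaceImage("d", 1, 120, True, 0.0),    # partially hidden
    FaceImage("e", 1, 120, False, 40.0),  # not facing the camera
]
kept_ids = [img.image_id for img in filter_images(candidates)]
print(kept_ids)
```

A filter like this shrinks the data set, which is one reason the derived data may be less representative than the raw data.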

Page 13: Data and Society Data and Research 1 –Lecture 15

Assessment

• Derived set given both to a neural network algorithm (AI algorithm) and to human judges recruited through Amazon Mechanical Turk (human processing) for assessment

• The neural network algorithm fed a logistic regression model trained to classify sexual orientation using 500 values extracted from images in the derived dataset

– Algorithm minimizes the effect of “transient features” (background, lighting, head orientation, contrast, etc.)

• Algorithm trained to classify sexual orientation on a test set and then on the larger set. Experiments included faces with one image and faces with multiple images
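
To make the classification step concrete, here is a minimal Python sketch of logistic regression trained on feature vectors. It is an illustration under stated assumptions, not the study's code: the features are synthetic Gaussian values standing in for the 500 values extracted from each image, only 5 dimensions are used to keep it fast, and all function names and training settings are invented for this example.

```python
import math
import random


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def predict_proba(weights, bias, x):
    """Probability of class 1 for one feature vector."""
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)


def train_logistic_regression(X, y, lr=0.1, epochs=200):
    """Fit weights by batch gradient descent on the log loss."""
    n, d = len(X), len(X[0])
    weights, bias = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for x, label in zip(X, y):
            err = predict_proba(weights, bias, x) - label
            for j in range(d):
                grad_w[j] += err * x[j]
            grad_b += err
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias


def make_synthetic(n, d, rng):
    """Synthetic stand-in for extracted features (assumption, not study data).

    Class 0 is centered at -0.5 and class 1 at +0.5 in every dimension.
    """
    X, y = [], []
    for i in range(n):
        label = i % 2
        X.append([rng.gauss(label - 0.5, 1.0) for _ in range(d)])
        y.append(label)
    return X, y


rng = random.Random(0)
X_train, y_train = make_synthetic(200, 5, rng)
X_test, y_test = make_synthetic(200, 5, rng)  # held out, never used for fitting

weights, bias = train_logistic_regression(X_train, y_train)
predictions = [1 if predict_proba(weights, bias, x) >= 0.5 else 0 for x in X_test]
accuracy = sum(p == t for p, t in zip(predictions, y_test)) / len(y_test)
print(f"held-out accuracy: {accuracy:.2f}")
```

The sketch evaluates on examples that were never used for fitting, because accuracy measured on the training examples themselves would overstate how well a classifier generalizes.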

Page 14: Data and Society Data and Research 1 –Lecture 15

Results (many and complex, see paper)

• Algorithm could correctly distinguish between gay and heterosexual individuals the majority of the time:

– Gay and straight men in 81% of the cases

– Gay and straight women in 74% of the cases

• Human judges achieved lower accuracy:

– 61% for men

– 54% for women

• Multiple studies on the data – based on multiple images and other variations of parameters.

• “Our results provide strong support for the PHT, which argues that same-gender sexual orientation stems from the underexposure of male fetuses and overexposure of female fetuses to prenatal androgens responsible for the sexual differentiation of faces, preferences, and behavior.”
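
When weighing figures such as "81% of the cases", it helps to know that the paper reports them as AUC values: the chance that, given one face from each group, the classifier assigns the higher score to the correct one. That is not the same as labeling 81% of individuals correctly in a population where one group is much rarer. The sketch below computes this pairwise measure directly; the function name and the scores are made up for illustration and are not data from the study.

```python
def pairwise_auc(scores_pos, scores_neg):
    """Fraction of (positive, negative) pairs ranked correctly; ties count half."""
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))


# Made-up classifier scores for two groups of four people each.
scores_pos = [0.9, 0.8, 0.6, 0.4]
scores_neg = [0.7, 0.5, 0.3, 0.2]

auc = pairwise_auc(scores_pos, scores_neg)
print(auc)  # 13 of 16 pairs are ranked correctly -> 0.8125
```

A value of 0.5 corresponds to guessing at random and 1.0 to perfect ranking, which is a useful yardstick for the human judges' 61% and 54%.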

Page 15: Data and Society Data and Research 1 –Lecture 15

Limitations of the Study (per Authors)

• Images were non-standardized (varying quality, head orientation, facial expression)

• Images obtained from a dating website – might be particularly revealing of sexual orientation

• Some data might be misleading (data not correct, user bisexual, etc.)

• Insufficient number of non-white gay subjects

Page 16: Data and Society Data and Research 1 –Lecture 15

Reaction from the Scientific Community and General Public

• https://www.nbcnews.com/feature/nbc-out/controversial-ai-gaydar-study-spawns-backlash-ethical-debate-n801026

• https://theoutline.com/post/2228/that-study-on-artificially-intelligent-gaydar-is-now-under-ethical-review-michal-kosinski?zd=1&zi=w35isyhh

• https://news.sky.com/story/remember-that-ai-gaydar-googlers-say-its-bunk-11206505

• https://www.vox.com/science-and-health/2018/1/29/16571684/michal-kosinski-artificial-intelligence-faces

Page 17: Data and Society Data and Research 1 –Lecture 15

Authors’ response

• Authors on the study:

– “Our findings suggest that publicly available data and conventional machine learning tools could be employed to build accurate sexual orientation classifiers.”

– “Additionally, given that companies and governments are increasingly using computer vision algorithms to detect people’s intimate traits, our findings expose a threat to the privacy and safety of gay men and women.”

– “We did not create a privacy-invading tool, but rather showed that basic and widely used methods pose serious privacy threats.”

• Full study: https://www.gsb.stanford.edu/sites/gsb/files/publication-pdf/wang_kosinski.pdf

Page 18: Data and Society Data and Research 1 –Lecture 15

Class discussion

• Is the Gaydar Study good science?

– Is it OK to explore controversial theories?

– Were the data and derived data representative and of high quality?

– Even if it is not illegal, is it ethical to use public data in this way?

– Was the methodology scientifically solid?

– Were the results meaningful?

– What are broader implications of the study?

Page 19: Data and Society Data and Research 1 –Lecture 15

Presentations

Page 20: Data and Society Data and Research 1 –Lecture 15

• Presentations for October 29

– “A Field Guide for the 21st Century”, The Atlantic, https://www.theatlantic.com/science/archive/2019/12/audubon-field-guide-21st-century/604141/ (Inwon)

– “Uncovering the Secrets of ‘Girl with a Pearl Earring’”, New York Times, https://www.nytimes.com/2018/02/26/arts/design/girl-with-a-pearl-earring-mauritshuis.html?action=click&module=RelatedLinks&pgtype=Article (Tabitha)

• Presentations for November 2

– “Ocean’s Hidden Heat Measured with Earthquake Sounds”, Science, https://www.sciencemag.org/news/2020/09/ocean-s-hidden-heat-measured-earthquake-sounds

– “Is Everybody Doing … OK? Let’s Ask Social Media”, New York Times, https://www.nytimes.com/2020/10/12/style/self-care/social-media-.html

• Presentations for November 5

– “How to Deal with a Crisis of Misinformation”, New York Times, https://www.nytimes.com/2020/10/14/technology/personaltech/how-to-deal-with-a-crisis-of-misinformation.html (Josh)

– “Civil Rights Groups Say If Facebook Won't Act On Election Misinformation, They Will”, NPR, https://www.npr.org/2020/09/25/916782712/civil-rights-groups-say-if-facebook-wont-act-on-election-misinformation-they-wil (Jagrati)

Page 21: Data and Society Data and Research 1 –Lecture 15

Need Volunteers – Presentations for November 9

• “The Case Against Smart Baby Tech,” Vox, https://www.vox.com/recode/2020/2/26/21152920/ibaby-hacking-smart-baby-monitors-bitdefender (Anna)

• “Hackers can shine lasers at your Alexa Device and do bad, bad things to it,” Popular Mechanics, https://www.popularmechanics.com/technology/security/a29689494/hackers-lasers-alexa-google-home/ (Steven)

Page 22: Data and Society Data and Research 1 –Lecture 15

Presentations for Today

• “Early Data Shows African Americans Have Contracted and Died of Coronavirus at an Alarming Rate”, ProPublica, https://www.propublica.org/article/early-data-shows-african-americans-have-contracted-and-died-of-coronavirus-at-an-alarming-rate (Jeffrey)

• “More than half of Black-owned businesses may not survive COVID-19”, National Geographic, https://www.nationalgeographic.com/history/2020/07/black-owned-businesses-may-not-survive-covid-19/ (Ted)