TRANSCRIPT
Fran Berman, Data and Society, CSCI 4370/6370
Data and Society: Data and Research 1 – Lecture 15
10/26/20
Today (10/26/20)
• Personal Essay #3 Assignment
• No reading for next class
• Lecture
• Student Presentations
Personal Essay 3 – Data and Elections
• Social media (Twitter, Facebook, TikTok, etc.) is playing an increasingly important role as a means of providing information (and sometimes misinformation) to targeted groups of voters.
• If you were CEO of a social media company (feel free to choose one), what policy would you adopt to ensure that your platform is not being used to unduly manipulate voters? What do you think the broader repercussions of that policy would be?
• DUE: Send .docx to Fran by 11:59 p.m. on November 4
• WORD COUNT: 550–650 words
Date | Topic | Speaker
8-31 | Introduction | Fran
9-3 | The Data-driven world | Fran
9-10 | Data and COVID-19 – models | Fran
9-14 | Data and COVID-19 – contact tracing | Fran
9-17 | Data and the Opioid Crisis | Liz Chiarello
9-21 | Data and Privacy – Intro | Fran
9-24 | Data and Privacy – Differential Privacy and the Census | Fran
9-28 | Data and Privacy – Anonymity | Fran
10-1 | Data and Privacy – Law | Fran
10-5 | Digital rights in the EU and China | Fran
10-8 | Data and Elections 1 | Fran
10-12 | NO CLASS – Columbus / Indigenous Peoples’ Day |
10-15 | Data and Elections 2 | Fran
10-19 | Data and Elections 3 | Fran
10-22 | Data and Elections 4 | Todd Rogers
10-26 | Data and Research 1 / PERSONAL ESSAY 3 | Fran
10-29 | Data and Research 2 | Josh Greenberg
11-2 | Data and Discrimination 1 / PERSONAL ESSAY 3 DUE NOV 4 | Fran
11-5 | Data and Discrimination 2 / BRIEFING (TEAMS OF 2) | Fran
11-9 | Data and the IoT 1 | Fran
11-12 | Data and the IoT 2 | Fran
11-15 | Data and Ethics | Fran
11-23 | Cybersecurity / BRIEFING DUE / PERSONAL ESSAY 4 or OP-ED | Bruce Schneier
11-30 | Data Infrastructure | Fran
12-3 | Data Science / PERSONAL ESSAY OR OP-ED DUE | Fran
12-7 | Data Careers | Kathy Pham
12-10 | Wrap-up | Fran
Reading for today
• “The AI Gaydar Study and the Real Dangers of Big Data”, The New Yorker, https://www.newyorker.com/news/daily-comment/the-ai-gaydar-study-and-the-real-dangers-of-big-data
Lecture 15 – Data and Research 1
• What is good science? The Gaydar Study
Paper for Discussion Today
https://www.gsb.stanford.edu/faculty-research/publications/deep-neural-networks-are-more-accurate-humans-detecting-sexual
The “Gaydar” Study [Wang and Kosinski, 2017]
• Science Problem: Can a machine be used to detect sexual orientation, and how does it compare with human detection of sexual orientation?
• Approach: Deep neural network algorithm used to extract features from 35K facial images of gay and heterosexual men and women. Features processed and entered into a logistic regression to classify sexual orientation.
• Results: The algorithm could correctly distinguish between gay and heterosexual men in 81% of cases and between gay and heterosexual women in 74% of cases. Human judges achieved lower accuracy: 61% for men and 54% for women. (Figures are for one facial image per person; algorithm accuracy increases when five facial images per person are available.)
Physiognomy – the purported science of judging a person’s character from their facial characteristics
• Dates back to ancient China and Greece.
• Cesare Lombroso (founder of criminal anthropology) believed that criminals could be identified by their facial features.
• Physiognomy now universally rejected as a mix of superstition and racism disguised as science
• However, some scientific work exploring links between facial structure and character continues.
– In particular, work correlating pre- and post-natal hormone levels, facial appearance, and other physical characteristics with behavior and character.
Theoretical basis for this study
• Approach based on the Prenatal Hormone Theory (PHT) of sexual orientation: “same-gender orientation stems from the underexposure of male fetuses or the over-exposure of female fetuses to androgens that are responsible for sexual differentiation”.
– PHT predicts that gay men and women develop more gender-atypical facial features than their heterosexual counterparts.
• Previous studies show mixed support for gender atypicality of facial features of gay men and women.
– Previous studies used relatively small sample sizes.
– It is also difficult to define what is “masculine” and what is “feminine”.
Study Data Set
• Data used from public images on online dating sites
– 36K+ men / 130K+ images and 38K+ women / 170K+ images used in study
– Individuals classified heterosexual/homosexual depending on “men seeking women/men” and “women seeking men/women”
– Respondents were not asked for consent other than through the sites’ terms and conditions
– Data from a largely white, U.S. constituency, ages 18-40
• Data needed to be normalized: images vary in quality, facial expression, head orientation, background, etc.
– Facial recognition algorithm extracts key features
– Derived dataset contained 50%/50% gay/straight men and 53%/47% gay/straight women
• Subsequent data was also obtained from Facebook pages popular among gay men according to the “Facebook Audience Insights” platform and the “interested in” field of users’ Facebook profiles
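The labeling step described above can be sketched in a few lines. This is an illustrative reconstruction, not the study’s code: the field names and helper function are hypothetical, standing in for however the dating-site profile data was actually structured.

```python
# Illustrative sketch: inferring an orientation label from a dating
# profile's "seeking" field, as the study did. Names are hypothetical.
def infer_orientation(gender: str, seeking: str) -> str:
    """Label a profile 'gay' if the user seeks their own gender, else 'straight'."""
    same_gender = {"man": "men", "woman": "women"}[gender]
    return "gay" if seeking == same_gender else "straight"

profiles = [("man", "women"), ("man", "men"), ("woman", "women")]
labels = [infer_orientation(g, s) for g, s in profiles]
print(labels)  # ['straight', 'gay', 'gay']
```

Note that this labeling inherits whatever noise the profiles contain (incorrect entries, bisexual users, etc.), a limitation the authors themselves acknowledge.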
Data preparation and normalization
• Raw data taken from Facebook and dating sites (some individuals had multiple images) was run through a data-preparation algorithm for normalization
– Faces normalized using a set of “landmarks” to record the contour and features of the face as well as parameters providing the orientation of the face and head in space
• Scientists removed images from data set containing multiple faces, partially hidden faces, overly small faces, faces not facing the camera directly
• Amazon Mechanical Turk workers verified that faces were adult, Caucasian, fully visible, and of the gender reported
Assessment
• Derived set given to a neural network algorithm (AI processing) and to human judges recruited through Amazon Mechanical Turk (human processing) for assessment
• A logistic regression model was trained to classify sexual orientation using 500 values extracted from each image by the neural network
– The algorithm minimizes the effect of “transient features”: background, lighting, head orientation, contrast, etc.
• The algorithm was trained to classify sexual orientation on a test set and then applied to the larger set. Experiments included faces with one image and faces with multiple images.
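The pipeline shape described above – a fixed-length feature vector per face, fed to a logistic regression classifier – can be sketched as follows. This is a minimal illustration, not the authors’ code: the random features here are a stand-in for the 500 values the study extracted with a deep facial-recognition network, so the resulting accuracy is near chance rather than the paper’s reported figures.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Stand-in for the deep-network face embeddings: 500 features per image.
# In the study these came from a pretrained facial-recognition network.
n_samples, n_features = 2000, 500
X = rng.normal(size=(n_samples, n_features))
y = rng.integers(0, 2, size=n_samples)  # binary orientation label from profile data

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Logistic regression over the per-face feature vector, as in the paper.
clf = LogisticRegression(max_iter=1000)
clf.fit(X_train, y_train)

accuracy = clf.score(X_test, y_test)
print(f"classification accuracy: {accuracy:.2f}")
```

With real (non-random) embeddings, the same two-stage design – feature extraction followed by a simple linear classifier – is what produced the study’s headline accuracy numbers.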
Results (many and complex, see paper)
• Algorithm could correctly distinguish between gay and heterosexual individuals the majority of the time:
– Gay and straight men in 81% of the cases
– Gay and straight women in 74% of the cases
• Human judges achieved lower accuracy:
– 61% for men
– 54% for women
• Multiple studies on the data – based on multiple images and other variations of parameters.
• “Our results provide strong support for the PHT, which argues that same-gender sexual orientation stems from the underexposure of male fetuses and overexposure of female fetuses to prenatal androgens responsible for the sexual differentiation of faces, preferences, and behavior.”
Limitations of the Study (per Authors)
• Images were non-standardized (varying quality, head orientation, facial expression)
• Images obtained from a dating website – might be particularly revealing of sexual orientation
• Some data might be misleading (data not correct, user bisexual, etc.)
• Insufficient number of non-white gay subjects
Reaction from the Scientific Community and General Public
• https://www.nbcnews.com/feature/nbc-out/controversial-ai-gaydar-study-spawns-backlash-ethical-debate-n801026
• https://theoutline.com/post/2228/that-study-on-artificially-intelligent-gaydar-is-now-under-ethical-review-michal-kosinski?zd=1&zi=w35isyhh
• https://news.sky.com/story/remember-that-ai-gaydar-googlers-say-its-bunk-11206505
• https://www.vox.com/science-and-health/2018/1/29/16571684/michal-kosinski-artificial-intelligence-faces
Authors’ response
• Authors on the study:
– “Our findings suggest that publicly available data and conventional machine learning tools could be employed to build accurate sexual orientation classifiers.”
– “Additionally, given that companies and governments are increasingly using computer vision algorithms to detect people’s intimate traits, our findings expose a threat to the privacy and safety of gay men and women.”
– “We did not create a privacy-invading tool, but rather showed that basic and widely used methods pose serious privacy threats.”
• Full study: https://www.gsb.stanford.edu/sites/gsb/files/publication-pdf/wang_kosinski.pdf
Class discussion
• Is the Gaydar Study good science?
– Is it OK to explore controversial theories?
– Were the data and derived data representative and of high quality?
– Even if it is not illegal, is it ethical to use public data in this way?
– Was the methodology scientifically solid?
– Were the results meaningful?
– What are the broader implications of the study?
Presentations
• Presentations for October 29
– “A Field Guide for the 21st Century”, The Atlantic, https://www.theatlantic.com/science/archive/2019/12/audubon-field-guide-21st-century/604141/ (Inwon)
– “Uncovering the Secrets of ‘Girl with a Pearl Earring’”, New York Times, https://www.nytimes.com/2018/02/26/arts/design/girl-with-a-pearl-earring-mauritshuis.html?action=click&module=RelatedLinks&pgtype=Article (Tabitha)
• Presentations for November 2
– “Ocean’s Hidden Heat Measured with Earthquake Sounds”, Science, https://www.sciencemag.org/news/2020/09/ocean-s-hidden-heat-measured-earthquake-sounds
– “Is Everybody Doing … OK? Let’s Ask Social Media”, New York Times, https://www.nytimes.com/2020/10/12/style/self-care/social-media-.html
• Presentations for November 5
– “How to Deal with a Crisis of Misinformation”, New York Times, https://www.nytimes.com/2020/10/14/technology/personaltech/how-to-deal-with-a-crisis-of-misinformation.html (Josh)
– “Civil Rights Groups Say If Facebook Won't Act On Election Misinformation, They Will”, NPR, https://www.npr.org/2020/09/25/916782712/civil-rights-groups-say-if-facebook-wont-act-on-election-misinformation-they-wil (Jagrati)
Need Volunteers – Presentations for November 9
• “The Case Against Smart Baby Tech,” Vox, https://www.vox.com/recode/2020/2/26/21152920/ibaby-hacking-smart-baby-monitors-bitdefender (Anna)
• “Hackers can shine lasers at your Alexa Device and do bad, bad things to it,” Popular Mechanics, https://www.popularmechanics.com/technology/security/a29689494/hackers-lasers-alexa-google-home/ (Steven)
Presentations for Today
• “Early Data Shows African Americans Have Contracted and Died of Coronavirus at an Alarming Rate”, ProPublica, https://www.propublica.org/article/early-data-shows-african-americans-have-contracted-and-died-of-coronavirus-at-an-alarming-rate (Jeffrey)
• “More than half of Black-owned businesses may not survive COVID-19”, National Geographic, https://www.nationalgeographic.com/history/2020/07/black-owned-businesses-may-not-survive-covid-19/ (Ted)