superficial data analysis

Post on 01-Nov-2014


SUPERFICIAL DATA ANALYSIS

Exploring Millions of Social Stereotypes

Presenters: Nguyen Dao Tan Bao, Cao Dinh Qui, Pham Huy Thanh

Instructor: Prof. Lothar Piepmeyer

How do stereotypes and our appearance influence the way we are perceived?

The answers were found by analyzing a large pool of data collected from a diverse group of people.

Let us tell you the story of FaceStat.com, from Brendan O’Connor & Lukas Biewald.

We love the story.

How do we perceive AGE, GENDER, INTELLIGENCE, AND ATTRACTIVENESS?

WHAT INSIGHT CAN WE EXTRACT from millions of anonymous opinions?

Collect the data

FaceStat runs on an SQL database.

Each user judgment is saved as a (face ID, attribute, judgment) triple.

The goal is to explore the relationships between different types of perceived attributes.

Example question: “How old do I look?”

Look at the age judgment values, count how many times each value occurs, and order by this count.

We have 10 million rows of data that can be extracted from the database.

In the output, the first number is the frequency count and the second string is the response value.
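The counting step for the age question can be sketched as a single grouped SQL query. This is a minimal illustration in Python with an in-memory SQLite database; the table name `judgments`, its column names, and the sample rows are assumptions, since the slides only say that judgments are stored as (face ID, attribute, judgment) triples.

```python
import sqlite3

# Hypothetical schema and rows: the slides only say that judgments are
# stored as (face ID, attribute, judgment) triples in an SQL database.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE judgments (face_id INTEGER, attribute TEXT, judgment TEXT)"
)
rows = [
    (1, "age", "25"),
    (1, "age", "25"),
    (2, "age", "30"),
    (2, "age", "25"),
    (3, "age", "thirty"),
    (3, "gender", "female"),
]
conn.executemany("INSERT INTO judgments VALUES (?, ?, ?)", rows)

# Count how often each age response occurs, most frequent first.
age_counts = conn.execute(
    """
    SELECT COUNT(*) AS n, judgment
    FROM judgments
    WHERE attribute = 'age'
    GROUP BY judgment
    ORDER BY n DESC, judgment
    """
).fetchall()

# Each row is (frequency count, response value).
for n, value in age_counts:
    print(n, value)
```

Each result row pairs a frequency count with a response value, which is the layout the slide output describes.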

Clean up the data

Problems with the data for “How old do I look?”:

• Errors: stray \r\n characters
• Outliers: rare values
• Format of user responses: text instead of numbers
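The three problems with the age responses (stray \r\n, rare outlier values, text instead of numbers) could be handled along the following lines. This is only a sketch: the sample responses, the helper name `clean_age`, and the `min_count` threshold are hypothetical and not taken from the FaceStat analysis.

```python
from collections import Counter


def clean_age(raw):
    """Return the response as an int, or None if it is not a plain number."""
    text = raw.strip()  # removes stray "\r\n" and surrounding spaces
    return int(text) if text.isdecimal() else None


# Hypothetical raw responses showing the three problems.
raw_responses = ["25\r\n", "25", " 30 ", "thirty", "1000", "25", "7"]

parsed = [clean_age(r) for r in raw_responses]
ages = [a for a in parsed if a is not None]  # drop text answers

# Treat values seen fewer than min_count times as rare outliers.
min_count = 2  # assumed threshold, for illustration only
counts = Counter(ages)
kept = [a for a in ages if counts[a] >= min_count]

print(parsed)
print(kept)
```

Dropping rare values by frequency is one possible rule; a fixed plausible range for ages would be another.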

Challenges in preprocessing the data:

• Mapping multiple-choice responses to numerical values: “very trustworthy” vs. “not to be trusted”
• Aggregating results from multiple people into a single description of a face
• Missing values
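A rough sketch of these preprocessing steps follows: map multiple-choice labels to numbers, average them per face, and mark faces without usable judgments as missing. The slides name only the two end-point labels, so the intermediate labels, the numeric values in `TRUST_SCALE`, and the sample triples are all assumptions.

```python
from statistics import mean

# Assumed scale: the slides name only the two end-point labels;
# the middle labels and the numeric values are hypothetical.
TRUST_SCALE = {
    "not to be trusted": 1,
    "somewhat untrustworthy": 2,
    "somewhat trustworthy": 3,
    "very trustworthy": 4,
}

# (face ID, attribute, judgment) triples with hypothetical values.
triples = [
    (1, "trustworthy", "very trustworthy"),
    (1, "trustworthy", "somewhat trustworthy"),
    (2, "trustworthy", "not to be trusted"),
    (2, "trustworthy", "no idea"),  # unmapped response, skipped
    (3, "age", "25"),               # face 3 has no trust judgments
]


def aggregate_trust(triples):
    """Average the numeric trust scores per face; None marks a missing value."""
    scores = {}
    for face_id, attribute, judgment in triples:
        scores.setdefault(face_id, [])
        if attribute == "trustworthy" and judgment in TRUST_SCALE:
            scores[face_id].append(TRUST_SCALE[judgment])
    return {face: (mean(vals) if vals else None) for face, vals in scores.items()}


per_face = aggregate_trust(triples)
print(per_face)
```

Keeping `None` for faces without judgments makes the missing values explicit instead of silently treating them as zero.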

Expected data

Toolkits for Data Analysis

Age correlations

Further Investigation

Distribution of age values: “outliers”

Remove the outliers: select rows with age less than 100.
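The outlier filter (age less than 100) and a quick look at the resulting distribution might look like this. The age values are hypothetical, and the 10-year bin width is an arbitrary choice for the illustration.

```python
from collections import Counter

# Hypothetical per-face age estimates, including implausible values.
ages = [3, 18, 22, 25, 27, 27, 31, 45, 52, 150, 999]

cleaned = [a for a in ages if a < 100]  # the filter named on the slide

# Text histogram with 10-year bins.
bins = Counter((a // 10) * 10 for a in cleaned)
for start in sorted(bins):
    print(f"{start:3d}-{start + 9:<3d} {'#' * bins[start]}")
```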

Figure 17-3: Initial histogram of face age data

Figure 17-4: Histogram of cleaned face age data

Age, Attractiveness and Gender

Figure 17-5: Scatterplot of attractiveness versus age, colored by gender (pink: female; blue: male).

Figure 17-6: Smoothed scatterplots of attractiveness versus age, one plot per gender.

“How does age affect attractiveness?”

• We compute 95% confidence intervals.
• We fit a loess curve to help visualize aggregate patterns in this noisy sequential data.
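To make the confidence-interval step concrete, here is a small sketch that computes a 95% interval for the mean attractiveness score at each age. The slides do not say how the intervals were computed, so the normal approximation (mean ± 1.96 standard errors), the function name `mean_ci95`, and the scores are assumptions. The loess fit is not reproduced here.

```python
from math import sqrt
from statistics import mean, stdev


def mean_ci95(values):
    """Mean and 95% interval using the normal approximation (z = 1.96)."""
    m = mean(values)
    half_width = 1.96 * stdev(values) / sqrt(len(values))
    return m, m - half_width, m + half_width


# Hypothetical attractiveness scores grouped by perceived age.
scores_by_age = {
    18: [2.0, 3.0, 4.0, 3.0],
    27: [3.0, 4.0, 5.0, 4.0],
}

intervals = {age: mean_ci95(vals) for age, vals in scores_by_age.items()}

for age, (m, low, high) in intervals.items():
    print(f"age {age}: mean {m:.2f}, 95% CI [{low:.2f}, {high:.2f}]")
```

With only a handful of judgments per age, a t-based interval would be wider than this approximation; the sketch is meant to show the idea, not the exact method.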

Figure 17-7: Smoothed scatterplots of attractiveness versus age, one plot per gender.

• Women are generally judged as more attractive than men across all ages except babies.
• Babies are found to be most attractive, but attractiveness drops until around age 18, after which it rises and peaks around age 27. After that, attractiveness drops until around age 50.

Attributes Correlations

How about the others ?

Intelligence

Weight

Trustworthy

Outfit

Wealth

We could repeat the same steps for each attribute ...

Or we can put everything into one big picture.

We use the R language to compute a Pearson correlation matrix.
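For readers who want to see what a Pearson correlation matrix involves, here is a minimal sketch in Python rather than the R used in the talk. The attribute values are hypothetical per-face averages, and the `pearson` helper is written out by hand only to show the formula; it is not the presenters' code.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Hypothetical per-face averages; one list per attribute.
faces = {
    "age":            [20, 25, 30, 35, 40],
    "attractiveness": [4.0, 3.5, 3.0, 2.5, 2.0],
    "intelligence":   [2.0, 3.0, 2.5, 3.5, 3.0],
}

names = list(faces)
matrix = {a: {b: pearson(faces[a], faces[b]) for b in names} for a in names}

# Print one row per attribute.
for a in names:
    print(a.ljust(15), " ".join(f"{matrix[a][b]:+.2f}" for b in names))
```

The matrix is symmetric with ones on the diagonal, so one glance covers every pair of attributes.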

Women are judged more intelligent than men.

Women are judged more likely to win a dog fight.

Dress size is weakly correlated with weight.

LOOKING AT THE TAGS

Describe me in one word: ...........

The answers are FREE-FORM TAGS, which are complicated to process:

Describe me in one word : Cute !

Describe me in one word : Pls, call me - 091231512

Describe me in one word : abc xyz aK&*$(#k,,fh..

Let’s use the R language to examine the tags!

THE TAGS

Most common tags?

Least common tags?

Can “cute” and “Cute” be merged?

“hot” and “HOT!!!” have different semantic content!

Unknown language!?

290,000 unique tags out of 2.4 million total.

The top 1,000 unique tags have 1.4 million occurrences.
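The tag statistics (number of unique tags, and how many occurrences the most common tags cover) can be computed with a simple frequency count. The sketch below uses Python rather than the R used in the talk, and the tag list and the `top_coverage` helper are hypothetical.

```python
from collections import Counter

# Hypothetical free-form tags; the real data has 2.4 million.
tags = ["cute", "Cute", "cute", "hot", "HOT!!!", "pretty", "cute", "nerd", "hot"]

counts = Counter(tags)  # raw strings, no merging yet
total = sum(counts.values())
unique = len(counts)


def top_coverage(counts, k):
    """Share of all occurrences covered by the k most common tags."""
    return sum(n for _, n in counts.most_common(k)) / sum(counts.values())


# Merging case variants is a choice, not a given: lowercasing joins
# "cute" and "Cute" but still keeps "hot" and "hot!!!" apart.
merged = Counter(t.lower() for t in tags)

print(unique, total, round(top_coverage(counts, 2), 3))
print(merged.most_common(3))
```

Lowercasing alone leaves the punctuation in place, which fits the observation that “hot” and “HOT!!!” may not mean the same thing.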

How do the tags fit in with the rest of our data?

WHICH WORDS ARE GENDERED?

GENDERED WORDS

Which description tags are most characteristic of male or female?

Ex: handsome, makeup, shopping, gamer

How do we do it?

Count the words that occur most often for men or for women?

Score tags by the ratio of occurrences between genders.

How characteristic is a tag T for gender G?

[Figures: most characteristic tags for male and for female]
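Scoring tags by the ratio of occurrences between genders could be sketched as follows. The exact formula from the slides is not available, so this version is an assumption: it divides the tag's relative frequency for one gender by its relative frequency for the other, with add-one smoothing (`alpha`) so that tags unseen for one gender do not cause a division by zero. The function name and the sample tags are hypothetical.

```python
from collections import Counter


def gender_scores(tags_by_gender, gender, other, alpha=1.0):
    """Ratio of smoothed relative tag frequencies: gender vs. other."""
    mine = Counter(tags_by_gender[gender])
    theirs = Counter(tags_by_gender[other])
    vocab = set(mine) | set(theirs)
    n_mine = sum(mine.values()) + alpha * len(vocab)
    n_theirs = sum(theirs.values()) + alpha * len(vocab)
    return {
        tag: ((mine[tag] + alpha) / n_mine) / ((theirs[tag] + alpha) / n_theirs)
        for tag in vocab
    }


# Hypothetical tags attached to faces judged male or female.
tags_by_gender = {
    "male": ["handsome", "gamer", "handsome", "cute"],
    "female": ["makeup", "shopping", "cute", "cute"],
}

male_scores = gender_scores(tags_by_gender, "male", "female")
ranked = sorted(male_scores, key=male_scores.get, reverse=True)

for tag in ranked:
    print(f"{tag:10s} {male_scores[tag]:.2f}")
```

A score above 1 means the tag is relatively more common for the first gender; a score below 1 means it leans toward the other.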

What are the typical types of people in our data?

Cute, Loser, flirty, fratboy, Player, idiot

Data Mining

• Supervised learning (labelled data; has a target attribute): Classification, Regression, Decision Tree
• Unsupervised learning (unlabelled data; does NOT have a target attribute): Association Rules, Clustering, …

CLUSTERING

Definition: grouping together objects that are similar to each other.

Applications:

• Market segmentation
• Business
• Healthcare
• Document retrieval
• Etc.

K-MEANS CLUSTERING

The k-means algorithm clusters n objects into k partitions based on their attributes, where k < n.

How does the k-means clustering algorithm work?

A simple example of the k-means algorithm (using k = 2):

Step 1: Initialization. We randomly choose two centroids (k = 2) for the two clusters. In this case the two centroids are m1 = (1.0, 1.0) and m2 = (5.0, 7.0).

Step 2: Each object is assigned to its nearest centroid. Thus, we obtain two clusters containing {1,2,3} and {4,5,6,7}. Their new centroids are the means of the members of each cluster.

Step 3: Using these centroids, we compute the Euclidean distance of each object to both centroids, as shown in the table. The new clusters are therefore {1,2} and {3,4,5,6,7}. The next centroids are m1 = (1.25, 1.5) and m2 = (3.9, 5.1).

Step 4: The clusters obtained are again {1,2} and {3,4,5,6,7}, so there is no change in the clusters. The algorithm halts here, and the final result consists of 2 clusters: {1,2} and {3,4,5,6,7}.
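The worked example can be replayed with a few lines of Python. The seven data points are an assumption: the table from the slides is not part of this text, so the values below were chosen because they reproduce the clusters and the final centroids stated in the steps. The function is a plain batch (Lloyd-style) k-means sketch, not necessarily the exact procedure used on the slides.

```python
def kmeans(points, centroids):
    """Batch k-means; returns (clusters of 1-based point ids, centroids)."""
    centroids = list(centroids)
    assignment = None
    while True:
        # Assign each point to its nearest centroid (squared Euclidean
        # distance; ties go to the lower-numbered centroid).
        new_assignment = [
            min(range(len(centroids)),
                key=lambda c: sum((p - q) ** 2 for p, q in zip(pt, centroids[c])))
            for pt in points
        ]
        if new_assignment == assignment:
            break  # no object changed cluster, so the algorithm halts
        assignment = new_assignment
        # Move each centroid to the mean of its members.
        for c in range(len(centroids)):
            members = [pt for pt, a in zip(points, assignment) if a == c]
            if members:
                centroids[c] = tuple(sum(x) / len(members) for x in zip(*members))
    clusters = [[i + 1 for i, a in enumerate(assignment) if a == c]
                for c in range(len(centroids))]
    return clusters, centroids


# Assumed data points (objects 1 to 7); not taken from the slides.
points = [(1.0, 1.0), (1.5, 2.0), (3.0, 4.0), (5.0, 7.0),
          (3.5, 5.0), (4.5, 5.0), (3.5, 4.5)]

clusters, centroids = kmeans(points, [(1.0, 1.0), (5.0, 7.0)])
print(clusters)
print(centroids)
```

With these points, object 3 is equidistant from both initial centroids in the first pass; the tie rule places it in the first cluster, which matches the {1,2,3} and {4,5,6,7} split of Step 2.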

PLOT

Per-face data: k = 6 clusters and 8 attributes

• Blue cluster
• Green cluster
• Red cluster
• Turquoise cluster
• Orange cluster
• Purple cluster

Conclusion

The data show that people hold some familiar stereotypes.

Let the data speak for itself.

Q&A