users sleeping time analysis based on micro-blogging data

27
USERS SLEEPING TIME ANALYSIS BASED ON MICRO-BLOGGING DATA Haoran Yu, Guangzhong Sun, Min Lv

Upload: una

Post on 25-Feb-2016

38 views

Category:

Documents


0 download

DESCRIPTION

USERS SLEEPING TIME ANALYSIS BASED ON MICRO-BLOGGING DATA. Haoran Yu, Guangzhong Sun, Min Lv. Outline. Introduction Data Methods Find the longest inactive time span to discover sleeping time from dense data Statistics method in discovering sleeping time pattern from sparse data - PowerPoint PPT Presentation

TRANSCRIPT

Page 1: USERS SLEEPING TIME ANALYSIS BASED ON  MICRO-BLOGGING DATA

USERS SLEEPING TIME ANALYSIS BASED ON MICRO-BLOGGING DATA

Haoran Yu, Guangzhong Sun, Min Lv

AtlasWang
不用红色也不用把标题弄这么暗吧...首页色调太灰感觉不醒目,建议标题白色即可
Page 2: USERS SLEEPING TIME ANALYSIS BASED ON  MICRO-BLOGGING DATA

Outline• Introduction• Data• Methods

• Find the longest inactive time span to discover sleeping time from dense data• Statistics method in discovering sleeping time pattern from sparse data

• United time series by month• Subsequence statistics

• Experimental Results • General result of sleeping time• Further analysis

• Detect time zone of Sina Weibo users• Detection on movement between different time zones

• Related Applications• Conclusion

Page 3: USERS SLEEPING TIME ANALYSIS BASED ON  MICRO-BLOGGING DATA

Introduction(1)• Goals

• Inferring the sleeping time of Sina Weibo users• Enable general location(time zone) prediction

Page 4: USERS SLEEPING TIME ANALYSIS BASED ON  MICRO-BLOGGING DATA

Introduction(1)• Goals

• Inferring the sleeping time of Sina Weibo users• Enable general location(time zone) prediction

• Motivation• The increasing availability of user-generated contents.

• User-generated posts• Time series of activities

Page 5: USERS SLEEPING TIME ANALYSIS BASED ON  MICRO-BLOGGING DATA

Introduction(1)• Goals

• Inferring the sleeping time of Sina Weibo users• Enable general location(time zone) prediction

• Motivation• The increasing availability of user-generated contents.

• User-generated posts• Time series of activities

• Users’ activities reflect their sleeping time• Posts: Goodnight, Sleep, or Guess How Much I Love You, etc• Time series: Active time & Inactive time

Mid-night

Sleep

Sleep

Sleep

20:05 Beijing Time

23:51 Beijing Time

13:42 Beijing Time – 00:42 Central Time

Page 6: USERS SLEEPING TIME ANALYSIS BASED ON  MICRO-BLOGGING DATA

Introduction(1)• Goals

• Inferring the sleeping time of Sina Weibo users• Enable general location(time zone) prediction

• Motivation• The increasing availability of user-generated contents.

• User-generated posts• Time series of activities

• Users’ activities reflect their sleeping time• Posts: Goodnight,  Sleep, or Guess How Much I Love You• Time series: Active time & Inactive time

• Sleeping time is tightly related to location (Gender, Age, job, etc may have influence as well)(Sleep at night)

Page 7: USERS SLEEPING TIME ANALYSIS BASED ON  MICRO-BLOGGING DATA

Counting Sheep with Path Data Science – Aug 23, 2012By Path Data Team

http://blog.path.com/post/30041197400/counting-sheep-with-path-data-science

Page 8: USERS SLEEPING TIME ANALYSIS BASED ON  MICRO-BLOGGING DATA

Counting Sheep with Path Data Science – Aug 23, 2012By Path Data Team

http://blog.path.com/post/30041197400/counting-sheep-with-path-data-science

Page 9: USERS SLEEPING TIME ANALYSIS BASED ON  MICRO-BLOGGING DATA

Art related works

Economy & finance

Different time to go to bed

Different time to wake up

Different active degree

Page 10: USERS SLEEPING TIME ANALYSIS BASED ON  MICRO-BLOGGING DATA

User live in western China

User live in western Europe

Page 11: USERS SLEEPING TIME ANALYSIS BASED ON  MICRO-BLOGGING DATA

Introduction(1)• Goals

• Inferring the sleeping time of Sina Weibo users• Enable general location(time zone) prediction

• Motivation• The increasing availability of user-generated contents.

• User-generated posts• Time series of activities

• Users’ activities reflect their sleeping time• Posts: Goodnight,  Sleep, or Guess How Much I Love You• Time series: Active time & Inactive time

• Sleeping time is tightly related to location (Gender, Age, job, etc may have influence as well)(Sleep at night)

Page 12: USERS SLEEPING TIME ANALYSIS BASED ON  MICRO-BLOGGING DATA

Introduction(1)• Goals

• Inferring the sleeping time of Sina Weibo users• Enable general location(time zone) prediction

• Motivation• The increasing availability of user-generated contents.

• User-generated posts• Time series of activities

• Users’ activities reflect their sleeping time• Posts: Goodnight,  Sleep, or Guess How Much I Love You• Time series: Active time & Inactive time

• Sleeping time is tightly related to location (Gender, Age, job, etc may have influence as well)(Sleep at night)

• Significance• Detect users with fake location (Distinguish real person and robots)• The change of location => advertisement

Page 13: USERS SLEEPING TIME ANALYSIS BASED ON  MICRO-BLOGGING DATA

Introduction(2)• Difficulty & Challenges

• User are not always active => data are sparse• Aspects other then location => influence the sleeping time

• Contribution and insights• A new open problem related to the relationship between sleeping time and

micro-blogging activities is presented• Preliminary simple solutions are presented in this paper. Based on

experiment results on real data set.• A data set of users’ time series of activities on Weibo is collected and open

to the public.

AtlasWang
这块没有prior work是吧?有的话还是提一下好 有比较才能看出contribution嘛
Page 14: USERS SLEEPING TIME ANALYSIS BASED ON  MICRO-BLOGGING DATA

Data• Source: Sina Weibo

• Sina Weibo (http://weibo.com/), as a micro-blogging platform, akin to a hybrid of Twitter and Facebook; it is used by well over 30% of 513 million users (Up to the end of year 2011) in China. It has a similar market penetration that Twitter has established out of China.

• User type:• Verified users with daytime work jobs• > 500 fans and >100 posts

• Total number of users: 844• Total number of posts: 1, 579, 623• Form of Data

• => each corresponds to an action of user (temporarily only posts)

UID Timestamp

Page 15: USERS SLEEPING TIME ANALYSIS BASED ON  MICRO-BLOGGING DATA

Methods: Find the longest inactive time span to discover sleeping time

from dense data• IDEAL CONDITIONS: people sleep for a long time once a day and

keep active in the rest of time. • -> can't find out two or more long inactive span • -> D = {d1, d2,… , dn-1} determined by every two points can be defined

• Therefore, a final span (d') ̅ representing the average sleeping time of a user can be figured out by calculating the average lower bound and upper bound of all result spans.

t1 t2 t3 t4 tn-2 tn-1 tn

Page 16: USERS SLEEPING TIME ANALYSIS BASED ON  MICRO-BLOGGING DATA

Methods: Statistics method in discovering sleeping time pattern

from sparse data • IDEAL CONDITIONS -> Less than 1% of Weibo users can

satisfy• Statistics

• 1440-minutes-period (60 X 24)• Daily data of time series are united together by a

section(week/month/quarter…)

Day 1Day 2…Day n

Page 17: USERS SLEEPING TIME ANALYSIS BASED ON  MICRO-BLOGGING DATA

Find the longest inactive time span to discover sleeping time from dense data- United time series by month• In order to solve this problem in this case, daily data of the time series are

united by month for example, as new sequences Tm = {tm1, tm2 … tmn}.

• In each of the time series data of month m, the lengths of spans Dm = {dm1, dm2, … ,dmn} determined by every two points can be defined as:

• long inactive spans can be easily obtained from the time series data of the corresponding month.

tm1 tm2 tm3 tm4 … tm(n-2) t m(n-1) tmn

dmn dm1 dm2 dm3 … dm(n-2) dm(n-1) dmn

Page 18: USERS SLEEPING TIME ANALYSIS BASED ON  MICRO-BLOGGING DATA

Find the longest inactive time span to discover sleeping time from dense data- Subsequence statistics• ! The sparsity of time series data is different from

different users.->The chosen length of the section for doing statistics cannot match all the circumstances.

• Subsequence statistics• Figure out all inactive spans of time• Move start point & end point of subsequence• Make every subsequence a ideal condition

• Accuracy+ Efficiency –->Cost too much time

Page 19: USERS SLEEPING TIME ANALYSIS BASED ON  MICRO-BLOGGING DATA

Further analysis: User with fake or unclear location label

• Sina Weibo users label their location in the profile without verification. Some of them label the location as 'unknown' or as fake locations.

• 12 users are judged to have obvious fake location labeled in their profiles. • Fake location within the real time zone can’t been detected

(Beijing & Shanghai will not viewed as different location here)

• The List are as follows:

9775521, 1025882825, 1092620107, 1191220232, 1194431627, 1215914717, 1222524197, 1228151340, 1686658697, 1686719832, 1697717391, 1713138055

Page 20: USERS SLEEPING TIME ANALYSIS BASED ON  MICRO-BLOGGING DATA

Labels his location as 'The United States,

Oversea' (GMT-8~GMT-5)

Verified User.Xiaosong Gao , a

famous musician of mainland China

Page 21: USERS SLEEPING TIME ANALYSIS BASED ON  MICRO-BLOGGING DATA

2009

11

2009

12

2010

01

2010

02

2010

03

2010

04

2010

05

2010

06

2010

07

2010

08

2010

09

2010

10

2010

11

2010

12

2011

01

2011

02

2011

03

2011

04

2011

05

2011

08

2011

09

2011

10

2011

11

2011

12

2012

01

2012

02

2012

03

-6

-4

-2

0

2

4

6

8

10

12 Start End

Start average End average

Month

Slee

ping

tim

e of

Xia

oson

g G

ao

The result shows that he sleeps at about 0 a.m. (GMT+8) and wakes up at about 8 a.m. (GMT+8). The corresponding time zone is GMT+7. Therefore, the location Mr. Gao labels himself is fake according to the difference between the location in his profile and the detected time zone.

Page 22: USERS SLEEPING TIME ANALYSIS BASED ON  MICRO-BLOGGING DATA

Further analysis: Detection on movement between different time zones• There are users who study abroad or have business overseas. They travel far, but they won't change the label of location in the profile page frequently.

• 33 users are likely to have obvious movement (move from locations in different time zones)• Movement within a time zone can’t been detected• The List are as follows:

1025722364, 1025882825, 1043898503, 1059500987, 1061252391, 1094551220, 1120630263, 1152571540, 1159282133, 1191220232, 1193715355, 1214305597, 1216517652, 1217907573, 1218288163, 1221281087, 1222524197, 1222838993, 1229162931, 1682771764, 1689243304, 1689647122, 1693141730, 1693775877, 1695248214, 1697033804, 1697370827, 1702062500, 1703580823, 1706608200, 1711754322, 1711962552, 1712058193

Page 23: USERS SLEEPING TIME ANALYSIS BASED ON  MICRO-BLOGGING DATA

Labels his location as ‘Brazil, Oversea‘ (GMT-4~GMT-2)

Verified User.Xiaoqian Liu, a reporter of CCTV Brazil

Page 24: USERS SLEEPING TIME ANALYSIS BASED ON  MICRO-BLOGGING DATA

2010

08

2010

09

2010

10

2010

11

2010

12

2011

01

2011

02

2011

03

2011

04

2011

05

2011

06

2011

07

2011

08

2011

09

2011

10

2011

11

2011

12

2012

01

2012

02

2012

03

-10

-5

0

5

10

15

20 Start End

Start average End average

Month

Slee

ping

tim

e of

Xia

oqia

n Li

u

The result indicates that, before Sept. 2011, he goes to bed at about 0 a.m. (GMT+8) and wakes up at about 6 a.m. (GMT+8). The corresponding time zone is GMT+8~GMT+9. After Sept. 2011, the time he starts sleeping change to about 9 a.m. (GMT+8) and the sleep period end at 4 p.m. (GMT+8). The corresponding time zone is GMT-3.5 ~ GMT-2.5. Thus, the obvious movement is detected and the time of this change can be estimated.

Page 25: USERS SLEEPING TIME ANALYSIS BASED ON  MICRO-BLOGGING DATA

Related  Applications• The study can be extended to other data sets.

• Web visiting logs on the web server• Users or robots?• IP address of users or IP address of proxy/VPN

• Data of user login activates• ‘Don’t lock my account when I use a proxy please…’

• May be useful to the recommendation• Users who share a same sleeping time is more likely to meet on

social network. Why not recommendation their feeds first?• Watch your health!

Page 26: USERS SLEEPING TIME ANALYSIS BASED ON  MICRO-BLOGGING DATA

Conclusion• A new open problem related to the relationship between

sleeping time and micro-blogging activities is presented.

• This study associates the pattern of sleep and the time zone users live in.

• For such a problem above, preliminary simple solutions are presented in this paper. Based on experiment results on real data set, the proposed solutions are proved both efficient and effective.

Page 27: USERS SLEEPING TIME ANALYSIS BASED ON  MICRO-BLOGGING DATA

Thanks! A data set of is open to the public.

The URL is: http://www.haoranyu.com/research/weibosleep

Questions Welcome