Post on 12-Jul-2015
Death and edits
Miles Lincoln
LIS590MT
Wikipedia!
You probably already know
what it is!
Bursts in social networks
Bursts of edits on Wikipedia in particular
When do those occur?
What can we learn by looking at
spikes in edit frequency?
How have edit spikes changed over Wikipedia’s ten
years of existence?
Does the size of an edit spike correlate with anything?
Bursts in other social networks
Google Trends
Celebrity deaths!
Revision history
But first…
We need to process the data so that we can answer that
question
Perl
Regular Expressions (Regex)
The Perl script uses regular expressions to find and
output matching pieces of text.
In this case, I am pulling out dates in Wikipedia’s
day month year format and rewriting them in a
more machine-readable MM/DD/YYYY format.
11/08/2011
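The date rewriting described above can be sketched in a few lines. This is a minimal illustration in Python rather than the Perl script from the talk, and it assumes the dates appear as a one or two digit day, a full English month name and a four digit year. The names `MONTHS`, `DATE_RE` and `extract_dates` and the sample line are made up for the example.

```python
import re

# Month names as they appear in Wikipedia's "day month year" dates.
MONTHS = {
    "January": 1, "February": 2, "March": 3, "April": 4,
    "May": 5, "June": 6, "July": 7, "August": 8,
    "September": 9, "October": 10, "November": 11, "December": 12,
}

# One or two digit day, a month name, then a four digit year.
DATE_RE = re.compile(r"\b(\d{1,2}) (" + "|".join(MONTHS) + r") (\d{4})\b")


def extract_dates(text):
    """Return every matched date in text, rewritten as MM/DD/YYYY."""
    return [
        f"{MONTHS[month]:02d}/{int(day):02d}/{year}"
        for day, month, year in DATE_RE.findall(text)
    ]


# Hypothetical revision-history line, made up for illustration.
sample = "(cur | prev) 14:02, 8 November 2011 ExampleUser (talk | contribs)"
print(extract_dates(sample))  # ['11/08/2011']
```

Because every edit contributes one line with one date, running this over a pasted revision history yields one output date per edit.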
Data manipulation
Copy/paste the revision history of wiki
pages into a text document, which I
feed to my Perl script
The result is a list with one date entry
per edit, so a date repeats once for every edit made that day
Copying/pasting isn’t super
elegant, but I haven’t gotten
LWP::UserAgent stuff to work yet
Excel!
Throw my lists of dates into a pivot table, which
shows me how often each date occurs
Some VLOOKUP magic allows me to combine
the edit frequencies of the individual actors into one big list covering every day from 6/1/2001 to
the present
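The pivot table and lookup steps amount to counting how often each date occurs and then filling in every calendar day for every actor. A rough Python equivalent, offered as a sketch and not as the workflow used in the talk, might look like this. The function names, actor labels and dates are hypothetical.

```python
from collections import Counter
from datetime import date, datetime, timedelta


def daily_counts(date_strings):
    """Count how many times each MM/DD/YYYY string occurs (one per edit)."""
    return Counter(datetime.strptime(s, "%m/%d/%Y").date() for s in date_strings)


def combine(per_actor, start, end):
    """Build one row per day from start to end with a count for every actor.

    Days on which an actor's page had no edits get 0, which is the gap
    a lookup against the pivot table would otherwise leave blank.
    """
    names = sorted(per_actor)
    rows = []
    day = start
    while day <= end:
        rows.append((day, [per_actor[name][day] for name in names]))
        day += timedelta(days=1)
    return names, rows


# Hypothetical actors and dates, made up for illustration.
per_actor = {
    "actor_a": daily_counts(["06/01/2001", "06/01/2001", "06/03/2001"]),
    "actor_b": daily_counts(["06/02/2001"]),
}
names, rows = combine(per_actor, date(2001, 6, 1), date(2001, 6, 3))
for day, counts in rows:
    print(day.strftime("%m/%d/%Y"), counts)
```

A `Counter` returns 0 for dates it has never seen, so quiet days need no special handling.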
Et Voila!
Problems
9 actors over 10 years means close to 100k cells
Excel is not built for speed
MATLAB might work better
What does the data look like over
time?
Yearly windows running 6/1 to 5/31, from 2001 (when Wikipedia’s current edit numbers
begin) to 2010 (by which point all of the bursts have settled down)
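Splitting the daily counts into these 6/1 to 5/31 windows is a small bucketing step. The sketch below is one way to do it in Python. It assumes the counts are held as a mapping from date to number of edits, and the function names and sample values are hypothetical.

```python
from datetime import date


def window_start_year(day):
    """Return the year in which the 6/1 to 5/31 window containing day begins."""
    return day.year if day.month >= 6 else day.year - 1


def split_into_windows(counts):
    """Group {date: edits} into {start_year: {date: edits}} windows."""
    windows = {}
    for day, edits in counts.items():
        windows.setdefault(window_start_year(day), {})[day] = edits
    return windows


# Hypothetical counts, made up for illustration.
counts = {date(2008, 5, 31): 2, date(2008, 6, 1): 5, date(2009, 1, 15): 7}
windows = split_into_windows(counts)
print(sorted(windows))  # [2007, 2008]
```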
6/1/2001-5/31/2002
[Chart: nine series (Series1-Series9); x-axis monthly, 6/1/01 to 5/1/02; y-axis 0 to 1.2]
6/1/2002-5/31/2003
[Chart: nine series (Series1-Series9); x-axis monthly, 6/1/02 to 5/1/03; y-axis 0 to 14]
6/1/2003-5/31/2004
[Chart: nine series (Series1-Series9); x-axis monthly, 6/1/03 to 5/1/04; y-axis 0 to 30]
6/1/2004-5/31/2005
[Chart: nine series (Series1-Series9); x-axis monthly, 6/1/04 to 5/1/05; y-axis 0 to 60]
6/1/2005-5/31/2006
[Chart: nine series (Series1-Series9); x-axis monthly, 6/1/05 to 5/1/06; y-axis 0 to 30]
6/1/2006-5/31/2007
[Chart: nine series (Series1-Series9); x-axis monthly, 6/1/06 to 5/1/07; y-axis 0 to 50]
6/1/2007-5/31/2008
[Chart: nine series (Series1-Series9); x-axis monthly, 6/1/07 to 5/1/08; y-axis 0 to 400]
6/1/2008-5/31/2009
[Chart: nine series (Series1-Series9); x-axis monthly, 6/1/08 to 5/1/09; y-axis 0 to 80]
6/1/2009-5/31/2010
[Chart: ten series (Series1-Series10); x-axis monthly, 6/1/09 to 5/1/10; y-axis 0 to 200]
Spike sizes over the years
[Chart: one series (Series2); x-axis 2002 to 2009; y-axis 0 to 400]
Let’s take a closer look at the more
interesting actors
Actors #4-9, 6/1/2008-5/31/2009
[Chart: six series (Series1-Series6); x-axis monthly, 6/1/08 to 5/1/09; y-axis 0 to 80]
Actors #4-9, 6/1/2008-5/31/2009 (log)
[Chart: six series (Series1-Series6); x-axis monthly, 6/1/08 to 5/1/09; y-axis 0 to 2]
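Plotting the logarithm of the daily counts compresses the largest spikes so that smaller ones stay visible on the same axes. A minimal sketch of the transform follows. The axis ranges suggest base 10, though the slides do not say so, and returning `None` for days with zero edits is an assumption made here, since the slides do not say how such days were treated.

```python
import math


def log_edits(edits_per_day):
    """Map each daily edit count to log10(count).

    Days with zero edits have no logarithm, so they are returned as None
    (an assumption; the slides do not say how such days were treated).
    """
    return [math.log10(n) if n > 0 else None for n in edits_per_day]


# Hypothetical daily counts, made up for illustration.
print(log_edits([0, 1, 10, 100]))
```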
One actor at a time ~10 years
Actor #1 (DoD: 6/27/2001): edits/day
[Chart: one series; x-axis yearly, 6/28/01 to 6/28/11; y-axis 0 to 14]
Actor #1: log(edits)/day
[Chart: one series; x-axis yearly, 6/28/01 to 6/28/11; y-axis 0 to 1.2]
Actor #7: edits/day
[Chart: one series; x-axis yearly, 9/24/03 to 9/24/11; y-axis 0 to 100]
Actor #7: log(edits)/day
[Chart: one series; x-axis yearly, 9/24/03 to 9/24/11; y-axis 0 to 2.5]
Actor #8: edits/day
[Chart: one series; x-axis yearly, 12/10/03 to 12/10/10; y-axis 0 to 400]
Actor #8: log(edits)/day
[Chart: one series; x-axis yearly, 12/10/03 to 12/10/10; y-axis 0 to 3]
Actor #9: edits/day
[Chart: one series; x-axis yearly, 2/28/04 to 2/28/11; y-axis 0 to 200]
Actor #9: log(edits)/day
[Chart: one series; x-axis yearly, 2/28/04 to 2/28/11; y-axis 0 to 2.5]
If we tweak the data to take
importance into consideration…
Average gross, adjusted for inflation*
Only available for a small number of the actors in the
sample set
Taken from boxofficemojo.com
Extremely reliable source
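The slides do not spell out how the average gross enters the adjustment. One possible reading, shown here purely as an assumed normalization, is to scale one actor's daily edit counts by the ratio of a reference actor's average gross to their own. The function name `adjust` and all numbers below are hypothetical and are not figures from boxofficemojo.com.

```python
def adjust(edits_per_day, actor_gross, reference_gross):
    """Scale a daily edit series by reference_gross / actor_gross.

    This is an assumed normalization, not the formula used in the slides.
    """
    factor = reference_gross / actor_gross
    return [n * factor for n in edits_per_day]


# Hypothetical numbers, made up for illustration.
print(adjust([10, 40, 20], actor_gross=50.0, reference_gross=100.0))
# [20.0, 80.0, 40.0]
```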
Actor #8 vs. Actor #9
[Chart: two series (ledger, swayze); x-axis 1 to 361; y-axis 0 to 400]
Actor #8 vs. Actor #9 (adjusted)
[Chart: two series (ledger, swayze adjusted); x-axis 1 to 361; y-axis 0 to 400]
Actor #8 vs. Actor #9 (adjusted)
[Chart: two series (ledger log, swayze adjusted log); x-axis 1 to 361; y-axis 0 to 3]
The same data on Google Trends
-10 days to +40 days (log)
[Chart: eight series (coburn log, peck log, brando log, davis log, palance log, goulet log, ledger log, swayze log); x-axis 1 to 50; y-axis 0 to 3]
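Putting several actors on a shared -10 to +40 day axis means re-indexing each series relative to that actor's own date of death. A minimal sketch of the re-indexing, assuming the counts are held as a mapping from date to value and that missing days count as 0, could look like this. The function name, the counts and the event date are hypothetical.

```python
from datetime import date, timedelta


def window_around(counts, event_day, before=10, after=40):
    """Return counts for event_day - before through event_day + after.

    counts maps date -> edits; days missing from the mapping count as 0.
    """
    return [
        counts.get(event_day + timedelta(days=offset), 0)
        for offset in range(-before, after + 1)
    ]


# Hypothetical counts and event date, made up for illustration.
counts = {date(2009, 3, 1): 3, date(2009, 3, 10): 120, date(2009, 3, 11): 45}
series = window_around(counts, date(2009, 3, 10))
print(len(series), series[10], series[11])  # 51 120 45
```

With the defaults, index 10 of the returned list is the event day itself, so series for different actors can be compared position by position.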
Other things I should consider
Age at death
Cause of death
Were they still acting?
Future directions
New sample of Wikipedia pages
Need to compare more contemporary pages
Need new metrics for comparison
Better workflows
Thanks!
Questions?
http://www.slideshare.net/mlincol2/informetrics