Post on 20-Dec-2015
1/26
Why-a-graph
A presentation at the University of Bratislava
Bertil Thorslund
www.datora.se
+46-705-49 45 17
2/26
Who am I?
• Educated in sociology rather than statistics
• Fascinated by figures and data
• Keen on making information available, thus the Internet
• Newly retired after 15 years in Social Insurance
3/26
4/26
5/26
What data?
• Swedish Stock Exchange
• Swedish sickness insurance
What software?
• MS Office: Excel, PowerPoint
6/26
Graphs vs Diagrams
[Graph: monthly values from 1998-01 to 2010-07; vertical axis 0 to 10 000 000]
[Diagram: Social insurance, branching into Sickness insurance, Pensions and Family benefits]
7/26
Two branches in statistics
• Estimation of error (example: Gallup polls)
• Knowing it all (example: number of sicklisted)
8/26
Estimation of error
9/26
Estimation of error
[Figure: 95 % intervals for March and April]
Difference? Yes
10/26
Estimation of error
[Figure: 95 % intervals for March and April]
Difference? No
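The comparison on these two slides can be sketched in a few lines. This is a minimal illustration only: the poll numbers are hypothetical, the function names are made up for this example, and the interval uses the common normal approximation for a proportion. The overlap check mirrors the visual comparison on the slides; a formal test of the difference would be a separate step.

```python
import math


def proportion_ci(successes, n, z=1.96):
    """Return (low, high) of an approximate 95 % interval for a proportion."""
    p = successes / n
    margin = z * math.sqrt(p * (1 - p) / n)
    return p - margin, p + margin


def intervals_overlap(a, b):
    """True if the two (low, high) intervals share at least one point."""
    return a[0] <= b[1] and b[0] <= a[1]


# Hypothetical polls of 1000 respondents each (not data from the slides).
march = proportion_ci(400, 1000)         # 40 % support in March
april_clear = proportion_ci(470, 1000)   # 47 % in April: intervals separate
april_close = proportion_ci(420, 1000)   # 42 % in April: intervals overlap

print("Difference?", "No" if intervals_overlap(march, april_clear) else "Yes")
print("Difference?", "No" if intervals_overlap(march, april_close) else "Yes")
```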
11/26
An example of a line (or curve) graph
This was yesterday. NYSE opens at 15:30.
Describe the developments during the day (Nov 16th 2010)
12/26
What would the price of H&M shares have been if they had developed like the stock exchange in general?
When was the largest number of shares bought and sold?
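One way to approach the first question is to scale the opening share price by the relative change in the general index. The sketch below assumes that approach; all numbers are hypothetical and are not read from the graph, and the function name is invented for this example.

```python
def price_if_following_index(start_price, index_start, index_now):
    """Price the share would have if it had moved exactly like the index."""
    return start_price * index_now / index_start


# Hypothetical values, for illustration only.
opening_price = 240.0   # share price at opening
index_open = 1100.0     # general index at opening
index_close = 1089.0    # general index at close (1 % lower)

hypothetical_close = price_if_following_index(opening_price, index_open, index_close)
print(f"Price if the share had followed the index: {hypothetical_close:.2f}")
```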
13/26
Sicklisting: net days paid during a 12-month period, Sweden
[Line graph: vertical axis net days, 0 to 120 000 000; horizontal axis 2001-12 to 2011-12 in half-year steps]
Paid to women: 59,6 %
Data until July 2010
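A value "during a 12-month period" can be computed as a rolling sum over monthly figures. The sketch below is one way to do that; it uses the monthly net days for 2009-01 to 2010-09 from the table on slide 25, and the function name is an assumption made for this example, not something from the presentation.

```python
# Monthly net days, 2009-01 to 2010-09, copied from the table on slide 25.
monthly = [
    3672634, 3474681, 3386635, 3434322, 3197541, 3355602,
    3114203, 2982793, 2968837, 3044564, 3091718, 3131939,  # 2009
    2802973, 2641495, 2577025, 2803822, 2787850, 2827567,
    2701936, 2663313, 2787679,                              # 2010, Jan to Sep
]


def rolling_sum(values, window=12):
    """Sum of the latest `window` values, for every position with a full window."""
    return [
        sum(values[i - window + 1 : i + 1])
        for i in range(window - 1, len(values))
    ]


totals = rolling_sum(monthly)
# The first total covers 2009-01 to 2009-12, the last covers 2009-10 to 2010-09.
for total in totals:
    print(f"{total:>12,}".replace(",", " "))
```

Because each point sums a whole year, seasonal variation cancels out and the curve shows only the underlying trend.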
14/26
Measurements need to be
• Reliable: can be repeated
• Valid: what you intended, relevant
15/26
Choosing the best graph type
Bar graph or histogram
[Bar graph: yearly values 1998 to 2009; vertical axis 0 to 120 000 000]
16/26
[Bar graph: yearly values 1998 to 2009; vertical axis 0 to 120 000 000]
Net days paid to sicklisted persons
Legends, everything to identify what is shown
Values are ’jumping’ from one bar to the next
17/26
Stacked bar graph
[Stacked bar graph: yearly values 1998 to 2009; series Men and Women; vertical axis 0 to 120 000 000]
Net days paid to sicklisted persons
18/26
Grouped bar graph
Legends important
[Grouped bar graph: yearly values 1998 to 2009; series Women and Men; vertical axis 0 to 70 000 000]
Net days paid to sicklisted persons
19/26
Net days paid to sicklisted persons
What are the changes observed?
• Increase in the late nineties
• Peaking in 2002
• Persistent decrease after 2002
• Same pattern for men and women
But there is possibly a change that is more visible in another variation of a bar graph
[Grouped bar graph: yearly values 1998 to 2009; series Women and Men; vertical axis 0 to 70 000 000]
20/26
[Bar graph with fractions: yearly values 1998 to 2009; series Men and Women; vertical axis 0 % to 100 %]
Net days paid to sicklisted persons
Bar graph with fractions
All through this increase and decrease, the proportion men/women remains unchanged.
And what is it that you see in this graph?
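The fractions in such a graph are each group's share of the yearly total. A minimal sketch, using hypothetical numbers rather than the actual Swedish figures, shows how the totals can change a lot while the shares stay the same:

```python
def shares(parts):
    """Convert absolute values to percentages of their total."""
    total = sum(parts.values())
    return {name: 100 * value / total for name, value in parts.items()}


# Hypothetical net days for a peak year and a later, much lower year.
peak_year = {"Women": 66_000_000, "Men": 44_000_000}
later_year = {"Women": 24_000_000, "Men": 16_000_000}

for label, year in (("peak year", peak_year), ("later year", later_year)):
    pct = shares(year)
    print(label, {name: f"{value:.1f} %" for name, value in pct.items()})
```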
21/26
Popular graph in presentation of survey data
Any other suggestion for valid data?
[Horizontal bar graph with fractions, 0 % to 100 %; rows: My work tasks, My boss, My wage development, My career possibilities; categories: not so content, neutral, very content]
22/26
Net days paid to sicklisted persons
Bar graph – horizontal axis other than time
[Bar graph: Region 1 to Region 21 on the horizontal axis; vertical axis 0 to 40 000 000]
You only see that the regions differ in size
23/26
Net days paid to sicklisted persons
Bar graph – horizontal axis other than time
Make sure values are ’comparable’: indexes/percentages
Presented here is the value for 2009 as a percentage of the value for 2002 (the peak year)
[Bar graph: Region 1 to Region 21 on the horizontal axis; vertical axis 0 % to 60 %]
Which region has the smallest decrease? Which has the biggest?
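The percentage used here is simply the 2009 value divided by the 2002 value for each region. The sketch below illustrates the calculation with three hypothetical regions; the numbers are invented for the example and are not the regional data behind the slide.

```python
# Hypothetical net days per region: (value in 2002, value in 2009).
regions = {
    "Region 1": (30_000_000, 12_000_000),
    "Region 2": (4_000_000, 2_200_000),
    "Region 3": (10_000_000, 3_500_000),
}

# 2009 as a percentage of the peak year 2002; region size no longer matters.
pct_of_peak = {
    name: 100 * v2009 / v2002 for name, (v2002, v2009) in regions.items()
}

smallest_decrease = max(pct_of_peak, key=pct_of_peak.get)
biggest_decrease = min(pct_of_peak, key=pct_of_peak.get)

for name, pct in pct_of_peak.items():
    print(f"{name}: {pct:.0f} %")
print("Smallest decrease:", smallest_decrease)
print("Biggest decrease:", biggest_decrease)
```

Note that the largest region in absolute terms (Region 1 in this made-up data) is neither the smallest nor the biggest decrease once the values are made comparable.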
24/26
Net days paid to sicklisted persons
[Line graph: monthly values from 1998-01 to 2010-07; vertical axis 0 to 10 000 000]
Monthly data -> smaller ’jumps’ from one reading to the next -> line graph
But in this case the line graph adds little new information.
25/26
Net days, by month and year

Month        2006        2007        2008        2009        2010
01      5 644 695   5 239 557   4 496 565   3 672 634   2 802 973
02      5 584 749   4 992 476   4 266 754   3 474 681   2 641 495
03      5 507 037   4 826 415   4 018 454   3 386 635   2 577 025
04      5 430 565   4 922 322   4 396 018   3 434 322   2 803 822
05      5 595 094   4 869 841   4 068 182   3 197 541   2 787 850
06      5 537 153   4 774 286   3 967 066   3 355 602   2 827 567
07      5 169 002   4 636 689   3 733 712   3 114 203   2 701 936
08      5 285 729   4 715 844   3 513 358   2 982 793   2 663 313
09      5 055 910   4 408 490   3 661 944   2 968 837   2 787 679
10      5 171 331   4 475 319   3 756 819   3 044 564
11      5 376 693   4 497 882   3 647 082   3 091 718
12      5 171 803   4 254 215   3 671 000   3 131 939
What can you make out from those data?
26/26
Net days paid to sicklisted persons
[Line graph: months October to September on the horizontal axis; vertical axis 0 to 6 000 000; one line per period: 2006/2007, 2007/2008, 2008/2009, 2009/2010]
In this decrease every month has a lower value than the same month a year earlier.
So, what is it that you see in this graph?
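The claim that every month is lower than a year earlier can be checked directly against the table on slide 25. The sketch below is one possible check, with a function name invented for this example. It compares calendar months year by year, which gives the same pairs of months as the October-to-September periods in the graph.

```python
# Net days per month (January first), copied from the table on slide 25.
net_days = {
    2006: [5644695, 5584749, 5507037, 5430565, 5595094, 5537153,
           5169002, 5285729, 5055910, 5171331, 5376693, 5171803],
    2007: [5239557, 4992476, 4826415, 4922322, 4869841, 4774286,
           4636689, 4715844, 4408490, 4475319, 4497882, 4254215],
    2008: [4496565, 4266754, 4018454, 4396018, 4068182, 3967066,
           3733712, 3513358, 3661944, 3756819, 3647082, 3671000],
    2009: [3672634, 3474681, 3386635, 3434322, 3197541, 3355602,
           3114203, 2982793, 2968837, 3044564, 3091718, 3131939],
    2010: [2802973, 2641495, 2577025, 2803822, 2787850, 2827567,
           2701936, 2663313, 2787679],  # data through September
}


def months_not_lower(data):
    """Return (year, month) pairs that are NOT below the same month a year earlier."""
    exceptions = []
    for year in sorted(data):
        previous = data.get(year - 1)
        if previous is None:
            continue  # no earlier year to compare with
        for month, value in enumerate(data[year], start=1):
            if value >= previous[month - 1]:
                exceptions.append((year, month))
    return exceptions


exceptions = months_not_lower(net_days)
print("Months not lower than a year earlier:", exceptions)
```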
27/26
[Same table of net days by month and year as on slide 25]
The message is much more obvious in the graph, right?
28/26
An ill-health measure – distribution by benefit
• Rehabilitation
• Sickness cash benefit
• Invalidity pension
Estimate what percentage each of those three ingredients makes up.
Think of it as minutes. Example: 12 minutes would be 12/60 = 20 %
30/26
The area of the circle represents the number of people.
If the radius is increased by 10 percent, how much bigger is the circle?
If the number of people is increased by 30 percent, how much bigger should the radius be?
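Both questions follow from the fact that the area of a circle grows with the square of its radius. A short sketch of the arithmetic, with function names chosen only for this example:

```python
import math


def area_factor(radius_factor):
    """How many times bigger the area gets when the radius is scaled."""
    return radius_factor ** 2


def radius_factor_for_area(target_area_factor):
    """How much the radius must be scaled to reach a given area factor."""
    return math.sqrt(target_area_factor)


# Radius +10 %: the area grows by about 21 %, not 10 %.
area_growth = area_factor(1.10)
print(f"Radius +10 % gives area +{(area_growth - 1) * 100:.0f} %")

# People (area) +30 %: the radius should grow by only about 14 %.
radius_growth = radius_factor_for_area(1.30)
print(f"Area +30 % needs radius +{(radius_growth - 1) * 100:.0f} %")
```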
31/26
[Spiderweb graph: radial scale 0,0 % to 150,0 %; axes: 1-14 days, 15-28 days, 29-59 days, 60-89 days, 90-179 days, 180-364 days, 1-2 yrs, 2-3 yrs, 3-4 yrs, 4-5 yrs, 5-6 yrs, 6+ yrs]
Spiderweb graph
Many variables (measurements) at the same time
Intuitive interpretation but rather demanding
32/26
Nomogram – a very special kind of graph
33/26
And now onto new ideas of what you can do with graphs.
Admire the ideas of prof. Hans Rosling and the software created by his son and daughter-in-law.
www.gapminder.org