Trove crowdsourcing behaviour
Paul Hagon
@paulhagon
@TroveAustralia
Monday, 4 March 13
Crowdsourcing profile
Look beyond the numbers
2 things I want you to take away today.
2,328,207
www.nla.gov.au
Visits to www.nla.gov.au in 2011-2012. Is this a lot? Is it not very much? You don’t know.
67%
How does your perception of that number change when I say that 67% of our visitors spent less than 30 seconds on our site?
[Chart: percentage of clicks on each area of the page, ranging from 27% down to 0.3%]
Does it change further when we measure where people click? You start to get another idea of how your site is used. Studies like this led to the recent redesign of our website.
Hive from National Archives of Australia - http://transcribe.naa.gov.au
What’s on the menu from the New York Public Library, where users can transcribe menus - http://menus.nypl.org
British Library georeferencing, where users can place maps over Google Earth to give them coordinates - http://www.bl.uk/maps/
Flickr Commons where institutions can upload photos for people to add comments & tags. - http://www.flickr.com/commons
And Trove from the National Library of Australia with their newspaper corrections. - http://trove.nla.gov.au/newspaper
Search
A bit of background. When digitising text, if it just goes up, then it’s the equivalent of a browse interface of the physical object. In the case of newspapers you need to know the title, the date, the page etc. Not a good experience. So we need to search...
Search
OCR
Search needs text, so in this case we need to apply OCR to the digitised pages. But OCR isn’t perfect, so...
Search
OCR
Correction
We can improve the OCR by adding in human correction - crowdsourcing.
Search
OCR
Correction
In turn, this improves search.
[Chart: OCR corrections and articles digitised, by year (1803 to 1979); vertical axis 0 to 1,600,000]
OCR correction levels. There’s a relatively high OCR correction rate for articles. Human correction is the “icing on the cake”.
316 million resources
60,000+ unique visitors per day
10+% visits from mobile
78 million newspaper lines corrected
1.7 million tags added
75,000 registered users
46,000 comments
29,000 Trove lists
These are a little out of date but they are the sorts of stats that we typically report on for things like the annual report. Fine for overall figures, but not really good at telling us exactly how users are using our resources and how we can improve our services based upon this. I’m really interested in the newspaper corrections.
$12 million
We’ve estimated that if we had to employ staff to make these corrections, it would have cost in the vicinity of $12 million. That’s a massive benefit to the Library & to the community.
[Charts: two pie charts, “Work count” and “Usage”, broken down by category: Archived websites; Australian newspapers; Books; Diaries, letters, archives; Journal articles; Lists; Maps; Music, sound & video; People & organisations; Pictures & photos]
The first chart shows what Trove is made up of. Journal articles, archived websites & newspapers are the resources with the most content.
The second chart shows what is being used. 85% of Trove use is from newspapers. It starts to give us an indication of where we can focus time, energy & resources.
I’m really interested in newspapers & the activity surrounding them.
There’s more to Trove than newspaper corrections
One thing to keep in the back of your minds is there’s more to Trove than text corrections. Say after me....
[Chart: text corrections by user type: registered users 85%, anonymous users 15%]
One thing to focus on is newspapers & one of the appeals of newspapers is the correction. 85% of text corrections have been made by users that have created an account on Trove and are logged in. This is a commitment & an indication of having a relationship with the Library.
[Chart: number of corrected lines per user, from top of leaderboard to bottom of leaderboard; vertical axis 0 to 1,500,000]
We have a leaderboard with 23,000 users that have made corrections. It’s not quite gamification, and other studies have shown that competitiveness isn’t a main motivation for corrections. There’s a very small number of users who have done a lot of corrections & then a super long tail of users who have made very few corrections.
50% users < 100 corrections
75% users < 500 corrections
0.01% users > 1 million corrections
Now, fewer than 100 lines of corrections isn’t a small amount of correction. The big numbers that you most commonly hear come from a very small number of users.
Users: 23,000
Corrections: 68,000,000
Let’s look at it in a different way. We can’t track the behaviour of users who aren’t logged in. Approx 23,000 logged in users have made 68,000,000 corrections.
Users: top 100
Corrections: 43%
The top 100 users have made 43% of corrections.
Users: top 100, top 1000
Corrections: 43%, 38%
The top 1000 users have made 81% of all corrections (43% + 38%).
Users: top 100, top 1000, top 5000
Corrections: 43%, 38%, 15%
The top 5000 users have made 96% of all corrections (43% + 38% + 15%).
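As a rough sketch of how cumulative figures like these can be worked out from a leaderboard: sort the per-user correction counts, sum the top N and divide by the total. The function name and the counts below are made up for illustration. They are not Trove data and not necessarily how the Library produced its numbers.

```python
def top_n_share(counts, n):
    """Fraction of all corrections made by the n most active users."""
    ranked = sorted(counts, reverse=True)
    total = sum(ranked)
    if total == 0:
        return 0.0
    return sum(ranked[:n]) / total


# Hypothetical per-user correction counts (illustrative only).
counts = [500, 300, 100, 50, 30, 10, 5, 3, 1, 1]

for n in (1, 3, 5):
    print(f"top {n} users: {top_n_share(counts, n):.0%} of corrections")
```

Running the same calculation at N = 100, 1000 and 5000 over the real leaderboard would give the kind of bands shown on the slides.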
Top 100 users extremely important
Top 1000 users very important
So we’re starting to see the top users are extremely important to the corrections program.
How do these patterns compare across other crowdsourcing activities? Hive from National Archives of Australia
[Chart: correction activity at Hive, from top of leaderboard to bottom of leaderboard; vertical axis 0 to 1,200,000]
Hive from National Archives. Much smaller numbers at 448 users, but the usage patterns are nearly identical.
Let’s look at the 6 Australian institutions that participate in Flickr Commons, crowdsourcing their photographs using tags & comments.
[Chart: number of tags per user, from top of leaderboard to bottom of leaderboard; vertical axis 0 to 6000]
Flickr tags. Takes the same shape.
Users: top 10, top 100
Tags: 62%, 27%
Approx 1,005 users have added 31,026 tags.
The top 100 users have added 89% of all tags (62% + 27%).
[Chart: number of comments per user, from top of leaderboard to bottom of leaderboard; vertical axis 0 to 1600]
Flickr comments per user. Once again we start to see an identical pattern.
Users: top 20, top 100, top 1000
Comments: 20%, 10%, 17%
Approx 12,753 users have added 26,173 comments.
The top 1000 users have made 47% of all comments (20% + 10% + 17%).
[Chart: number of corrected lines per user, from top of leaderboard to bottom of leaderboard; vertical axis 0 to 1,500,000]
Given the user behaviour, how can we encourage someone from the top 1000 to keep at it to reach the top 100? It’s a massive difference in the number of corrections needed. To get there, you need to give up work and start text correcting full time.
Number of times user edited the same article
Recency
OCR accuracy
Could this be ranked not just by the number of corrections but also by how often the user returns, how efficient they are at correcting (not returning to the same article) or how “difficult” the article might be (as a measure of the initial OCR accuracy)? Let’s look at a few options.
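To make the idea concrete, here is one possible way of folding those factors into a single ranking score. It is purely a sketch. The field names, the 30 day half-life and the way the factors are multiplied together are all assumptions for illustration, not anything Trove implements.

```python
import math
from dataclasses import dataclass


@dataclass
class Corrector:
    name: str
    lines_corrected: int
    days_since_last: int       # recency: days since the last correction
    repeat_edit_ratio: float   # share of edits that revisit an article, 0..1
    mean_ocr_accuracy: float   # mean initial OCR accuracy of edited articles, 0..1


def score(user, half_life_days=30.0):
    """Hypothetical composite score; higher ranks higher."""
    volume = math.log10(1 + user.lines_corrected)            # dampen huge counts
    recency = 0.5 ** (user.days_since_last / half_life_days)  # 1.0 if active today
    efficiency = 1.0 - user.repeat_edit_ratio                 # fewer return visits
    difficulty = 1.0 - user.mean_ocr_accuracy                 # worse OCR, harder job
    return volume * (1 + recency) * (1 + efficiency) * (1 + difficulty)


# Made-up users: a lapsed heavy corrector and an active, efficient one.
users = [
    Corrector("lapsed_heavy", 1_000_000, 200, 0.5, 0.95),
    Corrector("active_efficient", 50_000, 0, 0.1, 0.70),
]

for u in sorted(users, key=score, reverse=True):
    print(u.name, round(score(u), 2))
```

With weights like these, a regular corrector working on difficult articles can outrank someone with a far higher raw count, which is the kind of incentive the question is getting at. Different weights would give a different leaderboard.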
http://wraggelabs.com/shed/presentations/nla/pages/years_accuracy.html
Could we rank on accuracy or difficulty of article? We have an approximate OCR accuracy rate and we know the exact number of characters corrected.
[Chart: days since last correction, top 100 users; buckets 0, 1, 2, 3, 4, 5, 6, 7, 8-14, 15-30, 31-60, 61-120, 121-364, 365+; horizontal axis 0% to 50%]
How often do users make corrections? This uses the same recency buckets as Google Analytics. Over 40% of the top 100 users return on a daily basis.
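A minimal sketch of the bucketing, using the bucket boundaries from the chart. The function names and the sample data are assumptions for illustration.

```python
from collections import Counter

# Upper bounds of the ranged buckets shown on the chart.
RANGES = [(8, 14), (15, 30), (31, 60), (61, 120), (121, 364)]


def recency_bucket(days):
    """Map days since a user's last correction to a chart bucket label."""
    if days < 0:
        raise ValueError("days must be non-negative")
    if days <= 7:
        return str(days)  # 0 to 7 each get their own bucket
    for low, high in RANGES:
        if days <= high:
            return f"{low}-{high}"
    return "365+"


def recency_distribution(days_list):
    """Share of users in each bucket, as fractions that sum to 1."""
    counts = Counter(recency_bucket(d) for d in days_list)
    total = len(days_list)
    return {bucket: n / total for bucket, n in counts.items()}


# Hypothetical days-since-last-correction values for four users.
print(recency_distribution([0, 0, 3, 400]))
```

Running this separately over the top 100, the top 1000 and all registered users gives the distributions that the recency charts compare.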
[Chart: days since last correction, top 1000 users; buckets 0 to 7, 8-14, 15-30, 31-60, 61-120, 121-364, 365+; horizontal axis 0% to 50%]
The top 1000 aren’t quite as dedicated. There’s a decrease in the immediate recency & an increase in the long term return rates.
[Chart: days since last correction, all registered users; buckets 0 to 7, 8-14, 15-30, 31-60, 61-120, 121-364, 365+; horizontal axis 0% to 50%]
Looking at the pattern for the overall registered user base, 7% of users have made corrections within the past week. Nearly 70% of users haven’t made corrections in the previous 6 months & for nearly 45% of users it’s been more than 12 months since they last made a correction.
[Chart: days since last correction, comparing top 100, top 1000 and overall; buckets 0 to 7, 8-14, 15-30, 31-60, 61-120, 121-364, 365+; horizontal axis 0% to 50%]
So the behaviour for the top 100 users is the opposite to the general behaviour patterns.
48,822
http://www.flickr.com/photos/denial_land/4183422564/
To give a bit of an idea of the patterns, there were 48,822 lines of correction on Christmas Day. Given that an average day will see in the vicinity of 120,000 corrections, that’s around 40% of a normal day, which is quite amazing.
[Chart: days between first correction & last correction (lifespan), per user; vertical axis 0 to 2,000]
Is there a burnout time, when people have had enough of text correction? This measures the time between a user’s first correction & their last correction. There isn’t really a burnout.
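The lifespan measure can be sketched like this, assuming we have a log of (user, date) pairs for each correction. The log format and the names are hypothetical.

```python
from datetime import date


def lifespans(events):
    """Days between each user's first and last correction.

    events: iterable of (user, date) pairs, in any order.
    """
    first, last = {}, {}
    for user, day in events:
        if user not in first or day < first[user]:
            first[user] = day
        if user not in last or day > last[user]:
            last[user] = day
    return {user: (last[user] - first[user]).days for user in first}


# Made-up correction log.
log = [
    ("alice", date(2010, 1, 31)),
    ("alice", date(2010, 1, 1)),
    ("bob", date(2012, 12, 25)),
]
print(lifespans(log))
```

A user with a single day of activity has a lifespan of 0, so a measure like this says nothing about how intense the activity was in between.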
[Chart: articles with corrections, OCR corrections and articles digitised, by year (1803 to 1979); vertical axis 0 to 1,600,000]
There don’t appear to be any specific time periods that people are targeting (eg: First World War etc).
[Charts: two pie charts, “Article types” and “Corrected article types”, by category: Advertising; Article; Detailed lists, results, guides; Family Notices; Literature; Other]
Each article is classified according to the type of article it is: an article, advertisement, births deaths marriages etc. Trove newspapers are mostly articles. Not many are Family Notices.
Once we look at what type of articles are being corrected, there’s some definite activity around family notices.
[Chart: % of each article type corrected (0 to 100), for Advertising; Article; Detailed lists, results, guides; Family Notices; Literature; Other]
If we look at it a bit differently, as a percentage of the total for each article type, nearly 64% of the family notices have had some level of correction.
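The difference from the previous view is the denominator: here the corrected articles of each type are divided by all articles of that type, not by all corrected articles. A small sketch with invented numbers (the function name and the counts are assumptions, not Trove figures):

```python
def correction_coverage(total_by_type, corrected_by_type):
    """Percentage of each article type with at least one correction."""
    return {
        kind: 100 * corrected_by_type.get(kind, 0) / total
        for kind, total in total_by_type.items()
        if total > 0
    }


# Invented counts: a rare type can still have very high coverage.
totals = {"Article": 750, "Family Notices": 10}
corrected = {"Article": 75, "Family Notices": 6}

print(correction_coverage(totals, corrected))
```

In this made-up case, family notices are a tiny part of the collection but most of them have been touched, which is the shape of result the chart describes.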
The future?
How can we use this information to dictate the future direction of Trove newspapers?
[Chart: number of articles over time, 2008-06 to 2012-06; vertical axis 0 to 80,000,000]
We keep adding articles to Trove. This isn’t going to stop. It’s increasing in a linear fashion.
[Chart: number of corrections per month, 2008-06 to 2012-06; vertical axis 0 to 3,000,000]
The number of corrections that are happening each month isn’t increasing at the same rate.
[Chart: number of users making corrections per month, 2008-06 to 2012-06; vertical axis 0 to 4,000]
Likewise the number of users making corrections isn’t increasing in a linear fashion. Are we reaching a plateau in what our existing users are capable of doing?
[Chart: number of corrections per month (0 to 3,000,000) and number of articles (0 to 80,000,000), 2008-06 to 2012-08, with a question mark over the future]
Let’s get back to the situation we faced at the start of the project. What’s going to happen over the next couple of years into the future? If we keep on putting more & more pages up - what happens when our correctors can’t keep up?
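One way to put a number on “can’t keep up” is to track monthly corrections relative to the size of the collection. The sketch below uses invented figures and a hypothetical function name. It is an illustration of the idea, not a metric the Library reports.

```python
def corrections_per_thousand_articles(monthly_corrections, total_articles):
    """Monthly corrections per 1,000 articles in the collection that month."""
    if len(monthly_corrections) != len(total_articles):
        raise ValueError("both series must cover the same months")
    return [
        1000 * corrections / articles
        for corrections, articles in zip(monthly_corrections, total_articles)
    ]


# Invented series: corrections stay flat while the collection keeps growing.
corrections = [2_000_000, 2_000_000, 2_000_000]
articles = [40_000_000, 50_000_000, 80_000_000]

print(corrections_per_thousand_articles(corrections, articles))
```

If corrections plateau while articles keep being added, a ratio like this falls month after month, even though the raw correction count looks healthy.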
Search
OCR
Correction
If articles keep getting added & a corresponding number of users aren’t joining or correcting, will search slowly become less effective?
Search
OCR
Correction
Do we need to improve OCR through automated means, such as improvements in OCR technology or general text pattern analysis?
Search
OCR
Correction
Or do we improve manual corrections through marketing, promotion, incentives? Do we need to change our API to allow write access so machines could programmatically correct text?
http://wraggelabs.com/shed/presentations/nla/pages/years_accuracy.html
Do we redesign the interface to highlight articles that have a low correction level? For instance, do we concentrate on years around 1880 or 1930 and not so much the years surrounding the First World War?
https://twitter.com/paulhagon/status/122846665722957826
How can we get our passionate users doing high value tasks? Would other crowdsourcing activities like geo-spatial references be more valuable? Could they be set up doing specific tasks on uncatalogued material?
We have:
Great content
Passionate users
Family History
We have: Passionate users who want to help us. We have niche interest groups like Family History. Getting all of these factors to align with our strategic directions.
78 million newspaper lines corrected
Do you look at it the same way as you did 20 minutes ago?
@paulhagon
@TroveAustralia