uses of library collections

Context and collections Ben O’Steen, British Library Labs @benosteen

Upload: benosteen

Post on 12-Apr-2017



British Library Labs

Getting to the heart of it

British Library Labs works with researchers on their specific problems, trying to assess how widely each problem is felt.

With their help, we talk to communities of researchers and try to pinpoint what they need as opposed to what they think they need to ask us.

One theme keeps appearing:

All projects to date would have been far easier if all “items” were accessible and citable (in a way that a computer can follow).

Impact?

Hard to measure, but:

- 13-20 million hits on average every month, over 500,000,000 hits to date.

- Over 450,000 tags added by volunteers and machine algorithms.

- Iterative crowdsourcing is key to making the collection more useful to more people.

Iterative crowdsourcing? (The term is borrowed from Mia Ridge.)

1. Crowdsource broad facts and subcollections of related items emerge.

2. No 'one-size-fits-all': Subcollections allow for more focussed curation.

GOTO 1
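
The loop above can be sketched in a few lines of Python. This is only an illustration of the idea: the item identifiers and tags are invented, and `group_by_tag` is a hypothetical helper, not part of any British Library tooling.

```python
from collections import defaultdict

def group_by_tag(tags_by_item):
    """Invert {item: set of tags} into {tag: set of items}."""
    subcollections = defaultdict(set)
    for item, tags in tags_by_item.items():
        for tag in tags:
            subcollections[tag].add(item)
    return dict(subcollections)

# Pass 1: volunteers add broad tags, and subcollections emerge.
# (Invented example data.)
broad_tags = {"img1": {"map"}, "img2": {"map"}, "img3": {"portrait"}}
subcollections = group_by_tag(broad_tags)

# Pass 2: a more focussed task is run only on the "map" subcollection.
map_tags = {item: set() for item in subcollections["map"]}
map_tags["img1"].add("georeferenced")  # one volunteer contribution
finer_subcollections = group_by_tag(map_tags)
```

Each pass produces smaller, more specific groups that can be given their own curation task, which is the "GOTO 1" step.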

Georeferencing - http://bl.uk/maps

Presentation shapes perception

“On The Road”, Jack Kerouac

(via http://www.openculture.com/2007/08/on_the_road_the_original_scroll.html)

David Normal - http://www.davidnormal.com/

Burning Man Festival

David Normal created light boxes around the Burning Man, using the British Library’s Flickr images

“Crossroads of Curiosity” (20th June -> November, 2015)

But how can anyone find anything useful?

John Cooper, https://www.flickr.com/photos/atomicshed/2436324958 CC-BY-NC-ND 2.0

Infancy of understanding

Large-scale analysis of text is evolving but young.

Exasperating situation where ‘black boxes’ of algorithms are used to draw conclusions.

http://www.scottbot.net/HIAL/?p=41271

“Black Boxes”: a misnomer

It is legitimate and useful to use code that you could not write.

It is not legitimate to simply believe the ‘label’ on the side of the box.

E.g. “Sentiment Analysis” is often nothing of the sort.

Quoting Scott Weingart: (emphasis mine)

● Do sentiment analysis algorithms agree with one another enough to be considered valid?

● Do sentiment analysis results agree with humans performing the same task enough to be considered valid?

● Is Jockers’ instantiation of aggregate sentiment analysis validly measuring anything besides random fluctuations?

● Is aggregate sentiment analysis, by human or machine, a valid method for revealing plot arcs?

● If aggregate sentiment analysis finds common but distinct patterns and they don’t seem to map onto plot arcs, can they still be valid measurements of anything at all?

● Can a subjective concept, whether measured by people or machines, actually be considered invalid or valid?

(again from http://www.scottbot.net/HIAL/?p=41271)
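
The first two questions in the quoted list are, in effect, questions about agreement between raters. One common way to put a number on that is Cohen's kappa, which corrects raw agreement for the agreement expected by chance. The sketch below is a minimal illustration and is not taken from Weingart's post: the labels are made up, and `cohen_kappa` is a hypothetical helper name.

```python
from collections import Counter

def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two equal-length sequences of categorical labels."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(labels_a)
    # Fraction of items on which the two raters gave the same label.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Agreement expected if each rater labelled at random with their own label frequencies.
    counts_a = Counter(labels_a)
    counts_b = Counter(labels_b)
    expected = sum(counts_a[k] * counts_b[k] for k in counts_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used one identical label throughout
    return (observed - expected) / (1 - expected)

# Made-up labels for eight sentences from two hypothetical sentiment tools.
tool_a = ["pos", "pos", "neg", "neg", "pos", "neg", "pos", "neg"]
tool_b = ["pos", "neg", "neg", "neg", "pos", "pos", "pos", "neg"]
kappa = cohen_kappa(tool_a, tool_b)
```

Here the tools agree on six of eight sentences (75%), but half of that agreement would be expected by chance, so kappa comes out at 0.5. The same function can compare a tool against human labels, which is the second question in the list.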

“I am interested in travel accounts in Europe during the 19th Century”

2013 Competition winners - http://labs.bl.uk/Ideas+for+Labs

Pieter Francois

Bias in digitisation

The tool was made to give a statistically valid sample.

Because so little has been digitised, it showed how skewed the digital corpus is compared with the overall holdings.

Allen B. Riddell in “Where are the novels?”* estimates that using HathiTrust’s corpus:

“... about 58%—somewhere between 47% and 68%—of the 2,903 novels [all publications in English between 1800 and 1836] have publicly accessible scans.”

* (2012) https://ariddell.org/where-are-the-novels.html
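
An estimate like this comes with a range because it is drawn from a sample rather than a full count. As a rough illustration of where such a range can come from, the sketch below computes a Wilson score interval for a proportion. It is not a reconstruction of Riddell's method, and the sample counts are invented for the example.

```python
import math

def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a proportion (z=1.96 gives roughly 95%)."""
    if n <= 0 or not 0 <= successes <= n:
        raise ValueError("need n > 0 and 0 <= successes <= n")
    p = successes / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return centre - half_width, centre + half_width

# Invented sample: 50 of 100 randomly chosen titles have an accessible scan.
low, high = wilson_interval(50, 100)
```

With these invented numbers the interval runs from roughly 40% to 60%. A larger sample narrows it.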

In Summary:

- Context about how a digitised image came to be and why it was scanned is both crucial to understand and sometimes crucial to hide.

- aka Opening up large collections brings its own issues.

- Presentation shapes perception.

- Too much trust in black-box algorithms, like search engines or social feed suggestions.

- So little of our history is online that there is a natural bias. The gaps are being filled in with less credible sources.

- It still might have happened even if you cannot google it, and vice versa!