
Linked Data

Linked Open Communism:
Better discovery through data dis- and re-aggregation

--- or ---

How I learned to shut up about linked data
AND BUILD SOMETHING!!

Presented at code4lib2013
by Corey A Harper
2013-02-13

Linked Data

Metadata as a Graph

Typed things, named by URIs

The relationships between those things, also built on URIs

Ease of integration *across* data sources: merging graphs
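A minimal sketch of what "merging graphs" amounts to. Plain Python tuples stand in for RDFLib terms here, and every example.org URI and data value is invented for illustration; only the rdf, rdfs and foaf URIs are real vocabulary terms.

```python
# A triple is (subject, predicate, object); a graph is a set of triples.
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"
FOAF_PERSON = "http://xmlns.com/foaf/0.1/Person"

# Hypothetical URI naming one "thing".
person = "http://example.org/entity/paulette-goddard"

# Source A: a finding aid types the thing and gives it a label.
archive_graph = {
    (person, RDF_TYPE, FOAF_PERSON),
    (person, RDFS_LABEL, "Goddard, Paulette"),
}

# Source B: an outside dataset makes another statement about the same URI.
external_graph = {
    (person, RDF_TYPE, FOAF_PERSON),
    (person, "http://example.org/vocab/occupation", "Actor"),
}

# Merging is set union: shared URIs line up and duplicate triples collapse.
merged = archive_graph | external_graph


def describe(graph, subject):
    """Return every (predicate, object) pair known about one subject."""
    return sorted((p, o) for s, p, o in graph if s == subject)


for predicate, obj in describe(merged, person):
    print(predicate, "->", obj)
```

Because both sources use the same URI for the same thing, no record matching is needed at merge time; that is the integration benefit the slide points at.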

Refine

ViewShare

Context

Narrative

Story telling

Context

The archive's story,

The library's story,

but also

Users' stories

Adding context through recombinant metadata

Backing Away from Evangelism...

Image NOT used by permission. Probably a violation of several copyrights & trademarks.

Aside on metaphors

Image by Jonestown Institute via Wikimedia Commons
http://en.wikipedia.org/wiki/File:Jonestown_entrance.jpg

Aside on metaphors

Image by Joe Mabel via Wikimedia Commons.
http://en.wikipedia.org/wiki/File:Furthur_05.jpg

Premise

Context is so central

And yet our Controlled Vocabs
Are nearly gone

Because the interfaces to them
were broken

The Death of Browse

Next-Gen Discovery Systems don't make use of Authority Control

Browse was/is broken as a UI Design

Rich data in Authorities, disconnected from narrative, context, search

Richer Authority type data outside libraries...

Linked Data Based UI Design
For Boutique Collections

Public Domain image of Paulette Goddard via Wikimedia Commons.
http://en.wikipedia.org/wiki/File:Paulette_Goddard-publicity.JPG

A research leave

Initial Scope

Public Domain image via Wikimedia Commons.
http://en.wikipedia.org/wiki/File:Symbol-hammer-and-sickle.svg

Linked Open Communism

Dis-aggregate EAD records into Collections & Components

Create a broad set of resource types

Extract key entities from EAD: People, Places, Topics, Corporate Bodies

Incorporate additional data about entities

Put this in Blacklight

Load MARC & other data
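The first and third steps above (splitting an EAD record into a collection plus components, and pulling out key entities) can be sketched with the standard library. This is an illustration under stated assumptions, not the project's code: the sample XML is invented, it has no namespace (real EAD 2002 files often declare one), and the function name and type labels are hypothetical.

```python
import re
import xml.etree.ElementTree as ET

# A tiny, made-up EAD fragment for illustration only.
SAMPLE_EAD = """
<ead>
  <archdesc level="collection">
    <did><unittitle>Example Union Records</unittitle></did>
    <controlaccess>
      <persname>Doe, Jane</persname>
      <corpname>Example Workers Union</corpname>
      <geogname>Gastonia (N.C.)</geogname>
      <subject>Strikes and lockouts</subject>
    </controlaccess>
    <dsc>
      <c01 level="series">
        <did><unittitle>Correspondence</unittitle></did>
        <c02 level="file">
          <did><unittitle>Letters from Jane Doe</unittitle></did>
        </c02>
      </c01>
    </dsc>
  </archdesc>
</ead>
"""

# EAD access-point elements mapped to the entity types named on the slide.
ENTITY_TAGS = {
    "persname": "Person",
    "geogname": "Place",
    "subject": "Topic",
    "corpname": "CorporateBody",
}

# Components are <c> or numbered <c01> through <c12>.
COMPONENT_TAG = re.compile(r"c(\d\d)?")


def disaggregate(ead_xml):
    """Split one EAD document into a collection, its components, and entities."""
    root = ET.fromstring(ead_xml.strip())
    archdesc = root.find("archdesc")
    collection = {
        "title": archdesc.findtext("did/unittitle"),
        "level": archdesc.get("level"),
    }
    components = [
        {"title": elem.findtext("did/unittitle"), "level": elem.get("level")}
        for elem in archdesc.iter()
        if COMPONENT_TAG.fullmatch(elem.tag)
    ]
    entities = [
        {"type": ENTITY_TAGS[elem.tag], "label": elem.text.strip()}
        for elem in archdesc.iter()
        if elem.tag in ENTITY_TAGS and elem.text
    ]
    return collection, components, entities


collection, components, entities = disaggregate(SAMPLE_EAD)
print(collection["title"], len(components), len(entities))
```

Each returned dictionary could then become its own record, which is what lets components and entities be indexed and linked separately from the finding aid they came from.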

Technology Stack - UI

Vanilla Blacklight

Minor SOLR Index Tweaks / Additions

Minor View Hacks

pre-beta

Only on localhost right now

Technology Stack - Support Tools

Gadget!

Technology Stack - Backend

Python & RDFLib

4Store & HTTP4Store

Sunburnt

FuzzyWuzzy

(Lots of other Python modules....)

FuzzyWuzzy & SeatGeek!

Fuzzy Wuzzy: Awesome Library from SeatGeek
https://github.com/seatgeek/fuzzywuzzy
http://seatgeek.com/blog/dev/fuzzywuzzy-fuzzy-string-matching-in-python
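For readers who have not used this kind of matching, here is a rough sketch of the idea. It deliberately uses the standard library's difflib rather than fuzzywuzzy itself, so it runs without extra installs; the token-sorting step is similar in spirit to fuzzywuzzy's token_sort_ratio, but the function names, threshold, and sample strings below are assumptions for illustration.

```python
from difflib import SequenceMatcher


def normalize(text):
    """Lowercase, drop punctuation, and sort tokens so word order is ignored."""
    cleaned = "".join(ch if ch.isalnum() else " " for ch in text.lower())
    return " ".join(sorted(cleaned.split()))


def similarity(a, b):
    """Return a 0-100 similarity score between two strings."""
    ratio = SequenceMatcher(None, normalize(a), normalize(b)).ratio()
    return round(100 * ratio)


def best_match(query, candidates, threshold=80):
    """Return (candidate, score) for the best candidate, or (None, score)."""
    if not candidates:
        return None, 0
    score, candidate = max((similarity(query, c), c) for c in candidates)
    return (candidate, score) if score >= threshold else (None, score)


# An inverted name heading still matches its direct-order form.
print(best_match("Goddard, Paulette", ["Paulette Goddard", "Charlie Chaplin"]))
```

The threshold is the main tuning knob: set it too low and unrelated strings link up, too high and legitimate variants are missed.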

Data Flow

Object Oriented Python

Classes: Collections, Components, Entities

Class methods:

makeGraph

makeSolr

to4store

output (turtle, rdf/xml, etc)
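A hedged sketch of how one such class might hang together. The method names come from the slide; everything else (the URI base, the Solr field names, the N-Triples-only output, the plain-tuple graph) is an assumption made for illustration and is not taken from the project's code. to4store is omitted because it needs a running 4Store server.

```python
import json

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"
BASE = "http://example.org/"  # hypothetical URI base


class Entity:
    """One extracted entity that can render itself as triples or a Solr doc."""

    def __init__(self, slug, label, kind):
        self.uri = BASE + "entity/" + slug
        self.label = label
        self.kind = kind
        self.triples = []

    def makeGraph(self):
        """Build the entity's triples (plain tuples standing in for RDFLib)."""
        self.triples = [
            (self.uri, RDF_TYPE, BASE + "vocab/" + self.kind),
            (self.uri, RDFS_LABEL, self.label),
        ]
        return self.triples

    def makeSolr(self):
        """Build a flat document for indexing; field names are hypothetical."""
        return {"id": self.uri, "format": self.kind, "title_display": self.label}

    def output(self):
        """Serialize the triples as N-Triples lines."""
        lines = []
        for s, p, o in self.triples or self.makeGraph():
            # URIs go in angle brackets; anything else is a quoted literal.
            obj = "<%s>" % o if o.startswith("http") else json.dumps(o)
            lines.append("<%s> <%s> %s ." % (s, p, obj))
        return "\n".join(lines)


entity = Entity("jane-doe", "Doe, Jane", "Person")
print(entity.output())
print(entity.makeSolr())
```

Keeping the graph and Solr renderings on the same object means a change to an entity can be pushed to both the triple store and the index from one place.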

Performance Benchmarks

EAD -> SOLR:
~26 hrs to parse 1600 EAD, push 385k records to SOLR

DBPedia matching:
Cross-reference label variants for entities against 9.4 million DBPedia labels (labels-en.ttl).

Should be using Hadoop

Other ideas?

Re-solr-izing entities: ~10 minutes
Pulls local copy of dbpedia data from 4store
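On the "Other ideas?" question for the DBPedia matching step, one possibility is to build a dictionary keyed on normalized labels once, so that each label variant becomes a single lookup instead of a scan over all 9.4 million labels. The sketch below is an assumption-laden illustration, not the project's method: it assumes one N-Triples-style label statement per line, handles only exact matches after normalization, and does not decode escape sequences in labels. The function names are hypothetical.

```python
import re

# Assumed line shape: <uri> <rdfs:label> "Label"@en .
LABEL_LINE = re.compile(
    r'^<(?P<uri>[^>]+)> '
    r'<http://www\.w3\.org/2000/01/rdf-schema#label> '
    r'"(?P<label>(?:[^"\\]|\\.)*)"@en \.$'
)


def normalize_label(label):
    """Lowercase and collapse commas, periods, and runs of whitespace."""
    return " ".join(label.lower().replace(",", " ").replace(".", " ").split())


def build_index(lines):
    """Map each normalized label to the first URI seen carrying it."""
    index = {}
    for line in lines:
        match = LABEL_LINE.match(line.strip())
        if match:
            key = normalize_label(match.group("label"))
            index.setdefault(key, match.group("uri"))
    return index


def match_entity(variants, index):
    """Return the URI for the first label variant found in the index."""
    for variant in variants:
        uri = index.get(normalize_label(variant))
        if uri:
            return uri
    return None


SAMPLE_LINES = [
    '<http://dbpedia.org/resource/Loray_Mill_Strike> '
    '<http://www.w3.org/2000/01/rdf-schema#label> "Loray Mill Strike"@en .',
    '<http://dbpedia.org/resource/Paulette_Goddard> '
    '<http://www.w3.org/2000/01/rdf-schema#label> "Paulette Goddard"@en .',
]

index = build_index(SAMPLE_LINES)
print(match_entity(["Goddard, Paulette", "Paulette Goddard"], index))
```

Against the real file, `build_index` could be fed an open file handle so lines are streamed rather than loaded at once; memory use for a 9.4-million-entry dictionary would still need checking.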

4Store

Provenance-ish

Naming of sub-graphs

Default context is everything

First EAD cut produced ~4m triples

Easy to delete whole graphs, or individual triples

SPARQL-able: good for stats
992 DBPedia links for 6331 Entities
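The named sub-graph behaviour described above can be modelled in a few lines. This is a toy in-memory stand-in, not 4Store or its API: the class, the graph names, and the choice of owl:sameAs as the linking predicate are all assumptions for illustration.

```python
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"


class QuadStore:
    """Toy store: each named graph holds its own set of triples."""

    def __init__(self):
        self.graphs = {}

    def add(self, graph_name, triple):
        self.graphs.setdefault(graph_name, set()).add(triple)

    def default_context(self):
        """The default context is the union of every named graph."""
        merged = set()
        for triples in self.graphs.values():
            merged |= triples
        return merged

    def delete_graph(self, graph_name):
        """Drop a whole named graph, e.g. one source's contribution."""
        self.graphs.pop(graph_name, None)

    def delete_triple(self, graph_name, triple):
        self.graphs.get(graph_name, set()).discard(triple)

    def count_predicate(self, predicate):
        """Simple statistic: how many triples use a given predicate."""
        return sum(1 for _s, p, _o in self.default_context() if p == predicate)


store = QuadStore()
ead_graph = "http://example.org/graph/ead/collection-1"   # hypothetical
links_graph = "http://example.org/graph/dbpedia-links"    # hypothetical
entity = "http://example.org/entity/gastonia-strike"      # hypothetical

store.add(ead_graph, (entity, RDFS_LABEL, "Textile Workers' Strike, Gastonia, N.C., 1929."))
store.add(links_graph, (entity, OWL_SAME_AS, "http://dbpedia.org/resource/Loray_Mill_Strike"))

print(store.count_predicate(OWL_SAME_AS))  # links before deletion
store.delete_graph(links_graph)            # remove every link in one step
print(store.count_predicate(OWL_SAME_AS))  # links after deletion
```

Naming each graph after its source is what makes the "provenance-ish" bookkeeping cheap: re-running one matching pass only requires dropping and reloading that one graph.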

https://github.com/chrpr/ead2rdf2solr

Image by wallygrom via flickr
http://www.flickr.com/photos/33037982@N04/3669790240/

Future Steps: Code to Incorporate

Components: Inheritance of access points via fuzzywuzzy string match to unittitle

matched about 10%

Extend to cross-EAD match via 4Store

VIAF, id.loc, fast reconciliation

Override configs for DBPedia matching

DBPedia Override Examples

Germany. |t Treaties, etc. |g Soviet Union, |d 1939 Aug. 23.
http://dbpedia.org/page/Treaty_of_Non-Aggression_between_Germany_and_the_Soviet_Union

Textile Workers' Strike, Gastonia, N.C., 1929.
http://dbpedia.org/page/Loray_Mill_Strike
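An override config for cases like these two could be as simple as a lookup table consulted before any automatic matching. The two heading/URL pairs are the ones shown on the slide; the table name, the function, and the whitespace handling are hypothetical.

```python
# Hand-curated overrides: heading string -> DBPedia page.
DBPEDIA_OVERRIDES = {
    "Germany. |t Treaties, etc. |g Soviet Union, |d 1939 Aug. 23.":
        "http://dbpedia.org/page/Treaty_of_Non-Aggression_between_Germany_and_the_Soviet_Union",
    "Textile Workers' Strike, Gastonia, N.C., 1929.":
        "http://dbpedia.org/page/Loray_Mill_Strike",
}


def resolve(heading, automatic_matcher=None):
    """Prefer a manual override; otherwise fall back to automatic matching."""
    key = " ".join(heading.split())  # collapse stray whitespace
    if key in DBPEDIA_OVERRIDES:
        return DBPEDIA_OVERRIDES[key]
    if automatic_matcher is not None:
        return automatic_matcher(key)
    return None


print(resolve("Textile Workers' Strike, Gastonia, N.C., 1929."))
```

Headings like these show why overrides are needed: the library form of the name shares almost no tokens with the DBPedia title, so string matching alone would not find the link.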

Further Development - Next Steps

EAC-CPF reconciliation, record creation

Possibly relationship to Hydra?

Annotation Interface, DBP Overrides

SOLR Relevancy Ranking

SOLR-Marc Modifications

Update mechanism

Test with other Datasets (NYPL/NYU/METRO project)

Thanks!

[email protected]
@chrpr