access 2011: big data in libraries
TRANSCRIPT
BIG DATA
"datasets that grow so large that they become
difficult to work with using relational
databases and within a tolerable elapsed time"
BIG DATA IS BIG
LIKE, REALLY BIG
FACEBOOK: 140 BILLION PHOTOS
HUMAN GENOME: 3 BILLION BASE PAIRS
GOOGLE: 50 BILLION WEB PAGES
WORLDCAT: 1.5 BILLION ITEM RECORDS
NOT REALLY
EUROPEANA: 20 MILLION (715K / COUNTRY)
LIBRARY OF CONGRESS: 1.9 MILLION
CANADIANA: 1 MILLION
LIBRARY AND ARCHIVES CANADA: 3.5 MILLION (ARCHIVAL DESCRIPTIONS)
BIG DATA IS COMPLICATED
1966
1976
≠
≠
NOT REALLY
ಠ_ಠ
SCALABILITY
● ICA-AtoM (LAMP)
● BENCHMARK 3.5M RECORDS (current largest: < 100K)
● 100% OPEN SOURCE SOFTWARE
● COMMODITY HARDWARE
CAN WE DO IT?
WRITE SPEED
READ SPEED
WRITE MEMORY
READ MEMORY
NOSQL vs. SQL (a.k.a. ODM vs. ORM)
● 4x - 10x FASTER
● 50% - 90% LESS MEMORY
RELATIONAL DATABASES SCALE WELL
IF YOUR DATA IS NOT HIERARCHICAL
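One way to see why hierarchy hurts: a minimal sketch, assuming a hypothetical adjacency-list table named `description` (not ICA-AtoM's actual schema) and using SQLite in place of MySQL. Fetching the ancestors of one record costs one query per level of the tree, so deep archival hierarchies multiply the read load.

```python
import sqlite3


def build_db():
    """Create an in-memory table with a four-level archival hierarchy (illustrative data)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE description ("
        "id INTEGER PRIMARY KEY, parent_id INTEGER, title TEXT)"
    )
    rows = [
        (1, None, "Fonds"),
        (2, 1, "Series"),
        (3, 2, "File"),
        (4, 3, "Item"),
    ]
    conn.executemany("INSERT INTO description VALUES (?, ?, ?)", rows)
    return conn


def ancestors(conn, record_id):
    """Walk up the tree, one query per level.

    Returns the titles from root to record and the number of queries issued.
    """
    titles = []
    queries = 0
    current = record_id
    while current is not None:
        row = conn.execute(
            "SELECT parent_id, title FROM description WHERE id = ?",
            (current,),
        ).fetchone()
        queries += 1
        if row is None:
            break
        titles.append(row[1])
        current = row[0]
    titles.reverse()
    return titles, queries


conn = build_db()
path, query_count = ancestors(conn, 4)
print(path, query_count)
```

A document store can keep the ancestor path inside each record, which trades extra work on write for a single read; whether that trade pays off depends on the workload.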
SOLR SCALES WELL
IF YOU HAVE INFINITE RAM
BEWARE THE DOGMA OF SQL
NOSQL IS A VIABLE OPTION
THINK SIDEWAYS: SCALE OUT →
THE CLOUD IS A LIE
“big data is less about size, and more about
freedom”
open source tools + distributed design = new opportunities