data and text mining: the search for unknown knowns
Data and text mining: the search for unknown knowns. Geoffrey Bilder, UKSG, 2007.
-
"Reports that say that something hasn't happened are always interesting to me, because as we know, there are known knowns; there are things we know we know. We also know there are known unknowns; that is to say we know there are some things we do not know. But there are also unknown unknowns -- the ones we don't know we don't know."
-
The Mining Metaphor
-
Gold Mining
-
Diamond Mining
-
Data Mining
-
Data Mining: What it isn't
-
Information Retrieval
-
Information Extraction
-
Information Analysis
-
Information Retrieval + Information Extraction + Information Analysis
-
Data Mining: new, previously unknown information
-
And so what is text data mining?
-
Text Mining
-
Information Retrieval + Information Extraction + Information Analysis
-
The crucial question for publishers is: if hiding information in unstructured text is a problem, then shouldn't we be exploring new ways to publish?
-
So how did we get here?
-
The word tobacco originates from the Taino Indians.
There is no I in the word Team.
The book captured the zeitgeist of the time.
I am sure that I turned the gas off.
-
Semantic Web Light
-
But we can do more...
-
The web as a database
-
The Relational Model
Title       Author             ISBN-13         Publisher
Labyrinths  Jorge Luis Borges  978-0811200127  New Directions
Hopscotch   Julio Cortazar     978-0394752846  Pantheon
The Aleph   Jorge Luis Borges  978-0140286809  Penguin
...         ...                ...             ...
-
Rows represent things
-
Columns are properties
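A minimal Python sketch of the same idea, using the three books from the slide: each row becomes one object (a thing) and each column becomes a field (a property). The `Book` class and the helper names `column` and `column_names` are illustrative assumptions, not part of the talk.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class Book:
    """One row of the table: a single thing with four properties."""
    title: str
    author: str
    isbn13: str
    publisher: str


# The three rows shown on the slide.
BOOKS = [
    Book("Labyrinths", "Jorge Luis Borges", "978-0811200127", "New Directions"),
    Book("Hopscotch", "Julio Cortazar", "978-0394752846", "Pantheon"),
    Book("The Aleph", "Jorge Luis Borges", "978-0140286809", "Penguin"),
]


def column_names(row_type):
    """Columns are properties: list the property names of a row type."""
    return [f.name for f in fields(row_type)]


def column(rows, name):
    """Read one property (column) across every thing (row)."""
    return [getattr(row, name) for row in rows]


print(column_names(Book))
print(column(BOOKS, "author"))
```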
-
The thing's property
The book   has an author   Jorge Luis Borges
Subject    Predicate       Object
-
The book   has an author   Jorge Luis Borges
Subject    Predicate       Object
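One way to picture the step from table to statements is to rewrite a single row as subject-predicate-object triples. This is only a sketch: the function `row_to_triples` and the subject label `"book:978-0140286809"` are hypothetical names chosen for illustration.

```python
def row_to_triples(subject, row):
    """Turn one table row into (subject, predicate, object) triples.

    The subject is the thing the row describes, each column name becomes
    a predicate, and each cell value becomes an object.
    """
    return [(subject, predicate, obj) for predicate, obj in row.items()]


# One row from the slide's table, keyed by column name.
row = {
    "title": "The Aleph",
    "author": "Jorge Luis Borges",
    "isbn13": "978-0140286809",
    "publisher": "Penguin",
}

# "book:978-0140286809" is a made-up local identifier for this row.
triples = row_to_triples("book:978-0140286809", row)

for triple in triples:
    print(triple)
```

Each printed tuple reads like the sentence on the slide: the book (subject) has an author (predicate) Jorge Luis Borges (object).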
-
http://www.amazon.com/isbn/978-0140286809   has an author   http://www.wikipedia.com/borges
-
Journal A, Journal B, Wiki, Blog, Personal Website, OPAC
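Once things are named by URLs, statements published by different sources can be pooled, because they refer to the same identifiers. The sketch below assumes three of the sources named on the slide each publish a few triples; the contents of `journal_a`, `wiki`, and `opac` are invented for illustration, and only the two URLs come from the earlier slide.

```python
BOOK = "http://www.amazon.com/isbn/978-0140286809"
BORGES = "http://www.wikipedia.com/borges"

# Hypothetical statements, one small set per source.
journal_a = {(BOOK, "has an author", BORGES)}
wiki = {(BORGES, "name", "Jorge Luis Borges")}
opac = {(BOOK, "title", "The Aleph"), (BOOK, "has an author", BORGES)}


def merge(*sources):
    """Pool triples from every source; identical statements collapse."""
    merged = set()
    for source in sources:
        merged |= source
    return merged


def describe(graph, subject):
    """Return every (predicate, object) pair known about one subject."""
    return sorted((p, o) for s, p, o in graph if s == subject)


graph = merge(journal_a, wiki, opac)

# Follow the link from the book to its author, then to the author's name.
author_names = [
    name
    for s, p, author in graph
    if s == BOOK and p == "has an author"
    for s2, p2, name in graph
    if s2 == author and p2 == "name"
]

print(describe(graph, BOOK))
print(author_names)
```

No single source states the author's name for the book; it only appears after the sets are merged and the shared URL is followed.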
-
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
SELECT DISTINCT ?name
WHERE {
  ?x rdf:type foaf:Person .
  ?x foaf:name ?name
}
ORDER BY ?name

http://api.ingentaconnect.com/content/cabi/nrr/latest?format=rss
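To show what the query above asks for, here is a toy Python stand-in that evaluates the same two patterns over an in-memory list of triples. It is not a SPARQL engine, the `ex:` subjects and the sample data are invented, and the constants assume the standard RDF and FOAF namespace IRIs.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
FOAF_PERSON = "http://xmlns.com/foaf/0.1/Person"
FOAF_NAME = "http://xmlns.com/foaf/0.1/name"

# Invented sample data; "ex:..." subjects are placeholders.
TRIPLES = [
    ("ex:a", RDF_TYPE, FOAF_PERSON),
    ("ex:a", FOAF_NAME, "Julio Cortazar"),
    ("ex:b", RDF_TYPE, FOAF_PERSON),
    ("ex:b", FOAF_NAME, "Jorge Luis Borges"),
    ("ex:c", FOAF_NAME, "New Directions"),     # named, but not typed as a Person
    ("ex:d", RDF_TYPE, FOAF_PERSON),
    ("ex:d", FOAF_NAME, "Jorge Luis Borges"),  # repeated name, dropped by DISTINCT
]


def person_names(triples):
    """Mimic: SELECT DISTINCT ?name WHERE { ?x a Person . ?x name ?name } ORDER BY ?name."""
    # Pattern 1: every ?x with rdf:type foaf:Person.
    people = {s for s, p, o in triples if p == RDF_TYPE and o == FOAF_PERSON}
    # Pattern 2: every foaf:name of those same ?x; a set gives DISTINCT.
    names = {o for s, p, o in triples if p == FOAF_NAME and s in people}
    # ORDER BY ?name, approximated here by plain string sorting.
    return sorted(names)


print(person_names(TRIPLES))
```

The shared variable `?x` is what joins the two patterns: a name is returned only if the thing it belongs to is also typed as a person.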
-
The Early Modern Internet
-
Data Mining =
Information retrieval +
Information extraction +
Information analysis
...
With the goal of discovering new, previously unknown information
-
Data Mining =
Information retrieval +
Information extraction +
Information analysis
...
Text Data Mining =
Complex data extraction layer +
data mining
With the goal of discovering new, previously unknown information
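The decomposition above can be sketched as three small functions chained together. This is a deliberately naive illustration, not a real text-mining system: the sentences in `DOCS` are made up from the book table earlier in the talk, and the regular expression plays the part of the "complex data extraction layer" only because the sentences all share one fixed shape.

```python
import re
from collections import Counter

# Invented unstructured text; the last sentence is noise.
DOCS = [
    "Labyrinths by Jorge Luis Borges was published by New Directions.",
    "Hopscotch by Julio Cortazar was published by Pantheon.",
    "The Aleph by Jorge Luis Borges was published by Penguin.",
    "There is no I in the word Team.",
]

PATTERN = re.compile(
    r"^(?P<title>.+?) by (?P<author>.+?) was published by (?P<publisher>.+?)\.$"
)


def retrieve(docs, keyword):
    """Information retrieval: keep the documents that mention the keyword."""
    return [d for d in docs if keyword.lower() in d.lower()]


def extract(docs):
    """Information extraction: pull structured records out of free text."""
    return [m.groupdict() for d in docs if (m := PATTERN.match(d))]


def analyse(records):
    """Information analysis: count how many titles each author has."""
    return Counter(r["author"] for r in records)


records = extract(retrieve(DOCS, "published"))
counts = analyse(records)
print(counts.most_common())
```

Most of the effort sits in `extract`, which has to recover structure that was never published as structure in the first place.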
-
Why do we publish text?
-
Thank you
-
Standby Slide
Text Mining vs Data Mining
Assumption that text and data have to be two separate things.
???
The OTMI repository (on http://www.nature.com/) currently hosts 2 years (2005, 2006) worth of content for 5 journals:
* Nature (nature)
* Nature Genetics (ng)
* Nature Reviews Drug Discovery (nrd)
* Nature Structural
SKOS = Simple Knowledge Organization System
Early modern period: about 342 years from Gutenberg to the French Revolution
A span of 365 years from Gutenberg to the steam press
Elsevier (~1580)
OUP (~1586)
Incunabula considered to end ~1501