© Tefko Saracevic 1
Information Science: Where does it come from and where is it going?
Tefko Saracevic, PhD
School of Communication, Information and Library Studies
Rutgers University
New Brunswick, New Jersey, USA
http://www.scils.rutgers.edu/~tefko
Gutenberg, 1397-1468
Information science: a short definition
“the collection, classification, storage, retrieval, and
dissemination of recorded knowledge treated both as a pure and as an
applied science”
Merriam-Webster
Organization of presentation
1. Big picture – problems, solutions, social place
2. Structure – main areas in research & practice
3. Technology – information retrieval – largest part
4. Information – representation; bibliometrics
5. People – users, use, seeking, context
6. Paradigm split – distancing of areas
7. Relations – librarianship, computer science
8. Digital libraries – whose are they anyhow?
9. Conclusions – big questions for the future
Part 1. The big picture
Problems addressed
Bit of history: Vannevar Bush (1890-1974), writing in 1945,
defined the problem as “... the massive task of making more accessible of a bewildering store of knowledge.”
Problem still with us & growing
… solution
Bush suggested a machine: “Memex ... association of ideas ... duplicate mental processes artificially.”
Technological fix to problem
Still with us: technological determinant
At the base of information science: Problem
Trying to control content in
Information explosion
exponential growth of information artifacts, if not of information itself
PLUS today
Communication explosion
exponential growth of means and ways by which information is communicated, transmitted, accessed, used
technological solution, BUT …
applying technology to solving problems of effective use of information
BUT: from a HUMAN & SOCIAL and not only TECHNOLOGICAL perspective
or a symbolic model
Information
Technology
People
Problems & solutions: SOCIAL CONTEXT
Professional practice AND scientific inquiry related to:
Effective communication of knowledge records - ‘literature’ - among humans in the context of social, organizational, & individual need for and use of information
Taking advantage of modern information technology
or as White & McCain (1998) put it:
“modeling the world of publications with a practical goal of being able to deliver their content to inquirers [users] on demand.”
General characteristics
Interdisciplinarity - relations with a number of fields, some more or less predominant
Technological imperative - driving force, as in many modern fields
Information society - social context and role in evolution - shared with many fields
Part 2. Structure
Composition of the field
Like many fields, information science has different areas of concentration & specialization
They change and evolve over time: grow closer, grow apart, ignore each other more or less, sometimes fight
most importantly different areas…
receive more or less funding & emphasis, producing great imbalances in work & progress
attracting different audiences & fields
this includes vastly different levels of support for research and
huge commercial investments & applications
How to view structure?
By decomposing areas & efforts in research & practice, emphasizing:
Technology
or
Information
or
People
Part 3. Technology
Identified with information retrieval (IR)
by far the biggest effort and investment
international & global
commercial interest large & growing
Information Retrieval – definition & objective
“ IR: ... intellectual aspects of description of information, ... search, ... & systems, machines...”
Calvin Mooers (1919-1994), 1951
How to provide users with relevant information effectively?
For that objective:
1. How to organize information intellectually?
2. How to specify the search & interaction intellectually?
3. What techniques & systems to use effectively?
Streams in IR Res. & Dev.
1. Information science: Services, users, use; Human-computer interaction; Cognitive aspects
2. Computer science: Algorithms, techniques; Systems aspects; evaluation
3. Information industry: Products, services, Web search engines – BIG! Market aspects
Problem: relative isolation – discussed later
IR research
Started in the US through government support & in information science
Now mostly done within computer science
e.g. Special Interest Group on IR, Association for Computing Machinery (SIGIR, ACM)
Gerard Salton, 1927-1995
Contemporary IR research
Spread globally
e.g. major IR research communities emerged in China, Korea, Singapore
Branched outside of information science - “everybody does information retrieval”
search engines, data mining, natural language processing, artificial intelligence, computer graphics …
Testing in IR
Major component of IR; made it strong & affected innovation
Long history – started with Cranfield tests in late 1950’s
Measures – precision & recall based on relevance
Cyril Cleverdon, 1914-1997
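As an illustration of the two measures named above, here is a minimal sketch assuming the usual set-based definitions: precision is the share of retrieved documents that are relevant, and recall is the share of relevant documents that were retrieved. The function name `precision_recall` and the document identifiers are hypothetical, chosen only for this example, and are not taken from the Cranfield tests.

```python
def precision_recall(retrieved, relevant):
    """Compute set-based precision and recall for one query.

    retrieved: document ids returned by the system (hypothetical ids here)
    relevant:  document ids judged relevant for the query
    """
    retrieved = set(retrieved)
    relevant = set(relevant)
    hits = retrieved & relevant  # relevant documents that were retrieved
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical example: 4 documents retrieved, 5 judged relevant, 2 in common.
retrieved_docs = ["d1", "d2", "d3", "d4"]
relevant_docs = ["d2", "d4", "d7", "d8", "d9"]
p, r = precision_recall(retrieved_docs, relevant_docs)
print(f"precision={p:.2f} recall={r:.2f}")
```

Both measures depend on relevance judgments, which is why test collections pair queries with judged documents.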
Text REtrieval Conference (TREC)
Major research, laboratory effort
Started in 1992 to
“support research within the IR community by providing the infrastructure necessary for large-scale evaluation”
Methods: provides large test beds, queries, relevance judgments, comparative analyses
essentially using Cranfield 1960’s methodology
Organized around tracks
various topics – changing over years
TREC impact
International – big impact on creating research communities
Annual conferences: report, exchange results, foster cooperation
Results mostly in reports, available at
http://trec.nist.gov/pubs.html
overviews provided as well
but only a fraction published in journals
Book (2005):
TREC: Experiment and Evaluation in Information Retrieval
Edited by Ellen M. Voorhees and Donna K. Harman
TREC tracks 2007
116 groups from 20 countries
Genomics; Spam; Blog; Question answering; Enterprise; Million query (new); Legal
Previous tracks: ad-hoc (1992-1999); routing (92–97); interactive (94-02); filtering (95-02); cross language (97-02); speech (97-00); Spanish (94-96); video (00-01); Chinese (96-97); query (98-00); and a few more run for two years only
Broadening of IR – sample
ever changing, ever new areas added
Cross language IR (CLIR)
Natural language processing (NLP IR)
Music IR (MIR)
Image, video, multimedia retrieval
Spoken language retrieval
IR for bioinformatics and genomics
Summarization; text extraction
Question answering
Many human-computer interactions
XML IR
Web IR; Web search engines
IR in context – big area for major search engines & newer research
Commercial IR
Search engines based on IR
But added many elaborations & significant innovations
dealing with HUGE number of pages fast
countering spamming & page rank games – adversarial IR - combat of algorithms
adding context for searching
Spread & impact worldwide
about 2000 engines in over 160 countries
English was dominant, but not any more
Commercial IR: brave new world
Large investments & economic sector
hope for big profits, as yet questionable
Leading to proprietary, secret IR
also aggressive hiring of best talent
new commercial research centers in different countries (e.g. MS in China)
Academic research funding is changing
brain drain from academe
Commercial search engines facing many challenges – hiring best talent and causing a brain drain from academe
IR successfully effected:
Emergence & growth of the INFORMATION INDUSTRY
Evolution of IS as a PROFESSION & SCIENCE
Many APPLICATIONS in many fields including on the Web – search engines
Improvements in HUMAN - COMPUTER INTERACTION
Evolution of INTERDISCIPLINARITY
IR has a long, proud history
Part 4. Information
Several areas of investigation:
as basic phenomenon – not much progress; measures such as Shannon's not successful; concentrated on manifestations and effects; no recent progress in this basic research
information representation – large area connected with IR, librarianship; metadata
bibliometrics – structures of literature
What is information?
Intuitively well understood, but formally not well stated
Several viewpoints, models emerged
Shannon: source-channel-destination; signals not content – not really applicable, despite many tries
Cognitive: changes in cognitive structures; content processing & effects
Social: context, situation; information seeking, tasks
Information in information science: Three senses (from narrowest to broadest)
1. Information in terms of decision involving little or no cognitive processing
signals, bits, straightforward data - e.g. inf. theory (Shannon), economics
2. Information involving cognitive processing & understanding
understanding, matching texts, Brookes
3. Information also as related to context, situation, problem-at-hand
USERS, USE, TASK
For information science (including information retrieval):
third, broadest interpretation necessary
Bibliometrics
“… the quantitative treatment of the properties of recorded discourse and behavior pertaining to it.” Fairthorne, 1969
Many quantitative studies & some laws
Bradford’s law, Lotka’s law – regularities
quantity/yield distributions of journals, authors
also related areas:
Scientometrics – covering science in general, not just publications
Informetrics – all information objects
Webmetrics or cybermetrics – using bibliometric techniques to study the web
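To make the idea of a quantity/yield regularity concrete, here is a minimal sketch of Lotka's law in its classic inverse-square form: the number of authors with n publications is roughly the number of single-paper authors divided by n squared. The exponent of 2, the function name `lotka_expected_authors`, and the starting count of 100 authors are assumptions for illustration only; observed exponents vary by field and none of these figures come from the presentation.

```python
def lotka_expected_authors(single_paper_authors, n, exponent=2.0):
    """Expected number of authors with exactly n papers under Lotka's law.

    Assumes the classic form: authors(n) = authors(1) / n ** exponent,
    with exponent close to 2 (an assumption; it differs between fields).
    """
    return single_paper_authors / n ** exponent


# Hypothetical field in which 100 authors published exactly one paper.
for n in range(1, 5):
    expected = lotka_expected_authors(100, n)
    print(f"{n} paper(s): about {expected:.1f} authors")
```

The steep fall-off is the point: a few authors account for a large share of the literature, which is the kind of regularity bibliometric laws describe.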
Part 5. People
Professional services
in organizations – moving toward knowledge management, competitive intelligence
in industry – vendors, aggregators, Internet
Research
user & use studies
interaction studies
broadening to information seeking studies, social context, collaboration
relevance studies
social informatics
User & use studies
Oldest area
covers many topics, methods, orientations
many studies related to IR
e.g. searching, multitasking, browsing, navigation
theoretical & experimental studies on relevance
Branching into Web use studies
quantitative & qualitative studies
emergence of webmetrics
Interaction
Traditional IR model concentrates on matching but not on user side & interaction
Several interaction models suggested
Ingwersen’s cognitive, Belkin’s episode, Saracevic’s stratified model
hard to get experiments & confirmation
Considered key to providing:
basis for better design
understanding of use of systems
Web interactions: a major new area
Information seeking
Concentrates on broader context
not only IR or interaction, but people as they move in life & work
Number of models provided
e.g. Kuhlthau’s information search process, Järvelin’s information seeking
Includes studies of ‘life in the round,’ making sense, information encountering, work life, information discovery
Based on concept of social construction of information
Part 6. Paradigm split in technology - people
Split from early 80’s to date into:
System-centered: algorithms, TREC, search engines; continue traditional IR model
Human-(user)-centered: cognitive, situational, user studies; interaction models, some started in TREC; relevance studies
Human vs. system
Human (user) side:
often highly critical, even one-sided
mantra of implications for design
but does not deliver concretely
System side:
mostly ignores user side & studies
‘tell us what to do & we will’
Issue NOT H or S approach
even less H vs. S
but how can H AND S work together
major challenge for the future
Great separation
IR in computer science
completely technology oriented
VERY international
not aware at all of the other side
SIGIR growing a lot:
2007: subm. 490, accept. 85 (17%)
2006: subm. 399, accept. 74 (19%)
1999: subm. 135, accept. 33 (24%)
IR, user studies, services in information science
mostly people oriented
aware, but participating less with other side
only a few LIS people come to SIGIR, even fewer SIGIR to ASIST, none to ALA
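The SIGIR percentages above follow directly from the submission and acceptance counts. A short sketch of the arithmetic, using only the figures given on the slide (the names `sigir_papers` and `acceptance_rate` are hypothetical, introduced for this example):

```python
# Counts as listed on the slide: year -> (submitted, accepted)
sigir_papers = {2007: (490, 85), 2006: (399, 74), 1999: (135, 33)}


def acceptance_rate(submitted, accepted):
    """Return the acceptance rate as a percentage of submissions."""
    return 100 * accepted / submitted


# Round to whole percentages, as the slide does.
rates = {year: round(acceptance_rate(s, a)) for year, (s, a) in sigir_papers.items()}
for year, rate in rates.items():
    print(f"{year}: {rate}% accepted")
```

Submissions grew much faster than acceptances over the period, which is why the rate falls even as the conference grows.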
Calls vs support
Many calls for user-centered or human-centered design, approaches & evaluation
Number of works discussing it, but few proposing concrete solutions
But: most support for system work
in the digital age support is for digital
Recent attempt at combining two views:
Book: Ingwersen, P. and Järvelin, K. (2005). The Turn: Integration of information seeking and retrieval in context. Springer.
Part 7. Relations, alliances, competition
With a number of fields...
Strongest:
1. Librarianship
2. Computer science
Common grounds
IS & librarianship share:
Social role in information society
Concern with effective utilization of graphic & other types of records
Research problems related to a number of topics
Transfer to & from information retrieval
Differences
IS & librarianship differ in:
Selection & definition of many problems addressed
Theoretical questions & framework
Nature & degree of experimentation
Tools and approaches used
Nature & strength of interdisciplinary relations
One field or two?
Point of many debates
Suggest: TWO fields in strong interdisciplinary relations
Not a matter of “better” or “worse” - matters little
common arguments between many fields
Differences matter in:
problem selection & definition
agenda, paradigms
theory, methodology
practical solutions, systems
Best example: IR & library automation
Which?
Librarianship. Information science
Library and information science
Libraryandinformationscience – Michael Buckland’s suggestion
Information science
Information sciences
Information – as in the “Information School”
IS & computer science
CS primarily about algorithms
IS primarily about information and its users and use
Not in competition, but complementary
Growing number of computer scientists active in IS – particularly in IR and digital libraries
Concentrating on:
advanced IR algorithms & techniques
digital library infrastructure & various domains
human computer interaction
Interaction and IS
Two streams:
computer-human interaction
human-computer interaction
Many studies on:
machine aspects of interaction
human variables in interaction
Problems:
little feedback between the two streams
very hard to evaluate
Web interactions: a major area
Another interdisciplinary area
computer sc., cognitive sc., ergonomics
Part 8. Digital libraries
LARGE & growing area
“Hot” area in R&D
a number of large grants & projects in the US, European Union, & other countries
but “DIGITAL” big & “libraries” small
“Hot” area in practice
building digital collections, hybrid libraries
many projects throughout the world
but in the US funding drying out
Technical problems
Substantial - larger & more complex than anticipated:
representing, storing & retrieving of library objects
particularly if originally designed to be printed & then digitized
operationally managing large collections - issues of scale
dealing with diverse & distributed collections
interoperability; federated searching
assuring preservation & persistence
incorporating rights management
Research issues
understanding objects in DL
representing in many formats
metadata, cataloging, indexing
conversion, digitization
organizing large collections
managing collections, scaling
preservation, archiving
interoperability, standardization
accessing, using, searching
federated searching of distributed collections
evaluation of digital libraries
DL projects in practice
Heavily oriented toward institutions & their missions
in libraries, but also others: museums, societies, government, commercial
come in many varieties
Spread globally, including digitization
U California, Berkeley’s Libweb “lists over 7700 pages from libraries in over 145 countries”
Spending increasing significantly
often a trade-off for other resources
Connection?
DL research & DL practice presently are conducted mostly independently of each other, minimally informing each other and having slight, or no connection
Parallel universes with little connection & interaction; at present not good for either research or practice
Part 9. Conclusions
IS contributions
IS effected handling of information in society
Developed an organized body of knowledge & professional competencies
Applied interdisciplinarity
IR reached a mature stage
penetrated many fields & human activities
Stressed HUMAN in human-computer interaction
Challenges
Adjust to the growing & changing social & organizational role of inf. & related inf. infrastructure
Play a positive role in globalization of information
Respond to technological imperative in human terms
Respond to changes from inf. to communication explosion - bringing own experiences to resolutions, particularly to the web
Join competition with quality
Join DIGITAL with LIBRARIES
Juncture
IS is at a critical juncture in its evolution
Many fields, groups ... moving into information
big competition
entrance of powerful players
fight for stakes
To be a major player IS needs to progress in its:
research & development
professional competencies
educational efforts
interdisciplinary relations
Reexamination necessary
Thank you Miró!
Thank you Picasso!
Thank you Javier &
for inviting me!
Bibliography
Bates, M. J. (1999). Invisible Substrate of Information Science. Journal of the American Society for Information Science, 50, 1043-1050.
Bush, V. (1945). As We May Think. Atlantic Monthly, 176 (11), 101-108. Available: http://www.theatlantic.com/unbound/flashbks/computer/bushf.htm
Hjørland, B. (2000). Library and Information Science: Practice, Theory, and Philosophical Basis. Information Processing & Management, 36 (3), 501-531.
Pettigrew, K. E. & McKechnie, L. E. F. (2000). The use of theory in information science research. Journal of the American Society for Information Science and Technology, 52 (1), 62-73.
Saracevic, T. (1999). Information Science. Journal of the American Society for Information Science, 50 (9), 1051-1063. Available: http://www.scils.rutgers.edu/~tefko/JASIS1999.pdf
Saracevic, T. (2005). How were digital libraries evaluated? Presentation at the course and conference Libraries in the Digital Age (LIDA), 30 May-3 June 2005, Dubrovnik, Croatia. Available: http://www.scils.rutgers.edu/~tefko/DL_evaluation_LIDA.pdf
Webber, S. (2003). Information Science in 2003: A Critique. Journal of Information Science, 29 (4), 311-330.
White, H. and McCain, K. (1998). Visualizing a Discipline: An Author Co-citation Analysis of Information Science 1972-1995. Journal of the American Society for Information Science, 49 (4), 327-355.