LIS618 lecture 1
Thomas Krichel
2004-02-01
structure of talk
• Recap on Boolean (aurally)
• Before online searching
• Working with DIALOG
– Overview
– Search command
• Boolean exercise (on the fly)
before a search I
• What is the purpose of the query?
– brief overview
– comprehensive search
• What perspective on the topic is required?
– scholarly
– technical
– business
– popular
before search II
• What type of information does the patron want?
– fulltext
– bibliographic
– directory
– numeric
• Are there any known sources?
– authors
– journals
– papers
– conferences
before search III
• What are the language restrictions?
• What, if any, are the cost restrictions?
• How current does the data need to be?
• How much of each record is required?
concept analysis
• This is the art/science of taking the topic to search for and developing facets. Example: “Internet filtering in Libraries”
– Internet filter
– Libraries
– Controversy, not technical issues
• We may also need to think about the aim of the search.
search aims
• a known needle in a known haystack
• a known needle in an unknown haystack
• an unknown needle in an unknown haystack
• any needle in a haystack
• the sharpest needle in a haystack
• most of the sharpest needles in a haystack
search aims
• all the needles in a haystack
• affirmation of no needles in a haystack
• things like needles in a haystack
• is there a new needle in the haystack
• where are the haystacks
• needles, haystacks, anything
types of searches
• known-item searches
• negative searches
• selective dissemination of information
• topical or subject searches
• passage searching, where the user is only interested in part of the item
search strategies I
• Building block approach
– Do a number of elementary searches
– Combine the resulting sets with Boolean operators
• This is what I did in the example in the previous lecture
• Works only with the Boolean model
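• A minimal sketch of the building block approach in Python, not in DIALOG. The toy documents, the search helper and the set names s1 to s3 are made up for illustration; the facets are the ones from the concept analysis example.

```python
# Toy collection: document number -> text (invented for illustration).
docs = {
    1: "internet filter software in public libraries",
    2: "library budgets and staffing",
    3: "controversy over internet filter policy in school libraries",
    4: "technical design of an internet filter",
}

def search(keyword):
    """Elementary search: the set of documents containing the exact word."""
    return {doc_id for doc_id, text in docs.items() if keyword in text.split()}

# Step 1: one elementary search per facet.
s1 = search("filter")
s2 = search("libraries")
s3 = search("controversy")

# Step 2: combine the resulting sets with Boolean AND (set intersection).
result = s1 & s2 & s3
print(sorted(result))
```

Each elementary set can be inspected on its own before combining, which is the main attraction of this strategy.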
search strategies II
• Snowballing approach
– Start with a very specific query
– Think of other terms that can be added to get more results
– Stop when a reasonable number of results is reached.
• Not sure this really works well in practice.
search strategies III
• The successive fraction approach is the opposite of the snowballing approach
– First search for a broad concept
– Then repeat the query, adding various limiting factors.
• Can work well if the IR system allows you to repeat and edit queries.
• But queries can become unwieldy.
search strategies IV
• Most specific facet first
– Conduct concept analysis
– Look for the most specific facet
– Search that first, add others later
• Presupposes that you have done a decent concept analysis.
two steps in DIALOG
• step one: select databases (aka files) to look at
• step two: perform searches on the selected databases
• You may wonder why one does not have one single step like in a search engine. Discuss.
• today we concentrate on the second step
working on selected files
• We assume that we have selected a database that we know, and we look at the search interface on the selected database.
• The database selection process is a bit more complicated; it is covered next week.
• First, let us log in and look at the command prompt.
• Then we select the first database (file) with the begin command.
the ‘begin’ command
• As its name suggests, usually the first command.
• Syntax: begin number, number,…
• Selects the files with the given numbers.
• Once they are selected they can be searched.
• Now select the ERIC database: "begin 1"
• "Begin 1" can be abbreviated as "b 1"
substeps in the second step
• Identify search terms
• Use Dialog basic commands to conduct a search
• View records online or print the results
the 's' (select) command
• Once we have issued the "begin" command to select a database, we issue the "s" command to search it.
• "s query_expression" where query_expression is a query expression.
• This will search the index of the selected database in full-text view for the query issued.
• It will not find any of the following: "an and by for from of the to with". They are stop words.
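• A small Python sketch of what stop words mean for a query. This is not how DIALOG is implemented; the function name searchable_terms is made up, and the stop word list is the one given above.

```python
# Stop words as listed on the slide.
STOP_WORDS = {"an", "and", "by", "for", "from", "of", "the", "to", "with"}

def searchable_terms(query):
    """Return the words of a query that are not stop words (hypothetical helper)."""
    return [word for word in query.lower().split() if word not in STOP_WORDS]

print(searchable_terms("History of the Internet"))
```

Only the remaining words are worth typing as search terms, since the stop words are not in the index.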
query expression
• A query expression contains search terms expressed in special ways
– You can truncate search terms.
– You can build an elementary expression by putting several keywords together. This is achieved by DIALOG's connectors.
– You can combine several expressions with the use of Boolean operators.
• We will cover these in turn now.
truncation of terms I
• Open Truncation
– "select path?" retrieves all words that begin with path: paths, pathos, pathway, pathology
• Controlled-Length Truncation
– "select path??" retrieves the root and up to two additional characters: paths, pathos
truncation of terms II
• Embedded Character truncation can be used for variant spellings:
– "select organi?ation" -> organization organisation
– "select fib??board" -> fiberboard fibreboard
• This truncation feature is also useful for searching for unusual plural forms:
– "select wom?n" -> woman women
• Apparently you can also do prefixes by putting the ? at the beginning.
– "?mobile" -> automobile metamobile
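• To check your understanding of the truncation rules, here is a rough Python sketch that turns a truncated term into a regular expression. It only mimics the rules as stated on these slides (it is not DIALOG code, and real DIALOG may differ in details); the function names are made up.

```python
import re

def truncation_to_regex(term):
    """Translate a truncated term into a regex, following the slides' rules."""
    stem = term.rstrip("?")
    trailing = len(term) - len(stem)   # number of ? at the end
    core = stem.lstrip("?")
    leading = len(stem) - len(core)    # number of ? at the beginning
    # An embedded ? stands for exactly one character.
    body = "".join(r"\w" if ch == "?" else re.escape(ch) for ch in core)
    prefix = r"\w*" if leading else ""          # prefix truncation
    if trailing == 0:
        suffix = ""
    elif trailing == 1:
        suffix = r"\w*"                         # open truncation
    else:
        suffix = r"\w{0,%d}" % trailing         # controlled-length truncation
    return re.compile(prefix + body + suffix, re.IGNORECASE)

def matches(term, word):
    """True if the whole word is matched by the truncated term."""
    return truncation_to_regex(term).fullmatch(word) is not None

print(matches("path?", "pathology"))   # open truncation
print(matches("path??", "pathway"))    # three extra characters
print(matches("wom?n", "women"))       # embedded character
```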
use of connectors
• Connectors are used to put several words together.
• One instance where this is useful is when you have words that on their own mean different things.
• For example "mate" is a herbal beverage consumed in South America. Looking for mate on the Internet retrieves a lot of singles' pages.
example: terms related to "mate"
What other terms could be used?
– matear (drink mate)
– matero (mate drinker)
– cebar (prepare mate)
– cebador (mate preparer)
– yerba (mate herb)
– bombilla (mate straw)
connectors I
• '(W)' requires the terms to appear next to each other in the given order, e.g. 'yerba(W)mate?' matches "yerba mate".
• '(iW)' where i is an integer, means the first term is followed by the second with at most i words in between, e.g. 'ceba?(3W)mate?' matches "cebar un maravilloso mate" but not "cebador guapo mirando un buen mate"
connectors II
• '(N)' requires the terms to be next to each other in either order, e.g. 'yerba(N)mate?' matches "yerba mate" or "mate yerba".
• '(iN)' where i is an integer, means the terms are within i words of each other in either order, e.g. 'ceba?(3N)mate?' matches "cebar mate" or "matear con la cebadora".
• '(S)' searches for the occurrence of connected terms in the same paragraph.
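• The word-proximity connectors can be imitated in a few lines of Python. This is only a sketch of the (W) and (N) behaviour as described above, not DIALOG's implementation; the helper names and the parameters max_gap and ordered are made up, and only trailing open truncation is supported. (S) is not covered.

```python
import re

def term_regex(term):
    """Regex for a term; a single trailing ? means open truncation."""
    if term.endswith("?"):
        return re.compile(re.escape(term[:-1]) + r"\w*", re.IGNORECASE)
    return re.compile(re.escape(term), re.IGNORECASE)

def near(text, left, right, max_gap=0, ordered=True):
    """ordered=True imitates (iW), ordered=False imitates (iN), with i = max_gap."""
    words = re.findall(r"\w+", text)
    left_re, right_re = term_regex(left), term_regex(right)
    left_positions = [i for i, w in enumerate(words) if left_re.fullmatch(w)]
    right_positions = [i for i, w in enumerate(words) if right_re.fullmatch(w)]
    for i in left_positions:
        for j in right_positions:
            if i == j or (ordered and j < i):
                continue
            # Number of words strictly between the two matched words.
            if abs(j - i) - 1 <= max_gap:
                return True
    return False

# 'ceba?(3W)mate?'
print(near("cebar un maravilloso mate", "ceba?", "mate?", max_gap=3))
print(near("cebador guapo mirando un buen mate", "ceba?", "mate?", max_gap=3))
# 'ceba?(3N)mate?'
print(near("matear con la cebadora", "ceba?", "mate?", max_gap=3, ordered=False))
```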
using Boolean operators
• In your query, you can combine several expressions with Boolean operators
• Example: "S LIBRARY(W)SCHOOL? AND DISTANCE(W)EDUCATION"
• But I usually do not issue such fancy queries.
executing several searches
• There can be several searches done sequentially, and the result sets are saved by the system.
• Each time, the system assigns a set number, Si.
• These can be combined in Boolean expressions, e.g. 's S1 or S2 and S3'
• Remember that Boolean operations are set-theoretic!
Boolean operators on sets
• When using Booleans, be aware that "and" has higher precedence than "or".
• Thus: a or b and c is not the same as (a or b) and c; it is a or (b and c).
• Use parentheses when in doubt.
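• You can try the precedence rule with Python sets, where | plays the role of "or" and & the role of "and". Python happens to give & higher precedence than |, just as "and" binds tighter than "or" here. The sets s1, s2, s3 are invented record numbers, not DIALOG output.

```python
# Three made-up result sets of record numbers.
s1 = {1, 2, 3}
s2 = {3, 4}
s3 = {4, 5}

no_parentheses = s1 | s2 & s3      # read as s1 or (s2 and s3)
or_first = (s1 | s2) & s3          # parentheses force the "or" first

print(sorted(no_parentheses))
print(sorted(or_first))
```

The two results differ, which is why parentheses are worth the extra typing.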
DS (display sets)
• This command can be executed any time to review the sets that have been formed since the last B (begin) command.
• This can be useful to review your search history.
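• To picture what the system keeps track of, here is a toy model of a session in Python. The class ToySession, its methods and the listing format are all made up for illustration; DIALOG's actual DS output looks different.

```python
class ToySession:
    """Toy model of a search session that numbers its result sets."""

    def __init__(self):
        self.sets = []  # list of (query, set of record numbers)

    def begin(self):
        """Like B (begin): start afresh, earlier sets are forgotten."""
        self.sets = []

    def select(self, query, records):
        """Store a result set and return its set number, e.g. 'S1'."""
        self.sets.append((query, set(records)))
        return f"S{len(self.sets)}"

    def display_sets(self):
        """Like DS: one line per set with number, size and query."""
        return [f"S{n} {len(records)} {query}"
                for n, (query, records) in enumerate(self.sets, start=1)]

session = ToySession()
session.select("yerba(W)mate?", {1, 2})
session.select("bombilla", {2})
for line in session.display_sets():
    print(line)
```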
the target command
• "target set" where set is a search result set creates a subset of the "statistically most relevant results" in the original set.
• I have not seen details about how this subset is computed.
• A new result set is formed.
display: the type command
type set/format/range
• set is a result set
• format is a format
• range can be
– start-end, where start is the record number to start at and end is the record number to end at
– all
standard delivery formats
• 2 – full record except abstract
• 3 or medium – citation
• 5 or long – full except full text
• 6 or free – title and dialog number
• 8 or short – title plus indexing terms
– useful to find other indexing terms
• 9 or full – everything
• KWIC or K – keywords in context
options for delivery
• I once tried to email results to myself, to no avail.
• You can save the HTML of the search results in the browser.
• You can print the results within the browser.
http://openlib.org/home/krichel
Thank you for your attention!
• to do: set up consistent notation