teaching chemical information retrieval 26 may 2015
TRANSCRIPT
1
WEBINAR:
TEACHING CHEMICAL
INFORMATION RETRIEVAL
26 MAY, 2015
Damon Ridley, Judith Currano
Why I became interested in chemical information retrieval
2
1983
1. Electronic storage of chemical information was the future of the “chemistry library”
2. There were exciting challenges to understand
• Indexing of documents
• Indexing of substances
By 1985 …
There is only so much chemistry you can teach in lectures
3. “If I cannot teach ‘all of chemistry’, then I’d better include in my courses something about how to search the literature”
3
1990s: The Greatest Breakthrough for Chemistry in the last 20 years
Primary Literature
Secondary Literature
4
Teaching Chemical Information Retrieval
The first thing …
“ the chemical literature has a specific organization and specialized entry points, and that, although it is possible to find some information without learning to use them, one is much more efficient and effective if one spends a little bit of time learning ‘the rules.’ ”
Reproduced with permission from Currano, Judith N. “Teaching chemical information for the future: The more things change, the more they stay the same.” The Future of the History of Chemical Information, McEwen, L. R.; Buntrock, R. E. Eds. ACS Symposium Series v. 1167. Washington, DC: American Chemical Society, 2014. Copyright 2014 American Chemical Society.
We invited you to indicate topics to discuss …
5
1. Does chemical information retrieval need to be taught? => Engaging staff and students.
2. Who should teach what course? Faculty or Library staff? => Dedicated/Integrated courses?
3. Teaching about databases?
4. Teaching search skills or teaching solutions?
5. Teaching substructure (and related) searching?
1. Does chemical information need to be taught? => Engaging Staff and Students
6
1. Engaging Staff and Students
7
1a. The retrieval of chemical information is more complex than the retrieval of information in any other subject
COMMON TO ALL SUBJECTS
• Searching bibliographic information (authors, institutions, journal titles …)
• Searching citations
• Searching topics
• Issues include:
• Synonyms, truncation (L- and R-), proximity, Boolean, index terms and index hierarchies, single terms, phrases
• Auto-truncation, implied proximity
• Intellectually/Algorithmically controlled searches
• Understanding the search engine and its defaults
• Plain text and special text
• Greek characters, superscripts, subscripts: GABA_A (subscript A) => GABAA or GABA A?
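The truncation and special-text issues listed above can be illustrated with a small sketch. It is not how any particular database works: the `*` truncation symbol, the function names, and the record titles are all invented for illustration.

```python
import re

def truncation_to_regex(term):
    """Turn a search term with '*' truncation marks into a whole-word regex.

    'cataly*' is right truncation, '*cataly*' adds left truncation.
    The '*' symbol is an assumption; real systems use various symbols.
    """
    pieces = [re.escape(piece) for piece in term.split("*")]
    return re.compile(r"\b" + r"\w*".join(pieces) + r"\b", re.IGNORECASE)

def search(term, records):
    """Return the records that contain the (possibly truncated) term."""
    pattern = truncation_to_regex(term)
    return [rec for rec in records if pattern.search(rec)]

def search_any(terms, records):
    """Boolean OR: return the records that match at least one of the terms."""
    patterns = [truncation_to_regex(term) for term in terms]
    return [rec for rec in records if any(p.search(rec) for p in patterns)]

# Invented titles, for illustration only.
RECORDS = [
    "Oxidation of alcohols with catalytic ruthenium",
    "A new catalyst for alkene epoxidation",
    "Photocatalysis in water",
    "GABAA receptor modulators",
    "GABA A receptor binding of benzodiazepines",
]

print(search("cataly*", RECORDS))    # right truncation only
print(search("*cataly*", RECORDS))   # left and right truncation
print(search("GABAA", RECORDS))      # one plain-text rendering of the subscript
print(search_any(["GABAA", "GABA A"], RECORDS))  # both renderings, ORed
```

In this toy data, right truncation alone misses "Photocatalysis", and a search for one plain-text rendering of the subscripted name misses the record that uses the other rendering. That is the kind of default behaviour students need to check in each search engine.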
UNIQUE TO CHEMISTRY – and all its allied fields
• Searching substances, reactions, properties
• Numerous issues with database building and searching
OTHER FIELDS
• Technical/engineering drawings
• Circuit diagrams
• Mathematical equations
8
1b. Arguably the chemical literature is the largest and most commercialised literature of all
1. Engaging Staff and Students
• The chemical literature provides a lot of value to its customers
  • To the companies who use it
  • To the students who rely on information for their research degrees … …who will become the future chemists in industry and in academia
• It costs a lot of money to:
• Produce high quality articles/journals (open-access or subscription-based)
• Produce databases, especially for substances, reactions, and properties
• Provide services to customers (24/7 access, data usage figures, training materials, repositories)
9
1. Engaging Staff and Students
1c. Course Competencies
10
1d. The Full Text Misconception
Ask the question: “If this was a particularly relevant article to our work, how would we find it?”
1. Engaging Staff and Students
The two major commercial products that extract the most information from primary articles are: Reaxys and SciFinder
There is a big difference between what you read and what you search
11
The Primary Literature is READ
The Secondary Literature is SEARCHED
1d. The Full Text Misconception
• We read:
• A single document
It does not matter what terms the authors used. We have the document!
• We search:
• Millions of documents
We do not know what terms the authors used
We have to add a lot of terms to cover all the options, then more synonyms => more records
We have a problem!
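One way to show students this problem is to count records as synonyms are added. The sketch below is a toy under stated assumptions: the six titles and their relevance flags are invented, and `search` and `precision_recall` are hypothetical helper names, not functions of any real database.

```python
# Invented records: id -> (title, relevant to a question about aspirin?)
RECORDS = {
    1: ("Synthesis of aspirin analogues", True),
    2: ("Acetylsalicylic acid stability in solution", True),
    3: ("ASA loading in polymer films", True),
    4: ("Acrylonitrile styrene acrylate (ASA) weathering", False),
    5: ("American Society of Anesthesiologists (ASA) score and outcomes", False),
    6: ("Ibuprofen formulation", False),
}

def search(terms):
    """Return ids of records whose title contains any term (case-insensitive OR)."""
    return {
        rec_id
        for rec_id, (title, _) in RECORDS.items()
        if any(term.lower() in title.lower() for term in terms)
    }

def precision_recall(hits):
    """Precision: share of hits that are relevant. Recall: share of relevant records found."""
    relevant = {rec_id for rec_id, (_, is_relevant) in RECORDS.items() if is_relevant}
    found = hits & relevant
    precision = len(found) / len(hits) if hits else 0.0
    recall = len(found) / len(relevant)
    return precision, recall

# Add one synonym at a time and watch the answer set change.
for terms in (["aspirin"],
              ["aspirin", "acetylsalicylic acid"],
              ["aspirin", "acetylsalicylic acid", "ASA"]):
    hits = search(terms)
    print(terms, sorted(hits), precision_recall(hits))
```

In this toy data each added synonym retrieves more of the relevant records, but the abbreviation also pulls in records about unrelated meanings of "ASA", so the answer set grows while its precision falls.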
2. Who should teach what course? Faculty or Library staff? => Dedicated/Integrated Courses
12
Another difficult question!
• My feeling is that teaching is best done by the specialists in the subject
• The collection and dissemination of Chemical Information is the responsibility of the library and of the teaching faculties
• Areas for which the library is primarily responsible include:
• Researching the different resources
• Advising Faculty of alternatives, and of developments
• Managing acquisition budgets
• Negotiating with suppliers (either through the institution or through consortia arrangements)
• Teaching dedicated courses or classes to develop skills to a basic level… …from which more advanced, specialized and research-related instruction may easily be developed by Faculty
• Understanding required information competencies for chemistry graduates, and ensuring that the library provides the resources required
• Updating information on library websites
  • e.g., new releases of, and developments in, information retrieval resources
13
• Areas for which Faculty is primarily responsible include:
• Ensuring that courses include training in information retrieval to meet professional standards or mandates
• Providing their graduates with the levels of information literacy required to meet their immediate postgraduate research needs and their future professional needs
2. Who should teach what course? Faculty? Library staff? The roles of Faculty and the Library?
• Teaching specialized skills in chemical information retrieval that relate to knowledge of chemical topics and fields such as the retrieval of substance, reaction, property and health & safety information
• In most cases teaching these specialized skills is best done through integration of instruction on chemical information issues into relevant parts of lecture, tutorial or laboratory coursework
3. Teaching Databases (Library)
3a. Judith’s Principles
14
Principle 1: If it is not there, you cannot find it. If it is there, you need to know what to call it.
Principle 2: Information systems take you literally … Except when they don’t … And even then, they do!
Principle 3: If you want to use a source effectively, you need to understand its scope and organization.
Principle 4: All information sources are not created equal.
Different:
• Keywords
• Substances
• Reactions
• Properties
Principle 5: To choose a source effectively, you need to understand the information landscape.
Currano, Judith N. “Teaching chemical information for the future: The more things change, the more they stay the same.” The Future of the History of Chemical Information, McEwen, L. R.; Buntrock, R. E. Eds. ACS Symposium Series v. 1167. Washington, DC: American Chemical Society, 2014. Copyright 2014 American Chemical Society.
3. Teaching Databases (Library)
3b. Buying and teaching databases are intimately linked
15
When Buying Databases we should:
• subscribe to a database only if it adds clear value to our primary collection
• understand the unique features and functions of the database
When Teaching Databases we should:
• teach the limitations of searching the primary literature
• teach the unique features and functions of the database
Buying and Teaching Databases are Intimately Linked
16
3c. Buying/Teaching Databases
i. The Value that Databases may Add
Databases add value through grouping together all primary documents in the discipline
The breadth and depth of the grouping (journal articles, patents) is of importance (“the numbers game”) …
… but far more important is …
… what is extracted from these articles
… what additional content is added
… how it is searched
… what (answer) analysis tools does it have
What content is added
17
3c. Buying/Teaching Databases
ii. New content added in records
Controlled Vocabulary
Index Headings/Index Keywords
• May help with search precision/comprehension, but how can we use them?
18
Intellectually
Algorithmically
3c. Buying/Teaching Databases
iii. How it is searched
• Synonyms, truncation (L- and R-), proximity, Boolean, index terms and index hierarchies, single terms, phrases
• Auto-truncation
• Auto-suggest
• Understanding the search engine and its defaults
• Plain text and special text
• Greek characters, superscripts, subscripts: GABA_A (subscript A) => GABAA or GABA A?
Ask Reaxys
“Google”?
SciFinder: Explore by Research Topic
19
SciFinder: Analyze, Refine, Categorize (“Precision Tools”, SciPlanner)
Reaxys: Filter by: Analysis View (Synthesize, Synthesis Planner)
Scopus: Analyze Search Results
3c. Buying/Teaching Databases
v. Post-processing tools
20
Judith: “Teach Relevant, Transferrable Skills, and Use Resources to Demonstrate Those Skills”
4. Teaching search skills or teaching solutions? (Library)
Reproduced with permission from Currano, Judith N. “Teaching chemical information for the future: The more things change, the more they stay the same.” The Future of the History of Chemical Information, McEwen, L. R.; Buntrock, R. E. Eds. ACS Symposium Series v. 1167. Washington, DC: American Chemical Society, 2014. Copyright 2014 American Chemical Society.
21
5. Specialised Searches (Library/Faculty)
5a. Substance Searches
Reproduced with permission from Currano, Judith N. “Teaching chemical information for the future: The more things change, the more they stay the same.” The Future of the History of Chemical Information, McEwen, L. R.; Buntrock, R. E. Eds. ACS Symposium Series v. 1167. Washington, DC: American Chemical Society, 2014. Copyright 2014 American Chemical Society.
22
MAYBE … …but in many cases MAYBE NOT
5. Specialised Searches (Library/Faculty)
5a. Substance Searches
• CAS Registry Numbers are the systematic indexing entries for substances in CAS databases … …but other databases have different ways to systematically index substances
• Some databases also list CAS Registry Numbers … …but usually the CAS RNs they list will be quite different from the CAS RNs listed in records from the same original document
23
5. Specialised Searches (Library/Faculty)
5a. Substance Searches
CAS Registry Numbers are relatively:
• Easy to find when substances have specific structure, formula, or a simple name
• Difficult to find (and you may have to use a number of them) in “more complex substances” such as substances:
  • that do not have specific structures or formulas
  • that, in the literature, may be variously described …
… such as topaz
24
Search Google: CAS Registry Number for topaz
5. Specialised Searches (Library/Faculty)
5a. Substance Searches
25
5. Specialised Searches (Library/Faculty)
5b. Substructure Searches
Teaching Students to Think… Like a Database!
[Structure drawings not reproduced]
Preconception: This is acetone!
In a substructure search hydrogen atoms are not assumed. Think of it as a template drawn on a piece of glass and superimposed over substances in the database.
OK, this result makes sense.
The template overlaps this substance in an expected way.
I have no idea why I retrieved this substance.
The template actually does overlap this substance, but the topology may not be what you wanted!
Questions to encourage your students to ask
1. Questions about atoms
A. Can a site be substituted?
B. If so, how?
2. Questions about bonds
A. Does stereochemistry matter?
B. Must the bond orders be as drawn?
C. Are there restrictions on topology?
These questions should be asked for everything drawn, as well as those things not drawn!
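The "template on glass" idea, and the question of whether a site can be substituted, can be sketched with a toy matcher. This is a deliberately simplified illustration, not the algorithm used by any real system: molecules are hand-written heavy-atom graphs, stereochemistry and aromaticity are ignored, and `make_mol`, `is_substructure`, and `lock_sites` are hypothetical names.

```python
from itertools import permutations

def make_mol(atoms, bonds):
    """atoms: element symbols of the heavy atoms; bonds: {(i, j): bond order}."""
    adj = {i: {} for i in range(len(atoms))}
    for (i, j), order in bonds.items():
        adj[i][j] = order
        adj[j][i] = order
    return {"atoms": atoms, "adj": adj}

def is_substructure(query, target, lock_sites=False):
    """True if the query template can be laid over the target.

    lock_sites=False: hydrogens are not assumed, so matched atoms may carry
    extra heavy-atom substituents (open sites).
    lock_sites=True: every matched atom must have exactly as many heavy-atom
    neighbours as drawn (no further substitution).
    """
    q_atoms, q_adj = query["atoms"], query["adj"]
    t_atoms, t_adj = target["atoms"], target["adj"]
    # Brute force over atom assignments: fine for tiny teaching examples only.
    for combo in permutations(range(len(t_atoms)), len(q_atoms)):
        if all(
            q_atoms[qi] == t_atoms[ti]
            and (not lock_sites or len(q_adj[qi]) == len(t_adj[ti]))
            and all(t_adj[ti].get(combo[qj]) == order
                    for qj, order in q_adj[qi].items())
            for qi, ti in enumerate(combo)
        ):
            return True
    return False

# Query as drawn: C-C(=O)-C. The student sees acetone.
query = make_mol(["C", "C", "O", "C"], {(0, 1): 1, (1, 2): 2, (1, 3): 1})
acetone = make_mol(["C", "C", "O", "C"], {(0, 1): 1, (1, 2): 2, (1, 3): 1})
butanone = make_mol(["C", "C", "O", "C", "C"],
                    {(0, 1): 1, (1, 2): 2, (1, 3): 1, (3, 4): 1})
cyclohexanone = make_mol(["C", "O", "C", "C", "C", "C", "C"],
                         {(0, 1): 2, (0, 2): 1, (2, 3): 1, (3, 4): 1,
                          (4, 5): 1, (5, 6): 1, (6, 0): 1})
ethanol = make_mol(["C", "C", "O"], {(0, 1): 1, (1, 2): 1})

for name, mol in [("acetone", acetone), ("butanone", butanone),
                  ("cyclohexanone", cyclohexanone), ("ethanol", ethanol)]:
    print(name, is_substructure(query, mol),
          is_substructure(query, mol, lock_sites=True))
```

With open sites the template overlays butanone and also the ring ketone, which may not be the topology the student wanted. Locking the sites leaves acetone as the only hit among these four.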
26
Think about how the database will treat your query
1. Choose the database that contains the content that you want to find
2. Construct the query using the tools available in that database
  • What are the defaults for free sites/substitution?
  • What are the defaults for topology?
  • What kinds of predefined variables and groups are available?
  • Will the system allow you to build your own R-group (if needed)?
  • How will the system allow you to control substituents that are not drawn?
3. Build and run your query
4. Review your results to see if they are relevant
5. Refine your query or construct a new query and combine hit sets with the original query for greater precision
5. Specialised Searches (Library/Faculty)
5b. Substructure Searches
27
5. Specialised Searches (Library/Faculty)
5b. Substructure Searches
Undergraduate students
• When to teach: Upper-level lecture and laboratory classes
• What to teach
  • Substructure theory
  • Basics: free sites vs unsubstituted sites, bond order/topology variability
  • When to use substructure searching
Graduate students, MS scientists, and PhD scientists
• When to teach
  • Organic synthesis classes
  • Information courses and library orientation sessions
  • Database-specific and general information skills workshops
• What to teach
  • Substructure theory and basics
  • Setting bond order, and topology
  • R-groups and advanced skills (e.g., repeating units, variable points of attachment)
  • Applications to reaction searching
When should substructure searching be taught, and which skills are important at each level?
28
Teaching the searching of:
• Reactions
• Properties
When you log off this session you will be asked if you would like further information
• Please check the boxes of interest
If you would like further information …
30
Wittig Reaction – Ask Reaxys