Chemistry
Early Artificial Intelligence
• MECHEM
– Valdés-Pérez, R.E. Some Recent Human-Computer Discoveries in Science and What Accounts for Them. AI Magazine, 16 (3). 37-44, 1995.
• FAHRENHEIT
– Zytkow, J.M. Combining many searches in the FAHRENHEIT discovery system. In Proceedings of the 4th International Workshop on Machine Learning, (San Mateo, 1987), 281-7.
– Zytkow, J.M. Integration of knowledge and method in real-world discovery. ACM SIGART Bulletin, 2 (4). 179-184, 1991.
Chemistry Informatics
• Information Retrieval
– http://www.scientificpsychic.com/az.html
• Project Halo
– http://www.projecthalo.com/
• E-notebooks
– http://wiki.myexperiment.org/index.php/Main_Page
Scientific Discovery: A View from the Trenches
Catherine Blake
Meredith Rendall
University of North Carolina at Chapel Hill
Discovery Science 2006
Motivation
• “Discovery systems which solve tasks cooperatively with a domain expert are likely to have an important role, because in any nontrivial domain, it will be virtually impossible to provide the system with a complete theory which is anyway constantly evolving” [Simon, Valdés-Pérez, & Sleeman, 1997]
• “as developers realize the need to provide explicit support for human intervention, we will see even more productive systems and even more impressive discoveries” [Langley, 1998]
• Discovery is “an inherently complex task” [Kuhn, 1996]
Goal
• Model the day-to-day processes of scientists
• Day-to-day activities reflect
– the human cognitive processing surrounding discovery
– the complex socio-technical environments in which successful discovery tools will eventually be embedded
• Key questions
– What is their definition of discovery?
– How do they arrive at their research question?
– How do they transition from an initial idea to publication?
Hypothesis Verification
Hypothesis Generation
Study Design
• Recruitment
– Experienced scientists (7-45 yrs)
– Chemists and Chemical Engineers who are part of the Center for Environmentally Responsible Solvents and Processes (CERSP)
– Response rate 84% (21/25)
• Semi-structured interviews
• Critical incident technique
– consider a recent research paper that you have published
– consider seminal papers in your field
Participants
ID  Interview  Title                Area of Research                    Experience (yrs)
A   67         Director             Biochemical Engineering             32
B   55         Assistant Professor  Colloid Science and Engineering     10
C   51         Associate Professor  Polymer Design and Synthesis        12
D   51         Professor            Semiconductor Surface Chemistry     34
E   43         Professor            Polymer Chemistry                   16
F   60         Professor            Nano-electronics and Photonics      10
G   50         Professor            Electronic Materials Synthesis      26
H   59         Director             Polymer Design and Synthesis        35
I   39         Assistant Professor  Colloidal & Macromolecular Physics  14
J   61         Professor            Nano-electronics and Photonics      36
K   58         Associate Professor  Bioorganic Chemistry                7
L   44         Professor            Rheology                            13
M   41         Professor            Organometallic Chemistry            31
N   54         Professor            Polymer Theory                      23
O   41         Professor            Electrochemistry                    46
P   56         Professor            Synthetic Organometallic Chemistry  37
Q   5 *        Associate Professor  Surface and Interface Polymers      10
R   †          Associate Professor  Polymer Thin Films                  20
S   53         Director             Polymer Synthesis                   16
X   56         Professor            Neutron Scattering                  35
Y   33         Professor            Chemical Reaction Engineering       40
Average Experience = 24 yrs
Chemists & Chemical Engineers
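The summary figures on this slide and the Study Design slide can be checked with a little arithmetic. The sketch below copies the experience values from the Participants table and takes the 25 invitations from the stated response rate; the variable names are illustrative only.

```python
# Years of experience for participants A-S, X and Y, copied from the Participants table.
experience_years = [32, 10, 12, 34, 16, 10, 26, 35, 14, 36, 7,
                    13, 31, 23, 46, 37, 10, 20, 16, 35, 40]

n_participants = len(experience_years)
mean_experience = sum(experience_years) / n_participants

# 25 scientists were invited according to the Study Design slide (21/25).
n_invited = 25
response_rate_pct = round(100 * n_participants / n_invited)

print(n_participants, round(mean_experience), response_rate_pct)
```

The rounded mean agrees with the "Average Experience = 24 yrs" label, and the participant count matches the 21 respondents reported in the response rate.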
Data Collection
• Interviews
– interview time fixed to 1 hour
– conducted in interviewee's office
• Work flow process design
– Outline the process you used to go from your initial idea to the first published paper
– Activity cards
  • Reading - Thinking - Online searching
  • Books - Journals - Experimenting
  • Analyzing - Writing - Discussion
  • Organizing
– Extra cards
  • Repeat cards - Wild card
Interview Questions
• Discovery Questions
– What is your definition of discovery?
– What evidence convinced you that the paper addressed the initial research questions?
– What factors limited the adoption and deployment of the discovery?
– How did you arrive at the research question?
– What if any existing evidence prompted the study/experiment?
– Were there any alternative explanations?
• Information Usage questions
– Other than the scientific literature, what information resources do you draw from to aid in your research processes?
– How many articles did you read last month that related to each of those projects?
– Is that typical of how many articles you read in a month for research projects?
– Do you read articles for another purpose? If so what?
– How many hours do you spend reading journal articles for research projects?
– Which journals do you typically read and draw from?
– How would you characterize the journals that you read - are they only within your domain, or do you read journals that would be considered non-traditional in your research?
– If you only have a few minutes to read an article, what parts would you read?
– What do you do with the article once you have read it?
Work Flow Example
Data Analysis
• 25/27 recorded interviews transcribed verbatim
• 2/27 transcribed from interview notes
• Interviews - qualitative analysis
– All transcripts coded using NVivo 7
– Bottom-up theme identification
– Reveals similarities and differences
• Work flow process – qualitative and quantitative
– Transcribed descriptions
– Calculated frequency of activities
– Interactions and transitions identified
(1) Novelty
• “new insight” (G and M)
• “obviously novel and new and doesn’t exist in the literature” (L)
• “finding something new and unexpected” (P)
• “learning something that hasn’t really been well understood before” (G)
• “not previously seen” (M)
• “it [discovery] opens the door to exploration” (O)
Discovery Definition
(2) Build on existing ideas
• “from my standards, one has only less than ten completely new ideas in their lifetime, and so, most of the time you are sort of doing some modifications on a new idea or something” (N)
• “Even supposedly the most creative people…I don’t think things are cut from a whole cloth anymore. I think there aren’t any more cloths without big holes anymore” (I)
• “Everything has precedent, in my opinion” (M)
Discovery Definition
(3) Have a practical application
• “there are lots of firsts that are wearing the shirt of a different color or making a new isorun that nobody cares about. I don’t put those in a discovery category. On the other hand, NSF likes to try to support research that is going to lead to discoveries that will be transformational and start new areas of research. Those are hard things to do and don’t happen very often.” (M)
Discovery Definition
(4) Balance experimentation and theory
• “… as an experimentalist, I treat analyzing data as ‘let me try to decide whether or not what I have measured is real before I get too excited about it’” (I)
• “There’s this added level of, almost, engineering that we build a system to see if we can mimic what nature does. If we can mimic what nature does then that tells us that we do understand it as well as we think that we do” (K)
Discovery Definition
(5) Simplicity
• “A discovery doesn’t have to be something that is very hard. It could be something very simple that you can get to work. It doesn’t have to be tedious work or years spent. Some people to see something simple might say it is way too simple, but as long as it is an elegant thing and hasn’t been thought out already; I think that’s fine. It’s a discovery” (L)
Discovery Definition
(1) Discussion
• “Talking to people outside my field is something I would always do. I would never consider it a waste of my time to talk with people” (Q)
• “hearing about other stuff” (K)
Arriving at a Research Question
(2) Previous projects
• “I’ve been doing that since 1972” (D)
• “mistakes”, with scientists challenging: “Is it really wrong? No. It might be interesting” (E)
• “There are times when an investigation finds itself off course and heading in an unanticipated direction, which may be for a variety of reasons including the original idea looking less and less promising to the unexpected outcome is very exciting and potentially a new area of science” (P)
Arriving at a Research Question
(3) Combining expertise
• “different things came together in this project and when I was visiting a lab in France, I was talking to experimentalist and theorist there. They had some very strange results they couldn’t explain, and on the other side we were working on this different part of the same problem. So the two things clicked. That’s how, from a discussion in France and our research at EKC, this idea came that this must be new.” (N)
Arriving at a Research Question
(4) Reading literature
• finding an article “chemically offensive” (E)
• “Literature inspires questions such as: ‘Is there a better way to separate X’, ‘Is there a cheaper way to do Y’, or ‘Can this approach be applied to Z?’” (A)
Arriving at a Research Question
Overview
• 26 process diagrams
• literature-related cards
– comprised more than 30% of the activities in all
– chemists read (Tenopir et al., 2003) and download more articles (Davis & Solla, 2003) than other sciences
• Repeat cards
– indeterminate number of repeats
– added one extra transition from last to first step
Work Flow Process
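The Repeat card rule described above (one extra transition from the last step back to the first, since the number of repeats is indeterminate) can be sketched as follows. The function name and the example sequence are hypothetical, not taken from the study.

```python
def transitions_with_repeat(sequence, has_repeat_card):
    """Return consecutive (from, to) pairs for one process diagram.

    When the diagram carries a Repeat card, add exactly one extra
    transition from the last step back to the first step.
    """
    pairs = list(zip(sequence, sequence[1:]))
    if has_repeat_card and len(sequence) > 1:
        pairs.append((sequence[-1], sequence[0]))
    return pairs

# Hypothetical diagram with a Repeat card.
steps = ["Reading", "Experimenting", "Analyzing"]
looped = transitions_with_repeat(steps, has_repeat_card=True)
plain = transitions_with_repeat(steps, has_repeat_card=False)
print(looped)
```

Adding a single wrap-around pair records that the loop exists without guessing how many times it was traversed.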
Transition Model (>7)
Work Flow Process
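One plausible reading of the "(>7)" label is that the diagram shows only transitions observed more than seven times; that reading is an assumption, since the slide gives no definition. Under it, the filtering step might look like this, with made-up counts and a hypothetical function name.

```python
from collections import Counter

def frequent_transitions(counts, threshold=7):
    """Keep only transitions whose count is strictly greater than threshold."""
    return {pair: n for pair, n in counts.items() if n > threshold}

# Hypothetical transition counts (not study data).
counts = Counter({
    ("Reading", "Thinking"): 12,
    ("Experimenting", "Analyzing"): 9,
    ("Writing", "Reading"): 7,
    ("Books", "Organizing"): 2,
})

model = frequent_transitions(counts)
for (src, dst), n in sorted(model.items(), key=lambda item: -item[1]):
    print(f"{src} -> {dst}: {n}")
```

With a strict comparison, a transition seen exactly seven times is left out of the model.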
Co-occurrence model
Work Flow Process
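A co-occurrence model can be sketched by counting how many diagrams contain each unordered pair of activities. The slide does not define co-occurrence, so counting each pair once per diagram is an assumption, and the data and function name below are hypothetical.

```python
from collections import Counter
from itertools import combinations

def cooccurrence_counts(diagrams):
    """Count, for each unordered pair of activities, the diagrams containing both."""
    counts = Counter()
    for seq in diagrams:
        # sorted(set(...)) makes each pair appear once per diagram in a fixed order.
        for pair in combinations(sorted(set(seq)), 2):
            counts[pair] += 1
    return counts

# Hypothetical process diagrams (not study data).
example_diagrams = [
    ["Reading", "Thinking", "Writing"],
    ["Reading", "Writing", "Reading"],
    ["Thinking", "Experimenting"],
]

cooc = cooccurrence_counts(example_diagrams)
print(cooc[("Reading", "Writing")])
```

Unlike the transition counts, this view ignores order and adjacency, so it shows which activities tend to appear in the same work flow at all.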
Results
• Definition of discovery
– Novelty is not everything
– Simplicity and linking to practical applications
• System design implications
– Link to previous literature
– Describe and justify new discoveries
– Use open box algorithms
– Explanation based systems
Results
• Arriving at a research question
– a collaborative process
– build on previous projects
• System design implications
– Integrate email or IM
– Enable group annotations and discussions
– Enable easy access to previous papers & analyses
Results
• Transforming a research idea to publication
– The work flow process is
  • Tightly coupled
  • Highly iterative
• System design implications
– Integrate data within the workflow
– Allow iteration between activities
Conclusions
• Discovery definition
– Novelty - Balance theory and experimentation
– Build on existing ideas - Practical application
– Simplicity
• Hypothesis generation
– Discussion - Previous experiments
– Combine expertise - Read literature
• Hypothesis validation
– Iterative - Tightly coupled
• This is just the first step
• Next step: Operationalize these design recommendations