2008 © ChengXiang Zhai 1 Introduction to Research ChengXiang Zhai Department of Computer Science University of Illinois, Urbana-Champaign http://www-faculty.cs.uiuc.edu/~czhai, [email protected]

2008 © ChengXiang Zhai 1

Introduction to Research

ChengXiang Zhai

Department of Computer Science

University of Illinois, Urbana-Champaign

http://www-faculty.cs.uiuc.edu/~czhai, [email protected]


Outline

1. What is research?

2. How to prepare yourself for IR research?

3. How to identify and define a good IR research problem?

4. How to formulate and test IR research hypotheses?

5. How to write and publish an IR paper?


Part 1. What is research?


What is Research?

• Research

– Discover new knowledge

– Seek answers to questions

• Basic research

– Goal: Expand man’s knowledge (e.g., which genes control the social behavior of honey bees?)

– Often driven by curiosity (but not always)

– High impact examples: relativity theory, DNA, …

• Applied research

– Goal: Improve the human condition (i.e., improve the world) (e.g., how to cure cancers?)

– Driven by practical needs

– High impact examples: computers, transistors, vaccinations, …

• The boundary is vague; distinction isn’t important


Why Research?

[Figure: diagram with the labels Curiosity, Basic Research, Applied Research, Application Development, Amount of knowledge, Advancement of Technology, Utility of Applications, and Quality of Life]


Where’s IR Research?

[Figure: the same diagram (Basic Research, Applied Research, Application Development; Amount of knowledge, Advancement of Technology, Utility of Applications, Quality of Life), with Information Science and Computer Science marked on it]


Where’s Your Position?

[Figure: the same diagram (Basic Research, Applied Research, Application Development; Amount of knowledge, Advancement of Technology, Utility of Applications, Quality of Life)]

Different positions benefit from different collaborators


Research Process

• Identification of the topic (e.g., Web search)

• Hypothesis formulation (e.g., algorithm X is better than Y=state-of-the-art)

• Experiment design (measures, data, etc) (e.g., retrieval accuracy on a sample of web data)

• Test hypothesis (e.g., compare X and Y on the data)

• Draw conclusions and repeat the cycle of hypothesis formulation and testing if necessary (e.g., Y is better only for some queries, now what?)
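The "compare X and Y on the data" step can be sketched in a few lines. This is only an illustration under stated assumptions: the measure is average precision (the slide says just "retrieval accuracy"), and the queries, document ids, and the names `runs_x`, `runs_y`, and `qrels` are hypothetical toy data.

```python
from statistics import mean


def average_precision(ranking, relevant):
    """Average precision of one ranked list, given the set of relevant doc ids."""
    hits, precisions = 0, []
    for rank, doc_id in enumerate(ranking, start=1):
        if doc_id in relevant:
            hits += 1
            precisions.append(hits / rank)
    return sum(precisions) / len(relevant) if relevant else 0.0


def compare_systems(runs_x, runs_y, qrels):
    """Return per-query (AP of X, AP of Y) and the mean AP of each system."""
    per_query = {
        q: (average_precision(runs_x[q], rel), average_precision(runs_y[q], rel))
        for q, rel in qrels.items()
    }
    map_x = mean(ap_x for ap_x, _ in per_query.values())
    map_y = mean(ap_y for _, ap_y in per_query.values())
    return per_query, map_x, map_y


# Hypothetical toy data: relevance judgments and the rankings of X and Y.
qrels = {"q1": {"d1", "d3"}, "q2": {"d2"}}
runs_x = {"q1": ["d1", "d3", "d2"], "q2": ["d1", "d2", "d3"]}
runs_y = {"q1": ["d2", "d1", "d3"], "q2": ["d2", "d1", "d3"]}

per_query, map_x, map_y = compare_systems(runs_x, runs_y, qrels)
for q, (ap_x, ap_y) in sorted(per_query.items()):
    print(q, round(ap_x, 3), round(ap_y, 3))
print("MAP:", round(map_x, 3), round(map_y, 3))
```

In this toy data X wins on q1 and Y wins on q2, which is exactly the "better only for some queries, now what?" situation: the per-query table, not just the mean, is what suggests the next hypothesis.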


Typical IR Research Process

• Look for a high-impact topic (basic or applied)

• New problem: define/frame the problem

• Identify weakness of existing solutions if any

• Propose new methods

• Choose data sets (often a main challenge)

• Design evaluation measures (can be very difficult)

• Run many experiments (need to have clear research hypotheses)

• Analyze results and repeat the steps above if necessary

• Publish research results


Research Methods

• Exploratory research: Identify and frame a new problem (e.g., “a survey/outlook of personalized search”)

• Constructive research: Construct a (new) solution to a problem (e.g., “a new method for expert finding”)

• Empirical research: evaluate and compare existing solutions (e.g., “a comparative evaluation of link analysis methods for web search”)

• The “E-C-E cycle”: exploratory → constructive → empirical → exploratory → …


Types of Research Questions and Results

• Exploratory (Framework): What’s out there?

• Descriptive (Principles): What does it look like? How does it work?

• Evaluative (Empirical results): How well does a method solve a problem?

• Explanatory (Causes): Why does something happen the way it happens?

• Predictive (Models): What would happen if xxx ?


Solid and High Impact Research

• Solid work:

– A clear hypothesis (research question) with conclusive result (either positive or negative)

– Clearly adds to our knowledge base (what can we learn from this work?)

– Implications: a solid, focused contribution is often better than a non-conclusive broad exploration

• High impact = high-importance-of-problem * high-quality-of-solution

– high impact = open up an important problem

– high impact = close a problem with the best solution

– high impact = major milestones in between

– Implications: question the importance of the problem, and don’t just be satisfied with a good solution; make it the best


Part 2. How to prepare yourself for IR research?


What It Takes to Do Research

• Curiosity: allows you to ask questions

• Critical thinking: allows you to challenge assumptions

• Learning: takes you to the frontier of knowledge

• Persistence: so that you don’t give up

• Respect for data and truth: ensures your research is solid

• Communication: allows you to publish your work

• …


Learning about IR

• Start with an IR textbook (e.g., Manning et al., Grossman & Frieder, a forthcoming book from UMass, …)

• Then read “Readings in IR”, edited by Karen Sparck Jones and Peter Willett

• And read papers recommended in the following article: http://www.sigir.org/forum/2005D/2005d_sigirforum_moffat.pdf

• Read other papers published in recent IR/IR-related conferences

• Take advantage of online resources (e.g., http://timan.cs.uiuc.edu/resources)


Learning about IR (cont.)

• Getting more focused

– Choose your favorite sub-area (e.g., retrieval models)

– Extend your knowledge about related topics (e.g., machine learning, statistical modeling, optimization)

• Stay at the frontier:

– Keep monitoring literature in both IR and related areas

• Broaden your view: Keep an eye on

– Industry activities

• Read about industry trends

• Try out novel prototype systems

– Funding trends

• Read requests for proposals


Critical Thinking

• Develop a habit of asking questions, especially why questions

• Always try to make sense of what you have read/heard; don’t let any question pass by

• Get used to challenging everything

• Practical advice

– Question every claim made in a paper or a talk (can you argue the other way?)

– Try to write two opposite reviews of a paper (one mainly to argue for accepting the paper and the other for rejecting it)

– Force yourself to challenge one point in every talk that you attend and raise a question


Respect Data and Truth

• Be honest with the experiment results

– Don’t throw away negative results!

– Try to learn from negative results

• Don’t twist data to fit your hypothesis; instead, let the hypothesis choose data

• Be objective in data analysis and interpretation; don’t mislead readers

• Aim at understanding/explanation instead of just good results

• Be careful not to over-generalize (for both good and bad results); you may be far from the truth


Communications

• General communication skills:

– Oral and written

– Formal and informal

– Talk to people with different levels of background

• Be clear, concise, accurate, and adaptive (elaborate with examples, summarize by abstraction)

• English proficiency

• Get used to talking to people from different fields


Persistence

• Work only on topics that you are passionate about

• Work only on hypotheses that you believe in

• Don’t draw negative conclusions prematurely or give up easily

– Positive results may be hidden in negative results

– In many cases, negative results don’t completely reject a hypothesis

• Be comfortable with criticisms about your work (learn from negative reviews of a rejected paper)

• Think of possibilities of repositioning a work


Optimize Your Training

• Know your strengths and weaknesses

– strong in math vs. strong in system development

– creative vs. thorough

– …

• Train yourself to fix weaknesses

• Find strategic partners

• Position yourself to take advantage of your strengths


Part 3. How to identify and define a good IR research problem?


What is a Good Research Problem?

• Well-defined: Would we be able to tell whether we’ve solved the problem?

• Highly important: Who would care about the solution to the problem? What would happen if we don’t solve the problem?

• Solvable: Is there any clue about how to solve it? Do you have a baseline approach? Do you have the needed resources?

• Matching your strength: Are you in a good position to solve the problem?


Challenge-Impact Analysis

[Figure: chart with axes “Level of Challenges” and “Impact/Usefulness”, marked “Known” and “Unknown”. Region labels:]

• Good applications: not interesting for research

• High impact, low risk (easy): good short-term research problems

• High impact, high risk (hard): good long-term research problems

• Difficult basic research problems, but questionable impact

• Low impact, low risk: bad research problems (may not be publishable)

• “Entry point” problems


Optimizing “Research Return”: Pick a Problem Best for You

[Figure: diagram with regions labeled Your Passion, High (Potential) Impact, and Your Strength, and an area labeled Best problems for you]

Find your passion: If you don’t have to work/study for money, what would you do?

Test of impact: If you are given $1M to fund a research project, what would you fund?

Find your strength/Avoid your weakness: What are you (not) good at?


How to Find a Problem?

• Application-driven (Find a nail, then make a hammer)

– Identify a need by people/users that cannot be satisfied well currently (“complaints” about current data/information management systems?)

– How difficult is it to solve the problem?

• No big technical challenges: do a startup

• Lots of big challenges: write a research proposal

– Identify one technical challenge as your topic

– Formulate/frame the problem appropriately so that you can solve it

• Aim at a completely new application/function (find a high-stakes nail)


How to Find a Problem? (cont.)

• Tool-driven (Hold a hammer, and look for a nail)

– Choose your favorite state-of-the-art tools

• Ideally, you have a “secret weapon”

• Otherwise, bring tools from area X to area Y

– Look around for possible applications

– Find a novel application that seems to match your tools

– How difficult is it to use your tools to solve the problem?

• No big technical challenges: do a startup

• Lots of big challenges: write a research proposal

– Identify one technical challenge as your topic

– Formulate/frame the problem appropriately so that you can solve it

• Aim at important extension of the tool (find an unexpected application and use the best hammer)


How to Find a Problem? (cont.)

• In practice, you do both in various kinds of ways

– You talk to people in application domains and identify new “nails”

– You take courses and read books to acquire new “hammers”

– You check out related areas for both new “nails” and new “hammers”

– You read visionary papers and the “future work” sections of research papers, and then take a problem from there

– …


Three Basic Questions to Ask about an IR Problem

• Who are the users?

– Everyone vs. small group of people

• What data do we have?

– Web (whole web vs. sub-web)

– Email (public email vs. personal email)

– Literature (general vs. special discipline)

– Blog, forum, …

• What functions do we want to support?

– Information access vs. knowledge acquisition

– Decision and task support

• Example answers (slide callouts): users: everyone (who has an Internet connection); data: the whole web (indexed by Google); function: search (by keywords)


Look for New IR Research Questions

• Driven by new data: X is a new type of data emerging (e.g., X = blog vs. news)

– How is X different from existing types of data?

– What new issues/problems are raised by X?

– Are existing methods sufficient for solving old problems on X? If not, what are the new challenges?

– What new methods are needed?

– Are old evaluation measures adequate?

• Driven by new users: Y is a set of new users (e.g., ordinary people vs. librarians)

– How are the new users different from old ones? What new needs do they have?

– Can existing methods work well to satisfy their needs? If not, what are the new challenges?

– What new functions are appropriate for Y?

• Driven by new tasks (not necessarily new users or new data): Z is a new task (e.g., social networking, online shopping)

– What information management functions are needed to better support Z?

– Can these new functions be reduced to old ones? If not, what are the new challenges?


Map of IR Applications

[Figure: map of IR applications. Labels recovered from the slide:]

• Data: Web pages, News articles, Email messages, Literature, Organization docs, Legal docs/Patents, Medical records, Customer complaint letters/transcripts, Blog articles

• Users: Kids, Peking Univ. community, Lawyers, Scientists, Customer Service People, Online Shoppers

• Functions: Search, Browsing, Alert, Mining, Task/Decision support

• Example applications: Email management + automatic reply, “Google Kids”, Legal Info Systems, Literature Assistant, Intranet Search, Local Web Service, ?


High-Level Challenges in IR

• How to make use of imperfect IR techniques to do something useful?

– Save human labor (e.g., partially automate a task)

– Create “add on” value (e.g., literature alert)

– A lot of HCI issues (e.g., allowing users to control)

• How to develop robust, effective, and efficient methods for a particular application?

– Methods need to “work all the time” without failure

– Methods need to be accurate enough to be useful

– Methods need to be efficient enough to be useful


Challenge 1: From Search to Information Access

• Search is only one way to access information

• Browsing and recommendation are two other ways

• How can we effectively combine these three ways to provide integrated information access?

• E.g., artificially linking search results with additional hyperlinks, “literature pop-ups”…


Challenge 2: From Information Access to Task Support

• The purpose of accessing information is often to perform some tasks

• How can we go beyond information access to support a user at the task level?

• E.g., automatic/semi-automatic email reply for customer service, literature information service for paper writing (suggest relevant citations, term definitions, etc), comparing prices for shoppers


Challenge 3: Support Whole Life Cycle of Information

• A life cycle of information consists of “creation”, “storage”, “transformation”, “consumption”, “recycling”, etc

• Most existing applications support one stage (e.g., search supports “consumption”)

• How can we support the whole life cycle in an integrated way?

• E.g., Community publication/subscription service (no need for crawling, user profiling)


Challenge 4: Collaborative Information Management

• Users (especially similar users) often have similar information needs

• Users who have explored the information space can share their experiences with other users

• How to exploit the collective expertise of users and allow users to help each other?

• E.g., allowing “information annotation” on the Web (“footprints”), collaborative filtering/retrieval, …


General Steps to Define a Research Problem

• Generate and Test

• Raise a question

• Novelty test: Figure out to what extent we know how to answer the question

– There’s already an answer to it: Is the answer good enough?

• Yes: not interesting, but can you make the question more challenging?

• No: your research problem is how to get a better answer to the raised question

– No obvious answer: you’ve got an interesting problem to work on

• Tractability test: Figure out whether the raised question can be answered

– I can see a way to answer it or potentially answer it: you’ve got a solvable problem

– I can’t easily see a way to answer it: Is it because the question is too hard or you’ve not worked hard enough? Try to reframe the problem to make it easier

• Evaluation test: Can you obtain a data set and define measures to test solutions/answers?

– Yes: you’ve got a clearly defined problem to work on

– No: can you think of any way to indirectly test the solutions/answers? Can you reframe the problem to fit the data?

• Every time you reframe a problem, try to do all three tests again.


Rigorously Define Your Research Problem

• Exploratory: what is the scope of exploration? What is the goal of exploration? Can you rigorously answer these questions?

• Descriptive: what does it look like? How does it work? Can you formally define a principle?

• Evaluative: can you clearly state the assumptions about data collection? Can you rigorously define measures?

• Explanatory: how can you rigorously verify a cause?

• Predictive: can you rigorously define what prediction is to be made?


Frame a New Computation Task

• Define basic concepts

• Specify the input

• Specify the output

• Specify any preferences or constraints
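One way to make these four steps concrete is to write the task down as types and a function signature before proposing any method. The sketch below does this for expert finding, chosen here only as an illustration; the names `Document`, `RankedExpert`, `ExpertFinder`, and `count_based_finder` are hypothetical, and the counting baseline is a deliberately naive placeholder, not a recommended method.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence


# Basic concepts: a document with known authors, and a scored expert.
@dataclass(frozen=True)
class Document:
    doc_id: str
    text: str
    authors: Sequence[str]


@dataclass(frozen=True)
class RankedExpert:
    name: str
    score: float


# Input: a topic query and a document collection.
# Output: a ranked list of experts.
# Constraint: return at most k experts.
ExpertFinder = Callable[[str, Sequence[Document], int], List[RankedExpert]]


def count_based_finder(query: str, docs: Sequence[Document], k: int) -> List[RankedExpert]:
    """Naive baseline: score each author by the number of documents
    that contain every query term."""
    terms = query.lower().split()
    scores = {}
    for doc in docs:
        words = set(doc.text.lower().split())
        if all(term in words for term in terms):
            for author in doc.authors:
                scores[author] = scores.get(author, 0.0) + 1.0
    # Sort by descending score, then by name so ties are deterministic.
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [RankedExpert(name, score) for name, score in ranked[:k]]
```

Once the signature is fixed, any proposed method is just another function of type `ExpertFinder`, and the naive baseline gives something to compare against.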


From a new application to a clearly defined research problem

• Try to picture a new system, thus clarify what new functionality is to be provided and what benefit you’ll bring to a user

• Among all the system modules, which are easy to build and which are challenging?

• Pick a challenge and try to formalize the challenge

– What exactly would be the input?

– What exactly would be the output?

• Is this challenge really a new challenge (not immediately clear how to solve it)?

– Yes, your research problem is how to solve this new problem

– No, it can be reduced to some known challenge: are existing methods sufficient?

• Yes, not a good problem to work on

• No, your research problem is how to extend/adapt existing methods to solve your new challenge

• Tuning the problem


Tuning the Problem

[Figure: the challenge-impact chart (axes “Level of Challenges” and “Impact/Usefulness”, marked “Known” and “Unknown”) annotated with three moves: make a hard problem easier; make an easy problem harder; increase impact (more general)]


“Short-Cut” for starting IR research

• Scan most recently published papers to find papers that you like or can understand

• Read such papers in detail

• Track down background papers to increase your understanding

• Brainstorm ideas of extending the work

– Start with ideas mentioned in the future work part

– Systematically question the solidness of the paper (have the authors answered all the questions? Can you think of questions that aren’t answered?)

– Is there a better formulation of the problem?

– Is there a better method for solving the problem?

– Is the evaluation solid?

• Pick one new idea and work on it


Part 4. How to formulate and test IR research hypotheses?


Formulate Research Hypotheses

• Typical hypotheses in IR:

– Hypothesis about user characteristics (tested with user studies or user-log analysis, e.g., clickthrough bias)

– Hypothesis about data characteristics (tested by fitting actual data, e.g., Zipf’s law)

– Hypothesis about methods (tested with experiments):

• Method A works (or doesn’t work) for task B under condition C by measure D (feasibility)

• Method A performs better than method A’ for task B under condition C by measure D (comparative)

• Introducing baselines naturally leads to hypotheses

• Carefully study existing literature to figure out where exactly you can make a new contribution (what do you want others to cite your work as?)

• The more specialized a hypothesis is, the more likely it’s new, but a narrow hypothesis has lower impact than a general one, so try to generalize as much as you can to increase impact

• But avoid over-generalizing (must be supported by your experiments)

• Tuning hypotheses
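As an illustration of testing a data-characteristics hypothesis by fitting actual data, the sketch below fits a straight line to log(frequency) against log(rank); Zipf's law predicts a slope near -1. The function name `zipf_slope` and the synthetic token list are assumptions made for this example, and a plain least-squares fit is only one simple way to check the law.

```python
import math
from collections import Counter


def zipf_slope(tokens):
    """Least-squares slope of log(frequency) against log(rank).

    Needs at least two distinct tokens, otherwise the slope is undefined.
    """
    freqs = sorted(Counter(tokens).values(), reverse=True)
    xs = [math.log(rank) for rank in range(1, len(freqs) + 1)]
    ys = [math.log(freq) for freq in freqs]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


# Synthetic data whose frequencies are exactly 60/rank: 60, 30, 20, 15, 12.
tokens = [f"w{rank}" for rank in range(1, 6) for _ in range(60 // rank)]
slope = zipf_slope(tokens)  # approximately -1 for this synthetic data
print(round(slope, 6))
```

On a real collection the slope will deviate from -1, and how far it deviates at the head and tail of the ranking is itself something to explain.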


Procedure of Hypothesis Testing

• Clearly define the hypothesis to be tested (include any necessary conditions)

• Design the right experiments to test it (experiments must match the hypothesis in all aspects)

• Carefully analyze results (seek understanding and explanation rather than just description)

• Unless you’ve got a complete understanding of everything, always attempt to formulate a further hypothesis to achieve better understanding


Clearly Define a Hypothesis

• A clearly defined hypothesis helps you choose the right data and right measures

• Make sure to include any necessary conditions so that you don’t overclaim

• Be clear about any justification for your hypothesis (testing a random hypothesis requires more data than testing a well-justified hypothesis)
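A lightweight way to force a hypothesis into a clearly defined form is to fill in every slot of the "method A, task B, condition C, measure D" template explicitly. The class below is a hypothetical sketch of such a template; the field names and the example values are illustrative assumptions, not part of the original slides.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Hypothesis:
    method: str
    task: str
    condition: str                   # the necessary conditions, stated up front
    measure: str
    baseline: Optional[str] = None   # None means a feasibility hypothesis

    def statement(self) -> str:
        """Render the hypothesis as one sentence with no slot left implicit."""
        if self.baseline is None:
            return (f"{self.method} works for {self.task} "
                    f"under {self.condition} by {self.measure}.")
        return (f"{self.method} performs better than {self.baseline} "
                f"for {self.task} under {self.condition} by {self.measure}.")


# Illustrative values only.
h = Hypothesis(
    method="method A",
    baseline="method A'",
    task="ad hoc retrieval",
    condition="short keyword queries on news data",
    measure="mean average precision",
)
print(h.statement())
```

Writing the condition and measure down first makes it obvious which data sets and measures the experiments must cover.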


Design the Right Experiments

• Flawed experiment design is a common cause of rejection of an IR paper (e.g., a poorly chosen baseline)

• The data should match the hypothesis

– A general claim like “method A is better than B” would need a variety of representative data sets to prove

• The measure should match the hypothesis

– Multiple measures are often needed (e.g., both precision and recall)

• The experiment procedure shouldn’t be biased

– Comparing A with B requires using an identical procedure for both

– Common mistake: baseline method not tuned or not tuned seriously

• Test multiple hypotheses simultaneously if possible (for the sake of efficiency)
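Two of the points above lend themselves to a short sketch: reporting more than one measure, and tuning the baseline with exactly the same procedure as the new method. Everything here is illustrative: `PARAM_GRID`, `evaluate_baseline`, and `evaluate_new` are hypothetical stand-ins for real evaluation runs on held-out tuning data.

```python
def precision_recall(retrieved, relevant):
    """Set-based precision and recall for a single query."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


def tune(evaluate, grid):
    """Return (best_param, best_score) over the grid.

    The same function and the same grid are used for every method,
    so the baseline gets as much tuning effort as the new method.
    """
    return max(((param, evaluate(param)) for param in grid),
               key=lambda pair: pair[1])


PARAM_GRID = [0.1, 0.3, 0.5, 0.7, 0.9]


def evaluate_baseline(param):
    # Toy stand-in for running the baseline and scoring it.
    return 0.40 - (param - 0.7) ** 2


def evaluate_new(param):
    # Toy stand-in for running the new method and scoring it.
    return 0.45 - (param - 0.3) ** 2


best_baseline = tune(evaluate_baseline, PARAM_GRID)
best_new = tune(evaluate_new, PARAM_GRID)
print(precision_recall(["d1", "d2", "d3", "d4"], {"d1", "d4", "d9"}))
print(best_baseline, best_new)
```

Comparing the new method at its best setting against the baseline at a default setting is the biased procedure the slide warns about; here both go through `tune` with the same grid.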


Carefully Analyze the Results

• Do the significance test if possible/meaningful

• Go beyond just getting a yes/no answer

– If positive: seek evidence to support your original justification of the hypothesis.

– If negative: look into reasons to understand how your hypothesis should be modified

– In general, seek explanations of everything!

• Get as much as possible out of the results of one experiment before jumping to run another

– Don’t throw away negative data

– Try to think of alternative ways of looking at data
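For the significance test mentioned at the top of this slide, one common choice for paired per-query scores is a randomization (sign-flip) test; the slides do not prescribe a particular test, so treat this as one option among several. The function name, the default number of trials, and the toy scores are assumptions for the sketch.

```python
import random
from statistics import mean


def paired_randomization_test(scores_a, scores_b, trials=10000, seed=0):
    """Two-sided p-value for the mean per-query difference between A and B.

    Under the null hypothesis the sign of each per-query difference is
    arbitrary, so signs are flipped at random and we count how often the
    absolute mean difference is at least as large as the observed one.
    """
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    observed = abs(mean(diffs))
    rng = random.Random(seed)
    extreme = 0
    for _ in range(trials):
        flipped = [d if rng.random() < 0.5 else -d for d in diffs]
        if abs(mean(flipped)) >= observed:
            extreme += 1
    # Add-one smoothing keeps the estimate away from an impossible p = 0.
    return (extreme + 1) / (trials + 1)


# Toy per-query scores (hypothetical) for methods A and B on ten queries.
scores_a = [0.60, 0.70, 0.50, 0.80, 0.65, 0.70, 0.90, 0.55, 0.60, 0.75]
scores_b = [0.50, 0.62, 0.45, 0.70, 0.50, 0.64, 0.78, 0.50, 0.52, 0.70]
p_value = paired_randomization_test(scores_a, scores_b)
print(p_value)
```

A small p-value only answers the yes/no question; the per-query differences in `diffs` are where the explanation has to come from.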


Modify a Hypothesis

• Don’t stop at the current hypothesis; try to generate a modified hypothesis to further discover new knowledge

• If your hypothesis is supported, think about the possibility of further generalizing the hypothesis and test the new hypothesis

• If your hypothesis isn’t supported, think about how to narrow it down to some special cases to see if it can be supported in a weaker form
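Narrowing a hypothesis to special cases usually starts with breaking the per-query results down by some property of the query. The sketch below groups per-query score differences (new method minus baseline) by a grouping function; the name `breakdown`, the query-length grouping, and the numbers are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def breakdown(per_query_diff, group_of):
    """Mean score difference per group of queries."""
    groups = defaultdict(list)
    for query, diff in per_query_diff.items():
        groups[group_of(query)].append(diff)
    return {group: mean(values) for group, values in groups.items()}


def by_length(query):
    # Hypothetical grouping: one or two terms count as a short query.
    return "short" if len(query.split()) <= 2 else "long"


# Toy per-query differences (new method minus baseline).
per_query_diff = {
    "ir": 0.1,
    "web search": -0.2,
    "language model smoothing": 0.3,
    "personalized search user study": 0.1,
}
print(breakdown(per_query_diff, by_length))
```

If the improvement holds only for one group, that group defines the weaker form of the hypothesis to test next, ideally on fresh data so the narrowed claim is not just fitted to the first experiment.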


Derive New Hypotheses

• After you finish testing some hypotheses and reaching conclusions, try to see if you can derive interesting new hypotheses

– Your data may suggest an additional (sometimes unrelated) hypothesis; you get a by-product

– A new hypothesis can also logically follow a current hypothesis or help further support a current hypothesis

• New hypotheses may help find causes:

– If the cause is X, then H1 must be true, so we test H1


Part 5. How to write and publish an IR paper?


When to Write a Paper?

• Survey/Review paper:

– An emerging field or topic has appeared (i.e., a hot topic) but no survey is available, or sufficient new development has occurred such that existing surveys are out of date

– You’ve read and digested enough papers about the topic

• Original research paper: when you have sufficient results to draw an interesting conclusion or answer an interesting research question, i.e., you’ve got a basic story to tell, e.g.,

– A new problem, a solution, and results showing how good the solution is

– An old problem, a new solution, and results showing advantage(s) of the new solution over the old ones

– An old problem, many old solutions, and results showing an understanding of their relative performance

– In general, a research question and an answer


Before you write any paper, be clear about the targeted readers


Typical Structure of a Survey Paper

• Introduction:

– Motivation for the survey

• An emerging field/topic, but no survey available

• Surveys exist, but they are out of date (e.g., due to new development in a field/topic)

– Scope of the survey

• Background (if necessary)

• Conceptual framework (based on a synthesis of the literature)

– Define basic concepts, terminology, etc.

– Give a big picture of the topic so that your survey is coherent


Typical Structure of a Survey Paper (cont.)

• Systematic review of existing work

– It’s very important that you have some clear structure for this part

• The structure is usually your conceptual framework, or

• other meaningful structures (e.g., by time or some way to classify all the work)

– Be critical! Add your opinions about the work surveyed

– Don’t treat every work equally; elaborate on some representative work and simply give pointers to other work

• Summary

– Summarize the progress and the state of the art

– Give recommendations if any (e.g., for practitioners)

– Outlook (remaining challenges, future directions)

• References


Typical Structure of a Research Paper

• 1. Introduction

– Background discussion to motivate your problem

– Define your problem

– Argue why it’s important to solve the problem

– Identify knowledge gap in existing work or point out deficiency of existing answers/solutions

– Summarize your contributions

– Briefly mention potential impact

• Tips:

– Start with sentences understandable to almost everyone

– Tell the story at a high level so that the entire introduction is understandable to people with little or no technical background in the topic

– Use examples if possible


Typical Structure of a Research Paper (cont.)

• 2. Previous/Related work

– Sometimes this part is included in the introduction or appears later

– Previous work = work that you extend (readers must be familiar with it to understand your contribution)

– Related work = work related to your work (readers can wait until later in the paper to learn about it)

• Tips:

– Make sure not to miss important related work

– Always safer to include more related work

– Discuss the existing work and its connection to your work

• Your work extends …

• Your work is similar to … but differs in that …

• Your work represents an alternative way of …

– Whenever possible, explicitly discuss your contribution in the context of existing work


Typical Structure of a Research Paper (cont.)

• 3. Problem definition/formulation

– Clearly define your problem

• If it’s a new problem, discuss its relation to existing related problems

• If it’s an old problem, cite the previous work

– Justify why you define the problem in this way

– Discuss challenges in solving the problem

• Tips:

– Give both an informal description and a formal description if possible

– Make sure that you mention any assumption you make when defining the problem (e.g., your focus may be on studying the problem in certain conditions)


Typical Structure of a Research Paper (cont.)

• 4. Overview of the solution(s) (can be merged with the next part)

– Give a high-level informal description of the proposed solutions or the solutions you study

– Use examples if possible

• 5. Specific components of your solution(s)

– Be precise (formal description helps)

– Use intuitive descriptions to help people understand it

• Tips:

– Make sure that you organize this part so that it’s understandable to people with various backgrounds

– Don’t just throw in formulas; include high-level intuitive descriptions whenever possible


Typical Structure of a Research Paper (cont.)

• 6. Experiment design: make sure you justify it

– Data set

– Measures

– Experiment procedure

• Tips:

– Give enough details so that people can reproduce your experiments

– Discuss limitation/bias if any, and discuss its potential influence on your study


Typical Structure of a Research Paper (cont.)

• 7. Result analysis:

– Organize it around the research questions to be answered or the hypotheses tested

– Be comprehensive, but focus on the major conclusions

– Include “standard” components

• Baseline comparison

• Individual component analysis

• Parameter sensitivity analysis

• Individual query analysis

• Significance test

– Discuss the influence of any bias or limitation

• Tips:

– Don’t leave any question unanswered (try to provide an explanation for all the observed results)

– Discuss your findings in the context of existing work if possible

• Similar observations have also been made in …

• This is in contrast to … observed in … One explanation is ….
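
The baseline comparison and significance test listed above can be illustrated with a small sketch. It is only one possible choice of test (an exact paired randomization, or sign-flip, test), not a method prescribed by these slides. The function name `paired_randomization_test` and the per-query average precision values are hypothetical, made up purely for illustration.

```python
from itertools import product


def paired_randomization_test(baseline, proposed):
    """Two-sided exact paired randomization (sign-flip) test.

    Both arguments hold one score per query (e.g., average precision),
    in the same query order. Returns the p-value.
    """
    if len(baseline) != len(proposed):
        raise ValueError("need one score per query for both systems")

    # Per-query differences: this is also the starting point for
    # individual query analysis (which queries improved, which got worse).
    diffs = [p - b for b, p in zip(baseline, proposed)]
    observed = abs(sum(diffs))

    # Under the null hypothesis the two systems are interchangeable, so the
    # sign of each per-query difference is arbitrary. Enumerate every sign
    # assignment (2**n of them, so this is feasible for small query sets only).
    extreme = 0
    total = 0
    for signs in product((1, -1), repeat=len(diffs)):
        total += 1
        permuted = abs(sum(s * d for s, d in zip(signs, diffs)))
        if permuted >= observed - 1e-12:  # small tolerance for float rounding
            extreme += 1
    return extreme / total


# Hypothetical per-query average precision for a baseline and a proposed method.
baseline_ap = [0.20, 0.35, 0.10, 0.50, 0.42, 0.15, 0.30, 0.25]
proposed_ap = [0.28, 0.40, 0.12, 0.49, 0.55, 0.22, 0.36, 0.31]

p_value = paired_randomization_test(baseline_ap, proposed_ap)
mean_gain = sum(p - b for b, p in zip(baseline_ap, proposed_ap)) / len(baseline_ap)
print(f"mean AP gain = {mean_gain:.4f}, p = {p_value:.4f}")
```

For realistic query sets (e.g., 50 or more queries), full enumeration is infeasible; a common alternative is to sample random sign assignments, or to use a paired t-test or Wilcoxon signed-rank test. Whichever test is used, report it together with the per-query differences so that readers can judge both the size and the reliability of the improvement.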


Typical Structure of a Research Paper (cont.)

• 8. Conclusions and future work

– Summarize your contributions

– Discuss their potential impact

– Discuss the limitations of your work and point out directions for future work

• 9. References


Tips on Polishing your Paper

• Start with the core messages you want to convey in the paper and expand your paper by following the core story

• Try to convey the core messages at different levels so that people with different backgrounds can all get them

• Try to write a review of your own paper, commenting on its originality, technical soundness, significance, evaluation, etc., and then revise the paper if needed

• Check out reviewers’ instructions, e.g., the following: http://nips07.stanford.edu/nips07reviewers.html (not necessarily matching your conference, but they should share many common requirements)

• Polish the English as much as you can


What an IR reviewer often looks for

• Most important factors:

– Realistic setup of a retrieval problem

• What kind of users would benefit from your research?

– Solid evaluation of methods

• A truly state-of-the-art baseline

• Careful selection of data sets

– Use as many representative data sets as possible

– Always use a standard data set (e.g., TREC) if possible

• Careful definition of measures

• Unbiased experiment procedure

• General factors:

– Quality of argument, novelty, writing, …

– Avoid all kinds of careless mistakes! (If you aren’t careful about writing, it’s possible you aren’t careful about your experiments either.)


Where to Publish IR Papers

• Core IR conferences:

– ACM SIGIR, ACM CIKM

– ECIR, AIRS

• Core IR journals:

– ACM TOIS, IRJ

– IPM, JASIS

• Web applications:

– WWW, WSDM

• Other related conferences:

– Natural Language Processing: HLT, ACL, NAACL, COLING, EMNLP

– Machine Learning: ICML, NIPS

– Data Mining: KDD, ICDM

– Databases: SIGMOD, VLDB, ICDE

• …


After You Get Reviews Back

• Carefully classify comments into:

– Unreasonable comments (e.g., misunderstanding):

• Try to improve the clarity of your writing

– Reasonable comments:

• Constructive: easy to implement

• Non-constructive: think about them, then either argue the other way or acknowledge the weakness of your work in the paper

• If the paper is accepted:

– Take the last chance to polish the paper as much as you can

– You’ll regret it if you later discover an inaccurate statement or a typo in your published paper

• If the paper is rejected:

– Digest the comments and try to improve the research work and the paper

– Run more experiments if necessary

– Don’t try to please reviewers (the next reviewer might say the opposite); instead, use your own judgment, and use their comments to help improve it

– Reposition the paper if necessary (again, don’t reposition it just because a reviewer rejected your original positioning)


Summary

• Research is about discovering and increasing our knowledge (innovation & understanding)

• Intellectual curiosity and critical thinking are extremely important

• Work on important problems that you are passionate about

• Aim at becoming a top expert on one topic area

– Obtain complete knowledge about the literature on the topic (read all the important papers and monitor the progress)

– Write a survey if appropriate

– Publish one or more high-quality papers on the topic

• Don’t give up!

• Good luck!