
Question Generation from a Knowledge Base

Vinay K. Chaudhri (1), Peter E. Clark (2), Adam Overholtzer (1), and Aaron Spaulding (1)

(1) Artificial Intelligence Center, SRI International, Menlo Park, CA, 94025
(2) Vulcan Inc., 505 Fifth Ave S., Suite 900, Seattle, WA, 98104

Abstract. When designing the natural language question asking interface for a formal knowledge base, managing and scoping the user expectations regarding what questions the system can answer is a key challenge. Allowing users to type arbitrary English questions will likely result in user frustration, because the system may be unable to answer many questions even if it correctly understands the natural language phrasing. We present a technique for responding to natural language questions by suggesting a series of questions that the system can actually answer. We also show that the suggested questions are useful in a variety of ways in an intelligent textbook to improve student learning.

1 Introduction

Creating natural language interfaces for knowledge bases and databases is an actively studied problem for knowledge acquisition systems [9-11, 13, 14]. The design choices for these interfaces range from supporting a full-fledged natural language interface to using a formal query language such as SPARQL. In between lie options such as using a controlled natural language interface [10] or an interface guided by the ontology [14]. Using a full natural language interface is compelling, as it requires no training, but creates the obvious problem of the system not always being able to understand the question. Moreover, even if the system correctly understands the question, the knowledge base may be unable to answer it. In this paper we describe a suggested question (SQ) facility that, in response to the user's natural language question, suggests questions that the system can actually answer. Such a facility both scopes the user's expectations and minimizes the training requirements.

The context of our work is an intelligent textbook called Inquire Biology [3] that helps students to learn better. Inquire includes a curated biology knowledge base [8] and a reasoning system [15] for answering questions. This application required that the training for its users, who are students, be close to zero. Several studies with students have found both Inquire and the SQ facility reported here to be effective in practice [3].

We begin this paper by giving examples of different ways in which SQs are used in Inquire. We then give background on the knowledge base and the questions that can be answered by using it. Next, we describe our approach to question generation. Finally, we consider lessons learned and conclude with a summary.

2 Suggested Questions in Inquire

First and foremost, the SQs are used to auto-complete and suggest alternative questions in response to a user's natural language question input. In Figure 1, we show the question answering dialog box, which also illustrates the questions that are suggested in response to a user's input. In this example, the user's question is well-formed and will be answered as typed, but in many cases the question as typed cannot be answered. The suggested questions provide alternative questions that are closely related to the concepts in the user's question and that are known to be answerable by the system.

Fig. 1. Suggesting questions in response to a user's question

Second, as the user reads the textbook and creates a highlight, the system generates questions that relate to the highlighted text. This feature is illustrated in Figure 2, in which the user highlighted the word Mitochondrion. In response to this highlight, the system automatically generated the questions shown in the right margin.

Finally, when the user views the answer to a question, or views the summary of a concept, we embed a set of SQs that can provide further information. In Figure 3, we show the concept of Mitochondrion and the suggested follow-up questions in the bottom right-hand corner.

3 Knowledge Base and Question Answering Capability

Our system uses a knowledge representation that has many of the standard features such as classes, individuals, a class-subclass hierarchy, disjointness, slots, a slot hierarchy, necessary and sufficient properties, and Horn rules [5]. Although knowledge engineers can edit any portion of the KB, domain experts author only existential rules [1] through a graphical user interface. An example of a rule authored by the domain experts is shown in Figure 4. In this graph, the root node is Mitochondria, shown with a white background, and is universally quantified. Every other node in the graph is existentially quantified. For example, this graph asserts that for every instance of a Mitochondria, there exists an Enzyme and a Mitochondrial-Matrix such that the Mitochondria has-part an Enzyme, the Mitochondria has-region the Mitochondrial-Matrix, and further that the Enzyme is-inside the Mitochondrial-Matrix.

Fig. 2. Suggesting questions while the student is reading the textbook. The textbook content is from page 110 of Biology (9th edition) by Neil A. Campbell and Jane B. Reece, Copyright 2011 by Pearson Education, Inc. Used with permission of Pearson Education, Inc.

Fig. 3. Suggesting follow-up questions. The images 6.17, 6.16 and 10.17 are from Biology (9th edition) by Neil A. Campbell and Jane B. Reece, Copyright 2011 by Pearson Education, Inc. Used with permission of Pearson Education, Inc.

Fig. 4. Existential rule for Mitochondria
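In standard first-order notation, the rule depicted in Figure 4 and described above can be written roughly as follows; this is our own rendering, showing only the relations named in the text:

\[
\forall x\, \Big( \mathit{Mitochondria}(x) \rightarrow \exists y\, \exists z\, \big( \mathit{Enzyme}(y) \wedge \mathit{Mitochondrial\mbox{-}Matrix}(z) \wedge \mathit{has\mbox{-}part}(x, y) \wedge \mathit{has\mbox{-}region}(x, z) \wedge \mathit{is\mbox{-}inside}(y, z) \big) \Big)
\]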

Our current knowledge base (KB) covers more than 5500 concepts. By the time the project concluded and the funding stopped, approximately 2000 of them had been well developed. A team of biologists performed the curation and also extensively tested the KB by posing a large set of questions. The biologists invested an encoding effort of approximately 12 person-years in creating this KB.

Each question answered by the system can be viewed as an instance of an abstract question template, and we currently have more than 30 such question templates. Instead of enumerating all the question templates, we consider a few salient ones below. For each question template, we first give a phrase identifying it, followed by its abstract formulation, and then an example. (1) Definitional questions: What is X? What is an RNA? (2) Find-a-value questions: What is the R of X? What are the parts of a nucleus? (3) Yes/no questions: Is it true that Y is R of X? Is it true that DNA is part of a chloroplast? (4) Identification questions: What X has the following specific properties? What cells have a nucleus? (5) Relationship questions: What is the relationship between X and Y? What is the relationship between Chromosome and Protein? (6) Comparison questions: What are the [similarities/differences] between X and Y? What are the [similarities/differences] between Chromatin and Gene? (7) How-many questions: How many Ys are R of X? How many chromosomes does a human cell have? (8) Event-relation questions: What is the R of an event X? What does DNA transfer?
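To make the template structure concrete, the following minimal Python sketch (our own illustration; the template strings, names, and representation are assumptions, not the system's actual internals) shows how a few of these abstract templates could be stored and instantiated:

```python
# A minimal sketch of abstract question templates as string patterns; the
# actual system's internal representation is richer than this.
QUESTION_TEMPLATES = {
    "definitional": "What is {X}?",
    "find_value":   "What is the {R} of {X}?",
    "yes_no":       "Is it true that {Y} is {R} of {X}?",
    "comparison":   "What are the {facet} between {X} and {Y}?",
    "how_many":     "How many {Y}s are {R} of {X}?",
}

def instantiate(template_name: str, **bindings: str) -> str:
    """Fill a template with concept and relation names."""
    return QUESTION_TEMPLATES[template_name].format(**bindings)

if __name__ == "__main__":
    print(instantiate("definitional", X="an RNA"))
    print(instantiate("comparison", facet="differences", X="Chromatin", Y="Gene"))
```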

Reasoning processes underlying these questions have been previously published [7, 6, 4]. For the purpose of this paper, we will primarily be concerned with the problem of automatically generating instantiations of the above questions that the system can actually answer.

4 Question Generation

The question generation process includes two steps: (1) crawling the knowledge base to synthesize a question in logical form and its realization in English, and (2) ranking the questions. Below we consider these steps in greater detail.

4.1 Crawling the KB

We crawl the KB to create instantiations of each question and generate a database of questions. This pre-computed database is then used at run-time to select questions to suggest. The crawling process is different for each question type, as we explain next.

For definition questions, we query the KB for all the biology-specific concepts. Each concept name is substituted in the template "What is X?" to create a question.

For the find-a-value question, we process the graph corresponding to each existential rule. The questions are generated based on the length of the path from the root node, which can be controlled as a parameter. For example, for the concept shown in Figure 4, the root node is Mitochondria. For a concept with root node X, if a relation R has more than one value (e.g., has-part), we generate the question: What are the parts of X? (Here, has-part is realized as "parts" in the English version of the question.) For a concept with root node X, if a relation R has only one value, we generate the question: What is the R of X? This process can be repeated recursively by traversing the graph to a greater depth. For example, at depth two, we would get questions of the form: What is R1 of Y that is R2 of X? This question is realized in English as "What are the parts of the mitochondrial membrane that are also a part of a mitochondrion?"
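The following Python sketch illustrates this depth-bounded traversal on a toy encoding of the graph in Figure 4. The graph representation, the relation lexicon, and the crude English realization are our own assumptions for illustration, not the system's actual data structures or generator.

```python
# Sketch of depth-bounded crawling of an existential rule graph to produce
# find-a-value questions; data structures are illustrative assumptions.

# relation -> singular English noun (assumed lexicon)
LEXICON = {"has-part": "part", "has-region": "region"}

# Existential rule graph rooted at Mitochondria, encoded as
# node -> {relation -> [target nodes]} (simplified from Figure 4).
GRAPH = {
    "Mitochondria": {
        "has-part": ["Enzyme", "Mitochondrial-Membrane"],
        "has-region": ["Mitochondrial-Matrix"],
    },
    "Mitochondrial-Membrane": {"has-part": ["Crista"]},
}

def find_value_questions(root, graph, max_depth=2):
    """Generate find-a-value questions for paths of length <= max_depth."""
    questions = []

    def visit(node, depth):
        if depth > max_depth:
            return
        for relation, values in graph.get(node, {}).items():
            noun = LEXICON.get(relation, relation)
            plural = len(values) > 1
            phrase = f"the {noun}s" if plural else f"the {noun}"
            verb = "are" if plural else "is"
            if node == root:
                questions.append(f"What {verb} {phrase} of a {root}?")
            else:
                # Crude rendering of the depth-two form
                # "What is R1 of Y that is R2 of X?"
                questions.append(f"What {verb} {phrase} of the {node} of a {root}?")
            for value in values:
                visit(value, depth + 1)

    visit(root, 1)
    return questions

if __name__ == "__main__":
    for question in find_value_questions("Mitochondria", GRAPH):
        print(question)
```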

For a yes/no question, the process is very similar to that for the find-a-value question type, except that instead of querying for the value of a relation, we are interested in testing whether a specific value is in a given relationship to the concept. An example of such a question is "Is it true that an enzyme is a part of a mitochondrion?" If we wish to generate a question that has a negative answer, we can switch the correct relationship with another relationship. For example, we could produce "Is it true that an enzyme is a region of a mitochondrion?"
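A small sketch of this positive/negative generation, again over a toy graph of our own (the encoding and the relation set are assumptions, not the system's):

```python
import random

# Toy data; not the system's actual KB encoding.
LEXICON = {"has-part": "part", "has-region": "region"}
GRAPH = {"Mitochondria": {"has-part": ["Enzyme"],
                          "has-region": ["Mitochondrial-Matrix"]}}

def yes_no_questions(root, graph, relations=("has-part", "has-region")):
    questions = []
    for relation, values in graph.get(root, {}).items():
        for value in values:
            # Positive question: this relation actually holds in the KB.
            noun = LEXICON[relation]
            questions.append(f"Is it true that {value} is a {noun} of {root}?")
            # Negative question: swap in a relation under which this value
            # does NOT appear for the root concept.
            wrong = [r for r in relations
                     if value not in graph.get(root, {}).get(r, [])]
            if wrong:
                noun = LEXICON[random.choice(wrong)]
                questions.append(f"Is it true that {value} is a {noun} of {root}?")
    return questions

if __name__ == "__main__":
    print("\n".join(yes_no_questions("Mitochondria", GRAPH)))
```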

Identification questions are generated for concepts that have sufficient properties defined for them. An example of such a question is: What cells have a nucleus? This question is generated from the sufficient property for being a Eukaryotic-Cell, namely that any cell with a nucleus is a Eukaryotic-Cell. One challenge in generating such questions is that if a concept has a complex sufficient property, generating a good English sentence corresponding to it can be very hard.

For generating a relationship question, we pick two concepts. We choose the first concept based on the context in which the question is to be generated. For example, if the question is in response to a highlighting of the text, one of the concepts must appear in the highlight. If the question will be generated for a concept summary page, then that concept is chosen as one of the two concepts. We choose the second concept based on the following criteria: (a) the second concept must have either a direct or an indirect relationship to the first concept in some concept graph; and (b) the second concept could be a sibling of the first concept in the class taxonomy. An example of such a question is "What is the relationship between a mitochondrion and a chloroplast?"

For generating a comparison question, we must choose two concepts. The first concept is chosen based on the context, as was done for the relationship question. The second concept is chosen based on the following criteria: (a) the second concept is a sibling of the first concept in the class taxonomy; (b) the second concept is disjoint from the first concept; and (c) the second concept is related to the first concept using the same relationship, such as has-part. An example of such a question is: "What is the difference between a glycolipid and a glycoprotein?" Here, both glycolipid and glycoprotein are sibling subclasses of an Amphipathic-Molecule, and both are also related to a Biomembrane by a has-part relationship.
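The candidate selection for the second concept might look roughly like the following sketch, which treats the three criteria conjunctively (an assumption on our part) and uses toy taxonomy and relation data:

```python
# Sketch of choosing the second concept for a comparison question; the toy
# taxonomy, disjointness, and relation data below are illustrative only.

TAXONOMY = {  # concept -> parent class
    "Glycolipid": "Amphipathic-Molecule",
    "Glycoprotein": "Amphipathic-Molecule",
    "Enzyme": "Protein",
}
DISJOINT = {frozenset({"Glycolipid", "Glycoprotein"})}   # disjoint concept pairs
RELATED_BY = {                                           # (concept, relation) -> fillers
    ("Biomembrane", "has-part"): {"Glycolipid", "Glycoprotein"},
}

def comparison_questions(first):
    """Generate comparison questions for `first` against suitable siblings."""
    questions = []
    parent = TAXONOMY.get(first)
    for concept, concept_parent in TAXONOMY.items():
        if concept == first or concept_parent != parent:
            continue  # (a) must be a sibling in the class taxonomy
        if frozenset({first, concept}) not in DISJOINT:
            continue  # (b) must be disjoint from the first concept
        # (c) both must fill the same relation on some third concept
        if any(first in fillers and concept in fillers
               for fillers in RELATED_BY.values()):
            questions.append(
                f"What is the difference between a {first} and a {concept}?")
    return questions

if __name__ == "__main__":
    print(comparison_questions("Glycolipid"))
```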

We generate a how-many question based on the cardinality constraints in the KB. For every qualified number constraint present in the KB, we generate the following question: How many Ys are R of X? An example of such a question is "How many chromosomes does a human cell have?"
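A minimal sketch of this step, assuming cardinality constraints are available as simple tuples (the storage format and the English realization are our assumptions):

```python
# Qualified number constraints, assumed to be (concept, relation, filler, count).
CARDINALITY_CONSTRAINTS = [
    ("human cell", "has-part", "chromosome", 46),
]

def how_many_questions(constraints):
    """Instantiate 'How many Ys are R of X?' for each constraint."""
    return [f"How many {filler}s does a {concept} have?"
            for concept, _relation, filler, _count in constraints]

if __name__ == "__main__":
    print(how_many_questions(CARDINALITY_CONSTRAINTS))
```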

The event-centered questions require a slightly different crawling of the KB, as these questions are based on the participants of an event. Thus, the crawling of the KB can be viewed as a breadth-first search. For example, for an event E, if we have two relations R1 and R2 with respective values X and Y, we can generate a question using E and X such that Y is the answer. Suppose we have a process representing virus infection in which the agent is a Virus and the object is a Cell. For this representation, we can generate a question such as "What does a Virus infect?"
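A sketch of this participant-based generation over a toy event encoding (the role names and representation are assumptions):

```python
# Toy event representation: event -> participant roles.
EVENTS = {
    "Virus-Infection": {"agent": "Virus", "object": "Cell", "verb": "infect"},
}

def event_questions(events):
    """For each event, put one participant in the question; the other is the answer."""
    questions = []
    for _event, roles in events.items():
        agent, obj, verb = roles["agent"], roles["object"], roles["verb"]
        questions.append((f"What does a {agent} {verb}?", obj))   # answer: object
        questions.append((f"What {verb}s a {obj}?", agent))       # answer: agent
    return questions

if __name__ == "__main__":
    for question, answer in event_questions(EVENTS):
        print(f"{question}  ->  {answer}")
```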

4.2 Ranking the Questions

Using the methods described in the previous section, we populate a question database. Our question database contains more than 20,000 questions; for any given situation, several hundred questions could be relevant. Because we do not wish to overwhelm the user, we must select the most interesting questions and display them in some ranked order.

The first step in selecting the relevant questions is identifying the relevant concepts and relations that should serve as the trigger point for selecting questions from the SQ database. For suggesting questions in response to a question typed in by the user (as in Figure 1), we determine the concept names that the user has already typed in the question. For suggesting questions in response to a user highlight (as in Figure 2), we identify all the concept mentions in the selected region of the text. For suggesting follow-up questions (as in Figure 3), we use the concepts in the question, but we make sure that we do not suggest the question that the user has already asked. The identification of concepts based on the question or the selected region of text, as well as the identification of relevant questions based on the concept names, is done by a semantic search that considers lexical mappings between the text and the concept names. Based on the concepts identified for each situation, we search the SQ database to determine all the relevant questions. The questions selected during this step are used for further ranking, as we explain next.
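A much simplified sketch of this retrieval step, using plain substring matching in place of the system's semantic search (the question database layout and matching are our own illustrative assumptions):

```python
# Pre-generated questions indexed by concept (illustrative data).
QUESTION_DB = {
    "Mitochondrion": ["What is a mitochondrion?",
                      "What are the parts of a mitochondrion?"],
    "Chloroplast": ["What is the relationship between a mitochondrion and a chloroplast?"],
}

def concepts_in(text):
    """Find KB concepts mentioned in a span of text (crude lexical matching)."""
    lowered = text.lower()
    return [concept for concept in QUESTION_DB if concept.lower() in lowered]

def relevant_questions(text, already_asked=()):
    """Look up candidate questions for the matched concepts, skipping repeats."""
    seen = set(already_asked)
    candidates = []
    for concept in concepts_in(text):
        for question in QUESTION_DB[concept]:
            if question not in seen:   # do not re-suggest what the user asked
                candidates.append(question)
                seen.add(question)
    return candidates

if __name__ == "__main__":
    print(relevant_questions("The mitochondrion is the site of cellular respiration."))
```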

The ranking function should rank questions based on their quality, importance, and relevance to the input paragraph. Another ranking criterion should be diversity in question types and KB content. Even if some question types are more interesting than others, or some concepts are more important, that does not mean the system should always rank those types of questions at the top of the list. Students should be given a broader landscape of questions they could potentially ask the system. Ranking should therefore ensure diversity in the question types and the concepts for which the questions are generated. Whenever there is no clear motive for ranking a question higher or lower, one can rank based on the length of the question: short questions should be given preference over longer questions, because if a question is too long, it is quite likely either too complex or poorly phrased by the SQ system.

Our approach to ranking based on question types was to associate a numeric score with each question type. Our ranking scheme and the rationale for it are summarized in Table 1. A lower numeric score indicates higher interest. These rankings were provided by the biology teachers on our development team.

Table 1. Ranking question types

Question type    Rank  Rationale
Event centered   0.3   They are complex and short, and therefore considered interesting
Identification   0.4   Very small in number and important for the definition of a concept
How many         0.8   Very small in number, so wherever they exist, we use them
Comparison       1     Very important for learning and education
Find a value     2     These questions are important, but there are too many of them
Relationship     4     Overall a good question, but there can be too many, not all of which are interesting
Yes/No           6     The answer to these questions is not very interesting, and there are too many of these
Definitional     8     Definitions are easily available through the user interface

With each question we associate two ranking values: (1) the first, referred to as the rank within a concept, is the rank of the question among all the questions generated for that concept, ordered by question length (for a given concept, we expect multiple questions of each type to exist); (2) the second, referred to as the rank within a question type, is the rank of the question among the questions of the same type for that concept. We compute the overall score of a question as the product of its rank within the concept, its rank within the question type, and the overall rank of its question type from Table 1.
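Concretely, the overall score described above (lower scores are suggested first) can be sketched as follows, using the question-type scores from Table 1; the data layout is our own illustration:

```python
# Question-type scores from Table 1 (lower means more interesting).
TYPE_SCORE = {
    "event": 0.3, "identification": 0.4, "how_many": 0.8, "comparison": 1,
    "find_value": 2, "relationship": 4, "yes_no": 6, "definitional": 8,
}

def overall_score(rank_within_concept, rank_within_type, question_type):
    """Product of the two per-question ranks and the question-type score."""
    return rank_within_concept * rank_within_type * TYPE_SCORE[question_type]

# Example: a comparison question ranked 2nd for its concept and 1st among the
# comparison questions for that concept.
if __name__ == "__main__":
    print(overall_score(2, 1, "comparison"))   # -> 2
```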

5 Evaluation

There are two kinds of evaluations of interest: (1) Given a paragraph, to what extent does the SQ facility generate questions that are educationally useful? (2) Of the questions generated by the SQ facility, how many are educationally useful and ranked in the order of interest? To evaluate each of these aspects, we designed two separate evaluations that we discuss next.

5.1 Testing coverage of educationally useful questions

We selected a page with six paragraphs from the biology textbook. We chose the page such that its content was adequately represented in the KB to provide a good basis for question generation. We asked two biology teachers to generate questions that they thought were important for each of the six paragraphs. We asked them to generate as many important questions as they could find; however, a minimum of three questions per paragraph was expected. The questions were to be asked from the perspective of a student who wants to learn, or who is curious about some new information that is not present in the current paragraph but is related to it. Further analysis of these questions revealed what makes a good and interesting question in the given contexts.

After the first task was completed, we computed how many of the questions generated during this process were also present in the questions generated by the SQ facility. The following table summarizes those results.

Table 2. Coverage of educationally useful questions

Questions that can be generated and answered                29
Questions that could be potentially generated and answered  35
Questions that are too complex to generate                  27
Questions that are too vague                                 8
Total number of questions                                   99

The above data show that of the 99 questions authored by the biologists, 8 were too vague. An example of a vague question is: "Is there anything else in the nucleus besides genes?" This question was considered too vague because it is too open ended about what is being asked. Thus, 91 questions were deemed to be of good quality. The SQ system's current capability could generate 29 of these questions. An example of such a question is: "What is the difference between a nuclear matrix and a nuclear lamina?" This is a comparison question and can be easily generated and answered by the system. The system could potentially generate 35 more questions with minor extensions to the current capability. An example of such a question is: "During what phase of the cell cycle does chromatin condense into chromosomes?" This is an event-oriented question that specifies the context of the cell cycle and requires an extension to the current SQ facility. Some questions produced by the biologists were too difficult for the system to generate and answer. An example of such a question is: "During transcription, what is the advantage in synthesizing RNA instead of DNA?" Our system does not currently represent such information. Thus, of the 91 good-quality questions, 64 (i.e., 70%) were within easy reach of the current SQ capability, making it a viable way to formulate questions in this domain. The remaining questions would not be answerable by the system even if they could be properly formulated and understood.

5.2 Evaluating the quality of generated questions

For the selected page, we computed the questions produced by the SQ facility. Each question was labeled by the system as relevant or not relevant for each paragraph. The biology teachers were asked to rate the questions that the system deemed relevant for each paragraph on a three-point scale: (1) The question is definitely important. (This question is useful and could be presented to the student.) (2) This is a mediocre question. (This question might have a good answer but probably asks for information that is not very relevant; it may also have good content but be poorly formed.) (3) This is not a good question. (This question should be thrown out because the answer is irrelevant information to a student.) Further, from the set of questions that were rated a "1", the raters were asked to select the top three questions for each paragraph.

The system found 38 concepts in the selected page, for which it generated a set of 376 unique questions. For these questions, we asked two different raters to provide ratings. Their ratings were aggregated, and the overall scores were as follows: 28.7% of the questions received a score of 1; 43.6% received a score of 2; and 27.7% received a score of 3. Because the total number of questions was 376, this gave us 108 questions for this page that received a score of 1. This set provided a good starting point for questions that we could use in our system right away and also left ample room for further improvement.

Let us consider examples of questions from each category. As an example of a question rated "1" by both biologists, consider "What is the relationship between a chromosome and a DNA?" This question gets at the deep knowledge associated with the structural relationship between these two entities. As an example of a question rated "2" by both biologists, consider: "What is the relationship between a DNA and deoxyribose?" This question was rated a "2" because the biologists felt that it asked about two very closely related entities. As an example of a question rated "3", consider: "Is it true that translation termination is a sub step of termination?" This question was not considered particularly deep or educationally useful by the biologists.

6 Related Work

Considerable recent interest has been shown in generating questions for intelligent tutoring systems (for example, see http://www.questiongeneration.org). As we have seen in our work, question generation has two distinct aspects: (1) generating the question and (2) identifying which question is most useful to a user in a given situation. A key difference between our work and these related efforts is that in our system, the questions are generated only from a curated knowledge base, while most question generation systems attempt question generation starting from English text [19, 20]. Our goals differ in that we strive to provide a natural querying interface to a knowledge base that scopes user expectations about what the system can do. Perhaps the work closest in spirit to ours is a recent effort to generate questions from linked data [12]. The approaches to ranking the SQs range from purely lexical criteria for the quality of the generated questions [16, 20] to ranking based on pedagogical goals [18]. Our work on ranking falls somewhere in between these two approaches, as we determine the ranking of questions based on the current capabilities of our knowledge-based system and the empirical feedback from teachers and students.

Although users would prefer having a full natural language interface for accessing a knowledge base [17], few deployed systems have seen a high degree of success. The work we reported here builds on our previous work [10] to provide a controlled natural language interface to a knowledge base. Although the previous interface was effective for users who could be given 4-6 hours of training, it still resulted in frequent awkward question formulations. In many cases, even when the system correctly understood the question, the reasoning system could not answer it, causing significant frustration to the user. In an approach based on question generation, we are guaranteed that the system can indeed answer the question. Because the user is relieved of formulating the question in controlled English, the system requires no training. A major disadvantage is that the approach can be criticized as too limited, because it does not handle arbitrary natural language statements of questions.

7 Recent Work

Since the initial design and evaluation reported here, the suggested question facility has been substantially enhanced. The system now generates all the questions that were previously classified as generable with minor extensions. We achieved this by substantially increasing the number of question templates in the system; the current system contains over 90 question templates. In spite of these substantial enhancements, the basic design of the question generation mechanism described earlier in this paper has remained the same.

We have also conducted a study with end-user students, which suggested that the SQ capabilities were well used and liked by students. A complete description of this study has been published previously [3]; therefore, we only summarize the salient points here.

We recruited community-college students (n=25) who were currently studying biology and trained them on the use of Inquire, including strategies for using the question answering features. The students were given one hour to read a section of the text, 90 minutes to answer a series of questions representative of a typical homework assignment, and 20 minutes to complete a closed-book quiz on the content they had just read.

Compared to control groups using a print textbook (n=24) or a version of Inquire without the suggested questions and question answering capability (n=24), the participants with the full SQ and QA functionality scored significantly better on the quiz (full Inquire vs. ablated Inquire: p = 0.0015, two-tailed t-test; full Inquire vs. textbook: p = 0.0535) and on the homework (full Inquire vs. textbook: p = 0.0187). In the course of the study, the 25 students in the full Inquire condition made heavy use of the SQ/QA capabilities, with Inquire answering 363 questions, of which 194 were unique. 61 questions were asked from the "blue card" suggestions, and 60 from the related questions. During a post-session debrief, all participants remarked that they relied on the autocomplete capability when asking questions, largely as a way of ensuring that a question was properly formed.

8 Future Work and Summary

The SQ facility must evolve with the question answering facility of the system. As new question types are added, the question types for the SQ facility should be expanded accordingly. Although the current system always generates the same set of questions for a given highlight, one can imagine a generalization of this capability in which different sets of questions are suggested for different purposes. For example, the questions suggested when the user is reading the book for the first time could be different from the questions suggested when the user is studying for an exam. The questions suggested at the time of reading can serve both to review previously read information and as a self-test tool to assess whether the student has understood the material. More generally, there is tremendous potential to devise novel interactive dialogs based on these questions that are sensitive to the student's current knowledge and learning goals, and that are also designed to be instructive by teaching the student what the right questions to ask are. Although the current system produces reasonable English sentences, the quality of the English can be further improved by using a good natural language generation system [2]. The current system restricts itself to single-sentence questions; generating multi-sentence questions is open for future research. The ranking function can also be further tuned to specific pedagogical goals. Finally, we need to ensure the scalability of these methods when the size of the KB increases to the scope of a full textbook and the number of question templates is expanded by an order of magnitude.

In summary, we have described a practical approach for constructing a natural language query front end for a knowledge base. This work was conducted in the context of an intelligent textbook application that was required to be a walk-up-and-use system. Our approach was based on generating the questions by crawling the knowledge base. In response to a free-form question typed by the user, the system suggests the most closely matching questions that it can actually answer. Our evaluation results showed that the resulting capability provides good coverage of educationally useful questions and produces good quality questions. We believe an approach based on SQs is a deployable method for querying complex information that strikes an ideal balance between ease of use, user expectations, and implementation feasibility.

Acknowledgments

This work has been funded by Vulcan Inc. and SRI International. We thank the members of the AURA development team for their contributions to this work. We are grateful to Mihai Lintean, whose summer project with us provided the foundation for this work.

References

1. Jean-Francois Baget, Michel Leclere, Marie-Laure Mugnier, and Eric Salvat. On rules with existential variables: Walking the decidability line. Artificial Intelligence, 175(9):1620-1654, 2011.

2. Eva Banik, Eric Kow, and Vinay K. Chaudhri. User-Controlled, Robust Natural Language Generation from an Evolving Knowledge Base. In ENLG 2013: 14th European Workshop on Natural Language Generation, 2013.

3. Vinay K. Chaudhri, Britte Cheng, Adam Overholtzer, Jeremy Roschelle, Aaron Spaulding, Peter Clark, Mark Greaves, and Dave Gunning. Inquire Biology: A Textbook that Answers Questions. AI Magazine, 34(3), September 2013.

4. Vinay K. Chaudhri, Nikhil Dinesh, and Craig Heller. Conceptual Models of Structure and Function. Technical report, SRI International, 2013.

5. Vinay K. Chaudhri, Stijn Heymans, Son Cao Tran, and Michael Wessel. Object-Oriented Knowledge Bases in Logic Programming. In Technical Communication of the International Conference on Logic Programming, 2013.

6. Vinay K. Chaudhri, Stijn Heymans, Adam Overholtzer, Aaron Spaulding, and Michael Wessel. Large-scale analogical reasoning. In Proceedings of the AAAI-2014 Conference, 2014.

7. Vinay K. Chaudhri, Stijn Heymans, Michael Wessel, and Son Cao Tran. Query Answering in Object-Oriented Knowledge Bases in Logic Programming. In Workshop on ASP and Other Computing Paradigms, 2013.

8. Vinay K. Chaudhri, Michael A. Wessel, and Stijn Heymans. KB Bio 101: A challenge for OWL reasoners. In The OWL Reasoner Evaluation Workshop, 2013.

9. Philipp Cimiano, Peter Haase, Jorg Heizmann, Matthias Mantel, and Rudi Studer. Towards portable natural language interfaces to knowledge bases: the case of the ORAKEL system. Data & Knowledge Engineering, 65(2):325-354, 2008.

10. Peter Clark, Ken Barker, Jason Chaw, Vinay K. Chaudhri, Phil Harrison, James Fan, Bonnie John, Bruce Porter, Aaron Spaulding, John Thompson, and Peter Yeh. Capturing and answering questions posed to a knowledge-based system. In Proceedings of The Fourth International Conference on Knowledge Capture (K-CAP), October 2007.

11. Danica Damljanovic and Kalina Bontcheva. Towards enhanced usability of natural language interfaces to knowledge bases. In Web 2.0 & Semantic Web, pages 105-133. Springer, 2009.

12. Mathieu d'Aquin and Enrico Motta. Extracting relevant questions to an RDF dataset using formal concept analysis. In Proceedings of the Sixth International Conference on Knowledge Capture, pages 121-128. ACM, 2011.

13. Gregory Distelhorst, Vishrut Srivastava, Cornelius Rosse, and James F. Brinkley. A prototype natural language interface to a large complex knowledge base, the Foundational Model of Anatomy. In AMIA Annual Symposium Proceedings, volume 2003, page 200. American Medical Informatics Association, 2003.

14. Enrico Franconi, Paolo Guagliardo, and Marco Trevisan. An intelligent query interface based on ontology navigation. In Proceedings of the Workshop on Visual Interfaces to the Social and Semantic Web (VISSW 2010), volume 565. Citeseer, 2010.

15. David Gunning, Vinay K. Chaudhri, Peter Clark, Ken Barker, Shaw-Yi Chaw, Mark Greaves, Benjamin Grosof, Alice Leung, David McDonald, Sunil Mishra, John Pacheco, Bruce Porter, Aaron Spaulding, Dan Tecuci, and Jing Tien. Project Halo update: Progress toward Digital Aristotle. AI Magazine, Fall 2010.

16. Michael Heilman and Noah A. Smith. Good question! Statistical ranking for question generation. In Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics, pages 609-617. Association for Computational Linguistics, 2010.

17. Esther Kaufmann and Abraham Bernstein. How useful are natural language interfaces to the semantic web for casual end-users? In The Semantic Web, pages 281-294. Springer, 2007.

18. Ming Liu, Rafael A. Calvo, and Vasile Rus. Automatic question generation for literature review writing support. In Intelligent Tutoring Systems, pages 45-54. Springer, 2010.

19. Andrew M. Olney, Arthur C. Graesser, and Natalie K. Person. Question generation from concept maps. Dialogue & Discourse, 3(2):75-99, 2012.

20. Xuchen Yao, Gosse Bouma, and Yi Zhang. Semantics-based question generation and implementation. Dialogue & Discourse, 3(2):11-42, 2012.