A Natural Language Interface for Crime-related Spatial Queries
Chengyang Zhang, Yan Huang, Rada Mihalcea, Hector Cuellar
Department of Computer Science and Engineering
University of North Texas
ISI 2009 Presentation
Outline
• Motivation
• Related Work
• Proposed Method
• System Evaluation
Motivation
• The databases and query interfaces hosted by Federal and state justice departments are heterogeneous and complicated.
Motivation
• Tools are needed for crime-related spatial queries, for example:
• Find a police office near the school [1]
• Find a house in a neighborhood with a low crime rate [2]
Motivation
• Neither web forms nor keyword search has the expressive power and flexibility needed for crime-related spatial queries.
• But natural language does!
• No need for training
• No need for a proprietary user interface or an esoteric formal language such as SQL or XQuery
• Ideal for ad hoc, real-time queries in emergency conditions
Our Contributions
• We propose a method to translate crime-related natural language spatial queries into spatial data queries
• We implement a prototype query system
• Experiments show that the system achieves results significantly better than those obtained by using Google Maps.
Related Work
• Syntax-based methods [3-4] use templates or grammar rules to map natural language sentences onto database schemas
• Simple but not scalable
• May sometimes lead to serious errors
• Semantic parsing algorithms [5-9] preserve syntactic dependencies but also seek to enforce semantic constraints over the possible mappings
• The quality of the mapping is significantly improved
• The Precise system in [9] focuses on high precision only
Related Work
• Lambda-calculus encoding can be used as the intermediate representation between natural language and database queries [10].
• A training corpus is used to derive lexicons and grammars for the specific domain
• The approach was found to lead to good results
• The structure of XML documents can be used to match natural language parse trees [11].
• Identify a meaningful lowest common ancestor structure (MLCAS) from the tree structure
• Includes an interactive component to receive help from the user when formulating the query
System Framework
1. Part of Speech Tagging
• For POS tagging, we employ the classic Viterbi algorithm.
• Dynamic programming framework coupled with a Markov assumption
• Efficient and widely used
• Trained on the manually labeled Penn Treebank dataset
Running Example:
I wish to find a police department within 2 miles of a law court
POS Tagging:
I/NP wish/VB to/IN find/VB a/DT police/NN department/NN
within/IN 2/CD miles/NNS of/IN a/DT law/NN court/NN
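The Viterbi step above can be sketched as follows. This is a minimal illustration under stated assumptions, not the presented system: the tag set and all probabilities below are invented toy values (the slides say the real model is trained on the Penn Treebank), and the names `viterbi`, `START_P`, `TRANS_P` and `EMIT_P` are hypothetical.

```python
import math

def viterbi(words, tags, start_p, trans_p, emit_p, floor=1e-12):
    """Most probable tag sequence under a first-order HMM (log-space)."""
    def lp(p):
        # Floor zero probabilities so that log() is always defined.
        return math.log(max(p, floor))

    # best[i][t]: best log-probability of a tag path ending in tag t at word i
    best = [{t: lp(start_p.get(t, 0)) + lp(emit_p[t].get(words[0], 0))
             for t in tags}]
    back = [{}]  # back[i][t]: previous tag on that best path
    for i in range(1, len(words)):
        best.append({})
        back.append({})
        for t in tags:
            # Markov assumption: only the previous tag matters.
            prev, score = max(
                ((p, best[i - 1][p] + lp(trans_p[p].get(t, 0))) for p in tags),
                key=lambda pair: pair[1],
            )
            best[i][t] = score + lp(emit_p[t].get(words[i], 0))
            back[i][t] = prev

    # Backtrack from the best final tag.
    path = [max(tags, key=lambda t: best[-1][t])]
    for i in range(len(words) - 1, 0, -1):
        path.append(back[i][path[-1]])
    return path[::-1]

# Toy model (assumed values, for illustration only).
TAGS = ["DT", "NN", "NNS", "IN", "CD", "VB"]
START_P = {"DT": 0.6, "NN": 0.2, "VB": 0.2}
TRANS_P = {
    "DT": {"NN": 0.9, "NNS": 0.1},
    "NN": {"NN": 0.4, "IN": 0.5, "VB": 0.1},
    "NNS": {"IN": 1.0},
    "IN": {"DT": 0.5, "CD": 0.5},
    "CD": {"NNS": 0.9, "NN": 0.1},
    "VB": {"DT": 0.7, "IN": 0.3},
}
EMIT_P = {
    "DT": {"a": 1.0},
    "NN": {"police": 0.4, "department": 0.3, "court": 0.2, "law": 0.1},
    "NNS": {"miles": 1.0},
    "IN": {"within": 0.6, "of": 0.4},
    "CD": {"2": 1.0},
    "VB": {"police": 0.2, "find": 0.8},
}

words = "a police department within 2 miles of a law court".split()
tagged = viterbi(words, TAGS, START_P, TRANS_P, EMIT_P)
print(" ".join(f"{w}/{t}" for w, t in zip(words, tagged)))
```

Working in log space avoids numeric underflow on longer sentences; the table `best` is the dynamic programming state, filled one word at a time.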
2. Semantic Parsing
• In semantic parsing, we identify three types of “key words” using the parse tree.
• Target object
• Spatial predicate
• Reference object
Example parse tree:
2. Semantic Parsing
Running Example:
I wish to find a police department within 2 miles of a law court
Semantic Parsing:
Target Object: police department
Spatial predicate: within 2 miles
Reference object: law court
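As a rough illustration of how the three key words could be pulled out of the running example, here is a simplified sketch. It is an assumption-laden stand-in: it works on the flat POS-tagged sequence with hand-written heuristics rather than on a parse tree as the slides describe, and the preposition list `SPATIAL_PREPS` and the function names are hypothetical.

```python
# Hypothetical list of prepositions treated as spatial predicates.
SPATIAL_PREPS = {"within", "near", "inside", "around", "in"}
NOUN_TAGS = {"NN", "NNS"}

def parse_tagged(text):
    """Split 'word/TAG word/TAG ...' into (word, tag) pairs."""
    return [tuple(token.rsplit("/", 1)) for token in text.split()]

def extract_key_words(tagged):
    """Heuristically find target object, spatial predicate, reference object."""
    # Locate the first spatial preposition.
    pos = next((i for i, (w, t) in enumerate(tagged)
                if t == "IN" and w.lower() in SPATIAL_PREPS), None)
    if pos is None:
        return None

    # Target object: the noun run immediately before the preposition.
    start = pos
    while start > 0 and tagged[start - 1][1] in NOUN_TAGS:
        start -= 1
    target = " ".join(w for w, _ in tagged[start:pos])

    # Spatial predicate: the preposition plus an optional distance and unit.
    end = pos + 1
    if end < len(tagged) and tagged[end][1] == "CD":
        end += 1
        if end < len(tagged) and tagged[end][1] in NOUN_TAGS:
            end += 1
    predicate = " ".join(w for w, _ in tagged[pos:end])

    # Reference object: the nouns in the remainder of the sentence.
    reference = " ".join(w for w, t in tagged[end:] if t in NOUN_TAGS)
    return {"target": target, "predicate": predicate, "reference": reference}

# POS-tagged running example, copied from the slides.
sentence = ("I/NP wish/VB to/IN find/VB a/DT police/NN department/NN "
            "within/IN 2/CD miles/NNS of/IN a/DT law/NN court/NN")
key_words = extract_key_words(parse_tagged(sentence))
print(key_words)
```

A tree-based extractor would handle nested or coordinated phrases that this flat heuristic cannot.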
3. Schema Matching
• In schema matching, we try to match the target and reference spatial objects against the backend spatial database using
• Table name
• Attribute name
• Content of the database
• We then perform a spatial join on each retrieved candidate pair based on the spatial predicate
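The matching and join steps might look like the sketch below. Everything in it is assumed for illustration: the in-memory `DB`, its table and attribute names, the planar coordinates in miles, and the substring-matching rule are hypothetical and do not reproduce the City of Denton data or the system's actual matching logic.

```python
import math

# Hypothetical mini-database: table name -> rows, with planar x/y in miles.
DB = {
    "police_department": [
        {"name": "Central Station", "x": 0.0, "y": 0.0},
        {"name": "North Station", "x": 5.0, "y": 5.0},
    ],
    "public_building": [
        {"name": "County Law Court", "kind": "courthouse", "x": 1.0, "y": 1.0},
        {"name": "City Library", "kind": "library", "x": 5.0, "y": 4.0},
    ],
}

def normalize(text):
    return text.lower().replace("_", " ")

def match_objects(phrase, db):
    """Collect rows that match the phrase by table name, attribute name, or content."""
    phrase = normalize(phrase)
    hits = []
    for table, rows in db.items():
        if phrase in normalize(table):  # table-name match: take every row
            hits.extend(rows)
            continue
        for row in rows:
            attr_match = any(phrase in normalize(key) for key in row)
            content_match = any(isinstance(value, str) and phrase in normalize(value)
                                for value in row.values())
            if attr_match or content_match:
                hits.append(row)
    return hits

def within_distance_join(targets, references, max_dist):
    """Spatial join: keep (target, reference) pairs at most max_dist apart."""
    return [(t["name"], r["name"])
            for t in targets for r in references
            if math.hypot(t["x"] - r["x"], t["y"] - r["y"]) <= max_dist]

targets = match_objects("police department", DB)  # matched by table name
references = match_objects("law court", DB)       # matched by content
pairs = within_distance_join(targets, references, 2.0)
print(pairs)
```

In a real spatial database the join would normally be pushed down to the engine's own distance predicate and spatial index instead of a nested loop.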
Query Interface
Experimental Evaluation
• Database contains real spatial data obtained from the City of Denton
• 32 tables
• Includes crime-related objects such as police offices and law courts
• Gold standard: human prepared answers for 30 different crime-related queries.
• Baseline: Top 10 answers from Google Maps
• Result:
Summary
• We proposed a method to build a natural language interface to spatial database queries. The prototype system demonstrated the effectiveness of our approach on crime-related spatial queries.
• In future work, we plan to extend our system by increasing the dataset size and improving the accuracy of the tagging and parsing algorithms. We will collect more user queries and improve system performance based on a larger evaluation dataset.
References
1. http://maps.google.com/
2. http://maps.met.police.uk/
3. I. Androutsopoulos, G. Ritchie, and P. Thanisch, “Natural language interfaces to databases – an introduction,” Journal of Natural Language Engineering, vol. 1, no. 1, 1995.
4. W. Woods, R. Kaplan, and B. Webber, “The Lunar sciences natural language information system,” Bolt Beranek and Newman, Tech. Rep., 1972.
5. R. Ge and R. J. Mooney, “A statistical semantic parser that integrates syntax and semantics,” in Proceedings of the Ninth Conference on Computational Natural Language Learning (CoNLL-2005), Ann Arbor, MI, Jul. 2005, pp. 9–16.
6. R. J. Kate and R. J. Mooney, “Using string-kernels for learning semantic parsers,” in Proceedings of the 21st International Conference on Computational Linguistics and 44th Annual Meeting of the Association for Computational Linguistics (COLING/ACL-06), Sydney, Australia, July 2006, pp. 913–920.
7. R. J. Mooney, “Learning for semantic parsing,” in Computational Linguistics and Intelligent Text Processing: Proceedings of the 8th International Conference, CICLing 2007, Mexico City, A. Gelbukh, Ed. Berlin: Springer Verlag, 2007, pp. 311–324.
8. Y. Wong and R. J. Mooney, “Learning for semantic parsing with statistical machine translation,” in Proceedings of Human Language Technology Conference / North American Chapter of the Association for Computational Linguistics Annual Meeting (HLT-NAACL-06), New York City, NY, 2006, pp. 439–446.
9. A. Popescu, A. Armanasu, and O. Etzioni, “Modern natural language interfaces to databases: Composing statistical parsing with semantic tractability,” in Proceedings of the 20th International Conference on Computational Linguistics (COLING 2004), Geneva, Switzerland, 2004.
10. L. Zettlemoyer and M. Collins, “Learning to map sentences to logical form: Structured classification with probabilistic categorial grammars,” in Proceedings of the Twenty First Conference on Uncertainty in Artificial Intelligence (UAI-05), 2005.
11. Y. Li, H. Yang, and H. Jagadish, “NaLIX: an interactive natural language interface for querying XML,” in Proceedings of SIGMOD 2005, Baltimore, MD, 2005.