LAMI Spring 2014
Search Engine and Services
Presented by Edgar Cornejo
03.03.14
Outline
Mobile information search for location-based information
Web-a-Where: Geotagging Web Content
The design and implementation of SPIRIT: a spatially-aware search engine for information retrieval on the Internet
Mobile information search for location-based information
Department of Industrial Engineering, Tsinghua University
Beijing, China
April 2010
Chengyi Liu · Pei-Luen Patrick Rau · Fei Gao
Mobile search for location-based information
The study investigated the effects of location and information type on mobile searching for location-based information through two experiments carried out in an airport
Mobile search scenario
High time pressure
Many environmental disturbances
Device limitations (screen size, input method)
Restricted users’ operations
Mobile searching context
Since most of the information is location-based [1,2], the results can be improved by analyzing information queries and location
Information queries + location → Search Engine → More suitable results
Features of mobile interaction [3]
Users’ hands are often used to manipulate physical objects
Users may be involved in tasks that demand a high level of visual attention
Features of mobile interaction [3]
Users may be highly mobile during the task and have high-speed interaction
Search queries
Query type      Purpose                                                  Share*
Navigational    To reach a particular site                               29.4%
Informational   To find information                                      10.2%
Transactional   To visit a site and perform some web-mediated activity   60.4%
*According to a large-scale study of European mobile search behavior published in 2008 [4]
Proposed factors that may influence mobile information search
Experiment 1 - Hypotheses
Hypothesis 1
For information searches in a mobile versus a non-mobile context:
The average number of clicks is lower in the mobile context
The first search result is more important
Free recall is worse
Experiment 1 - Hypotheses
Hypothesis 2
For searches for location-based information versus non-location-based information:
The number of clicks is lower
The first search result is more important
Free recall is better
Experiment 1 - Tasks
Experiment 1 - Results
Hypothesis 1
The intention was to find how the user’s context (mobile vs. non-mobile) might affect the user’s information searching performance
The average number of clicks is lower in the mobile context: False
The first search result is more important: False
Free recall is worse: False
Experiment 1 - Results
Hypothesis 2
The intention was to examine how the information type (location-based vs. non-location-based) might affect the user’s information searching performance
The number of clicks is lower: True
The first search result is more important: True
Free recall is better: True
Experiment 2 - Hypotheses
Hypothesis 3
For mobile information searching under a high-pressure versus a low-pressure information requirement:
The average number of clicks is lower
The first search result is more important
Free recall is worse
Experiment 2 - Hypotheses
Hypothesis 4
For mobile information searching with informational or navigational queries versus transactional queries:
The number of clicks is greater
The first search result is less important
Free recall is worse
Experiment 2 - Tasks
Experiment 2 - Results
Hypothesis 3
The intention was to examine how the information requirement pressure (high vs. low) might affect a user’s mobile search performance
The average number of clicks is lower: True
The first search result is more important: False
Free recall is worse: False
Experiment 2 - Results
Hypothesis 4
The intention was to examine how the location-based information type (informational, navigational vs. transactional) might affect a user’s mobile search performance
The average number of clicks is greater: True
The first search result is less important: True
Free recall is worse: True
Summary
Information type (location-based vs. non-location-based) was found to affect user performance during the information search process
Information requirement pressure and location-based information type (navigational, informational and transactional) affect the mobile search process
The first two search results were found to be very important to good search efficiency and good user satisfaction
Web-a-Where: Geotagging Web Content
Einat Amitay · Nadav Har’El · Ron Sivan · Aya Soffer
IBM Haifa Research Lab
Haifa 31905, Israel
July 2004
Web-a-Where: Geotagging Web Content
A system for associating geography with Web pages
Locates mentions of places and determines the place each name refers to
Assigns to each page a geographic focus: a locality that the page discusses as a whole
Implemented within the framework of the IBM WebFountain data mining system
Web-a-Where: Geotagging Web Content
A page may have two types of geography associated with it: a source and a target.
Source geography has to do with the origin of the page, the physical location, address of its author, etc.
Target geography is determined by the contents of the page and relates to the topic the page is discussing.
Ambiguities
Geo/non-geo ambiguity is the case of a place name having another, non-geographic meaning, e.g. Mobile (Alabama) or Reading (England)
Geo/geo ambiguity arises when two or more distinct places have the same name
System Components
Geotagger (main component)
Finds and disambiguates geographic names
Assigns a taxonomy node to each phrase in the text that refers to a place, e.g. Paris/France/Europe
The gazetteer
Database that keeps the list of geographic names, their canonical taxonomies and other information
Tagging individual place names
The processing of a page is done in three phases:
1. Spotting
2. Disambiguation
3. Focus determination
1. Spotting place name candidates
Finding all the possible geographic names in each page
Short abbreviations are not spotted, e.g. IN (for Indiana) or AT (for Austria), but are used to help disambiguate other spots, e.g. Gary, IN
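The spotting phase described above can be sketched roughly as follows. This is a minimal illustration, not the Web-a-Where implementation: the miniature `GAZETTEER`, the `ABBREVIATIONS` table and the function name `spot_candidates` are all assumptions made up for the example.

```python
import re

# Hypothetical miniature gazetteer: place name -> candidate taxonomy nodes.
GAZETTEER = {
    "Gary": ["Gary/Indiana/United States"],
    "Paris": ["Paris/France/Europe", "Paris/Texas/United States"],
    "Reading": ["Reading/England/Europe"],
}
# Hypothetical abbreviation table. Abbreviations are never spots themselves;
# they are only used to qualify a neighbouring spot.
ABBREVIATIONS = {"IN": "Indiana", "TX": "Texas", "AT": "Austria"}


def spot_candidates(text):
    """Return (name, offset, qualifier) for every gazetteer name found in text."""
    spots = []
    for name in GAZETTEER:
        for match in re.finditer(r"\b" + re.escape(name) + r"\b", text):
            # Look for a ", XX" abbreviation directly after the name.
            tail = re.match(r",\s*([A-Z]{2})\b", text[match.end():])
            qualifier = ABBREVIATIONS.get(tail.group(1)) if tail else None
            spots.append((name, match.start(), qualifier))
    return sorted(spots, key=lambda spot: spot[1])


spots = spot_candidates("We drove from Gary, IN to Paris. IN was rainy.")
```

Here "Gary" is spotted with the qualifier "Indiana", "Paris" is spotted without a qualifier, and the free-standing "IN" is not spotted at all.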
2. Disambiguating spots (Algorithm)
The geotagger assigns a unique meaning to spots that can be uniquely qualified (confidence 95%)
Combinations that are not unique are left unassigned
In a page with multiple spots with the same name where only one is qualified, its meaning is assigned to the others (confidence 80%)
Disambiguation contexts are also used to resolve spots that are still unassigned or have confidence below 70%
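The first three rules above can be sketched as follows. This is a simplified illustration under stated assumptions: the `GAZETTEER` contents and the function name `disambiguate` are hypothetical, a "qualifier" is taken to be a region name that appears in a candidate's taxonomy path, and the final context-based rule is left out.

```python
# Hypothetical gazetteer: name -> candidate taxonomy nodes (place first).
GAZETTEER = {
    "Paris": ["Paris/France/Europe", "Paris/Texas/United States"],
    "London": ["London/England/Europe", "London/Ontario/Canada"],
}


def disambiguate(spots):
    """spots: list of (name, qualifier or None) -> list of (name, node, confidence)."""
    resolved = []
    for name, qualifier in spots:
        # Rule 1: a qualifier that matches exactly one candidate is a unique meaning.
        matches = [node for node in GAZETTEER.get(name, [])
                   if qualifier and qualifier in node.split("/")[1:]]
        if len(matches) == 1:
            resolved.append((name, matches[0], 0.95))
        else:
            # Rule 2: combinations that are not unique stay unassigned.
            resolved.append((name, None, 0.0))
    # Rule 3: if a name was qualified to exactly one meaning somewhere on the
    # page, reuse that meaning for its unassigned occurrences.
    for i, (name, node, _) in enumerate(resolved):
        if node is None:
            qualified = {n for other, n, conf in resolved
                         if other == name and conf == 0.95}
            if len(qualified) == 1:
                resolved[i] = (name, qualified.pop(), 0.80)
    return resolved


result = disambiguate([("Paris", "Texas"), ("Paris", None), ("London", None)])
```

The second "Paris" inherits the Texas meaning at the lower confidence, while "London" stays unassigned because nothing on the page qualifies it.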
2. Disambiguating spots (Data sources)
The Geographic Names Information System (GNIS) for U.S. locations
world-gazetteer.com for non-U.S. locations
United Nations Statistics Division (UNSD) for countries and continents
ISO 3166-1 for country and other abbreviations
3. Focus determination
The basic idea is that if several cities from the same region are mentioned, this region is probably the focus
Sometimes it cannot be said that a page has only one focus
The confidence score should be taken into account when finding the focus, giving higher weight to information coming from locations with higher confidence
Example
A certain page contained four mentions of Orlando/Florida (assigned confidence 0.5), three Texas (0.75), eight Fort Worth/Texas (0.75), three Dallas/Texas (0.75), one Garland/Texas (0.75), and one Iraq (0.5)
A human was asked to judge what the geographical focus of this page is and responded with “It’s about Texas and perhaps also Orlando”
Indeed, that page comes from the “Orlando Weekly” site, in a forum titled “Just a look at The Texas Local Music Scene...”
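One way to turn the example above into numbers is a confidence-weighted vote in which every mention also credits its enclosing regions. The sketch below is an assumption-laden illustration rather than the scoring used by Web-a-Where: the `DECAY` factor, the function name `focus_scores` and the "United States" level added to each path are all hypothetical.

```python
from collections import defaultdict

# Mentions from the example: (taxonomy path, number of mentions, confidence).
# Paths are written most-general-first; the "United States" level is added
# here as an assumption for illustration.
MENTIONS = [
    (("United States", "Florida", "Orlando"), 4, 0.5),
    (("United States", "Texas"), 3, 0.75),
    (("United States", "Texas", "Fort Worth"), 8, 0.75),
    (("United States", "Texas", "Dallas"), 3, 0.75),
    (("United States", "Texas", "Garland"), 1, 0.75),
    (("Iraq",), 1, 0.5),
]
DECAY = 0.7  # hypothetical: weight lost per level when crediting an enclosing region


def focus_scores(mentions, decay=DECAY):
    """Sum confidence-weighted mentions per taxonomy node, including ancestors."""
    scores = defaultdict(float)
    for path, count, confidence in mentions:
        weight = count * confidence
        # Credit the place itself, then each enclosing region with decayed weight.
        for depth in range(len(path), 0, -1):
            scores[path[:depth]] += weight
            weight *= decay
    return dict(scores)


scores = focus_scores(MENTIONS)
ranked = sorted(scores, key=scores.get, reverse=True)
```

With these assumptions the Texas node collects the highest score, and Orlando scores higher than Florida, which is in line with the human judgement quoted above. A fuller version would also need a rule for skipping regions that merely enclose an already chosen focus.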
Evaluating geotagging precision
Collection                     Number of pages   Accuracy
Arbitrary collection           200               81.7%
.GOV collection                200               73.3%
Open Directory Project (ODP)   200               63.1%
Geotags assigned automatically versus defined manually
Evaluating focus
92% correct up to country level: 38% precise match, 30% correct state or city, 24% correct country
8% incorrect country: 4% correct continent, 4% wrong continent
Comparison of Web-a-Where-determined focus to human-determined one (ODP) for ~1 million pages
Summary
The system is able to correctly tag individual place name occurrences 80% of the time and determine the correct focus of a page 92% of the time
Accuracy can be further improved
The main source of errors is geo/non-geo ambiguity
The design and implementation of SPIRIT
Ross Purves, Paul Clough, Christopher Jones, Avi Arampatzis, Benedicte Bucher, David Finch, Gaihua Fu, Hideo Joho, Awase Khirni Syed, Subodh Vaid and Bisheng Yang
Department of Geography, University of Zurich, Switzerland
Department of Information Studies, University of Sheffield, UK
School of Computer Science, Cardiff University, UK
Institute of Information and Computing Sciences, Utrecht University, Netherlands
Laboratoire COGIT - Institut Geographique National, France
August 2007
The design and implementation of SPIRIT
This paper describes the design and implementation of a complete solution to geographic information retrieval
Requirements
Exhaustive retrieval of relevant documents in a specified area
Place names should be automatically identified, and interactively disambiguated
Ability to query for geographical areas whose boundaries are imprecise
Requirements
Spatial relationships between geographic entities (e.g. outside, in) should be represented
It should be possible for users to specify the area of interest on a map
Ability to view query results on a map linked to relevant web documents
Document ranking should combine both spatial and thematic aspects of document relevance
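The last requirement above, combining spatial and thematic relevance, could for instance be met with a weighted sum of two scores. The sketch below is only an illustration under stated assumptions: footprints are axis-aligned bounding boxes, the spatial score is the fraction of the document footprint inside the query footprint, and `overlap_ratio`, `combined_score` and `spatial_weight` are hypothetical names, not SPIRIT's actual ranking functions.

```python
def overlap_ratio(doc_box, query_box):
    """Fraction of the document footprint that lies inside the query box.

    Boxes are (min_x, min_y, max_x, max_y).
    """
    width = min(doc_box[2], query_box[2]) - max(doc_box[0], query_box[0])
    height = min(doc_box[3], query_box[3]) - max(doc_box[1], query_box[1])
    if width <= 0 or height <= 0:
        return 0.0  # the boxes do not overlap
    doc_area = (doc_box[2] - doc_box[0]) * (doc_box[3] - doc_box[1])
    return (width * height) / doc_area


def combined_score(text_score, doc_box, query_box, spatial_weight=0.5):
    """Weighted sum of a thematic (text) score and a spatial score, both in [0, 1]."""
    spatial_score = overlap_ratio(doc_box, query_box)
    return (1 - spatial_weight) * text_score + spatial_weight * spatial_score


half_inside = combined_score(0.8, (0, 0, 2, 2), (1, 0, 3, 2))
outside = combined_score(0.8, (0, 0, 1, 1), (5, 5, 6, 6))
```

A document that is half inside the query area outranks an equally on-topic document that lies entirely outside it, and `spatial_weight` controls how much the spatial side matters.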
Architecture Overview
[Architecture diagram. Components: user interface, broker, relevance ranking, search engine, textual and spatial indexes, geographical ontology, web data collection (documents), metadata (doc-to-footprint mapping). Run-time: search request, query disambiguation, query expansion, access indexes, rank results. Pre-processing: geo-parsing, geo-coding, spatial index, textual index.]
Functionality of the components
Pre-processing the document collection
Assigning spatial footprints to web documents:
Identify geographical references (geoparsing)
Assign them to spatial coordinates (geocoding)
Result: spatial footprint
Functionality of the components
Building document indexes
Grid-based spatial indexing
For each cell of the grid, a list of document IDs was constructed, using the document footprints which resulted from the geo-tagging process
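A grid-based spatial index of this kind can be sketched in a few lines. The code below is a minimal illustration, assuming footprints are axis-aligned bounding boxes and using a made-up `CELL_SIZE`; the names `cells_for` and `build_spatial_index` are hypothetical and not taken from SPIRIT.

```python
from collections import defaultdict

CELL_SIZE = 1.0  # hypothetical cell width and height, e.g. in degrees


def cells_for(box, cell_size=CELL_SIZE):
    """Return the (column, row) of every grid cell touched by a bounding box.

    box is (min_x, min_y, max_x, max_y).
    """
    min_col, min_row = int(box[0] // cell_size), int(box[1] // cell_size)
    max_col, max_row = int(box[2] // cell_size), int(box[3] // cell_size)
    return [(col, row)
            for col in range(min_col, max_col + 1)
            for row in range(min_row, max_row + 1)]


def build_spatial_index(footprints):
    """footprints: doc ID -> list of bounding boxes. Returns cell -> sorted doc IDs."""
    index = defaultdict(set)
    for doc_id, boxes in footprints.items():
        for box in boxes:
            for cell in cells_for(box):
                index[cell].add(doc_id)
    return {cell: sorted(doc_ids) for cell, doc_ids in index.items()}


index = build_spatial_index({
    "doc1": [(0.2, 0.2, 0.8, 0.8)],  # fits inside cell (0, 0)
    "doc2": [(0.5, 0.5, 1.5, 0.9)],  # spans cells (0, 0) and (1, 0)
})
```

Each cell ends up with the list of documents whose footprint touches it, so a query footprint only needs to look at the cells it overlaps.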
Functionality of the components
Retrieving the results: “T” (Text) Scheme
Simplest approach
Retrieve all the documents that match the concept terms of the query, then filter them to return only those whose footprints intersect the geographical scope (footprint) of the place in the query
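The "T" scheme amounts to a text search followed by a spatial filter. The following is a minimal sketch under stated assumptions: the text index is a plain term-to-documents dictionary, a document matches only if it contains every query term, and footprints are bounding boxes. The names `boxes_intersect` and `search_t_scheme` are hypothetical.

```python
def boxes_intersect(a, b):
    """True if bounding boxes a and b, given as (min_x, min_y, max_x, max_y), overlap."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def search_t_scheme(text_index, footprints, terms, query_box):
    """Text search first, spatial filter second."""
    # Step 1: pure text retrieval (documents containing every query term).
    postings = [set(text_index.get(term, ())) for term in terms]
    candidates = set.intersection(*postings) if postings else set()
    # Step 2: keep only documents with a footprint that intersects the query footprint.
    return sorted(
        doc for doc in candidates
        if any(boxes_intersect(box, query_box) for box in footprints.get(doc, []))
    )


text_index = {"castles": ["doc1", "doc2", "doc3"], "hotels": ["doc2", "doc3"]}
footprints = {
    "doc1": [(0, 0, 1, 1)],
    "doc2": [(0, 0, 1, 1)],
    "doc3": [(10, 10, 11, 11)],
}
hits = search_t_scheme(text_index, footprints, ["castles", "hotels"], (0.5, 0.5, 2, 2))
```

Both doc2 and doc3 match the text, but only doc2 survives the spatial filter.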
Functionality of the components
Retrieving the results: “ST” (Space-Text) Scheme
More integrated approach
Regarded as a space-primary method
At search time the cells that intersect the query footprint are determined and then only the corresponding text indexes are searched
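A space-primary lookup along these lines could look as follows. This is a sketch, assuming one small text index per grid cell, bounding-box footprints and a made-up `CELL_SIZE`; `cells_for` and `search_st_scheme` are hypothetical names.

```python
CELL_SIZE = 1.0  # hypothetical cell width and height


def cells_for(box, cell_size=CELL_SIZE):
    """Return the (column, row) of every grid cell touched by a bounding box."""
    min_col, min_row = int(box[0] // cell_size), int(box[1] // cell_size)
    max_col, max_row = int(box[2] // cell_size), int(box[3] // cell_size)
    return [(col, row)
            for col in range(min_col, max_col + 1)
            for row in range(min_row, max_row + 1)]


def search_st_scheme(cell_text_indexes, terms, query_box):
    """Space first: search only the text indexes of cells the query footprint touches."""
    hits = set()
    for cell in cells_for(query_box):
        text_index = cell_text_indexes.get(cell)
        if text_index is None:
            continue  # no documents were indexed for this cell
        postings = [set(text_index.get(term, ())) for term in terms]
        if postings:
            hits |= set.intersection(*postings)
    return sorted(hits)


cell_text_indexes = {
    (0, 0): {"castles": ["doc1", "doc2"], "hotels": ["doc2"]},
    (1, 0): {"castles": ["doc2"], "hotels": ["doc2"]},
    (9, 9): {"castles": ["doc3"], "hotels": ["doc3"]},
}
hits = search_st_scheme(cell_text_indexes, ["castles", "hotels"], (0.5, 0.5, 1.5, 0.9))
```

The text index for cell (9, 9) is never opened, because the query footprint does not reach it.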
Functionality of the components
Retrieving the results: “TS” (Text-Space) Scheme
Better query response time
Regarded as a text-primary method
At search time, for each term, the associated documents are grouped according to the spatial index which they relate to
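One reading of the text-primary scheme is a term index whose document lists are grouped by grid cell, so that only the groups for the query's cells are read. The sketch below follows that reading; the nested-dictionary layout and the name `search_ts_scheme` are assumptions for illustration, and the query cells are passed in directly to keep the example short.

```python
def search_ts_scheme(term_index, terms, query_cells):
    """Text first: for each term, read only the document groups of the query's cells."""
    postings = []
    for term in terms:
        docs_by_cell = term_index.get(term, {})
        docs = set()
        for cell in query_cells:
            docs.update(docs_by_cell.get(cell, ()))
        postings.append(docs)
    # A document must match every term within the query cells.
    return sorted(set.intersection(*postings)) if postings else []


# term -> grid cell -> documents containing the term with a footprint in that cell
term_index = {
    "castles": {(0, 0): ["doc1", "doc2"], (9, 9): ["doc3"]},
    "hotels": {(0, 0): ["doc2"], (9, 9): ["doc3"]},
}
hits = search_ts_scheme(term_index, ["castles", "hotels"], [(0, 0), (1, 0)])
```

Compared with the space-primary sketch, each term is looked up once and the spatial grouping only narrows the lists that are read.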
Query interfaces
Results display
Evaluation
Performance analysis
To count as relevant to a query, a document had to be both thematically and spatially relevant.
In this sense, the key result of the work is that spatially aware search outperformed text-only search.
Evaluation
Usability analysis
[Bar charts of questionnaire responses, vertical axis from 0 to 30. Chart 1: “It was easy to get started with the system and make my query” (Strongly disagree, Disagree, Neutral, Agree, Strongly agree). Chart 2: “It was easy to find the locations of documents listed to the right of the map on the map” (No, not at all; A little; Yes, very much).]
Conclusions
The paper describes a unified approach, as well as the architecture, for introducing spatial awareness into search-engine technology
A prototype system demonstrated the effectiveness of the strategy
Personal Conclusions
The first study can lead to changes in search engines and devices that improve the mobile experience
The Web-a-Where system provides good insight for further improving location search, though it is not very precise
SPIRIT is a completely new paradigm in space-aware searching, but the interaction methods can be improved
Thank you
References
General References
[1] M. Sanderson, J. Kohler, Analyzing geographic queries, in: Proceedings of the SIGIR 2004 Workshop on Geographic Information Retrieval, Sheffield, UK, 2004.
[2] S. Asadi, Searching the World Wide Web for local services and facilities: a review on the patterns of location-based queries, in: WAIM’05, Hangzhou, China, 2005.
[3] S. Kristoffersen, F. Ljungberg, “Making Place” to make IT work: empirical explorations of HCI for mobile CSCW, in: Proceedings of the International ACM SIGGROUP Conference on Supporting Group Work, 1999.
[4] K. Church, B. Smyth, K. Bradley, P. Cotter, A large scale study of European mobile search behavior, in: Proceedings of MobileHCI’08, 2008, pp. 13–22.
[5] M.A. Neerincx, J.W. Streefkerk, Interacting in desktop and mobile context: emotion, trust and task performance, in: Proceedings of the First European Symposium on Ambient Intelligence (EUSAI), Eindhoven, The Netherlands, 2003.