lami spring 2014

Presented by

Edgar Cornejo, 03.03.14

LAMI Spring 2014

Search Engine and Services

Outline

Mobile information search for location-based information

Web-a-Where: Geotagging Web Content

The design and implementation of SPIRIT: a spatially-aware search engine for information retrieval on the Internet


Department of Industrial Engineering Tsinghua University

Beijing, China, April 2010

Chengyi Liu · Pei-Luen Patrick Rau · Fei Gao

Mobile search for location-based information


The study investigated the effects of location and information type in mobile searching for location-based information by carrying out two experiments in an airport

Mobile search scenario

High time pressure

Many environmental disturbances

Device limitations (screen size, input method)

Restricted users’ operations


Mobile searching context

Information queries

+ location

More suitable results


Since most of the information is location-based [1,2], the results can be improved by analyzing information queries and location

Search Engine

Features of mobile interaction [3]


Users' hands are often used to manipulate physical objects

Users may be involved in tasks that demand a high level of visual attention

Features of mobile interaction [3]


Users may be highly mobile during the task and have high-speed interaction

Search queries


Query type      Purpose                                                  Share*
Navigational    To reach a particular site                               29.4%
Informational   To find information                                      10.2%
Transactional   To visit a site and perform some web-mediated activity   60.4%

*According to a large-scale study of European mobile search behavior from 2008 [4]

Factors proposed that may influence mobile information search

Experiment 1 - Hypotheses


Hypothesis 1: For information searches in a mobile versus a non-mobile context:
  The average number of clicks in mobile is lower
  The first search result is more important
  Free recall is worse

Hypothesis 2: For searches for location-based versus non-location-based information:
  The number of clicks is lower
  The first search result is more important
  Free recall is better

Experiment 1 - Tasks

Experiment 1 - Results


Hypothesis 1: The intention was to find how the user’s context (mobile vs. non-mobile) might affect the user’s information searching performance
  The average number of clicks in mobile is lower: False
  The first search result is more important: False
  Free recall is worse: False

Hypothesis 2: The intention was to examine how the information type (location-based vs. non-location-based) might affect the user’s information searching performance
  The number of clicks is lower: True
  The first search result is more important: True
  Free recall is better: True



Experiment 2 - Hypotheses

Hypothesis 3: For mobile information searching under high versus low information requirement pressure:
  The average number of clicks is lower
  The first search result is more important
  Free recall is worse

Hypothesis 4: For mobile information searching with informational or navigational versus transactional queries:
  The number of clicks is greater
  The first search result is less important
  Free recall is worse

Experiment 2 - Tasks

Experiment 2 - Results

Hypothesis 3: The intention was to examine how the information requirement pressure (high vs. low) might affect a user’s mobile search performance
  The average number of clicks is lower: True
  The first search result is more important: False
  Free recall is worse: False

Hypothesis 4: The intention was to examine how the location-based information type (informational, navigational vs. transactional) might affect a user’s mobile search performance
  The average number of clicks is greater: True
  The first search result is less important: True
  Free recall is worse: True

Summary


Information type (location-based vs. non-location-based) was found to affect user performance during the information search process

Information requirement pressure and location-based information type (navigational, informational and transactional) affect the mobile search process

The first two search results were found to be very important to good search efficiency and good user satisfaction


Einat Amitay · Nadav Har’El · Ron Sivan · Aya Soffer

IBM Haifa Research Lab, Haifa 31905, Israel

July 2004



Web-a-Where is a system for associating geography with Web pages

Locates mentions of places and determines the place each name refers to

Assigns to each page a geographic focus: a locality that the page discusses as a whole

Implemented within the framework of the IBM WebFountain data mining system



A page may have two types of geography associated with it: a source and a target.

Source geography has to do with the origin of the page, the physical location, address of its author, etc.

Target geography is determined by the contents of the page and relates to the topic the page is discussing.

Ambiguities


Geo/non-geo ambiguity arises when a place name also has a non-geographic meaning, e.g. Mobile (Alabama) or Reading (England)

Geo/geo ambiguity arises when two or more distinct places have the same name

System Components


Geotagger (Main component)

Finds and disambiguates geographic names

Assigns a taxonomy node to each phrase in the text that refers to a place, e.g. Paris/France/Europe

The gazetteer

Database that keeps the list of geographic names, their canonical taxonomies and other information

Tagging individual place names


The processing of a page is done in three phases:

Spotting, disambiguation, focus determination

1. Spotting place name candidates


Finding all the possible geographic names in each page

Short abbreviations, e.g. IN (for Indiana) or AT (for Austria), are not spotted on their own but are used to help disambiguate other spots, e.g. Gary, IN

2. Disambiguating spots (Algorithm)


The geotagger assigns a unique meaning to spots that can be uniquely qualified. Confidence 95%

Combinations that are not unique are left unassigned

In a page with multiple spots with the same name where only one is qualified, this value is assigned to the others. Confidence 80%

Disambiguation context is also used to resolve spots that remain unassigned or whose confidence is below 70%
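The first rules on this slide can be sketched roughly as follows. This is a minimal illustration, not the Web-a-Where implementation: the `GAZETTEER` and `ABBREVIATIONS` tables, the `Spot` class and the `disambiguate` function are all hypothetical names, and only the qualifier rule (95%) and the same-name propagation rule (80%) are covered.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical toy gazetteer: place name -> candidate taxonomy nodes.
GAZETTEER = {
    "Gary": ["Gary/Indiana/United States", "Gary/West Virginia/United States"],
    "Paris": ["Paris/France/Europe", "Paris/Texas/United States"],
}
# Hypothetical abbreviation table, used only as qualifying context.
ABBREVIATIONS = {"IN": "Indiana", "TX": "Texas"}


@dataclass
class Spot:
    name: str
    qualifier: Optional[str] = None  # e.g. "IN" in "Gary, IN"
    meaning: Optional[str] = None
    confidence: float = 0.0


def disambiguate(spots):
    qualified = {}  # name -> set of meanings fixed by a qualifier
    # Rule 1: a qualifier that selects exactly one candidate gives 0.95.
    for spot in spots:
        region = ABBREVIATIONS.get(spot.qualifier or "")
        if region is None:
            continue
        candidates = GAZETTEER.get(spot.name, [])
        matches = [c for c in candidates if region in c.split("/")[1:]]
        if len(matches) == 1:
            spot.meaning, spot.confidence = matches[0], 0.95
            qualified.setdefault(spot.name, set()).add(matches[0])
    # Rule 2: if a name was qualified to a single meaning somewhere on the
    # page, the other spots with that name inherit it at 0.80.
    for spot in spots:
        meanings = qualified.get(spot.name, set())
        if spot.meaning is None and len(meanings) == 1:
            spot.meaning, spot.confidence = next(iter(meanings)), 0.80
    # Anything else stays unassigned, as on the slide.
    return spots


page = disambiguate([Spot("Gary", "IN"), Spot("Gary"), Spot("Paris")])
for spot in page:
    print(spot.name, spot.meaning, spot.confidence)
```

In this sketch the unqualified "Paris" stays unassigned because nothing on the page qualifies it; the page-context rule from the last bullet is left out.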

2. Disambiguating spots (Data sources)


The Geographic Names Information System (GNIS) for U.S. locations

world-gazetteer.com for non-U.S. locations

United Nations Statistics Division (UNSD) for countries and continents

ISO 3166-1 for country and other abbreviations

3. Focus determination


The basic idea is that if several cities from the same region are mentioned, this region is probably the focus

Sometimes a page cannot be said to have only one focus

The confidence score should be taken into account when finding the focus, giving higher weight to information coming from locations with higher confidence

Example


A certain page contained four mentions of Orlando/Florida (assigned confidence 0.5), three of Texas (0.75), eight of Fort Worth/Texas (0.75), three of Dallas/Texas (0.75), one of Garland/Texas (0.75), and one of Iraq (0.5).

A human was asked to judge the geographical focus of this page and responded with “It’s about Texas and perhaps also Orlando”.

Indeed, that page comes from the “Orlando Weekly” site, in a forum titled “Just a look at The Texas Local Music Scene...”
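The focus idea can be tried on the counts from this example. The sketch below is a simplification and not the scoring used by Web-a-Where: it sums count × confidence for every place and its ancestors, then ranks the top-level regions. The function names and the way paths are written (general to specific, using only the levels given on the slide) are assumptions.

```python
from collections import defaultdict

# (taxonomy path from general to specific, mention count, confidence),
# taken from the example page above.
mentions = [
    (("Florida", "Orlando"), 4, 0.5),
    (("Texas",), 3, 0.75),
    (("Texas", "Fort Worth"), 8, 0.75),
    (("Texas", "Dallas"), 3, 0.75),
    (("Texas", "Garland"), 1, 0.75),
    (("Iraq",), 1, 0.5),
]


def region_scores(mentions):
    """Add each mention's weight to its own node and to all its ancestors."""
    scores = defaultdict(float)
    for path, count, confidence in mentions:
        for depth in range(1, len(path) + 1):
            scores[path[:depth]] += count * confidence
    return dict(scores)


def top_foci(mentions, k=2):
    """Return the k highest-scoring top-level regions."""
    scores = region_scores(mentions)
    top_level = {path: s for path, s in scores.items() if len(path) == 1}
    return sorted(top_level, key=top_level.get, reverse=True)[:k]


print(region_scores(mentions)[("Texas",)])  # 11.25
print(top_foci(mentions))                   # [('Texas',), ('Florida',)]
```

With these weights Texas scores 11.25, Florida 2.0 and Iraq 0.5, which is in line with the human judgement of "Texas and perhaps also Orlando".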

Evaluating geotagging precision


Collection                     Number of pages   Accuracy
Arbitrary collection           200               81.7%
.GOV collection                200               73.3%
Open Directory Project (ODP)   200               63.1%

Geotags assigned automatically versus defined manually

Evaluating focus


92% Correct up to country level:
  38% Precise match
  30% Correct state or city
  24% Correct country

8% Incorrect country:
  4% Correct continent
  4% Wrong continent

Comparison of Web-a-Where-determined focus to human-determined one (ODP) for ~1 million pages

Summary


The system is able to correctly tag individual place name occurrences 80% of the time and to determine the correct focus of a page 92% of the time

Accuracy can be further improved

The main source of errors is geo/non-geo ambiguity

The design and implementation of SPIRIT

Ross Purves, Paul Clough, Christopher Jones, Avi Arampatzis, Benedicte Bucher, David Finch, Gaihua Fu, Hideo Joho, Awase Khirni Syed, Subodh Vaid and Bisheng Yang

Department of Geography, University of Zurich, Switzerland

Department of Information Studies, University of Sheffield, UK

School of Computer Science, Cardiff University, UK

Institute of Information and Computing Sciences, Utrecht University, Netherlands

Laboratoire COGIT - Institut Geographique National, France

August 2007



This paper describes the design and implementation of a complete solution to geographic information retrieval

Requirements


Exhaustive retrieval of relevant documents in a specified area

Place names should be automatically identified, and interactively disambiguated

Ability to query for geographical areas whose boundaries are imprecise

Requirements


Spatial concepts relating different geographic entities should be represented (outside, in)

It should be possible for users to specify the area of interest on a map

Ability to view query results on a map linked to relevant web documents

Document ranking should combine both spatial and thematic aspects of document relevance

Architecture Overview


[Figure: SPIRIT architecture, divided into a run-time part and a pre-processing part. Components shown: user interface, broker, search engine, relevance ranking, textual and spatial indexes, geographical ontology, metadata (doc-to-footprint mapping), web data collection (documents). Labelled steps: search request, query disambiguation, query expansion, access indexes, rank results, geo-parsing, geo-coding.]

Functionality of the components


Pre-processing the document collection

Assigning spatial footprints to web documents:
  Identify geographical references (geoparsing)
  Assign them to spatial coordinates (geocoding)
  Result: spatial footprint



Building document indexes

Grid-based spatial indexing: for each cell of the grid, a list of document IDs was constructed, using the document footprints which resulted from the geo-tagging process
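A grid-based spatial index of this kind might look like the sketch below. It assumes footprints are bounding boxes in degrees, written as (min_x, min_y, max_x, max_y), and a cell size of one degree; the names `cells_for`, `build_spatial_index` and the sample documents are hypothetical and not taken from SPIRIT.

```python
from collections import defaultdict

CELL_SIZE = 1.0  # assumed cell size in degrees

# Hypothetical footprints: document ID -> list of bounding boxes.
FOOTPRINTS = {
    "doc1": [(-3.3, 51.4, -3.1, 51.6)],
    "doc2": [(8.4, 47.3, 8.6, 47.4)],
    "doc3": [(-3.9, 51.5, -2.9, 51.7)],
}


def cells_for(box, cell_size=CELL_SIZE):
    """Return the (col, row) of every grid cell a bounding box touches."""
    min_x, min_y, max_x, max_y = box
    cols = range(int(min_x // cell_size), int(max_x // cell_size) + 1)
    rows = range(int(min_y // cell_size), int(max_y // cell_size) + 1)
    return [(col, row) for col in cols for row in rows]


def build_spatial_index(footprints):
    """Map each grid cell to the set of document IDs whose footprint touches it."""
    index = defaultdict(set)
    for doc_id, boxes in footprints.items():
        for box in boxes:
            for cell in cells_for(box):
                index[cell].add(doc_id)
    return index


index = build_spatial_index(FOOTPRINTS)
print(sorted(index[(-4, 51)]))  # ['doc1', 'doc3']
```

A set is used per cell here so that a document with several footprints in the same cell is listed only once.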



Retrieving the results: “T” (Text) Scheme

Simplest approach

Retrieve all the documents that match the concept terms of the query and then filter to return only those which intersect the geographical scope of the place in the query (footprint)
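As a rough illustration of the T scheme, the sketch below first collects the documents that contain all query terms and then filters them by footprint. The AND semantics for terms, the bounding-box footprints and every name in the code are assumptions made for the example.

```python
# Hypothetical text index: term -> set of document IDs.
TEXT_INDEX = {
    "castles": {"doc1", "doc2", "doc3"},
    "hotels": {"doc2"},
}
# Hypothetical footprints: document ID -> list of (min_x, min_y, max_x, max_y).
FOOTPRINTS = {
    "doc1": [(-3.3, 51.4, -3.1, 51.6)],
    "doc2": [(8.4, 47.3, 8.6, 47.4)],
    "doc3": [(-3.9, 51.5, -2.9, 51.7)],
}


def intersects(a, b):
    """True if two (min_x, min_y, max_x, max_y) boxes overlap."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def t_scheme(query_terms, query_footprint, text_index, footprints):
    # Text step: documents that contain every query term.
    postings = [text_index.get(term, set()) for term in query_terms]
    candidates = set.intersection(*postings) if postings else set()
    # Spatial filter: keep documents with a footprint touching the query area.
    return {
        doc
        for doc in candidates
        if any(intersects(box, query_footprint) for box in footprints.get(doc, []))
    }


print(sorted(t_scheme(["castles"], (-4.0, 51.0, -3.0, 52.0), TEXT_INDEX, FOOTPRINTS)))
# ['doc1', 'doc3']
```

The text index is searched in full before any spatial test is made, which is what makes this the simplest of the three schemes.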



Retrieving the results: “ST” (Space-Text) Scheme

More integrated approach

Regarded as a space-primary method

At search time the cells that intersect the query footprint are determined and then only the corresponding text indexes are searched
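A space-primary index can be sketched as one small text index per grid cell. The code below is an illustration under assumptions (one-degree cells, bounding-box footprints, AND semantics, hypothetical names and data), not SPIRIT's actual data structure.

```python
from collections import defaultdict

# Hypothetical collection: document ID -> terms, and document ID -> footprints.
DOC_TERMS = {
    "doc1": {"castles"},
    "doc2": {"castles", "hotels"},
    "doc3": {"castles"},
}
FOOTPRINTS = {
    "doc1": [(-3.3, 51.4, -3.1, 51.6)],
    "doc2": [(8.4, 47.3, 8.6, 47.4)],
    "doc3": [(-3.9, 51.5, -2.9, 51.7)],
}


def cells_for(box, cell_size=1.0):
    """Return the (col, row) of every grid cell a bounding box touches."""
    min_x, min_y, max_x, max_y = box
    cols = range(int(min_x // cell_size), int(max_x // cell_size) + 1)
    rows = range(int(min_y // cell_size), int(max_y // cell_size) + 1)
    return [(col, row) for col in cols for row in rows]


def build_st_index(doc_terms, footprints):
    """Build cell -> term -> set of document IDs."""
    index = defaultdict(lambda: defaultdict(set))
    for doc_id, terms in doc_terms.items():
        for box in footprints.get(doc_id, []):
            for cell in cells_for(box):
                for term in terms:
                    index[cell][term].add(doc_id)
    return index


def st_search(query_terms, query_footprint, index):
    results = set()
    # Only the text indexes of cells touched by the query footprint are read.
    for cell in cells_for(query_footprint):
        if cell not in index:
            continue
        postings = [index[cell].get(term, set()) for term in query_terms]
        if postings:
            results |= set.intersection(*postings)
    return results


st_index = build_st_index(DOC_TERMS, FOOTPRINTS)
print(sorted(st_search(["castles"], (-3.5, 51.2, -3.2, 51.8), st_index)))
# ['doc1', 'doc3']
```

Matching here is at cell granularity, so a document can be returned even if its footprint only shares a cell with the query area; an exact intersection test could be added as a final filter.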



Retrieving the results: “TS” (Text-Space) Scheme

Better query response time

Regarded as a text-primary method

At search time, for each term, the associated documents are grouped according to the spatial index cell to which they relate
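A text-primary layout can be sketched by keeping, for each term, its postings grouped by grid cell. As before, this is only an illustration: cell size, bounding-box footprints, AND semantics and all names and data are assumptions.

```python
from collections import defaultdict

# Hypothetical collection: document ID -> terms, and document ID -> footprints.
DOC_TERMS = {
    "doc1": {"castles"},
    "doc2": {"castles", "hotels"},
    "doc3": {"castles"},
}
FOOTPRINTS = {
    "doc1": [(-3.3, 51.4, -3.1, 51.6)],
    "doc2": [(8.4, 47.3, 8.6, 47.4)],
    "doc3": [(-3.9, 51.5, -2.9, 51.7)],
}


def cells_for(box, cell_size=1.0):
    """Return the (col, row) of every grid cell a bounding box touches."""
    min_x, min_y, max_x, max_y = box
    cols = range(int(min_x // cell_size), int(max_x // cell_size) + 1)
    rows = range(int(min_y // cell_size), int(max_y // cell_size) + 1)
    return [(col, row) for col in cols for row in rows]


def build_ts_index(doc_terms, footprints):
    """Build term -> cell -> set of document IDs."""
    index = defaultdict(lambda: defaultdict(set))
    for doc_id, terms in doc_terms.items():
        for box in footprints.get(doc_id, []):
            for cell in cells_for(box):
                for term in terms:
                    index[term][cell].add(doc_id)
    return index


def ts_search(query_terms, query_footprint, index):
    cells = cells_for(query_footprint)
    per_term = []
    for term in query_terms:
        groups = index.get(term, {})
        docs = set()
        # Read only the posting groups for cells touched by the query footprint.
        for cell in cells:
            docs |= groups.get(cell, set())
        per_term.append(docs)
    return set.intersection(*per_term) if per_term else set()


ts_index = build_ts_index(DOC_TERMS, FOOTPRINTS)
print(sorted(ts_search(["castles", "hotels"], (8.0, 47.0, 8.9, 47.9), ts_index)))
# ['doc2']
```

Each term is looked up once and only its groups for the relevant cells are read, instead of opening a separate text index for every cell.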

Query interfaces


Results display


Evaluation


Performance analysis

To count as relevant to a query, a document had to be both thematically and spatially relevant.

In this sense, the key result of the work is that spatially aware search outperformed text-only search.

Evaluation


Usability analysis

[Bar chart, y-axis 0 to 30: responses from "Strongly disagree" to "Strongly agree" for the statement "It was easy to get started with the system and make my query"]

[Bar chart, y-axis 0 to 30: responses "No, not at all", "A little" and "Yes, very much" for the statement "It was easy to find the locations of documents listed to the right of the map on the map"]

Conclusions


The paper describes a unified approach, as well as the architecture, for introducing spatial-awareness into search-engine technology

A prototype system demonstrated the effectiveness of the strategy

Personal Conclusions


The first study can lead to changes in search engines and devices that improve the mobile experience

The Web-a-Where system provides good insight for further improving location search, though it is not very precise

SPIRIT is a completely new paradigm in space-aware searching, but the interaction methods can be improved

Thank you

References

General References

[1] M. Sanderson, J. Kohler, Analyzing geographic queries, in: Proceedings of the SIGIR 2004 Workshop on Geographic Information Retrieval, Sheffield, UK, 2004.

[2] S. Asadi, Searching the World Wide Web for local services and facilities: a review on the patterns of location-based queries, in: WAIM’05, Hangzhou, China, 2005.

[3] S. Kristoffersen, F. Ljungberg, “Making Place” to make IT work: empirical explorations of HCI for mobile CSCW, in: Paper Presented at the International ACM SIGGROUP Conference on Supporting Group Work, 1999.

[4] K. Church, B. Smyth, K. Bradley, P. Cotter, A large scale study of European mobile search behavior, in: Proceedings of MobileHCI’08, 2008, pp. 13–22.

[5] M.A. Neerincx, J.W. Streefkerk, Interacting in desktop and mobile context: emotion, trust and task performance, in: Paper Presented at the Proceedings of the First European Symposium on Ambient Intelligence (EUSAI), Eindhoven, The Netherlands, 2003.
