
Page 1: LAMI Spring 2014

Presented by

Edgar Cornejo, 03.03.14

LAMI Spring 2014

Search Engine and Services

Page 2: LAMI Spring 2014

Outline

Mobile information search for location-based information

Web-a-Where: Geotagging Web Content

The design and implementation of SPIRIT: a spatially-aware search engine for information retrieval on the Internet

Page 3: LAMI Spring 2014

Mobile information search for location-based information

Department of Industrial Engineering, Tsinghua University

Beijing, China, April 2010

Chengyi Liu · Pei-Luen Patrick Rau · Fei Gao

Page 4: LAMI Spring 2014

Mobile search for location-based information

The study investigated the effects of location and information type on mobile searching for location-based information by carrying out two experiments in an airport.

Page 5: LAMI Spring 2014

Mobile search scenario

High time pressure

Many environmental disturbances

Device limitations (screen size, input method)

Restricted user operations

Page 6: LAMI Spring 2014

Mobile searching context

Information queries + location → Search Engine → More suitable results

Since most of the information sought is location-based [1,2], search results can be improved by analyzing both the information query and the user's location.

Page 7: LAMI Spring 2014

Features of mobile interaction [3]

Users' hands are often busy manipulating physical objects

Users may be involved in tasks that demand a high level of visual attention

Page 8: LAMI Spring 2014

Features of mobile interaction [3]

Users may be highly mobile during the task and interact at high speed

Page 9: LAMI Spring 2014

Search queries

Query type | Purpose | Share*
Navigational | To reach a particular site | 29.4%
Informational | To find information | 10.2%
Transactional | To visit a site and perform some web-mediated activity | 60.4%

*According to a large-scale study of European mobile search behavior reported in 2008 [4]

Page 10: LAMI Spring 2014

Proposed factors that may influence mobile information search

Page 11: LAMI Spring 2014

Experiment 1 - Hypotheses

Hypothesis 1: For information searches in mobile versus non-mobile contexts:

The average number of clicks is lower in the mobile context

The first search result is more important

Free recall is worse

Page 12: LAMI Spring 2014

Experiment 1 - Hypotheses

Hypothesis 2: For searches for location-based versus non-location-based information:

The number of clicks is lower

The first search result is more important

Free recall is better

Page 13: LAMI Spring 2014

Experiment 1 - Tasks

Page 14: LAMI Spring 2014

Experiment 1 - Results

Hypothesis 1

The intention was to find how the user's context (mobile vs. non-mobile) might affect the user's information searching performance.

The average number of clicks is lower in the mobile context: False

The first search result is more important: False

Free recall is worse: False

Page 15: LAMI Spring 2014

Experiment 1 - Results

Hypothesis 2

The intention was to examine how the information type (location-based vs. non-location-based) might affect the user's information searching performance.

The number of clicks is lower for location-based information: True

The first search result is more important: True

Free recall is better: True

Page 16: LAMI Spring 2014

Experiment 2 - Hypotheses

Hypothesis 3: For mobile information searching under a high-pressure versus a low-pressure information requirement:

The average number of clicks is lower

The first search result is more important

Free recall is worse

Page 17: LAMI Spring 2014

Experiment 2 - Hypotheses

Hypothesis 4: For mobile information searching with informational or navigational queries versus transactional queries:

The number of clicks is greater

The first search result is less important

Free recall is worse

Page 18: LAMI Spring 2014

Experiment 2 - Tasks

Page 19: LAMI Spring 2014

Experiment 2 - Result

Hypothesis 3

The intention was to examine how the pressure of the information requirement (high vs. low) might affect a user's mobile search performance.

The average number of clicks is lower: True

The first search result is more important: False

Free recall is worse: False

Page 20: LAMI Spring 2014

Experiment 2 - Results

Hypothesis 4

The intention was to examine how the location-based information type (informational, navigational vs. transactional) might affect a user's mobile search performance.

The average number of clicks is greater: True

The first search results are less important: True

Free recall is worse: True

Page 21: LAMI Spring 2014

Summary

Information type (location-based vs. non-location-based) was found to affect user performance during the information search process

Information requirement pressure and location-based information type (navigational, informational and transactional) affect the mobile search process

The first two search results were found to be very important for search efficiency and user satisfaction

Page 22: LAMI Spring 2014

Web-a-Where: Geotagging Web Content

Einat Amitay · Nadav Har’El · Ron Sivan · Aya Soffer

IBM Haifa Research Lab, Haifa 31905, Israel

July 2004

Page 23: LAMI Spring 2014

Web-a-Where: Geotagging Web Content

A system for associating geography with Web pages

Locates mentions of places and determines the place each name refers to

Assigns to each page a geographic focus: a locality that the page discusses as a whole

Implemented within the framework of the IBM WebFountain data mining system

Page 24: LAMI Spring 2014

Web-a-Where: Geotagging Web Content

A page may have two types of geography associated with it: a source and a target.

Source geography relates to the origin of the page: its physical location, the address of its author, etc.

Target geography is determined by the contents of the page and relates to the topic the page is discussing.

Page 25: LAMI Spring 2014

Ambiguities

Geo/non-geo ambiguity is the case of a place name having another, non-geographic meaning, e.g. Mobile (Alabama) or Reading (England)

Geo/geo ambiguity arises when two or more distinct places have the same name

Page 26: LAMI Spring 2014

System Components

Geotagger (main component)

Finds and disambiguates geographic names

Assigns a taxonomy node to each phrase in the text that refers to a place, e.g. Paris/France/Europe

The gazetteer

Database that stores the list of geographic names, their canonical taxonomy paths and other information

Page 27: LAMI Spring 2014

Tagging individual place names

The processing of a page is done in three phases:

Spotting

Disambiguation

Focus determination

Page 28: LAMI Spring 2014

1. Spotting place name candidates

Finding all the possible geographic names in each page

Short abbreviations such as IN (Indiana) or AT (Austria) are not spotted on their own, but they are used to help disambiguate other spots, e.g. Gary, IN
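The spotting rule can be illustrated with a minimal Python sketch. The gazetteer entries, the `ABBREVIATIONS` table and the single-word regular expression are hypothetical simplifications (real spotting must also handle multi-word names such as Fort Worth); the point is only that a two-letter abbreviation is never emitted as a spot itself but narrows the candidate senses of the name it follows.

```python
import re
from dataclasses import dataclass
from typing import List, Optional

# Toy gazetteer (hypothetical entries): place name -> candidate taxonomy paths.
GAZETTEER = {
    "Gary": ["Gary/Indiana/United States", "Gary/West Virginia/United States"],
    "Paris": ["Paris/France/Europe", "Paris/Texas/United States"],
}

# Short abbreviations are never spotted on their own; they only qualify a preceding spot.
ABBREVIATIONS = {"IN": "Indiana", "AT": "Austria", "TX": "Texas"}

# A capitalised word, optionally followed by ", XX" (a two-letter abbreviation).
SPOT_PATTERN = re.compile(r"\b([A-Z][a-z]+)\b(?:,\s*([A-Z]{2})\b)?")


@dataclass
class Spot:
    name: str
    offset: int
    candidates: List[str]
    qualifier: Optional[str] = None


def spot_places(text: str) -> List[Spot]:
    spots: List[Spot] = []
    for match in SPOT_PATTERN.finditer(text):
        name, abbr = match.group(1), match.group(2)
        if name not in GAZETTEER:
            continue  # capitalised word that is not a known place name
        qualifier = ABBREVIATIONS.get(abbr) if abbr else None
        candidates = GAZETTEER[name]
        if qualifier:
            # Keep only the senses consistent with the abbreviation, e.g. "Gary, IN".
            candidates = [p for p in candidates if qualifier in p.split("/")]
        spots.append(Spot(name, match.start(), candidates, qualifier))
    return spots


if __name__ == "__main__":
    for s in spot_places("He flew from Gary, IN to Paris."):
        print(s.name, s.qualifier, s.candidates)
```

Here "Gary, IN" keeps only the Indiana sense, while the unqualified "Paris" remains ambiguous and is left for the disambiguation phase.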

Page 29: LAMI Spring 2014

2. Disambiguating spots (Algorithm)

The geotagger assigns a unique meaning to spots that can be uniquely qualified (confidence 95%)

Combinations that are not unique are left unassigned

When a page contains several spots with the same name and only one of them is qualified, its meaning is assigned to the others as well (confidence 80%)

Disambiguation contexts are also used to assign the remaining unassigned spots, with confidence below 70%
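The three disambiguation passes can be sketched as follows. This is a simplified illustration, not the Web-a-Where implementation: the 0.95 and 0.80 confidences come from the slide, while the 0.60 value for context-based assignment and the rule of scoring candidates by how often their ancestor regions already occur among assigned spots are assumptions made for the example.

```python
from collections import Counter
from typing import Dict, List, Optional, Tuple

QUALIFIED_CONF = 0.95   # uniquely qualified spot (value from the slide)
PROPAGATED_CONF = 0.80  # same name as a qualified spot (value from the slide)
CONTEXT_CONF = 0.60     # hypothetical; the slide only says "less than 70%"

# A spot: (name, candidate taxonomy paths, was it qualified in the text?)
Spot = Tuple[str, List[str], bool]
Assignment = Optional[Tuple[str, float]]  # (chosen path, confidence) or None


def ancestors(path: str) -> List[str]:
    """'Gary/Indiana/United States' -> ['Indiana', 'United States']."""
    return path.split("/")[1:]


def disambiguate(spots: List[Spot]) -> List[Assignment]:
    result: List[Assignment] = [None] * len(spots)
    sense_by_name: Dict[str, str] = {}

    # Pass 1: qualified spots that resolve to exactly one sense.
    for i, (name, candidates, qualified) in enumerate(spots):
        if qualified and len(candidates) == 1:
            result[i] = (candidates[0], QUALIFIED_CONF)
            sense_by_name.setdefault(name, candidates[0])

    # Pass 2: propagate that sense to unqualified spots with the same name.
    for i, (name, _, _) in enumerate(spots):
        if result[i] is None and name in sense_by_name:
            result[i] = (sense_by_name[name], PROPAGATED_CONF)

    # Pass 3: use the regions of already-assigned spots as disambiguation context.
    context = Counter(r for a in result if a is not None for r in ancestors(a[0]))
    for i, (_, candidates, _) in enumerate(spots):
        if result[i] is None and candidates:
            best = max(candidates, key=lambda p: sum(context[r] for r in ancestors(p)))
            if sum(context[r] for r in ancestors(best)) > 0:
                result[i] = (best, CONTEXT_CONF)
            # otherwise the spot stays unassigned
    return result


if __name__ == "__main__":
    page = [
        ("Gary", ["Gary/Indiana/United States"], True),  # "Gary, IN"
        ("Gary", ["Gary/Indiana/United States", "Gary/West Virginia/United States"], False),
        ("Paris", ["Paris/France/Europe", "Paris/Texas/United States"], False),
    ]
    for spot, assignment in zip(page, disambiguate(page)):
        print(spot[0], assignment)
```

On this toy page the United States context contributed by the two Gary spots pulls the unqualified Paris toward Paris, Texas; that is exactly the kind of guess that warrants a lower confidence, and a page without such context would leave Paris unassigned.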

Page 30: LAMI Spring 2014

2. Disambiguating spot (Data sources)

The Geographic Names Information System (GNIS) for U.S. locations

world-gazetteer.com for non-U.S. locations

United Nations Statistics Division (UNSD) for countries and continents

ISO 3166-1 for country and other abbreviations

Page 31: LAMI Spring 2014

3. Focus determination

The basic idea is that if several cities from the same region are mentioned, that region is probably the focus

Sometimes a page cannot be said to have only one focus

The confidence score should be taken into account when finding the focus, giving higher weight to information coming from locations with higher confidence

Page 32: LAMI Spring 2014

Example

A certain page contained four mentions of Orlando/Florida (assigned confidence 0.5), three of Texas (0.75), eight of Fort Worth/Texas (0.75), three of Dallas/Texas (0.75), one of Garland/Texas (0.75), and one of Iraq (0.5).

A human asked to judge the geographical focus of this page responded: “It’s about Texas and perhaps also Orlando”.

Indeed, the page comes from the “Orlando Weekly” site, in a forum titled “Just a look at The Texas Local Music Scene...”
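The focus-determination idea can be sketched in Python and run on the mention counts from this example. The scoring rule is an assumption for illustration, not necessarily the one used in Web-a-Where: each mention contributes count × confidence to its own node and a share reduced by a hypothetical factor `DECAY` to each ancestor, and a node is skipped if it lies on the same taxonomy branch as an already-chosen focus.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

DECAY = 0.7  # hypothetical fraction of weight kept at each step up the taxonomy

# (taxonomy path written specific-to-general, number of mentions, confidence)
Mention = Tuple[str, int, float]


def focus_scores(mentions: List[Mention]) -> Dict[str, float]:
    scores: Dict[str, float] = defaultdict(float)
    for path, count, confidence in mentions:
        weight = count * confidence      # higher-confidence spots weigh more
        for node in path.split("/"):     # e.g. "Fort Worth", then "Texas"
            scores[node] += weight
            weight *= DECAY              # ancestors receive a decayed share
    return dict(scores)


def top_foci(mentions: List[Mention], k: int = 2) -> List[str]:
    scores = focus_scores(mentions)
    branches = [path.split("/") for path, _, _ in mentions]

    def same_branch(a: str, b: str) -> bool:
        return any(a in branch and b in branch for branch in branches)

    chosen: List[str] = []
    for node in sorted(scores, key=scores.get, reverse=True):
        if not any(same_branch(node, c) for c in chosen):
            chosen.append(node)
        if len(chosen) == k:
            break
    return chosen


if __name__ == "__main__":
    page = [
        ("Orlando/Florida", 4, 0.5),
        ("Texas", 3, 0.75),
        ("Fort Worth/Texas", 8, 0.75),
        ("Dallas/Texas", 3, 0.75),
        ("Garland/Texas", 1, 0.75),
        ("Iraq", 1, 0.5),
    ]
    print(top_foci(page))  # ['Texas', 'Orlando']
```

With these illustrative weights, Texas scores 8.55 and is chosen first; Fort Worth, Dallas and Garland are suppressed because they lie on the Texas branch, so Orlando comes second, in line with the human judgment quoted above. A much smaller decay would let Fort Worth outrank Texas, so the parameter matters.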

Page 33: LAMI Spring 2014

Evaluating geotagging precision

Collection | Number of pages | Accuracy
Arbitrary collection | 200 | 81.7%
.GOV collection | 200 | 73.3%
Open Directory Project (ODP) | 200 | 63.1%

Accuracy of geotags assigned automatically, measured against geotags defined manually

Page 34: LAMI Spring 2014

Evaluating focus

92% correct up to country level:
  38% precise match
  30% correct state or city
  24% correct country

8% incorrect country:
  4% correct continent
  4% wrong continent

Comparison of the Web-a-Where-determined focus to the human-determined one (ODP) for ~1 million pages

Page 35: LAMI Spring 2014

Summary

The system correctly tags individual place-name occurrences about 80% of the time and determines the correct focus of a page (up to country level) 92% of the time

Accuracy can be further improved

The main source of errors is geo/non-geo ambiguity

Page 36: LAMI Spring 2014

The design and implementation of SPIRIT

Ross Purves, Paul Clough, Christopher Jones, Avi Arampatzis, Benedicte Bucher, David Finch, Gaihua Fu, Hideo Joho, Awase Khirni Syed, Subodh Vaid and Bisheng Yang

Department of Geography, University of Zurich, Switzerland

Department of Information Studies, University of Sheffield, UK

School of Computer Science, Cardiff University, UK

Institute of Information and Computing Sciences, Utrecht University, Netherlands

Laboratoire COGIT, Institut Géographique National, France

August 2007

Page 37: LAMI Spring 2014

The design and implementation of SPIRIT


This paper describes the design and implementation of a complete solution to geographic information retrieval

Page 38: LAMI Spring 2014

Requirements

Exhaustive retrieval of relevant documents in a specified area

Place names should be automatically identified and interactively disambiguated

Ability to query for geographical areas whose boundaries are imprecise

Page 39: LAMI Spring 2014

Requirements

Spatial concepts relating different geographic entities (e.g. in, outside) should be represented

It should be possible for users to specify the area of interest on a map

Ability to view query results on a map linked to relevant web documents

Document ranking should combine both spatial and thematic aspects of document relevance
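One simple way to meet the last requirement is a weighted combination of a thematic score and a spatial score. The sketch below is purely illustrative: the linear mix, the weight `alpha`, the exponential distance decay and the 50 km scale are assumptions, not the ranking function SPIRIT actually uses, and coordinates are treated as planar kilometres for simplicity.

```python
import math
from typing import Dict, List, Tuple

Point = Tuple[float, float]  # planar (x, y) in km; a simplifying assumption


def spatial_score(doc: Point, query: Point, scale_km: float = 50.0) -> float:
    """1.0 at the query location, decaying exponentially with distance."""
    return math.exp(-math.dist(doc, query) / scale_km)


def combined_score(text_score: float, doc: Point, query: Point, alpha: float = 0.5) -> float:
    """Hypothetical linear mix of thematic relevance (in [0, 1]) and spatial relevance."""
    return alpha * text_score + (1 - alpha) * spatial_score(doc, query)


def rank(docs: Dict[str, Tuple[float, Point]], query: Point) -> List[str]:
    """Order document IDs by combined score, best first."""
    return sorted(docs, key=lambda d: combined_score(docs[d][0], docs[d][1], query), reverse=True)


if __name__ == "__main__":
    docs = {
        "far_but_on_topic": (0.9, (200.0, 0.0)),
        "near_and_fairly_on_topic": (0.6, (5.0, 0.0)),
    }
    print(rank(docs, (0.0, 0.0)))
```

Raising `alpha` favours thematic relevance; lowering it favours proximity to the query location.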

Page 40: LAMI Spring 2014

Architecture Overview

[Architecture diagram. Components: user interface, broker, relevance ranking, search engine, geographical ontology, textual and spatial indexes, metadata (doc-to-footprint mapping), web data collection (documents). Run-time flow: search request, query disambiguation and query expansion, access indexes (spatial index, textual index), rank results. Pre-processing flow: geo-parsing and geo-coding of the web documents.]

Page 41: LAMI Spring 2014

Functionality of the components

Pre-processing the document collection

Assigning spatial footprints to web documents:

Identify geographical references (geoparsing)

Assign them spatial coordinates (geocoding)

The result is a spatial footprint for each document
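The two pre-processing steps can be sketched as a tiny pipeline. This is a toy under stated assumptions: the three-entry `GAZETTEER` with approximate city coordinates stands in for SPIRIT's geographical ontology, geoparsing is a naive capitalised-word lookup, and the footprint is taken to be the bounding box of the geocoded points.

```python
import re
from typing import Dict, List, Optional, Tuple

# Toy gazetteer with approximate (lat, lon) city centres; a stand-in for a real ontology.
GAZETTEER: Dict[str, Tuple[float, float]] = {
    "Sheffield": (53.38, -1.47),
    "Cardiff": (51.48, -3.18),
    "Zurich": (47.37, 8.54),
}

BBox = Tuple[float, float, float, float]  # (min_lat, min_lon, max_lat, max_lon)


def geoparse(text: str) -> List[str]:
    """Geoparsing: identify references to known places (naive capitalised-word lookup)."""
    return [word for word in re.findall(r"[A-Z][a-z]+", text) if word in GAZETTEER]


def geocode(names: List[str]) -> List[Tuple[float, float]]:
    """Geocoding: assign each reference its coordinates."""
    return [GAZETTEER[name] for name in names]


def footprint(text: str) -> Optional[BBox]:
    """Spatial footprint: here, the bounding box of all geocoded points."""
    points = geocode(geoparse(text))
    if not points:
        return None
    lats = [lat for lat, _ in points]
    lons = [lon for _, lon in points]
    return (min(lats), min(lons), max(lats), max(lons))


if __name__ == "__main__":
    print(footprint("Project partners are based in Sheffield and Cardiff."))
```

A bounding box is only one possible footprint representation; a document could equally keep its individual points, at the cost of a larger index.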

Page 42: LAMI Spring 2014

Functionality of the components

Building document indexes

Grid-based spatial indexing: for each cell of the grid, a list of document IDs was constructed, using the document footprints that resulted from the geotagging process
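A grid-based spatial index of this kind can be sketched as a dictionary from grid cells to sets of document IDs. The cell size (`CELL_DEG`), the bounding-box footprints and the helper names are assumptions for illustration; because cells only approximate footprints, the candidates returned by such an index would normally still be checked against the exact query footprint.

```python
import math
from collections import defaultdict
from typing import Dict, Iterator, Set, Tuple

BBox = Tuple[float, float, float, float]  # (min_lat, min_lon, max_lat, max_lon)
Cell = Tuple[int, int]
CELL_DEG = 1.0  # hypothetical cell size in degrees


def cells_for(box: BBox) -> Iterator[Cell]:
    """All grid cells overlapped by a bounding box."""
    min_lat, min_lon, max_lat, max_lon = box
    for i in range(math.floor(min_lat / CELL_DEG), math.floor(max_lat / CELL_DEG) + 1):
        for j in range(math.floor(min_lon / CELL_DEG), math.floor(max_lon / CELL_DEG) + 1):
            yield (i, j)


def build_grid_index(footprints: Dict[str, BBox]) -> Dict[Cell, Set[str]]:
    """For each grid cell, the set of IDs of documents whose footprint overlaps it."""
    index: Dict[Cell, Set[str]] = defaultdict(set)
    for doc_id, box in footprints.items():
        for cell in cells_for(box):
            index[cell].add(doc_id)
    return dict(index)


def candidate_docs(index: Dict[Cell, Set[str]], query: BBox) -> Set[str]:
    """Documents sharing at least one cell with the query footprint."""
    found: Set[str] = set()
    for cell in cells_for(query):
        found |= index.get(cell, set())
    return found


if __name__ == "__main__":
    footprints = {
        "sheffield_page": (53.38, -1.47, 53.38, -1.47),
        "zurich_page": (47.37, 8.54, 47.37, 8.54),
    }
    print(candidate_docs(build_grid_index(footprints), (53.0, -2.0, 54.0, -1.0)))
```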

Page 43: LAMI Spring 2014

Functionality of the components

Retrieving the results: “T” (Text) scheme

Simplest approach

Retrieve all the documents that match the concept terms of the query, then filter them to return only those whose footprints intersect the geographical scope (footprint) of the place in the query
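The T scheme amounts to text retrieval followed by a spatial filter. In this minimal sketch the inverted index, the bounding-box footprints and the `intersects` test are simplified stand-ins (assumptions), and a document is taken to match when it contains every query term.

```python
from typing import Dict, List, Set, Tuple

BBox = Tuple[float, float, float, float]  # (min_lat, min_lon, max_lat, max_lon)


def intersects(a: BBox, b: BBox) -> bool:
    """True if two bounding boxes overlap (touching edges count)."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def t_scheme(terms: List[str], query_box: BBox,
             inverted: Dict[str, Set[str]], footprints: Dict[str, BBox]) -> Set[str]:
    if not terms:
        return set()
    # Step 1: text retrieval over the whole collection (all terms must match).
    matches = set.intersection(*(inverted.get(t, set()) for t in terms))
    # Step 2: keep only documents whose footprint intersects the query footprint.
    return {d for d in matches if d in footprints and intersects(footprints[d], query_box)}


if __name__ == "__main__":
    inverted = {"hotel": {"d1", "d2", "d3"}, "castle": {"d1", "d3"}}
    footprints = {
        "d1": (53.3, -1.5, 53.4, -1.4),
        "d2": (53.3, -1.5, 53.4, -1.4),
        "d3": (47.3, 8.5, 47.4, 8.6),
    }
    print(t_scheme(["hotel", "castle"], (53.0, -2.0, 54.0, -1.0), inverted, footprints))  # {'d1'}
```

The weakness is visible in step 1: every document matching the terms is retrieved, wherever it is, before the spatial filter discards most of them.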

Page 44: LAMI Spring 2014

Functionality of the components

Retrieving the results: “ST” (Space-Text) scheme

More integrated approach

Regarded as a space-primary method

At search time, the cells that intersect the query footprint are determined, and then only the corresponding text indexes are searched
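The ST organisation can be sketched as one small inverted index per grid cell, so that only the cells overlapping the query are searched. The grid resolution (`CELL_DEG`), the bounding-box footprints and the nested-dictionary layout are illustrative assumptions rather than SPIRIT's actual index structures.

```python
import math
from collections import defaultdict
from typing import Dict, Iterator, List, Set, Tuple

BBox = Tuple[float, float, float, float]  # (min_lat, min_lon, max_lat, max_lon)
Cell = Tuple[int, int]
CELL_DEG = 1.0  # hypothetical grid resolution in degrees


def cells_for(box: BBox) -> Iterator[Cell]:
    """All grid cells overlapped by a bounding box."""
    for i in range(math.floor(box[0] / CELL_DEG), math.floor(box[2] / CELL_DEG) + 1):
        for j in range(math.floor(box[1] / CELL_DEG), math.floor(box[3] / CELL_DEG) + 1):
            yield (i, j)


def build_st_index(docs: Dict[str, Tuple[BBox, Set[str]]]) -> Dict[Cell, Dict[str, Set[str]]]:
    """Space-primary layout: cell -> term -> document IDs."""
    index: Dict[Cell, Dict[str, Set[str]]] = defaultdict(lambda: defaultdict(set))
    for doc_id, (box, terms) in docs.items():
        for cell in cells_for(box):
            for term in terms:
                index[cell][term].add(doc_id)
    return index


def st_search(index: Dict[Cell, Dict[str, Set[str]]], terms: List[str], query_box: BBox) -> Set[str]:
    results: Set[str] = set()
    for cell in cells_for(query_box):           # 1. cells intersecting the query footprint
        if cell not in index or not terms:
            continue
        postings = [index[cell].get(t, set()) for t in terms]
        results |= set.intersection(*postings)  # 2. search only this cell's text index
    return results


if __name__ == "__main__":
    docs = {
        "d1": ((53.3, -1.5, 53.4, -1.4), {"hotel", "castle"}),
        "d2": ((53.3, -1.5, 53.4, -1.4), {"hotel"}),
        "d3": ((47.3, 8.5, 47.4, 8.6), {"hotel", "castle"}),
    }
    print(st_search(build_st_index(docs), ["hotel", "castle"], (53.0, -2.0, 54.0, -1.0)))
```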

Page 45: LAMI Spring 2014

Functionality of the components

Retrieving the results: “TS” (Text-Space) scheme

Better query response time

Regarded as a text-primary method

For each term, the associated documents are grouped according to the spatial index cell they relate to, so at search time only the groups for cells that intersect the query footprint need to be examined
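The TS organisation inverts the nesting: the postings for each term are themselves grouped by grid cell. The sketch assumes a uniform grid of `CELL_DEG`-degree cells and bounding-box footprints (both hypothetical choices); it is a toy layout, not SPIRIT's index format.

```python
import math
from collections import defaultdict
from typing import Dict, Iterator, List, Set, Tuple

BBox = Tuple[float, float, float, float]  # (min_lat, min_lon, max_lat, max_lon)
Cell = Tuple[int, int]
CELL_DEG = 1.0  # hypothetical grid resolution in degrees


def cells_for(box: BBox) -> Iterator[Cell]:
    """All grid cells overlapped by a bounding box."""
    for i in range(math.floor(box[0] / CELL_DEG), math.floor(box[2] / CELL_DEG) + 1):
        for j in range(math.floor(box[1] / CELL_DEG), math.floor(box[3] / CELL_DEG) + 1):
            yield (i, j)


def build_ts_index(docs: Dict[str, Tuple[BBox, Set[str]]]) -> Dict[str, Dict[Cell, Set[str]]]:
    """Text-primary layout: term -> cell -> document IDs."""
    index: Dict[str, Dict[Cell, Set[str]]] = defaultdict(lambda: defaultdict(set))
    for doc_id, (box, terms) in docs.items():
        for term in terms:
            for cell in cells_for(box):
                index[term][cell].add(doc_id)
    return index


def ts_search(index: Dict[str, Dict[Cell, Set[str]]], terms: List[str], query_box: BBox) -> Set[str]:
    if not terms:
        return set()
    query_cells = set(cells_for(query_box))
    per_term: List[Set[str]] = []
    for term in terms:
        groups = index.get(term, {})  # 1. look up the term first
        # 2. read only the cell groups that overlap the query footprint
        per_term.append(set().union(*(groups[c] for c in query_cells if c in groups)))
    return set.intersection(*per_term)


if __name__ == "__main__":
    docs = {
        "d1": ((53.3, -1.5, 53.4, -1.4), {"hotel", "castle"}),
        "d2": ((53.3, -1.5, 53.4, -1.4), {"hotel"}),
        "d3": ((47.3, 8.5, 47.4, 8.6), {"hotel", "castle"}),
    }
    print(ts_search(build_ts_index(docs), ["hotel", "castle"], (53.0, -2.0, 54.0, -1.0)))
```

Because each term's postings are already partitioned by cell, a query touches only one postings list per term and skips the groups outside the query footprint.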

Page 46: LAMI Spring 2014

Query interfaces


Page 47: LAMI Spring 2014

Results display


Page 48: LAMI Spring 2014

Evaluation

Performance analysis

A document was considered relevant to a query only if it was both thematically and spatially relevant.

In this sense, the key result of the work is that spatially aware search outperformed text-only search.

Page 49: LAMI Spring 2014

Evaluation

Usability analysis

[Figure: two bar charts of questionnaire responses, y-axis from 0 to 30.
"It was easy to get started with the system and make my query" (Strongly disagree, Disagree, Neutral, Agree, Strongly agree)
"It was easy to find the locations of documents listed to the right of the map on the map" (No, not at all; A little; Yes, very much)]

Page 50: LAMI Spring 2014

Conclusions

The paper describes a unified approach, and the corresponding architecture, for introducing spatial awareness into search-engine technology

A prototype system demonstrated the effectiveness of the strategy

Page 51: LAMI Spring 2014

Personal Conclusions

The first study could lead to changes in search engines and devices that improve the mobile experience

The Web-a-Where system provides good insight for further improving location search, though it is not very precise

SPIRIT is a completely new paradigm in spatially aware searching, but its interaction methods can be improved

Page 52: LAMI Spring 2014

Thank you

Page 53: LAMI Spring 2014

References

General References

[1] M. Sanderson, J. Kohler, Analyzing geographic queries, in: Proceedings of the SIGIR 2004 Workshop on Geographic Information Retrieval, Sheffield, UK, 2004.

[2] S. Asadi, Searching the World Wide Web for local services and facilities: a review on the patterns of location-based queries, in: WAIM’05, Hangzhou, China, 2005.

[3] S. Kristoffersen, F. Ljungberg, “Making Place” to make IT work: empirical explorations of HCI for mobile CSCW, in: Proceedings of the International ACM SIGGROUP Conference on Supporting Group Work, 1999.

Page 54: LAMI Spring 2014

References

General References

[4] K. Church, B. Smyth, K. Bradley, P. Cotter, A large scale study of European mobile search behavior, in: Proceedings of MobileHCI’08, 2008, pp. 13–22.

[5] M.A. Neerincx, J.W. Streefkerk, Interacting in desktop and mobile context: emotion, trust and task performance, in: Proceedings of the First European Symposium on Ambient Intelligence (EUSAI), Eindhoven, The Netherlands, 2003.