Determining the Geographic Location of Internet Hosts
Venkata N. Padmanabhan
Microsoft Research
Lakshminarayanan Subramanian
University of California at Berkeley
SIGMETRICS 2001
Background
- Location-aware services are relevant in the Internet context too
  - targeted advertising
  - event notification
  - territorial rights management
- Existing approaches
  - user input: burdensome, error-prone
  - whois: manual updates; host may not be at registered location
- Goal: estimate location based on client IP address
  - challenging problem because an IP address does not inherently indicate location
IP2Geo
- Multi-pronged approach that exploits various “properties” of the Internet
  - DNS names of router interfaces often indicate location
  - network delay tends to correlate with geographic distance
  - hosts that are aggregated for the purposes of Internet routing also tend to be clustered geographically
- GeoTrack: determine location of closest router with recognizable DNS name
- GeoPing: use delay measurements to triangulate location
- GeoCluster: extrapolate partial IP-to-location mapping information using cluster information derived from BGP routing data
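The GeoTrack idea above can be sketched in a few lines. This is only an illustration under stated assumptions: the `CITY_CODES` table, the token-splitting rule, and the function names are hypothetical, and the hostnames along the path are assumed to have been collected already (for example with a traceroute-style tool).

```python
import re

# Hypothetical lookup table of location tokens that may appear in router DNS names.
CITY_CODES = {
    "sea": "Seattle, WA",
    "sfo": "San Francisco, CA",
    "nyc": "New York, NY",
    "chi": "Chicago, IL",
}

def location_from_name(hostname):
    """Return the location of the first recognizable token in hostname, else None."""
    for token in re.split(r"[.\-]", hostname.lower()):
        token = token.rstrip("0123456789")  # e.g. "sea4" -> "sea"
        if token in CITY_CODES:
            return CITY_CODES[token]
    return None

def geotrack(path_hostnames):
    """path_hostnames: router names along the path, ordered from probe to target.

    Walk back from the target and return the location of the closest
    router whose name is recognizable, or None if no name matches.
    """
    for hostname in reversed(path_hostnames):
        location = location_from_name(hostname)
        if location is not None:
            return location
    return None

path = ["gw1.sea4.example.net", "core2-nyc.example.net", "unknown-host.example.org"]
estimate = geotrack(path)
```

Here the last hop has no recognizable name, so the estimate falls back to the router before it.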
GeoPing
- Delay-based triangulation is conceptually simple
  - delay -> distance
  - distance from 3 or more non-collinear points -> location
- But there are practical difficulties
  - network path may be circuitous
  - transmission and queuing delays may corrupt delay estimate
  - one-way delay is hard to measure
- GeoPing
  - delay is measured from several distributed probes
  - minimum delay among several samples is picked
  - Nearest Neighbor in Delay Space (NNDS) algorithm
    - construct a delay map containing (delay vector, location) tuples
    - given a delay vector, search through the delay map for closest match
    - location corresponding to the closest match is our location estimate
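The NNDS steps above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the slide does not say how "closest match" is measured, so Euclidean distance between delay vectors is an assumption here, and the function names and sample numbers are made up.

```python
import math

def min_delays(samples_per_probe):
    """Pick the minimum delay among the samples taken from each probe."""
    return [min(samples) for samples in samples_per_probe]

def nnds(delay_map, target_vector):
    """Return the location whose delay vector is closest to target_vector.

    delay_map: list of (delay_vector, location) tuples, where every delay
    vector holds one delay per probe, in the same probe order.
    Closeness is Euclidean distance (an assumption, see the note above).
    """
    best_location, best_distance = None, math.inf
    for vector, location in delay_map:
        distance = math.dist(vector, target_vector)
        if distance < best_distance:
            best_location, best_distance = location, distance
    return best_location

# Illustrative delay map: delays in ms from three probes to hosts at known locations.
delay_map = [
    ([10, 40, 70], "Seattle"),
    ([45, 12, 50], "Chicago"),
    ([72, 48, 9], "New York"),
]

# Several delay samples per probe for the target host; the minimum is kept.
target = min_delays([[50, 47, 60], [15, 14, 30], [55, 52, 58]])
estimate = nnds(delay_map, target)
```

Taking the minimum over several samples is what the slide prescribes for reducing the effect of queuing delay; the linear scan is just the simplest way to find the nearest neighbor.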
Validation of Delay-based Approach
[Figure: cumulative probability vs. geographic distance (kilometers), one curve per delay range: 5-15 ms, 25-35 ms, 65-75 ms]
Delay tends to increase with geographic distance
Impact of the Number of Probes
[Figure: error distance (km) vs. number of probes, with curves for the 25th, 50th, and 75th percentiles]
Highest accuracy when 7-9 probes are used
GeoCluster
- Basic idea
  - divide up the space of IP addresses into clusters using BGP prefixes
  - use partial IP-to-location mapping data to infer location of each cluster
  - given target IP address, find matching cluster via longest-prefix match
  - location of the matching cluster is our estimate of host location
- Issues
  - partial IP-to-location mapping information may not be entirely accurate
  - BGP prefixes might not correspond to geographic clusters
- Sub-clustering algorithm
  - use partial IP-to-location mapping information to test whether a BGP prefix is likely to correspond to a geographic cluster
  - if the test is negative, divide the prefix into two and recursively apply the test to each half
  - in the end we are only left with geographically clustered prefixes
- Dispersion offers an indication of the accuracy of a location estimate
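The two GeoCluster steps above, recursive sub-clustering and longest-prefix lookup, can be sketched with Python's standard `ipaddress` module. The slides do not define the test for "geographic cluster", so the majority-share rule below (`threshold`, `min_points`, `max_len`) is an assumption chosen for illustration, as are the function names and the sample addresses.

```python
import ipaddress
from collections import Counter

def sub_cluster(prefix, known, threshold=0.7, min_points=2, max_len=24):
    """Split prefix until each remaining piece looks geographically clustered.

    known: partial IP-to-location mapping, {ip_string: location}.
    Assumed test: a prefix passes if at least `threshold` of its known
    hosts share one location. Returns {prefix_string: location}.
    """
    net = ipaddress.ip_network(prefix)
    locations = [loc for ip, loc in known.items()
                 if ipaddress.ip_address(ip) in net]
    if len(locations) < min_points:
        return {}  # too little data to infer a location for this prefix
    location, count = Counter(locations).most_common(1)[0]
    if count / len(locations) >= threshold:
        return {str(net): location}
    if net.prefixlen >= max_len:
        return {}  # stop splitting; discard the non-clustered prefix
    clusters = {}
    for half in net.subnets(prefixlen_diff=1):  # divide the prefix into two
        clusters.update(sub_cluster(str(half), known, threshold, min_points, max_len))
    return clusters

def longest_prefix_match(clusters, ip):
    """Return the location of the most specific cluster containing ip, else None."""
    address = ipaddress.ip_address(ip)
    best = None
    for prefix, location in clusters.items():
        net = ipaddress.ip_network(prefix)
        if address in net and (best is None or net.prefixlen > best[0].prefixlen):
            best = (net, location)
    return best[1] if best else None

# Illustrative partial mapping: one /16 that actually spans two cities.
known = {
    "10.0.0.1": "Seattle", "10.0.0.2": "Seattle",
    "10.0.128.1": "Chicago", "10.0.128.2": "Chicago",
}
clusters = sub_cluster("10.0.0.0/16", known)
estimate = longest_prefix_match(clusters, "10.0.200.7")
```

In this example the /16 fails the test, so it is split into two /17 halves, each of which passes; the lookup for 10.0.200.7 then lands in the Chicago half.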
Performance of IP2Geo
[Figure: cumulative probability vs. error distance (kilometers), one curve each for GeoTrack, GeoPing, and GeoCluster]
Median error: GeoCluster: 28 km, GeoTrack: 102 km, GeoPing: 382 km
Summary
- IP2Geo combines several techniques that leverage different sources of information
  - GeoTrack: DNS names
  - GeoPing: network delay
  - GeoCluster: address aggregates used for routing
- Median error varies between 20 and 400 km
- Even a 30% success rate is useful, especially since we can tell when the estimate is likely to be accurate
- Forthcoming paper at SIGCOMM 2001
- For more information visit: http://www.research.microsoft.com/~padmanab/