Mapping the Internet and Intranets
Hal Burch, Bill Cheswick
http://www.cheswick.com
1 of 139Mapping the Internet and Intranets
134 slides
Mapping the Internet and Intranets
Hal Burch, Bill Cheswick
http://www.cheswick.com
3 of 139Mapping the Internet and Intranets
Motivations
• Intranets are out of control
  – Always have been
• Highlands “day after” scenario
• Panix DoS attacks
  – a way to trace anonymous packets back!
• Internet tomography
• Curiosity about size and growth of the Internet
• Same tools are useful for understanding any large network, including intranets
4 of 139Mapping the Internet and Intranets
Related Work
• See Martin Dodge’s cyber geography page
• MIDS - John Quarterman
• CAIDA - kc claffy
• Mercator
• “Measuring ISP topologies with Rocketfuel” - 2002
  – Spring, Mahajan, Wetherall
• Enter “internet map” in your search engine
5 of 139Mapping the Internet and Intranets
The Goals
• Long term reliable collection of Internet and Lucent connectivity information
  – without annoying too many people
• Attempt some simple visualizations of the data
  – movie of Internet growth!
• Develop tools to probe intranets
• Probe the distant corners of the Internet
6 of 139Mapping the Internet and Intranets
Methods - data collection
• Single reliable host connected at the company perimeter
• Daily full scan of Lucent
• Daily partial scan of Internet, monthly full scan
• One line of text per network scanned
  – Unix tools
7 of 139Mapping the Internet and Intranets
Methods - network scanning
• Obtain master network list
  – network lists from Merit, RIPE, APNIC, etc.
  – BGP data or routing data from customers
  – hand-assembled list of Yugoslavia/Bosnia
• Run a traceroute-style scan towards each network
• Stop on error, completion, no data
  – Keep the natives happy
8 of 139Mapping the Internet and Intranets
TTL probes
• Used by traceroute and other tools
• Probes toward each target network with increasing TTL
• Probes are ICMP, UDP, TCP to port 80, 25, 139, etc.
• Some people block UDP, others ICMP
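The increasing-TTL idea can be sketched as a generator of probe requests. This is a minimal sketch, assuming the one-line-per-probe text format (ident, target, TTL) shown in the pinglist slides later in this deck. The function name ttl_probe_lines and the target 192.0.2.1 (a documentation address) are hypothetical, and no packets are sent: the output is only the text a probe tool would read.

```shell
#!/bin/sh
# ttl_probe_lines TARGET MAX_TTL   (hypothetical helper, not part of the talk)
# Emit one probe request per line: "<ident> <target> <ttl>".
# The ident is set equal to the TTL so a reply can be matched to its hop distance.
ttl_probe_lines() {
    target=$1
    max_ttl=$2
    ttl=1
    while [ "$ttl" -le "$max_ttl" ]; do
        echo "$ttl $target $ttl"
        ttl=$((ttl + 1))
    done
}

# Three probes toward a documentation address, TTL 1 through 3.
ttl_probe_lines 192.0.2.1 3
```

Keeping the generator separate from the sender means the same lines can be logged, replayed, or fed to a different probe tool.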
9 of 139Mapping the Internet and Intranets
TTL probes
[Diagram: a client and a server, each with a full stack (application level, TCP/UDP, IP, hardware), joined by a chain of five routers (IP over hardware); the links along the path are labeled with hop numbers]
10 of 139Mapping the Internet and Intranets
Send a packet with a TTL of 1…
[Diagram: client and server stacks joined by a chain of five routers, with hops numbered along the path]
11 of 139Mapping the Internet and Intranets
…and we get the death notice from the first hop
[Diagram: client and server stacks joined by a chain of five routers, with hops numbered along the path]
12 of 139Mapping the Internet and Intranets
Send a packet with a TTL of 2…
[Diagram: client and server stacks joined by a chain of five routers, with hops numbered along the path]
13 of 139Mapping the Internet and Intranets
… and so on …
[Diagram: client and server stacks joined by a chain of five routers, with hops numbered along the path]
14 of 139Mapping the Internet and Intranets
Advantages
• We don’t need access (i.e., SNMP) to the routers
• It’s very fast
• Standard Internet tool: it doesn’t break things
• Insignificant load on the routers
• Not likely to show up on IDS reports
• We can probe with many packet types
15 of 139Mapping the Internet and Intranets
Limitations
• Outgoing paths only
• Level 3 (IP) only
  – ATM networks appear as a single node
  – This distorts graphical analysis
• Not all routers respond
• Many routers limited to one response per second
16 of 139Mapping the Internet and Intranets
Limitations
• View is from scanning host only
• Takes a while to collect alternating paths
• Gentle mapping means missed endpoints
• Imputes non-existent links
17 of 139Mapping the Internet and Intranets
The data can go either way
[Diagram: network graph with nodes A, B, C, D, E and F]
18 of 139Mapping the Internet and Intranets
The data can go either way
[Diagram: network graph with nodes A, B, C, D, E and F]
19 of 139Mapping the Internet and Intranets
But our test packets only go part of the way
[Diagram: network graph with nodes A, B, C, D, E and F]
20 of 139Mapping the Internet and Intranets
We record the hop…
[Diagram: network graph with nodes A, B, C, D, E and F]
21 of 139Mapping the Internet and Intranets
The next probe happens to go the other way
[Diagram: network graph with nodes A, B, C, D, E and F]
22 of 139Mapping the Internet and Intranets
…and we record the other hop…
[Diagram: network graph with nodes A, B, C, D, E and F]
23 of 139Mapping the Internet and Intranets
We’ve imputed a link that doesn’t exist
[Diagram: network graph with nodes A, B, C, D, E and F]
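The imputed-link effect can be reproduced with a few lines of text processing. This is a minimal sketch under stated assumptions: the topology (a scanner S with two paths, S-B-E-D and S-C-F-D) is hypothetical and is not the graph drawn on the slides, and the function names are made up for illustration. A mapper that joins consecutive hops of one trace infers a link between whatever answered at TTL n and TTL n+1, even when the two probes took different paths.

```shell
#!/bin/sh
# True links of a hypothetical network with two paths from scanner S to D:
#   S - B - E - D   and   S - C - F - D
true_links() {
    printf '%s\n' 'S B' 'B E' 'E D' 'S C' 'C F' 'F D'
}

# hops_to_links: read "<ttl> <router>" lines from one trace on stdin and
# print the links a mapper would infer between consecutive hops.
hops_to_links() {
    sort -n -k1,1 | awk -v prev=S '{ print prev, $2; prev = $2 }'
}

# imputed_links: print inferred links that are not in the true topology.
imputed_links() {
    hops_to_links | while read -r a b; do
        true_links | grep -q -x -F -e "$a $b" -e "$b $a" || echo "$a $b"
    done
}

# The TTL=1 probe went via B, the TTL=2 probe took the other path and
# was answered by F, and the TTL=3 probe reached D.
printf '%s\n' '1 B' '2 F' '3 D' | imputed_links
```

Here the only inferred link missing from the true topology is the one between B and F, which is exactly the kind of edge the slides describe as imputed.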
24 of 139Mapping the Internet and Intranets
Data collection complaints
• Australian parliament was the first to complain
• List of whiners (25 nets)
• Military noticed immediately
  – Steve Northcutt
  – arrangements/warnings to DISA and CERT
• These complaints are mostly a thing of the past
  – Internet background radiation predominates
26 of 139Mapping the Internet and Intranets
Distribution of path lengths
[Chart: number of nets (y axis, 0 to 8000) by path length (x axis); series: Reached, Not reached]
27 of 139Mapping the Internet and Intranets
Visualization goals
• make a map
  – show interesting features
  – debug our database and collection methods
  – hard to fold up
• geography doesn’t matter
• use colors to show further meaning
30 of 139Mapping the Internet and Intranets
Infovis state-of-the-art in 1998
• 800 nodes was a huge graph
• We had 100,000 nodes
• Use spring-force simulation with lots of empirical tweaks
• Each layout needed 20 hours of Pentium time
31 of 139Mapping the Internet and Intranets
134 slides
The Internet has a diameter of about
10,000 pookies
134 slides
Visualization of the layout algorithm
Laying out the Internet graph
34 of 139Mapping the Internet and Intranets
134 slides
Visualization of the layout algorithm
Laying out an intranet
37 of 139Mapping the Internet and Intranets
A simplified map
• Minimum distance spanning tree uses 80% of the data
• Much easier visualization
• Most of the links still valid
• Redundancy is in the middle
38 of 139Mapping the Internet and Intranets
Colored by AS number
39 of 139Mapping the Internet and Intranets
Map Coloring
• distance from test host
• IP address
  – shows communities
• Geographical (by TLD)
• ISPs
• future
  – timing, firewalls, LSRR blocks
40 of 139Mapping the Internet and Intranets
Colored by IP address!
41 of 139Mapping the Internet and Intranets
Colored by geography
42 of 139Mapping the Internet and Intranets
Colored by ISP
43 of 139Mapping the Internet and Intranets
Colored by distance from scanning host
44 of 139Mapping the Internet and Intranets
US military reached by ICMP ping
45 of 139Mapping the Internet and Intranets
US military networks reached by UDP
48 of 139Mapping the Internet and Intranets
History of the Project
• Started in August 1998 at Bell Labs
• April-June 1999: Yugoslavia mapping
• July 2000: first customer intranet scanned
• Sept. 2000: spun off Lumeta from Lucent/Bell Labs
• June 2002: “B” round funding completed
• 2003: sales >$4MM
49 of 139Mapping the Internet and Intranets
Backhoes/truck bombs/mayhem
• The former happens surprisingly often
  – Almost daily on the network of one major ISP
• 9/11 took out a fair amount of connectivity
50 of 139Mapping the Internet and Intranets
CIDR and IP Counts
[Chart: count (y axis, 145K to 180K) by date (x axis, 9/11 to 9/22); series: # Edges, # CIDRs, # IPs]
51 of 139Mapping the Internet and Intranets
Routers in New York City
missing generator fuel
[Chart: number of routers (y axis, 1000 to 1400) by date (x axis, 9/11 to 9/22)]
52 of 139Mapping the Internet and Intranets
Internet before 9/11/2001
53 of 139Mapping the Internet and Intranets
Internet after 9/11/2001
134 slides
Yugoslavia
An unclassified peek at a new battlefield
55 of 139Mapping the Internet and Intranets
134 slides
Un film par Steve “Hollywood” Branigan...
57 of 139Mapping the Internet and Intranets
134 slides
fin
134 slides
Intranets: the rest of the Internet
The Pretty Good Wall of China
67 of 139Mapping the Internet and Intranets
This was supposed to be a VPN
134 slides
Anything large enough to be called an “intranet” is out of control
71 of 139Mapping the Internet and Intranets
Case studies: corp. networks
Some intranet statistics

                                         Min        Max
Intranet sizes (devices)                 7,900      365,000
Corporate address space                  81,000     745,000,000
% devices in unknown address space       0.01%      20.86%
% routers responding to "public"         0.14%      75.50%
% routers responding to other            0.00%      52.00%
Outbound host leaks on network           0          176,000
% devices with outbound ICMP leaks       0%         79%
% devices with outbound UDP leaks        0%         82%
Inbound UDP host leaks                   0          5,800
% devices with inbound ICMP leaks        0%         11%
% devices with inbound UDP leaks         0%         12%
% hosts running Windows                  36%        84%
72 of 139Mapping the Internet and Intranets
Leak Detection
[Diagram: the Internet and an intranet, with mapping host A, test host B, host D (“mitt”), and C]
• A sends packet to B, with spoofed return address of D
• If B can, it will reply to D with a response, possibly through a different interface
73 of 139Mapping the Internet and Intranets
Outbound Leak Detection
[Diagram: the Internet and an intranet, with mapping host A, test host B, host D (“mitt”), and C]
• Packet must be crafted so the response won’t be permitted through the firewall
• A variety of packet types and responses are used
• Either inside or outside address may be discovered
• Packet is labeled so we know where it came from
74 of 139Mapping the Internet and Intranets
Inbound Leak Detection
[Diagram: the Internet and an intranet, with mapping host A, test host B, host D (“mitt”), and C]
• This direction is usually more important
• It all depends on the site policy…
• …so many leaks might be just fine.
75 of 139Mapping the Internet and Intranets
Inbound Leak Detection
[Diagram: the Internet and an intranet, with mapping host A, test host B, host D (“mitt”), and C]
76 of 139Mapping the Internet and Intranets
Existence proofs of intranet leaks: the slammer worm
• It’s a pop-quiz on perimeter integrity
• The best run networks (e.g. spooks’ nets) do not get these plagues
  – Internal hosts may be susceptible
77 of 139Mapping the Internet and Intranets
Some Lumeta lessons
• Reporting is the really hard part
  – Converting data to information
• “Tell me how we compare to other clients”
• Offering a service was good practice, for a while
• The clients want a device
• We have >70 Fortune-200 companies and government agencies as clients
• Need-to-have vs. want-to-have
78 of 139Mapping the Internet and Intranets
Honeyd – network emulation
• Anti-hacking tools by Niels Provos at citi.umich.edu
• Can respond as one or more hosts
• I am configuring it to look like an entire client’s network
• Useful for testing and debugging
• Product?
134 slides
Some open questions
80 of 139Mapping the Internet and Intranets
How do you analyze a large graph over time?
• Five years of Internet data, mostly unanalyzed
• Alternate paths to a target country
• Sample insight: “Poland was off the Internet yesterday”
• Placement of monitoring tools?
• Compute and display the differences between two complex graphs
81 of 139Mapping the Internet and Intranets
Visualizations
• These graphs are too big for a piece of paper
• Various approaches available, but none really satisfactory
• Build visualization graph as the data comes in, and as the network evolves
82 of 139Mapping the Internet and Intranets
134 slides
Mapping the Internet and Intranets
Hal Burch, Bill Cheswick
http://www.cheswick.com
134 slides
Some Internet Mapping Innards and Lessons
Bill Cheswick
http://www.cheswick.com
85 of 139Mapping the Internet and Intranets
Some of the Original Unix philosophy
• A tool should do one job, and do it well
• Pipes let us build powerful systems by linking tools together
• Kernighan and Pike, The Unix Programming Environment
86 of 139Mapping the Internet and Intranets
Pipes let you feed the output of one program directly into the next
• The programs must be written with this in mind
• Simple, constant formats
• One line per item
• Usually white space separates fields
• No header or trailer lines, please
87 of 139Mapping the Internet and Intranets
Bad: multiple files on one line.
Worse: different on pipes and terminal
88 of 139Mapping the Internet and Intranets
Bad: stupid header line
89 of 139Mapping the Internet and Intranets
Bad: header lines. (duplicates Unix’s traceroute mistake)
90 of 139Mapping the Internet and Intranets
Pinglist: traceroute according to Unix specs
• Each input line produces a packet
• Each returning packet produces a single output line
• Ident field allowed the calling program to keep track of packets
• Original design was to generate UDP packets with varying TTL fields
• No DNS lookup: that interferes with timing, and we might not want to do it, and other programs could do it better anyway
  – Do one thing, and do it well
91 of 139Mapping the Internet and Intranets
Original pinglist sample
$ pinglist
1 207.95.23.14 1
1 12.31.52.10 died
5 207.95.23.14 40
5 207.95.23.14 exceeded
92 of 139Mapping the Internet and Intranets
Original pinglist sample
$ pinglist
1 207.95.23.14 1

Ident: 1
Target: 207.95.23.14
Packet type: UDP, TTL=1
93 of 139Mapping the Internet and Intranets
Typical sample usage: cscan
ping all the hosts on a class C net

(
  for i in `seq 0 255`
  do
    echo $i $1.$i 50
  done
  sleep 1
  for i in `seq 0 255`
  do
    echo $i $1.$i 50
  done
  sleep 2
) | pinglist | sort -n -k1,2
94 of 139Mapping the Internet and Intranets
Pinglist -> netio
• More parameters
• Can send pings (and a lot of other stuff)
95 of 139Mapping the Internet and Intranets
Pinglist written in 1989
• Target and packet type fields are much more complicated now
• Sample target modifications
  – Source routing:
    • 209.123.16.98.1,204.178.16.6
  – Tunnelling:
    • 209.123.16.98:204.178.16.6
• Type fields for ICMP, UDP, TCP with options, DNS, SNMP
• It’s too complicated now
96 of 139Mapping the Internet and Intranets
New version coming
• Pinglist “classic” pipe
• SGML packet descriptor type
• Others as we need them
• IPv6
97 of 139Mapping the Internet and Intranets
Newest version of cscan

(
  for i in `seq 0 255`
  do
    echo $i $1.$i P    # ICMP probe
  done
  sleep 1
  for i in `seq 0 255`
  do
    echo $i $1.$i 64   # UDP probe
  done
) | netio -l 2 -x | sort -n -k1,2
98 of 139Mapping the Internet and Intranets
Simple (and ugly) traceroute

for i in `seq 0 30`
do
  echo $i $1 $i
done | netio -l 2 | sort -n -k1,1 | awk '{print $1, $2}'
99 of 139Mapping the Internet and Intranets
Product prototype
• Based on netio
• Pipe networks into netscan
• Netscan pipes specific probe packets for many networks into netio
• Returning packets are piped back to netscan
• Netscan writes completed network information to standard output
• All the concurrency and scanning logic is in netscan
100 of 139Mapping the Internet and Intranets
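[Diagram: networks such as “135.104.0.0/16” are piped into netscan; netscan pipes probe requests such as “1 207.95.23.14 1” to netio, which sends packets to the Internet; netio pipes replies such as “1 12.31.52.10 died” back to netscan, which writes completed lines such as “135.104.0.0/16 Path=.....”]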
102 of 139Mapping the Internet and Intranets
Prototyping and debugging is easier
• Packet i/o is in text, easily written to a file for debugging
• Printf is easier interface than binary
• Ident field made it easy to match probes with their responses
• Negligible performance hit: these programs are kernel- and network-bound, not CPU hogs
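Matching probes with their responses by ident is a plain text join. This is a minimal sketch, assuming probe lines of the form "ident target ttl" and reply lines of the form "ident responder status", modeled on the pinglist sample earlier in the deck. The function name match_replies, the temporary files, and the reading of the third reply field as a status word are assumptions made for illustration; netio's actual output format is not documented in these slides.

```shell
#!/bin/sh
# match_replies PROBES REPLIES   (hypothetical helper)
# PROBES lines:  "<ident> <target> <ttl>"        what was sent
# REPLIES lines: "<ident> <responder> <status>"  what came back
# Prints "<target> <ttl> <responder> <status>" for each reply whose ident
# matches a probe; replies with an unknown ident are dropped.
match_replies() {
    awk 'NR == FNR { target[$1] = $2; ttl[$1] = $3; next }
         $1 in target { print target[$1], ttl[$1], $2, $3 }' "$1" "$2"
}

probes=$(mktemp)
replies=$(mktemp)
printf '%s\n' '1 207.95.23.14 1' '5 207.95.23.14 40' > "$probes"
printf '%s\n' '1 12.31.52.10 died' '5 207.95.23.14 exceeded' > "$replies"
match_replies "$probes" "$replies"
rm -f "$probes" "$replies"
```

Because both sides are text files, the same join can be rerun on a saved session when debugging a scan.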
103 of 139Mapping the Internet and Intranets
What a full scan produces: simple text files!

np:~/20041028/raw$ ls -l
total 148858
-rw-rw-r-- 1 ches ches        20 Oct 28 03:07 label-somerset-ext-begin-000
-rw-rw-r-- 1 ches ches        20 Oct 28 05:09 label-somerset-ext-end-000
-rw-rw-r-- 1 ches ches   8836890 Oct 28 05:09 label-somerset-ext-labels-000
-rw-rw-r-- 1 ches ches        46 Oct 28 03:07 label-somerset-ext-log-000
-rw-rw-r-- 1 ches ches         0 Oct 28 03:07 label-somerset-ext-stopresp-000
-rw-rw-r-- 1 ches ches        20 Oct 28 00:48 scan-scan-begin
-rw-rw-r-- 1 ches ches        20 Oct 28 05:15 scan-scan-end
-rw-rw-r-- 1 ches ches        20 Oct 28 00:48 scan-scan-somerset-begin
-rw-rw-r-- 1 ches ches        20 Oct 28 05:15 scan-scan-somerset-end
-rw-rw-r-- 1 ches ches        20 Oct 28 05:09 stitch-somerset-udp-begin-000
-rw-rw-r-- 1 ches ches        20 Oct 28 05:15 stitch-somerset-udp-end-000
-rw-rw-r-- 1 ches ches   6122667 Oct 28 05:15 stitch-somerset-udp-plout-000
-rw-rw-r-- 1 ches ches        20 Oct 28 00:48 td-scan-somerset-icmp-begin-000
-rw-rw-r-- 1 ches ches        20 Oct 28 03:04 td-scan-somerset-icmp-end-000
-rw-rw-r-- 1 ches ches      3657 Oct 28 03:02 td-scan-somerset-icmp-log-000
-rw-rw-r-- 1 ches ches 137329650 Oct 28 03:02 td-scan-somerset-icmp-paths-000
104 of 139Mapping the Internet and Intranets
Raw path data

162.83.64.0/19 Probe=20041028: Target=20041028:162.83.64.5 Destination=20041028:162.83.64.5 Path=20041028,somerset,ping:65.198.68.33,157.130.95.173,152.63.18.162,152.63.19.33,152.63.21.17,152.63.21.81,204.255.173.25,205.171.17.61,205.171.8.246,205.171.230.10,205.171.8.218,205.171.209.114,205.171.251.22,208.46.127.254,130.81.10.89,130.81.12.146,130.81.5.170,162.83.64.5;R9,9,8,8,6,14,11,11,9,9,20,15,21,21,18,41,35,35;S879,881,883,886,887,890,892,894,896,898,901,903,907,910,913,918,922,927;T255,254,252,251,250,249,249,250,251,250,244,245,244,243,243,241,241,240;I30836,0,0,0,0,0,0,0,63731,7129,0,0,0,0,63638,0,60220,52101
105 of 139Mapping the Internet and Intranets
Raw path data, sample line broken out by fields
• 162.83.64.0/19
• Probe=20041028:
• Target=20041028:162.83.64.5
• Destination=20041028:162.83.64.5
• Path=20041028,somerset,ping:65.198.68.33,157.130.95.173,152.63.18.162,152.63.19.33,152.63.21.17,152.63.21.81,204.255.173.25,205.171.17.61,205.171.8.246,205.171.230.10,205.171.8.218,205.171.209.114,205.171.251.22,208.46.127.254,130.81.10.89,130.81.12.146,130.81.5.170,162.83.64.5;
• R9,9,8,8,6,14,11,11,9,9,20,15,21,21,18,41,35,35;
• S879,881,883,886,887,890,892,894,896,898,901,903,907,910,913,918,922,927;
• T255,254,252,251,250,249,249,250,251,250,244,245,244,243,243,241,241,240;
• I30836,0,0,0,0,0,0,0,63731,7129,0,0,0,0,63638,0,60220,52101
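Because each network is one line of text, the hop list can be pulled out of the Path= field with awk. This is a minimal sketch, assuming only the layout visible in the sample line: a whitespace-separated field starting with "Path=", a colon before the first hop, commas between hops, and a semicolon after the last hop. The function name path_hops and the shortened example line (documentation addresses) are hypothetical. The R, S, T and I lists are left alone because their meaning is not stated on the slides.

```shell
#!/bin/sh
# path_hops: read raw path lines on stdin and print
# "<network> <hop number> <address>" for every hop in the Path= field.
path_hops() {
    awk '{
        for (f = 1; f <= NF; f++) {
            if ($f !~ /^Path=/) continue
            split($f, parts, ";")           # parts[1] holds the hop list
            sub(/^[^:]*:/, "", parts[1])    # drop "Path=<date>,<host>,<type>:"
            n = split(parts[1], hop, ",")
            for (i = 1; i <= n; i++) print $1, i, hop[i]
        }
    }'
}

# Shortened, made-up line in the same layout as the slide.
echo '192.0.2.0/24 Probe=20041028: Path=20041028,somerset,ping:198.51.100.1,192.0.2.5;R9,35' |
    path_hops
```

The output is again one item per line, so it can be piped straight into sort, uniq, or wc.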
106 of 139Mapping the Internet and Intranets
Raw label data: ilookup output

144.232.1.237 sl-gw20-ana-0-0-0.sprintlink.net 204.178.16.49
209.194.240.14 (dns1.xspedius.net) 204.178.16.49
199.2.212.245 ip245.212.2.199.stingcomm.net 204.178.16.6
195.33.160.4 (ns1.att.nl) 65.198.68.67
203.213.192.2 (ns1.apnic.net) 65.198.68.67
199.18.24.10 owu-atm1-0s1.columbus.oar.net 204.178.16.6
66.46.50.164 (ns1.business.allstream.net) 65.198.68.67
62.154.37.70 ma-eb1.MA.DE.net.DTAG.DE 65.198.68.67
107 of 139Mapping the Internet and Intranets
$ cat >/tmp/12
12.0.0.0/8 ATT

$ fnin -n /tmp/12 <$labels | wc -l
2418
108 of 139Mapping the Internet and Intranets
netsize

cat "$@" |
awk 'BEGIN { sum = 0 }
/^#/ { next }        # comments ignored
/^\s*$/ { next }     # lines with only whitespace ignored
NF >= 1 {
    n = split($1, fields, "/")
    if (n != 2) {
        print "ignoring invalid line (" NR "): " $0 > "/dev/stderr"
        next
    }
    sum += 2 ** (32 - fields[2])
}
END { printf "%0.0f\n", sum }'
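The arithmetic in netsize is that a prefix of length n covers 2^(32 - n) addresses, so a /24 counts as 256 and a /16 as 65536. This is a reduced sketch of that sum, not the script from the slide: the function name prefix_size_sum is hypothetical, the comment and error handling are omitted, and the POSIX awk exponent operator `^` is used in place of `**`. The example prefixes are documentation and private ranges chosen for illustration.

```shell
#!/bin/sh
# prefix_size_sum: sum 2^(32 - prefixlen) over "a.b.c.d/len ..." lines on stdin.
# Lines whose first field is not of the form address/len are skipped silently.
prefix_size_sum() {
    awk '{ n = split($1, f, "/"); if (n == 2) sum += 2 ^ (32 - f[2]) }
         END { printf "%.0f\n", sum }'
}

# 256 + 65536 = 65792
printf '%s\n' '192.0.2.0/24 example' '172.16.0.0/16 example' | prefix_size_sum
```

Overlapping prefixes are each counted in full, so the result is a sum of prefix sizes rather than a count of distinct addresses.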
109 of 139Mapping the Internet and Intranets
$ cat >/tmp/12
12.0.0.0/8 ATT

$ fnin -n /tmp/12 <$labels | wc -l
2418
$ fnin -n /tmp/12 <$labels | netsize
34590812
110 of 139Mapping the Internet and Intranets
Quick prototype question
• How many Kuwaiti routers did the scan find?
111 of 139Mapping the Internet and Intranets
$ sed 10q $labels
4.78.164.2 (ns2.Level3.net) 65.198.68.67
144.232.1.237 sl-gw20-ana-0-0-0.sprintlink.net 204.178.16.49
209.194.240.14 (dns1.xspedius.net) 204.178.16.49
199.2.212.245 ip245.212.2.199.stingcomm.net 204.178.16.6
195.33.160.4 (ns1.att.nl) 65.198.68.67
203.213.192.2 (ns1.apnic.net) 65.198.68.67
199.18.24.10 owu-atm1-0s1.columbus.oar.net 204.178.16.6
207.230.214.82 randolph-k12-wi-us.customer.centurytel.net 128.2.198.127
66.46.50.164 (ns1.business.allstream.net) 65.198.68.67
62.154.37.70 ma-eb1.MA.DE.net.DTAG.DE 65.198.68.67
112 of 139Mapping the Internet and Intranets
$ awk '{print $2}' $labels
sl-gw20-ana-0-0-0.sprintlink.net
(dns1.xspedius.net)
ip245.212.2.199.stingcomm.net
(ns1.att.nl)
(ns1.apnic.net)
owu-atm1-0s1.columbus.oar.net
randolph-k12-wi-us.customer.centurytel.net
(ns1.business.allstream.net)
ma-eb1.MA.DE.net.DTAG.DE
113 of 139Mapping the Internet and Intranets
$ awk '{print $2}' $labels | tr 'A-Z' 'a-z' | sed 10q
(ns2.level3.net)
sl-gw20-ana-0-0-0.sprintlink.net
(dns1.xspedius.net)
ip245.212.2.199.stingcomm.net
(ns1.att.nl)
(ns1.apnic.net)
owu-atm1-0s1.columbus.oar.net
randolph-k12-wi-us.customer.centurytel.net
(ns1.business.allstream.net)
ma-eb1.ma.de.net.dtag.de
114 of 139Mapping the Internet and Intranets
$ awk '{print $2}' $labels | tr 'A-Z' 'a-z' | tr -d '()' | sed 10q
ns2.level3.net
sl-gw20-ana-0-0-0.sprintlink.net
dns1.xspedius.net
ip245.212.2.199.stingcomm.net
ns1.att.nl
ns1.apnic.net
owu-atm1-0s1.columbus.oar.net
randolph-k12-wi-us.customer.centurytel.net
ns1.business.allstream.net
ma-eb1.ma.de.net.dtag.de
115 of 139Mapping the Internet and Intranets
$ awk '{print $2}' $labels | tr 'A-Z' 'a-z' | tr -d '()' | grep '\.kw$' | sed 10q
$
116 of 139Mapping the Internet and Intranets
Answer
• None. Why?
• Kuwait.edu != .kw
• Networks are aggregated
• Question: how about Iran?
117 of 139Mapping the Internet and Intranets
$ awk '{print $2}' $labels | tr 'A-Z' 'a-z' | tr -d '()' | grep '\.ir$' | sed 10q
fe0-0.lyra.bb.niavaran.tehran.sinet.ir
n2-r3-c7206.iranet.ir
n2-r2-c7206.iranet.ir
n2-r4-c7513.iranet.ir
ns.parscyberian.ir
ns1.nic.ir
fa5-1-1.ipm-gw.bb.niavaran.tehran.sinet.ir
ns.parscyberian.ir
ns.parscyberian.ir
router1.tse.or.ir
118 of 139Mapping the Internet and Intranets
$ awk '{print $2}' $labels | tr 'A-Z' 'a-z' | tr -d '()' | grep '\.ir$' | wc -l
37
$
119 of 139Mapping the Internet and Intranets
Question
• How many top-level country codes did the scan encounter?
120 of 139Mapping the Internet and Intranets
Extract the first ten TLDs
121 of 139Mapping the Internet and Intranets
$ awk '{print $2}' $labels | tr 'A-Z' 'a-z' | tr -d '()' | awk -v FS=. '{print $NF}' | sed 10q
net
net
net
net
nl
net
net
net
net
de
122 of 139Mapping the Internet and Intranets
Collect and sort the TLDs
123 of 139Mapping the Internet and Intranets
$ awk '{print $2}' $labels | tr 'A-Z' 'a-z' | tr -d '()' | awk -v FS=. '{print $NF}' | sort | uniq -c | sort -rn | sed 20q
86739 net
27564 com
 6212 [servfail]
 4568 jp
 4089 au
 2294 de
 2054 ca
 1982 ru
 1623 br
 1510 kr
 1381 ar
 1348 mx
 1320 edu
 1241 [timeout]
 1182 it
 1050 pl
 1031 ro
 1005 cn
  896 arpa
  794 in
124 of 139Mapping the Internet and Intranets
Just two-letter TLDs
125 of 139Mapping the Internet and Intranets
$ awk '{print $2}' $labels | tr 'A-Z' 'a-z' | tr -d '()' | awk -v FS=. 'NF > 1 {print $NF}' | grep '^..$' | grep '^[a-z]*$' | sort | uniq -c | sort -rn | grep -v '\[' | sed 20q
4568 jp
4089 au
2294 de
2054 ca
1982 ru
1623 br
1510 kr
1381 ar
1348 mx
1182 it
1050 pl
1031 ro
1005 cn
 794 in
 789 id
 709 uk
 670 fr
 603 ua
 600 th
 593 se
126 of 139Mapping the Internet and Intranets
Check the end of the list for weird entries
127 of 139Mapping the Internet and Intranets
$ awk '{print $2}' $labels | tr 'A-Z' 'a-z' | tr -d '()' | awk -v FS=. 'NF > 1 {print $NF}' | grep '^..$' | grep '^[a-z]*$' | sort | uniq -c | sort -rn | grep -v '\[' | tail
   1 gs
   1 gn
   1 gh
   1 gf
   1 fm
   1 dj
   1 ck
   1 bt
   1 bd
   1 ac
128 of 139Mapping the Internet and Intranets
Count tlds
129 of 139Mapping the Internet and Intranets
$ awk '{print $2}' $labels | tr 'A-Z' 'a-z' | tr -d '()' | awk -v FS=. 'NF > 1 {print $NF}' | grep '^..$' | grep '^[a-z]*$' | sort | uniq -c | sort -rn | grep -v '\[' | wc -l
177
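The pipeline built up over the last few slides can be wrapped in a function and tried on a handful of lines. This is a sketch under stated assumptions: the function name count_cc_tlds is hypothetical, the two grep stages are folded into one pattern, a final awk trims the padding that uniq -c adds, and the last input line (203.0.113.9, ns.example.de) is made up for illustration. The other input lines are taken from the label sample shown earlier.

```shell
#!/bin/sh
# count_cc_tlds: read label lines ("<ip> <name> <server>") on stdin and print
# "<count> <tld>" for two-letter TLDs, most frequent first, ties alphabetical.
count_cc_tlds() {
    awk '{print $2}' |
    tr 'A-Z' 'a-z' |
    tr -d '()' |
    awk -F. 'NF > 1 {print $NF}' |
    grep '^[a-z][a-z]$' |
    sort | uniq -c | sort -k1,1rn -k2,2 |
    awk '{print $1, $2}'
}

count_cc_tlds <<'EOF'
195.33.160.4 (ns1.att.nl) 65.198.68.67
62.154.37.70 ma-eb1.MA.DE.net.DTAG.DE 65.198.68.67
144.232.1.237 sl-gw20-ana-0-0-0.sprintlink.net 204.178.16.49
203.0.113.9 (ns.example.de) 65.198.68.67
EOF
```

Piping the result into wc -l gives the number of distinct two-letter TLDs, which is the question the slides set out to answer.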
130 of 139Mapping the Internet and Intranets
Use pr to lay out results in columns
131 of 139Mapping the Internet and Intranets
$ awk '{print $2}' $labels | tr 'A-Z' 'a-z' | tr -d '()' | awk -v FS=. 'NF > 1 {print $NF}' | grep '^..$' | grep '^[a-z]*$' | sort | uniq -c | sort -rn | grep -v '\[' | pr -8 -t
4568 jp   533 tw   161 sk    42 nc    18 kg     8 ao     3 to     1 ne
4089 au   479 at   149 lt    41 bm    17 ug     7 tz     3 tm     1 ms
2294 de   424 nl   129 pt    40 ma    17 ba     7 pa     3 om     1 mr
2054 ca   423 fi   112 ec    39 su    16 zw     7 aw     3 mu     1 mn
1982 ru   408 sg    95 lb    39 bo    16 li     6 vi     3 mg     1 jm
1623 br   394 za    93 pe    38 yu    16 gi     6 tv     3 fo     1 io
1510 kr   391 il    85 si    37 ir    16 bf     6 rw     3 cx     1 gs
1381 ar   390 cz    83 vn    35 ge    16 am     6 ng     3 an     1 gn
1348 mx   366 be    76 ee    34 ve    15 qa     6 ky     2 tg     1 gh
1182 it   338 dk    71 sa    33 pf    15 mc     6 dm     2 pg     1 gf
1050 pl   334 co    69 ae    32 uz    15 fj     6 as     2 nr     1 fm
1031 ro   328 tr    61 by    29 ie    14 tn     6 ad     2 mv     1 dj
1005 cn   324 cl    60 uy    27 mk    14 kh     5 vu     2 gl     1 ck
 794 in   313 bg    59 tt    26 np    14 jo     5 mw     2 cd     1 bt
 789 id   282 es    57 lu    25 cy    14 gy     5 ml     2 bj     1 bd
 709 uk   260 gr    52 cc    24 lk    14 eg     5 bw     2 bh     1 ac
 670 fr   259 my    51 kz    23 py    11 mz     5 bs     2 ai
 603 ua   257 hu    51 ke    21 mt    11 ci     5 az     1 va
 600 th   237 pk    51 hr    20 ag    10 na     4 sv     1 tp
 593 se   234 no    47 ni    19 dz    10 bn     4 sb     1 tj
 580 nz   189 hk    45 md    19 do     9 hn     4 lc     1 tc
 564 us   174 ph    45 is    19 cr     8 ws     4 fk     1 sr
 544 ch   171 lv    45 gt    18 nu     8 sz     3 zm     1 rc
132 of 139Mapping the Internet and Intranets
TLDs sorted by frequency, subsorted alphabetically
133 of 139Mapping the Internet and Intranets
$ awk '{print $2}' $labels | tr 'A-Z' 'a-z' | tr -d '()' | awk -v FS=. 'NF > 1 {print $NF}' | grep '^..$' | grep '^[a-z]*$' | sort | uniq -c | sort -k1,1rn -k2,2 | grep -v '\[' | pr -8 -t
4568 jp   533 tw   161 sk    42 nc    18 nu     8 ws     3 cx     1 gf
4089 au   479 at   149 lt    41 bm    17 ba     7 aw     3 fo     1 gh
2294 de   424 nl   129 pt    40 ma    17 ug     7 pa     3 mg     1 gn
2054 ca   423 fi   112 ec    39 bo    16 am     7 tz     3 mu     1 gs
1982 ru   408 sg    95 lb    39 su    16 bf     6 ad     3 om     1 io
1623 br   394 za    93 pe    38 yu    16 gi     6 as     3 tm     1 jm
1510 kr   391 il    85 si    37 ir    16 li     6 dm     3 to     1 mn
1381 ar   390 cz    83 vn    35 ge    16 zw     6 ky     3 zm     1 mr
1348 mx   366 be    76 ee    34 ve    15 fj     6 ng     2 ai     1 ms
1182 it   338 dk    71 sa    33 pf    15 mc     6 rw     2 bh     1 ne
1050 pl   334 co    69 ae    32 uz    15 qa     6 tv     2 bj     1 rc
1031 ro   328 tr    61 by    29 ie    14 eg     6 vi     2 cd     1 sr
1005 cn   324 cl    60 uy    27 mk    14 gy     5 az     2 gl     1 tc
 794 in   313 bg    59 tt    26 np    14 jo     5 bs     2 mv     1 tj
 789 id   282 es    57 lu    25 cy    14 kh     5 bw     2 nr     1 tp
 709 uk   260 gr    52 cc    24 lk    14 tn     5 ml     2 pg     1 va
 670 fr   259 my    51 hr    23 py    11 ci     5 mw     2 tg
 603 ua   257 hu    51 ke    21 mt    11 mz     5 vu     1 ac
 600 th   237 pk    51 kz    20 ag    10 bn     4 fk     1 bd
 593 se   234 no    47 ni    19 cr    10 na     4 lc     1 bt
 580 nz   189 hk    45 gt    19 do     9 hn     4 sb     1 ck
 564 us   174 ph    45 is    19 dz     8 ao     4 sv     1 dj
 544 ch   171 lv    45 md    18 kg     8 sz     3 an     1 fm
134 of 139Mapping the Internet and Intranets
TLDs listed alphabetically
135 of 139Mapping the Internet and Intranets
$ awk '{print $2}' $labels | tr 'A-Z' 'a-z' | tr -d '()' | awk -v FS=. 'NF > 1 {print $NF}' | grep '^..$' | grep '^[a-z]*$' | sort | uniq -c | awk '{print $2, $1}' | sort | grep -v '\[' | pr -8 -t
ac    1  bo   39  do   19  hn    9  lk   24  ng    6  sa   71  ua  603
ad    6  br 1623  dz   19  hr   51  lt  149  ni   47  sb    4  ug   17
ae   69  bs    5  ec  112  hu  257  lu   57  nl  424  se  593  uk  709
ag   20  bt    1  ee   76  id  789  lv  171  no  234  sg  408  us  564
ai    2  bw    5  eg   14  ie   29  ma   40  np   26  si   85  uy   60
am   16  by   61  es  282  il  391  mc   15  nr    2  sk  161  uz   32
an    3  ca 2054  fi  423  in  794  md   45  nu   18  sr    1  va    1
ao    8  cc   52  fj   15  io    1  mg    3  nz  580  su   39  ve   34
ar 1381  cd    2  fk    4  ir   37  mk   27  om    3  sv    4  vi    6
as    6  ch  544  fm    1  is   45  ml    5  pa    7  sz    8  vn   83
at  479  ci   11  fo    3  it 1182  mn    1  pe   93  tc    1  vu    5
au 4089  ck    1  fr  670  jm    1  mr    1  pf   33  tg    2  ws    8
aw    7  cl  324  ge   35  jo   14  ms    1  pg    2  th  600  yu   38
az    5  cn 1005  gf    1  jp 4568  mt   21  ph  174  tj    1  za  394
ba   17  co  334  gh    1  ke   51  mu    3  pk  237  tm    3  zm    3
bd    1  cr   19  gi   16  kg   18  mv    2  pl 1050  tn   14  zw   16
be  366  cx    3  gl    2  kh   14  mw    5  pt  129  to    3
bf   16  cy   25  gn    1  kr 1510  mx 1348  py   23  tp    1
bg  313  cz  390  gr  260  ky    6  my  259  qa   15  tr  328
bh    2  de 2294  gs    1  kz   51  mz   11  rc    1  tt   59
bj    2  dj    1  gt   45  lb   95  na   10  ro 1031  tv    6
bm   41  dk  338  gy   14  lc    4  nc   42  ru 1982  tw  533
bn   10  dm    6  hk  189  li   16  ne    1  rw    6  tz    7
136 of 139Mapping the Internet and Intranets
Data != information
• The data collection is conceptually easy
• There are a lot of details that weren’t obvious when we started
  – Packet rates over dial-up connections, where data speeds change dynamically
  – How do you manage packet timeouts in such an environment?
137 of 139Mapping the Internet and Intranets
A prototype for the venture capitalists

cat <<!EOF >index.html
<html><body>
<h1>Summary of Paths from Somerset</h1>
`showsum somerset`
</body></html>
!EOF
138 of 139Mapping the Internet and Intranets
Converting the data into information has been the hardest part
• We are selling the third rewrite of the report now
• Difficult combination of data display, user interface, and network technical expertise
• Helpful to read Jef Raskin, Don Norman, and of course, Tufte
134 slides
Some Internet Mapping Innards and Lessons
Bill Cheswick
http://www.cheswick.com