3 April 2010

The RIPE NCC Internet Measurement Data Repository

Shane Alcock

© THE UNIVERSITY OF WAIKATO • TE WHARE WANANGA O WAIKATO 2

Introductions

• Research Programmer with WAND

• NOT affiliated with RIPE NCC, just speaking on their behalf

• Passive measurement

• Organise packet trace captures

• Maintainer of the WITS website

• Experienced in dealing with measurement data sets

Outline

• Sharing Internet datasets

• Challenges

• Case studies

• The RIPE NCC repository

• Available datasets

• Other RIPE datasets that may be added

Sharing Measurement Data

• Internet measurement research requires data

• Often it is difficult to collect suitable data

• Privacy

• Security

• Cost of infrastructure

• Selecting appropriate times and locations

Sharing Measurement Data

• Sharing data with the community is an awesome idea

• Saves time and effort

• Promotes collaboration

• Enables validation of previous results

• Encourages others to share their data as well

Sharing Measurement Data

• WITS – Waikato Internet Traffic Storage

• http://www.wand.net.nz/wits

• CAIDA

• http://www.caida.org/data/

• PREDICT

• https://www.predict.org/

• CRAWDAD

• http://crawdad.cs.dartmouth.edu/data.php

• NLANR

• No longer exists :(

Challenges

• Community awareness

• Datasets are scattered amongst multiple hosts

• Lack of publicity and detailed information about datasets

• Meta-data

• DatCat (CAIDA)

• http://www.datcat.org

• Catalogue of publicly available datasets

• Not an actual repository – data is hosted externally

• Not a comprehensive resource

Challenges

• Repositories often maintained by research groups

• Limited funding, therefore limited resources

• People

• Expertise

• Disk space

• Bandwidth

Case Study: WITS

• Maintenance is intermittent

• Maintainer has many other responsibilities

• Disk space is a huge limitation

• No room on the FTP server to put new data sets

• Adding new disks costs both money and time

• Sanitizing datasets requires even more space, as the original version must also be retained

• Bandwidth

• Cost of commercial bandwidth hinders availability of data

• Data is made available via KAREN (NZ national research network) only

• Fortunately, KAREN peers with many international NRENs

Challenges

• Permanence

• Research groups typically depend on competitive funding

• Funding runs out – repository vanishes

• Loss of data is a major issue

• No longer able to replicate and validate previous studies

Case Study: NLANR

• Large public archive of measurement data

• Auckland, Abilene traces (PMA)

• AMP

• US government ceased funding

• Repository no longer maintained

• Domain eventually expired

• CAIDA and WAND salvaged the data

• Traces now available on WITS

• Without intervention, the data could easily have been lost permanently

Challenges

• Avoiding inappropriate disclosure

• Anonymisation of sensitive information, e.g. IP addresses

• Developing policy to cover user access and agreements

• Many datasets have unique restrictions or policies

• Policy that is appropriate for one dataset is not for another

• Personal contact information

• IP addresses

• User payload in packet traces
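One common way to anonymise IP addresses while keeping traces useful for routing or subnet analysis is prefix-preserving anonymisation: two addresses that share a k-bit prefix still share exactly a k-bit prefix afterwards. The sketch below is a simplified, HMAC-SHA256 based variant of the idea behind Crypto-PAn (which itself uses a block cipher), not the method used by WITS or RIPE NCC; the key is a hypothetical placeholder.

```python
import hashlib
import hmac
import ipaddress


def anonymise_ipv4(addr: str, key: bytes) -> str:
    """Map an IPv4 address to a pseudonym while preserving shared prefixes.

    Output bit i = input bit i XOR a pseudo-random bit taken from
    HMAC-SHA256(key, first i input bits). Addresses sharing a k-bit prefix
    therefore share a k-bit prefix after anonymisation, and no more.
    """
    bits = format(int(ipaddress.IPv4Address(addr)), "032b")
    value = 0
    for i, bit in enumerate(bits):
        digest = hmac.new(key, bits[:i].encode(), hashlib.sha256).digest()
        value = (value << 1) | (int(bit) ^ (digest[0] & 1))
    return str(ipaddress.IPv4Address(value))


key = b"hypothetical-secret-key"  # placeholder; a real key must be kept private
print(anonymise_ipv4("10.1.2.3", key), anonymise_ipv4("10.1.2.200", key))
```

Because the mapping is keyed and deterministic, the same address always maps to the same pseudonym within one dataset, which keeps flows intact; publishing the key would undo the anonymisation.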

Challenges

• Communication with users

• Data sharing is often not top priority for collectors

• Collection designed to suit their purposes

• Small changes to the collection process can often make the data more useful to a wider audience

• Encourage users to engage with collectors

Challenges

• Support

• Measurement data is complicated to deal with

• Steep learning curve

• Formats, e.g. PCAP vs ERF vs legacy DAG formats for traces

• Tools / Processing libraries

• Timezones

• Documentation of shared datasets is often poor

• User support is intermittent, due to lack of resources again
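Formats and timezones trip up many new users of packet traces. As an illustration, here is a minimal reader for the classic libpcap file format (not ERF or pcapng): it detects byte order and timestamp resolution from the magic number and returns timestamps explicitly as UTC, since pcap stores seconds since the Unix epoch rather than local time. It is a sketch for exploring small files, not a replacement for libraries such as libpcap or libtrace.

```python
import struct
from datetime import datetime, timedelta, timezone
from typing import BinaryIO, Iterator, Tuple

# Classic libpcap magic numbers as they appear on disk -> (byte order, timestamp ticks per second)
PCAP_MAGIC = {
    b"\xd4\xc3\xb2\xa1": ("<", 1_000_000),      # little-endian, microsecond timestamps
    b"\xa1\xb2\xc3\xd4": (">", 1_000_000),      # big-endian, microsecond timestamps
    b"\x4d\x3c\xb2\xa1": ("<", 1_000_000_000),  # little-endian, nanosecond timestamps
    b"\xa1\xb2\x3c\x4d": (">", 1_000_000_000),  # big-endian, nanosecond timestamps
}


def read_pcap(f: BinaryIO) -> Iterator[Tuple[datetime, int, bytes]]:
    """Yield (UTC timestamp, original length, captured bytes) for each packet."""
    header = f.read(24)
    if len(header) < 24 or header[:4] not in PCAP_MAGIC:
        raise ValueError("not a classic pcap file (ERF and pcapng need other readers)")
    endian, ticks_per_sec = PCAP_MAGIC[header[:4]]
    # The rest of the 24-byte global header (version, thiszone, sigfigs,
    # snaplen, link type) is not needed for this sketch.
    while True:
        rec = f.read(16)
        if len(rec) < 16:
            return
        ts_sec, ts_frac, incl_len, orig_len = struct.unpack(endian + "IIII", rec)
        data = f.read(incl_len)
        # Attach UTC explicitly instead of letting the library assume local time.
        ts = datetime.fromtimestamp(ts_sec, tz=timezone.utc) + timedelta(
            microseconds=ts_frac * 1_000_000 // ticks_per_sec
        )
        yield ts, orig_len, data
```

Note that `orig_len` can exceed `len(data)` when a trace was captured with a snap length, which is common in shared datasets where payload has been truncated for privacy.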

Challenges

• Size

• Internet measurement datasets are huge

• Push modern storage technologies to the limit

• Server hosting and maintenance

The RIPE NCC Repository

• RIPE NCC collects a lot of measurement data already

• They want to share this data with the community

• Most is already available through various repositories

• Develop a single common and consistent platform

• Hosting

• Browsing

• Accessing and downloading data

• Open to other collectors who wish to share data

• Still under development

Hardware

• 2 servers – Master and back-up

• Size: 9U

• Disk: 48x 2TB on 2 controllers – 2 cold spares

• CPU: 2x Quad core Xeon L5420 2.5GHz

• Memory: 32GB

• Chassis: Chenbro RM91250
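The slide gives disk count and size but not the RAID layout, so usable capacity can only be estimated. The arithmetic below is a hypothetical sketch that assumes the 2 cold spares are among the 48 disks and that the remaining disks form one RAID-6 array per controller; a different layout would give a different figure.

```python
def raid6_usable_tb(disks: int, disk_tb: int, spares: int, arrays: int) -> int:
    """Usable capacity if the non-spare disks are split evenly into RAID-6 arrays.

    Hypothetical layout: the slide does not state the RAID level, so RAID-6
    with one array per controller is an assumption.
    """
    active = disks - spares
    if active % arrays:
        raise ValueError("active disks must divide evenly across arrays")
    per_array = active // arrays
    return arrays * (per_array - 2) * disk_tb  # RAID-6 spends two disks per array on parity


raw_tb = 48 * 2  # 96 TB raw per server
usable_tb = raid6_usable_tb(disks=48, disk_tb=2, spares=2, arrays=2)  # 2 x (23 - 2) x 2 TB = 84 TB
```

Since the second server is a back-up, its disks would presumably hold a copy of the master's data rather than add to the total capacity.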

Features of the RIPE NCC Repository

• Longevity

• RIPE NCC does not depend on competitive research funding

• Generating and keeping Internet measurement data for ~20 years

• Long time-series data

• Much less likely that the repository will disappear

• Emphasis on mirroring rather than replacing other repositories

• Hosts anonymised versions of data

Features of the RIPE NCC Repository

• Resources

• RIPE NCC manages servers, infrastructure

• A larger repository can justify dedicated support staff

• Experience and expertise are important

• Diversity

• Variety of datasets from different collectors

• Increased awareness of new datasets

• One user account can access many different datasets

• Self sign-up for “basic access”

Features of the RIPE NCC Repository

• Communication

• Bridge the gap between data collectors and users

• Raise awareness of existing data

• Gather feedback from the user community

• Develop relationships with other data collectors

• Links to useful tools and libraries for processing data

• Share expertise as well as data

Available Datasets

• Data collected by RIPE NCC

• RIS routing database

• Reverse DNS delegations made by RIRs

• Data from external sources

• WITS

• Ex-NLANR data

Routing Information Server (RIS)

• 16 route collectors peering with 600 BGP routers

• Mostly within the RIPE region

• ~100 peers provide complete routing tables

• Routes are collected and published in MRT format

• Updates every 5 minutes

• Full table dump every 8 hours

• All data collected since 2000 has been retained
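At one updates file every 5 minutes and a full table dump every 8 hours, each collector produces 288 updates files and 3 table dumps per day. Every file is a sequence of MRT records, each starting with a 12-byte common header (timestamp, type, subtype, length), as specified in RFC 6396. The sketch below walks those headers and leaves each record body undecoded; decoding the BGP messages themselves is better left to an established MRT library.

```python
import struct
from typing import BinaryIO, Iterator, NamedTuple

# MRT common header: timestamp, type, subtype, length, all in network byte order.
MRT_HEADER = struct.Struct(">IHHI")

# A few of the MRT type codes defined in RFC 6396.
MRT_TYPES = {12: "TABLE_DUMP", 13: "TABLE_DUMP_V2", 16: "BGP4MP", 17: "BGP4MP_ET"}


class MrtRecord(NamedTuple):
    timestamp: int  # seconds since the Unix epoch (UTC)
    type_name: str
    subtype: int
    body: bytes     # undecoded; its layout depends on type and subtype


def read_mrt(f: BinaryIO) -> Iterator[MrtRecord]:
    """Yield one MrtRecord per MRT record in a binary stream."""
    while True:
        header = f.read(MRT_HEADER.size)
        if len(header) < MRT_HEADER.size:
            return
        ts, mrt_type, subtype, length = MRT_HEADER.unpack(header)
        body = f.read(length)
        if len(body) < length:
            raise ValueError("truncated MRT record")
        yield MrtRecord(ts, MRT_TYPES.get(mrt_type, f"type {mrt_type}"), subtype, body)
```

RIS files are distributed gzip-compressed, so in practice the stream passed to `read_mrt` would come from `gzip.open(path, "rb")`.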

Routing Information Server (RIS)

• Other methods of access

• Last 3 months of data exported to a MySQL database

• Weekly statistical reports

• Looking Glass queries

• Tools to query and visualise RIS data

Reverse DNS Zones

• (Partial) Reverse DNS delegations made by RIRs

• Generated using RIPE DB reverse DNS objects

• ~410,000 reverse DNS objects
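In the RIPE DB a reverse delegation is a `domain` object named after the in-addr.arpa zone it delegates: an octet-aligned IPv4 prefix maps to its leading octets in reverse order followed by `.in-addr.arpa`. A minimal sketch of that mapping, assuming /8, /16 or /24 prefixes only (other lengths need several zones or RFC 2317-style delegation, which this sketch does not attempt):

```python
import ipaddress


def reverse_zone(prefix: str) -> str:
    """Return the in-addr.arpa zone name for an octet-aligned IPv4 prefix."""
    net = ipaddress.IPv4Network(prefix)
    if net.prefixlen not in (8, 16, 24):
        raise ValueError("only /8, /16 and /24 map onto a single in-addr.arpa zone")
    octets = str(net.network_address).split(".")[: net.prefixlen // 8]
    return ".".join(reversed(octets)) + ".in-addr.arpa"


print(reverse_zone("192.0.2.0/24"))  # 2.0.192.in-addr.arpa
```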

Auckland

• Passive traces taken at the University of Auckland

• Auckland II – VII were previously available through NLANR

• Frequently feature in measurement literature

• Currently available from WITS archive

Waikato

• Passive traces taken at the University of Waikato

• Long duration continuous traces

• Waikato I is available

• Other Waikato sets will be included at a later date

NLANR

• Other NLANR datasets that were preserved by WAND

• IPLS (also known as Abilene)

• Leipzig

• Active Measurement Project (AMP)

• Much of this data is also currently available from WITS

Other Datasets

• Collected by RIPE NCC

• Not currently in the repository but may be added later

• K-root and reverse DNS server statistics and traces

• Hostcount

• TTM

• DNSMON

• AS112

• Other parts of RIPE DB

• These are covered in more detail in the paper

K-root

• One of the 13 Internet root name servers, operated by RIPE NCC

• PCAP traces of incoming port 53 traffic (DNS queries)

• 50 hours of traces included in CAIDA's DITL project

• DNS Statistics Collector (DSC)

• Summarises DNS traffic into 1-minute bins

• Generates graphs shown on the K-root website

• Raw data exported to DNS-OARC

• SNMP statistics

• Originate from RIPE NCC in Amsterdam

• Summarised and exported to an RRD
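DSC's own data formats are not reproduced here; the sketch below only illustrates the idea of summarising traffic into 1-minute bins, using hypothetical (timestamp, query type) pairs already extracted from captured queries.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def bin_queries(queries: Iterable[Tuple[float, str]]) -> Dict[Tuple[int, str], int]:
    """Count queries per 1-minute bin and query type.

    Keys are (bin start in Unix seconds, query type); the bin start is the
    timestamp rounded down to a multiple of 60.
    """
    counts = Counter((int(ts // 60) * 60, qtype) for ts, qtype in queries)
    return dict(sorted(counts.items()))


sample = [(1270252800.1, "A"), (1270252830.0, "AAAA"), (1270252860.0, "A")]
print(bin_queries(sample))
```

Binning discards individual packets, which is why a summary like this can be kept indefinitely while full traces, as with the 50 hours contributed to DITL, are collected only occasionally.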

Reverse DNS

• 4 reverse DNS servers operated by RIPE NCC

• 50,000 queries per second (3x load of K-root)

• High query rate means regular trace collection is infeasible

• DSC used on each of the rDNS servers

• Raw data and graphs only available within RIPE NCC

• Could be made available if there was a need
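A back-of-the-envelope calculation shows why regular trace collection is infeasible at this rate. It takes the slide's 50,000 queries per second as the aggregate for all four servers and assumes, purely for illustration, about 100 bytes per captured query packet plus the 16-byte classic pcap record header; responses are not counted.

```python
RDNS_QPS = 50_000          # slide figure, taken here as the aggregate rate
K_ROOT_QPS = RDNS_QPS / 3  # implied by "3x load of K-root", roughly 16,700 queries/s

SECONDS_PER_DAY = 86_400
BYTES_PER_QUERY = 100      # assumption: typical captured size of one DNS query packet
PCAP_RECORD_HEADER = 16    # per-packet header in the classic pcap format

queries_per_day = RDNS_QPS * SECONDS_PER_DAY  # 4,320,000,000 queries
trace_bytes_per_day = queries_per_day * (BYTES_PER_QUERY + PCAP_RECORD_HEADER)
print(f"{trace_bytes_per_day / 1e9:.0f} GB/day of query traffic alone")  # about 501 GB
```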

AS112

• AS number used by anycast name servers that answer reverse DNS queries for RFC 1918 private address space

• http://public.as112.net/

• Dynamic DNS update and rDNS server for AS112

• Hosted by RIPE NCC

• Goal is to measure and analyse DNS updates for invalid addresses

• PCAP trace collected annually and contributed to DITL

• More frequent captures could be scheduled if needed

• DSC data also collected

• Graphs publicly available from RIPE NCC AS112 site
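Analysing AS112 traffic starts with recognising which query or update names refer to private address space. The sketch below classifies an in-addr.arpa name as inside or outside the RFC 1918 blocks, treating a partial name such as `16.172.in-addr.arpa` as the prefix it covers. It is an illustration only; the operational AS112 zone list is defined by the AS112 project, not by this function.

```python
import ipaddress

# RFC 1918 private address blocks.
RFC1918 = [ipaddress.IPv4Network(n) for n in ("10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16")]


def is_rfc1918_reverse_name(qname: str) -> bool:
    """True if an in-addr.arpa name lies inside RFC 1918 private space."""
    labels = qname.lower().rstrip(".").split(".")
    if len(labels) < 3 or labels[-2:] != ["in-addr", "arpa"]:
        return False
    octets = labels[-3::-1]  # reverse names list the most specific octet first
    if len(octets) > 4 or not all(
        o.isascii() and o.isdigit() and str(int(o)) == o and int(o) < 256 for o in octets
    ):
        return False
    # Treat a partial name as the prefix it covers, e.g. 16.172 -> 172.16.0.0/16.
    padded = octets + ["0"] * (4 - len(octets))
    net = ipaddress.IPv4Network(f"{'.'.join(padded)}/{8 * len(octets)}")
    return any(net.subnet_of(block) for block in RFC1918)


print(is_rfc1918_reverse_name("1.0.168.192.in-addr.arpa"))  # True
```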

Hostcount

• Monthly DNS scan of ~100 TLDs within the RIPE region

• Counts A records (forward) and PTR records (reverse) for IPv4

• Also counts forward AAAA records for IPv6

• Not exhaustive, because many zones do not permit public zone transfers

• Statistics published via Hostcount website

• Raw data from 1990-2007 is archived off-line

• Current policy is to discard raw data after statistic extraction

• But this could be reversed if there is a need
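The counting step itself is simple once zone data has been fetched. The sketch below counts A, AAAA and PTR records in simplified master-file lines; it is not the Hostcount implementation, and it assumes one complete record per line (no parentheses, no owner-name inheritance, no `;` inside quoted data).

```python
from collections import Counter
from typing import Iterable

COUNTED_TYPES = {"A", "AAAA", "PTR"}
CLASSES = {"IN", "CH", "HS", "CS"}


def count_host_records(zone_lines: Iterable[str]) -> Counter:
    """Count A, AAAA and PTR records in simplified zone-file lines."""
    counts: Counter = Counter()
    for line in zone_lines:
        line = line.split(";", 1)[0].strip()  # drop comments
        if not line or line.startswith("$"):  # skip blanks and $TTL/$ORIGIN directives
            continue
        fields = line.split()
        i = 1
        while i < len(fields) and (fields[i].isdigit() or fields[i].upper() in CLASSES):
            i += 1  # skip the optional TTL and class after the owner name
        if i < len(fields) and fields[i].upper() in COUNTED_TYPES:
            counts[fields[i].upper()] += 1
    return counts


zone = [
    "$TTL 3600",
    "www.example.net. 3600 IN A 192.0.2.1",
    "mail.example.net. IN AAAA 2001:db8::1",
    "1.2.0.192.in-addr.arpa. PTR www.example.net.",
    "example.net. 3600 IN MX 10 mail.example.net. ; not counted",
]
print(count_host_records(zone))
```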

Test Traffic Measurements (TTM)

• Active measurement system of ~100 probes

• Most probes located at ISPs and universities within Europe

• Not all are included in public measurements

• Regular series of active tests

• UDP one-way delay, traceroute, DNSMON, IPv6 PMTU

• Also supports ad-hoc measurements by authorised users

• Ping, HTTP page fetch

• Can also develop and run arbitrary tests

• Results not released outside of RIPE NCC
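A one-way delay test reduces to matching each probe packet's send timestamp with its receive timestamp, which is only meaningful if the two probes' clocks are synchronised. The sketch below assumes a hypothetical input of (sequence number, timestamp) pairs from each end and reports the median delay and the loss fraction; it does not reflect TTM's internal data format.

```python
import statistics
from typing import Dict, Optional, Sequence, Tuple


def one_way_summary(
    sent: Sequence[Tuple[int, float]], received: Sequence[Tuple[int, float]]
) -> Tuple[Optional[float], float]:
    """Return (median one-way delay in seconds, loss fraction).

    Packets are matched by sequence number; a sent packet with no matching
    receive record is counted as lost.
    """
    if not sent:
        raise ValueError("no packets were sent")
    arrivals: Dict[int, float] = dict(received)
    delays = [arrivals[seq] - t_sent for seq, t_sent in sent if seq in arrivals]
    median = statistics.median(delays) if delays else None
    return median, (len(sent) - len(delays)) / len(sent)


print(one_way_summary([(1, 0.0), (2, 1.0), (3, 2.0)], [(1, 0.015), (3, 2.025)]))
```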

Test Traffic Measurements (TTM)

• Bulk data published as CERN ROOT files

• Performance graphs on the TTM website

DNSMON

• Measures the reachability and latency of DNS servers

• Collected using 60 TTM probes

• Root domain, .com, .net, .org, e164.arpa and 24 ccTLDs measured

• IPv4 and IPv6 performance measured

• Summary statistics and graphs are publicly available

• Only paying subscribers can access most recent graphs

• Raw data also available upon request
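Reachability and latency summaries of this kind can be derived by grouping probe results per name server. The sketch below assumes a hypothetical record layout of (probe, server, RTT in milliseconds or None if unanswered) and computes, per server, the fraction of queries answered and the median RTT; it is not DNSMON's actual processing pipeline.

```python
import statistics
from collections import defaultdict
from typing import Dict, Iterable, List, Optional, Tuple


def summarise_by_server(
    results: Iterable[Tuple[str, str, Optional[float]]]
) -> Dict[str, dict]:
    """Per server: fraction of queries answered and median RTT of the answered ones."""
    rtts: Dict[str, List[float]] = defaultdict(list)
    totals: Dict[str, int] = defaultdict(int)
    for _probe, server, rtt in results:
        totals[server] += 1
        if rtt is not None:
            rtts[server].append(rtt)
    return {
        server: {
            "answered": len(rtts[server]) / totals[server],
            "median_rtt_ms": statistics.median(rtts[server]) if rtts[server] else None,
        }
        for server in totals
    }


print(summarise_by_server([("p1", "server-a", 20.0), ("p2", "server-a", None)]))
```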

RIPE DB

• Internet number registration objects for the RIPE region

• IP addresses and AS numbers

• Reverse DNS objects

• Used to create zone files for the reverse DNS service

• Route registry objects

• Used to provide an Internet Routing Registry

• Conforms to RPSL and RFC 2650

RIPE DB

• Public queries supported via command-line and web

• Daily limit imposed on queries that include personal info

• Bulk data is available via FTP

• Personal details are not included

• Can subscribe to a near real-time mirror of the database

• Restrictions on personal data are very broad

• Can result in inappropriate limitations

• Better access policies and mechanisms should resolve this

Links

RIS http://www.ripe.net/ris

RIPE DB http://www.ripe.net/db

K-root http://k.root-servers.org

TTM http://www.ripe.net/ttm

Hostcount http://www.ripe.net/is/hostcount/stats

DNSMON http://dnsmon.ripe.net/dns-servmon

AS112 http://www.ripe.net/as112

WITS http://www.wand.net.nz/wits

Conclusion

• Repository is a 'beta'

• Server exists and some datasets are available for download

• Interested users can be given access

• Looking for feedback and ideas

• Development of policy, particularly for access

• Data collection

• Improving the RIPE datasets to be more useful to researchers

• Acquiring more external datasets

• Contributions of data, analysis tools

Contact

http://data-repository.ripe.net

[email protected]

[email protected]