internet measurement masterclass 2006

Post on 02-Jan-2016

Internet Measurement Masterclass 2006

10:00 Session 1: Kick off, problem space, thinking ahead, you and the law
Andrew Moore - Queen Mary, University of London

11:00 Morning tea

11:15 Session 2: Monitoring with Windows and how not to be deluged with data
Dinan Gunawardena - Microsoft Research Cambridge

12:15 Hardware selection for monitoring
Fabian Schneider - TU Berlin

12:45 Lunch + concurrently with Endace hardware demonstration

13:45 Session 3: Netflow, and routing data as a source of measurement
Steve Uhlig - Delft University of Technology

14:45 Afternoon tea

15:00 Session 4: Statistics for the measurement community
Steven Gilmour - Queen Mary, University of London

15:45 Wrap-up

16:00 beer / NGN ProgNet06 workshop starts

Kick-off

Andrew Moore

Queen Mary, University of London

www.dcs.qmul.ac.uk/~awm

What we won’t cover

• Active measurement (AMP, ping, traceroute, RTT, PlanetLab)

• Exhaustive survey of current measurement research

• I’m happy to offer an opinion on these things in a break, but I am not an active-measurement expert; I don’t even play one on television.

WHY Measure?

• Measuring something helps you understand it

Few would argue that the Internet is not important enough to understand

“Good data outlives bad theory” - Jeff Dozier

“Measure what is measurable, make measurable what is not.” - after Galileo

Why? A non-exhaustive list

• Measurements are inputs to
  – validate a model
  – drive a simulation
  – test a new approach

• Measurements help understanding (fault-finding)

• Measurements are often part of the accounting process

Why so hard?

Pick your (Endace¹) Dag board, plug it in and go. Right?

Wrong.

- Law
- Level 2 is not always
  - accessible
  - monitor-able
- Operations staff hate you
- Data on the wire is not the only first class measurement object
- Hardware doesn’t work
- Wrong Measurements
- Wrong Interpretation
- Wrong Problem

¹ Other monitoring boards are available

Where should I start?

• Ask WHY are you measuring?

“Measure twice & cut once”

great for carpenters but

“Think (at least) twice and measure once”

is better for us.

Pick the right tool for the right job

• Measurement of packets on a wire in your lab
  – Great for observing one specific use of one set of applications in one place in the Internet
  – Terrible for telling you how many mobile devices are used for IPTV in China, or the connectivity among world ISPs, or …

Uh-Oh

• Who are you going to measure? 1 user? 1000 users?

• When? (what time of the day?)
• Where? (your personal machine, a campus? a country?)
• How?
  – How long? a day? week? month?
  – What method are you going to use?

Law (I am Not a Lawyer and this is UK Law)

• If in doubt, seek out advice
• Everything is illegal
• Don’t ask a question you don’t want to know the answer to.
• We care about
  – RIPA (Interception)
  – DPA (personal-data storage)

Many Thanks to Richard Clayton and Andrew Cormack

Data Protection Act 1998

• Overriding aim is to protect the interests of (and avoid risks to) the Data Subject

• Data processing must comply with the eight principles (as interpreted by the regulator)

• All data controllers must “notify” (£35) the Information Commissioner (unless exempt)
  – Exceptions for “private use”, “basic business purpose”: see the website

Data Protection Act 1998

• Principle 7 is especially relevant
  – Appropriate technical and organizational measures shall be taken against unauthorized or unlawful processing of personal data and against accidental loss or destruction of, or damage to, personal data
• The Information Commissioner advises that a risk-based approach should be taken in determining what measures are appropriate
  – Management and organizational measures are as important as technical ones
  – Pay attention to data over its entire lifetime

RIP Act 2000

• Part I, Chapter I interception

• Part I, Chapter II communications data

• Part II surveillance & informers

• Part III encryption
  – not as relevant for this

• Part IV oversight
  – sets up tribunal and interception commissioner

RIP Act 2000 - Interception

• Tapping a telephone (or copying an email) is “interception”. It must be authorized by a warrant signed by the secretary of state.
  – SoS means the home secretary (or similar). Power delegation is temporary. Product is not admissible in court
• Some sensible exceptions exist
  – Delivered data
  – Stored data that can be accessed by the production of an order
  – Techies running a network
  – “Lawful business practice”

Lawful Business Practice

• Regulations prescribe how not to commit an offence under the RIP Act. They do not specify how to avoid problems with the DPA (or other legislation)

• Must make all reasonable efforts to tell all users of system that interception may occur

Law One-slider

• If in doubt - ask someone!
• Why do you want to do this?
  – bare minimum, no “data for data’s sake”
  – the onus is on you at all times to justify what you are doing
• Unless you want the work of keeping the DPA happy, don’t keep any personal identifiers

• Use your University ethics committee

I am NOT a Lawyer!
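
One way to act on "don’t keep any personal identifiers" is to discard the host part of every IP address before anything is written to disk. The sketch below is only an illustration: the function name `truncate_ip` and the prefix lengths (/24 for IPv4, /48 for IPv6) are assumptions, not values from the talk, and whether truncation is enough for your data is exactly the kind of question to take to the advisers mentioned above.

```python
import ipaddress


def truncate_ip(addr: str, v4_prefix: int = 24, v6_prefix: int = 48) -> str:
    """Zero the host bits of an address so it no longer names one machine.

    The prefix lengths are illustrative defaults, not a recommendation.
    """
    ip = ipaddress.ip_address(addr)
    prefix = v4_prefix if ip.version == 4 else v6_prefix
    # strict=False lets us pass a host address and get its enclosing network.
    net = ipaddress.ip_network(f"{addr}/{prefix}", strict=False)
    return str(net.network_address)


print(truncate_ip("192.0.2.77"))       # 192.0.2.0
print(truncate_ip("2001:db8:1:2::9"))  # 2001:db8:1::
```

Truncating at capture time, rather than in a later clean-up pass, means the full addresses never exist in stored form.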

(Good) Measurement Principles

• Check your methodology
• Keep all Meta-data
• Calibrate your experiments
• Automate all processing
  – it’s a documentation trail
  – cache those intermediate results; they tell you where you went wrong
• Visualize your data at every stage
  – this helps ensure you didn’t goof

Check your Methodology

• Talk to people around you, find a mentor and even an antagonist

• Better they find something wrong than the external examiner or the reviewers of the paper

• Consider the scope of a reasonable measurement and the claims you can make

Meta-Data

• the filter you used on tcpdump is meta-data.

• your methodology is meta-data
• the day/time of the week is meta-data
• the hardware you used is meta-data
• (possibly) how much alcohol in your blood-stream is meta-data

Keep it all
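
A minimal sketch of "keep it all": write a small JSON file next to each trace that records the filter, the time and the machine. The function names, the field names and the `.meta.json` suffix are all assumptions made for this illustration; the point is only that the record is produced automatically at capture time rather than reconstructed from memory later.

```python
import json
import platform
import sys
from datetime import datetime, timezone


def capture_metadata(capture_filter: str, interface: str, notes: str = "") -> dict:
    """Collect the circumstances of a capture as a plain dictionary."""
    now = datetime.now(timezone.utc)
    return {
        "capture_filter": capture_filter,   # e.g. the expression given to tcpdump
        "interface": interface,
        "start_utc": now.isoformat(),
        "weekday": now.strftime("%A"),
        "host": platform.node(),
        "machine": platform.machine(),
        "os": platform.platform(),
        "python": sys.version.split()[0],
        "notes": notes,                     # methodology, known problems, ...
    }


def write_sidecar(trace_path: str, meta: dict) -> str:
    """Store the meta-data beside the trace; returns the sidecar file name."""
    sidecar = trace_path + ".meta.json"
    with open(sidecar, "w", encoding="utf-8") as fh:
        json.dump(meta, fh, indent=2, sort_keys=True)
    return sidecar


meta = capture_metadata("tcp port 80", "eth0", notes="bench test, lab switch")
print(json.dumps(meta, indent=2, sort_keys=True))
```

Calling `write_sidecar("trace.pcap", meta)` would leave `trace.pcap.meta.json` beside the trace, so the two travel together when the data is archived or shared.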

Calibrate your experiments

• Test your assumptions
  – (been assuming the network is busiest at midday - okay, this is the moment you find that 3:30 is the busy time)
• “bench-test” your setup; this is just good science
  – test your processing scripts many (many) times
• Most departments do not have good test equipment; this is no excuse
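
Testing the "busiest at midday" assumption can be as simple as counting packets per hour of day. The sketch below assumes the timestamps are Unix epoch seconds and bins them in UTC; both are assumptions for the illustration, and a real trace would need its local time zone handled explicitly. The function names are hypothetical.

```python
from collections import Counter
from datetime import datetime, timezone


def packets_per_hour(timestamps):
    """Count packets in each hour of the day (0-23, UTC)."""
    counts = Counter()
    for ts in timestamps:
        counts[datetime.fromtimestamp(ts, tz=timezone.utc).hour] += 1
    return counts


def busiest_hour(timestamps):
    """Return the hour with the most packets; raises IndexError if empty."""
    hour, _count = packets_per_hour(timestamps).most_common(1)[0]
    return hour


# Synthetic example: three packets just after 12:00, five around 15:30.
midnight = datetime(2006, 6, 1, tzinfo=timezone.utc).timestamp()
samples = [midnight + 12 * 3600 + i for i in range(3)]
samples += [midnight + 15 * 3600 + 1800 + i for i in range(5)]

print(busiest_hour(samples))  # 15
```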

Automate your processing

• Make is your friend

• intermediate processing (and the scripts/code that did it) are more meta-data

• critical when you want to reproduce your results (and have others reproduce your results)
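
The core idea Make brings to a processing pipeline is that an intermediate result is rebuilt only when something it depends on is newer. The sketch below imitates that rule in a few lines; it is not a replacement for Make, and `is_stale`, `build` and the file names are hypothetical names chosen for the illustration.

```python
import os
import tempfile
from typing import Callable, Sequence


def is_stale(target: str, sources: Sequence[str]) -> bool:
    """True if the target is missing or older than any of its sources."""
    if not os.path.exists(target):
        return True
    target_mtime = os.path.getmtime(target)
    return any(os.path.getmtime(src) > target_mtime for src in sources)


def build(target: str, sources: Sequence[str],
          recipe: Callable[[str, Sequence[str]], None]) -> bool:
    """Run the recipe only when needed; returns True if it ran."""
    if not is_stale(target, sources):
        return False
    recipe(target, sources)
    return True


def count_packets(target, sources):
    """Toy processing step: one packet length per input line."""
    with open(sources[0]) as src, open(target, "w") as out:
        out.write(f"packets={sum(1 for _ in src)}\n")


with tempfile.TemporaryDirectory() as workdir:
    raw = os.path.join(workdir, "lengths.txt")
    summary = os.path.join(workdir, "summary.txt")
    with open(raw, "w") as fh:
        fh.write("60\n1500\n40\n")

    first = build(summary, [raw], count_packets)   # summary missing: runs
    second = build(summary, [raw], count_packets)  # summary cached: skipped
    print(first, second)  # True False
```

Listing the processing script itself among the sources makes a changed script invalidate its outputs, which fits the point that the scripts are meta-data too.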

Visualize your data

• visualize your data early and often

• scatter plots are always useful

• identify/understand those outliers now
  – problem? or expected result?
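
Alongside the scatter plot, a quick numerical pass can list the points worth looking at. The sketch below uses the modified z-score (median and median absolute deviation) with the commonly quoted cut-off of 3.5; this technique and threshold are a choice made for this example, not something prescribed in the talk, and the output is a list of candidates to explain, not points to delete.

```python
import statistics


def flag_outliers(values, threshold=3.5):
    """Return (index, value) pairs whose modified z-score exceeds threshold."""
    med = statistics.median(values)
    mad = statistics.median([abs(v - med) for v in values])
    if mad == 0:
        # More than half the samples are identical; the score is undefined.
        return []
    return [
        (i, v)
        for i, v in enumerate(values)
        if 0.6745 * abs(v - med) / mad > threshold
    ]


# Hypothetical round-trip times in milliseconds.
rtts = [10.1, 10.3, 9.9, 10.0, 10.2, 55.0, 10.1]
print(flag_outliers(rtts))  # [(5, 55.0)]
```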

My first network monitor

• configurations
  – monitor and method

• gotcha

• backhaul network

• storage, archive, index

Configuration

• Hardware selection
  – How are you going to remote-admin this machine?
• OS / Software selection
  – Much work in unix domain; that doesn’t make it good work (Dinan)
  – tcpdump/pcap is standard and has lots of tools
    • Not fast, loss/error prone, timestamps are junk
  – divorce the data representation from the method
    • tcpdump is a useful offline tool but dagtools, CoMo and others (nprobe, etc) are simply better online
  – consider the right tool for the task

Hardware (getting the traffic)

• Passive taps
  – invasive installation
  – no impact in operation
  – “stealing photons”
• Port Mirrors (e.g. Cisco SPAN)
  – be vewy vewy careful.
    • jitter, loss, reordering
  – fantastic for multiple/redundant links
    • multiple copies of packets


Hardware 2

• Remember about physical layers?
• Observing traffic at end systems is pretty easy (but imposes an overhead)
• intermediate networks may not be trivial to monitor:
  – Packet over Ethernet, Packet over Sonet are not the only possibilities

• Aside from weird layer-2s, maybe encrypted,

Getting the data to somewhere useful

• Out of Band backhaul
  – Co-schedule Measurements
  – FedEx the disks (realistically - postgrad-u-haul)
  – Co-locate storage/processing
    • storage & processing = heat/power
  – Dedicated backhaul, e.g. using (a piece of) the dedicated research net

Tools

• tcpdump (libpcap) - but know the limitations
  a) no records of loss
  b) microsecond accuracy only - and RARELY that
  c) simultaneous arrival times are possible
  d) no record of precision or accuracy or filter or conditions or monitor-circumstance or equipment failure or …
• gnuplot (or any plotting package)
  – scatter plots are always useful (combined with eye-squared)
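
Limitations (b) and (c) can be checked directly on a trace. The sketch below reads only the classic pcap file layout (a 24-byte global header, then a 16-byte header before each packet) and reports timestamps shared by more than one packet. It is a sketch under stated assumptions: it does not handle pcapng, and the function names and the tiny in-memory trace are hypothetical.

```python
import io
import struct
from collections import Counter

MAGIC_USEC = 0xA1B2C3D4  # classic pcap, microsecond timestamps
MAGIC_NSEC = 0xA1B23C4D  # classic pcap, nanosecond timestamps


def read_timestamps(stream):
    """Return (unit, [(seconds, fraction), ...]) from a classic pcap stream."""
    header = stream.read(24)
    if len(header) < 24:
        raise ValueError("truncated pcap global header")
    # The magic number tells us both the byte order and the timestamp unit.
    for endian in ("<", ">"):
        magic = struct.unpack(endian + "I", header[:4])[0]
        if magic in (MAGIC_USEC, MAGIC_NSEC):
            break
    else:
        raise ValueError("not a classic pcap file")
    unit = "us" if magic == MAGIC_USEC else "ns"

    stamps = []
    while True:
        record = stream.read(16)
        if len(record) < 16:
            break
        ts_sec, ts_frac, incl_len, _orig_len = struct.unpack(endian + "IIII", record)
        stamps.append((ts_sec, ts_frac))
        stream.seek(incl_len, io.SEEK_CUR)  # skip the packet bytes
    return unit, stamps


def duplicate_timestamps(stamps):
    """Map each timestamp carried by more than one packet to its count."""
    return {ts: n for ts, n in Counter(stamps).items() if n > 1}


def fake_trace():
    """Build a three-packet trace in memory; two packets share a timestamp."""
    out = struct.pack("<IHHiIII", MAGIC_USEC, 2, 4, 0, 0, 65535, 1)
    for ts_sec, ts_usec in [(1000, 5), (1000, 5), (1000, 9)]:
        payload = b"\x00" * 4
        out += struct.pack("<IIII", ts_sec, ts_usec, len(payload), len(payload))
        out += payload
    return io.BytesIO(out)


unit, stamps = read_timestamps(fake_trace())
print(unit, duplicate_timestamps(stamps))  # us {(1000, 5): 2}
```

Note what the per-packet header holds: a timestamp and two lengths. Nothing in it records loss, the filter in use or the state of the clock, which is why limitations (a) and (d) have to be covered by meta-data kept separately.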

Sharing
Providing Access to the data

• Law may prevent access
• Either need to control who gets data
OR
• Ship code to monitor (Mogul et al, MineNet 2005/6)
• One Platform: CoMo http://como.sourceforge.net


These guys do run the Internet (or why I should be nice to my ops guys)

• Looking for a real problem?
• Wondering about actual impact?
• Talk to your front line
• Sysadmins and Operators are front-line
• They are rarely stupid
• Don’t have the time to “think outside the box”
• they will be honest with you (brutally honest in most cases)
• www.nanog.org
• www.ripe.org


Next….

• Let’s examine hardware and operating-system issues, specifically:
  – Windows: the other operating-system
  – Data-management: how to prevent success-disaster
  – So you want to monitor 10Gbps?

Suppliers

• NetOptics - fibre splitters

• Endace - capture hardware

UK specific resources

• Janet’s NDA and AUP: http://www.ja.net/development/traffic-data/

• Data Protection Act: http://www.hmso.gov.uk/acts/acts1998/19980029.htm

• RIPA: http://www.legislation.hmso.gov.uk/acts/acts2000/20000023.htm

Specific references

• Mark Crovella & Bala Krishnamurthy, Internet Measurement, Wiley 2006

• Walter Willinger, Pragmatic Approach to Dealing with High Variability, IMC 2004

• Vern Paxson, Sound Internet Measurement, IMC 2004

Very early “what I did with my measurements” paper; these papers grandparent much Internet measurement work

• kc claffy, et al., A parameterizable methodology for Internet traffic flow profiling, IEEE JSAC, 1995

• V. Paxson, End-to-End Routing Behavior in the Internet. IEEE/ACM Transactions on Networking, Vol.5, No.5, pp. 601-615, October 1997
