Using SiLK for Network Traffic Analysis
ANALYSTS' HANDBOOK
for SiLK versions 2.1.0 and later

Timothy Shimeall
Sidney Faber
Markus DeShon
Andrew Kompanek

September 2010

CERT® Network Situational Awareness Group
Pittsburgh, PA 15213-3890

This work is sponsored by the U.S. Department of Defense. The Software Engineering Institute is a federally funded research and development center sponsored by the U.S. Department of Defense.

Copyright 2005-2010 Carnegie Mellon University.

NO WARRANTY

THIS CARNEGIE MELLON UNIVERSITY AND SOFTWARE ENGINEERING INSTITUTE MATERIAL IS FURNISHED ON AN AS-IS BASIS. CARNEGIE MELLON UNIVERSITY MAKES NO WARRANTIES OF ANY KIND, EITHER EXPRESSED OR IMPLIED, AS TO ANY MATTER INCLUDING, BUT NOT LIMITED TO, WARRANTY OF FITNESS FOR PURPOSE OR MERCHANTABILITY, EXCLUSIVITY, OR RESULTS OBTAINED FROM USE OF THE MATERIAL. CARNEGIE MELLON UNIVERSITY DOES NOT MAKE ANY WARRANTY OF ANY KIND WITH RESPECT TO FREEDOM FROM PATENT, TRADEMARK, OR COPYRIGHT INFRINGEMENT.

Use of any trademarks in this report is not intended in any way to infringe on the rights of the trademark holder.

The authors wish to acknowledge the valuable contributions of all members of the CERT Network Situational Awareness Team, past and present, to the concept and execution of the SiLK Tool Suite and to this handbook. Many individuals contributed as reviewers and evaluators of the material in this handbook. Of especial mention are Michael Collins, Ph.D., who was responsible for the initial draft of this handbook and for the development of the earliest versions of the SiLK tool suite, and Mark Thomas, Ph.D., who transitioned the handbook from Microsoft Word to LaTeX, patiently and tirelessly answered many technical questions from the authors, and shepherded the maturing of the SiLK tool suite. The many users of the SiLK tool suite have also contributed immensely to the evolution of the suite and its tools, and are gratefully acknowledged.

Lastly, the authors wish to acknowledge their ongoing debt to the memory of Suresh L. Konda, Ph.D., who led the initial concept and development of the SiLK tool suite as a means of gaining network situational awareness.

Contents

Handbook Goals

1 Networking Primer and Review of UNIX Skills
  1.1 TCP/IP Networking Primer
    1.1.1 IP Protocol Layers
    1.1.2 Structure of the IP Header
    1.1.3 IP Addressing and Routing
    1.1.4 Major Protocols
  1.2 Review of UNIX Skills
    1.2.1 Using the UNIX Command Line
    1.2.2 Using Pipes

2 The SiLK Flow Repository
  2.1 What Is Network Flow Data?
    2.1.1 Structure of a Flow Record
  2.2 Flow Generation and Collection
  2.3 Introduction to Flow Collection
    2.3.1 Where Network Flow Data Is Collected
    2.3.2 Types of Enterprise Network Traffic
    2.3.3 The Collection System and Data Management
    2.3.4 How Network-Flow Data Is Organized
  2.4 SiLK support

3 Essential SiLK Tools
  3.1 Suite Introduction
  3.2 Selecting Records with rwfilter
    3.2.1 rwfilter Parameters
    3.2.2 Finding Low-Packet Flows with rwfilter
    3.2.3 Using IPv6 with rwfilter
    3.2.4 Using Pipes with rwfilter
    3.2.5 Translating Signatures Into rwfilter Calls
    3.2.6 rwfilter and Tuple Files
  3.3 Describing Flows with rwstats
  3.4 Creating Time Series with rwcount
    3.4.1 Examining Traffic Over a Month
    3.4.2 Counting by Bytes, Packets, and Flows
    3.4.3 Changing the Format of Data
    3.4.4 Using the --load-scheme Parameter for Different Approximations
  3.5 Displaying Flow Records Using rwcut
    3.5.1 Pagination
    3.5.2 Selecting Fields to Display
    3.5.3 Selecting Fields for Performance
    3.5.4 Rearranging Fields for Clarity
    3.5.5 Field Formatting
    3.5.6 Selecting Records to Display
  3.6 Sorting Flow Records With rwsort
    3.6.1 Behavioral Analysis with rwsort, rwcut and rwfilter
  3.7 Counting Flows With rwuniq
    3.7.1 Using Thresholds with rwuniq
    3.7.2 Counting IPv6 Flows
    3.7.3 Counting on Compound Keys
    3.7.4 Using rwuniq to Isolate Behavior

4 Using the Larger SiLK Tool Suite
  4.1 Common Tool Behavior
    4.1.1 Structure of a Typical Command-Line Invocation
    4.1.2 Getting Tool Help
  4.2 Manipulating Flow-Record Files
    4.2.1 Combining Flow Record Files with rwcat and rwappend
    4.2.2 Merging While Removing Duplicate Flow Records with rwdedupe
    4.2.3 Dividing Flow Record Files with rwsplit
    4.2.4 Keeping Track of File Characteristics with rwfileinfo
    4.2.5 Creating Flow-Record Files from Text with rwtuc
  4.3 Analyzing Packet Data with rwptoflow and rwpmatch
    4.3.1 Creating Flows from Packets Using rwptoflow
    4.3.2 Matching Flow Records With Packet Data Using rwpmatch
  4.4 IP Masking with rwnetmask
  4.5 Summarizing Traffic with IP Sets
    4.5.1 What are IP Sets?
    4.5.2 Creating IP Sets with rwset
    4.5.3 Reading Sets with rwsetcat
    4.5.4 Manipulating Sets with rwsettool
    4.5.5 Using rwsettool --intersect to Fine-Tune IP Sets
    4.5.6 Using rwsettool --union to Examine IP Set Structure
    4.5.7 Backdoor Analysis with IP Sets
  4.6 Summarizing Traffic with Bags
    4.6.1 What Are Bags?
    4.6.2 Using rwbag to Generate Bags from Data
    4.6.3 Reading Bags Using rwbagcat
    4.6.4 Using Bags: A Scanning Example
    4.6.5 Manipulating Bags Using rwbagtool
  4.7 Labeling Related Flows with rwgroup and rwmatch
    4.7.1 Labeling Based on Common Attributes with rwgroup
    4.7.2 Labeling Matched Groups with rwmatch
  4.8 Adding IP Attributes with Prefix Maps
    4.8.1 What are Prefix Maps?
    4.8.2 Creating a Prefix Map
    4.8.3 Selecting Flow Records with rwfilter and Prefix Maps
    4.8.4 Working with Prefix Values Using rwcut and rwuniq
    4.8.5 Using a Country-Code Mapping via rwip2cc
    4.8.6 Where to Go for More Information on Prefix Maps
  4.9 Gaining More Features with Plug-Ins

5 Using PySiLK For Advanced Analysis
  5.1 rwfilter and PySiLK
  5.2 rwcut, rwsort, and PySiLK

6 Closing

List of Figures

1.1 IP Protocol Layers
1.2 Structure of the IP Header
1.3 TCP Header
1.4 TCP State Machine
1.5 UDP and ICMP Headers
2.1 From Packets to Flows
2.2 Default Traffic Type for Sensors
3.1 rwfilter Parameter Relationships
3.2 rwfilter Partitioning Parameters
3.3 Summary of rwstats
3.4 Summary of rwcount
3.5 Displaying rwcount Output Using gnuplot
3.6 Focusing gnuplot Output on a Single Hour
3.7 Improved gnuplot Output Based on a Larger Bin Size
3.8 Comparison of Byte and Record Counts over Time
3.9 Differences Between Load Schemes
3.10 Summary of rwcut
3.11 Summary of rwsort
3.12 Summary of rwuniq
4.1 Summary of rwcat
4.2 Summary of rwappend
4.3 One Display of Large Volume Flows
4.4 Another Display of Large Volume Flows
4.5 Summary of rwdedupe
4.6 Summary of rwsplit
4.7 Summary of rwptoflow
4.8 Summary of rwpmatch
4.9 Summary of rwset
4.10 Summary of rwsetcat
4.11 Summary of rwsettool
4.12 Graph of Hourly Source IP Address Set Growth
4.13 Summary of rwbag
4.14 Summary of rwbagcat
4.15 Summary of rwbagtool
4.16 Summary of rwgroup
4.17 Summary of rwmatch
4.18 Summary of rwpmapbuild
4.19 Summary of rwip2cc

List of Tables

1.1 IPv4 Reserved Addresses
1.2 IPv6 Reserved Addresses
1.3 Some Common UNIX Commands
3.1 rwfilter Input Parameters
3.2 rwfilter Selection Parameters
3.3 Commonly-Used rwfilter Partitioning Parameters
3.4 rwfilter Output Parameters
3.5 Other Parameters
3.6 Arguments for the --fields Parameter
4.1 Current SiLK Plug-ins

List of Examples

1-1 A UNIX Command Prompt
1-2 Example Using Common UNIX Commands
1-3 A Simple Command Line
1-4 A Simple Piped Command
1-5 Using a Named Pipe
2-1 Using mapsid to Obtain a List of Sensors
3-1 Using rwfilter to Count Traffic to an External Network
3-2 Using rwfilter to Extract Low-Packet Flow Records
3-3 Using rwfilter to Process IPv6 Flows
3-4 Using rwfilter to Detect IPv6 Neighbor Discovery Flows
3-5 rwfilter --pass and --fail to Partition Fast and Slow High-Volume Flows
3-6 rwfilter With a Tuple File
3-7 Using rwstats To Count Protocols and Ports
3-8 rwstats --sport --percentage to Profile Source Ports
3-9 rwstats --dport --top --count to Examine Destination Ports
3-10 rwstats --copy-input and --output-path to Chain Calls
3-11 rwcount for Counting with Respect to Time Bins
3-12 rwcount Sending Results to Disk
3-13 rwcount --bin-size to Better Scope Data for Graphing
3-14 rwcount Alternate Date Formats
3-15 rwcount --start-epoch to Constrain Minimum Date
3-16 rwcount Alternative Load Schemes
3-17 rwcut for Display the Contents of a File
3-18 rwcut Used With rwfilter
3-19 SILK_PAGER With the Empty String to Disable rwcut Paging
3-20 rwcut --pager to Disable Paging
3-21 rwcut Performance With Default --fields
3-22 rwcut --fields to Improve Efficiency
3-23 rwcut --fields to Rearrange Output
3-24 rwcut ICMP Type and Code as dport
3-25 rwcut --icmp Parameter and Fields to Display ICMP Type and Code
3-26 rwcut --delim to Change the Delimiter
3-27 rwcut --no-title to Suppress Field Headers in Output
3-28 rwcut --num-recs to Constrain Output
3-29 rwcut --num-recs and Title Line
3-30 rwcut --start-rec to Select Records to Display
3-31 rwcut --start-rec, --end-rec, and --num-recs Combined
3-32 rwuniq for Counting in Terms of a Single Field
3-33 rwuniq --flows for Constraining Counts to a Threshold
3-34 rwuniq --bytes and --packets with Minimum Flow Threshold
3-35 rwuniq --flows and --packets to Constrain Flow and Packet Counts
3-36 Using rwuniq to Detect IPv6 PMTU Throttling
3-37 rwuniq --field to Count with Respect to Combinations of Fields
3-38 Using rwuniq to Isolate Email and Non-Email Behavior
4-1 A Typical Sequence of Commands
4-2 Using --help and --version
4-3 rwcat for Combining Flow-Record Files
4-4 rwdedupe for Removing Duplicate Records
4-5 Using rwsplit for Coarsely Parallel Analysis
4-6 Using rwsplit to Generate Statistics on Flow-Record Files
4-7 rwfileinfo for Display of Data File Characteristics
4-8 rwfileinfo for Showing Command History
4-9 rwtuc for Simple File Cleansing
4-10 rwptoflow for Simple Packet Conversion
4-11 rwptoflow and rwpmatch for Filtering Packets Using an IP Set
4-12 rwnetmask for Abstracting Source IPs
4-13 rwset for Generating a Set File
4-14 rwsetcat to Display IP Sets
4-15 rwsetcat --count-ip, --print-stat, and --network-description for Showing Structure
4-16 rwsetbuild for Generating IP Sets
4-17 rwsettool --intersect and --difference
4-18 rwsettool --union
4-19 rwsetmember to Test for an address
4-20 Using rwset to Filter for a Set of Scanners
4-21 A Script for Generating Hourly Sets
4-22 Counting Hourly Set Records
4-23 rwsetbuild for Building an Address Space IP Set
4-24 Backdoor Filtering Based on Address Space
4-25 rwbag for Generating Bags
4-26 rwbagcat for Displaying Bags
4-27 rwbagcat --mincount, --maxcount, --minkey and --maxkey to Filter Results
4-28 rwbagcat --bin-ips to Display Unique IPs Per Value
4-29 rwbagcat --integer-keys
4-30 Using rwbag to Filter Out a Set of Scanners
4-31 rwbagtool --add
4-32 rwbagtool --intersect
4-33 rwbagtool Combining Threshold with Set Intersection
4-34 rwbagtool --coverset
4-35 rwgroup to Group Flows of a Long Session
4-36 rwgroup --rec-threshold to Drop Trivial Groups
4-37 rwgroup --summarize
4-38 Using rwgroup to Identify Specific Sessions
4-39 rwmatch With Incomplete ID Values
4-40 rwmatch With Full TCP Fields
4-41 rwmatch for Mating TCP Sessions
4-42 rwmatch for Mating Traceroutes
4-43 rwpmapbuild to Create a Spyware Pmap File
4-44 rwfilter --pmap-saddress
4-45 rwcut --pmap-file and sval Field
4-46 Using rwsort to sort flow records associated with types of spyware
4-47 Using rwuniq to Count The Number of Flows Associated With Specific Types of Spyware
4-48 rwip2cc for Looking Up Country Codes
4-49 rwcut --plugin=cutmatch.so to Use a Plug-in
5-1 ThreeOrMore.py: Using PySiLK for Memory in rwfilter partitioning
5-2 Calling ThreeOrMore.py
5-3 vpn.py: Using PySiLK with rwfilter for Partitioning Alternatives
5-4 matchblock.py: Using PySiLK with rwfilter for Structured Conditions
5-5 Calling matchblock.py
5-6 delta.py: Using PySiLK with rwcut to Display Combined Fields
5-7 Calling delta.py
5-8 payload.py: Using PySiLK for Conditional Fields With rwsort and rwcut
5-9 Calling payload.py

Handbook Goals

This analysts' handbook is intended to provide a tutorial introduction to network traffic analysis using the System for Internet-Level Knowledge (or SiLK) tool suite (http://tools.netsa.cert.org/silk/) for acquisition and analysis of network flow data. The SiLK tool suite is a highly-scalable flow-data capture and analysis system developed by the Network Situational Awareness group (NetSA) at Carnegie Mellon University's Software Engineering Institute (SEI). SiLK tools provide network security analysts with the means to understand, query, and summarize both recent and historical traffic data represented as network flow records. The SiLK tools provide network security analysts with a relatively complete high-level view of traffic across an enterprise network, subject to placement of sensors.

Analysis using the SiLK tools has lent insight into various aspects of network behavior. Some example applications of this tool suite include (but are not limited to):

• Support for network forensics, identifying artifacts of intrusions, vulnerability exploits, worm behavior, etc.
• Providing service inventories for large and dynamic networks (on the order of a CIDR /8 block).
• Generating profiles of network usage (bandwidth consumption) based on protocols and common communication patterns.
• Enabling non-signature-based scan detection and worm detection, for detection of limited-release malicious software and for identification of precursors.

By providing a common basis for these various analyses, the tools provide a framework on which network situational awareness may be developed.

Common questions addressed via flow analyses include (but aren't limited to):

• What's on my network?
• What happened before the event?
• Where are policy violations occurring?
• What are the most popular web sites?
• How much volume would be reduced by applying a blacklist?
• Do my users browse to known infected web servers?
• Do I have a spammer on my network?
• When did my web server stop responding to queries?
• Am I routing undesired traffic?
• Who uses my public DNS server?

This handbook contains five chapters:

1. The Networking Primer and Review of UNIX Skills provides a very brief overview of some of the background necessary to begin using the SiLK tools for analysis. It includes a brief introduction to Transmission Control Protocol/Internet Protocol (TCP/IP) networking and covers some of the UNIX command-line skills required to use the SiLK analysis tools.
2. The SiLK Network Flow Repository describes the structure of netflow data, how netflow traffic data is collected from the enterprise network, and how it is organized.
3. Essential SiLK Tools describes how to use the SiLK tools for common tasks including data access, display, simple counting, and statistical description.
4. Traffic Analysis Using the SiLK Tool Suite builds on the previous chapter and covers use of other SiLK tools for data analysis, including manipulating flow record files, packet-level analysis, and working with aggregates of flows and of IP addresses.
5. Using PySiLK For Advanced Analysis discusses how analysts can use the PySiLK scripting capabilities to facilitate more complex analyses efficiently.

This analysts' handbook is intended to be tutorial in nature, but it is not an exhaustive description of all options (or even all tools) in the SiLK tool suite. A more complete description (but less tutorial material) can be found in The SiLK Reference Guide (http://tools.netsa.cert.org/silk/reference-guide.html) or in the output resulting from using the --help or --man parameters with the various tools.

The handbook deals solely with the analysis of network flow record data using an existing installation of the SiLK tool suite. For information on installing and configuring a new SiLK tool suite and on the collection of network flow record data for use in these analyses, the reader should consult the SiLK Installation Handbook (http://tools.netsa.cert.org/silk/install-handbook.pdf).

Chapter 1

Networking Primer and Review of UNIX Skills

This chapter of the handbook provides a review of basic topics in Transmission Control Protocol/Internet Protocol (TCP/IP) and UNIX operation. It is not intended as a comprehensive summary of these topics, but it will help to refresh your knowledge and prepare you for using the SiLK tools for analysis.

Upon completion of this chapter you will be able to:

• describe the structure of IP packets and the relationship between the protocols that comprise the IP suite
• explain the mechanics of TCP, such as the TCP state machine and TCP flags
• use basic UNIX tools

1.1 TCP/IP Networking Primer

This section provides an overview of the IP networking suite. IP, sometimes called TCP/IP, is the foundation of internetworking. All packets analyzed by the SiLK system use protocols supported by the IP suite. These protocols behave in a well-defined manner, and one of the primary signs of a security breach can be a deviation from accepted behavior. In this section, you will learn about what is specified as accepted behavior. While there are common deviations from the specified behavior, knowing what is specified forms a base for further knowledge.

This section is a refresher; the IP suite is a complex collection of more than 50 protocols, and it comprises far more information than can be covered in this section. There are a number of on-line documents and printed books that provide other resources on TCP/IP to further your understanding of the IP suite.

1.1.1 IP Protocol Layers

Figure 1.1 shows a basic breakdown of the protocol layers in IP. If you're familiar with the Open Systems Interconnection (OSI) seven-layer model, you will notice that this diagram is slightly different. IP predates the OSI model, and the correspondence between them is not exact.

Figure 1.1: IP Protocol Layers

As Figure 1.1 shows, IP is broken into five layers. The lowest layer, Hardware, covers the physical connections between machines: plugs, electronic pulses, and so on. The next layer is the Link layer, and it refers to the network transport protocol, such as Synchronous Optical Networks (SONET), Ethernet, Asynchronous Transfer Mode (ATM), or Fiber Distributed Data Interface (FDDI). The third layer is the Internet layer, which is the first layer at which IP affects the passing of data. This layered representation leads to terminology such as "IP over ATM" or "IP over SONET."

The Link layer imposes several constraints on the Internet layer. The most relevant from an analysis perspective is the maximum transmission unit (MTU). The MTU imposes an absolute limit on the number of bytes that can be transferred in a single frame and, therefore, a limit on datagram and packet size. The vast majority of enterprise network data is transferred over Ethernet at some point, leading to an effective MTU of 1500 bytes.

The layer above Internet, Transport, refers to the transport protocol, such as TCP, Internet Control Message Protocol (ICMP), or User Datagram Protocol (UDP). These three transport protocols comprise the bulk of traffic crossing most enterprise networks. The final layer, Application, refers to the service supported by the protocol. For example, Web traffic is an HTTP application running on a TCP transport over IP over an Ethernet network.

1.1.2 Structure of the IP Header

IP passes collections of data as datagrams. Figure 1.2 shows the breakdown of IP datagrams. Fields that are not recorded by the SiLK data collection tools are grayed out.

1.1.3 IP Addressing and Routing

IP can be thought of as a very-high-speed postal service. If someone in Pittsburgh sends a letter to someone in New York, the letter passes through a sequence of postal workers. The postal worker who touches the mail may be different every time a letter is sent, and the only important address is the destination. Also, there is no reason that New York has to respond to Pittsburgh, and if it does, the sequence of postal workers could be completely different.

Figure 1.2: Structure of the IP Header

IP operates in the same fashion: there is a set of routers between various sites, and packets are sent to the routers the same way that the postal system passes letters back and forth. There is no requirement that the set of routers used to pass data in be the same as the set used to pass data out, and the routers can change at any time. Most importantly, the only IP address that must be valid in an IP connection is the destination address. IP itself does not require a valid source address, but other protocols (e.g., TCP) cannot complete without a valid source and destination address because the source needs to receive the acknowledgment packets to complete a connection. (However, there are numerous examples of intruders using incomplete connections for malicious purposes.)

Structure of an IP Address

The Internet has space for approximately 4 billion unique IP version 4 (IPv4) addresses. While these IP addresses can be represented as 32-bit integers, they are generally represented as sets of four decimal integers (for example, 128.2.118.3), where each integer is a number between 0 and 255.

IPv4 addresses and ranges of addresses can also be referred to using CIDR blocks. CIDR, short for Classless Inter-Domain Routing, is a standard for grouping together addresses for routing purposes. When an entity purchases Internet Protocol address space from the relevant authorities, that entity buys a routing block, which is used to direct packets to its network. CIDR blocks are usually written in a dot/mask notation, where the dot value is the type of dotted set described above, and the mask is the number of fixed bits in the address. For example, 128.2.0.0/16 would refer to all IP addresses from 128.2.0.0 to 128.2.255.255. CIDR sizes range from 0 (the whole address is a network)¹ to 32 (the whole address is a host).

With the introduction of IP version 6 (IPv6), all of this is changing. IPv6 addresses are 128 bits in length, for a staggering 3.4 × 10^38 (340 undecillion) possible addresses. IPv6 addresses are represented as sets of eight hexadecimal (base 16) integers, for example:

FEDC:BA98:7654:3210:FEDC:BA98:7654:3210

Each integer is a number between 0 and FFFF (the hexadecimal equivalent of decimal 65535). The address space for IPv6 is so large that the designers anticipated addresses containing strings of 0 values, so they defined a shorthand of :: that can be used once in each address to represent a string of zeros. The address FEDC::3210 is therefore equivalent to:

FEDC:0:0:0:0:0:0:3210

The routing methods for IPv6 addresses are beyond the scope of this handbook; see RFC 4291 (http://www.ietf.org/rfc/rfc4291.txt) for a description. CIDR blocks are still used with IPv6 addresses, as these addresses have no predefined classes in the protocol. CIDR sizes can range between 0 and 128 in IPv6 addresses.

In SiLK, the support for IPv6 is controlled by configuration. If you need to use IPv6 addresses, check with the person responsible for maintaining your data repository as to the support available.

Reserved IP Addresses

While IPv4 has approximately 4 billion addresses available, large segments of IP space are reserved for the maintenance and upkeep of the Internet. Various authoritative sources provide lists of the segments of IP space that are reserved. One notable reservation list is maintained by the Internet Assigned Numbers Authority (IANA) at http://www.iana.org/assignments/ipv4-address-space. IANA also keeps a list of IPv6 reservations at http://www.iana.org/assignments/ipv6-address-space. In addition to this list, the Internet Engineering Task Force (IETF) maintains several Request for Comments (RFC) documents that specify other reserved spaces.
The majority of these spaces are listed in RFC 3330, Special Use IPv4 Addresses, at http://www.ietf.org/rfc/rfc3330.txt. Table 1.1 summarizes major IPv4 reserved spaces. IPv6 reserved spaces are shown in Table 1.2.

In general, private space (in IPv4, 10.0.0.0/8, 172.16.0.0/12, and 192.168.0.0/16), auto-config (169.254.0.0/16), and loopback (127.0.0.0/8) destination IP addresses should not be routed across network borders. Consequently, the appearance of these address spaces at routers indicates a failure of routing policy. Similarly, traffic should not come into the enterprise network from these address spaces; the Internet as a whole should not route that traffic to the enterprise network.
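The CIDR arithmetic, the IPv6 :: shorthand, and the border-routing rule described in this section can be checked with Python's standard ipaddress module. The following is a minimal sketch and is not part of SiLK; the helper names cidr_bounds, should_not_cross_border, and expand_ipv6 are hypothetical, and the list of blocks is copied from the paragraph on private, auto-config, and loopback space.

```python
import ipaddress

# Blocks that, per the text, should not be routed across network borders:
# private space, auto-config, and loopback.
NON_ROUTABLE_BLOCKS = [
    ipaddress.ip_network(block)
    for block in ("10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16",
                  "169.254.0.0/16", "127.0.0.0/8")
]


def cidr_bounds(block):
    """Return the first and last address of a CIDR block as strings."""
    net = ipaddress.ip_network(block)
    return str(net.network_address), str(net.broadcast_address)


def should_not_cross_border(address):
    """True if the IPv4 address falls inside one of the blocks listed above."""
    addr = ipaddress.ip_address(address)
    return any(addr in net for net in NON_ROUTABLE_BLOCKS)


def expand_ipv6(address):
    """Expand the :: shorthand into all eight 16-bit groups."""
    return ipaddress.ip_address(address).exploded.upper()


print(cidr_bounds("128.2.0.0/16"))
print(should_not_cross_border("192.168.1.20"))
print(should_not_cross_border("128.2.118.3"))
print(expand_ipv6("FEDC::3210"))
```

An analyst might apply a check like should_not_cross_border to source or destination addresses exported from flow records to spot traffic that suggests a routing-policy failure; the SiLK tools have their own mechanisms for address matching, which are covered in later chapters.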

1.1.4 Major Protocols

Transmission Control Protocol (TCP)

TCP is the most commonly encountered protocol on the Internet. TCP is a stream-based protocol that reliably transmits data from the source to the destination. To maintain this reliability, TCP is very complex: the protocol is very slow and requires a large commitment of resources.

¹ CIDR /0 addresses are used almost exclusively for empty routing tables, and are not accepted by the SiLK tools. This effectively means the range for CIDR blocks is 1-32 for IPv4 data.

Table 1.1: IPv4 Reserved Addresses

Space              Reason
0.0.0.0/8          Current Network (self-reference) addresses
10.0.0.0/8         Reserved for private networks
127.0.0.0/8        Loopback (self-address) addresses
169.254.0.0/16     Autoconfiguration (address unavailable) addresses
172.16.0.0/12      Reserved for private networks
192.0.2.0/24       Reserved for Documentation (example.com or example.net)
192.88.99.0/24     6to4 Relay Anycast Prefix (border between IPv6 and IPv4)
192.168.0.0/16     Reserved for private networks
198.18.0.0/18      Reserved for Router Input Ports
198.19.0.0/18      Reserved for Router Output Ports
224.0.0.0/4        Multicast Addresses
240.0.0.0/4        Future Use address
255.255.255.255    Limited Broadcast Address

Table 1.2: IPv6 Reserved Addresses

Space              Reason
0::0               Unspecified Address
0::1               Loopback Address
FC01::0/16         Reserved for Local Addresses
FC00::0/16         Reserved for Future Local Addresses
FE80::0/64         Reserved for Link-Local Addresses
FF01-FF0F::0/16    Reserved Multicast Addresses

Figure 1.3 shows a breakdown of the TCP header. A TCP header adds 20 additional bytes to the IP header. Consequently, TCP packets will always be at least 40 bytes long. As the shaded portions of Figure 1.3 show, most of the TCP header information is not retained in SiLK flow records.
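As a quick check of the sizes mentioned so far, the sketch below combines the 20-byte TCP header, an IP header assumed to be the 20-byte IPv4 minimum (no options), and the 1500-byte Ethernet MTU from Section 1.1.1. The constant and function names are illustrative only.

```python
IPV4_HEADER_MIN = 20   # bytes; assumes an IPv4 header with no options
TCP_HEADER_MIN = 20    # bytes; assumes a TCP header with no options
ETHERNET_MTU = 1500    # bytes; the effective MTU cited in Section 1.1.1


def min_tcp_packet_size():
    """Smallest possible TCP packet: both headers and no payload."""
    return IPV4_HEADER_MIN + TCP_HEADER_MIN


def max_tcp_payload(mtu=ETHERNET_MTU):
    """Largest TCP payload that fits in one packet for the given MTU."""
    return mtu - IPV4_HEADER_MIN - TCP_HEADER_MIN


print(min_tcp_packet_size())
print(max_tcp_payload())
```

Under these assumptions, the byte count of a TCP flow record should be at least 40 times its packet count, which gives a simple sanity check when reading flow data.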

Figure 1.3: TCP Header

TCP is built on top of an unreliable infrastructure. IP assumes that packets can be lost without a problem, and that responsibility for managing packet loss is incumbent on services at higher layers. TCP, which provides ordered and reliable streams on top of this unreliable packet-passing model, implements this feature through a complex state machine as shown in Figure 1.4. The transitions in this state machine are described by stimulus / action format labels, where the top value is the stimulating event and the bottom values are actions taken prior to entry into the destination state. Where no action takes place, an x is used to indicate explicit inaction.

Figure 1.4: TCP State Machine

We will not thoroughly describe the state machine in this handbook, but we do want to emphasize that because of TCP's requirements, flows representing well-behaved TCP sessions will behave in certain ways. For example, a flow for a complete TCP session must have at least four packets: one packet that sets up the connection, one packet that contains the data, one packet that terminates the session, and one packet acknowledging the other side's termination of the session.² TCP behavior that deviates from this provides indicators that can be used by an analyst. An intruder may send packets with odd TCP flag combinations as part of a scan (e.g., with all flags set on). Different operating systems handle protocol violations differently, so odd packets can be used to elicit information that identifies the operating system in use.

² It is technically possible for there to be a valid 3-packet complete TCP flow: one SYN packet, one SYN-ACK packet containing the data, and one RST packet terminating the flow. This is a very rare circumstance; most complete TCP flows have more than four packets.

TCP Flags

TCP uses flags to transmit state information among participants. There are six commonly used flags:

• SYN: Short for synchronize, the SYN flag is sent at the beginning of a session to establish initial sequence numbers. Each side sends one SYN packet at the beginning of a session.
• ACK: Short for acknowledge, ACK flags are sent in almost all TCP connections and are used to indicate that a previously sent packet has been received.
• FIN: Short for finalize, the FIN flag is used to terminate a session. When a packet with the FIN flag is sent, the target of the FIN flag cleanly terminates the TCP session.
• RST: Short for reset, the RST flag is sent to indicate that a session is incorrect and should be terminated. When a target receives a RST flag, it terminates immediately. Some stacks terminate sessions using RST instead of the more proper FIN sequence.
• PSH: Short for push, the PSH flag was formerly used to inform a receiver that the data sent in the packet should immediately be sent to the target application (i.e., the sender has completed this particular send). The PSH flag is largely obsolete, but it still commonly appears in TCP traffic.
• URG: Short for urgent data, the URG flag is used to indicate that urgent data (such as a signal from the sending application) is in the buffer and should be used first. Tricks with URG flags can be used to fool IDS systems.

Reviewing the state machine will show that most state transitions are handled through the use of SYN, ACK, FIN, and RST. The PSH and URG flags are less directly relevant. There are two other rarely used flags: ECE (Explicit Congestion Notification Echo) and CWR (Congestion Window Reduced). Neither is relevant to security analysis at this time, although they can be used with the SiLK tool suite if required.
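The flags listed above occupy individual bits of one byte in the TCP header, so a combination of flags can be decoded with simple bit masks. The sketch below uses the standard bit positions from the TCP specification; the function names decode_flags and all_six_flags_set are hypothetical, and this is plain Python rather than a SiLK or PySiLK interface.

```python
# Standard TCP flag bit positions, lowest bit first.
TCP_FLAGS = [
    ("FIN", 0x01), ("SYN", 0x02), ("RST", 0x04), ("PSH", 0x08),
    ("ACK", 0x10), ("URG", 0x20), ("ECE", 0x40), ("CWR", 0x80),
]

# Mask covering the six commonly used flags (FIN through URG).
SIX_COMMON_FLAGS = 0x3F


def decode_flags(flag_byte):
    """Return the names of all flags set in a TCP flags byte."""
    return [name for name, bit in TCP_FLAGS if flag_byte & bit]


def all_six_flags_set(flag_byte):
    """True when every commonly used flag is on, as in some scan packets."""
    return (flag_byte & SIX_COMMON_FLAGS) == SIX_COMMON_FLAGS


print(decode_flags(0x12))        # a SYN-ACK packet
print(all_six_flags_set(0x3F))   # the "all flags set on" scan case
```

A packet whose flags decode to a combination that the state machine never produces, such as SYN together with FIN, is the kind of deviation from specified behavior that an analyst can look for.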

Major TCP Services

Traditional TCP services have well-known ports: for example, 80 is Web, 25 is SMTP, and 53 is DNS. IANA maintains a list of these port numbers at http://www.iana.org/assignments/port-numbers. This list is useful for legitimate services, but it does not necessarily contain new services or accurate port assignments for rapidly-changing services such as those implemented via peer-to-peer networks. Furthermore, there is no guarantee that traffic seen, for example, on port 80 is actually web traffic, or that web traffic cannot be sent on other ports.

UDP and ICMP

After TCP, the most common protocols on the Internet are UDP and ICMP. UDP is a fast but unreliable message-passing mechanism used for services where throughput is more critical than accuracy. Examples include audio/video streaming, as well as heavy-use services such as the Domain Name Service (DNS). ICMP is a reporting protocol: ICMP sends error messages and status updates.

Figure 1.5: UDP and ICMP Headers

UDP and ICMP Packet Structure

Figure 1.5 shows a breakdown of UDP and ICMP packets, as well as the fields collected by SiLK. UDP can be thought of as TCP without the additional state mechanisms; a UDP packet has both a source and destination port, assigned in the same way TCP assigns them, as well as a payload.

ICMP is a straight message-passing protocol and includes a large amount of information in its first two fields: the type and code. The type field is a single byte indicating a general class of message, such as host unreachable. The code field contains a byte indicating what the message is within the type, such as route to host not found. ICMP messages generally have a limited payload; most messages have a fixed size based on type, with the notable exceptions being echo request (type 8, code 0) and echo reply (type 0, code 0).

Officially, ICMP is at the same protocol layer as IP, because its primary purpose is to issue IP error messages. However, it shares many similarities with transport layer protocols, such as having its own header embedded within the IP packet, and therefore is treated as a transport layer protocol in this handbook.
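To make the type and code fields concrete, the sketch below maps a handful of well-known ICMP type and code values to descriptions. The values shown (echo reply 0, destination unreachable 3, echo request 8, time exceeded 11) come from the IANA ICMP parameter registry; the table is deliberately incomplete, and the describe_icmp function is a hypothetical helper, not a SiLK facility.

```python
# A small, incomplete subset of ICMP types from the IANA registry.
ICMP_TYPES = {
    0: "echo reply",
    3: "destination unreachable",
    8: "echo request",
    11: "time exceeded",
}

# Selected codes for type 3 (destination unreachable).
DEST_UNREACHABLE_CODES = {
    0: "network unreachable",
    1: "host unreachable",
    3: "port unreachable",
}


def describe_icmp(icmp_type, code):
    """Return a readable description for an ICMP type and code pair."""
    name = ICMP_TYPES.get(icmp_type, "type %d" % icmp_type)
    if icmp_type == 3:
        detail = DEST_UNREACHABLE_CODES.get(code, "code %d" % code)
        return "%s: %s" % (name, detail)
    return name


print(describe_icmp(8, 0))
print(describe_icmp(3, 1))
```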

Major UDP Services and ICMP Messages

UDP services are covered in the IANA URL listed above. As with TCP, the values given by IANA are slightly behind those currently observed on the Internet. IANA also excludes port utilization (even if common) by malicious software such as worms. Although not official, there are numerous port databases on the Web that can provide insight into the current port utilization by services.

ICMP types and codes are well defined, and the most recent list is at http://www.iana.org/assignments/icmp-parameters. This list is the definitive list, and includes references to RFCs explaining the types and codes.

1.2 Review of UNIX Skills

In this section, we provide a review of basic UNIX operations. SiLK is implemented on Linux and Solaris, and consequently you will need to be able to work with UNIX to use the SiLK tools.

1.2.1 Using the UNIX Command Line

When working on the command line, you should see a prompt like the following:

<1>$
Example 1-1: A UNIX Command Prompt

This example shows the standard command prompt for this document. The integer between angle brackets will be used to refer to specific commands in examples. Commands can be invoked by typing them directly at the command line. UNIX commands are typically abbreviated English words, and accept space-separated parameters; some parameters are prefixed by one or two dashes. Table 1.3 lists some of the more common UNIX commands. More information on these commands can be found by typing man followed by the command name. Example 1-2 (and the rest of the examples in this handbook) shows the use of some of these commands.

Table 1.3: Some Common UNIX Commands

Command   Description
cat       copy a stream or file onto standard output (show file content)
cp        copy a file from one name or directory to another
cut       isolate one or more columns from a file
date      show current day and time
echo      put arguments onto standard output
exit      terminate current command interpreter (log out)
file      identify type of content in the file
head      show first few lines of a file's content
join      bring together columns in two files
kill      terminate a job or process
ls        list files in current (or specified) directory
          -l (for long) parameter indicates show all directory information
man       show the on-line documentation on a command or file
mv        rename a file or transfer it from one directory to another
ps        list processes on the host
rm        remove a file
sed       edit the lines on standard input and put on standard output
sort      sort content of file into lexicographic order
tail      show last few lines of a file's content
time      show execution time of a command
top       show running processes with highest CPU utilization
wait      wait for all background commands to finish
wc        count words (or, with -l parameter, lines) in a file
which     locate a command's executable file

<1>$ echo "Hello" > myfile
<2>$ cat myfile
Hello
<3>$ ls -l myfile
-rw-r--r-- 1 tshimeal none 6 Oct  6 11:59 myfile
<4>$ cat <<END_NEW_LINES >>myfile
a
b
c
END_NEW_LINES
<5>$ wc -l myfile
4 myfile
<6>$ rm myfile

Example 1-2: Example Using Common UNIX Commands

Some advanced examples in this handbook will use control structures available from the Bash shell (one of the UNIX command interpreters). The syntax for name in expression; do ... done indicates a loop where each of the values returned by expression is given in turn to the variable indicated by name (and referenced as $name), and the commands in between do and done are executed with that value. The syntax while expression; do ... done indicates a loop where the commands between do and done are executed as long as expression evaluates true. A backslash at the end of a line indicates that the command is continued on the following line.

Example 1-3 shows how almost all SiLK applications are invoked: the user calls rwfilter (command 1), specifying some data of interest, and then the results are passed to another application (command 2).

<1>$ rwfilter --start-date=2010/08/09:00 --end-date=2010/08/09:01 \
        --type=in --proto=6 --pass=aug9.raw
<2>$ rwtotal --proto --skip-zero aug9.raw
   protocol|        Records|               Bytes|          Packets|
          6|       34428003|        114824656571|        387766604|
Example 1-3: A Simple Command Line

1.2.2 Using Pipes

The SiLK tools are designed to intercommunicate via pipes, in particular the stdout (standard output) and stderr (standard error) pipes. Communication by pipes is done by redirection, where the data sent via one pipe is sent to a program, another pipe, or a file. Many of the examples in the following chapters use pipes. Example 1-4 shows the use of pipes to do the same thing as Example 1-3.

<1>$ rwfilter --type=all --proto=6 --pass=stdout \
        --start-date=2010/08/09:00 --end-date=2010/08/09:01 | \
        rwtotal --proto --skip-zero
   protocol|        Records|               Bytes|          Packets|
          6|       98454957|       1675742086673|       2444828416|
Example 1-4: A Simple Piped Command

SiLK applications can also communicate via named pipes, which allow multiple channels of communication to be opened simultaneously. A named pipe is a special file that behaves like stdout or stderr, and is created using the UNIX mkfifo command (for MaKe First-In-First-Out). In Example 1-5, we create a named pipe (in Command 1) that one call to rwfilter (in Command 2) uses to filter data concurrently with another call to rwfilter (in Command 3). After waiting for both background jobs to finish (Command 4), the results of these calls are shown in Commands 5 and 6. Using named pipes, sophisticated SiLK operations can be built in parallel. However, the user needs to ensure that any command that will read from the named pipe is started after any command that writes to the named pipe.

<1>$ mkfifo /tmp/test-output
<2>$ rwfilter --type=all --start-date=2010/08/09:00 --end-date=2010/08/09:01 \
        --sensor=29 --proto=6 --pass=stdout --fail=/tmp/test-output \
        | rwuniq --fields=5 > tcp.out &
[1] 23695 23696
<3>$ rwfilter --input-pipe=/tmp/test-output --proto=17 --pass=stdout \
        | rwuniq --fields=5 > udp.out &
[2] 23697 23698
<4>$ wait
[2]    Done    rwfilter --input-pipe=/tmp/test-output --proto=17 ...
[1]  + Done    rwfilter --type=all --start-date=2010/08/09:00 ...
<5>$ cat tcp.out
pro|   Records|
  6|   1409344|
<6>$ cat udp.out
pro|   Records|
 17|    491309|
Example 1-5: Using a Named Pipe

Chapter 2

The SiLK Flow Repository

This chapter introduces the tools and techniques used to store information about sequences of packets as they are collected on an enterprise network for SiLK (referred to as "network flow" or "network flow data" and occasionally just "flow"). This chapter will help an analyst become familiar with the structure of network flow data, how the collection system gathers network flow data from sensors, and how to access that data.

2.1 What Is Network Flow Data?

Netflow is a traffic-summarizing format that was first implemented by Cisco Systems and other router manufacturing companies, primarily for billing purposes. Network flow data (or network flow) is a generalization of netflow. Network flow data is collected to support several different types of analyses of network traffic (some of which are described later in this handbook).

Network flow collection differs from direct packet capture, such as tcpdump, in that it builds a summary of communications between sources and destinations on a network. This summary covers all traffic matching seven particular keys that are relevant for addressing: the source and destination IP addresses, the source and destination ports, the protocol type, the type of service, and the interface. We use five of these attributes to constitute the flow label in SiLK: the source and destination addresses, the source and destination ports, and the protocol. These attributes, together with the start time of each network flow, distinguish network flows from each other.

A network flow often covers multiple packets, which are grouped together under common labels. A flow record thus provides the label and statistics on the packets that the network flow covers, including the number of packets covered by the flow, the total number of bytes, and the duration and timing of those packets. Because network flow is a summary of traffic, it does not contain packet payload data. Payload data is expensive to retain on a large, busy network. Each network flow we record is very small (it can be as low as 22 bytes, but is determined by several configuration parameters), and even at that size one may collect many gigabytes of traffic daily on a busy network.

2.1.1 Structure of a Flow Record

A flow file is a series of flow records. A flow record holds all the data SiLK retains from the collection process: the flow label fields, start time, number of packets, duration of flow, and so on.

2.2 Flow Generation and Collection

Every day, SiLK may collect many gigabytes (GB) of network flow data from across the enterprise network. Given both the volume and complexity of this data, it is critical to understand how this data is recorded. In this section, we will review the collection process and show how data is stored as network flow records.

Network flow records are generated by sensors throughout the enterprise network. The majority of these may be routers, although specialized sensors, such as yaf (http://tools.netsa.cert.org/yaf/), can also be used when it is desirable to avoid artifacts in a router's implementation of network flow or to use non-device-specific network flow data formats, such as IPFIX (http://www.ietf.org/html.charters/ipfix-charter.html), or for more control over network flow record generation.¹

A sensor generates network flow records by grouping together packets that are closely related in time and have a common flow label. "Closely related" is defined by the router, and is typically set to around 15 seconds. Figure 2.1 shows the generation of flows from packets. Case 1 in that figure diagrams flow record generation when all the packets for a flow are contiguous and uninterrupted. Case 2 diagrams flow record generation when there are several flows collected in parallel. Case 3 diagrams flow record generation when timeout occurs, as discussed below.

Network flow is an approximation of traffic, not a natural law. Routers and other sensors make a guess when they generate flow records, but these guesses are not perfect; there are several well-known phenomena in which a long-lived session will be split into multiple flow records:

1. Active timeout is the most common cause of a split network flow. Network flow records are purged and restarted after a configurable time of activity. As a result, all network flows have an upper limit on their duration that depends on the local configuration. A typical value would be around 30 minutes.
2. Cache flush is a common cause of split network flows for router-collected network flow records. Network flows take up memory resources in the router, and the router regularly purges this cache of network flows for housekeeping purposes. The cache flush takes place approximately every 30 minutes as well. A plot of network flows over a long period of time shows that many network flows terminate at regular 30-minute intervals, which is a result of the cache flush.
3. Router exhaustion also causes split network flows for router-collected flows. The router has limited processing and memory resources devoted to network flow. During periods of stress, the flow cache will fill and empty more often due to the number of network flows collected by the router.

Use of specialized flow sensors can avoid or minimize cache-flush and router-exhaustion issues. All of these cases involve network flows that are long enough to be split. As we will show later, the majority of network flows collected at the enterprise network border are small and short-lived.
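The grouping and splitting rules described in this section can be sketched in a few lines of Python. This is a simplified illustration, not how any particular router or yaf builds flow records: the Packet tuple, the packets_to_flows function, and the dictionary layout are hypothetical, the 15-second idle gap and 30-minute active timeout are the typical values quoted in the text, and cache flush and router exhaustion are not modeled.

```python
from collections import namedtuple

# Hypothetical packet summary: arrival time in seconds plus the fields
# needed for the five-attribute SiLK flow label and the byte count.
Packet = namedtuple("Packet", "time sip dip sport dport proto size")

IDLE_TIMEOUT = 15          # seconds; typical "closely related" gap
ACTIVE_TIMEOUT = 30 * 60   # seconds; typical active timeout


def packets_to_flows(packets, idle=IDLE_TIMEOUT, active=ACTIVE_TIMEOUT):
    """Group packets into flow records keyed by the five-attribute label."""
    open_flows = {}
    finished = []
    for pkt in sorted(packets, key=lambda p: p.time):
        label = (pkt.sip, pkt.dip, pkt.sport, pkt.dport, pkt.proto)
        flow = open_flows.get(label)
        # Close the open flow if the packet arrives after too long a gap,
        # or if the flow has reached the active timeout.
        if flow is not None and (pkt.time - flow["end"] > idle
                                 or pkt.time - flow["start"] >= active):
            finished.append(open_flows.pop(label))
            flow = None
        if flow is None:
            flow = {"label": label, "start": pkt.time, "end": pkt.time,
                    "packets": 0, "bytes": 0}
            open_flows[label] = flow
        flow["end"] = pkt.time
        flow["packets"] += 1
        flow["bytes"] += pkt.size
    finished.extend(open_flows.values())
    return sorted(finished, key=lambda f: f["start"])


# One long-lived session sending a packet every 10 seconds for just over
# 30 minutes is split into two flow records by the active timeout.
session = [Packet(t, "10.1.1.1", "10.2.2.2", 1025, 80, 6, 100)
           for t in range(0, 1820, 10)]
for record in packets_to_flows(session):
    print(record["start"], record["end"], record["packets"], record["bytes"])
```

The duration of each record is end minus start. Because the start time is part of what distinguishes one flow from another, the two records produced for the long session share a label but remain separate flows.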

2.3 Introduction to Flow Collection

An enterprise network comprises a variety of organizations and systems. The flow data to be handled by SiLK is first processed by the collection system, which receives flow records from the sensors and organizes them for later analysis. The collection system may collect data through a set of sensors that includes both routers and specialized sensors and is positioned throughout the enterprise network. Analysis is performed using a custom set of software called the SiLK analysis tool suite. The majority of this document provides training in the use of the SiLK tool suite.

The SiLK project is active, meaning that the system is continuously improved as time passes. These improvements include new tools and revisions to existing analysis software, as well as changes in the data-collection systems.

¹ yaf may also be used to convert packet data to network flow records, via a script that automates this process. See Section 4.3.

Figure 2.1: From Packets to Flows

2.3.1 Where Network Flow Data Is Collected

While complex networks may segregate flow records based on where the records were collected (e.g., the network border, major points within the border, at other points), the generic implementation of the SiLK collection system defaults to collection only at the network border, as is diagrammed in Figure 2.2. The default implementation has only one class of sensors: all. Further segregation of the data is done by type of traffic.

Figure 2.2: Default Traffic Type for Sensors

The SiLK tool mapsid produces a list of sensors in use for a specific installation, reflecting its configuration. Example 2-1 shows calls to mapsid. When mapsid is called without parameters, it produces a list of all sensors (see command 1 in Example 2-1). When called with a space-delimited list of integers, it produces a map from those values to the corresponding sensor names (see command 3 in Example 2-1). When called with a list of sensor names (see command 4 in Example 2-1), it produces a map from those names to sensor numbers. For an explanation of the exact physical location of each sensor, contact the person responsible for maintaining the data repository. If the installation supports differing classes of sensors, using the --print-class parameter can also give information as to what classes of data are produced by each sensor (see commands 2 and 5 in Example 2-1).

<1>$ mapsid
0 -> SEN-CENT
1 -> SEN-NORTH
2 -> SEN-SOUTH
3 -> SEN-EAST
4 -> SEN-WEST
<2>$ mapsid --print-class | head -3
0 -> SEN-CENT [c1,c2]
1 -> SEN-NORTH [c1,c2,c3]
2 -> SEN-SOUTH [c1,c2]
<3>$ mapsid 0 2 4
0 -> SEN-CENT
2 -> SEN-SOUTH
4 -> SEN-WEST
<4>$ mapsid SEN-NORTH SEN-EAST
SEN-NORTH -> 1
SEN-EAST -> 3
<5>$ mapsid --print-class SEN-NORTH SEN-EAST
SEN-NORTH -> 1 [c1,c2,c3]
SEN-EAST -> 3 [c1,c2,c3]
Example 2-1: Using mapsid to Obtain a List of Sensors

2.3.2 Types of Enterprise Network Traffic

In SiLK, the term type refers to the direction of traffic, rather than a content-based characteristic. In the generic implementation (as shown in Figure 2.2), there are six basic types:

• in and inweb, which is traffic coming from the ISP to the enterprise network through the border router (Web traffic is separated out, due to its volume);
• innull, which is traffic from the upstream ISP that is not passed across the border router (either sent to the router's IP address, or dropped due to a router access control list);
• out and outweb, which is traffic coming from the enterprise network to the ISP through the border router; and
• outnull, which is traffic from the enterprise network that is not passed across the border router.

These types are configurable, and configurations vary as to which types are in actual use; see the discussion below on sensor class and type. There is also a constructed type all that selects all types of flows associated with a class of sensors.

2.3.3 The Collection System and Data Management

To understand how to use SiLK for analysis, it is useful to have some understanding of how data is collected, stored, and managed. Understanding how the data is partitioned can produce faster queries by reducing the amount of data searched. In addition, by understanding how the sensors complement each other, it is possible to gather traffic data even when a specific sensor has failed.

Data collection starts when a flow is generated by one of the sensors, either a router or a dedicated sensor. Flows are generated when a packet relevant to the flow is seen, but a flow is not reported until it is complete or is flushed from the cache. Consequently, a flow can be seen some time (depending on timeout configuration, and on sensor caching, among other factors) after the start time of the first packet in the flow.

Data generated through dedicated sensors, as well as data from other routers, is sent to the central SiLK repository using transfer facilities called FloCap (flow capacitor). FloCap technology improves the reliability of flow transfer and prioritizes the flows that are sent to the repository in the case of an emergency. The primary focus of FloCap is to ensure that routed data arrives in as complete a form as possible. Once data is received by the repository, it is packed into the reduced format by the packing software.²

Packed flows are stored into files indicated by class, type, sensor, and hour in which the flow started. So a sample path to a file could be /data/all/in/2005/11/01/allin-SEN1_20051101.15 for traffic coming from the ISP through the border router on November 1, 2005 for flows starting between 3:00 and 3:59 p.m. Greenwich Mean Time (GMT).

Important Considerations When Accessing Flow Data

While SiLK allows rapid access and analysis of network traffic data, the amount of data crossing the enterprise network could be extremely large. There are a variety of techniques intended to optimize the queries, and this section will go over some general guidelines for more rapid data analysis.

Usually, the amount of data associated with any particular event is relatively small. All the traffic from a particular workstation or server may be recorded in a few thousand records at most for a given day. Most of the time spent in an initial query involves simply pulling and analyzing the relevant records. As a result, query time can be reduced by simply manipulating the selection parameters, in particular --type, --start-date, --end-date, and --sensor. If it is known when a particular event occurred, then reducing the search time by using the hour facilities of --start-date and --end-date will increase efficiency (i.e., --start-date=2005/11/01:12 --end-date=2005/11/01:14 is more efficient than --start-date=2005/11/01:00 --end-date=2005/11/01:23).

Another useful, but less certain, technique is to limit queries by sensor. Since routing is relatively static, the same IP address will generally enter or leave through the same sensor, which can be derived by using rwuniq --fields=sensor (see Section 3.7) and a short (1 hour) probe on the data to identify which sensors are associated with a particular IP address. This technique is especially applicable for long (such as multimonth) queries, and usually requires some interaction, since rerouting does occur during normal operation. To use this technique for long queries, start by identifying the sensors using rwuniq, query for some extensive period of time using those sensors, and then plot the results using rwcount. If an analyst sees a sudden drop in traffic from those sensors, the analyst should check the data around the time of this drop to see if traffic was routed through a different sensor.
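The benefit of narrowing the time range can be made concrete by counting the hourly partitions a query has to read. The sketch below is plain Python, not a SiLK tool; the hours_covered function is hypothetical, and it assumes that every hour from --start-date through --end-date inclusive corresponds to one hourly file per sensor and type.

```python
from datetime import datetime, timedelta

HOUR_FORMAT = "%Y/%m/%d:%H"   # the YYYY/MM/DD:HH form used by the parameters


def hours_covered(start_date, end_date):
    """List every hour from start_date through end_date, inclusive."""
    current = datetime.strptime(start_date, HOUR_FORMAT)
    end = datetime.strptime(end_date, HOUR_FORMAT)
    hours = []
    while current <= end:
        hours.append(current.strftime(HOUR_FORMAT))
        current += timedelta(hours=1)
    return hours


narrow = hours_covered("2005/11/01:12", "2005/11/01:14")
wide = hours_covered("2005/11/01:00", "2005/11/01:23")
print(len(narrow), len(wide))
```

Under that assumption, the narrow query touches one eighth of the hourly partitions that the whole-day query does, before any savings from restricting --type or --sensor.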

2.3.4

How Network-Flow Data Is Organized

The data repository is accessed through the use of SiLK tools, particularly the rwfilter command-line application. An analyst using rwfilter should specify the data to view by using a set of five selection parameters. This handbook discusses selection parameters in more depth in Section 3.2; this section briefly outlines how data is stored in the repository.

2 The traffic between FloCap and the repository is not excluded from collection by flow sensors, but unless multiple levels of sensors are being used within the Enterprise Architectures, it occurs in a way that will not pass a sensor.


Dates

Repository data is stored in hourly divisions, which are referred to in the form YYYY/MM/DD:HH in Greenwich Mean Time. Thus, 11 a.m. on May 23, 2005, in Pittsburgh would be referred to as 2005/05/23:15 when compensating for the difference between Greenwich Mean Time and Eastern Daylight Time. In general, data for a particular hour starts being recorded at that hour, and its file will be written to until some time after the end of the hour. Under ideal conditions, the last long-lived flows will be written to the file soon after they time out (e.g., if the active timeout is 30 minutes, the flows will be written out 30 minutes plus propagation time after the end of the hour). Under adverse network conditions, however, flows could accumulate on the sensor under FloCap until they can be delivered. So, we would expect that under normal conditions the file for 2005/03/22 20:00 GMT would have data starting at 3 p.m. in Pittsburgh and would stop being updated after 4:30 p.m. in Pittsburgh.

Sensors: Class and Type

Data is divided by time and by sensor. The classes of sensors that are available are determined by the installation. By default, there is only one class (all), but other classes may be configured as needed based on analytical interest. As shown in Figure 2.2, each class of sensor has several types of traffic associated with it: typically in, inweb, out, and outweb. To find out what classes and types are supported by the installation, look at the output of rwfilter --help that describes --class and --type.

Data types are used for two reasons: (1) they group data together into common directions, and (2) they split off major query classes. As shown in Figure 2.2, most data types have a companion web type (i.e., in, inweb, out, outweb). Web traffic generally constitutes about 50% of the flows in any direction; by splitting the web traffic into a separate type, we reduce query time. Most queries to repository data access one class of data at a time, but multiple types.
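The GMT conversion described under Dates can be sketched in a few lines of Python. This is a minimal illustration rather than a SiLK facility: the function name repository_hour is hypothetical, and the fixed offsets for Eastern Daylight Time (UTC-4) and Eastern Standard Time (UTC-5) are written out by hand so that the sketch does not depend on a time zone database.

```python
from datetime import datetime, timedelta, timezone

# Fixed offsets, spelled out explicitly for this sketch.
EDT = timezone(timedelta(hours=-4), "EDT")
EST = timezone(timedelta(hours=-5), "EST")


def repository_hour(local_time):
    """Return the YYYY/MM/DD:HH repository hour (GMT) for an aware datetime."""
    if local_time.tzinfo is None:
        raise ValueError("local_time must carry a time zone")
    return local_time.astimezone(timezone.utc).strftime("%Y/%m/%d:%H")


# 11 a.m. in Pittsburgh on May 23, 2005 (daylight time).
print(repository_hour(datetime(2005, 5, 23, 11, tzinfo=EDT)))
# 3 p.m. in Pittsburgh on March 22, 2005 (standard time).
print(repository_hour(datetime(2005, 3, 22, 15, tzinfo=EST)))
```

The two calls reproduce the hours quoted in the text, 2005/05/23:15 and 2005/03/22:20.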

2.4

SiLK support

The SiLK tool suite is available in open-source form from http://tools.netsa.cert.org/silk/. The CERT Network Situational Awareness group also supports FloCon, a workshop devoted to flow analysis. More information on FloCon can be found at http://www.cert.org/flocon. The primary SiLK mailing lists are described below:

[email protected]: silk-help is for bug reports and general inquiries related to SiLK. It provides relatively quick response from users and maintainers of the SiLK tool suite. While a specific response time cannot be guaranteed, silk-help has proved to be a valuable asset for bugs and usage issues.

[email protected]: FloCommunity is a community of analysts built on the core of the FloCon conference (http://www.cert.org/flocon). The initial focus is on flow-based network analysis, but the scope will likely expand naturally to cover other areas of network security analysis. The list is not focused exclusively on FloCon itself, though it will include announcements of FloCon events. The general philosophy of this email list and site is inclusive: we intend to include international participants from both research and operational environments. Participants may come from universities, corporations, government entities, and contractors. Additional information is accessible via the FloCommunity Web page (http://www.cert.org/flocommunity/).


Chapter 3

Essential SiLK Tools

This chapter describes analyses with the six fundamental SiLK tools: rwfilter, rwstats, rwcount, rwcut, rwsort, and rwuniq. These tools are introduced through example analyses, with their more general usage briefly described. At the end of this chapter, the analyst will

• be able to use rwfilter to select records
• understand the basic partitioning parameters, including how to express IP addresses, times, and ports
• be able to perform and display basic analyses using the SiLK tools and a shell scripting language

3.1

Suite Introduction

The SiLK analysis suite consists of more than 30 command-line Unix tools that rapidly process flow records. The tools can intercommunicate with each other and with scripting tools via pipes; redirection is supported using both stdin/stdout and named pipes. Flow analysis is generally input/output bound: the amount of time required to perform an analysis is proportional to the amount of data read from disk. The primary goal of the SiLK tool suite is to reduce that access time to a minimum.

The SiLK tools replicate many standard functions from command-line tools that are common to the UNIX operating system, and from higher-level scripting languages such as Perl. However, the SiLK tools process this data in binary form and use data structures optimized specifically for analysis. Consequently, most SiLK analysis consists of a sequence of operations using the SiLK tools. These operations typically start with an initial rwfilter call to retrieve data of interest, and culminate in a final call to a text output tool like rwcut or rwuniq to summarize the data for presentation. Once text is generated, the analyst can create and run scripts on that text output at a much higher speed than would be possible if the text were generated at an earlier stage of the analysis.

In some ways, it is appropriate to think of SiLK as an awareness toolkit. The repository provides large volumes of data and the tool suite provides the capabilities needed to process this data, but the actual insights are derived from analysts.

3.2

Selecting Records with rwfilter

rwfilter is the most used command in the SiLK analysis tool suite. It serves as the starting point for most analyses (as will be seen in the examples that follow). It both retrieves data and partitions data to isolate flow records of interest. It also has the most parameters (by far) of any command in the SiLK tool suite. These parameters have grown as the tool has matured, driven by users' needs for more expressiveness in record selection.

Most of the time, rwfilter is used in conjunction with other analysis tools. However, it is also a very useful analytical tool on its own. As a simple example, consider Example 3-1, which uses rwfilter to print volume information on traffic from the enterprise network to an external network of interest over an eight-hour period.1 The results show that the enterprise network sent 3,288 flows to the external network, covering an aggregate of 16,316 packets containing a total of 968,011 bytes. Over time, an analyst can use calls like this to track traffic to the external network.

    $ rwfilter --type=out --start-date=2010/08/02:00 \
          --end-date=2010/08/02:07 --daddress=10.5.0.0/16 --print-volume-stat
         |        Recs|     Packets|        Bytes|  Files|
    Total|      515359|     2722887|   1343819719|    180|
     Pass|        3288|       16316|       968011|       |
     Fail|      512071|     2706571|   1342851708|       |

Example 3-1: Using rwfilter to Count Traffic to an External Network

Although parameters may occur in any order, a high-level view of the rwfilter command is

    rwfilter [input] [selection] [partition] [output] [other]

Figure 3.1 shows a high-level abstraction of the control flows in rwfilter, as affected by its different parameters. Input parameters specify whether to pull flow records from a pipe, from record files, or (the default) from the repository. When pulling from the repository, selection parameters specify which parts of the repository to pull records from. Each source accessed to pull records can be listed to standard error using --print-filenames. When pulling from a pipe or file, a restricted set of selection parameters can be used as partitioning parameters. The main effort in composing calls to rwfilter lies in the specification of records via partitioning parameters, and rwfilter supports a very rich library of these parameters. Once records are partitioned, those meeting or failing to meet the specified criteria can be sent either to a pipe or a file via the output parameters. Lastly, there are other parameters (such as --help) that can give useful information but do not access flow records.

1 The command and its results have been anonymized to protect the privacy of the enterprise network.


Figure 3.1: rwfilter Parameter Relationships
[Diagram labels: input parameters (pipe, file, repository; --print-filenames); selection parameters (--class, --type, --sensor, --flowtypes); partitioning parameters; output parameters (pipe, file); other parameters]

A simple example is the call to rwfilter in the initial example presented (Example 3-1). That call uses selection parameters to access all outgoing records in the default class that describe flows that started between 00:00:00 and 07:59:59 GMT on August 2, 2010. The --daddress parameter is the partitioning parameter, and the --print-volume-stat parameter is the output parameter.
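As a rough illustration of the pass/fail split that Example 3-1 reports, the Python sketch below applies a --daddress style CIDR test to a handful of made-up records and tallies Total, Pass, and Fail rows. Everything here is an assumption for illustration: the Flow tuple, the volume_stat function, and the three sample records are hypothetical and do not reflect SiLK's record format or implementation.

```python
import ipaddress
from typing import NamedTuple


class Flow(NamedTuple):
    """Toy flow record holding only the fields this sketch needs."""
    daddr: str
    packets: int
    nbytes: int


def volume_stat(flows, daddress_cidr):
    """Tally [records, packets, bytes] for all, passing, and failing flows."""
    network = ipaddress.ip_network(daddress_cidr)
    rows = {"Total": [0, 0, 0], "Pass": [0, 0, 0], "Fail": [0, 0, 0]}
    for flow in flows:
        # A record passes when its destination lies inside the CIDR block.
        inside = ipaddress.ip_address(flow.daddr) in network
        for name in ("Total", "Pass" if inside else "Fail"):
            rows[name][0] += 1
            rows[name][1] += flow.packets
            rows[name][2] += flow.nbytes
    return rows


sample = [
    Flow("10.5.1.9", 4, 240),
    Flow("10.6.0.1", 10, 5200),
    Flow("10.5.200.3", 1, 60),
]
for name, counts in volume_stat(sample, "10.5.0.0/16").items():
    print(name, counts)
```

The same relationship holds in Example 3-1: the Pass and Fail rows sum to the Total row in every column (for instance, 3288 + 512071 = 515359 records).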

3.2.1

rwfilter Parameters

Input parameters (described in Table 3.1) specify from where rwfilter obtains flow records: from the repository, from a pipe, or from flow record files. This example implicitly uses the default parameter, --data-rootdir, with its default argument (set by configuration) to pull from the repository. (Later examples will show other input parameters.) rwfilter can take input from zero or more previously generated flow record files. If a common set of input files is used several times, use the --xargs parameter, putting the list of input file names into a text file with one name per line. Calling rwfilter with zero flow record files requires that one of the other input options be specified.

Table 3.1: rwfilter Input Parameters

Parameter        Example       Description
--input-pipe     stdin         Read SiLK flow records from a pipe
--data-rootdir   /data         Root of data repository (default)
--xargs          mylist.txt    File holding list of filenames to pull records from
(none)           infile.raw    Name of file containing previously extracted data

Selection parameters (described in Table 3.2) are used when rwfilter pulls data from the repository, to specify which part of the repository to pull the data from. In Example 3-1, the call to rwfilter uses three selection parameters: --start-date, --end-date, and --type (--class is left to its default value, which in many implementations is all; --sensor is also left to its default value, which is all sensors of the class). The --start-date and --end-date parameters specify that this pull applies to eight hours' worth of traffic: 00:00:00 GMT to 07:59:59 GMT on August 2, 2010 (the parameters to --start-date and --end-date are inclusive and may be arbitrarily far apart, depending on what dates are present in the repository, although neither may be set beyond the current date and time). The --type parameter specifies that outgoing general flow records are to be pulled within the specified time range. Each unique combination of selection parameters (root directory, class, type, sensor, and time) maps to one or more flow record files in the repository (depending on the number of hours included in the time). In this example, 180 files are accessed.

Specifying more selection parameters results in less data being examined and thus faster queries. Be sure to understand what traffic is included in each available class and type, and to include all relevant types in any query, but exclude as many irrelevant types as possible to improve performance. --flowtypes is used to specify queries across multiple classes, while restricting the types of interest on each class. Use this parameter carefully, as it is easy to specify a very large number of records to filter, which reduces performance.

Table 3.2: rwfilter Selection Parameters

Parameter      Example                Description
--start-date   2005/03/01:00          First hour of data to examine
--end-date     2005/03/20:23          Final hour of data to examine
--class        all                    Sensor class to select data within times
--type         inweb,in,outweb,out    Type of data within class and times
--flowtypes    c1/in,c2/all           Process data of specified classes and types
--sensor       1-5                    Sensor used to collect data

Partitioning parameters are used to divide the input records into two groups: (1) pass records, which meet all the tests specified by the partitioning parameters, and (2) fail records, which fail at least one of the tests specified by the partitioning parameters. Each call to rwfilter must have at least one partitioning parameter. In Example 3-1, flow records to a specific network are desired, so the call uses a --daddress parameter with an argument of CIDR block 10.5.0.0/16 (the specific network). Occasionally, all records for a given set of selection parameters are desired, so (by convention) an analyst uses --proto with an argument of 0-255, which is a test that can be met by all IP traffic, since this is the range allocated for IP protocols by IANA.2

Partitioning parameters are the most numerous, to provide a large amount of flexibility in describing what flow records are desired. Later examples will show several partitioning parameters. (See rwfilter --help for a full listing; a few of the more commonly used parameters are listed in Table 3.3.)

As shown in Figure 3.2, there are several groups of partitioning parameters. This section focuses on the parameters that partition based on fields of flow records. Section 4.5 discusses IP Sets and how to filter with those sets. Section 4.8 describes pmaps and country codes. Section 3.2.6 discusses tuple files and the parameters that use them. The use of dynamic libraries is dealt with in Section 4.9. Lastly, Section 5.1 describes the use of PySiLK plug-ins.

Table 3.3: Commonly-Used rwfilter Partitioning Parameters

Parameter       Example        Description
--protocol      6              Which protocol number (6=TCP, 17=UDP, 1=ICMP) to filter
--packets       1-3            Filter flow records that are in the specified range of packet counts
--flags-all     R/SRF          Filter flow records that have the specified flags set and not set (TCP only)
--saddress      10.2.1.3,237   Filter flow records for source address
--daddress      10.2.1.3-5     Like --saddress, but for destination
--any-address   10.2.1.x       Like --saddress, but for either source or destination
--sport         0-1023         Filter flow records for source port
--dport         25             Like --sport, but for destination port
--aport         80,8080        Like --sport, but for either source or destination

Partitioning parameters specify a collection of acceptable options, such as the protocols 6 and 17 or the specific IP address 10.1.23.14. As a result, almost all partitioning parameters describe some group of values. These groups are generally expressed in the following ways:

Value range: Value ranges are used when all values in a closed interval are desired. A value range is two numbers separated by a dash, such as --proto=3-65, which indicates that flow records with protocol numbers from 3 through 65 (inclusive) are desired. Some partitioning parameters (such as --packets) demand a value range; if only a single value is desired, use the value on both sides of the dash (--packets=5-5). A missing value on the end of the range (e.g., --bytes=2048-) specifies that any value greater than or equal to the other value is desired. Missing values at the start of a range are not permitted.

Value alternatives: Fields that have a finite set of values (such as ports or protocol) can be expressed using a comma-separated list. In this format a field is expressed as a set of numbers separated by commas. When only one value is acceptable, that value is presented without a comma. Examples include --proto=3 and --proto=3,9,12. Value ranges can be used as elements of value alternative lists. For example, --proto=0,2-5,7-16,18-255 says that all flow records that are not for ICMP, TCP, or UDP traffic are desired.

Time ranges: Time ranges are two times, potentially down to the millisecond, separated by a dash; in SiLK, these times can be expressed in their full YYYY/MM/DD:HH:MM:SS.mmm form (e.g., 2005/02/11:03:18:00.005-2005/02/11:05:00:00.243). Times may be abbreviated with their natural interpretation: 2005/02/11 is equivalent to 2005/02/11:00:00:00.000.

IP addresses: IP addresses are expressed in two ways. The most common expression is a list of value alternatives, separated by appropriate punctuation as described in Section 1.1.3. For example, 1-13.1.1.1 would select the addresses 1.1.1.1, 2.1.1.1, and so on until 13.1.1.1. For convenience, the letter x can be used to indicate all values in a section (equivalent to 0-255 in IPv4 addresses, 0-FFFF in IPv6 addresses). CIDR notation may also be used, so 1.1.0.0/16 is equivalent to 1.1.x.x and 1.1.0-255.0-255. As explained in Section 1.1.3, IPv6 addresses use a double-colon syntax as a shorthand for any sequence of zero values in the address, as well as CIDR notation.

TCP flags: The --flags-all, --flags-session, and --flags-initial parameters to rwfilter use a compact, yet powerful, way of specifying filter predicates based on the state of the TCP flags. The argument to each of these parameters has two sets of TCP flags separated by a forward slash (/). The flag-set to the right of the slash contains the mask; this set lists the flags whose status is of interest, and the set must be non-empty. To the left of the slash is the high flag-set; it lists the flags that must be set for the flow record to pass the filter. Flags listed in the mask-set but not in the high-set must be off. The flags listed in the high-set must be present in the mask-set. (For example, --flags-initial=S/SA specifies a filter for flow records that initiate a TCP session.) See Example 3-2 for another sample use of this parameter.

Country codes: The --scc and --dcc parameters take a comma-separated list of two-letter country codes, as specified by the Internet Assigned Numbers Authority.3 There are also four special codes: -- for unknown, a1 for anonymous proxy, a2 for satellite provider, and o1 for other.

2 See http://www.iana.org/assignments/protocol-numbers; in IPv4 this is the protocol field in the header, but in IPv6 this is the next-header field; both have the range 0-255.

3 http://www.iana.org/domains/root/db/
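The value-range, value-alternative, and dotted IPv4 wildcard notations described above can be modeled with a short Python sketch. This is not SiLK's parser: the names parse_ranges, in_ranges, and ipv4_wildcard_match are hypothetical, the maximum for an open-ended range is supplied by the caller as an assumption, and CIDR and IPv6 forms are left out.

```python
def parse_ranges(spec, maximum):
    """Turn '0,2-5,18-255' or '2048-' into inclusive (low, high) pairs."""
    ranges = []
    for part in spec.split(","):
        low, dash, high = part.partition("-")
        if not low:
            # A missing value at the start of a range is not permitted.
            raise ValueError(f"missing start of range in {part!r}")
        start = int(low)
        if not dash:
            end = start          # single value, e.g. '3'
        elif high:
            end = int(high)      # closed range, e.g. '3-65'
        else:
            end = maximum        # open-ended range, e.g. '2048-'
        if start > end or end > maximum:
            raise ValueError(f"invalid range {part!r}")
        ranges.append((start, end))
    return ranges


def in_ranges(value, ranges):
    """True when value falls inside any of the inclusive ranges."""
    return any(low <= value <= high for low, high in ranges)


def ipv4_wildcard_match(pattern, address):
    """Match a dotted IPv4 address against a pattern such as '1-13.1.1.1'."""
    pattern_octets = pattern.split(".")
    address_octets = address.split(".")
    if len(pattern_octets) != 4 or len(address_octets) != 4:
        raise ValueError("expected four dot-separated octets")
    for spec, octet in zip(pattern_octets, address_octets):
        # 'x' stands for every value in the octet (0-255).
        ranges = [(0, 255)] if spec.lower() == "x" else parse_ranges(spec, 255)
        if not in_ranges(int(octet), ranges):
            return False
    return True


other_protocols = parse_ranges("0,2-5,7-16,18-255", 255)
print(in_ranges(6, other_protocols))    # protocol 6 (TCP) against the list
print(in_ranges(47, other_protocols))   # protocol 47 against the list
print(ipv4_wildcard_match("1-13.1.1.1", "13.1.1.1"))
print(ipv4_wildcard_match("1.1.x.x", "1.1.77.200"))
```

Keeping ranges as (low, high) pairs rather than expanding them into sets means an open-ended argument such as 2048- stays cheap to represent, however large the field's maximum is.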


Figure 3.2: rwfilter Partitioning Parameters
[Diagram labels: Flow Record Fields, IP Sets, User pmaps and Country Codes, Tuples, Dynamic Libs, PySiLK]

Attributes: The --attributes parameter takes any combination of the letters F, T, and C, expressed in high/mask notation just as for TCP flags. F indicates the collector saw additional packets after a packet with a FIN flag (other than those with FIN-ACK). T indicates the collector terminated the flow collection due to time out. C indicates the collector produced the flow record to continue flow collection that was terminated due to time out.

Output parameters to rwfilter specify what data should be returned from the call. There are five output parameters, as described in Table 3.4. Each call to rwfilter must have at least one of these parameters, and may have more than one. In Example 3-1, the --print-volume-stat parameter is used to count the flow records and their associated byte and packet volumes.

Table 3.4: rwfilter Output Parameters

Parameter      Example           Description
--pass         stdout            Send SiLK flow records matching partitioning parameters to pipe or file
--fail         faildata.raw      Like --pass, but for records failing to match
--all-dest     infile.raw        Like --pass, but all records
--print-stat                     Print count (default, to stderr) of records passing and failing
--print-vol    outflow-vol.txt   Print counts of flows/bytes/packets read, passing and failing to named file
--max-pass     20                Indicate maximum number of records to return as matching partitioning parameters
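The high/mask notation used for TCP flags, and reused by --attributes, reduces to a single set comparison: of the letters named in the mask, exactly those in the high set must be present. The Python sketch below models that rule under stated assumptions; the function name high_mask_match is hypothetical, observed flags are given as a string of single-letter codes, and no check is made that the letters are valid flag or attribute names.

```python
def high_mask_match(spec, observed):
    """Apply a high/mask test such as 'S/SA' to a string of observed letters."""
    high, slash, mask = spec.upper().partition("/")
    if not slash or not mask:
        raise ValueError("the mask set must be present and non-empty")
    high_set, mask_set = set(high), set(mask)
    if not high_set <= mask_set:
        raise ValueError("every letter in the high set must be in the mask set")
    # Only letters named in the mask matter; of those, exactly the
    # letters in the high set may be on.
    return (set(observed.upper()) & mask_set) == high_set


print(high_mask_match("S/SA", "S"))     # SYN on, ACK off
print(high_mask_match("S/SA", "SA"))    # ACK is on but must be off
print(high_mask_match("R/SRF", "RA"))   # ACK is outside the mask, so it is ignored
```

With S/SA, a record showing only SYN passes and a record showing SYN and ACK fails, which matches the description of --flags-initial=S/SA as selecting records that initiate a TCP session.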

One of the most useful tools available for in-depth analysis is the drilling-down capability provided by using rwfilter parameters --pass and --fail. Most analysis will involve identifying an area of interest (all the IPs that communicate with address X, for example) and then combing through that data. Rather than pulling down the same base query repeatedly, store the data to a separate data file using the --pass switch. Occasionally, it is more convenient to describe the data not wanted than the desired data. The --fail switch allows saving data that does not match the specified conditions. Section 3.2 provides more information about these switches and explains how to select records.

To help improve query efficiency when only a few records are needed, the --max-pass parameter allows the analyst to specify the maximum number of records to return via the path specified by the --pass parameter. In multiprocessor installations, this is interpreted as the number per processor. In single-processor installations, even if multiple threads are used, this is interpreted as the maximum number overall.

Other parameters are miscellaneous parameters to rwfilter that have been found to be useful in analysis or in maintaining the repository. These are somewhat dependent on the implementation, and they include those described in Table 3.5. None of these parameters are used in the example, but at times, these are quite useful.

Table 3.5: Other Parameters

Parameter           Description
--dry-run           Check parameters for legality without actually processing data
--help              Print description of rwfilter and its parameters
--print-filenames   Print name of each input file as it is processed
--print-missing     Print names of missing input files to stderr
--version           Print version of rwfilter being used
--threads           Specify number of threads to be used in filtering
--ip-version        Specify whether IPv6 or IPv4 (the default) will be used
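The drilling-down workflow described above (save a base query once with --pass, then refine the saved file) can be outlined as two command lines. The Python sketch below only assembles and prints them; it does not run rwfilter. The file names base.raw, smtp.raw, and other.raw, the address 10.2.1.3, and the dates are placeholders chosen for illustration, and the switches are limited to ones named in this section.

```python
import shlex

# Step 1: pull the base data from the repository once and keep it.
base_query = [
    "rwfilter",
    "--type=in,inweb",
    "--start-date=2005/11/01:12",
    "--end-date=2005/11/01:14",
    "--any-address=10.2.1.3",
    "--pass=base.raw",
]

# Step 2: refine the saved file instead of re-reading the repository,
# keeping both the matching and the non-matching records.
drill_down = [
    "rwfilter",
    "base.raw",
    "--dport=25",
    "--pass=smtp.raw",
    "--fail=other.raw",
]

for argv in (base_query, drill_down):
    print(shlex.join(argv))
```

On a host with SiLK installed, each list could be handed to subprocess.run(argv, check=True); keeping the arguments as lists avoids shell quoting problems.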

The --threads parameter takes an integer scalar N to specify using N threads to read input files for filtering. The default value is 1, or the value of the SILK_RWFILTER_THREADS environment variable if that is set. Using multiple threads is preferable for queries that look at many files but return few records. Current experience is that performance peaks at about four threads per CPU on the host running the filter, but this result varies with the type of query and the number of records returned from each file.

The --ip-version parameter is useful if your collection structure includes both IPv6 and IPv4 data. If only one version is present, the SiLK configuration will set the appropriate default. If both are present, then this parameter allows the tools to process either IPv4 or IPv6 data. The argument is a single integer (either 4 or 6). See Example 3-3 for a sample call to rwfilter using this parameter.

3.2.2

Finding Low-Packet Flows with rwfilter

The TCP state machine is complex (see Figure 1.4), and legitimate service requests require a minimum of four packets. There are several types of illegitimate traffic (such as port scans and responses to spoofed-address packets) that involve TCP flow records with low numbers of packets. Occasionally, there are legitimate TCP flow records with l