By John McHugh, presented by Hongyu Gao, Feb. 5, 2009
Testing Intrusion Detection Systems: A Critique of the 1998 and 1999 DARPA Intrusion Detection System Evaluations as Performed by Lincoln Laboratory
By John McHugh
Presented by Hongyu Gao
Feb. 5, 2009
Outline
  Lincoln Lab’s evaluation in 1998
  Critique of data generation
  Critique of taxonomy
  Critique of evaluation process
  Brief discussion on 1999 evaluation
  Conclusion
The 1998 evaluation
The most comprehensive evaluation of research on intrusion detection systems that has been performed to date
The 1998 evaluation, cont’d
Objectives:
  “To provide unbiased measurement of current performance levels.”
  “To provide a common shared corpus of experimental data that is available to a wide range of researchers”
The 1998 evaluation, cont’d
Simulated a typical air force base network
The 1998 evaluation, cont’d
Collected synthetic traffic data
The 1998 evaluation, cont’d
Researchers tested their systems using the traffic
Receiver operating characteristic (ROC) curves were used to present the results
1. Critique of data generation
Both background (normal) and attack data are synthesized.
Said to represent traffic to and from a typical Air Force base.
For the evaluation to be meaningful, results on such synthesized data must reflect system performance in realistic scenarios.
Critique of background data
Counterpoint 1: real traffic is not well-behaved.
  E.g. spontaneous packet storms that are indistinguishable from malicious attempts at flooding.
  Such behavior is not considered in the background traffic.
Critique of background data, cont’d
Counterpoint 2: low average data rate
Critique of background data, cont’d
Possible negative consequences
  The system may produce a larger number of false positives (FP) in a realistic scenario.
  The system may drop packets in a realistic scenario.
Critique of attack data
The distribution of attacks is not realistic
  The numbers of attacks in the four categories (U2R, R2L, DoS, Probing) are of the same order of magnitude
Category  U2R  R2L  DoS  Probing
Attacks   114  34   99   64
Critique of attack data, cont’d
Possible negative consequences
  The aggregate detection rate does not reflect the detection rate in real traffic
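To see why the attack mix matters, here is a minimal sketch. The evaluation counts are the ones listed on the attack data slide; the per-category detection rates and the "field" mix are hypothetical numbers chosen only for illustration, not results from the evaluation or the paper.

```python
# Attack counts per category in the evaluation data (from the slide).
eval_counts = {"U2R": 114, "R2L": 34, "DoS": 99, "Probing": 64}

# HYPOTHETICAL per-category detection rates for some IDS (illustration only).
detection_rate = {"U2R": 0.8, "R2L": 0.5, "DoS": 0.25, "Probing": 0.25}

# HYPOTHETICAL attack mix for a real network dominated by probes.
field_counts = {"U2R": 1, "R2L": 4, "DoS": 15, "Probing": 980}


def aggregate_detection_rate(counts, rates):
    """Detected attacks divided by total attacks (a count-weighted mean)."""
    total = sum(counts.values())
    detected = sum(counts[c] * rates[c] for c in counts)
    return detected / total


eval_rate = aggregate_detection_rate(eval_counts, detection_rate)
field_rate = aggregate_detection_rate(field_counts, detection_rate)
print(f"evaluation mix: {eval_rate:.3f}")
print(f"field mix:      {field_rate:.3f}")
```

With identical per-category behavior, the aggregate figure changes only because the category weights change, which is the concern raised on this slide.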
Critique of the simulated AFB network
Not likely to be realistic
  4 real machines
  3 fixed attack targets
  Flat architecture
Possible negative consequences
  An IDS can be tuned to look only at traffic targeting certain hosts
  Precludes the execution of “smurf” or ICMP echo attacks
2. Critique of taxonomy
Based on the attacker’s point of view
  Denial of service
  Remote to user
  User to root
  Probing
Not useful for describing what an IDS might see
Critique of taxonomy, cont’d
Alternative taxonomies
  Classify by protocol layer
  Classify by whether a completed protocol handshake is necessary
  Classify by severity of attack
  Many others…
3. Critique of evaluation
The unit of evaluation
  The session is used
  Some traffic (e.g. messages originating with Ethernet hubs) is not in any session
  Is “session” an appropriate unit?
Critique of evaluation, cont’d
Scoring and ROC
  Denominator?
Critique of evaluation, cont’d
A non-standard variation of ROC
  Substitutes false alarms per day for the x-axis
Possible problem
  The number of false alarms per unit time may increase significantly as the data rate increases
Suggested alternatives
  Use the total number of alerts (both TP and FP)
  Use the standard ROC
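The difference between the two x-axes can be sketched as follows. All counts, the 10-day test length, and the 20x traffic scale are hypothetical, and the sketch assumes the detector's per-session error rates stay the same when traffic grows.

```python
def roc_point(tp, fn, fp, tn):
    """Standard ROC coordinates: (false positive rate, true positive rate)."""
    fpr = fp / (fp + tn)
    tpr = tp / (tp + fn)
    return fpr, tpr


def false_alarms_per_day(fp, days):
    """x-axis of the non-standard variant: false alarms per day."""
    return fp / days


# HYPOTHETICAL confusion counts for one detector threshold over a 10-day test.
days = 10
tp, fn, fp, tn = 80, 20, 50, 49_950

# Same detector on a HYPOTHETICAL network with 20x the benign traffic,
# assuming its per-session error rates are unchanged.
scale = 20
fp_busy, tn_busy = fp * scale, tn * scale

fpr_base, tpr_base = roc_point(tp, fn, fp, tn)
fpr_busy, tpr_busy = roc_point(tp, fn, fp_busy, tn_busy)
alarms_base = false_alarms_per_day(fp, days)
alarms_busy = false_alarms_per_day(fp_busy, days)

print("standard ROC x-axis (FPR):", fpr_base, "->", fpr_busy)
print("false alarms per day:     ", alarms_base, "->", alarms_busy)
```

Under these assumptions the standard ROC point does not move, while the false-alarms-per-day coordinate grows in proportion to the traffic volume, so curves drawn on that axis are tied to the data rate of the test network.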
Evaluation on Snort
Evaluation on Snort, cont’d
Poor performance on DoS and Probe
Good performance on R2L and U2R
Conclusion on Snort:
  The evidence is not sufficient to draw any conclusion
Critique of evaluation, cont’d
False alarm rate
  A crucial concern
  The designated maximum value (0.1%) is inconsistent with the maximum operator load set by Lincoln Lab (100/day)
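A little arithmetic makes the tension between the two limits concrete. In this sketch the two limits come from the slide; the daily session volumes are hypothetical, and reading the 0.1% figure as a per-session false alarm rate is an assumption.

```python
MAX_FALSE_ALARM_RATE = 0.001  # 0.1%, from the slide
MAX_ALARMS_PER_DAY = 100      # operator load limit, from the slide

# Daily volume at which the two limits coincide.
break_even_sessions = MAX_ALARMS_PER_DAY / MAX_FALSE_ALARM_RATE


def alarms_per_day(sessions_per_day, false_alarm_rate=MAX_FALSE_ALARM_RATE):
    """Expected false alarms per day at a given per-session false alarm rate."""
    return sessions_per_day * false_alarm_rate


for sessions in (10_000, 1_000_000):  # HYPOTHETICAL daily volumes
    load = alarms_per_day(sessions)
    verdict = "within" if load <= MAX_ALARMS_PER_DAY else "over"
    print(f"{sessions:>9} sessions/day -> {load:.0f} false alarms/day "
          f"({verdict} the operator limit)")
```

The two limits agree only at one particular traffic volume; above it, a system that meets the rate limit still exceeds the stated operator load.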
Critique of evaluation, cont’d
Does the evaluation result really mean something?
  The ROC curve reflects the ability to detect attacks against normal traffic
What does a good IDS consist of?
  Algorithm
  Reliability
  Good signatures
  …
Brief discussion on 1999 evaluation
Some superficial improvements
  Additional hosts and host types are added
  New attacks are added
None of these addresses the flaws listed above
Brief discussion on 1999 evaluation, cont’d
Security policy is not clear
  What is an attack, and what is not?
  Scan, probe
Conclusion
The Lincoln Lab evaluation is a major and impressive effort.
This paper critiques several aspects of the evaluation.
Follow-up Work
DETER - Testbed for network security technology.
  Public facility for medium-scale repeatable experiments in computer security.
  Located at USC ISI and UC Berkeley.
  300 PC systems running Utah's Emulab software.
  Experimenters can access DETER remotely to develop, configure, and manipulate collections of nodes and links with arbitrary network topologies.
  The current problem is that there is no realistic attack module or background noise generator plugin for the framework. Attack distribution is a problem.
PREDICT - A huge trace repository. It is not public, and there are several legal issues in working with it.
Follow-up Work
KDD Cup - Its goal is to provide datasets from real-world problems to demonstrate the applicability of different knowledge discovery and machine learning techniques.
  The 1999 KDD intrusion detection contest uses a labelled version of this 1998 DARPA dataset, annotated with connection features.
  There are several problems with KDD Cup.
  Recently, people have found average TCP packet size to be the best correlation metric for attacks, which clearly points to the inefficacy of the dataset.
Discussion
Can the aforementioned problems be addressed?
  Dataset
  Taxonomy
  Unit for analysis
  Approach to compare between IDSes
  …
The End
Thank you