
Using Identity Credential Usage Logs to Detect Anomalous Service Accesses Daisuke Mashima Dr. Mustaque Ahamad College of Computing Georgia Institute of Technology Atlanta, GA, USA ACM DIM 2009, Chicago, IL, 2009


Page 1

Using Identity Credential Usage Logs to Detect Anomalous Service Accesses

Daisuke Mashima Dr. Mustaque Ahamad

College of Computing

Georgia Institute of Technology, Atlanta, GA, USA

ACM DIM 2009, Chicago, IL, 2009

Page 2

Increasing Risk of Identity Theft

• Variety of online identity credentials
  – Passwords, certificates, SSNs, credit card numbers, etc.
  – Loss and theft are possible and common

• Consequences of online identity theft
  – Impersonation
  – Disclosure of sensitive information
  – Financial loss

Page 3

To counter such threats…

• Online service providers are required to
  – Analyze huge amounts of log records to identify suspicious service accesses
  – Investigate the identified records extensively

• In reality…
  – Significant reliance on human experts
  – Records are not processed in real time

• An automated mechanism to monitor identity usage (service accesses) is desired.

Page 4

Outline

• Observations from real data sets

• Our approach

• Anomaly-based risk scoring scheme

• Preliminary evaluation

• Conclusion / Future Work

Page 5

Buzzport Access Log

Page 6

Buzzport Access Log

• Contains only
  – (Anonymized) user ID
  – Login timestamp
  – Logout timestamp

380484533347391, 24/08/2007 14:07:05, 24/08/2007 14:18:46
380484533347391, 27/08/2007 08:01:14, 27/08/2007 08:02:54
380484533347391, 27/08/2007 08:04:36, 27/08/2007 08:16:05
380484533347391, 27/08/2007 12:05:36, 27/08/2007 12:18:15
380484533347391, 31/08/2007 14:31:43, 31/08/2007 14:38:08
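Each record above can be turned into the categorical attributes used later in the talk (week of month, day of week, hour of day). The following is a minimal Python sketch, assuming the day/month/year order shown in the sample and deriving every attribute from the login timestamp only. The helper name `record_attributes` and the week-of-month rule (days 1 to 7 map to week 1, ..., days 29 to 31 map to week 5) are illustrative assumptions, not taken from the talk.

```python
from datetime import datetime

# Assumed layout of the anonymized Buzzport sample: "user_id, login, logout",
# with timestamps written as day/month/year hour:minute:second.
TIMESTAMP_FORMAT = "%d/%m/%Y %H:%M:%S"


def record_attributes(line: str) -> dict:
    """Derive categorical attributes from the login timestamp of one record.

    Hypothetical helper: the week-of-month rule is an assumption that yields
    5 categories (days 1-7 -> 1, ..., days 29-31 -> 5).
    """
    user_id, login, _logout = (field.strip() for field in line.split(","))
    ts = datetime.strptime(login, TIMESTAMP_FORMAT)
    return {
        "user": user_id,
        "week_of_month": (ts.day - 1) // 7 + 1,  # 1..5
        "day_of_week": ts.weekday(),             # 0 = Monday .. 6 = Sunday
        "hour_of_day": ts.hour,                  # 0..23
    }


sample = "380484533347391, 24/08/2007 14:07:05, 24/08/2007 14:18:46"
print(record_attributes(sample))
# {'user': '380484533347391', 'week_of_month': 4, 'day_of_week': 4, 'hour_of_day': 14}
```

The logout timestamp is parsed but unused here; session duration would be another candidate attribute, though the experiments described later use only the three timestamp-derived ones.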

Page 7

Another data set

• Log records from the portal of an online trading company

• The following items are available:
  – User ID
  – Coarse action type (login / logout)
  – Timestamp
  – IP address
  – Organization name, etc.

Page 8

Observations and Considerations

• Available information is quite limited.
  – Typical fraud detection systems rely on much richer information.

• Data are not labeled.
  – Supervised techniques cannot be used.

• Only a limited set of event types can be observed.
  – Schemes relying on event sequences or state transitions have limited applicability.

Page 9

Our Approach

• Use attributes derived from each individual identity usage record
  – Timestamp (day of week, etc.), IP address, etc.
  – Focus on categorical attributes

• Build a user profile based on the occurrence frequency of each attribute value

• Determine risk scores based on this frequency information

Page 10

User Profile Management

• Defined as a frequency distribution of attribute values (categories)
  – One profile per attribute
  – Multiple profiles can be defined per user
  – E.g., a day-of-week profile, an hour-of-day profile, and so forth

• Updated upon receipt of each log record
  – Simply increment the occurrence counters corresponding to the attribute values in the record

• Data aging can be easily implemented
  – Periodically multiply all counters by a decay factor
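The profile bookkeeping on this slide can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: `AttributeProfile` and its methods are hypothetical names, and storing counters as floats (so that data aging can scale them) is an assumption.

```python
from collections import defaultdict


class AttributeProfile:
    """Frequency profile for one categorical attribute of one user (sketch)."""

    def __init__(self) -> None:
        # Counters are floats so that data aging can scale them down.
        self.counts = defaultdict(float)

    def update(self, value) -> None:
        # Upon receipt of a log record, increment the counter for its value.
        self.counts[value] += 1.0

    def decay(self, factor: float) -> None:
        # Data aging: periodically multiply every counter by the decay factor.
        for value in self.counts:
            self.counts[value] *= factor

    def relative_frequency(self, value) -> float:
        # .get() avoids inserting unseen values into the defaultdict.
        total = sum(self.counts.values())
        return self.counts.get(value, 0.0) / total if total > 0 else 0.0


# One profile per attribute; a user may hold several (day of week, hour of day, ...).
hour_profile = AttributeProfile()
for hour in [14, 8, 8, 12, 14]:
    hour_profile.update(hour)
hour_profile.decay(0.5)
hour_profile.update(14)
print(dict(hour_profile.counts))  # {14: 2.0, 8: 1.0, 12: 0.5}
```

Because each update touches a single counter per attribute, the cost per record is constant, which is consistent with the real-time goal stated earlier.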

Page 11

Base Score and Weight

• Base score represents how unlikely an observed access is for the user.
  – BaseScore = -log(RelativeFrequency)

• Score weight quantifies the “effectiveness” of each attribute for profiling.
  – When an attribute characterizes a user’s identity usage pattern well, its weight should be high.

• How can we quantify it?
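The base score formula can be made concrete as follows. This is a hedged Python sketch: the function name `base_score` is hypothetical, and the additive smoothing parameter `alpha` is an assumption introduced only because -log(0) is undefined for a value the user has never produced; the talk does not say how zero frequencies are handled.

```python
import math


def base_score(counts: dict, value, num_categories: int, alpha: float = 1.0) -> float:
    """BaseScore = -log(RelativeFrequency) of `value` in one attribute profile.

    Assumption: additive smoothing with `alpha` keeps the score finite for
    unseen values; alpha=0 gives the plain formula, which raises ValueError
    (log of zero) for a value with a zero count.
    """
    total = sum(counts.values())
    rel_freq = (counts.get(value, 0.0) + alpha) / (total + alpha * num_categories)
    return -math.log(rel_freq)


# Made-up day-of-week profile (0 = Monday): six logins on each weekday, none on weekends.
weekday_counts = {0: 6, 1: 6, 2: 6, 3: 6, 4: 6}
print(round(base_score(weekday_counts, 0, 7), 3))  # Monday   -> 1.665
print(round(base_score(weekday_counts, 5, 7), 3))  # Saturday -> 3.611
```

Rarer values yield larger base scores, so a Saturday login from this user is scored as more unusual than a Monday login.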

Page 12

Score Weight

• Use the “distance” between the user’s frequency distribution and the uniform distribution as the weight
  – E.g., Bhattacharyya distance
  – Data aging is necessary.

[Figure: example hour-of-day profile, relative frequency vs. hour of day]
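The weight based on the Bhattacharyya distance to the uniform distribution can be sketched as below. It uses the standard definition D_B(p, q) = -ln(sum_i sqrt(p_i q_i)); the function name `bhattacharyya_weight` and the zero weight for an empty profile are assumptions, and the talk mentions Bhattacharyya distance only as one possible choice.

```python
import math


def bhattacharyya_weight(counts: dict, num_categories: int) -> float:
    """Score weight: Bhattacharyya distance from a user's profile to uniform.

    D_B(p, q) = -ln(sum_i sqrt(p_i * q_i)) with p_i = count_i / total and
    q_i = 1 / num_categories. Categories missing from `counts` have p_i = 0
    and contribute nothing to the sum.
    """
    total = sum(counts.values())
    if total == 0:
        return 0.0  # assumption: an empty profile gets zero weight
    q = 1.0 / num_categories
    coefficient = sum(math.sqrt((c / total) * q) for c in counts.values())
    # Clamp tiny negative values caused by floating-point rounding.
    return max(0.0, -math.log(coefficient))


uniform = {h: 1.0 for h in range(24)}  # accesses spread evenly over 24 hours
focused = {9: 30.0}                    # every access at 9 a.m.
print(round(bhattacharyya_weight(uniform, 24), 6))  # 0.0
print(round(bhattacharyya_weight(focused, 24), 3))  # 1.589
```

A uniform profile carries no information and gets weight 0, while a profile concentrated on a single category reaches the maximum 0.5 ln n. That maximum grows with the number of categories n, so in this sketch an hour-of-day profile (24 categories) can receive a larger weight than a week-of-month profile (5 categories); whether the original system normalizes for this is not stated.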

Page 13

Score Aggregation

• A sub score (the product of a base score and the corresponding weight) is computed for each profile.

• How can we combine sub scores?
  – Pick the maximum of the sub scores
  – Weighted sum of the sub scores
  – Others?
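The two combination options listed above can be compared with a short Python sketch. The function names and the per-profile coefficients of the weighted sum are hypothetical; the talk leaves the choice of coefficients open.

```python
def sub_scores(base_scores: dict, weights: dict) -> dict:
    """Sub score for each profile: base score multiplied by that profile's weight."""
    return {name: base_scores[name] * weights[name] for name in base_scores}


def combine_max(subs: dict) -> float:
    # Option 1: the most anomalous attribute determines the risk score.
    return max(subs.values())


def combine_weighted_sum(subs: dict, coefficients: dict) -> float:
    # Option 2: hypothetical per-profile coefficients chosen by the operator.
    return sum(coefficients[name] * value for name, value in subs.items())


base = {"day_of_week": 2.0, "hour_of_day": 1.0}     # made-up base scores
weights = {"day_of_week": 0.5, "hour_of_day": 1.5}  # made-up weights
subs = sub_scores(base, weights)
print(subs)               # {'day_of_week': 1.0, 'hour_of_day': 1.5}
print(combine_max(subs))  # 1.5
print(combine_weighted_sum(subs, {"day_of_week": 0.5, "hour_of_day": 0.5}))  # 1.25
```

Taking the maximum needs no extra parameters and flags a record as soon as any single attribute looks unusual; a weighted sum lets several mildly unusual attributes add up, at the cost of choosing coefficients.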


Page 14

Setting of Experiments

• Buzzport data set

• Profiling attributes
  – Week of month (5 categories)
  – Day of week (7 categories)
  – Hour of day (24 categories)

• Scale sub scores to [0, 100)

• Use the maximum of the 3 sub scores as the output
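The experimental configuration can be put together end to end in one self-contained Python sketch. Several pieces are assumptions rather than details from the talk: add-one smoothing in the base score, the Bhattacharyya weight, and in particular the mapping 100 * (1 - exp(-s)) used to scale a sub score s into [0, 100). The talk does not say which scaling was used. `scaled_sub_score`, `risk_score`, and `NUM_CATEGORIES` are hypothetical names.

```python
import math

# Profiling attributes used in the experiment and their number of categories.
NUM_CATEGORIES = {"week_of_month": 5, "day_of_week": 7, "hour_of_day": 24}


def scaled_sub_score(counts: dict, value, n: int) -> float:
    """Sub score (weight x base score) mapped into [0, 100)."""
    total = sum(counts.values())
    # Base score with add-one smoothing (assumption) so unseen values stay finite.
    base = -math.log((counts.get(value, 0.0) + 1.0) / (total + n))
    # Weight: Bhattacharyya distance between the profile and the uniform distribution.
    coefficient = sum(math.sqrt((c / total) / n) for c in counts.values()) if total else 1.0
    weight = max(0.0, -math.log(coefficient))
    # Hypothetical scaling: 100 * (1 - exp(-s)) is below 100 for any finite s
    # (floating-point rounding can reach 100.0 only for very large s).
    return 100.0 * (1.0 - math.exp(-weight * base))


def risk_score(profiles: dict, record: dict) -> float:
    """Output risk score: the MAX of the three scaled sub scores."""
    return max(scaled_sub_score(profiles[name], record[name], n)
               for name, n in NUM_CATEGORIES.items())


# Made-up profiles of a user who logs in on weekdays during office hours.
profiles = {
    "week_of_month": {1: 10, 2: 10, 3: 10, 4: 10, 5: 3},
    "day_of_week": {0: 10, 1: 10, 2: 10, 3: 10, 4: 3},
    "hour_of_day": {9: 20, 10: 15, 14: 8},
}
usual = {"week_of_month": 2, "day_of_week": 1, "hour_of_day": 9}
unusual = {"week_of_month": 2, "day_of_week": 6, "hour_of_day": 3}
print(risk_score(profiles, usual) < risk_score(profiles, unusual))  # True
```

With this particular scaling, even a fairly typical record can receive a mid-range score, which is one reason a per-user threshold (as in the evaluation) is more meaningful than a single global cutoff.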

Page 15

Trends of Risk Scores

Page 16

Trends of Risk Scores with Data Aging

• Decay Factor = 0.5 is applied monthly.
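The effect of this setting is easy to quantify. With a decay factor of 0.5 applied once per month, a login recorded k decay steps ago contributes 0.5^k to its counter. The short Python sketch below shows this; `aged_weight` is a hypothetical name.

```python
DECAY_FACTOR = 0.5  # applied once per month, as in the experiment


def aged_weight(months_ago: int, factor: float = DECAY_FACTOR) -> float:
    """Remaining weight of one login after `months_ago` monthly decay steps."""
    return factor ** months_ago


for k in range(4):
    print(k, aged_weight(k))
# 0 1.0
# 1 0.5
# 2 0.25
# 3 0.125
```

After 10 monthly steps a login's weight is 1/1024, so with this factor the profile is dominated by the most recent few months of activity.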

Page 17

False Positive / True Positive Analysis

• Randomly pick 5 users with different access frequencies

• Split each user’s log records into two sets:
  – Test data: the last month
  – Training data: the rest

• Analyze the false positive rate by scoring each user’s test data against that user’s own training data

• Analyze the true positive rate by scoring one user’s data against a different user’s profile (a.k.a. cross profiling)
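The data split and the two kinds of evaluation pairs can be sketched as follows. The talk only says "last 1 month"; reading it as the final 30 days before the user's most recent login is an assumption, as are the names `split_last_month` and `evaluation_pairs`.

```python
from datetime import datetime, timedelta


def split_last_month(logins: list, window_days: int = 30):
    """Split one user's (non-empty) login timestamps into (training, test).

    Assumption: "last 1 month" is read as the final `window_days` days up to
    the user's most recent login; the talk does not give the exact cutoff.
    """
    ordered = sorted(logins)
    cutoff = ordered[-1] - timedelta(days=window_days)
    training = [t for t in ordered if t <= cutoff]
    test = [t for t in ordered if t > cutoff]
    return training, test


def evaluation_pairs(users: list):
    """(profile owner, data owner) pairs for the two analyses."""
    same_user = [(u, u) for u in users]                        # false positives
    cross = [(u, v) for u in users for v in users if u != v]   # true positives
    return same_user, cross


# Illustrative timestamps; the last two are made up.
logins = [datetime(2007, 8, 24, 14, 7), datetime(2007, 8, 27, 8, 1),
          datetime(2007, 9, 20, 10, 0), datetime(2007, 10, 5, 9, 30)]
training, test = split_last_month(logins)
print(len(training), len(test))  # 2 2
```

With 5 users this yields 5 same-user pairs for the false positive analysis and 20 cross-profiling pairs for the true positive analysis.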

Page 18

False Positive / True Positive Results

* Each user’s threshold is determined based on the score range of the training data.
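One way to turn "based on the score range of the training data" into numbers is sketched below. Taking the top of the training range (the maximum training score) as the threshold is an assumption, and the scores in the example are made up for illustration. `threshold_from_training` and `alert_rate` are hypothetical names.

```python
def threshold_from_training(training_scores: list) -> float:
    # Assumption: the threshold is the top of the training score range, i.e.
    # anything scoring higher than every training record raises an alert.
    return max(training_scores)


def alert_rate(scores: list, threshold: float) -> float:
    """Fraction of records whose risk score exceeds the threshold."""
    if not scores:
        return 0.0
    return sum(score > threshold for score in scores) / len(scores)


# Made-up scores for one profile owner, for illustration only.
threshold = threshold_from_training([10.0, 35.0, 42.0, 20.0])
fp_rate = alert_rate([15.0, 44.0, 30.0, 40.0], threshold)  # same user's test data
tp_rate = alert_rate([50.0, 41.0, 60.0, 90.0], threshold)  # another user's data
print(threshold, fp_rate, tp_rate)  # 42.0 0.25 0.75
```

The same function yields the false positive rate when applied to the owner's own test data and the true positive rate when applied to another user's data scored against this profile.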

Page 19

Time / Storage Cost

• Measured on a Linux PC with an Intel Core 2 Duo E6600 and 3 GB RAM

• Average time per record: 5 ms
  – Good enough for real-time processing

• Storage space per user: 1.4 KB
  – Potential to accommodate a large number of users

Page 20

Conclusion

• Defined design principles for risk scoring based on identity usage logs

• Proposed a way to compute anomaly-based risk scores in real time

• Presented a prototype system using timestamp information and showed that it achieves reasonably good accuracy

Page 21

Future Work

• Investigate other attributes (e.g., location)

• Conduct detailed experiments
  – Evaluate with other data sets
  – Find the optimal configuration

• Integrate into other security mechanisms

Page 22

Questions?

http://www.cc.gatech.edu/~mashima

Thank you very much.