2006 Symantec Corporation, All Rights Reserved
Anonymizing Filesystem Metadata for Analysis
Chris Xin
Symantec
Challenges of Filesystem Analysis
Real-time live-system monitoring is difficult.
– performance degradation
– security & privacy concerns
– stability risk
Traces
– difficult to reconstruct I/O dependencies
– system states
– security & privacy concerns
Benchmarks
– “There are lies, damn lies and then there are benchmarks.”
Filesystem images
– snapshots, backups
– security & privacy concerns
Agenda
Challenges of filesystem analysis
Keeping filesystem images
– metasave
Metadata anonymization
– secure metasave
Measurement
– space efficiency
– time efficiency
– resource consumption
Summary
Filesystem Images
Storing the whole system would be expensive.
– large storage space
– long time
Keeping metadata is a wise idea.
– A good resource for understanding some characteristics of a file system
– Cumulative images can be obtained to track how a file system changes over time
  file size, age, type information
  filesystem aging analysis
– Addresses some privacy concerns by eliminating user data
Some file systems already provide such a utility.
– Ext2: e2image
– Linux NTFS: ntfsclone --metadata
– VxFS: metasave
Metasave Utility
The utility saves or restores the metadata of a VxFS file system.
– Available in version 1 and later.
– Metadata is kept so that the original geometry of the file system is preserved and all the inode information is intact.
– No user data is retained.
– Metadata can be saved from a snapshot, a backup, or a live system as an image file.
– The image file can be deflated and the metadata restored to a file or a device.
What do we do with images?
– troubleshooting
– debugging
– file system analysis
Efficient Anonymization
But … your clients may say no …
– Sensitive information is still in the file and directory names
– Concerns about performance degradation
Solution: anonymize clients’ information in metadata
– Names of files and directories
– Client information in file system intent logs
Requirements
– Must be difficult to recover the original information
– Keep the geometry of the file system: retain the length of the file/directory names
– Time efficient
– Space efficient
– Minimal performance degradation
Secure Metasave
Enhanced metasave with encryption options
– Evolved from metasave, a VxFS utility for saving/restoring the metadata of a file system
– Online image saving
– Uses a cryptographic message digest algorithm to obfuscate client information
  The algorithm can be chosen to suit a client’s requirements
  Default: SHA-1
Message Digest
Secure one-way hash function: e=H(M)
– M: original message
– H: hash function
– e: digested message
Key properties
– Given M, easy to compute e=H(M)
– Given e, hard to compute M such that e=H(M)
– Given M, hard to find M' (different from M) such that H(M)=H(M') (minimum collision)
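As a small illustration of e=H(M), the sketch below uses Python's standard hashlib module rather than the OpenSSL C library the talk mentions; the sample file names are made up for the example.

```python
import hashlib


def digest(message: bytes) -> bytes:
    """Return e = H(M) with SHA-1, the default algorithm named in the talk."""
    return hashlib.sha1(message).digest()


e1 = digest(b"report.doc")
e2 = digest(b"report.doc")
e3 = digest(b"report.dot")

print(len(e1))   # a SHA-1 digest is always 20 bytes, whatever the input length
print(e1 == e2)  # the same message always gives the same digest
print(e1 == e3)  # a one-character change gives a different digest
```

Because the digest is deterministic, the same name is always obfuscated to the same string.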
Implementation
OpenSSL library
Obfuscate a file/directory name
– Do it by individual pathname components
  /a/bc/bcd → /x/rd/wyz
– Retain name length
  Digest works on a fixed length of characters at a time.
  – 20 characters for SHA-1
  If len(name) > len(digest), process it in segments.
  If len(name) < len(digest) or len(final segment) < len(digest), digest the name string and remove some characters to preserve its original length.
  Digest can contain characters that are illegal in file/directory names; map them to legal characters.
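The steps above can be sketched roughly as follows. This is not the secure metasave code: it uses Python's hashlib instead of OpenSSL, and the names `ALPHABET`, `obfuscate_component` and `obfuscate_path`, as well as the modulo-based character mapping, are assumptions made for illustration.

```python
import hashlib

SEGMENT = 20  # SHA-1 produces 20 bytes, so names are processed 20 bytes at a time
# Assumed set of legal output characters; the real mapping is not given in the talk.
ALPHABET = "abcdefghijklmnopqrstuvwxyz0123456789"


def obfuscate_component(name: str) -> str:
    """Digest one path component segment by segment, map the digest bytes
    to legal characters, and chop the result to the original byte length."""
    raw = name.encode("utf-8")
    mapped = []
    for start in range(0, len(raw), SEGMENT):
        d = hashlib.sha1(raw[start:start + SEGMENT]).digest()
        mapped.extend(ALPHABET[b % len(ALPHABET)] for b in d)
    return "".join(mapped)[:len(raw)]


def obfuscate_path(path: str) -> str:
    """Obfuscate each component separately so the directory structure survives."""
    return "/".join(obfuscate_component(c) if c else c for c in path.split("/"))


print(obfuscate_path("/a/bc/bcd"))  # three components of length 1, 2 and 3
```

Working on bytes keeps the on-disk length of each name unchanged, which is what preserves the directory block geometry.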
File/Directory Name Manipulation
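Parse a name string
Message digest
Chop it to its original length
Random number generator with a changeable seed
Character mapping
[Figure: an original name string (positions 0 to 67) is split at 20, 40 and 60 into segments; the digests of the segments form a string spanning positions 0 to 79, which is chopped to the original length (0 to 67) to give the obfuscated filename.]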
Obfuscation Options
Full-name obfuscation
Retain file extension if any
Obfuscate extensions as well and make them consistent
obfuscation option        original name
                          foo1.c    foo2.c
full-name                 abcde     uwxyz
retain file extension     jkis.c    swdx.c
consistent extension      jkis.x    swdx.x
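A minimal sketch of how the three options differ, assuming a length-preserving helper (`scramble`, a hypothetical stand-in for the digest, map and chop steps). The option strings and function names are illustrative, not the utility's actual flags.

```python
import hashlib

ALPHABET = "abcdefghijklmnopqrstuvwxyz0123456789"  # assumed legal characters
OPTIONS = ("full-name", "retain-extension", "consistent-extension")


def scramble(text: str) -> str:
    """Hypothetical length-preserving obfuscation of one string."""
    raw = text.encode("utf-8")
    chars = []
    for i in range(0, len(raw), 20):
        chars += [ALPHABET[b % len(ALPHABET)] for b in hashlib.sha1(raw[i:i + 20]).digest()]
    return "".join(chars)[:len(raw)]


def obfuscate(name: str, option: str) -> str:
    if option not in OPTIONS:
        raise ValueError(f"unknown option: {option}")
    stem, dot, ext = name.rpartition(".")
    if option == "full-name" or not dot or not stem:
        return scramble(name)                 # whole name, extension included
    if option == "retain-extension":
        return scramble(stem) + "." + ext     # extension left readable
    # consistent-extension: the extension is obfuscated on its own, so every
    # file with the same original extension gets the same obfuscated one.
    return scramble(stem) + "." + scramble(ext)


for opt in OPTIONS:
    print(opt, obfuscate("foo1.c", opt), obfuscate("foo2.c", opt))
```

Keeping extensions consistent preserves file type groupings for analysis without revealing the actual types.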
Further Handling
Multiple extensions and prefixes for name-only obfuscation option
– Look at the last extension only
  foo.c.bak → abced.bak
– Retain extensions of 4 characters or fewer; obfuscate anything longer
Do not obfuscate the names of special administrative files or directories
– lost+found
Rebuild directory indexes and block checksums after name obfuscation
Symlinks
– Point to the same place within the file system
– “..” is kept intact
Intent logs
– An option is offered to exclude intent logs from an image file.
– If the intent log is retained, file and directory names are obfuscated.
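The extension and special-name rules can be expressed as a small splitting function. This is a sketch under assumptions: `split_for_obfuscation` and `SPECIAL_NAMES` are hypothetical names, and including "." in the protected set is an assumption beyond the “..” and lost+found examples given above.

```python
# Names left untouched. "lost+found" and ".." come from the slide;
# "." is an added assumption.
SPECIAL_NAMES = {".", "..", "lost+found"}
MAX_EXT = 4  # extensions longer than this are obfuscated with the rest of the name


def split_for_obfuscation(name: str) -> tuple[str, str]:
    """Return (part_to_obfuscate, part_to_keep) for the name-only option."""
    if name in SPECIAL_NAMES:
        return "", name
    stem, dot, ext = name.rpartition(".")  # only the last extension is examined
    if dot and stem and len(ext) <= MAX_EXT:
        return stem, dot + ext
    return name, ""


print(split_for_obfuscation("foo.c.bak"))       # ('foo.c', '.bak')
print(split_for_obfuscation("archive.backup"))  # ('archive.backup', '')
print(split_for_obfuscation("lost+found"))      # ('', 'lost+found')
```

The first element would then go through the digest step, and the second is appended unchanged.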
Collision Probability
What’s a collision?
– Two files/directories with different names, say A and B, end up with the same name after obfuscation.
Do we have to worry about it?
– Not really
– Collisions only matter within individual directories.
– The chance of a collision is tiny
  With SHA-1, about 1 in 10^24 for a filesystem with a trillion file/directory names, and 1 in 10^18 for a quadrillion names.
  The character mapping and name length chopping increase the chance of collisions slightly.
– An optional name conflict check can be run after obfuscation for a file system with large directories.
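The orders of magnitude quoted above are consistent with the standard birthday-bound approximation p ≈ n(n−1)/2^(b+1) for a b-bit digest. The sketch below evaluates it for full 160-bit SHA-1 digests and adds a simple per-directory conflict check; the function names are hypothetical, and the bound ignores the extra collisions caused by character mapping and chopping.

```python
from collections import defaultdict


def collision_probability(n_names: int, digest_bits: int = 160) -> float:
    """Birthday-bound approximation for at least one collision among n digests."""
    return n_names * (n_names - 1) / 2 ** (digest_bits + 1)


print(f"{collision_probability(10**12):.1e}")  # trillion names: on the order of 1e-25
print(f"{collision_probability(10**15):.1e}")  # quadrillion names: on the order of 1e-19


def find_conflicts(names, obfuscate):
    """Group the names of one directory by obfuscated form; return the clashes."""
    seen = defaultdict(list)
    for name in names:
        seen[obfuscate(name)].append(name)
    return {obf: originals for obf, originals in seen.items() if len(originals) > 1}
```

Since collisions only matter inside a single directory, `find_conflicts` would be called once per directory with whatever obfuscation function is in use.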
Measurement
Three categories
– Space consumption
– Time consumption
  encryption overhead
– Resource consumption
Six filesystems measured
– four customer filesystems
– two filesystems on our production server (fs #2 and #6)
Experiment environment
– Live production system
  Sun Fire E6900: 16 SPARC CPUs, 32 GB memory, shared disks
– Test machine
  Sun Fire V240: 2 SPARC CPUs, 2 GB memory, single-user disks
Space Efficiency
The image of metadata usually takes about 1-5% of the filesystem size.
[Figure: storage efficiency. Bar chart of image size as a percentage of filesystem size (y-axis, 0 to 12) for filesystems 1 to 6.]
filesystem          1      2      3      4      5       6
% of total cap.     0.08   0.06   0.04   0.05   6.88    0.60
% of used cap.      0.12   0.08   0.73   0.56   11.73   0.63
Time Efficiency
How long does it take to get an anonymized file system image?
– use “filename-only” option
– on the live production system
  about 30 minutes to get an encrypted metadata image from fs #6.
  5–8 secs for fs #2.
– on the test machine:
[Figure: time efficiency. Bar chart of time in seconds (y-axis, 0 to 300) for filesystems 1 to 6.]
filesystem    1     2     3       4     5        6
time (sec)    1.9   1.7   0.267   6.4   108.33   273
A closer look
The factors in play
– # of inodes
– total filesystem size
– filesystem capacity
fs   # files     total (GB)   used (GB)   time (sec)    time (sec)   msv size/       msv size/
                                          production    test         total fs cap.   used fs cap.
1    39          27.8         18.3        --            1.9          0.08%           0.12%
2    742         49.5         39.4        4             1.7          0.06%           0.08%
3    3,721       9.0          0.6         --            0.267        0.04%           0.73%
4    59,584      150.0        12.4        --            6.4          0.05%           0.56%
5    956,180     3.9          2.3         --            108.33       6.88%           11.73%
6    2,259,443   195.4        186.9       1836          273.0        0.60%           0.63%
Encryption Overhead
Space efficiency is the same.
Time efficiency
– Little overhead introduced on a live production system
  I/O bound
  shared disk
– Noticeable computational overhead on the test machine.
Encryption Overhead on the Test Machine
[Figure: bar chart of normalized time (y-axis, 0 to 1.6) for file systems 1 to 6, with four series: no-encryption, full-obfuscation, filename-only, consistent-extension.]
Encryption Overhead on the Production System
[Figure: bar chart of normalized time (y-axis, 0 to 1.6) for file systems 1 to 6, with four series: no-encryption, full-obfuscation, filename-only, consistent-extension.]
Resource Consumption
Not much performance degradation during image saving
– 20 MB memory and 1% of CPU were utilized during the image dumping on a live production system.
Summary
A method of anonymizing filesystem metadata.
– Obfuscates clients’ information to relieve privacy concerns
– Costs 1-5% of the original file system size in storage.
– Fairly quick process with little performance degradation.
We encourage saving file metadata images with anonymization.
– Provides a good resource for file system analysis
– Benefits both development and research
The anonymization scheme can be used in other file system utilities, such as trace collecting.
Acknowledgements
Thanks to Oleg Kiselev, John Colgrove, Craig Harmer, Chuck Silvers and George Mathew for discussions.
Thanks to Marianne Lent and Paul Massiglia for suggestions.
Thanks to Ken Zachmann for helping with experiments.
Questions