The New Yorker, September 6, 1999, page 76
DESCRIPTION
The New Yorker, September 6, 1999, page 76. File System Usage in Windows NT 4.0. Werner Vogels, Dept. of Computer Science, Cornell University.
![Page 1: The New Yorker, September 6, 1999, page 76](https://reader035.vdocuments.mx/reader035/viewer/2022070503/56816393550346895dd48bd6/html5/thumbnails/1.jpg)
The New Yorker, September 6, 1999, page 76
![Page 2: The New Yorker, September 6, 1999, page 76](https://reader035.vdocuments.mx/reader035/viewer/2022070503/56816393550346895dd48bd6/html5/thumbnails/2.jpg)
File System Usage in Windows NT 4.0
Werner Vogels
Dept. of Computer Science
Cornell University
![Page 3: The New Yorker, September 6, 1999, page 76](https://reader035.vdocuments.mx/reader035/viewer/2022070503/56816393550346895dd48bd6/html5/thumbnails/3.jpg)
Before
After
![Page 4: The New Yorker, September 6, 1999, page 76](https://reader035.vdocuments.mx/reader035/viewer/2022070503/56816393550346895dd48bd6/html5/thumbnails/4.jpg)
Goals of the study
Create a new data point with respect to the BSD/Sprite traces.
Perform a rigorous statistical analysis of the trace data.
Study behavior of Windows NT File I/O components.
Investigate the complexity of Windows NT operations.
![Page 5: The New Yorker, September 6, 1999, page 76](https://reader035.vdocuments.mx/reader035/viewer/2022070503/56816393550346895dd48bd6/html5/thumbnails/5.jpg)
Quiz
Number of files on a local file system?
Number of files added per day?
What percentage of the sequential reads is satisfied by a single read-ahead?
Cache read-ahead size set by NTFS?
75% of files are open for less than ?
Most active directory?
![Page 6: The New Yorker, September 6, 1999, page 76](https://reader035.vdocuments.mx/reader035/viewer/2022070503/56816393550346895dd48bd6/html5/thumbnails/6.jpg)
![Page 7: The New Yorker, September 6, 1999, page 76](https://reader035.vdocuments.mx/reader035/viewer/2022070503/56816393550346895dd48bd6/html5/thumbnails/7.jpg)
Top 10 observations on Windows NT file system usage
10. Using commercial data-mining tools for experimental data analysis was a big win.
![Page 8: The New Yorker, September 6, 1999, page 76](https://reader035.vdocuments.mx/reader035/viewer/2022070503/56816393550346895dd48bd6/html5/thumbnails/8.jpg)
Some numbers …
4 groups of users
45 workstations
24 days of continuous tracing
1042 valid trace files
195 idle trace days
237 million trace records
31 million open requests
2.9 million failed open
410 GByte data requested
315 processes
7 million WinLogon
289 file types
3.4 million gifs
![Page 9: The New Yorker, September 6, 1999, page 76](https://reader035.vdocuments.mx/reader035/viewer/2022070503/56816393550346895dd48bd6/html5/thumbnails/9.jpg)
Top 10 observations on Windows NT file system usage
9. Executables, DLLs & fonts dominate the local File System content.
10. Using commercial data-mining tools was very useful.
![Page 10: The New Yorker, September 6, 1999, page 76](https://reader035.vdocuments.mx/reader035/viewer/2022070503/56816393550346895dd48bd6/html5/thumbnails/10.jpg)
Observations on file system content
Mandatory reading: Douceur and Bolosky, SIGMETRICS’99
“C:\” typically holds 24,000 to 45,000 files
File type distribution is highly variant
File size distributions are identical
File types weighted by size are similar:
Dominated by executables, DLLs, fonts, etc.
Shifts only in extreme cases
![Page 11: The New Yorker, September 6, 1999, page 76](https://reader035.vdocuments.mx/reader035/viewer/2022070503/56816393550346895dd48bd6/html5/thumbnails/11.jpg)
_mp _q_ ~bt 0 f 0 i 0 l 8ba 8be 8bf 8bi 8bp 8bx 8by a a00 abr ace acf acl acm aco act ado ai aiff aim ann ans api aps art as as$ as0 asa asd asf asm asp au avi awx ax b b00 bak bas base bat bcp bin bm bm_ bmp boo bpd bsc btl btr c c00 cab cag cat cdf cdx cer cfg cgi chi chm chs chw cl cla clas class cls clw cm cmd cnf cnt cnv co col com com. cor corn cpi cpl cpp cpp" cpx css ctl ctm cur d da dat dat. db dbg dbp dct debug def dep desc dib dic dir dl dll doc dot drv dsp dsw dtq dvi e edu elm en en_us enc ev ex ex_ exc exd exe exe" exp fav fd fem flt fm fmt fn fon font for frm frx ftg fts g gc gi gid gif gpz grm gw gz h h" hhc hhk hiv hl hlp hm hst ht htm html htr htt htw htx hxx i ico id idb idl idq idx ilk imp in inc inf ini inl ins int inv ipx isu ivi ivk ivt j jar jav java jnf jp jpe jpeg jpg js jss kbm key l lck ldb ldif led lex li lib lls ln lnk log lst m m1 mak man map mb mbx mc mcd mch mcs mdb mdp mdw mid mk mmm mod mon msc n nab nav ncb nch ne net net. nick nls nmd not now nt o ob obj oca ocm ocx odl oe olb old opt org osd ost p pab pak pc pcb pce pch pdb pdf pds pfb pi pif pkg pkp ppl plg pmc pnf png pol ppd ppt prf prp ps psd psp pst pvk qt r ram rat raw rc rc2 rct rdb re reg res rgb rgs rh rhc rsc rtf rwz sbr sc scc scf scr sdk sfl sht shtml sif sig sln sml snm spc sql src srg stf stp sty suo svj swf sym syn sys t tbd tbl tdf tem tex text tif tip tlb tmp toc trg ttf twd txt u uin ur url ush utf8 vbd vbp vbr vbs vbw vbx vcf vcp vic vip vj vjp vsd vsdir vsk vsz vxd wa~ wab wav wbk wll wmf wpc wri xbm xls xml xs z zip
File type count
One more cookie …
![Page 12: The New Yorker, September 6, 1999, page 76](https://reader035.vdocuments.mx/reader035/viewer/2022070503/56816393550346895dd48bd6/html5/thumbnails/12.jpg)
Top 10 observations on Windows NT file system usage
8. The WWW cache is the hot spot in the local File System.
9. Executables, DLLs & fonts dominate the local FS.
10. Using commercial data-mining tools was very useful.
![Page 13: The New Yorker, September 6, 1999, page 76](https://reader035.vdocuments.mx/reader035/viewer/2022070503/56816393550346895dd48bd6/html5/thumbnails/13.jpg)
Observations on file system content – II
Differences are in the profile tree
Downloaded from central server per user
Hot spot is the WWW cache in the profile
2000 – 9500 files, 5 – 45 Mbytes
Changes over time, daily pattern:
300 – 500 files added to the system (up to 3000)
93% in the WWW cache
![Page 14: The New Yorker, September 6, 1999, page 76](https://reader035.vdocuments.mx/reader035/viewer/2022070503/56816393550346895dd48bd6/html5/thumbnails/14.jpg)
Top 10 observations on Windows NT file system usage
7. On average files are open for longer periods.
8. The WWW cache is the hot spot in the local FS.
9. Executables, DLLs & fonts dominate the local FS.
10. Commercial data-mining tools are very useful.
![Page 15: The New Yorker, September 6, 1999, page 76](https://reader035.vdocuments.mx/reader035/viewer/2022070503/56816393550346895dd48bd6/html5/thumbnails/15.jpg)
Create, Cleanup & Close
Open request arrivals
40% within 1 msec
90% within 30 msec
Open times
40% less than 1 msec
90% less than:
10 seconds - data
1 second - average
20 msec - control
Strong heavy-tail
Variance is high
Mainly depends on process, not on type
[Figure: Interarrival period of open requests. x-axis from 1 msec to 10 sec; y-axis: percentage of the requests; series: open for I/O, open for control]
[Figure: File session lifetime. x-axis from 1 msec to 16 min; y-axis: percentage of files; series: all usage types, file open for control operations, file open for data operations]
![Page 16: The New Yorker, September 6, 1999, page 76](https://reader035.vdocuments.mx/reader035/viewer/2022070503/56816393550346895dd48bd6/html5/thumbnails/16.jpg)
Top 10 observations on Windows NT file system usage
6. What are all those #$%@ control operations about?
7. Files are open for increasingly shorter periods
8. The WWW cache is the hot spot in the local FS
9. Executables, DLLs & fonts dominate the local FS
10. Using commercial data-mining tools was very useful.
![Page 17: The New Yorker, September 6, 1999, page 76](https://reader035.vdocuments.mx/reader035/viewer/2022070503/56816393550346895dd48bd6/html5/thumbnails/17.jpg)
File Control Operations
[Figure: bar chart of file control operations; axis scale 0 to 40; category labels not recoverable from the extraction]
74% of the open requests are made to perform a control operation.
33 different major requests.
Many originate in the runtime libraries (volume mounted).
Some are triggered by system components (SetEndOfFile).
Control operations can only be made on open files.
![Page 18: The New Yorker, September 6, 1999, page 76](https://reader035.vdocuments.mx/reader035/viewer/2022070503/56816393550346895dd48bd6/html5/thumbnails/18.jpg)
Top 10 observations on Windows NT file system usage
5. The FastIo path is extremely important
![Page 19: The New Yorker, September 6, 1999, page 76](https://reader035.vdocuments.mx/reader035/viewer/2022070503/56816393550346895dd48bd6/html5/thumbnails/19.jpg)
…and somebody should document this soon …
![Page 20: The New Yorker, September 6, 1999, page 76](https://reader035.vdocuments.mx/reader035/viewer/2022070503/56816393550346895dd48bd6/html5/thumbnails/20.jpg)
The importance of FastIO
Procedural interface with 27 methods.
Provides a direct path for reading and writing data from/to the cache.
Packet-based IO sets up the cache, after which FastIO takes over.
[Figure: IrpWrite, IrpRead, FastIoWrite, and FastIoRead compared by bytes, count, and time; axis scale 0 to 70]
![Page 21: The New Yorker, September 6, 1999, page 76](https://reader035.vdocuments.mx/reader035/viewer/2022070503/56816393550346895dd48bd6/html5/thumbnails/21.jpg)
Top 10 observations on Windows NT file system usage
4. The life-time expectation of new files has decreased by an order of magnitude.
5. The FastIo path is extremely important.
![Page 22: The New Yorker, September 6, 1999, page 76](https://reader035.vdocuments.mx/reader035/viewer/2022070503/56816393550346895dd48bd6/html5/thumbnails/22.jpg)
Life-time of new files
Create & overwrite (37%)
75% overwritten within 4 milliseconds after Create
75% overwritten within 0.7 milliseconds after Close
94% of the processes that Create also overwrite.
80% of the newly created files are deleted within 4 seconds (30 seconds in Sprite)
Create & delete (62%)
72% deleted within 4 seconds after Create
60% deleted within 1.5 seconds after Close
36% of the processes that Create also perform Delete.
18% are opened multiple times between Create and Delete
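A minimal sketch of how lifetime figures like the ones above could be derived from a trace. The record layout `(timestamp, operation, path)` and the helper names `lifetimes` and `fraction_within` are assumptions made for illustration; they are not the trace format or tooling used in the study.

```python
def lifetimes(events):
    """Return create-to-delete lifetimes (seconds) for files that were deleted.

    events: iterable of (timestamp_seconds, op, path), op in {"create", "delete"}.
    This record shape is hypothetical.
    """
    created = {}
    result = []
    for ts, op, path in sorted(events):
        if op == "create":
            created[path] = ts
        elif op == "delete" and path in created:
            result.append(ts - created.pop(path))
    return result


def fraction_within(values, limit):
    """Fraction of values that are less than or equal to limit."""
    if not values:
        return 0.0
    return sum(v <= limit for v in values) / len(values)


# Made-up events, only to exercise the functions.
events = [
    (0.0, "create", "a.tmp"),
    (0.5, "delete", "a.tmp"),
    (1.0, "create", "b.tmp"),
    (3.0, "delete", "b.tmp"),
    (2.0, "create", "c.doc"),
    (60.0, "delete", "c.doc"),
    (5.0, "create", "d.log"),  # never deleted, so it has no lifetime
]

life = lifetimes(events)
print(life)                        # [0.5, 2.0, 58.0]
print(fraction_within(life, 4.0))  # two of the three files die within 4 s
```

Files that are never deleted are left out, so the fraction is relative to deleted files only; a real analysis would have to decide how to treat files that outlive the trace.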
![Page 23: The New Yorker, September 6, 1999, page 76](https://reader035.vdocuments.mx/reader035/viewer/2022070503/56816393550346895dd48bd6/html5/thumbnails/23.jpg)
Top 10 observations on Windows NT file system usage
3. User activity and file access patterns appear to have changed less prominently.
4. The life-time expectation of new files has decreased by an order of magnitude.
5. The FastIo path is extremely important.
![Page 24: The New Yorker, September 6, 1999, page 76](https://reader035.vdocuments.mx/reader035/viewer/2022070503/56816393550346895dd48bd6/html5/thumbnails/24.jpg)
User activity
[Figure: Peak throughput per user in MBytes/sec, over 10-minute and 10-second intervals, Sprite vs. Windows NT; axis scale 0 to 12]
[Figure: Average throughput per user in KBytes/sec, over 10-minute and 10-second intervals, BSD vs. Sprite vs. Windows NT; axis scale 0 to 50]
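The peak and average throughput per interval shown above can be approximated from per-request records. The sketch below assumes each record is a `(timestamp_seconds, bytes)` pair for a single user and that only intervals containing activity are counted; both are assumptions for illustration, and the function name `throughput` is hypothetical.

```python
from collections import defaultdict


def throughput(records, interval):
    """Return (average, peak) bytes/sec over fixed-size intervals.

    records: iterable of (timestamp_seconds, nbytes) for one user.
    Only intervals that contain at least one record are considered.
    """
    buckets = defaultdict(int)
    for ts, nbytes in records:
        buckets[int(ts // interval)] += nbytes
    rates = [total / interval for total in buckets.values()]
    if not rates:
        return 0.0, 0.0
    return sum(rates) / len(rates), max(rates)


# Made-up records, only to exercise the function.
records = [(0, 1000), (5, 3000), (12, 2000), (605, 6000)]

print(throughput(records, 10))   # (400.0, 600.0)
print(throughput(records, 600))  # (10.0, 10.0)
```

The same records give very different numbers at the two interval sizes, which is why the slides report 10-second and 10-minute figures separately.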
![Page 25: The New Yorker, September 6, 1999, page 76](https://reader035.vdocuments.mx/reader035/viewer/2022070503/56816393550346895dd48bd6/html5/thumbnails/25.jpg)
File access patterns - counts
[Figure: File access (counts), as a percentage, for read-only, write-only, and read/write files; series: whole file sequential, partial file sequential, and random access, each for Sprite and Windows NT]
![Page 26: The New Yorker, September 6, 1999, page 76](https://reader035.vdocuments.mx/reader035/viewer/2022070503/56816393550346895dd48bd6/html5/thumbnails/26.jpg)
File access patterns - bytes
[Figure: File access (bytes), as a percentage, by disposition (read-only, write-only, read/write) and access type; series: whole file sequential, partial file sequential, and random access, each for Sprite and Windows NT]
![Page 27: The New Yorker, September 6, 1999, page 76](https://reader035.vdocuments.mx/reader035/viewer/2022070503/56816393550346895dd48bd6/html5/thumbnails/27.jpg)
Top 10 observations on Windows NT file system usage
2. Life isn’t a simple Poisson process …
3. User activity and file access patterns appear to have changed less prominently.
4. The life-time expectation of new files has decreased by an order of magnitude.
5. The FastIo path is extremely important.
![Page 28: The New Yorker, September 6, 1999, page 76](https://reader035.vdocuments.mx/reader035/viewer/2022070503/56816393550346895dd48bd6/html5/thumbnails/28.jpg)
![Page 29: The New Yorker, September 6, 1999, page 76](https://reader035.vdocuments.mx/reader035/viewer/2022070503/56816393550346895dd48bd6/html5/thumbnails/29.jpg)
File access patterns revisited
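By file usage:

| File usage | Accesses (%) W | Accesses (%) - | Accesses (%) + | Accesses (%) S | Bytes (%) W | Bytes (%) - | Bytes (%) + | Bytes (%) S |
|---|---|---|---|---|---|---|---|---|
| Read-only | 79 | 21 | 97 | 88 | 59 | 21 | 99 | 80 |
| Write-only | 18 | 3 | 77 | 11 | 26 | 0 | 73 | 19 |
| Read/Write | 3 | 0 | 16 | 1 | 15 | 0 | 70 | 1 |

By transfer type within each file usage:

| File usage | Type | Accesses (%) W | Accesses (%) - | Accesses (%) + | Accesses (%) S | Bytes (%) W | Bytes (%) - | Bytes (%) + | Bytes (%) S |
|---|---|---|---|---|---|---|---|---|---|
| Read-only | Whole file | 68 | 1 | 99 | 78 | 58 | 3 | 96 | 89 |
| Read-only | Other | 20 | 0 | 62 | 19 | 11 | 0 | 72 | 5 |
| Read-only | Random | 12 | 0 | 99 | 3 | 31 | 0 | 97 | 7 |
| Write-only | Whole file | 78 | 5 | 99 | 67 | 70 | 1 | 99 | 69 |
| Write-only | Other | 7 | 0 | 51 | 29 | 3 | 0 | 47 | 19 |
| Write-only | Random | 15 | 0 | 94 | 4 | 27 | 0 | 99 | 11 |
| Read/Write | Whole file | 22 | 0 | 90 | 0 | 5 | 0 | 76 | 0 |
| Read/Write | Other | 3 | 0 | 28 | 0 | 0 | 0 | 14 | 0 |
| Read/Write | Random | 74 | 2 | 100 | 100 | 94 | 9 | 100 | 0 |

Table 3: access patterns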
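The three transfer types in the table (whole file, other, random) can be illustrated with a small classifier over the requests issued during one open session. The definitions used here are an assumption based on common usage in file system trace studies: a session is sequential when each request starts where the previous one ended, and whole-file when a sequential run starts at offset 0 and reaches the end of the file. The function name and the `(offset, length)` record shape are hypothetical.

```python
def classify(accesses, file_size):
    """Classify one open session by its transfer pattern.

    accesses: list of (offset, length) pairs in the order they were issued.
    file_size: size of the file in bytes.
    """
    if not accesses:
        return "none"
    start = accesses[0][0]
    expected = start
    for offset, length in accesses:
        if offset != expected:
            return "random"  # a seek broke the sequential run
        expected = offset + length
    if start == 0 and expected >= file_size:
        return "whole file"
    return "other sequential"


print(classify([(0, 4096), (4096, 4096)], 8192))  # whole file
print(classify([(0, 4096)], 8192))                # other sequential
print(classify([(4096, 100), (0, 100)], 8192))    # random
```

Counting sessions per class gives the "Accesses (%)" view, while summing the transferred lengths per class gives the "Bytes (%)" view.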
![Page 30: The New Yorker, September 6, 1999, page 76](https://reader035.vdocuments.mx/reader035/viewer/2022070503/56816393550346895dd48bd6/html5/thumbnails/30.jpg)
10 minute throughput revisited
![Page 31: The New Yorker, September 6, 1999, page 76](https://reader035.vdocuments.mx/reader035/viewer/2022070503/56816393550346895dd48bd6/html5/thumbnails/31.jpg)
Process assumptions
[Figure: Arrival rate of open requests in trace sample #239, plotted as open requests per 1 second, per 10 seconds, and per 100 seconds]
[Figure: Synthesized sample assuming a Poisson process, plotted as arrivals per 1 second, per 10 seconds, and per 100 seconds]
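The comparison above, the same arrivals viewed at 1, 10, and 100 second granularity, can be imitated with synthetic data. The sketch below is an illustration under stated assumptions, not the study's method: it generates one Poisson arrival stream and one stream with Pareto-distributed gaps (tail index 1.2, chosen arbitrarily), then reports the variance-to-mean ratio of the per-bin counts. For a Poisson process that ratio stays near 1 at every scale, while a heavy-tailed stream remains bursty when aggregated.

```python
import random
from statistics import mean, pvariance


def arrival_times(next_gap, duration):
    """Accumulate gaps from next_gap() into arrival timestamps below duration."""
    t, times = 0.0, []
    while True:
        t += next_gap()
        if t >= duration:
            return times
        times.append(t)


def dispersion(times, duration, scale):
    """Variance-to-mean ratio of arrival counts per bin of `scale` seconds."""
    counts = [0] * int(duration // scale)
    for t in times:
        counts[int(t // scale)] += 1
    return pvariance(counts) / mean(counts)


rng = random.Random(239)  # fixed seed so the run is repeatable
RATE, ALPHA, DURATION = 50.0, 1.2, 2000.0  # illustrative parameters
XM = (ALPHA - 1) / (ALPHA * RATE)  # Pareto scale giving a mean gap of 1/RATE

poisson = arrival_times(lambda: rng.expovariate(RATE), DURATION)
heavy = arrival_times(lambda: XM * rng.paretovariate(ALPHA), DURATION)

results = {}
for scale in (1, 10, 100):
    results[scale] = (
        dispersion(poisson, DURATION, scale),
        dispersion(heavy, DURATION, scale),
    )
    print(f"{scale:>3} s bins: poisson={results[scale][0]:.2f} "
          f"heavy-tailed={results[scale][1]:.2f}")
```

The Poisson column should hover around 1, whereas the heavy-tailed column should be far larger at the coarse scales, mirroring the visual difference between the two rows of plots.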
![Page 32: The New Yorker, September 6, 1999, page 76](https://reader035.vdocuments.mx/reader035/viewer/2022070503/56816393550346895dd48bd6/html5/thumbnails/32.jpg)
Top 10 observations on Windows NT file system usage
1. Black box analysis does not lead to relevant insights.
2. Life isn’t a simple Poisson process …
3. User activity and file access patterns appear to have changed less prominently.
4. The life-time expectation of new files has decreased by an order of magnitude.
5. The FastIo path is extremely important.
![Page 33: The New Yorker, September 6, 1999, page 76](https://reader035.vdocuments.mx/reader035/viewer/2022070503/56816393550346895dd48bd6/html5/thumbnails/33.jpg)
The failure of Black Box data analysis
It is wrong to assume that all trace data combined can be seen as one unified trace representing the behavior of a single Windows NT workstation.
![Page 34: The New Yorker, September 6, 1999, page 76](https://reader035.vdocuments.mx/reader035/viewer/2022070503/56816393550346895dd48bd6/html5/thumbnails/34.jpg)
The failure of Black Box data analysis - II
No statistical proof that any two non-idle traces draw their values from the same distribution.
The best result possible was that values come from the same type of distribution.
Using traditional statistical fitting of a predefined model to large amounts of trace data does not lead to any real insights:
“on average each open request transfers 7 KBytes of data”
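One common way to ask whether two traces draw their values from the same distribution is a two-sample Kolmogorov-Smirnov test. The slides do not say which test the study used, so the sketch below is only an illustration of the idea; the function names are hypothetical, and the critical value uses the standard large-sample approximation with coefficient 1.36 for the 5% level.

```python
from bisect import bisect_right
from math import sqrt


def ks_statistic(a, b):
    """Largest gap between the empirical CDFs of samples a and b."""
    a, b = sorted(a), sorted(b)
    d = 0.0
    for x in a + b:
        fa = bisect_right(a, x) / len(a)
        fb = bisect_right(b, x) / len(b)
        d = max(d, abs(fa - fb))
    return d


def rejects_same_distribution(a, b, c_alpha=1.36):
    """True when the KS statistic exceeds the large-sample critical value.

    c_alpha = 1.36 corresponds to a 5% significance level.
    """
    n, m = len(a), len(b)
    return ks_statistic(a, b) > c_alpha * sqrt((n + m) / (n * m))


# Made-up request sizes for two hypothetical traces.
trace_a = list(range(100))
trace_b = list(range(1000, 1100))

print(ks_statistic(trace_a, trace_a))               # 0.0
print(ks_statistic(trace_a, trace_b))               # 1.0
print(rejects_same_distribution(trace_a, trace_b))  # True
```

With heavy-tailed data and very large samples, even small differences lead to rejection, which is consistent with the slide's point that pooling all traces into one model hides more than it reveals.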
![Page 35: The New Yorker, September 6, 1999, page 76](https://reader035.vdocuments.mx/reader035/viewer/2022070503/56816393550346895dd48bd6/html5/thumbnails/35.jpg)
The origin of the complexity
No more “well behaved” applications.
Too many variants.
Pervasive presence of heavy tails in the distributions of almost all variables (tail estimators indicate infinite variance).
Closed-loop processing amplifies the presence of the heavy-tail distributions found in the file system content.
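The slides do not name the tail estimator that was used. As an illustration, the Hill estimator is a widely used choice: it estimates the tail index from the largest observations, and an estimate below 2 is the usual sign of infinite variance. The sketch below applies it to synthetic Pareto data; the sample size, the choice of `k`, and the function name `hill_alpha` are assumptions for the example.

```python
import math
import random


def hill_alpha(samples, k):
    """Hill estimate of the tail index from the k largest samples.

    samples must be positive; k must satisfy 0 < k < len(samples).
    """
    xs = sorted(samples, reverse=True)
    if not 0 < k < len(xs):
        raise ValueError("k must be between 1 and len(samples) - 1")
    threshold = xs[k]  # the (k+1)-th largest value
    log_excess = sum(math.log(x / threshold) for x in xs[:k])
    return k / log_excess


rng = random.Random(1999)  # fixed seed so the run is repeatable
data = [rng.paretovariate(1.5) for _ in range(20000)]  # true tail index 1.5

alpha = hill_alpha(data, 1000)
print(f"estimated tail index: {alpha:.2f}")
print("infinite variance suggested" if alpha < 2 else "finite variance suggested")
```

In practice the estimate depends on `k`, so it is usually plotted over a range of `k` values and read off where the curve is stable.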
![Page 36: The New Yorker, September 6, 1999, page 76](https://reader035.vdocuments.mx/reader035/viewer/2022070503/56816393550346895dd48bd6/html5/thumbnails/36.jpg)
Last words
Why you should read the paper …
The paper reports on an initial analysis of the results of a large-scale file system usage study.
Detailed description of the methods used for tracing and analysis.
Historical comparison with BSD & Sprite traces.
Lots of data on file system content, cache behavior, IO specifics, etc.
Analysis of the presence of heavy tailed distributions in all of the traced variables.
![Page 37: The New Yorker, September 6, 1999, page 76](https://reader035.vdocuments.mx/reader035/viewer/2022070503/56816393550346895dd48bd6/html5/thumbnails/37.jpg)