The New Yorker, September 6, 1999, page 76


Page 1: The New Yorker, September 6, 1999, page 76

The New Yorker, September 6, 1999, page 76

Page 2: The New Yorker, September 6, 1999, page 76

File System Usage in Windows NT 4.0

Werner Vogels

Dept. of Computer Science

Cornell University

Page 3: The New Yorker, September 6, 1999, page 76

Before

After

Page 4: The New Yorker, September 6, 1999, page 76

Goals of the study

Create a new data point with respect to the BSD/Sprite traces.

Perform a rigorous statistical analysis of the trace data.

Study behavior of Windows NT File I/O components.

Investigate the complexity of Windows NT operations.

Page 5: The New Yorker, September 6, 1999, page 76

Quiz

Number of files on a local file system?

Number of files added per day?

What percentage of the sequential reads is satisfied by a single read-ahead?

Cache read-ahead size set by NTFS?

75% of files are open for less than ?

Most active directory?

Page 6: The New Yorker, September 6, 1999, page 76
Page 7: The New Yorker, September 6, 1999, page 76

Top 10 observations on Windows NT file system usage

10. Using commercial data-mining tools for experimental data analysis was a big win.

Page 8: The New Yorker, September 6, 1999, page 76

Some numbers … 4 groups of users

45 workstations

24 days of continuous tracing

1042 valid trace files

195 idle trace days

237 million trace records

31 million open requests

2.9 million failed opens

410 GByte data requested

315 processes

7 million WinLogon

289 file types

3.4 million gifs

Page 9: The New Yorker, September 6, 1999, page 76

Top 10 observations on Windows NT file system usage

9. Executables, DLLs & fonts dominate the local File System content.

10. Using commercial data-mining tools was very useful.

Page 10: The New Yorker, September 6, 1999, page 76

Observations on file system content

Mandatory reading: Douceur and Bolosky, SIGMETRICS’99

“C:\” typically holds 24,000 to 45,000 files

File type distribution is highly variant

File size distributions are identical

File types weighted by size are similar:

Dominated by executables, DLL’s, fonts, etc.

Shifts only in extreme cases

Page 11: The New Yorker, September 6, 1999, page 76

_mp _q_ ~bt 0 f 0 i 0 l 8ba 8be 8bf 8bi 8bp 8bx8by a a00 abr ace acf acl acm aco act ado aiaiff aim ann ans api aps art as as$ as0 asa asdasf asm asp au avi awx ax b b00 bak bas basebat bcp bin bm bm_ bmp boo bpd bsc btl btr cc00 cab cag cat cdf cdx cer cfg cgi chi chm chschw cl cla clas class cls clw cm cmd cnf cnt cnvco col com com. cor corn cpi cpl cpp cpp" cpx cssctl ctm cur d da dat dat. db dbg dbp dct debugdef dep desc dib dic dir dl dll doc dot drv dspdsw dtq dvi e edu elm en en_us enc ev ex ex_exc exd exe exe" exp fav fd fem flt fm fmt fnfon font for frm frx ftg fts g gc gi gid gifgpz grm gw gz h h" hhc hhk hiv hl hlp hmhst ht htm html htr htt htw htx hxx i ico ididb idl idq idx ilk imp in inc inf ini inl insint inv ipx isu ivi ivk ivt j jar jav java jnfjp jpe jpeg jpg js jss kbm key l lck ldb ldifled lex li lib lls ln lnk log lst m m1 makman map mb mbx mc mcd mch mcs mdb mdp mdw midmk mmm mod mon msc n nab nav ncb nch ne netnet. nick nls nmd not now nt o ob obj oca ocmocx odl oe olb old opt org osd ost p pab pakpc pcb pce pch pdb pdf pds pfb pi pif pkg pkppl plg pmc pnf png pol ppd ppt prf prp ps psdpsp pst pvk qt r ram rat raw rc rc2 rct rdbre reg res rgb rgs rh rhc rsc rtf rwz sbr scscc scf scr sdk sfl sht shtml sif sig sln sml snmspc sql src srg stf stp sty suo svj swf sym synsys t tbd tbl tdf tem tex text tif tip tlb tmptoc trg ttf twd txt u uin ur url ush utf8 vbdvbp vbr vbs vbw vbx vcf vcp vic vip vj vjp vsdvsdir vsk vsz vxd wa~ wab wav wbk wll wmf wpc wrixbm xls xml xs z zip

File type count

One more cookie …

Page 12: The New Yorker, September 6, 1999, page 76

Top 10 observations on Windows NT file system usage

8. The WWW cache is the hot spot in the local File System

9. Executables, DLLs & fonts dominate the local FS

10. Using commercial data-mining tools was very useful.


Page 13: The New Yorker, September 6, 1999, page 76

Observations on file system content – II

Differences are in the profile tree

Downloaded from central server per user

Hot spot is the WWW cache in the profile

2000 – 9500 files, 5 – 45 Mbytes

Changes over time, daily pattern:

300 – 500 files added to the system (up to 3000)

93% in the WWW cache

Page 14: The New Yorker, September 6, 1999, page 76

Top 10 observations on Windows NT file system usage

7. On average files are open for shorter periods.

8. The WWW cache is the hot spot in the local FS.

9. Executables, DLLs & fonts dominate the local FS.

10. Commercial data-mining tools are very useful.

Page 15: The New Yorker, September 6, 1999, page 76

Create, Cleanup & Close

Open request arrivals

40% within 1 msec

90% within 30 msec

Open times

40% less than 1 msec

90% less than: 10 seconds (data), 1 second (average), 20 msec (control)

Strong heavy-tail

Variance is high

Mainly depends on process, not on type

[Figure: Interarrival period of open requests. X-axis: 1 msec to 10 sec. Y-axis: percentage of the requests. Series: open for I/O, open for control.]

[Figure: File session lifetime. X-axis: 1 msec to 16 min. Y-axis: percentage of files. Series: all usage types, file open for control operations, file open for data operations.]
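The percentages on this slide are points on an empirical cumulative distribution. A minimal sketch of how such figures could be derived from a list of open-request timestamps follows; the timestamps are made up for illustration and the function names are hypothetical, not taken from the study's tooling.

```python
from bisect import bisect_right

def interarrival_times(timestamps_us):
    """Gaps between consecutive open requests, in microseconds."""
    ordered = sorted(timestamps_us)
    return [later - earlier for earlier, later in zip(ordered, ordered[1:])]

def fraction_within(values, limit):
    """Empirical CDF: share of values that are <= limit."""
    ordered = sorted(values)
    if not ordered:
        return 0.0
    return bisect_right(ordered, limit) / len(ordered)

# Illustrative (invented) open-request timestamps in microseconds.
opens_us = [0, 500, 900, 20_000, 21_000, 1_500_000, 1_500_400, 12_000_000]
gaps_us = interarrival_times(opens_us)

for label, limit_us in [("1 msec", 1_000), ("30 msec", 30_000), ("10 sec", 10_000_000)]:
    print(f"within {label}: {fraction_within(gaps_us, limit_us):.0%}")
```

Integer microseconds are used here only to keep the threshold comparisons free of floating-point rounding.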

Page 16: The New Yorker, September 6, 1999, page 76

10 observations on Windows NT file system usage

6. What are all those #$%@ control operations about?

7. Files are open for increasingly shorter periods

8. The WWW cache is the hot spot in the local FS

9. Executables, DLLs & fonts dominate the local FS

10. Using commercial data-mining tools was very useful.

Page 17: The New Yorker, September 6, 1999, page 76

File Control Operations

[Figure: bar chart of file control operations, axis from 0 to 40; the category labels did not survive extraction.]

74% of the open requests are made to perform a control operation

33 different major requests

Many originate in the runtime libraries (volume mounted).

Some are triggered by system components (SetEndOfFile).

Control operations can only be made on open files.

Page 18: The New Yorker, September 6, 1999, page 76

Top 10 observations on Windows NT file system usage

5. The FastIo path is extremely important

Page 19: The New Yorker, September 6, 1999, page 76

…and somebody should document this soon …

Page 20: The New Yorker, September 6, 1999, page 76

The importance of FASTIO

Procedural interface with 27 methods.

Provides a direct path for reading and writing data directly from/to the cache.

Packet-based I/O sets up the cache, after which FastIO takes over.

[Figure: IrpWrite, IrpRead, FastIoWrite and FastIoRead compared by bytes, count and time; axis from 0 to 70.]

Page 21: The New Yorker, September 6, 1999, page 76

Top 10 observations on Windows NT file system usage

4. The life-time expectation of new files has decreased by an order of magnitude.

5. The FastIo path is extremely important.

Page 22: The New Yorker, September 6, 1999, page 76

Life-time of new files

Create & overwrite (37%)

75% overwritten within 4 milliseconds after Create

75% overwritten within 0.7 milliseconds after Close

94% of the processes that Create also overwrite.

80% of the newly created files are deleted within 4 seconds (30 seconds in Sprite)

Create & delete (62%)

72% deleted within 4 seconds after Create

60% deleted within 1.5 seconds after Close

36% of the processes that Create also perform delete.

18% are opened multiple times between create and delete
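Lifetime figures like these can be computed by pairing each create with the matching delete of the same file. The sketch below assumes a hypothetical event format of `(timestamp_ms, kind, path)` tuples; the actual trace record layout is not described in these slides, and files that are never deleted within the trace are simply left out of the result.

```python
def new_file_lifetimes(events):
    """Return create-to-delete lifetimes (ms) for files created in the trace.

    events: iterable of (timestamp_ms, kind, path), kind is "create" or "delete".
    This record format is an assumption made for illustration.
    """
    created_at = {}
    lifetimes = []
    for timestamp_ms, kind, path in sorted(events):
        if kind == "create":
            created_at[path] = timestamp_ms
        elif kind == "delete" and path in created_at:
            lifetimes.append(timestamp_ms - created_at.pop(path))
    return lifetimes

def share_within(lifetimes, limit_ms):
    """Share of lifetimes that are <= limit_ms."""
    if not lifetimes:
        return 0.0
    return sum(1 for t in lifetimes if t <= limit_ms) / len(lifetimes)

# Invented sample events, not data from the study.
events = [
    (0, "create", "a.tmp"),
    (3, "delete", "a.tmp"),
    (10, "create", "b.tmp"),
    (2_500, "delete", "b.tmp"),
    (20, "create", "c.log"),
    (60_000, "delete", "c.log"),
    (30, "create", "d.doc"),  # never deleted, so it has no lifetime here
]
lifetimes = new_file_lifetimes(events)
print(f"deleted within 4 seconds: {share_within(lifetimes, 4_000):.0%}")
```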

Page 23: The New Yorker, September 6, 1999, page 76

Top 10 observations on Windows NT file system usage

3. User activity and file access patterns appear to have changed less prominently.

4. The life-time expectation of new files has decreased by an order of magnitude.

5. The FastIo path is extremely important.

Page 24: The New Yorker, September 6, 1999, page 76

User activity

[Figure: Peak throughput per user in MBytes/sec over 10-minute and 10-second intervals; Sprite vs. Windows NT.]

[Figure: Average throughput per user in KBytes/sec over 10-minute and 10-second intervals; BSD, Sprite and Windows NT.]

Page 25: The New Yorker, September 6, 1999, page 76

File access patterns - counts

[Figure: File access (counts), as a percentage, for read-only, write-only and read/write usage. Series: whole file sequential, partial file sequential and random access, each for Sprite and Windows NT.]

Page 26: The New Yorker, September 6, 1999, page 76

File access patterns - bytes

[Figure: File access (bytes), as a percentage, by disposition (read-only, write-only, read/write) and access type. Series: whole file sequential, partial file sequential and random access, each for Sprite and Windows NT.]

Page 27: The New Yorker, September 6, 1999, page 76

Top 10 observations on Windows NT file system usage

2. Life isn’t a simple Poisson process …

3. User activity and file access patterns appear to have changed less prominently.

4. The life-time expectation of new files has decreased by an order of magnitude.

5. The FastIo path is extremely important.

Page 28: The New Yorker, September 6, 1999, page 76

File access patterns revisited

Table 3: access patterns

File usage    Accesses (%)        Bytes (%)
              W    -    +    S    W    -    +    S
Read-only     79   21   97   88   59   21   99   80
Write-only    18    3   77   11   26    0   73   19
Read/Write     3    0   16    1   15    0   70    1

File usage    Type          Accesses (%)        Bytes (%)
                            W    -    +    S    W    -    +    S
Read-only     Whole file    68    1   99   78   58    3   96   89
              Other         20    0   62   19   11    0   72    5
              Random        12    0   99    3   31    0   97    7
Write-only    Whole file    78    5   99   67   70    1   99   69
              Other          7    0   51   29    3    0   47   19
              Random        15    0   94    4   27    0   99   11
Read/Write    Whole file    22    0   90    0    5    0   76    0
              Other          3    0   28    0    0    0   14    0
              Random        74    2  100  100   94    9  100    0

Page 29: The New Yorker, September 6, 1999, page 76


Page 30: The New Yorker, September 6, 1999, page 76

10 minute throughput revisited

Page 31: The New Yorker, September 6, 1999, page 76

Process assumptions

[Figure: Arrival rate of open requests in trace sample #239. Three panels: open requests per 1 second, per 10 seconds and per 100 seconds.]

[Figure: Synthesized sample assuming a Poisson process. Three panels: arrivals per 1 second, per 10 seconds and per 100 seconds.]
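The comparison on this slide can be approximated with synthetic data. The sketch below is an illustration under stated assumptions, not the method used in the study: it generates one arrival stream with exponential gaps (a Poisson process) and one with Pareto-distributed gaps (a stand-in for heavy-tailed behavior; the shape 1.2 and the mean gap of 0.1 seconds are arbitrary choices), then reports the variance-to-mean ratio of the arrival counts at the three time scales shown in the figures. For a Poisson process that ratio stays near 1 at every scale, while a bursty stream typically does not smooth out as the interval grows.

```python
import random
from statistics import mean, pvariance

def arrivals(next_gap, horizon):
    """Arrival times in [0, horizon), built from a function that draws gaps."""
    times = []
    t = next_gap()
    while t < horizon:
        times.append(t)
        t += next_gap()
    return times

def dispersion(times, bin_width, horizon):
    """Variance-to-mean ratio of the number of arrivals per bin."""
    counts = [0] * int(horizon / bin_width)
    for t in times:
        index = min(int(t / bin_width), len(counts) - 1)
        counts[index] += 1
    return pvariance(counts) / mean(counts)

rng = random.Random(1999)  # fixed seed so the run is repeatable
HORIZON = 10_000.0         # seconds of simulated trace

# Both streams have a mean gap of 0.1 s (the Pareto mean is shape / (shape - 1) = 6).
poisson = arrivals(lambda: rng.expovariate(10.0), HORIZON)
heavy = arrivals(lambda: (0.1 / 6.0) * rng.paretovariate(1.2), HORIZON)

for width in (1.0, 10.0, 100.0):
    print(f"{width:>5.0f} s bins: "
          f"Poisson {dispersion(poisson, width, HORIZON):.2f}, "
          f"heavy-tailed {dispersion(heavy, width, HORIZON):.2f}")
```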

Page 32: The New Yorker, September 6, 1999, page 76

Top 10 observations on Windows NT file system usage

1. Black box analysis does not lead to relevant insights.

2. Life isn’t a simple Poisson process …

3. User activity and file access patterns appear to have changed less prominently.

4. The life-time expectation of new files has decreased by an order of magnitude.

5. The FastIo path is extremely important.

Page 33: The New Yorker, September 6, 1999, page 76

The failure of Black Box data analysis

It is wrong to assume that all trace data combined can be seen as one unified trace representing the behavior of a single Windows NT workstation.

Page 34: The New Yorker, September 6, 1999, page 76

The failure of Black Box data analysis - II

No statistical proof that any two non-idle traces draw their values from the same distribution.

The best result possible was that values come from the same type of distribution.

Using traditional statistical fitting of a predefined model to large amounts of trace data does not lead to any real insights

“on average each open request transfers 7KBytes of data”
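One common way to ask whether two traces draw their values from the same distribution is a two-sample Kolmogorov-Smirnov test. The slides do not say which test was applied, so the sketch below is only an assumed example of the idea; it uses the standard large-sample critical value and invented data.

```python
import math
from bisect import bisect_right

def ks_statistic(sample_a, sample_b):
    """Largest gap between the two empirical CDFs."""
    a, b = sorted(sample_a), sorted(sample_b)
    return max(
        abs(bisect_right(a, x) / len(a) - bisect_right(b, x) / len(b))
        for x in a + b
    )

def rejects_same_distribution(sample_a, sample_b, alpha=0.05):
    """Large-sample two-sample KS decision at significance level alpha."""
    n, m = len(sample_a), len(sample_b)
    critical = math.sqrt(-math.log(alpha / 2) / 2) * math.sqrt((n + m) / (n * m))
    return ks_statistic(sample_a, sample_b) > critical

# Invented per-trace values (for example, bytes per open request).
trace_one = list(range(1, 51))
trace_two = list(range(51, 101))
print(ks_statistic(trace_one, trace_two))
print(rejects_same_distribution(trace_one, trace_two))
```

Failing to reject is not proof that the distributions are equal, which is consistent with the slide's point that no such proof could be obtained.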

Page 35: The New Yorker, September 6, 1999, page 76

The origin of the complexity

No more “well behaved” applications.

Too many variants.

Pervasive presence of heavy-tails in the distributions of almost all variables (tail estimators indicate infinite variance).

Closed loop processing amplifies the presence of heavy-tail distributions found in the file system content.
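The slides do not name the tail estimator that was used. As an illustration, the Hill estimator is one widely used choice: it estimates the tail index from the k largest observations, and an index below 2 corresponds to infinite variance. The sketch below applies it to synthetic Pareto data; the sample size, the value of k and the shape 1.5 are arbitrary assumptions.

```python
import math
import random

def hill_tail_index(samples, k):
    """Hill estimate of the tail index from the k largest positive samples."""
    ordered = sorted(samples, reverse=True)
    if not 0 < k < len(ordered):
        raise ValueError("k must satisfy 0 < k < len(samples)")
    threshold = ordered[k]  # the (k+1)-th largest value
    log_excess = sum(math.log(x / threshold) for x in ordered[:k])
    return k / log_excess

rng = random.Random(7)
synthetic = [rng.paretovariate(1.5) for _ in range(20_000)]  # true index 1.5
estimate = hill_tail_index(synthetic, k=2_000)
print(f"estimated tail index: {estimate:.2f}")
print("suggests infinite variance" if estimate < 2 else "suggests finite variance")
```

On real trace data the estimate depends on the choice of k, so it is usually inspected over a range of k values rather than read off a single run.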

Page 36: The New Yorker, September 6, 1999, page 76

Last words

Why you should read the paper …

The paper reports on an initial analysis of the result of a large scale file system usage study.

Detailed description of the methods used for tracing and analysis.

Historical comparison with BSD & Sprite traces.

Lots of data on file system content, cache behavior, IO specifics, etc.

Analysis of the presence of heavy tailed distributions in all of the traced variables.

Page 37: The New Yorker, September 6, 1999, page 76