NCAR storage accounting and analysis possibilities


NCAR storage accounting and analysis possibilities
David L. Hart, Pam Gillman, Erich Thanhardt
NCAR CISL
July 22, 2013
dhart@ucar.edu

3

Why storage accounting?

• Big Data
  – Increasing cost of storage with respect to compute
• NSF data management plan mandate
  – Tools for users
• Some info is better than no info
  – Some process is better than ad hoc fire drills
• Supports allocation processes

4

Accounting for archive storage

• NCAR has “charged” users for archive use for many years.
  – Archive accounting has institutional inertia
• NCAR HPSS details, June-July 2013

  Date     Files (M)  PB (unique)  PB (2nd copy)  Users  TB+
  6/2/13   137.6      19.5         22.3            991   181
  6/9/13   138.2      19.8         22.6            991   307
  6/16/13  138.8      20.1         22.9            992   370
  6/23/13  141.1      20.5         23.3            998   347
  6/30/13  142.4      20.7         23.5           1002   266
  7/7/13   142.5      20.9         23.6           1005   135

5

Archive storage record
• Activity date – date record was collected
• Activity type – Read, Write, Storage
• Unix uid
• Project code – project to charge
• Number of files
• Bytes – read, written, or stored
• Class of service – e.g., single-copy, dual-copy
• DNS – of client host
• Frequency – interval, in days, between accounting runs
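For illustration, the archive record listed above could be represented roughly as follows. This is a minimal sketch; the field names and types are assumptions, not the actual NCAR accounting schema.

```python
from dataclasses import dataclass
from datetime import date

@dataclass
class ArchiveStorageRecord:
    activity_date: date      # date the record was collected
    activity_type: str       # "Read", "Write", or "Storage"
    uid: int                 # Unix uid of the file owner
    project_code: str        # project to charge
    num_files: int           # number of files read, written, or stored
    num_bytes: int           # bytes read, written, or stored
    class_of_service: str    # e.g., "single-copy" or "dual-copy"
    client_dns: str          # DNS name of the client host
    frequency_days: int      # interval, in days, between accounting runs
```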

6

Collecting data from HPSS

• Read/write activity
  – Analyze logs from HSI and HTAR (since May 2013). Logs archived daily, processed weekly (see sketch below).
• Storage activity
  – Weekly DB2 table scan and separate post-processing steps.
• Accounting system impact
  – Approx. 6,000 records per week
• Major accounting requirements
  – Use of HPSS accounting hooks to associate NCAR project code with HPSS file “account”
  – Accounting system and HPSS enforce requirement for every user to have a “default project” to which files will be charged if no other project is provided
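The weekly read/write roll-up might look something like the sketch below, assuming the HSI/HTAR logs have already been parsed into (uid, project, activity, bytes) tuples. The log format and helper names are assumptions for illustration, not the actual NCAR pipeline.

```python
from collections import defaultdict

def summarize_week(parsed_log_entries):
    """Aggregate per-transfer log entries into one summary per
    (uid, project, activity) for the weekly accounting run."""
    totals = defaultdict(lambda: {"files": 0, "bytes": 0})
    for uid, project, activity, nbytes in parsed_log_entries:
        key = (uid, project, activity)      # activity is "Read" or "Write"
        totals[key]["files"] += 1
        totals[key]["bytes"] += nbytes
    # These summaries, plus the weekly storage scan, feed the accounting
    # system (roughly 6,000 records per week in total).
    return totals
```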

7

Accounting for disk storage

• Focus on long-term project spaces, which are allocated
  – But mechanism captures scratch snapshots, too!
• GLADE total storage, June-July 2013

  Date     Files (M)  PB    Users  TB+
  6/8/13   183.05     2.87  2,506  55.3
  6/15/13  192.96     2.97  2,525  99.3
  6/22/13  210.32     3.02  2,490  53.1
  6/29/13  212.80     3.11  2,500  89.5
  7/6/13   224.76     3.11  2,509   8.8

8

Disk storage record
• Event time – date record was collected
• Project directory
• Group – Unix group
• Username
• Number of files
• kB used
• Period – reporting interval, in days
• QOS – a quality of service field (for future use)
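As with the archive record, a minimal sketch of the disk record above, with assumed field names mirroring the list:

```python
from dataclasses import dataclass
from datetime import date

@dataclass
class DiskStorageRecord:
    event_time: date     # date the snapshot record was collected
    project_dir: str     # project directory (or per-user space)
    group: str           # Unix group
    username: str
    num_files: int
    kb_used: int         # kilobytes used
    period_days: int     # reporting interval, in days
    qos: str = ""        # quality-of-service field, reserved for future use
```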

9

Collecting data from GPFS

• File systems don’t have a concept of “project”, but GPFS has a notion of “file sets”
  – Leverage file sets to map to project spaces
  – For scratch, work, home: report per-user data
• Process runs weekly, provides a storage snapshot (sketch below)
  – With GPFS tools, process requires only a few minutes to complete; full file system scan not required
• Accounting system impact
  – Approx. 4,000 records per week
• Major accounting requirements
  – Agreements and processes between GLADE administrators and User Services about how spaces are created
  – Deviation would break the system
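A sketch of the weekly GLADE snapshot, assuming per-fileset usage has already been pulled from GPFS (for example, via its fileset quota reporting) into a dict. The fileset-to-project mapping is the piece that depends on the naming agreements described above; everything here is illustrative, not the production process.

```python
def snapshot_records(fileset_usage_kb, fileset_to_project, snapshot_date, period_days=7):
    """fileset_usage_kb: {fileset_name: (num_files, kb_used)}
       fileset_to_project: {fileset_name: project_directory}"""
    records = []
    for fileset, (nfiles, kb) in fileset_usage_kb.items():
        project = fileset_to_project.get(fileset)
        if project is None:
            # A fileset created outside the agreed naming scheme would
            # "break the system", so flag it rather than dropping it silently.
            raise ValueError(f"fileset {fileset} has no project mapping")
        records.append({
            "event_time": snapshot_date,
            "project_dir": project,
            "num_files": nfiles,
            "kb_used": kb,
            "period_days": period_days,
        })
    return records   # roughly 4,000 records per week
```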

10

ANALYSIS AND REPORTING

11

Storage growth over time (1)

[Charts: HPSS growth in 2013 (PB, unique and with 2nd copy) and GLADE growth in 2013 (TB for /glade/p/work, /glade/project, /glade/scratch), January through June 2013]

12

Storage growth over time (3)

User reports show project totals by week and a per-user breakdown
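Such a per-project report could be produced, for example, with a simple pivot over the weekly disk records, assuming they have been loaded into a pandas DataFrame with the fields from the “Disk storage record” slide. Column names here are assumptions.

```python
import pandas as pd

def project_report(records: pd.DataFrame, project_dir: str) -> pd.DataFrame:
    """Weekly totals for one project space, broken down by user."""
    df = records[records["project_dir"] == project_dir].copy()
    df["tb_used"] = df["kb_used"] / 1e9          # kB -> TB (decimal units)
    return df.pivot_table(index="event_time", columns="username",
                          values="tb_used", aggfunc="sum")
```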

13

Top consumers

[Charts: Project holdings in HPSS (% of projects, % of files, % of TB by holdings bin: 0-1 TB, 1-10 TB, 10-100 TB, 100-1000 TB, >1000 TB) and User holdings in GLADE (% of users, % of files, % of TB by holdings bin: 0-0.1 TB, 0.1-1 TB, 1-10 TB, 10-100 TB, >100 TB)]

14

Aggregate behavior (1)

[Chart: Weekly HPSS growth (TB)]
Net growth, 3/3-4/7: ~261 TB

15

Aggregate behavior (2)

[Chart: TB written daily, 14-Oct-08 through 1-Jul-09]
Data written, 3/3-4/7: 594 TB

16

Compute v. storage (1)

[Chart: HPC use, Disk GB, and Tape GB by year-week, 2012-47 through 2013-26; y-axis: core-hours or GB (millions)]

17

Compute v. storage use (2)

[Chart: HPC use, disk GB, and tape GB by year-week, 2012-43 through 2013-09; y-axis: core-hours used or gigabytes stored (millions)]

18

Big compute != Big data

[Chart: HPC charges and GB growth per user, log scale, with users sorted by HPC charges]

19

What is “Big Data”?

Average file size vs. Total data holdings

[Charts, four panels: number of users and GB stored (millions), each binned by average file size (<0.1 GB to <1000 GB) and by data stored per user (<1 GB to >1,000,000 GB)]

20

Managing “orphaned” files

• Verifying accounting records lets site operators identify files owned by inactive users or inactive projects

• On July 7, HPSS accounting showed 177 users with 885 TB of “orphaned” files

• Early outreach to users and project leads does translate to deletions and fewer files for which an owner cannot be found
  – Users are required to be “actively engaged” in the disposition of their archive holdings.
www2.cisl.ucar.edu/docs/hpss/policies
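The “orphaned” holdings check described above amounts to cross-referencing the owners in the latest storage records against the set of active users and projects. A minimal sketch, with assumed record fields and an assumed source for the active-user lists:

```python
def find_orphans(storage_records, active_uids, active_projects):
    """Return total orphaned TB and per-uid holdings for records whose
    owner or charged project is no longer active."""
    orphaned_tb = 0.0
    by_uid = {}
    for rec in storage_records:                          # latest weekly snapshot
        if rec["uid"] in active_uids and rec["project_code"] in active_projects:
            continue
        tb = rec["num_bytes"] / 1e12                     # bytes -> TB (decimal)
        orphaned_tb += tb
        by_uid[rec["uid"]] = by_uid.get(rec["uid"], 0.0) + tb
    # e.g., the July 7 run reported 177 users holding ~885 TB of orphaned files
    return orphaned_tb, by_uid
```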

21

QUESTIONS?
