Scientific Computing Division

Trends and Directions of Mass Storage in the Scientific Computing Arena

CAS 2001

Gene Harano
National Center for Atmospheric Research

CAS 2001 – October 30, 2001

Copyright © 2001 University Corporation for Atmospheric Research

Vision

• How do we accomplish that vision?
  • Handling large datasets – Analysis and Visualization
  • Shared File Systems and Cache Pools
  • Middleware and layering
  • Management tools
  • Emerging Technologies
  • (To name a few)

Large Datasets

• The NCAR MSS was originally a tape-based archive.
• The average NCAR MSS file size is 35 MB (11 M files); it is small because of historical restrictions (single-volume datasets, model history files) and the large share (25%) of files < 1 MB (user backups).
• Single TB-sized files are common for visualization and analysis.
• Currently these large files are sliced up prior to landing in the archive.
• Access is generally sequential, with some random access.
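The slicing step mentioned above can be illustrated with a minimal sketch. The slides do not say how the NCAR MSS slices large files, so the function name `plan_slices` and the fixed slice size are assumptions made purely for illustration.

```python
from typing import Iterator, Tuple


def plan_slices(file_size: int, slice_size: int) -> Iterator[Tuple[int, int]]:
    """Yield (offset, length) pairs that cover a file of file_size bytes.

    Hypothetical helper: the fixed slice_size is an assumption, not the
    slicing rule actually used by the NCAR MSS.
    """
    if file_size < 0:
        raise ValueError("file_size must be non-negative")
    if slice_size <= 0:
        raise ValueError("slice_size must be positive")
    offset = 0
    while offset < file_size:
        # The last slice may be shorter than slice_size.
        length = min(slice_size, file_size - offset)
        yield offset, length
        offset += length


# Example: a 1 TB file (decimal units) cut into 2 GB slices.
slices = list(plan_slices(10**12, 2 * 10**9))
print(len(slices))
```

Because each slice carries its own offset, a reader that needs random access could in principle fetch only the slices that overlap the requested byte range.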

Large Datasets

• Are tape-based archives obsolete?
  • No, but there is a need to reevaluate the entire storage structure at NCAR.
  • Cache pools
  • Data warehouses, data sub-setting
• The NCAR MSS is being treated as a shared file system rather than an archive.

Shared File System

• Heterogeneous
• High-Performance
• High-Capacity
• Doesn’t yet exist.

[Diagram labels: Shared Data; Web/GRID/servers; Programmatic; Command Line]

Cache Pools

• External to the archive
  • Minimize archive activity
  • Temporary data stays out of the archive
  • Customized for a smaller set of associated data
• Internal to the archive
  • Minimize tape activity
  • Improve response time
  • Federate and distribute
  • Repackage small files for tape storage under system control
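The last point, repackaging small files for tape, can be sketched as a simple grouping pass. This is only an illustration: the function name `pack_small_files`, the size thresholds, and the greedy in-order grouping are all assumptions, since the slides do not describe the actual repackaging scheme.

```python
from typing import Iterable, List, Tuple


def pack_small_files(
    files: Iterable[Tuple[str, int]],
    small_limit: int,
    bundle_target: int,
) -> List[List[str]]:
    """Group small files, in arrival order, into bundles for tape.

    files yields (name, size_in_bytes). Files of small_limit bytes or more
    are skipped here on the assumption that they go to tape on their own.
    A bundle is closed as soon as adding the next file would push it past
    bundle_target bytes.
    """
    bundles: List[List[str]] = []
    current: List[str] = []
    current_size = 0
    for name, size in files:
        if size >= small_limit:
            continue  # not a small file; handled elsewhere
        if current and current_size + size > bundle_target:
            bundles.append(current)
            current, current_size = [], 0
        current.append(name)
        current_size += size
    if current:
        bundles.append(current)
    return bundles


cached = [("a", 400_000), ("b", 700_000), ("big", 50_000_000),
          ("c", 300_000), ("d", 900_000)]
print(pack_small_files(cached, small_limit=1_000_000, bundle_target=1_500_000))
```

Writing one bundle instead of many sub-megabyte files should cut the number of separate tape operations, which is the motivation the slide gives for doing this under system control.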

Terascale Modeling & Analysis

[Diagram labels: MSS Proxy; Data analysis; GPFS Shared File System; Advanced Research Computing System (IBM SP)]

Terascale Analysis & Visualization

[Diagram labels: Vislab; MSS Proxy; Data analysis; Storage Area Network Shared File System]

Data Provisioning & Access

[Diagram labels: CDP/ESG; Data Processor; DSS server; Storage Area Network Shared File System; Unidata, DODS; MSS Proxy]

Internal Cache Pools

• NCAR MSS event log modeling (April 2000 – April 2001) – looking at tape activity
  • 20 TB cache pool – can be federated and distributed
  • 30-day average cache residency
  • 70% reduction in tape read-backs
  • Greatly enhanced response time
  • Reduce the amount of tape resources or redefine their use.
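Event log modeling of this kind can be approximated by replaying the log against a simulated cache. The sketch below is not the model used for the NCAR study: the `(operation, name, size)` record format, the function name `simulate_cache`, and the least-recently-used eviction policy are all assumptions chosen for illustration.

```python
from collections import OrderedDict
from typing import Iterable, Tuple


def simulate_cache(
    events: Iterable[Tuple[str, str, int]],
    capacity: int,
) -> Tuple[int, int]:
    """Replay (op, name, size) events and return (reads, tape_reads).

    op is "read" or "write". Writes and read misses place the file in the
    cache; the least recently used files are evicted when the cache is full.
    A read that misses the cache counts as one tape read-back.
    """
    cache: "OrderedDict[str, int]" = OrderedDict()  # name -> size, LRU first
    used = 0
    reads = 0
    tape_reads = 0
    for op, name, size in events:
        if op == "read":
            reads += 1
            if name in cache:
                cache.move_to_end(name)  # cache hit: refresh recency
                continue
            tape_reads += 1  # cache miss: file must come back from tape
        elif name in cache:
            used -= cache.pop(name)  # rewrite: drop the stale copy first
        if size > capacity:
            continue  # file cannot fit in the cache at all
        cache[name] = size
        used += size
        while used > capacity:
            _, evicted_size = cache.popitem(last=False)
            used -= evicted_size
    return reads, tape_reads


log = [("write", "a", 4), ("write", "b", 4), ("read", "a", 4),
       ("write", "c", 4), ("read", "b", 4), ("read", "a", 4)]
print(simulate_cache(log, capacity=8), simulate_cache(log, capacity=12))
```

Running the same log at several capacities gives a curve of tape read-backs against cache size, which is the kind of comparison behind a figure such as the 70% reduction quoted above.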

Middleware and Layering

• An archive performs 2 basic functions
  • Reliably storing data
  • Returning data on demand
• Data analysis, data mining, data assimilation, distributed data servers, etc. are functions of middleware that sits on top of an archive; they should be implemented independently of the underlying archive.

Role of an archive

Middleware and Layering

• Separate archive functionality from
  • Visualization
  • Data servers
  • Data warehousing, data mining, data subsetting
  • Web and Grid access
  • Etc.
• Maximally enables the use of COTS
• Allows (transparent) replacement of components as needed
• Fill the gaps with custom software
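The layering argument can be made concrete with a small interface sketch. Everything named here (`Archive`, `InMemoryArchive`, `subset`) is hypothetical and is not part of the NCAR MSS; the point is only that middleware written against a two-method interface does not care which archive sits underneath.

```python
from typing import Dict, Protocol


class Archive(Protocol):
    """The two basic archive functions: store data, return it on demand."""

    def store(self, name: str, data: bytes) -> None: ...

    def retrieve(self, name: str) -> bytes: ...


class InMemoryArchive:
    """Stand-in archive for the example; a tape-backed one could replace it."""

    def __init__(self) -> None:
        self._objects: Dict[str, bytes] = {}

    def store(self, name: str, data: bytes) -> None:
        self._objects[name] = data

    def retrieve(self, name: str) -> bytes:
        return self._objects[name]


def subset(archive: Archive, name: str, start: int, length: int) -> bytes:
    """Middleware-level data subsetting built only on the Archive interface."""
    return archive.retrieve(name)[start:start + length]


archive = InMemoryArchive()
archive.store("run1", b"0123456789")
print(subset(archive, "run1", 2, 3))
```

Swapping `InMemoryArchive` for another class with the same two methods leaves `subset` unchanged, which is the transparent replacement of components the slide refers to.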

Future Data Services

[Layered diagram, labels as extracted: File Services; Cache Pools; NCAR MSS Archive; Data Analysis/Mining/Assimilation; Data Cataloging/Searching; Data Storage; Data Storage; Digital Libraries, Data Servers; Visualization; Web]

Management Tools

• There is a need for better user and system management tools as MSS capacity scales.

• How does a single user manage 1 million files?

• How does an MSS administrator dynamically tune a system, predict workloads, and find and correct bottlenecks?

Management Tools

• Defining new roles
  • Single ordinary user
  • MSS superuser
  • As users come and go, there is a need for:
    • Project superuser (new)
    • Division data administrator (new)
• Web-based metadata user tools
  • List, search, catalog holdings – metadata mining
  • Remove unwanted files

NCAR MSS tools

Management Tools

• From the system perspective – utilize data warehousing and data mining techniques
  • System modeling using event logs
    • Capacity planning
    • Identify bottlenecks
  • Operational monitoring
    • Track errors, identify trends (media problems)
    • Intrusion detection
    • Dynamic system tuning

NCAR MSS tools
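As a small illustration of mining event logs for media problems, the sketch below counts errors per tape volume and flags the volumes that exceed a threshold. The `(volume, status)` record format, the `"error"` status string, and the function name `flag_suspect_volumes` are assumptions; real MSS event logs will look different.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def flag_suspect_volumes(
    events: Iterable[Tuple[str, str]],
    threshold: int,
) -> List[str]:
    """Return volumes with at least threshold error events, sorted by name.

    events yields (volume, status) pairs; only status == "error" is counted.
    """
    errors = Counter(volume for volume, status in events if status == "error")
    return sorted(volume for volume, count in errors.items() if count >= threshold)


log = [("V001", "ok"), ("V002", "error"), ("V002", "error"),
       ("V003", "error"), ("V002", "ok")]
print(flag_suspect_volumes(log, threshold=2))
```

The same counting pattern extends to other keys, such as drive or time window, when the goal is trend detection rather than a single bad cartridge.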

Emerging Technologies

• Data Path
• Tape
• Holographic Storage
• Probe-Based MEMS
• High-Density Rosetta (analog)

Data Path

• HIPPI in use today in the NCAR archive
• Fibre Channel will replace our HIPPI in the near term
  • FC SAN for RAID Cache Pools
  • FC SAN for Tape sharing
• Others
  • iSCSI
  • FC over IP
  • InfiniBand

Tape

[Figure: tape drive data rate (MB/sec, axis 0 to 40) versus native cartridge capacity (GB, axis 0 to 100), with linear and helical formats distinguished. Drives shown: 3480/90, 3490E, 9490EE, 3570, 3570C, 3590, 3590E, 9840, 9840B, 9940, DLT-4000, DLT-7000, SDLT, Mammoth, Mammoth 2, AIT, AIT-2, DTF, SD-3, Accelis, Ultrium. Roadmap annotations: 2001; 2H02; 200 GB, 1Q02; 500 GB, 2003; Opt, 1 TB, 2003; 1 TB, 60 MB, 2004.]

Tape

• To be competitive with magnetic disk, magnetic tape capacity must grow by 10x every 5 years.

• Achieved by a combination of increased areal density and longer (and possibly wider) tape.

(from a storage vendor)
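The 10x-in-5-years target can be restated as a yearly rate. The short sketch below assumes steady compound growth, which the slide does not state; under that assumption the required factor is 10^(1/5), roughly 1.58, or about 58% growth per year.

```python
def annual_growth_factor(total_factor: float, years: float) -> float:
    """Yearly multiplier that compounds to total_factor over the given years.

    Assumes the same growth rate every year (compound growth).
    """
    if total_factor <= 0 or years <= 0:
        raise ValueError("total_factor and years must be positive")
    return total_factor ** (1.0 / years)


factor = annual_growth_factor(10, 5)
print(f"{factor:.3f}x per year, about {100 * (factor - 1):.0f}% annual growth")
```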

Tape

• RAIT (Redundant Array of Independent Tapes)
  • Increased performance
  • Higher reliability with the use of parity
  • Higher single “volume” capacity
  • Large datasets on a single “volume”
• RAIL (Redundant Array of Independent Libraries)
  • Greater total system capacity
  • Improved response time
• These are resource-intensive solutions – dedicated libraries and drives
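The reliability gain from parity can be shown with a minimal sketch. The slides say only that RAIT uses parity; the single XOR parity block per stripe used here (as in RAID levels 3 to 5) and the function names are assumptions for illustration.

```python
from functools import reduce
from typing import Sequence


def xor_blocks(blocks: Sequence[bytes]) -> bytes:
    """Byte-wise XOR of equally sized blocks; used to build a parity block."""
    if not blocks:
        raise ValueError("at least one block is required")
    length = len(blocks[0])
    if any(len(block) != length for block in blocks):
        raise ValueError("all blocks must have the same length")
    # zip(*blocks) walks the blocks column by column, one byte from each.
    return bytes(reduce(lambda x, y: x ^ y, column) for column in zip(*blocks))


def rebuild_missing(surviving: Sequence[bytes], parity: bytes) -> bytes:
    """Recover the one missing data block from the survivors and the parity."""
    return xor_blocks(list(surviving) + [parity])


# One stripe spread over three data tapes plus one parity tape.
stripe = [b"abcd", b"efgh", b"ijkl"]
parity = xor_blocks(stripe)
print(rebuild_missing([stripe[0], stripe[2]], parity))
```

The example also makes the cost visible: every stripe needs all of its drives mounted at once, which matches the slide's note that these solutions are resource intensive.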

Holographic

• Large capacity – 10 GB in a single cubic centimeter (10 Gbits/in² for magnetic disk)
• High-speed – 2 Gigabits/sec
• Low power
• Billions of write cycles

Probe-Based MEMS

• MEMS – Micro-Electro-Mechanical Systems
• Probe-based storage arrays
  • Dense
  • Highly parallel to achieve high bandwidth
  • Rectilinear 2D positioning
  • Commercial devices in the next several years

HD Rosetta

• Product marketed by Norsam Technologies
• Developed at Los Alamos National Lab
• Analog
• Lifetime of 1000s of years
• Can be read back with only a microscope
• Stores text and images