Enterprise Storage: Our Journey Thus Far
John D. Halamka, MD
CIO, Harvard Medical School and Beth Israel Deaconess Medical Center
Agenda
• Exponential Growth – Issues & Resolution
• Disk Performance – Issues & Resolution
• File System Silos – Issues & Resolution
Exponential Growth: “The Problem”
Exponential Growth
69,060,583 Files! 81.5 Terabytes
Exponential Growth: “The Resolution”
Exponential Growth (Resolved)
Cluster Size (Nodes):   7        10       50       96
Capacity:               252 TB   360 TB   1.8 PB   3.45 PB
Rack Units:             28       40       200      384
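The table above scales linearly: each storage node contributes roughly 36 TB of capacity and occupies 4 rack units. A minimal sketch of that arithmetic (the per-node figures are derived from the table itself, not from vendor documentation):

```python
# Per-node figures derived from the capacity table above (approximate):
TB_PER_NODE = 36
RU_PER_NODE = 4

def cluster_footprint(nodes: int) -> tuple[int, int]:
    """Return (capacity in TB, rack units) for a cluster of `nodes` nodes."""
    return nodes * TB_PER_NODE, nodes * RU_PER_NODE

for n in (7, 10, 50, 96):
    capacity_tb, rack_units = cluster_footprint(n)
    print(f"{n:>3} nodes: {capacity_tb:>5} TB, {rack_units:>3} RU")
```

The 96-node case gives 3,456 TB, which matches the 3.45 PB ceiling quoted above.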
Gordon Hall Data Center: 7 nodes, 252 TB capacity, 28 GB globally coherent cache
Markley Data Center: 7 nodes, 252 TB capacity, 28 GB globally coherent cache
Performance Bottlenecks: “Disk Performance Issues”
Performance Bottlenecks
Research computational requirements change constantly:
• Orchestra Cluster contains 179 cluster nodes (810 CPU cores) today.
• Future growth – 50-75 nodes (400-600 processor cores) each year.
• Stimulus grants could realize an additional 392 nodes (3,136 processors).
Storage Array Cache (The Problem)
• Cache on the storage arrays is not globally coherent, so a single array fills up its cache due to delayed disk reads and writes.
• Data must be manually moved to increase spindle performance.
Disk Spindle Contention (The Problem)
• Data is striped across the disk spindles to meet the current performance SLAs.
• Cluster jobs demand more reads and writes of the file system.
• The disk spindles that once delivered acceptable performance are no longer able to keep up.
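The striping mentioned above can be sketched in a few lines: successive chunks of a file land on successive spindles, so one large read or write is spread across all of them. This is the general round-robin idea, not the array's actual layout policy:

```python
# Round-robin striping: chunk i of a file lands on spindle i mod N,
# so sequential I/O is spread evenly across all spindles.
def spindle_for_chunk(chunk_index: int, num_spindles: int) -> int:
    return chunk_index % num_spindles

# An 8-chunk file striped across 4 spindles:
print([spindle_for_chunk(i, 4) for i in range(8)])  # [0, 1, 2, 3, 0, 1, 2, 3]
```

The contention problem follows directly: once every job's chunks hit the same fixed set of spindles, adding compute nodes adds load but no I/O capacity.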
Performance Bottlenecks: “The Resolution”
AutoBalance: Automated data balancing across nodes
[Diagram: storage nodes before AutoBalance, some FULL and some EMPTY; after AutoBalance, all BALANCED]
• AutoBalance “automatically” migrates data to
newly added storage nodes while the system is online and in production.
• Requires NO manual intervention, NO reconfiguration, NO server or client mount point or application changes.
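The rebalancing idea behind AutoBalance can be sketched as a greedy loop that shifts data from the fullest node to the emptiest until usage is roughly even. This is an illustration of the concept only, not the vendor's actual algorithm:

```python
# Conceptual sketch of automated data balancing (NOT the vendor's algorithm):
# repeatedly move data from the fullest node to the emptiest one
# until the spread between them is within a tolerance.
def rebalance(used_tb: list[float], tolerance: float = 1.0) -> list[float]:
    nodes = used_tb[:]  # work on a copy; don't mutate the caller's list
    while max(nodes) - min(nodes) > tolerance:
        hi = nodes.index(max(nodes))
        lo = nodes.index(min(nodes))
        move = (nodes[hi] - nodes[lo]) / 2  # split the difference
        nodes[hi] -= move
        nodes[lo] += move
    return nodes

# Three full nodes plus two newly added empty ones:
print(rebalance([30.0, 30.0, 30.0, 0.0, 0.0]))
```

The total data is conserved; only its placement changes, which is why no client mount point or application change is needed.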
• HMS performance requirements increase:
  • Orchestra Cluster adds additional nodes and CPU cores.
• HMS performance solution: add storage cluster nodes.
  – Storage capacity is increased
  – Storage processing power is increased
  – Globally coherent cache is increased
Performance Bottlenecks (Resolved)
File System Silos: “The Problem”
File System Silos
Storage is assigned to individual Data Movers, creating “silos” of storage.
Storage is provisioned in 2 TB file systems. This provides maximum flexibility in the event backups fail.
233 File Systems!
CPU and memory are not globally coherent and shared, creating “silos” of CPU and memory.
File System Silos: “The Resolution”
File System Silos (Resolution)
Expandable to more than 3 petabytes in 1 file system!
All Cluster Nodes have balanced connections!
Cluster can grow up to 96 nodes.
Traditional Backups: “The Problem”
Traditional Backups
The Last 365 Days
• Fulls, Cumulatives, Differentials:
  • 2,043,628,459.14 Megabytes
  • 1,948.95 Terabytes
  • 1.9 Petabytes
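The three figures above are the same quantity in different binary units (1 TB = 1,048,576 MB; 1 PB = 1,024 TB). A quick check of the conversion, noting the slide appears to truncate rather than round the terabyte figure:

```python
# Reproducing the slide's backup-volume totals with binary units:
total_mb = 2_043_628_459.14

total_tb = total_mb / 1_048_576   # 1 TB = 1,048,576 MB
total_pb = total_tb / 1_024       # 1 PB = 1,024 TB

print(f"{total_tb:,.2f} TB")  # 1,948.96 TB (the slide truncates to 1,948.95)
print(f"{total_pb:.1f} PB")   # 1.9 PB
```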
The Last 365 Days
• Tapes
  • 2,073 tapes
  • $103,650.00 tape costs!
• Off-Site Tape Storage
  • $1,860.00 per month
  • $22,320.00 per year
Traditional Backups: “The Resolution”
Replication Across Data Centers: Resolved
Data Protection: “No Problem”
Current & Future Data Protection
Current Data Protection (Research and Administrative)

Backup Strategy (Tape) | Retention (Days of Protection) | Description
Monthly Full Backups | 90 Days | 3 versions of the file are potentially recoverable, 1 for each month that the “Monthly Backup” was executed.
Weekly Cumulative | 90 Days | 12 versions of the file are potentially recoverable, 1 for each week that the “Weekly Cumulative” was executed.
Daily Incremental | 14 Days | 14 versions of the file are potentially recoverable, 1 for each day that the “Daily Incremental” was executed.
Checkpoints | 3 Days | 3 versions of the file are potentially recoverable, 1 for each day that the “Checkpoint” was executed.
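The "versions recoverable" counts in the table follow directly from retention divided by backup interval. A sketch of that relationship (intervals in days are my reading of the schedule names, e.g. monthly ≈ 30 days):

```python
# Recoverable versions = retention period // backup interval:
def versions(retention_days: int, interval_days: int) -> int:
    return retention_days // interval_days

print(versions(90, 30))  # monthly fulls kept 90 days   -> 3
print(versions(90, 7))   # weekly cumulatives, 90 days  -> 12
print(versions(14, 1))   # daily incrementals, 14 days  -> 14
print(versions(3, 1))    # daily checkpoints, 3 days    -> 3
```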
Future Data Protection (Research Data)
Backup Strategy | Retention (Days of Protection) | Description
Replication | Infinite | 1 version of the file exists (in two locations) and represents the “live” copy of the file.
Checkpoints | 30 Days | 30 versions of the file are recoverable, 1 for each day that the “Checkpoint” was executed.
Future Data Protection (Administrative Data)
Folder Type | Backup Strategy
Home Directories | Monthly Full, Weekly Cumulative, Daily Incremental
Servers | Monthly Full, Weekly Cumulative, Daily Incremental
Database | Monthly Full, Weekly Cumulative, Daily Incremental
Microsoft Exchange | Monthly Full, Weekly Cumulative, Daily Incremental
Storage Project Timeline