open science grid: more compute power alan de smet chtc@cs.wisc.edu

Post on 23-Dec-2015

Open Science Grid: More compute power

Alan De Smet chtc@cs.wisc.edu

chtc.cs.wisc.edu

CHTC Cores In Use

1,500

(CPU days each day, averaged over one month)

OSG Cores In Use

60,000

(CPU days each day, averaged over one month)

Open Science Grid

CHTC and OSG usage

(CPU days each day)

Challenges Solved

We worry about all of this.

You don’t have to.

› Authentication: X.509 certificates, certificate authorities, VOMS

› Interface: Globus, GridFTP, Grid universe

› Validation: Linux distribution, glibc version, basic libraries

Using OSG

› Before

universe = vanilla

executable = myjob

log = myjob.log

queue

Using OSG

› After

universe = vanilla

executable = myjob

log = myjob.log

+WantGlidein = true

queue

Challenge: Opportunistic

› OSG computers go away without notice

› Solutions

    Condor restarts automatically
    Sub-hour jobs
    Self-checkpointing
    Automated checkpointing
        • Condor’s standard universe
        • DMTCP: http://dmtcp.sourceforge.net/
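
Self-checkpointing means the job itself saves its progress and picks up from there when Condor restarts it. A minimal sketch in Python, not CHTC's own tooling: the file name `checkpoint.json`, the `run` function, and the squared-sum "work" are all hypothetical stand-ins. The `stop_after` parameter exists only to simulate an OSG computer going away mid-run.

```python
import json
import os

CHECKPOINT = "checkpoint.json"  # hypothetical file name


def load_state(path=CHECKPOINT):
    """Return the saved state, or a fresh one if no checkpoint exists yet."""
    if os.path.exists(path):
        with open(path) as f:
            return json.load(f)
    return {"next_item": 0, "total": 0}


def save_state(state, path=CHECKPOINT):
    """Write to a temp file and rename, so a kill mid-write
    cannot leave a half-written checkpoint behind."""
    tmp = path + ".tmp"
    with open(tmp, "w") as f:
        json.dump(state, f)
        f.flush()
        os.fsync(f.fileno())
    os.replace(tmp, path)


def run(n_items, path=CHECKPOINT, every=100, stop_after=None):
    """Process items 0..n_items-1, checkpointing every `every` items.

    `stop_after` simulates an eviction: the function returns None
    without saving, so work since the last checkpoint is lost.
    """
    state = load_state(path)
    done_this_run = 0
    while state["next_item"] < n_items:
        if stop_after is not None and done_this_run >= stop_after:
            return None  # simulated eviction
        i = state["next_item"]
        state["total"] += i * i  # stand-in for the real work
        state["next_item"] = i + 1
        done_this_run += 1
        if state["next_item"] % every == 0:
            save_state(state, path)
    save_state(state, path)
    return state["total"]
```

For the checkpoint to survive an eviction, the file has to come back with the job. In a vanilla universe submit file that would typically mean `when_to_transfer_output = ON_EXIT_OR_EVICT`; whether that fits a given pool's setup is an assumption to check locally.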

Challenge: Local Software

› Bare-bones Linux systems

› Solution

    Bring everything with you
    CHTC-provided MATLAB and R packages
        • RunDagEnv/mkdag

Challenge: Erratic Failures

› Complex systems fail sometimes

› Solution

    Expect failures and automatically retry
    DAGMan for retries
    DAGMan POST scripts to detect problems
        • RunDagEnv/mkdag
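
A POST script can be as simple as "did the job exit cleanly and leave a non-empty output file?" The sketch below is one hypothetical check, not what RunDagEnv/mkdag generates. It assumes a DAG file with lines along the lines of `SCRIPT POST A check_output.py $RETURN a.out` and `RETRY A 3`, where the node name `A`, the script name, and the output file `a.out` are made up for illustration. When a node has a POST script, DAGMan judges the node by the script's exit status, so a non-zero exit here triggers the retry.

```python
import os
import sys


def job_succeeded(return_code, output_path, min_bytes=1):
    """Treat the job as successful only if it exited with 0 and
    left an output file of at least `min_bytes` bytes."""
    if return_code != 0:
        return False
    if not os.path.isfile(output_path):
        return False
    return os.path.getsize(output_path) >= min_bytes


def main(argv):
    """argv is [script name, job return code, expected output file].

    Returns 0 for success, 1 for a failed job, 2 for bad arguments.
    """
    if len(argv) != 3:
        print("usage: check_output.py RETURN_CODE OUTPUT_FILE", file=sys.stderr)
        return 2
    try:
        return_code = int(argv[1])
    except ValueError:
        print("RETURN_CODE must be an integer", file=sys.stderr)
        return 2
    return 0 if job_succeeded(return_code, argv[2]) else 1

# When saved as a script, finish the file with:
#     sys.exit(main(sys.argv))
```

Real checks are usually job-specific: parsing the last line of the output, comparing row counts, and so on. The size test is only the simplest thing that catches a job that died quietly on a bad machine.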

Challenge: Bandwidth

› Solutions

    Only send what you need
    Store large, shared files in our web cache
    Read small amounts of data on the fly
        • Condor’s standard universe
        • Parrot: http://www.cse.nd.edu/~ccl/software/parrot/
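
Reading on the fly pays off when the job asks for only the bytes it needs. A minimal sketch, assuming the remote file shows up as an ordinary path (for example, under Parrot or the standard universe's remote I/O) and holds fixed-size records. The function names and the record layout are hypothetical, and how many bytes actually cross the network depends on the I/O layer underneath.

```python
def read_slice(path, offset, length):
    """Return up to `length` bytes starting at byte `offset`."""
    with open(path, "rb") as f:
        f.seek(offset)
        return f.read(length)


def read_record(path, index, record_size):
    """Return record number `index` from a file of fixed-size records,
    without reading the records before it."""
    return read_slice(path, index * record_size, record_size)
```

A job that needs record 5,000 of a multi-gigabyte file then issues one seek and one short read, in contrast to shipping the whole file along with the job.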
