

Page 1: HTCaaS: A Large-Scale High-Throughput Computing by ...htcaas.kisti.re.kr/wiki/images/0/05/HTCaaS_Poster-4.5Fx3F(SC'12).pdf · Grids, Supercomputers and Cloud Seungwoo Rho, Seoyoung

HTCaaS: A Large-Scale High-Throughput Computing by Leveraging Grids, Supercomputers and Cloud

Seungwoo Rho, Seoyoung Kim, Sangwan Kim, Seokkyoo Kim, Jik-Soo Kim and Soonwook Hwang

Supercomputing Center at the Korea Institute of Science and Technology Information (KISTI)

High-Throughput Computing

- Many loosely coupled tasks requiring a large amount of computing power

- Independent, sequential jobs that can be scheduled on many different computing resources

- Growth in the number and complexity of jobs is driving the HTC area toward Many-Task Computing (MTC)

Hard Problems / Issues

- Hiding heterogeneity and complexity of leveraging different computing resources from users

- Efficiently submitting a large number of jobs at once and managing them

- Effective management and exploitation of all available computing resources

Harnessing as many computing resources as possible is inevitable

- Grids, Supercomputers and Clouds are available to the scientific community

Our Approach

- High-Throughput Computing As a Service (HTCaaS) for scientific computing

- Meta-Job based automatic job split & submission (e.g., parameter sweeps)

- User-level job scheduling

- Pluggable interface to heterogeneous computing resources

- Application independent
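
The Meta-Job idea above (one description that is split automatically into many independent jobs, as in a parameter sweep) can be illustrated with a minimal sketch. This is not the HTCaaS implementation: the function name `expand_meta_job`, the command template and the parameter names are all hypothetical and chosen only for illustration.

```python
from itertools import product


def expand_meta_job(command_template, sweep):
    """Expand one meta-job into independent tasks.

    command_template: a format string with named placeholders (hypothetical).
    sweep: dict mapping each placeholder name to the list of values to sweep.
    Returns one command string per combination of parameter values.
    """
    names = sorted(sweep)  # fixed order so the output is deterministic
    tasks = []
    for values in product(*(sweep[name] for name in names)):
        params = dict(zip(names, values))
        tasks.append(command_template.format(**params))
    return tasks


# Hypothetical sweep: 2 ligands x 3 seeds gives 6 independent tasks.
tasks = expand_meta_job(
    "dock --ligand {ligand} --seed {seed}",
    {"ligand": ["a.mol2", "b.mol2"], "seed": [1, 2, 3]},
)
for task in tasks:
    print(task)
```

Because each generated command depends only on its own parameter values, the tasks can be scheduled on any resource in any order, which is the property HTC workloads rely on.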

[Figure 1 labels: Application Client (x3); Web Service Interface for Clients; HTCaaS server automatically produces a large number of independent jobs; Pluggable Resource Interface; Grids; Desktop Grids; Supercomputers; Cloud; HTC Problems (large amounts of computing power for lengthy periods)]

Design Philosophy

- Ease of Use: minimize user overhead for handling a large number of jobs & resources

- Intelligent Resource Selection: automatic selection of more responsive and effective resources

- Pluggable Interface to Resources: adopt GANGA’s plugin mechanism for accessing different resources without hardcoding

- Support for Many Client Interfaces: a wide range of client interfaces is supported, including a native WS-interface, Java API, CLI and GUI
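
To make the "pluggable interface without hardcoding" point concrete, here is a minimal sketch of a plugin registry. It does not reproduce GANGA's actual plugin API or the HTCaaS code; the names `ResourcePlugin`, `register`, `make_plugin` and the two example backends are hypothetical.

```python
from abc import ABC, abstractmethod

# Maps a resource type name to the plugin class that handles it.
REGISTRY = {}


def register(name):
    """Class decorator that adds a plugin to the registry under `name`."""
    def decorator(cls):
        REGISTRY[name] = cls
        return cls
    return decorator


class ResourcePlugin(ABC):
    """Hypothetical common interface every resource backend implements."""

    @abstractmethod
    def submit_agents(self, count):
        """Start `count` agents on this resource and return their IDs."""


@register("grid")
class GridPlugin(ResourcePlugin):
    def submit_agents(self, count):
        # A real plugin would call the grid middleware here.
        return [f"grid-agent-{i}" for i in range(count)]


@register("cloud")
class CloudPlugin(ResourcePlugin):
    def submit_agents(self, count):
        # A real plugin would start virtual machine instances here.
        return [f"cloud-agent-{i}" for i in range(count)]


def make_plugin(name):
    """Look up a backend by name; the caller never names a concrete class."""
    return REGISTRY[name]()


print(make_plugin("grid").submit_agents(2))
```

Adding a new resource type in this sketch means writing one decorated class; the code that dispatches agents does not change.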

[Figure 2 labels: Application Client (Command Line, GUI); WS-Interface; User Data Manager; Job Manager; Monitoring Manager; Account Manager; Agent Manager; Agent Submission Engine; Data Manager API; Job Manager API; Monitoring API; Account Manager API; Input/output data; Job Information; User Account data; Job Queue; Agent; Grids; Supercomputers; Cloud]

System Architecture

- Powerful Meta-job description based on the OGF JSDL Parameter Sweep Extension Specification

- Agent-based multi-level scheduling & streamlined job dispatching

- Service-Oriented Architecture (SOA) based on WS-Interface
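
In agent-based multi-level scheduling, the first level decides where agents are placed and the second level lets those agents fetch tasks. The sketch below illustrates only the first level, under simplifying assumptions: the function `plan_agents`, the `free_slots` input and the "most free slots first" policy are hypothetical and are not taken from HTCaaS.

```python
def plan_agents(num_tasks, free_slots, max_agents):
    """Decide how many agents to launch on each resource.

    num_tasks: number of tasks waiting in the job queue.
    free_slots: dict mapping resource name to currently free slots (assumed input).
    max_agents: upper bound on agents for this meta-job.
    Returns a dict mapping resource name to the number of agents to start there.
    """
    wanted = min(num_tasks, max_agents)  # never start more agents than tasks
    plan = {}
    # Assumed policy: fill the resources with the most free slots first.
    for name, slots in sorted(free_slots.items(), key=lambda item: -item[1]):
        if wanted == 0:
            break
        count = min(slots, wanted)
        if count > 0:
            plan[name] = count
            wanted -= count
    return plan


print(plan_agents(10, {"grid": 4, "cloud": 8, "plsi": 0}, max_agents=100))
```

A production policy would also weigh responsiveness and job requirements; this version only shows where such a decision sits in the architecture.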

Job Submission & Execution Steps

1) The user logs in to HTCaaS and uploads input data through the User Data Manager

2) The user submits a Meta-Job, which can be composed of multiple tasks

3) HTCaaS automatically divides the Meta-Job into multiple tasks based on its specification

4) The Agent Manager dispatches agents based on job requirements and resource availability

5) Agents proactively request tasks and process them

6) Finished results are stored and the user is notified
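
Steps 3 to 6 describe a pull model: tasks wait in a queue and agents ask for work when they are free. A minimal single-machine sketch of that pattern follows, using threads as stand-ins for remote agents. The function `run_agents` and the squaring "work" are hypothetical placeholders, not HTCaaS code.

```python
import queue
import threading


def run_agents(tasks, num_agents):
    """Run `tasks` with `num_agents` agents that pull work from a shared queue."""
    job_queue = queue.Queue()
    for task in tasks:
        job_queue.put(task)

    results = {}
    results_lock = threading.Lock()

    def agent(agent_id):
        while True:
            try:
                task = job_queue.get_nowait()  # the agent requests a task
            except queue.Empty:
                return  # no work left, so the agent exits
            outcome = task * task  # stand-in for executing the real job
            with results_lock:
                results[task] = (agent_id, outcome)  # store the finished result

    threads = [threading.Thread(target=agent, args=(i,)) for i in range(num_agents)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return results


results = run_agents(range(20), num_agents=4)
print(len(results), "tasks finished")
```

Because agents take a new task only when they finish the previous one, faster agents naturally process more tasks, with no central component assigning work in advance.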

[Figure 3 labels: > htcaas; Account Manager (Certificate/Confirm); User Data Manager; Job Manager; Job Queue; Agent Manager; Agent submission; Monitoring Manager; Agent; Grid; Clouds; PLSI Supercomputers; JSDL: Parameter Sweep Job Extension; Job Request & Dispatch; Job Execution Monitoring]

Current Status & Future Work

- Completed integration testing with Grids and Cloud

- Seamless integration with PLSI (Partnership & Leadership for the nationwide Supercomputing Infrastructure) in Korea

- System Scalability & Fault Tolerance

- Improving User-level Job Scheduling

Utilizing Cloud for Dynamic & On-demand Resource Provisioning

- Constructing a hybrid scientific cloud infrastructure for HTC on the fly

[Figure 4 labels: HTCaaS Server; Agent Manager; Agent Submission; Job Manager; Job Queue; User Data Manager; WS-API; Amazon EC2 Instance Pool; Amazon EC2 EBS; Snapshot Images; Agent. Steps shown: 1. Request Cloud Instance; 2. Retrieve image; 3. Create instance; 5. Request Job; 6. Execute Job; 7. Store Job Results]
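
The numbered steps in Figure 4 (the transcript preserves steps 1 to 3 and 5 to 7) can be traced with a small in-memory simulation. Nothing here talks to Amazon EC2: `FakeCloud`, `provision_and_run`, the image ID and the upper-casing "job" are hypothetical stand-ins used only to show the order of operations.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class FakeCloud:
    """In-memory stand-in for a cloud provider; not a real EC2 client."""
    images: dict
    instances: list = field(default_factory=list)

    def create_instance(self, image_id):
        image = self.images[image_id]  # step 2: retrieve the image
        instance = {"id": len(self.instances), "image": image}
        self.instances.append(instance)  # step 3: create the instance
        return instance


def provision_and_run(cloud, image_id, jobs, num_instances):
    """Start agents on cloud instances, then let them drain the job queue."""
    job_queue = deque(jobs)
    stored_results = []
    # Step 1: request cloud instances, one agent per instance.
    agents = [cloud.create_instance(image_id) for _ in range(num_instances)]
    while job_queue:
        for agent in agents:
            if not job_queue:
                break
            job = job_queue.popleft()  # step 5: the agent requests a job
            output = job.upper()       # step 6: execute (placeholder work)
            stored_results.append((agent["id"], job, output))  # step 7: store
    return stored_results


cloud = FakeCloud(images={"img-1": "agent-image"})
stored = provision_and_run(cloud, "img-1", ["a", "b", "c"], num_instances=2)
print(stored)
```

The agents here take turns in a simple loop; in a deployed system each instance would run independently and pull jobs over the network.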

Applications Support

- Virtual Screening (Docking): a target protein, 3CL-pro of SARS, screened against 1.1 million chemical compounds

- 3D Visualization of Optimized Design Solution

On average 500 CPUs utilized; completed in 2.8 days; 2.6 years of computation in total
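
As a rough reading of the reported run, the ratio of total computation to elapsed time gives the effective speedup over a single CPU. The arithmetic below assumes that "2.6 years of computation" means accumulated CPU time and that a year is 365 days; both are assumptions, not statements from the poster.

```python
cpu_years = 2.6       # total computation reported on the poster
wall_clock_days = 2.8  # elapsed time reported on the poster
days_per_year = 365    # assumption

cpu_days = cpu_years * days_per_year
effective_speedup = cpu_days / wall_clock_days

print(f"{cpu_days:.0f} CPU-days in {wall_clock_days} days")
print(f"effective speedup over one CPU: about {effective_speedup:.0f}x")  # about 339x
```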

[Figure labels: Agent (x4); Grids; Volunteer Computing; Supercomputer; Clouds]

[Figure 1: HTCaaS Concept. Bridging the various HTC applications and heterogeneous computing resources]

[Figure 2: HTCaaS System Architecture. Jobs & input/output data are managed by Job Manager and User Data Manager. Agents are dispatched from Agent Manager and process jobs in Grids, Supercomputers and Clouds]

[Figure 3: Job Submission & Execution Steps in HTCaaS]

[Figure 4: HTCaaS in the Cloud (Amazon EC2)]

[Figure 5: HTCaaS Target Applications]