HTCaaS: A Large-Scale High-Throughput Computing by Leveraging Grids, Supercomputers and Cloud
Seungwoo Rho, Seoyoung Kim, Sangwan Kim, Seokkyoo Kim, Jik-Soo Kim and Soonwook Hwang
Supercomputing Center at the Korea Institute of Science and Technology Information (KISTI)
High-Throughput Computing
- Many loosely coupled tasks requiring a large amount of computing power
- Independent, sequential jobs that can be scheduled on many different computing resources
- Growth in the number and complexity of jobs is driving HTC toward Many-Task Computing (MTC)
Hard Problems / Issues
- Hiding heterogeneity and complexity of leveraging different computing resources from users
- Efficiently submitting a large number of jobs at once and managing them
- Effective management and exploitation of all available computing resources
Harnessing as many computing resources as possible is inevitable
- Grids, Supercomputers and Clouds are available to the scientific community
Our Approach
- High-Throughput Computing As a Service (HTCaaS) for scientific computing
- Meta-Job based automatic job split & submission (e.g., parameter sweeps)
- User-level job scheduling
- Pluggable interface to heterogeneous computing resources
- Application independent
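The Meta-Job split for a parameter sweep can be illustrated with a minimal sketch. It is not the HTCaaS implementation: the function `expand_meta_job`, the dictionary layout, and the `dock` command line are all hypothetical, chosen only to show how one sweep description turns into many independent tasks.

```python
from itertools import product


def expand_meta_job(command_template, sweep):
    """Expand one meta-job into independent tasks, one per parameter combination.

    command_template and sweep are hypothetical inputs for illustration;
    HTCaaS itself describes meta-jobs with the JSDL Parameter Sweep extension.
    """
    names = sorted(sweep)  # fixed parameter order keeps task ids reproducible
    tasks = []
    combos = product(*(sweep[name] for name in names))
    for task_id, values in enumerate(combos):
        params = dict(zip(names, values))
        tasks.append({"id": task_id, "command": command_template.format(**params)})
    return tasks


# Hypothetical docking sweep: 2 ligands x 3 random seeds = 6 independent tasks.
meta_job = {
    "command": "dock --ligand {ligand} --seed {seed}",
    "sweep": {"ligand": ["a.mol2", "b.mol2"], "seed": [1, 2, 3]},
}
tasks = expand_meta_job(meta_job["command"], meta_job["sweep"])
```

Each resulting task carries its own command line, so tasks can be scheduled on any available resource without depending on one another.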
[Diagram labels: Application Clients; Web Service Interface for Clients; HTC Problems (large amounts of computing power for lengthy period); HTCaaS server automatically produces a large number of independent jobs; Pluggable Resource Interface; Grids, Desktop Grids, Supercomputers, Cloud]
Design Philosophy
- Ease of Use: minimize user overhead for handling a large amount of jobs & resources
- Intelligent Resource Selection: automatic selection of more responsive and effective resources
- Pluggable Interface to Resources: adopt GANGA’s plugin mechanism for accessing different resources without hardcoding
- Support for Many Client Interfaces: a wide range of client interfaces are supported including a native WS-interface, Java API, CLI and GUI
[Diagram labels: Application Client (Command Line, GUI); WS-Interface; Account Manager; User Data Manager; Job Manager; Monitoring Manager; Agent Manager; Submission Engine; Job Queue; APIs: Account Manager API, Data Manager API, Job Manager API, Monitoring API; stored data: input/output data, job information, user account data; Agents on Grids, Supercomputers, Cloud]
System Architecture
- Powerful Meta-job description based on the OGF JSDL Parameter Sweep Extension Specification
- Agent-based multi-level scheduling & streamlined job dispatching
- Service-Oriented Architecture (SOA) based on WS-Interface
Job Submission & Execution Steps
1) User logs in to HTCaaS and uploads input data through User Data Manager
2) User submits a Meta-Job which can be composed of multiple tasks
3) HTCaaS automatically divides a Meta-Job into multiple tasks based on the specification
4) Agent Manager dispatches agents based on job requirements and resource availability
5) Agents proactively request tasks and process them
6) Finished results are stored and the user is notified
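The pull-based part of these steps, where agents proactively request tasks from a shared job queue, can be sketched with threads standing in for remote agents. This is a simplified local simulation under stated assumptions, not HTCaaS code: the names `agent` and `run_agents`, the agent labels, and the squaring "work" are hypothetical.

```python
import queue
import threading


def agent(name, job_queue, results, lock):
    """Pull tasks from the shared queue until none remain."""
    while True:
        try:
            task = job_queue.get_nowait()  # the agent asks for work itself
        except queue.Empty:
            return  # queue drained, agent exits
        output = task["x"] ** 2  # stand-in for the real computation
        with lock:
            results[task["id"]] = (name, output)  # store the finished result
        job_queue.task_done()


def run_agents(tasks, agent_names):
    """Fill the job queue, start one thread per agent, and collect results."""
    job_queue = queue.Queue()
    for task in tasks:
        job_queue.put(task)
    results = {}
    lock = threading.Lock()
    threads = [
        threading.Thread(target=agent, args=(name, job_queue, results, lock))
        for name in agent_names
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return results


tasks = [{"id": i, "x": i} for i in range(10)]
results = run_agents(tasks, ["grid-0", "cloud-0", "plsi-0"])
```

Because agents pull work rather than having it pushed to them, a faster agent naturally processes more tasks; which agent handles which task is not deterministic in this sketch.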
[Diagram labels: > htcaas; Account Manager; Certificate/Confirm; User Data Manager; Job Manager; Job Queue; JSDL: Parameter Sweep Job Extension; Agent Manager; Agent submission; Monitoring Manager; Job Request & Dispatch; Job Execution Monitoring; Agents on Grid, Clouds, PLSI Supercomputers]
Current Status & Future Work
- Completed testing of integration with Grids and Cloud
- Seamless integration with PLSI (Partnership & Leadership for the nationwide Supercomputing Infrastructure) in Korea
- System Scalability & Fault tolerance
- Improving User-level Job Scheduling
Utilizing Cloud for Dynamic & On-demand Resource Provisioning
- Constructing a hybrid scientific cloud infrastructure for HTC on the fly
[Diagram labels: HTCaaS Server; WS-API; Agent Manager; Agent Submission; Job Manager; Job Queue; User Data Manager; Amazon EC2 Instance Pool; Amazon EC2 EBS Snapshot Images; Agents. Numbered steps shown: 1. Request Cloud Instance; 2. Retrieve image; 3. Create instance; 5. Request Job; 6. Execute Job; 7. Store Job Results]
Applications Support
- Virtual Screening (Docking)
A target protein of 3CL-pro of SARS with 1.1 million chemical compounds
- 3D Visualization of Optimized Design Solution
On average 500 CPUs utilized; completed in 2.8 days; 2.6 years of computation in total
[Diagram labels: Agents; Grids, Volunteer Computing, Supercomputer, Clouds]
[Figure 1: HTCaaS Concept. Bridging the various HTC applications and heterogeneous computing resources]
[Figure 2: HTCaaS System Architecture. Jobs & input/output data are managed by Job Manager and User Data Manager. Agents are dispatched from Agent Manager and process jobs in Grids, Supercomputers and Clouds]
[Figure 3: Job Submission & Execution Steps in HTCaaS]
[Figure 4: HTCaaS in the Cloud (Amazon EC2)]
[Figure 5: HTCaaS Target Applications]