TRANSCRIPT
The NAREGI Resource Management Framework
Satoshi Matsuoka, Professor, Global Scientific Information and Computing Center; Deputy Director, NAREGI Project
Tokyo Institute of Technology / NII
National Research Grid Infrastructure (NAREGI) 2003-2007
• Petascale Grid Infrastructure R&D for Future Deployment
– $45 mil (US) + $16 mil x 5 (2003-2007) = $125 mil total
– Hosted by National Institute of Informatics (NII) and Institute of Molecular Science (IMS)
– PL: Ken Miura (Fujitsu → NII)
• Sekiguchi (AIST), Matsuoka (Titech), Shimojo (Osaka-U), Aoyagi (Kyushu-U), …
– Participation by multiple (>= 3) vendors: Fujitsu, NEC, Hitachi, NTT, etc.
– Follow and contribute to GGF standardization, esp. OGSA
[Diagram: National Research Grid Middleware R&D — grid middleware, SuperSINET, grid R&D infrastructure (15 TF–100 TF), grid and network management. Focused "Grand Challenge" grid apps areas: the "NanoGrid" (IMS, ~10 TF) for nanotech grid apps, (BioGrid / RIKEN) for biotech grid apps, and other institutes/apps. Participants include AIST, Titech, Fujitsu, NEC, Osaka-U, Kyushu-U, Hitachi, and U-Tokyo.]
[Diagram labels: Industry / Societal Feedback; International Infrastructural Collaboration]
Towards a Cyber-Science Infrastructure for R&D

[Diagram: Cyber-Science Infrastructure (CSI) — restructuring university IT research resources; extensive on-line publication of results; a management body with education & training; deployment of NAREGI middleware; virtual labs and live collaborations; UPKI, the national research PKI infrastructure; SuperSINET and beyond, a lambda-based academic networking backbone. Participating centers: Hokkaido-U, Tohoku-U, Tokyo-U, NII, Nagoya-U, Kyoto-U, Osaka-U, Kyushu-U (plus Titech, Waseda-U, KEK, etc.). NAREGI output feeds GeNii (Global Environment for Networked Intellectual Information) and NII-REO (Repository of Electronic Journals and Online Publications).]
Participating Organizations
• National Institute of Informatics (NII)
– HQ + Center for Grid Research & Development
• Institute for Molecular Science (IMS)
– Center for Computational Grid Nano-science Applications
• Universities and National Labs (Leadership & Deployment)
– AIST, Titech, Osaka-U, Kyushu-U, etc. => leadership
– National Centers => customers, test deployment
– Collaborating application labs (w/ IMS)
• Developer Vendors (Workforce)
– Fujitsu, NEC, Hitachi, Somu, NTT, etc. => ~150 FTEs
• Application Vendors
– "Consortium for Promotion of Grid Apps in Industry"
NAREGI Middleware Roadmap

FY2003 – FY2007:
• FY2003-04: UNICORE/Globus/Condor-based R&D framework; prototyping NAREGI middleware components; apply component technologies to nano apps and evaluation.
• FY2004-05: development and integration of the α release; evaluation of the α release in the NII-IMS testbed; α release (private).
• FY2005-06: transition to the OGSA/WSRF-based R&D framework; development and integration of the β release; evaluation of the β release by IMS and other collaborating institutes; β release (public, May 2006); project midpoint evaluation.
• FY2006-07: development of OGSA-based middleware; deployment of β; evaluation on the NAREGI wide-area testbed; verification & evaluation of Ver. 1; Version 1.0 release.
Testbed use moves from the NAREGI NII-IMS testbed to the NAREGI wide-area testbed.
NAREGI Software Stack (Beta Ver. 2006)

• Grid-Enabled Nano-Applications
• Grid PSE; Grid Programming (GridRPC, GridMPI); Grid Visualization
• Grid Workflow; Super Scheduler; Distributed Information Service; Grid VM; Data; Packaging
• (Globus, Condor, UNICORE → OGSA/WSRF)
• High-Performance & Secure Grid Networking (SuperSINET)
• Computing resources at NII, IMS, research organizations, etc.
R&D in Grid Software and Networking Area (Work Packages)

• Work Package Structure (~150 FTEs):
– Universities and national labs: technology leadership
– Vendors (Fujitsu, NEC, Hitachi, etc.): professional development
• WP-1: Resource Management — Matsuoka (Titech), Saga (NII), Nakada (AIST/Titech)
• WP-2: Programming Middleware — Sekiguchi (AIST), Ishikawa (U-Tokyo), Tanaka (AIST)
• WP-3: Application Grid Tools — Usami (NII), Kawata (Utsunomiya-U)
• WP-4: Data Management (new FY2005, beta) — Matsuda (Osaka-U)
• WP-5: Networking & Security — Shimojo (Osaka-U), Kanamori (NII), Oie (Kyushu Tech.)
• WP-6: Grid-Enabling Nanoscience Applications — Aoyagi (Kyushu-U)
NAREGI Programming Models

• High-Throughput Computing
– But with complex data exchange in between
– NAREGI Workflow or GridRPC
• Metacomputing (cross-machine parallel)
– Workflow (w/ co-scheduling) + GridMPI
– GridRPC (for task-parallel or task/data-parallel)
• Coupled Multi-Resolution Simulation
– Workflow (w/ co-scheduling) + GridMPI + coupling components
– Mediator (coupled simulation framework)
– GIANT (coupled simulation data exchange framework)
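The GridRPC style mentioned above (as standardized in GGF and implemented by Ninf-G) issues asynchronous remote procedure calls to grid-resident servers and waits on the resulting handles. A minimal sketch of that task-parallel pattern, emulated locally with Python's thread pool (all names here are illustrative, not the Ninf-G API):

```python
# Sketch of the GridRPC task-parallel model, emulated with a local
# thread pool standing in for asynchronous calls to remote grid servers.
from concurrent.futures import ThreadPoolExecutor

def remote_simulation(task_id, param):
    # Stand-in for a remote call; a real GridRPC client would marshal
    # arguments to a server chosen via the information service.
    return task_id, param * param

def run_task_parallel(params):
    # Fire one asynchronous "RPC" per parameter (cf. grid_call_async),
    # then wait for all handles (cf. grid_wait_all) and collect results.
    with ThreadPoolExecutor(max_workers=4) as pool:
        handles = [pool.submit(remote_simulation, i, p)
                   for i, p in enumerate(params)]
        return dict(h.result() for h in handles)

results = run_task_parallel([1.0, 2.0, 3.0])
```

The same shape covers the high-throughput case (many independent calls) and the task-parallel metacomputing case, with co-scheduled GridMPI reserved for tightly coupled runs.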
Nano-Science: Coupled Simulations on the Grid as the Sole Future for True Scalability

Coupling between continuum and quanta (10^-6 to 10^-9 m) is the only way to achieve true scalability:
• Material physics (infinite system): fluid dynamics, statistical physics, condensed matter theory, …
• Molecular science: quantum chemistry, molecular orbital methods, molecular dynamics, …
Multi-physics bridges the limit of computing capability and the limit of idealization. The Grid coordinates decoupled resources for meta-computing, high-throughput computing, and multi-physics simulation, with components and data from different groups within a VO composed in real time — in contrast to the old HPC environment of decoupled resources, limited users, and special software.
NAREGI Alpha Middleware (2004-5)

1. Resource Management (WP1)
• Unicore/Globus/Condor as "skeletons" and underlying service providers
• OGSA-EMS job management, brokering, scheduling — the first OGSA-EMS-based implementation in the world
• Coupled simulation on the Grid (co-allocation/scheduling)
• WS(RF)-based monitoring/accounting federation
2. Programming (WP2)
• High-performance, standards-based GridMPI (MPI-1/2, IMPI), highly tuned for the Grid environment (esp. collectives)
• Ninf-G: scientific RPC for Grids (GridRPC, a GGF programming standard)
3. Applications Environment (WP3)
• Application discovery and installation management (PSE)
• Workflow tool for coupled application interactions
• WSRF-based large-scale grid visualization (terabyte-scale interactive visualization on distributed servers)
4. Networking and Security (WP5)
• Production-grade CA (drop-in replacement for SimpleCA)
• Unicore and GSI security federation
• Out-of-band real-time network monitoring/control
5. Grid-Enabling Nano Simulations (WP6)
• Framework for coupled simulations (e.g., large-scale coupled simulation of proteins in a solvent on a Grid)
Lifecycle of Grid Apps and Infrastructure

[Diagram: VO application developers & managers register and deploy user apps through the application contents service; high-level workflows (NAREGI WFML) compose user apps with GridRPC/GridMPI for metacomputing; many VO users submit workflows and coupled apps through the SuperScheduler to GridVMs; the NAREGI distributed grid information service and the grid-wide data management service (GridFS, metadata, staging, etc.) place and register data (Data 1 … Data n) on the Grid and assign metadata to it.]
PSE: Registration & Deployment of Applications (OGSA-ACS)

① The application developer registers an application into the application pool: application summary, program source files, input files, resource requirements, etc.
② Select a compiling host, using resource info. from the NAREGI distributed information service.
③ Compile on the selected server (e.g., "Compiling OK!" on server #1).
④ Send back the compiled application environment.
⑤ Select deployment hosts by test runs (e.g., "Test Run NG!" on server #1, "Test Run OK!" on servers #2 and #3).
⑥ Deploy to the computing resources.
⑦ Register the deployment info.
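The numbered PSE steps above — compile once, test-run everywhere, deploy only where the test succeeds — can be sketched as a small driver function. Everything here (function names, host names) is illustrative, not the OGSA-ACS interface:

```python
# Sketch of the PSE deployment flow: compile an application on one host,
# then deploy it only to hosts where a test run succeeds.
def deploy_application(app, hosts, compile_on, test_run):
    compile_host = hosts[0]                  # (2) select compiling host
    binary = compile_on(compile_host, app)   # (3)-(4) compile, get binary back
    deployed = []
    for host in hosts:                       # (5) test-run on each candidate
        if test_run(host, binary):           # keep only "Test Run OK" hosts
            deployed.append(host)            # (6) deploy
    return deployed                          # (7) deployment info to register

# Illustrative usage with stand-in compile/test functions:
def fake_compile(host, app):
    return app + ".bin"

def fake_test(host, binary):
    return host != "server-1"   # server-1 fails its test run, as in the slide

deployed = deploy_application("nano-app",
                              ["server-1", "server-2", "server-3"],
                              fake_compile, fake_test)
```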
Dynamic Resource and Execution Management (OGSA-EMS + NAREGI Extensions)

[Diagram: the client submits a workflow (abstract JSDL) to the SuperScheduler (OGSA-RSS), which issues resource queries against the distributed information service (CIM, via DAI), performs reservation-based co-allocation, and produces concrete JSDL for the GridVM service containers (OGSA-SC) on each computing resource — reservation, submission, query, control. Accounting is based on CIM and UR/RUS; network info. comes from network monitoring & control.]
NAREGI Execution Management "Under the Hood"

Abbreviations:
SS: Super Scheduler
JSDL: Job Submission Description Language
JM: Job Manager
EPS: Execution Planning Service
CSG: Candidate Set Generator
RS: Reservation Service
IS: Information Service
SC: Service Container
AGG-SC: Aggregate SC
CES-SC: Co-allocation Execution SC
FTS-SC: File Transfer Service SC
BES: Basic Execution Service
CES: Co-allocation Execution Service
CIM: Common Information Model

[Diagram: the NAREGI-WP3 workflow tool passes a NAREGI-WFML workflow to the JM client (submit / status / delete / cancel). WFML2BPEL converts it to BPEL (including JSDL), which the JM — a BPEL engine inside the SS — executes (CreateActivity(FromBPEL), GetActivityStatus, ControlActivity). The JM invokes the EPS, which calls the CSG (GenerateCandidateSet); the CSG generates an SQL query from the JSDL against the IS (OGSA-DAI, CIM, PostgreSQL) and selects resources from the JSDL. The RS makes or cancels reservations. Concrete JSDL then flows to the service containers: the AGG-SC aggregates CES-SCs for co-allocation/MPI jobs; the FTS-SC handles file transfers (globus-url-copy, uber-ftp); GRAM4-specific SCs translate JSDL to RSL for GRAM4/GridVM over local schedulers (PBS, LoadLeveler); simple jobs go to a single SC. Usage data is reported to OGSA-RUS.]
JSDL: GGF Job Submission Description Language and NAREGI Extensions

■ Job submission with JSDL. Coupled job requirement — resource requirements for 2 OS types:
• Sub-job 1 — OS: Linux; # of CPUs (MPI tasks): 12; # of nodes: 6; physical memory: >= 32MB/CPU; walltime limit: 8 min; …
• Sub-job 2 — OS: AIX; # of CPUs (MPI tasks): 8; # of nodes: 1; physical memory: >= 256MB/CPU; walltime limit: 8 min; …

Available resources: Linux cluster (4 CPUs, 2 nodes); Linux PC (2 CPUs, 1 node); Linux cluster (4 CPUs, 2 nodes); Linux cluster (10 CPUs, 5 nodes); AIX server (32 CPUs, 1 node).

Attribute                               Description
/naregi-jsdl:SubJobID                   Subjob ID (NAREGI extension for co-allocation)
/naregi-jsdl:CPUCount                   # of total CPUs
/naregi-jsdl:TasksPerHost               # of tasks per host
/naregi-jsdl:TotalTasks                 # of total MPI tasks
/naregi-jsdl:NumberOfNodes              # of nodes
/naregi-jsdl:CheckpointablePeriod       Checkpoint period
/naregi-jsdl:JobStartTrigger            Job start time
/jsdl:PhysicalMemory                    Physical memory for each process
/jsdl:ProcessVirtualMemoryLimit         Virtual memory for each process
/jsdl:WallTimeLimit                     Wall time limit
/jsdl:CPUTimeLimit                      CPU time limit
/jsdl:FileSizeLimit                     Maximum file size
/jsdl:Queue                             Queue name

The /naregi-jsdl:* attributes are NAREGI extensions for parallel jobs and for co-allocation.
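A candidate set generator matches such JSDL requirements against the resource descriptions in the information service. A hedged sketch of that matchmaking, using the example resources from this slide (the dictionary attribute names are illustrative; a real CSG would generate an SQL query from the JSDL):

```python
# Matchmaking sketch: find resources satisfying a sub-job's requirements.
resources = [
    {"name": "linux-cluster-1", "os": "Linux", "cpus": 4,  "nodes": 2},
    {"name": "linux-pc",        "os": "Linux", "cpus": 2,  "nodes": 1},
    {"name": "linux-cluster-2", "os": "Linux", "cpus": 4,  "nodes": 2},
    {"name": "linux-cluster-3", "os": "Linux", "cpus": 10, "nodes": 5},
    {"name": "aix-server",      "os": "AIX",   "cpus": 32, "nodes": 1},
]

def candidates(req):
    # Keep resources whose OS matches and whose capacity covers the request.
    return [r["name"] for r in resources
            if r["os"] == req["os"]
            and r["cpus"] >= req["cpus"]
            and r["nodes"] >= req["nodes"]]

linux_req = {"os": "Linux", "cpus": 12, "nodes": 6}
aix_req   = {"os": "AIX",   "cpus": 8,  "nodes": 1}

aix_hosts = candidates(aix_req)       # one AIX server suffices
linux_hosts = candidates(linux_req)   # empty: no single Linux resource fits
```

Note that no single Linux resource satisfies the 12-CPU / 6-node sub-job, which is exactly why the SubJobID extension and reservation-based co-allocation (next slide) exist: the request must be split across several sites.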
3, 4: Co-allocation and Reservation

A meta-computing scheduler is required to allocate and execute jobs on multiple sites simultaneously. The super scheduler negotiates job execution times with the local reservation services (RSs) and reserves resources that can execute the jobs at the same time.
[Diagram: ① an abstract JSDL requesting 10 CPUs reaches the Super Scheduler's Execution Planning Services (with meta-scheduling). ② The Candidate Set Generator, querying the distributed information service, returns candidates: Local RS 1 EPR (8 CPUs) and Local RS 2 EPR (6 CPUs). ③-⑤ The Reservation Service splits the request into concrete JSDLs of 8 + 2 CPUs, negotiates times with the local reservation services of cluster (site) 1 and 2 (e.g., an offer of 14:00- for 3:00), and creates an abstract agreement instance (EPR). ⑥-⑪ Concrete JSDL (8) and concrete JSDL (2) are reserved for the common slot 15:00-18:00, agreement instances are created at each local RS, and the jobs are handed to the service containers / GridVMs at both sites. (Local RS #: Local Reservation Service #.)]
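The core of the negotiation above is finding one time window that every local reservation service can offer. A minimal sketch of that intersection step, with hypothetical hour-granularity offers (the real protocol exchanges WS-Agreement messages, not tuples):

```python
# Reservation-based co-allocation sketch: intersect the free windows
# offered by each local reservation service and take the earliest slot
# long enough for the job.
def co_allocate(window_lists, duration):
    # window_lists: one list of (start_hour, end_hour) offers per local RS.
    common = window_lists[0]
    for offers in window_lists[1:]:
        common = [(max(s1, s2), min(e1, e2))
                  for s1, e1 in common for s2, e2 in offers
                  if min(e1, e2) - max(s1, s2) >= duration]
    return sorted(common)[0] if common else None

rs1_offers = [(14, 17), (18, 22)]   # offers from local RS 1
rs2_offers = [(15, 19)]             # offers from local RS 2

slot = co_allocate([rs1_offers, rs2_offers], duration=2)
```

If no common window exists, the super scheduler must renegotiate (ask the local RSs for later offers) rather than submit the sub-jobs at different times.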
NAREGI Grid Information Service Overview

[Diagram: clients of the distributed information service (resource broker, SS/RB, GridVM, and user/admin viewers) access it through a client library and Java API. Information service nodes form a hierarchy — hierarchical filtered aggregation, distributed query — with redundancy for fault tolerance added in 2005. Each cell-domain information service aggregates CIM providers (OS, processor, file system, performance via Ganglia, gridmap/ACL, job queue) on nodes A/B/C through a lightweight CIMOM service, a decode service, and an aggregate service backed by an ORDB (OGSA-DAI). GridVM, as a chargeable service, inserts usage records through the RUS port (GWSDL, RUS::insert).]
Sample: CIM Schema and NAREGI Extensions for GGF Usage Record

[Class diagram: the standard CIM core and system classes — ManagedElement → ManagedSystemElement → LogicalElement → EnabledLogicalElement → Job (Owner, StartTime, ElapsedTime, …) → ConcreteJob (InstanceID {key}, Name) → BatchJob (JobID, MaxCPUTime, JobState, TimeCompleted, JobOrigination, Task, JobStatus, TimeSubmitted, JobRunTimes, UntilTime, Notify, Priority, PercentComplete, TimeOfLastStateChange, DeleteOnCompletion, ErrorCode, …); System → ComputerSystem → UnitaryComputerSystem; JobDestination → JobQueue (QueueStatus, DefaultJobPriority, MaxTimeOnQueue, MaxJobsOnQueue, MaxJobCPUTime); SystemPartition; plus the HostedJobDestination, JobDestinationJobs, ComponentCS, StatisticalData, and ElementStatisticalData associations — are extended with classes mapped from the GGF Usage Record schema (GGF UR-WG): JobUsageRecord and H_UsageRecord (KeyInfo, recordID {key}, createTime, storeSubjectName, storeTimeStamp) with subclasses for processors, disk, memory, swap, network, node count, CPU duration, time duration/instant, and property-differentiated records, and H_UsageRecordID carrying the global job ID and global user ID. A legend distinguishes CIM classes that already existed for use from classes added for usage recording (new in '04).]
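To make the mapping concrete, here is a minimal sketch of a usage-record document in the spirit of the GGF Usage Record properties listed above (global user/job IDs, CPU duration), built with the Python standard library. The element names follow those properties informally; the real UR schema's namespaces and attribute details are not reproduced:

```python
# Build a small usage-record-style XML document with the standard library.
import xml.etree.ElementTree as ET

def make_usage_record(record_id, global_user_id, global_job_id, cpu_seconds):
    ur = ET.Element("UsageRecord")
    ET.SubElement(ur, "RecordIdentity", recordId=record_id)
    ET.SubElement(ur, "GlobalUserId").text = global_user_id
    ET.SubElement(ur, "GlobalJobId").text = global_job_id
    ET.SubElement(ur, "CpuDuration").text = str(cpu_seconds)
    return ET.tostring(ur, encoding="unicode")

xml = make_usage_record("ur-001", "user-42", "job-7", 480)
```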
GridVM: OGSA-EMS Service Container for Local Job Submission & Mgmt

GridVM provides a virtual execution environment on each site: job execution services, resource management services, and a secure, isolated environment.

GridVM features:
• Platform independence as an OGSA-EMS SC
– WSRF OGSA-EMS service container interface for heterogeneous platforms and local schedulers
– "Extends" Globus 4 WS-GRAM
– Job submission using JSDL
– Job accounting using UR/RUS
– CIM provider for resource information
• Meta-computing and coupled applications
– Advance reservation for co-allocation
• Site autonomy
– WS-Agreement-based job execution
– XACML-based access control of resource usage
• Virtual Organization (VO) management
– Access control and job accounting based on VOs (VOMS & GGF-UR)

[Diagram: the workflow tool turns abstract JSDL into concrete JSDL for the SuperScheduler, which drives the GridVM WSRF service — reservation, submission, query, control; resource info. flows to the information service, and accounting to CIM and UR/RUS.]
GridVM Architecture (β Ver.)

[Diagram: an upper-layer M/W + BES API submits via globusrun (RSL + JSDL'); the GRAM services (with RFT file transfer and delegation) pass jobs through a GRAM adapter to the GridVM scheduler, which delegates transfer requests, generates events, and drives the GridVM job factory and GridVM job as an extension service over the local scheduler. The GridVM engine executes processes under SUDO with sandboxing, providing basic job management plus authentication and authorization at each site. GridVM is combined as an extension module to GT4 GRAM; future compliance with and extensions to OGSA-BES are planned.]
NAREGI Data Grid Beta (WP4)

Grid-wide data sharing service:
• Data access management — import data (Data 1 … Data n) into workflow jobs (Job 1 … Job n)
• Metadata management — assign metadata to data
• Data resource management — place & register data on the Grid and store it into distributed file-system nodes (grid-wide file system)

Data Grid components (WP4 details):
• Workflow (WFML => BPEL+JSDL) drives the Super Scheduler (SS, OGSA-RSS); data staging uses the OGSA-RSS FTS SC with GridFTP, and GGF-SRM in beta 2.
• Data access management: jobs on computational nodes import data from file-system nodes via the Grid workflow.
• Metadata construction: a data-specific metadata DB and a data resource information DB, built on OGSA-DAI WSRF 2.0, Globus Toolkit 4.0.1, Tomcat 5.0.28, PostgreSQL 8.0, and MySQL 5.0.
• Grid-wide file system: Gfarm 1.2 PL4.
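The "place & register data, assign metadata, import into workflow" steps above can be sketched with a toy catalog; a plain dictionary stands in for the OGSA-DAI-backed metadata databases, and all names and URLs are illustrative:

```python
# Toy data-grid catalog: register data with metadata, then resolve a
# logical name back to a replica for staging into a workflow job.
catalog = {}

def register_data(logical_name, replica_urls, **metadata):
    # Place & register data on the Grid, then assign metadata to it.
    catalog[logical_name] = {"replicas": list(replica_urls), "meta": metadata}

def stage_in(logical_name):
    # Import data into a workflow: resolve the logical name to a replica.
    # (A real FTS-SC would then transfer it, e.g. via GridFTP.)
    return catalog[logical_name]["replicas"][0]

register_data("nano/input-007",
              ["gsiftp://fs1.example.org/data/input-007"],
              owner="vo-nano", mesh="fine")
url = stage_in("nano/input-007")
```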
Highlights of NAREGI Beta (May 2006, GGF17/GridWorld)

• Professionally developed and tested
• "Full" OGSA-EMS incarnation
– Full C-based WSRF engine (Java → Globus 4)
– OGSA-EMS/RSS WSRF components
– Full WS-Agreement brokering and co-allocation
– GGF JSDL 1.0-based job submission, authorization, etc.
– Support for more OSes (AIX, Solaris, etc.) and batch queues
• Sophisticated VO support for identity/security/monitoring/accounting (extensions of VOMS/MyProxy, WS-* adoption)
• WS application deployment support via GGF-ACS
• Comprehensive data management w/ grid-wide file system
• Complex workflows (NAREGI-WFML) for various coupled simulations
• Overall stability/speed/functional improvements
• To be interoperable with EGEE, TeraGrid, etc. (beta 2)
NAREGI Phase 1 Testbed (~3000 CPUs, ~17 Tflops)

• Center for Grid R&D (NII): ~5 Tflops
• Computational Nano-science Center (IMS): ~10 Tflops
• Osaka Univ. BioGrid; Titech Campus Grid; AIST SuperCluster
• Small test application clusters at ISSP, Kyoto Univ., Tohoku Univ., KEK, Kyushu Univ., and AIST
• Interconnected by Super-SINET (10Gbps MPLS)
• A dedicated testbed — no "ballooning" w/ production resources; production deployment to reach > 100 Teraflops
Computer System for Grid Software Infrastructure R&D — Center for Grid Research and Development (5 Tflops, 700GB):
• File server (PRIMEPOWER 900 + ETERNUS3000 + ETERNUS LT160): 1 node / 8 CPUs (SPARC64 V 1.3GHz); memory 16GB, storage 10TB, back-up max. 36.4TB
• SMP-type compute servers:
– PRIMEPOWER HPC2500: 1 node (UNIX, SPARC64 V 1.3GHz / 64 CPUs); memory 128GB, storage 441GB
– SGI Altix3700: 1 node (Itanium2 1.3GHz / 32 CPUs); memory 32GB, storage 180GB
– IBM pSeries690: 1 node (Power4 1.3GHz / 32 CPUs); memory 64GB, storage 480GB
• High-performance distributed-memory compute servers (PRIMERGY RX200, InfiniBand 4X, 8Gbps): 2 units of 128 CPUs (Xeon 3.06GHz) + control node; memory 130GB and 65GB, storage 9.4TB per unit
• Distributed-memory compute servers (HPC LinuxNetworx and Express 5800, GbE 1Gbps): 4 units of 128 CPUs (Xeon 2.8GHz) + control node; memory 65GB, storage 4.7TB per unit
• L3 switches at 1Gbps (upgradable to 10Gbps) linking the external and intra-A/intra-B networks; VPN, firewall, CA/RA server, and front-end server; connected to SuperSINET.

Computer System for Nano-Application R&D — Computational Nano-science Center (10 Tflops, 5TB):
• SMP-type computer server: 16 ways x 50 nodes (POWER4+ 1.7GHz), 5.4 / 5.0 TFLOPS, multi-stage crossbar network; memory 3072GB, storage 2.2TB
• Distributed-memory computer server (4 units): 818 CPUs (Xeon 3.06GHz) + control nodes, Myrinet2000 (2Gbps); memory 1.6TB, storage 1.1TB/unit
• File server: 16 CPUs (SPARC64 GP 675MHz); memory 8GB, storage 30TB, back-up 25TB
• Front-end server; L3 switch at 1Gbps (upgradable to 10Gbps); connected to SuperSINET.
University Computer Centers (excl. National Labs), circa Spring 2006 — ~60 SC centers in Japan, with 10Gbps SuperSINET interconnecting the centers:

• Hokkaido University, Information Initiative Center: HITACHI SR11000, 5.6 Teraflops
• Tohoku University, Information Synergy Center: NEC SX-7, NEC TX7/AzusA
• University of Tokyo, Information Technology Center: HITACHI SR8000, HITACHI SR11000, 6 Teraflops; others (in institutes)
• University of Tsukuba: FUJITSU VPP5000, CP-PACS 2048 (SR8000 proto)
• Nagoya University, Information Technology Center: FUJITSU PrimePower2500, 11 Teraflops
• Kyoto University, Academic Center for Computing and Media Studies: FUJITSU PrimePower2500, 10 Teraflops
• Osaka University, CyberMedia Center: NEC SX-5/128M8, HP Exemplar V2500/N, 1.2 Teraflops
• Kyushu University, Computing and Communications Center: FUJITSU VPP5000/64, IBM Power5 p595, 5 Teraflops
• Tokyo Inst. of Technology, Global Scientific Information and Computing Center: NEC SX-5/16, Origin2K/256, HP GS320/64; 80~100 Teraflops in 2006
• National Inst. of Informatics: SuperSINET/NAREGI testbed, 17 Teraflops; external grid connectivity
• A 10-Petaflop center is targeted by 2011.
NEC/Sun Campus Supercomputing Grid: Core Supercomputer Infrastructure @ Titech GSIC — operational late Spring 2006:

• SunFire (TBA): 655 nodes, 16 CPUs/node — 10480 CPUs / 50 TFlops (peak); memory 21.4TB
• ClearSpeed CSX600: initially 360 nodes at 96 GFlops/node, 35 TFlops (peak); by the Spring 2006 deployment, a planned extension to 655 nodes (1 CSX600 per SunFire node); max config ~4500 cards, ~900,000 execution units, 500 Teraflops possible, still 655 nodes, ~1MW
• Plus a small SX-8i; > 100 TeraFlops total (50 scalar + 60 SIMD-vector), approx. 140,000 execution units, 70 cabinets
• InfiniBand network: Voltaire ISR 9288 x 6, 1400Gbps unified & redundant interconnect; 24+24Gbps bidirectional to the file servers; 200+200Gbps bidirectional switch fabric (42 units); external 10Gbps
• Storage Server A: Sun Storage (TBA), all HDD, physical capacity 1PB, 40GB/s (trays of 48 x 500GB disks)
• Storage B: NEC iStorage S1800AT, physical capacity 96TB RAID6, all HDD, ultra-reliable
• Total storage: 1.1PB
NAREGI is/has/will…
• Is THE National Research Grid in Japan
– Part of CSI and future petascale initiatives
– METI "Business Grid" counterpart 2003-2005
• Has extensive commitment to WS/GGF-OGSA
– Entirely WS/service-oriented architecture
– Sets industry standards, e.g. 1st impl. of OGSA-EMS
• Will work with EU/US/AP counterparts to realize a "global research grid"
– Various talks have started, incl. the SC05 interoperability meeting
• Will deliver its first public beta in May 2006
– To be distributed @ GGF17/GridWorld in Tokyo
NAREGI is not/doesn't/won't…
• Is NOT an academic research project
– All professional developers from Fujitsu, NEC, Hitachi, NTT, …
– No students involved in development
• Will NOT develop all software by itself
– Will rely on external components in some cases
– Must be WS and other industry standards compliant
• Will NOT deploy its own production Grid
– Although there is a 3000-CPU testbed
– Works with the National Centers for CSI deployment
• Will NOT hinder industry adoption at all costs
– Intricate open-source copyright and IP policies
– We want people to save/make money using NAREGI
Summary: NAREGI Middleware Objectives

The different work package deliverables implement OGSA Grid services (allocation, scheduling, etc.) that combine to provide the following:
• Allow users to execute complex jobs with various interactions on resources across multiple sites on the Grid — e.g., nano-science multi-physics coupled simulations with execution components & data spread across multiple groups within the nano-VO
• A stable set of middleware allowing scalable and sustainable operations of centers as resource and VO hosting providers
• Widely adopt and contribute to grid standards (GGF/OGSA), and provide open-source reference implementations
⇒ A sustainable research grid infrastructure.
The Ideal World: Ubiquitous VO & User Management for International e-Science

[Diagram: regional grid infrastructural efforts — Europe (EGEE, UK e-Science, …), the US (TeraGrid, OSG), Japan (NII CyberScience w/ NAREGI), and other Asian efforts (GFK, China Grid, etc.) — linked by collaborative talks on PMA, etc., supporting cross-cutting VOs such as an HEP Grid VO, a NEES-ED Grid VO, and an Astro IVO. Standardization and commonality in software platforms will realize this: different software stacks, but interoperable.]
Backup
Summary: NAREGI Middleware

Users describe:
• Component availability within the VO
• The workflow being executed, as an abstract requirement over sub-jobs

NAREGI M/W — a general-purpose research grid e-infrastructure:
• Discovers appropriate grid resources
• Allocates resources and executes jobs on virtualized environments (concretization of the abstract requirement)
• Serves as a reference implementation of OGSA-EMS and others
• Programming models: GridMPI & GridRPC
• User interfaces: deployment, visualization, workflow tool (WFT)

It will be evaluated on the NAREGI testbed: ~15TF, 3000 CPUs, over the SuperSINET 10Gbps national backbone.

• Users can execute complex jobs across resource sites, with multiple components and data from groups within a VO — such as coupled simulations in nano-science.
• NAREGI M/W allocates the jobs to grid resources appropriately.
The NAREGI WP6 RISM-FMO-Mediator Coupled Application

[Diagram: RISM (Reference Interaction Site Model, solvent distribution analysis) runs at Site A under MPICH-G2/Globus, and FMO (Fragment Molecular Orbital method, electronic structure analysis) runs at Site B, coupled over GridMPI through the "Mediator", which performs data transformation between the different meshes to obtain the electronic structure in solution.]
Run a Job: Nano-Application

■ Job: Grid-enabled FMO (Fragment Molecular Orbital method) — a method for calculating the electron density and energy of a large molecule by dividing it into small fragments, originally developed by Dr. Kitaura, AIST. A workflow-based FMO was developed by NAREGI: fragment DB search → monomer calculations (one per fragment) → dimer calculations (one per pair of monomers) → total energy → visualization. An example of high-throughput computing.
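The FMO fan-out described above — one independent job per fragment (monomer) and one per fragment pair (dimer) — is what makes it a natural high-throughput workload. A small sketch of the task generation (names illustrative):

```python
# Generate the FMO-style task list: one monomer job per fragment plus
# one dimer job per unordered fragment pair.
from itertools import combinations

def fmo_tasks(n_fragments):
    monomers = [("monomer", i) for i in range(n_fragments)]
    dimers = [("dimer", pair)
              for pair in combinations(range(n_fragments), 2)]
    return monomers + dimers

tasks = fmo_tasks(5)   # 5 monomer jobs + C(5,2) = 10 dimer jobs
```

Since the dimer count grows as n(n-1)/2, the workflow engine's ability to farm out many independent sub-jobs dominates the total throughput.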
9: Grid Accounting

The information service collects and maintains job usage records in the form of the GGF proposed standards (GGF RUS, UR). RUS: Resource Usage Service; UR: Usage Record.

[Diagram: on each computing node, the GridVM engine measures process usage; the GridVM scheduler at each cluster/site reports usage as UR-XML to the "cell domain" information service, which upper-layer information service nodes query via RUS.]

Users can search and summarize their URs by:
• Global user ID
• Global job ID
• Global resource ID
• Time range, …
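The RUS-style search-and-summarize capability above can be sketched as a filter over stored usage records; the record field names here are illustrative simplifications of the UR properties, not the actual schema:

```python
# Filter and summarize usage records by global user ID and time range,
# as an RUS query would.
records = [
    {"global_user": "alice", "global_job": "j1", "cpu_s": 300, "end": 10},
    {"global_user": "alice", "global_job": "j2", "cpu_s": 200, "end": 25},
    {"global_user": "bob",   "global_job": "j3", "cpu_s": 900, "end": 12},
]

def summarize(user, t_start, t_end):
    # Select this user's records whose end time falls in [t_start, t_end),
    # then aggregate job count and total CPU seconds.
    mine = [r for r in records
            if r["global_user"] == user and t_start <= r["end"] < t_end]
    return {"jobs": len(mine), "cpu_s": sum(r["cpu_s"] for r in mine)}

report = summarize("alice", 0, 20)
```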