
Okinawa Institute of Science and Technology Graduate University

Next Generation HPC Cluster Specification

Contents

1. Background and Scope

2. Eligibility Criteria

3. Evaluation Criteria

4. Required Documentation

5. Warranty, Maintenance and Support

6. Network

7. High Performance Storage

8. Cluster node composition

9. Operating System and node configuration

10. Power consumption and rack layout

11. Physical Installation and acceptance

12. Appendices


1. Background and Scope

OIST seeks proposals for its next generation general-purpose computing cluster system, which will provide centralized computing resources for OIST scientific research. This general-purpose computing cluster will succeed and take over from the existing "Sango" computing cluster. This document gives the OIST minimum system requirements to be addressed in vendor proposals.

The successful applicant will be determined via the proposed core-price value and the proposal evaluation, for which the eligibility and evaluation criteria are described in the following sections.

Each proposal shall include the total cost for the hardware, the delivery and physical installation plan, OS installation, configuration, license costs, staff training, and hardware support.

2. Eligibility Criteria

The vendor must have prior experience in HPC and storage deployments in Japan at the scale of this

system or larger.

The vendor must have at least three engineers with experience in installing, maintaining and supporting HPC systems with more than 1000 nodes in Japan. These engineers must be regular employees of the system vendor and cannot be outsourced. Moreover, they will be involved in the design, implementation, operation and support of the delivered HPC system.

The vendor shall have prior experience of delivering maintenance and support in Japan.

OIST will evaluate the entire proposal using the criteria in the section below.

3. Evaluation Criteria

See the Score Sheet

4. Required Documentation

The following must be provided as part of the submitted document (documentation clarity, format and completeness are taken into consideration during the evaluation):

• Evidence for eligibility

◦ Relevant experience and demonstrated ability to design, deliver and support HPC systems (computing and storage) in Japan.

◦ The names of at least three engineers with experience installing, maintaining and supporting HPC systems with more than 1000 nodes in Japan.

• A complete set of quotes that include the unit cost for each item in the system

◦ A quote must be provided for the whole system.

◦ Unit cost shall be at offer price (not list price)

◦ Maintenance and support costs for each year of the 4-year warranty coverage of the system.


◦ Quotes for an optional one-year extension of maintenance and support.

• Evidence that the storage system can fulfill the requirements detailed in the specification

• Expected (best-estimate) SPECfp/SPECfp_rate and SPECint/SPECint_rate performance values (using the 2017 version) for each type of CPU present in compute nodes 8-1 and 8-2 of the proposed system, considering the BIOS settings in 9-1.

• Detailed faceplate power rating and estimated maximum power consumption (at peak performance) of the system. The total estimated power consumption at peak performance should not exceed the 400kW power capacity limit (storage system excluded).

• A basic acceptance testing procedure for the system that includes a stress check

• Evidence of a support response time of less than two business days for customers in Japan (from first contact to problem resolution, including part replacement lead time).

• Delivery and acceptance test plan

5. Warranty, Maintenance and Support

All systems must be covered by a 4-year warranty that can optionally be extended to 5 years at OIST's request.

OIST staff will physically replace the components listed below. The vendor must provide a minimum

spare part stock onsite for the items in the following table.

Component: Minimum number of spare parts per component type
(the proposed number must be based on the annualized failure rate (AFR))

• Memory: 16

• HDD: 4

• SSD/NVMe: 2 SSD / 2 NVMe

• Power supply: 1 (for each type, including switches)

• PCI card: 1 of each type

• InfiniBand HCA (if not integrated): 4

• NIC card (if not integrated): 4
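
The spare counts above are to be derived from annualized failure rates. As a purely illustrative sketch (the AFR values and installed counts below are hypothetical placeholders, not OIST or vendor figures), the expected number of annual failures per component type can be estimated as AFR multiplied by the installed count:

    # Hypothetical sketch: estimate expected annual failures per component type.
    # AFR values and installed counts are placeholders, not vendor data.
    import math

    components = {
        # name: (annualized_failure_rate, installed_count)
        "Memory DIMM": (0.005, 4096),
        "HDD":         (0.015,  200),
        "NVMe SSD":    (0.005,  100),
    }

    for name, (afr, count) in components.items():
        expected = afr * count                 # expected failures per year
        spares = math.ceil(expected)
        print(f"{name}: ~{expected:.1f} expected failures/year, "
              f"suggest >= {spares} on-site spares")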

All other failed hardware components must be replaced by the vendor onsite within the next two

business days.


Technical support in either English or Japanese (whichever is available) must be available by telephone

and by email during business hours (weekdays, 9:00-17:00 minimum core time, Japan local time).

Maintenance and support for the storage of this new system should be at least equivalent (in quality, response time, implementation, etc.) to the maintenance and support provided for the existing OIST tiered storage systems (see the attached document “OIST_Storage_Maintenance_Spec_example.docx” for an example maintenance and support specification).


6. Network

The HPC system will have three networks. The Ethernet network and the management network will be

connected (uplink) to the OIST core network. However, they should stay operational even in the event

of a loss of connectivity with the OIST network.

The third network is an InfiniBand network meant to be private to the HPC system. The InfiniBand network should consist of one monolithic switch, or a set of switches, and should not be a composition of different switches or technologies from different manufacturers.

Ethernet and management network ports must be directly accessible, via switches, in each rack used for the HPC system.

For all three networks, the network switches must have redundant high-efficiency power supplies, and all the hardware components must have the latest working firmware.

6-1. Ethernet network

All systems must be connected to the OIST network via 10G links, aligning to OIST standards. The HPC

device uplinks will be connected to at least two physical network devices (Fabric extenders, Switches)

with all links forwarding. The switch devices will support Link Aggregation Control Protocol (LACP,

802.3ad).

The network devices (Fabric extenders, Switches) will be interconnected via at least 4 x 40G Multi-

Chassis Link Aggregation Group (MLAG, otherwise referred to as MC-LAG) connections (from leaf

switches to spine switches for a given row).

All switches of the Ethernet network must come with the appropriate DCNM (Data Center Network Manager) licenses so they can be used with the OIST spine switches. Moreover, for performance and operational effectiveness, the OS version of all provided switches has to be aligned with the OS version used by the OIST switches, and their management ports must be configured. OIST will provide the OS version, together with the switch management port settings.

Regarding support of a failed switch: when the vendor sends a replacement switch, its OS version must be pre-aligned with the OIST switch OS version, and its management port must be configured.

All network devices must provide or support the following:

• Dedicated management via a discrete management connection to the dedicated management network.

• MP-BGP, and VXLAN EVPN.

• VXLAN EVPN routing without loopback cables

• Both symmetric and asymmetric VXLAN EVPN routing.

• A distributed VXLAN EVPN anycast layer 3 gateway

• The network interconnects must not incur more than 4:1 oversubscription (can be up to 6:1

between the FEX and the compute nodes)

Network switches must support:


• 48 x 1/10-Gbps SFP+

• 6 x 40-Gbps fixed QSFP+ ports

• Must fit within 1RU

• Maximum of 2 microseconds of latency with at least 1.28Tbps of bandwidth

• Minimum of 40MB of integrated buffer space.

• Integrate with existing 10 and 40 Gigabit Ethernet equipped Cisco Nexus 2300 fabric extenders.

The network topology for rows “C-D” and “I-J” must strictly follow the OIST network topology standard (number of switches, distribution into racks, connectivity and cabling) given in the two figures below (topology and rack layout) for the existing OIST system.

Fig. 1 OIST overall data center network layout for rows A,B,E,F,M,G,H,K, and L

(The switches actually used at OIST are listed below)

Spine: Nexus 9364C

Border Leaf: Nexus 7710

Leaf: Nexus 9372 PX, and Nexus 93180YC-EX

FEX: Nexus 2348UPQ


Fig. 2 Distribution of existing network switches in OIST data center racks (top view of rows A,B,E,F,M,G,H,K, and L; see Fig. 7 for the OIST data center physical rack layout)

All network devices for the racks used in the proposed solution must be provided. For the Ethernet network, the leaf switches must be provided for all rows used in the proposed solution (two switches per row). For example (see Fig. 3), if racks C-01 to C-06 are used, the vendor should provide:

• 2 leaf switches for row C

• 6 FEX switches (one FEX switch per rack)

• the cables to connect the two leaf switches together

• the cables to connect the leaf switches to OIST spine switches (under floor)

• the cables to connect the leaf switches with the FEX switches

• the cables to connect the FEX to the servers

• any other required components (for example: patch panel converter)


Fig. 3 Ethernet switches and cables example layout for row C

(OIST uses patch converters to aggregate 4 x 10G into 1 x 40G)

6-2. Management network

For remote management, all systems have to be connected to a separate out-of-band (OOB)

management network which supports at least 100M host connectivity with at least 1G network

interconnects between network devices. All network devices must support Ethernet, and no KVM switches or the like shall be included.

The Management network must be connected via 2 x 1G links in LACP to the OIST core network switch

and support Multi-Chassis Link Aggregation Group for these uplinks. In addition, for future use, the

switch must have at least 2 x 10G ports capable of optical fiber connection and LACP configuration. The

management network can be oversubscribed.

For management purposes, a switch, say type-T (fully compatible with the Cisco Catalyst 3650-24TS-S), must be installed in every rack but the last of a given used row, and a switch, say type-L (fully compatible with the Catalyst 3650-48TS-S), must be installed in the last rack of this used row.

All type-T switches in a single row must be cabled to the type-L switch in the last rack of the same row, and this type-L switch will be cabled via uplinks to an available OOB core switch (Cisco Catalyst 3650-48PQ provided by OIST) in row M.


6-3. InfiniBand network

All systems with InfiniBand network interfaces are to be connected to the InfiniBand network.

Connections to the servers and nodes are to have a bandwidth of not less than 100Gb/s (HDR100), and must also be capable of 200Gb/s (HDR) without change or addition of any switches. Only InfiniBand configurations will be evaluated.

In order to create a consistent environment, all InfiniBand cards and switches must be from the same

manufacturer and have a proven record of performance and compatibility. All the InfiniBand cables

must be compliant with the InfiniBand network setup and must ensure the best performance. Given the number of nodes and limited space, fiber-based InfiniBand cabling may be preferable to copper in at least some locations; vendors should carefully consider cabling and space restrictions.

• The entire InfiniBand network must be symmetric, with no more than 50% blocking and a maximum blocking ratio of 26:14 (see the port-count sketch after this list).

• The InfiniBand network must be configured in a fat-tree topology and allow further port additions up to a minimum of 832 ports in HDR100 or 416 ports in HDR. For example, a final configuration of 16 leaf switches and 7 spine switches, 2 of which are managed, would lead to a proposal of 6 leaf switches and 7 spine switches, where 2 are managed.

• If a set of spine/leaf switches is used for the InfiniBand network, the switches must be distributed across the “I-J” and “C-D” rows (as shown in Fig. 4 for the spine and leaf switches), in a way that allows around 60% and 40% of the fat-tree topology ports (minimum 832 in HDR100 or 416 in HDR), respectively, to be connected.

• The delivered InfiniBand switch components must include all the spine switches required to allow a minimum of 832 ports in HDR100 or 416 ports in HDR, must have enough port count for all the HPC systems, and must have a minimum of 64 free ports on the leaf switches for additional devices to be connected.

• The switches should be distributed into the racks in a way that minimizes the cable length requirements of future leaf and compute node expansions.

• All InfiniBand components must have a proven record of accomplishment and must be widely

known in the HPC community.

• All InfiniBand components (HCAs, switches) must be from the same manufacturer as the

InfiniBand switches and must have OFED drivers proven to be compatible with RHEL/CentOS 8.0

(or RHEL/CentOS 7.6 with Linux kernel 4.19) and the latest working firmware installed.

• All InfiniBand switches, cables and subnet manager licenses must be included in the proposal.

• All the InfiniBand switches must be remotely managed through an Ethernet interface to be

connected to the management network.
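
As an aid to checking the port arithmetic above, the following sketch (assuming 40-port HDR leaf switches with each HDR port splittable into two HDR100 ports; the switch radix is an assumption for illustration, not a requirement of this document) computes the host port capacity and blocking ratio of a 26:14 leaf split:

    # Sketch: fat-tree port arithmetic for a 26:14 leaf split.
    # Assumes 40-port HDR leaf switches; each HDR port can split into 2x HDR100.
    LEAF_PORTS_HDR = 40        # assumed leaf switch radix (HDR ports)
    DOWN_HDR, UP_HDR = 26, 14  # host downlinks vs. spine uplinks per leaf
    assert DOWN_HDR + UP_HDR == LEAF_PORTS_HDR

    def capacity(num_leaves):
        hdr_hosts = num_leaves * DOWN_HDR
        hdr100_hosts = hdr_hosts * 2           # one HDR port = two HDR100 ports
        blocking = DOWN_HDR / UP_HDR           # ~1.86:1, i.e. under 2:1 (50%)
        return hdr_hosts, hdr100_hosts, blocking

    for leaves in (6, 16):  # proposed vs. fully expanded example from the text
        hdr, hdr100, ratio = capacity(leaves)
        print(f"{leaves} leaves: {hdr} HDR / {hdr100} HDR100 host ports, "
              f"blocking {DOWN_HDR}:{UP_HDR} ({ratio:.2f}:1)")
    # 16 leaves -> 416 HDR / 832 HDR100 host ports, matching the expansion target.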


Fig. 4 IB spine/leaf switches distribution

6-4. Cables, Cabling and Labeling work

All the cables required to connect all the system components must be included in the proposal.

The cabling design must be planned and documented before installation work starts, and reviewed by OIST, with the following requirements also met:

• Cabling must be neat and non-obstructive, meaning cables should run down the side of racks rather than the middle and not restrict the installation or removal of existing or additional infrastructure.

• The right cable length should be used. Some slack is ideal, but if there is more than 50cm of slack, a more appropriate cable length should be used.

• All cabling between racks must run over or below the racks and be approved by OIST.

• OOB cabling should be within the same Rack when possible.

• All cables must be clearly labeled to identify what they are connected to (the labeling format is to be standardized).


• Bundle cables together in groups of relevance. When bundling or securing cables, use Velcro-based ties (every 30-60cm).

• Ethernet and management network switch cabling to OIST switches must be underfloor, whereas all inter-switch cabling between racks must be over-rack, and all switch-to-server cabling must be inside the rack.

• Infiniband cabling must be inside the rack or over the rack.


7. High Performance Storage

The high performance storage will consist of two storage systems: a staging ultra-fast parallel storage and a large-capacity data parallel storage. Each storage system should be individually serviceable without shutting down the other.

All storage server components must have redundant high efficiency power supplies, and all the

hardware components must have the latest working firmware.

The storage servers will be connected to the Ethernet switches through LACP.

7-1. Ultra-fast parallel computing private storage

The storage should have at least 500TB of capacity using flash technology (NVMe, SSD, etc.) with RAID6-equivalent redundancy.

Read and write performance should be at least 160 GB/s and 130 GB/s, respectively, while providing at least 6 million 4K random read IOPS.

This storage should be accessible from the compute, login, transfer, scheduler, and management nodes, and only over the InfiniBand network. The storage must implement a distributed or parallel file system and feature user and project (or directory) quotas. The filesystem will be mounted on all the nodes at “/work”.

The system must demonstrate POSIX shared directory file creation rates above 80K and unique directory

file creation rates above 550K.

The system should provide an ultra-fast scanning capability that allows rapid scans of the overall file

system for file system accounting and the implementation of purge policies. The ultra-fast scanning

capability should allow scanning 50M files in 100 seconds.
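
For the acceptance phase (see section 11-2), a minimal sketch of how measured benchmark results could be checked against the figures above; the measured values below are placeholders to be replaced by real results (for example, from IOR/mdtest runs), and the per-second interpretation of the file creation rates is an assumption:

    # Sketch: compare measured storage benchmark results against the 7-1 targets.
    # The "measured" numbers are placeholders, not real results.
    targets = {
        "read_GBps":          160,               # sequential read
        "write_GBps":         130,               # sequential write
        "rand_read_4k_iops":  6_000_000,
        "shared_dir_creates": 80_000,            # POSIX shared-directory creation rate
        "unique_dir_creates": 550_000,           # unique-directory creation rate
        "scan_files_per_sec": 50_000_000 / 100,  # 50M files in 100 s
    }
    measured = {
        "read_GBps": 172.4, "write_GBps": 138.1, "rand_read_4k_iops": 6_400_000,
        "shared_dir_creates": 85_000, "unique_dir_creates": 600_000,
        "scan_files_per_sec": 520_000,
    }

    for key, target in targets.items():
        verdict = "PASS" if measured[key] >= target else "FAIL"
        print(f"{key}: measured {measured[key]:,} vs target {target:,} -> {verdict}")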

7-2. Data storage

The data storage should have at least 6PB of capacity with RAID6-equivalent redundancy, and rack redundancy if the storage uses several racks.

Read and write performance should be at least 60GB/s and 50GB/s, respectively.

The data storage should have very fast access to the login and transfer nodes over the InfiniBand network (to allow fast data movement between the staging and data storage) and provide SMB/CIFS/NFS services via the Ethernet network from 2 x 10Gbps LACP per head server connected to the Ethernet network. The filesystem must provide user, group and directory quotas, and DMAPI capabilities. The filesystem will be mounted on the login, transfer and management nodes at “/data”.

The storage must be able to be integrated with the existing OIST “Research storage” and its tape backup/archiving workflow, allowing it to act as an extension of the “Research storage” (see the figure below; more information is available upon request).

For flexible volume control, the proposed storage system must allow future expansion, either by adding HDDs, enclosures or units.


7-3. Data management and monitoring

The storage should have at least one dedicated monitoring server equipped with local flash storage (at least SSD) in a RAID configuration. The monitoring software should at least be capable of logging and reporting the per-job throughput and metadata-operation usage of the ultra-fast parallel computing private storage, starting from the acceptance phase. The monitoring system should be fully integrated with SLURM.

The monitoring system should provide basic monitoring functions for all storage components in the HPC

storage cluster.

7-4. Bridging of existing OIST HPC storage to the proposed new cluster

The vendor will propose and implement a solution that allows bridging the existing HPC storage on the OIST Sango system to the new cluster, thus allowing a “long term” phase-out of the Sango storage and other old storage.

Fig. 5 OIST storage systems overview

The vendor must propose a solution to natively mount the “Saion” storage on the new compute cluster with less than 15% performance degradation compared to its performance on the existing HPC system. The implemented solution must allow the following (see Fig. 6 below):

• High-speed access from the OIST Sango Lustre storage to the login and transfer nodes of the new system

• High-speed access from the OIST Saion Lustre storage, natively mounted on all the nodes of the new system

• High-speed access from the proposed new system storage, natively mounted on the OIST Saion compute nodes

• The capability to later migrate the OIST “Research storage” InfiniBand network to the new system storage InfiniBand fabric


• Any storage access must be able to be configured either as read-only or as read-write access

Fig. 6 Storage systems accessibility

7-5. Support for Storage

The vendor will propose a support flow. When contacted by OIST about a storage issue, the vendor should fix the issue using remote login via SSH, except for hardware issues. Direct manufacturer support for hardware issues is required to reduce the resolution time.


8. Cluster node composition

All nodes must have at least 100GB of storage for the OS and OS-related data, and a local /scratch partition with at least as much storage space as the node memory. For example, a node with 256GiB of memory must have at least 275GB (1GB ≈ 0.931GiB) of storage space available in its /scratch partition.
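
A minimal sketch of this sizing rule; the 1GB ≈ 0.931GiB factor is simply the decimal/binary unit conversion (10^9 / 2^30):

    # Sketch: minimum /scratch size (GB, decimal) for a given node memory (GiB).
    GIB_PER_GB = 10**9 / 2**30   # 1 GB is ~0.931 GiB

    def min_scratch_gb(memory_gib):
        # Smallest /scratch size in GB that is at least the node memory.
        return memory_gib / GIB_PER_GB

    print(f"{min_scratch_gb(256):.0f} GB")  # 256 GiB node -> ~275 GB /scratch
    print(f"{min_scratch_gb(384):.0f} GB")  # 384 GiB node -> ~412 GB /scratch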

All nodes should be connected to the three networks described in the network section. Connections to the Ethernet and management networks should use a single port and not make use of LACP.

Each node should be individually serviceable without shutting down any other node.

Each node should have an optimal RAM (memory) configuration using modules with the highest memory frequency available.

The OOB (out of band) management component of each server must provide the following secure capability (not provided by insecure IPMI) for protecting the server from inadvertent or malicious changes: encrypted RoT (root of trust) authentication must be available for the BIOS and firmware during the server boot process, with the ability to lock the configuration and prevent firmware updates.

All nodes must have redundant high-efficiency power supplies, and all the hardware components must have the latest working firmware.

The nodes must have a compact configuration so that 2 nodes can fit in one rack unit.

The numbers of high core count nodes (8-1) and high FP performance nodes (8-2) must have a ratio of approximately 70% for 8-1 and 30% for 8-2. When an exact 70%/30% ratio is not possible, 8-1 must have the higher share.

8-1. High core count nodes

Each node will have a minimum of two 64-bit capable, x86 compatible processors (dual-socket), with a minimum of 64 cores per processor, a minimum 1.96GHz CPU base frequency, a maximum TDP of 225W, at least 384 GiB of memory, and AVX2 support. The motherboard of these nodes should be able to support DDR4-3200 MT/s memory. A single HDD or SSD can be used for local storage.

8-2. High FP performance nodes

Each node will have dual-socket 64-bit x86 compatible processors with a maximum TDP of 205W, a minimum of 20 cores per socket, a minimum 2.1GHz CPU frequency, a minimum of 27.5MiB of cache, and a minimum of 384 GiB of memory per node, with AVX-512 FMA support. A single HDD or SSD can be used for local storage.

8-3. Login nodes

Four nodes with single- or dual-socket 64-bit x86 compatible processors having a minimum of 40 cores per node, and 192 GiB of memory per node. Local storage will consist of HDD or SSD in a RAID1 configuration. The login nodes are also used for:

• fast data movement between the “ultra-fast parallel computing private storage” and the data storage

• fast data movement between the OIST Sango Lustre storage and the proposed HPC system


8-4. Data transfer nodes

Four nodes with single- or dual-socket 64-bit x86 compatible processors having a minimum of 40 cores per node, and 192 GiB of memory per node. Local storage can be a single HDD or SSD. The transfer nodes will be used for:

• fast data movement between the “ultra-fast parallel computing private storage” and the data storage

• fast data movement between the OIST Sango Lustre storage and the proposed HPC system

8-5. Management server

A server with a single- or dual-socket 64-bit x86 compatible processor having a minimum of 40 cores, and 384 GiB of memory. Local storage will consist of HDDs in a RAID1 configuration.

The management server must have a 1Gb/s Ethernet port available (for example, an onboard 1G Ethernet port) to connect to the management network, in addition to its integrated BMC port, which is also connected to the management network.

8-6. Scheduling servers

Two servers with dual-socket 64-bit x86 compatible processors having a minimum of 20 cores per socket, and 384 GiB of memory per node. Local storage will consist of at least 1TB of HDD or SSD in a RAID1 configuration.


9. Operating System and node configuration

All nodes and servers should be able to run CentOS 8.0 (or CentOS 7.6 with Linux kernel v4.19). SLURM should be installed as the cluster scheduler on the scheduling nodes, with a high-availability configuration of the SLURM controller daemons.
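
As a minimal sketch of the controller high-availability portion of slurm.conf (generated here as text; the host names and shared state path are hypothetical placeholders, and exact parameter names depend on the SLURM version):

    # Sketch: emit the HA-related portion of slurm.conf.
    # Host names and paths are hypothetical; failover only works if
    # StateSaveLocation points to storage shared by both scheduling servers.
    import textwrap

    ha_fragment = textwrap.dedent("""\
        # Primary and backup slurmctld hosts, listed in failover order
        SlurmctldHost=sched01
        SlurmctldHost=sched02
        # State directory must be on storage visible to both controllers
        StateSaveLocation=/shared/slurm/state
        SlurmctldTimeout=120
        """)

    with open("slurm.conf.ha-fragment", "w") as f:
        f.write(ha_fragment)
    print(ha_fragment)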

OIST SCDA staff should be fully able to rebuild any compute node with a newer version of CentOS and SLURM; for this purpose, the vendor should provide them with all the drivers, firmware, tools and configuration files required for the rebuild.

9-1. BIOS configuration

For the compute nodes:

• Must be capable of PXE network boot and be configured to first try PXE and then hard disk at

boot time.

• Should be configured to remain powered off in the event of power loss and return.

• Must have hyper-threading feature OFF.

• Must not have BIOS settings configured for overclocking.

• Other settings, when applicable, are as follows:

o Intel Turbo boost: Off

o AMD performance boost: Off (if it increases the TDP), On (otherwise)

o cTDP: Nominal

The BMC (baseboard management controller) of each server connected to the OOB (out of band) network must have its administrator password set to a value different from the default one (it can be the same password for all servers). The password will be provided to OIST at the acceptance phase.
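
A minimal sketch of how a few of these settings could be spot-checked out of band, assuming the BMCs expose a standard Redfish interface; the BMC address, credentials, and BIOS attribute names below are placeholders that differ between server vendors:

    # Sketch: spot-check BIOS settings through a Redfish-capable BMC.
    # Address, credentials, and attribute names are placeholders; real
    # attribute names (e.g. for hyper-threading or turbo boost) vary by vendor.
    import requests

    BMC = "https://bmc-node001.example"        # hypothetical BMC address
    AUTH = ("admin", "non-default-password")   # non-default password, per 9-1

    resp = requests.get(f"{BMC}/redfish/v1/Systems/1/Bios",
                        auth=AUTH, verify=False, timeout=10)
    resp.raise_for_status()
    attrs = resp.json().get("Attributes", {})

    # Example checks; replace the keys with the vendor's actual attribute names.
    expected = {"LogicalProc": "Disabled",     # hyper-threading off
                "ProcTurboMode": "Disabled"}   # turbo boost off
    for key, want in expected.items():
        got = attrs.get(key, "<missing>")
        print(f"{key}: {got} (expected {want})")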


10. Power consumption and rack layout

The estimated peak power consumption of the proposed system should be provided and should not be higher than the 400kW maximum capacity available at OIST for this system.

The storage systems and InfiniBand network components shall be connected to the UPS source

(separate from the 400kW source) provided by OIST facility, for a total no higher than 50kW.
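
A minimal sketch of the corresponding budget check; all per-unit wattages and counts below are placeholders to be replaced by the vendor's own estimates:

    # Sketch: check estimated peak power against the 400 kW compute budget
    # and the 50 kW UPS budget for the storage and InfiniBand components.
    # All wattages and counts are placeholders, not vendor figures.
    compute_loads = {
        # name: (peak watts per unit, unit count)
        "high core count nodes (8-1)": (850, 320),
        "high FP nodes (8-2)":         (750, 140),
        "login/transfer/mgmt/sched":   (600,  11),
        "Ethernet + mgmt switches":    (400,  30),
    }
    ups_loads = {
        "storage systems":     (20_000,  1),
        "InfiniBand switches": (   800, 23),
    }

    def total_kw(loads):
        return sum(watts * count for watts, count in loads.values()) / 1000

    compute_kw, ups_kw = total_kw(compute_loads), total_kw(ups_loads)
    print(f"compute + network: {compute_kw:.1f} kW (limit 400 kW), "
          f"{'OK' if compute_kw <= 400 else 'OVER'}")
    print(f"storage + IB on UPS: {ups_kw:.1f} kW (limit 50 kW), "
          f"{'OK' if ups_kw <= 50 else 'OVER'}")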

10-1. OIST data center layout

The available space in the OIST DC for the cluster installation is circled and marked as “I-J” and “C-D” in the top-view layout below. OIST will provide all the racks and PDUs required to operate the system.

Fig. 7 Top view of OIST data center rack layout. Rows C-D and I-J will be available for the new

system

The rack rows will preferably be populated in the order below (see the pictures below):

• C and D rack rows: C01, D01, C02, D02, …, C06, D06


• I and J rack rows: I10, J08, I09, J07, …, I01, J01

10-2. Free access floor

The data center uses a free access floor having the following specification:


• Live load: 10000 Pa (1020 kg/m2)

• Floor board: A600L

• Floor height: 1000 mm

• Designed PGA: 0.75 G

• Ground floor (slab under the free access floor) maximum live load: 1200 kg/m2

10-3. Chilled water piping

Cooling will be done via chilled water, connected to the available OIST facility water piping.

10-4. Rack and PDU specification

OIST will provide the racks and PDUs for the I-J and C-D rows (see section 10-1). Electrical power will be available at the PDU sockets to operate the system components.

10-4-1. Rack description

12 usable racks in C-D, and 18 usable racks in I-J (see 10-1 for rack populating order)

• Front width: 600 mm

• Depth: 1200 mm

• Height: around 2000mm (42U usable)

• Maximum weight: 1500 kg

• Cooling capacity: between 22kW and 24kW

10-4-2. PDU description

In each rack, two types of PDUs will be used.

0U vertical mount PDU:

• Rated input voltage: Single phase, 200V

• Rated input current: 30A

• Nominal power: 6kW

• Output outlets: IEC C19(6), C13(36)

2U horizontal mount PDU:

• Rated input voltage: Single phase, 200V

• Rated input current: 30A

• Nominal power: 6kW

• Output outlets: IEC C19(4), C13(12)


11. Physical Installation and acceptance

A spreadsheet will be used to track any issues and changes occurring during the physical installation and acceptance, and a daily report will be provided to OIST using this spreadsheet.

11-1. Physical installation

The vendor should bring all additional tools, power meters, parts, etc. required for the installation and

acceptance. Dedicated clean clothing must be used inside the OIST data center, and the vendor staff must wear the required protection for the work. All the servers and components will be unpacked outside the data center, and no cardboard boxes, plastic bags, bubble wrap, polystyrene foam, paper sheets, etc. shall be introduced inside the data center.

A kick-off meeting will be held prior to the installation to discuss the final technical aspects of the delivery, such as the final layout of the system components (storage, servers, switches).

11-2. Acceptance

The acceptance will consist of the verification of all the requirements and the proposed items agreed upon during the proposal evaluation. Moreover, the following checks must be cleared for acceptance of the deliverable.

• OS and SLURM scheduler (slurmctld in HA configuration) installation.

• Operation verification of all the nodes and stress check using the following software (the system

should remain stable over 3 days of continuous run):

o OSU: http://www.nersc.gov/users/computational-systems/cori/nersc-8-procurement/trinity-nersc-8-rfp/nersc-8-trinity-benchmarks/omb-mpi-tests/

o HPL: http://www.netlib.org/benchmark/hpl/

▪ The HPL stress test should make use of AVX/AVX2/AVX-512 instructions and an optimized DGEMM implementation using MKL, BLIS, OpenBLAS, etc., when available (see the peak-estimate sketch after this list).

• Storage performance evaluation

• In addition to the network-related procedures provided by the vendor in its proposed “basic acceptance testing procedure”, the network testing will include the following steps:

o The vendor performs switch hardware operation tests (power on/off, normal booting and diagnostics, etc.)

o OIST conducts the configuration “push” from the spine switches

o The vendor performs network tests (connectivity and data transfer) for each system component

• The vendor will conduct power consumption measurements of the compute nodes (one node from 8-1 and one node from 8-2) under the stress check. OIST will monitor the overall power consumption of the system during the stress check.

• Show that the measured SPECfp/SPECfp_rate and SPECint/SPECint_rate performance, with the BIOS settings in 9-1, on the compute nodes in 8-1 and 8-2, meets the expected values (equal or better)
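
As an aid to setting the expected HPL figures referenced above, a minimal sketch of the theoretical peak calculation for an AVX-512 FMA node, using the 8-2 minimums (20 cores per socket, 2.1GHz base frequency); the two-FMA-units-per-core figure and the 80% HPL efficiency are assumptions for illustration only:

    # Sketch: theoretical double-precision peak for a dual-socket AVX-512 node
    # and a rough expected HPL range. The FMA unit count and the 80% efficiency
    # figure are assumptions, not values from this specification.
    def node_peak_tflops(sockets, cores_per_socket, ghz,
                         fma_units=2, simd_doubles=8):
        # flops per cycle per core = FMA units * SIMD width (doubles) * 2 (mul+add)
        flops_per_cycle = fma_units * simd_doubles * 2
        return sockets * cores_per_socket * ghz * flops_per_cycle / 1000

    peak = node_peak_tflops(sockets=2, cores_per_socket=20, ghz=2.1)
    print(f"theoretical peak: {peak:.2f} TFLOP/s")
    print(f"expected HPL at ~80% efficiency: {peak * 0.8:.2f} TFLOP/s")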

As part of the acceptance, the following system details must be provided to OIST (as Excel sheets):

• Serial numbers (or service TAGs) and part numbers of all system hardware and components


• A list of all firmware versions for each hardware part (including the firmware release date).

• All configuration files and settings used for the installation of the OS and SLURM and for the operation verification

• MAC addresses of all NIC devices together with their node/server names (before the delivery, OIST will provide the system names to be used for naming the nodes/servers)

• All configuration parameters used during the installation of the switches (Ethernet, IB, and management)

• An operation manual for the storage that includes the emergency safe shutdown procedure

The vendor will provide a report for each verification. In the event of failure to clear a verification, the

vendor should provide a remedy at no cost.

11-3. Delivery deadline and place

The system must be delivered and accepted by February 29, 2020.

The system delivery place is the OIST data center, at the address below:

Okinawa Institute of Science and Technology Graduate University

1919-1 Tancha, Onna-son, Kunigami-gun, Okinawa 904-0412, Japan

12. Appendices

Detailed information on the OIST building facility, data center and current computing system is available until the deadline for Q&A.