Status of BES -...
Outline
• Current Status
• Migration from PBS to HTCondor
• Distributed Computing
• Distributed Monitoring
• High Performance Cluster
• Next Step
Storage Stuck Issue Solved
• The storage service was unstable after the summer maintenance.
• Storage log mining work:
– Collected and analyzed keywords from the storage file system logs, as sketched below.
– Narrowed the focus to the core switch provided by Ruijie.
• The file system stuck issue disappeared after the core Ethernet switch was replaced (Aug. 29th); the storage system is more stable now.
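The log-mining step amounts to keyword counting over the file system logs. A minimal sketch of the idea in Python; the keyword list and log path are illustrative assumptions, not the production setup:

```python
import re
from collections import Counter

# Illustrative keywords and path; the real list and log location used in
# the production log-mining work are not given in the slides.
KEYWORDS = ["timeout", "stale", "reset", "error"]
LOG_PATH = "/var/log/fs/server.log"

def count_keywords(path, keywords):
    """Count how often each keyword appears in the log file."""
    pattern = re.compile("|".join(keywords), re.IGNORECASE)
    counts = Counter()
    with open(path) as fh:
        for line in fh:
            for hit in pattern.findall(line):
                counts[hit.lower()] += 1
    return counts

if __name__ == "__main__":
    for word, n in count_keywords(LOG_PATH, KEYWORDS).most_common():
        print(word, n)
```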
[Figure: internal error number plots]
• Ruijie Network, the switch vendor, gave a preliminary conclusion: chipset-level monitoring results show problems in the traffic load-balancing algorithm.
New Storage Device
• A 3.6PB storage device will be purchased by the end of this year.
• 2.5PB of available space will be added.
Cloud Computing 1/2
• Supports multiple batch systems: PBS/Torque, HTCondor.
• Dynamic VM provisioning: VMs are created and destroyed on demand.
• Fair-share algorithm: guarantees that resources are distributed fairly among the different experiments.
[Diagram: when jobs queue, virtual machines are created automatically, subject to a per-experiment resource minimum limit; a sketch of this loop follows.]
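A minimal sketch of that provisioning loop in Python, using the openstacksdk client for brevity; the cloud profile, image and flavor names, the queue-check function, and the check interval are all illustrative assumptions (the production system is based on OpenStack Kilo and its own scheduling logic):

```python
import time

import openstack  # openstacksdk; illustrative, the production tooling may differ

# Hypothetical names: the cloud profile, worker image, and flavor are
# placeholders, not the production configuration.
CLOUD, IMAGE, FLAVOR = "ihep-cloud", "sl6-worker", "m1.medium"

def queued_jobs():
    """Placeholder: return the number of queued jobs for an experiment.

    In production this would query the batch system (PBS/Torque or HTCondor).
    """
    return 0

def provision_loop():
    conn = openstack.connect(cloud=CLOUD)
    image = conn.compute.find_image(IMAGE)
    flavor = conn.compute.find_flavor(FLAVOR)
    while True:
        if queued_jobs() > 0:
            # Boot one worker VM per check; the fair-share algorithm would
            # cap how many VMs each experiment may hold at a time.
            conn.compute.create_server(
                name="vm-worker-%d" % int(time.time()),
                image_id=image.id,
                flavor_id=flavor.id,
            )
        time.sleep(60)  # re-check the queue once a minute
```

Destroying idle VMs on demand would follow the same pattern with conn.compute.delete_server().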
Cloud Computing 2/2
• Based on OpenStack Kilo.
• Two kinds of cloud services:
• Infrastructure as a Service
– 14 compute nodes, 352 virtual cores (224 cores in use, 169 virtual machines running).
– User-oriented self service.
• Virtual Computing Cluster
– 28 compute nodes, 672 cores.
– Provides virtual machines on demand, matching real computing requirements.
– Transparent to users.
Migration from PBS to HTCondor 1/3
• New architecture:
• Central management.
• Integrated with monitoring.
• New monitoring tool.
• Easy to expand.
• HTCondor has supported JUNO and CMS for more than a year:
• High job-scheduling performance.
• Runs stably.
• 1000 BES CPU cores were migrated from PBS to the HTCondor cluster in Aug.
Migration from PBS to HTCondor 2/3
• HTCondor has been tested and improved according to users' feedback.
• HTCondor optimization:
• Job management stays almost the same for users (see the sketch after this list).
• The user manual is ready.
• New share policy:
• BES can take CPU resources from other experiments to cover peak demand.
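For users, submitting a job stays close to the PBS workflow (a submit description instead of a qsub script). A minimal sketch using the HTCondor Python bindings; the executable and file names are placeholders:

```python
import htcondor  # HTCondor Python bindings

# Roughly the HTCondor counterpart of "qsub run.sh" under PBS; the keys are
# standard submit-description attributes, the file names are placeholders.
sub = htcondor.Submit({
    "executable": "run.sh",
    "output": "job.out",
    "error": "job.err",
    "log": "job.log",
    "request_cpus": "1",
})

schedd = htcondor.Schedd()         # local scheduler daemon
with schedd.transaction() as txn:  # transaction-style submit API
    cluster_id = sub.queue(txn)    # returns the new cluster id

print("submitted cluster", cluster_id)
```

The share policy itself lives on the pool side; it would typically be expressed with HTCondor accounting groups and surplus sharing, though the slides do not give the exact configuration.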
Migration from PBS to HTCondor 3/3
• Next:
• 2000 CPU cores will be migrated to HTCondor in Oct.
• All BES resources will be migrated to HTCondor by the end of this year.
• User training will be held in Oct.
BESIII Distributed Computing 1/3
• During the August summer maintenance, the DIRAC server was successfully upgraded from v6r13 to v6r15.
– Prepares for multi-core job support in the near future (see the submission sketch below).
– VMDirac was upgraded to 2.0, which greatly simplifies the procedure for adopting new cloud sites.
• A new monitoring system has been put into production, giving a clear view of real-time site status.
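Jobs reach the platform through the DIRAC Python API. A minimal submission sketch; the job name and script are placeholders, and since API method names vary across DIRAC releases, the multi-core request is only indicated as a comment:

```python
from DIRAC.Core.Base import Script
Script.parseCommandLine()  # standard DIRAC initialization

from DIRAC.Interfaces.API.Dirac import Dirac
from DIRAC.Interfaces.API.Job import Job

j = Job()
j.setName("bes-sim-example")          # placeholder job name
j.setExecutable("run_simulation.sh")  # placeholder script
# Multi-core support (in preparation after the v6r15 upgrade) would be
# requested here, e.g. via a processor-count or tag setting in newer releases.

result = Dirac().submitJob(j)  # some releases name this method submit()
print(result)
```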
BESIII Distributed Computing 2/3
• The 2nd BESIIICGEM Cloud Computing Summer School was successfully held in July at Shandong University.
• About 30 people joined the school.
– Teachers from INFN, IHEP, Zhejiang University, and 99cloud.
– Students from IHEP, SDU, JINR, Soochow, and USTC.
• The summer school greatly helped students gain knowledge of cloud computing and learn how to use the cloud for physics analysis.
– It helps push forward cloud applications in HEP.
BESIII Distributed Computing 3/3
• During the last three months, about 224K BESIII jobs were completed on the platform.
– 11 sites joined the production.
– ~40% of the jobs ran at the UMN site.
• Total data exchanged among sites is about 68.8TB.
• About 70 user tasks have been completed.
• The BESIII distributed computing system remained stable this season.
• Multi-core support is on the way to meet future challenges.
Distributed Monitoring System 1/2
• Motivation:
• Many remote sites are short of manpower for maintenance work.
• IHEP can help with routine maintenance for the remote sites.
• Distributed monitoring is the cornerstone.
• Migrated the monitoring server from Icinga to Nagios.
– Better support for a distributed monitoring architecture.
• Enhanced system security:
– Opened port 80 of the monitoring server to the outer network.
– Updated Apache from v2.2 to v2.4.
– System vulnerabilities are checked regularly.
Distributed Monitoring System 2/2
• Chengdu site:
• All hosts of the Chengdu site are monitored from the central site (IHEP).
• HTCondor:
• The HTCondor service is monitored on all computing servers (a sketch of such a check follows).
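Checks like this follow the Nagios plugin convention: print one status line and exit 0/1/2/3 for OK/WARNING/CRITICAL/UNKNOWN. A minimal sketch of a check for the HTCondor master daemon; the process name is the standard one, but the production checks themselves are not shown in the slides:

```python
#!/usr/bin/env python3
"""Nagios-style check: is the condor_master daemon running on this host?"""
import subprocess
import sys

OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3  # Nagios plugin exit codes

def main():
    try:
        out = subprocess.run(["pgrep", "-c", "condor_master"],
                             capture_output=True, text=True)
    except OSError:
        print("UNKNOWN - pgrep not available")
        sys.exit(UNKNOWN)
    if out.returncode == 0 and int(out.stdout.strip()) > 0:
        print("OK - condor_master is running")
        sys.exit(OK)
    print("CRITICAL - condor_master is not running")
    sys.exit(CRITICAL)

if __name__ == "__main__":
    main()
```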
High Performance Cluster 1/2
• A new heterogeneous hardware platform: CPU, Intel Xeon Phi, GPU.
• Parallel programming support: MPI, OpenMP, CUDA, OpenCL, etc. (see the MPI sketch below).
• Potential use cases: simulation, partial wave analysis, etc.
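As an illustration of the MPI support, the canonical hello-world with the mpi4py bindings (assuming mpi4py is installed alongside the cluster's MPI stack):

```python
from mpi4py import MPI  # assumes the mpi4py bindings are installed

comm = MPI.COMM_WORLD
rank = comm.Get_rank()  # this process's id within the MPI job
size = comm.Get_size()  # total number of MPI processes

print("rank %d of %d on %s" % (rank, size, MPI.Get_processor_name()))
```

Launched with, e.g., mpirun -n 4 python hello.py, each rank prints its id and host.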
[Diagram: HPC cluster architecture. Compute: 1000 CPU cores, 150 GPU cards, 50 Xeon Phi cards; storage: 700TB (EOS/NFS); a login farm receives jobs from local users and remote sites. Network: Mellanox SX6012/SX6036 InfiniBand FDR 56Gb/s fabric and a Brocade 8770 Ethernet 10/40Gb/s switch, with 112Gb/s and 80Gb/s links.]
High Performance Cluster 2/2
• SLURM as the scheduler (a submission sketch follows this list).
• Test bed is ready: version 16.05.
– Virtual machines: 1 control node, 4 computing nodes.
– Physical servers: 1 control node, 26 computing nodes (2 GPU servers included).
• Scheduler evaluation is under way.
– Two scheduling algorithms evaluated: sched/backfill and sched/builtin.
– Integration with DIRAC is under way.
• Network architecture & technologies:
– The InfiniBand network for the HPC test bed is already built.
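Jobs would reach SLURM through sbatch; the backfill-versus-builtin comparison corresponds to the slurm.conf SchedulerType setting (sched/backfill vs sched/builtin). A minimal submission sketch in Python using standard sbatch options; the partition name and batch script are placeholders:

```python
import subprocess

# Standard sbatch flags; the partition name and script are placeholders.
cmd = [
    "sbatch",
    "--job-name=hpc-test",
    "--partition=gpu",  # hypothetical partition
    "--nodes=1",
    "--ntasks=4",
    "run_mpi.sh",       # placeholder batch script
]
result = subprocess.run(cmd, capture_output=True, text=True)
print(result.stdout.strip())  # e.g. "Submitted batch job 12345"
```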
Next Step
• ~2.5PB of available storage will be added for BES.
• The migration from PBS to HTCondor will be finished by the end of this year.
• IHEP will provide routine maintenance service to more remote sites.
• The HPC cluster will be put into service next year.
Conclusion
• The computing platform for BESIII is more stable after the core switch replacement.
• The optimized HTCondor cluster can satisfy BES computing requirements.
• The distributed monitoring system provides maintenance service to remote BES sites.
• The HPC test bed has been built and is now under evaluation.