
Page 1: research.kek.jpresearch.kek.jp/people/iwai/annualreport2013/submitted_v... · Web viewIn FY2013, we prepared another perfSONAR server to cope with perfSONAR-MDM (Multi Domain Monitoring)

4.4.2. Computing Research Center

4.4.2.1. Overview

The Computing Research Center (CRC) provides computing resources and computer networks to support research activities at KEK.

The Central Computer System (CCS) provides KEK staff and research collaborators with large-capacity data storage and CPUs for analyzing data from experiments running at KEK. The system includes a Grid computing system, which enables cooperative analysis in global-scale collaboration projects. The Supercomputer System is operated for large-scale simulation programs, mainly in the field of computational physics.

The CRC operates campus networks, including KEK-LAN on the Tsukuba campus and JLAN on the Tokai campus, as well as HEPnet-J for high-energy physics collaboration among domestic universities and laboratories. In these networks, concerns over computer security have steadily gained importance.

To support communication in research activities, the CRC provides an e-mail system and web systems. These information systems offer a variety of services, such as mailing lists, a wiki, and the document management system (KDS-Indico), in addition to conventional services.

The CRC also promotes several research projects, including Grid systems, the Manyo-Library for data analysis tools, Geant4 for detector simulation, and GRACE for automatic theoretical calculations.

4.4.2.2. Computing Services

Central Computer System

The current Central Computer System (CCS) has been in operation since FY2012. The system consists of a data analysis system called KEKCC and the e-mail and web information services. KEKCC is a High Performance Computing (HPC) Linux cluster system that provides 4,176 CPU cores and a large-scale storage system. The storage system is composed of two parts: a disk system using GPFS (General Parallel File System) with a capacity of 7 PB, and a tape library system that can store up to 16 PB. HPSS (High Performance Storage System) is used for Hierarchical Storage Management (HSM) of the tape data, and data I/O is performed automatically through the GPFS file system by the GPFS/HPSS Interface (GHI). GHI enables access to tape data in the same way as disk data. So far, 1.4 PB and 4.5 PB of data are stored in the disk and tape systems, respectively.
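The GHI-based hierarchy behaves like a read-through cache: a file resident only on tape is staged to disk on first access, so users see a single file-system interface for both tiers. The following toy model sketches this behavior; the class and method names are our own illustration, not a GPFS/HPSS API:

```python
# Toy model of hierarchical storage management (HSM): files live on
# "tape"; a read transparently stages them onto "disk" first, so the
# caller uses one interface for both tiers. Purely illustrative.
class HierarchicalStore:
    def __init__(self):
        self.disk = {}
        self.tape = {}
        self.staged = []   # record of tape-to-disk recalls

    def write(self, name, data, to_tape=False):
        (self.tape if to_tape else self.disk)[name] = data

    def read(self, name):
        if name not in self.disk:      # disk miss: recall from tape
            self.disk[name] = self.tape[name]
            self.staged.append(name)
        return self.disk[name]

store = HierarchicalStore()
store.write("run001.dat", b"hot data")
store.write("run000.dat", b"cold data", to_tape=True)
payload = store.read("run000.dat")   # transparently staged from tape
```

In the real system the recall is performed by GHI behind the GPFS namespace, so analysis jobs need no special code path for tape-resident files.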

At first, the file system was unstable under heavy data access, so a system upgrade was performed in August 2013 to improve stability. Another technical problem was found in the small-file aggregation functionality of GHI, and a fix for this bug was applied in February 2014.


LSF is used as the job scheduler in the batch system. We made continuous efforts to improve job throughput by monitoring jobs and optimizing the queue settings and the parameters of the fairshare scheduling policy. As a result, monthly CPU usage rates of over 80% were achieved, as shown in Fig. 4-4-2-2-1.
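The fairshare tuning can be illustrated with a toy priority function, loosely modeled on LSF's dynamic user priority (configured shares divided by a weighted sum of recent resource usage). The formula, weights, and numbers below are illustrative assumptions, not our production settings:

```python
# Toy fairshare priority: groups with many configured shares but
# little recent usage are dispatched first. The usage weights here
# are illustrative, not LSF's actual defaults.
def dynamic_priority(shares, running_jobs, cpu_time, run_time,
                     cpu_factor=0.7, run_factor=0.3):
    usage = 1.0 + running_jobs + cpu_factor * cpu_time + run_factor * run_time
    return shares / usage

priorities = {
    "groupA": dynamic_priority(shares=40, running_jobs=10,
                               cpu_time=500.0, run_time=600.0),
    "groupB": dynamic_priority(shares=40, running_jobs=2,
                               cpu_time=50.0, run_time=80.0),
    "groupC": dynamic_priority(shares=20, running_jobs=0,
                               cpu_time=0.0, run_time=0.0),
}
# the next dispatched job comes from the group with the highest priority
next_group = max(priorities, key=priorities.get)
```

Because accumulated usage decays the priority, a heavy group cannot starve light users even with equal share allocations, which is what keeps the overall CPU usage rate high.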

In the KEK mail system, phishing mails targeting Active!Mail users were sent every month from November 2013 onward, with more than 100 users receiving such mails each time. We alerted users to the phishing mails and blocked access to the phishing sites at the KEK firewall.

Large-scale Simulation Program

KEK launched its Large-scale Simulation Program in April 1996 to support large-scale simulations in the field of high-energy physics and related areas. Under this program, KEK solicits proposals for projects that make use of the KEK Supercomputer System (KEKSC).

Two research periods overlapped fiscal year 2013: the 2012-2013 and 2013-2014 periods, each running from October to September of the following year. During the 2012-2013 research period, 26 proposals covering the following research areas were filed and approved by the Program Advisory Committee: lattice QCD (14), elementary particle physics (3), nuclear physics (3), material science (3), astrophysics (1), and accelerator science (2). In addition, 6 trial applications were accepted. In the 2013-2014 research period, 23 proposals have been approved, most of which are continuations of proposals filed in the previous period. Four trial applications have also been approved thus far. (See http://ohgata-s.kek.jp/)

KEKSC also provides computing resources to the Computational Fundamental Science Project driven by the Joint Institute for Computational Fundamental Science (http://www.jicfus.jp/).

In fiscal 2013, 6 proposals covering computational high-energy physics, nuclear physics, and astrophysics were filed and approved as projects using KEKSC.

KEKSC currently consists of System A, a Hitachi SR16000 Model M1, and System B, an IBM Blue Gene/Q. System A started service in September 2011 at an off-site data center and has been in operation at KEK since March 2012. System B started service in April 2012. KEKSC is connected to the Japan Lattice Data Grid, which provides fast transfer of lattice QCD data among supercomputer sites in Japan via HEPnet-J/sc, a virtual private network based on SINET4 provided by the National Institute of Informatics.

There was unauthorized access to KEKSC in the middle of October, for which KEKSC was out of service from November 1 to December 10. Details are given in Sec. 4.4.2.3 on Security.

On November 1, it was found that unauthorized accesses to KEKSC had occurred in the middle of October. The system was immediately isolated from the outside, and a countermeasures team started an investigation. The findings were as follows: although there were


three spoofed logins, attempts to gain administrator rights failed, and no damage to the system or to users' data was found. After reinforcing security levels and educating users on network security, the service was restarted on December 10.

4.4.2.3. Network and Security

Renewal of the network system

The KEK network system was replaced in FY2013. The structure of the new network is almost the same as that of the previous one, but the bandwidth and performance of the core switches were more than doubled. The 10 Gigabit Ethernet links to the core switch can now be operated without interfering with each other; in the previous system, packets could be dropped on the switch backplane when all 10 GbE links were fully used simultaneously. The port density and backplane bandwidth have been upgraded to avoid this situation, which will help mass data transmission over the connection from KEKCC via SINET. The previous firewall had 10 Gbps interfaces but limited the bandwidth of a single TCP stream to 5 Gbps. The new firewall allows 10 Gbps per stream, and there are two firewalls for redundancy, so they will not be a bottleneck for data transmission.

The number of wireless network access points was also increased from 150 to 200, and the access points now accept connections using IEEE 802.11n. Multi-Input Multi-Output (MIMO), which provides high-speed connections using multiple channels, is enabled only in the 5 GHz radio band. The access points also support MIMO in the 2.4 GHz band, but there are many 2.4 GHz access points operated by users on the KEK Tsukuba campus, so MIMO at 2.4 GHz is currently not scheduled.

Stopping “tsubaki-II”

The wireless network “tsubaki-II” was started as the successor of “tsubaki” when the number of clients exceeded the limit of the network authentication device for tsubaki. As a simple successor, it used the same encryption method and encryption key as the original. The “WEP” encryption method is too weak to prevent security issues, so we stopped both networks in January 2014. Now “tsubaki-III”, which requires the “WPA2” encryption method, is available instead.

Retirement of DNS cache servers for the HEP community in Japan

Since the beginning of KEK's Internet connection, “kekux” and several DNS servers had been usable not only from KEK but also from the HEP community in Japan. Because open resolvers can serve as a source of Distributed Denial of Service (DDoS) attacks, DNS cache servers should be operated at each site. The preparation of DNS cache servers was requested of all sites connected to the HEPnet-J network in FY2012, and KEK's DNS cache servers have not been usable from HEPnet-J sites since FY2013.

perfSONAR servers in KEK

For the past few years, perfSONAR-PS servers have been running on the KEK-DMZ and KEKCC networks to monitor network performance. In FY2013, we prepared another perfSONAR server to cope with perfSONAR-MDM (Multi Domain


Monitoring) servers at remote collaboration sites. A perfSONAR-PS server issues performance tests at its own interval and responds to requests from remote servers when no test is ongoing. Therefore, if many sites are registered in the scheduler and the interval is short, the server may not be able to allocate test time for requests from remote sites. A perfSONAR-MDM server, by contrast, does not execute performance tests by itself; instead, a central server sends requests to the perfSONAR-MDM servers for performance tests between themselves. The schedule of performance tests is determined by the central server, and the history of test results is recorded only there. This model reduces the resource requirements of the measurement nodes. As with perfSONAR-PS, KEK now has two perfSONAR-MDM servers, on the KEK-DMZ and KEKCC networks.
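The centrally scheduled model can be sketched as a central server assigning each pair of measurement hosts its own test slot, so that no host runs two bandwidth tests at once. The function below is an illustrative toy; the site names and the 30-minute slot length are our assumptions, not perfSONAR-MDM's actual scheduler:

```python
# Toy central scheduler: enumerate all host pairs in a measurement
# mesh and serialize their tests into fixed-length slots, as a
# perfSONAR-MDM-style central server would. Illustrative only.
from itertools import combinations

def build_mesh_schedule(sites, slot_minutes=30):
    schedule = []
    t = 0
    for src, dst in combinations(sorted(sites), 2):
        schedule.append({"src": src, "dst": dst, "start_minute": t})
        t += slot_minutes
    return schedule

mesh = build_mesh_schedule(["KEK-DMZ", "KEKCC", "SiteX", "SiteY"])
# 4 hosts give 6 pairwise tests, each in its own slot
```

Because the measurement hosts only execute what the central server requests, adding a site grows the schedule centrally without any per-host scheduler configuration.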

Security

In FY2013, severe security attacks on supercomputer systems occurred at Japanese academic sites, and KEKSC also received attacks. Credentials (username and password pairs) stolen elsewhere were used to intrude into the SSH login servers of KEKSC. Although the attacks had no effect on the system or on users' data, it took more than one month to restart the computing service because of the investigation. Not only supercomputer systems but also other SSH servers are becoming prime targets, and the management of credentials is becoming more and more important for both system administrators and users.

Similar attacks are applied to Web applications, such as Webmail, where a credential is required to log in. One way to steal credentials is “phishing”, in which a huge number of fraudulent mails are sent to lead users to a phishing website. In the second half of FY2013, roughly every month, over a hundred users of our mail system received mails pretending to be sent by the KEK Webmail administrator. Some users were deceived and entered their credentials on the phishing website. Once stolen, credentials can be used to send spam or to intrude into other Web applications. To minimize the risk from these social engineering attacks, KEK repeatedly warned users to ignore suspicious mails asking them to enter credentials and not to click the URLs in them. The KEK Webmail administrator never asks users to enter their credentials in such a manner.

4.4.2.4. J-PARC Information System

Since FY2002, the J-PARC infrastructure network, called JLAN, has been operated independently of the KEK LAN and JAEA LAN in terms of logical structure and operational policy. The total number of hosts on JLAN has reached over 3,800 and has been increasing at a rate of 108% per year. The growth curves of the edge switches, wireless LAN access points, and hosts connected to JLAN are shown in Fig. 4-4-2-4-1. Fig. 4-4-2-4-2 shows the network usage in FY2013 from the Tokai site to Tsukuba, where KEKCC is installed.


The data transfer rate reached 6 Gbit/s on a 5-minute average and was approaching the network bandwidth capacity of 8 Gbit/s. The figure also shows that, after June, major network activities related to J-PARC experiments were suspended due to the accident at the Hadron Experimental Facility on May 23.

4.4.2.5. Research and Development

Grid in Medical Applications

While Monte Carlo (MC) simulation is believed to be the most reliable method of dose calculation in particle therapy, the simulation time is critical for attaining sufficient statistical accuracy in clinical applications.

To help the rapid development of Grid/Cloud-aware MC simulations, the CRC has developed the Universal Grid Interface (UGI) based on the Simple API for Grid Applications (SAGA). SAGA, which is standardized by the Open Grid Forum (OGF), defines API specifications for accessing distributed computing infrastructures. UGI is a set of command-line interfaces and APIs in the Python scripting language for job submission, file manipulation, and monitoring on multiple Grid middleware infrastructures, as well as on local resources managed by popular load management systems such as PBS and LSF.
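As a sketch of the idea, the snippet below mimics a UGI-style workflow in plain Python: a uniform job description handed to an interchangeable backend. The class and method names are hypothetical illustrations, not the actual UGI or SAGA API; here a local backend stands in for Grid middleware or a PBS/LSF cluster.

```python
# Hypothetical UGI-style interface: the same Job object could be
# submitted to a Grid, PBS/LSF, or local backend. Names are
# illustrative, not the real UGI/SAGA API.
import subprocess
import sys

class LocalBackend:
    """Fallback backend that runs the job on the local machine."""
    def submit(self, executable, arguments):
        result = subprocess.run([executable] + arguments,
                                capture_output=True, text=True)
        return result.returncode, result.stdout

class Job:
    def __init__(self, executable, arguments=None):
        self.executable = executable
        self.arguments = arguments or []
        self.state = "NEW"
        self.output = ""

    def run(self, backend):
        """Submit through the given backend and record the outcome."""
        self.state = "RUNNING"
        code, out = backend.submit(self.executable, self.arguments)
        self.output = out
        self.state = "DONE" if code == 0 else "FAILED"
        return self.state

# run a trivial "analysis job" locally via the Python interpreter
job = Job(sys.executable, ["-c", "print('fragment-0 done')"])
state = job.run(LocalBackend())
```

The value of such a layer is that user scripts stay unchanged when the backend moves from a laptop to a batch farm or a Grid site.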

We have developed a common platform for MC dose calculation on Grid distributed computing systems that allows medical physicists to split a large dose calculation into many small calculations and process them in parallel over the distributed systems. The platform is flexible and effective for dose calculation in both clinical and research applications of particle therapy. It consists of UGI and the Geant4-based Particle Therapy Simulation Framework (PTsim). With this platform, we have achieved a significant improvement in the turnaround time of dose calculation by parallelizing the original calculation.
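The split-and-combine pattern can be sketched with a toy one-dimensional "dose" calculation: each fragment gets its own random seed, runs independently (on a separate Grid node in practice), and the partial dose maps are summed at the end. The voxel model and numbers are illustrative only, not PTsim physics.

```python
# Toy split-and-combine Monte Carlo dose calculation. Each fragment
# is seeded independently, so fragments can run in parallel on
# separate nodes and be merged afterwards. Illustrative only.
import random

def dose_fragment(n_histories, seed, n_voxels=10):
    """Deposit one unit of 'dose' per history into a 1-D voxel array,
    with depth drawn from an exponential distribution."""
    rng = random.Random(seed)
    dose = [0.0] * n_voxels
    for _ in range(n_histories):
        depth = min(int(rng.expovariate(0.5)), n_voxels - 1)
        dose[depth] += 1.0
    return dose

def combine(fragments):
    """Sum partial dose maps voxel by voxel."""
    return [sum(column) for column in zip(*fragments)]

# split 100,000 histories into 10 independent fragments of 10,000
fragments = [dose_fragment(10_000, seed) for seed in range(10)]
total_dose = combine(fragments)
```

Because MC histories are statistically independent, the merged result is equivalent to a single long run, which is what makes the turnaround-time gain essentially free of accuracy cost.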

Object Oriented Data Analysis Environment for Neutron Scattering, “Manyo-Library”

The Materials and Life Science Facility (MLF) of J-PARC is a user facility providing neutron and muon sources for experiments. A data-analysis environment for each instrument in MLF has been developed on a common software framework. The framework, Manyo-Library, provides common and generic analysis functionalities for neutron-scattering experiments and has been developed and maintained by the MLF computing environment group. It is a C++ class library based on an object-oriented methodology. Manyo-Library provides many features, for example data input/output functions, data-analysis functions, and distributed data processing, and can be used through a Python user interface. As the data containers in Manyo-Library can be written in the NeXus format (see http://www.nexusformat.org/), the data files can be read at any other laboratory. Many data-analysis software programs have been developed for various instruments and experiments by adopting


this framework.

Manyo-Library has been installed on the neutron scattering instruments in MLF and is utilized as part of the software infrastructure. The first official version of Manyo-Library, 0.3, was released in 2012 and was improved this year. A small workshop for beginners of Manyo-Library was held in August 2013, and the data analysis environment was installed on the participants' laptops. We will start discussing and designing a new data-file format based on HDF5 (Hierarchical Data Format). Manyo-Library will work with Python 3 in the coming year.

GRACE

GRACE is an automatic computation system that provides quantitative theoretical predictions of cross sections of elementary particles and event generators for high-energy physics experiments. An important extension of the GRACE system is the inclusion of higher-order corrections, which is necessary for more precise theoretical predictions in the Standard Model or beyond.

In the ILC energy region, the one-loop electroweak correction to processes with Higgs production has been estimated at around 10 percent of the tree level, so corrections at the multi-loop level should also be considered. To estimate them, the implementation of multi-loop integrals in GRACE is required. In this extension, it is challenging to handle three kinds of divergence, arising from infrared terms, ultraviolet terms, and kinematical conditions.

We have been developing the Direct Computation Method (DCM) for multi-loop integrals. It is based on numerical multi-dimensional integration and numerical extrapolation. DCM is a fully numerical method and is applicable to multi-loop integrals with various physics parameters. Using DCM, the divergences arising from infrared terms and kinematical conditions can be handled in a fully numerical way.
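The numerical-extrapolation idea behind DCM can be demonstrated on a one-dimensional model integral with a kinematical singularity: regulate the pole with a small imaginary shift iε, evaluate the integral numerically for a sequence of ε values, and extrapolate to ε → 0. The model integral and the single Richardson step below are our own toy illustration, not output of the GRACE/DCM code.

```python
# Toy DCM-style calculation: numerically integrate a regulated
# singular integrand for several regulator values and extrapolate
# the results to the zero-regulator limit.
import math

def simpson(f, a, b, n):
    """Composite Simpson rule on n (even) subintervals."""
    h = (b - a) / n
    s = f(a) + f(b)
    for i in range(1, n):
        s += f(a + i * h) * (4 if i % 2 else 2)
    return s * h / 3

def regulated_integral(eps, n=20000):
    """Model integral I(eps) = int_0^1 dx / (x - 0.5 - i*eps),
    with a kinematical singularity at x = 0.5 shifted off the
    real axis by the regulator eps."""
    return simpson(lambda x: 1.0 / complex(x - 0.5, -eps), 0.0, 1.0, n)

# one Richardson step, 2*I(eps/2) - I(eps), cancels the O(eps)
# error term; the exact eps -> 0 limit of this integral is i*pi
I1 = regulated_integral(0.1)
I2 = regulated_integral(0.05)
extrapolated = 2 * I2 - I1
residual = abs(extrapolated - complex(0.0, math.pi))
```

Here the single extrapolation step reduces the deviation from the exact limit iπ from roughly 0.4 to below 0.01; DCM applies the same principle, with more sophisticated quadrature and extrapolation sequences, in many dimensions.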

Besides the fully numerical approach, a combination of symbolic and numerical treatment of two-loop integrals has been studied. Using symbolic manipulation software, Feynman amplitudes have been obtained by the n-dimensional regularization method. An automatic system is already underway, taking the calculation of the muon anomalous magnetic moment at the two-loop level as an example, in which 1780 two-loop diagrams and 70 renormalized one-loop diagrams appear. Gauge fixing is used as a very efficient means to check the results.

At the same time, a purely theoretical approach has been developed. It is known that any loop integral and any point function can be written as a linear combination of hypergeometric series. We have shown how a one-loop integral is expressed as a hypergeometric series using recursion formulae, and we have obtained an n-point function exactly expressed in terms of a hypergeometric series for arbitrary mass parameters and momenta in any space-time dimension. Since the singular points of hypergeometric functions are well investigated, a software library enabling very precise and stable computation of loop


integrals can be expected.

For the ILC, other options such as electron-electron, electron-photon, and photon-photon colliders have been discussed. Each option offers interesting topics, such as detailed measurement of the Higgs properties and the quest for new physics beyond the Standard Model. We calculated the electroweak one-loop contributions to the scattering amplitude for e-γ → e- Higgs and expressed them in analytical form. We also analyzed the cross section for Higgs production for each combination of polarizations of the initial beams.

Geant4

Geant4 is a toolkit for simulating the passage of particles through matter. It provides a comprehensive set of functionalities for geometry, materials, particles, particle tracking, particle interactions, detector response, events, runs, visualization, and user interfaces. Geant4 has the flexibility and extensibility of a generic simulation framework and is widely used in many application domains, from HEP experiments to medical and space applications. Its versatility has gathered attention from fields beyond particle physics.

In FY2013, the new version 10.0 was released on December 6, along with several patches. In this release we successfully delivered a multi-threaded version of Geant4 (G4MT). Thanks to G4MT, a Geant4 simulation can be processed concurrently in threads, so CPU cores can be utilized efficiently in multi-core environments. G4MT is carefully designed so that migration to multi-threaded applications requires minimal changes to user code. From the performance viewpoint, G4MT also shows good scalability with an increasing number of CPU cores; we succeeded in running G4MT applications on the new Intel Xeon Phi many-core architecture with high scalability.

We also continuously support the user community. We organized a three-day user-training course in December with about 100 participants. For the tutorial, we provided a Japanese-language version of the Geant4 training materials compliant with the latest version (10.0). These educational materials are expected to be very helpful for many Japanese novice users.

As for new development, we continued a project in the framework of the Japan/US Cooperation Program, collaborating with the SLAC Geant4 team and some experiment groups. We are making efforts to speed up Geant4 for future experiments. The project aims to improve the Geant4 kernel using recent computing technologies such as multi-core CPUs, many-core CPUs, and GPU computing. A direction of Geant4 parallelism using GPUs is also under development; this effort targets electromagnetic physics in the lower energy region under voxel geometry, as used especially in radiation dosimetry. Electromagnetic processes in Geant4 were implemented on the CUDA parallel computing platform, and we achieved a 40-fold speed-up compared with simulations on a single CPU.


Figures

Fig. 4-4-2-2-1 History of monthly CPU usage rates


Fig. 4-4-2-4-1 Growth of J-PARC network


Fig. 4-4-2-4-2 Tokai-Tsukuba network bandwidth usage
