
AMS Computing Y2001-Y2002

AMS Technical Interchange Meeting

MIT Jan 22-25, 2002

Vitali Choutko, Alexei Klimentov


Outline

• AMS Production Farm
  - requirements
  - architecture
  - prototyping, tests of HW and SW components
  - HW and SW evaluation for AMS02
• Ground Segment Data Transmission SW
• Y2002 Milestones


AMS Ground Centers

Block diagram of the AMS ground segment; the recoverable elements are:

• POCC / POIC@MSFC, AL: RT data, commanding, monitoring, NRT analysis; HOSC web server
  and xterm, TReK workstations, "voice" loop, video distribution; commands archive.
• GSE / AMS stations: buffer data and retransmit them to the SOC.
• Science Operations Center: NRT data processing, primary storage, archiving,
  distribution, science analysis; production farm, data server, analysis facilities
  (PC farm); AMS data, NASA data and metadata.
• AMS Remote Center: MC production, data mirror archiving.
• External communications carry commands, monitoring and H&S data, flight ancillary
  data and selected AMS science data between the centers.


AMS Production Farm (requirements)

A complex system of computing components (I/O nodes, worker nodes, data storage and
networking switches) that should perform as a single system.

Requirements:
• Reliability: high (24 hours/day, 7 days/week)
• Performance goal: process data "quasi-online" (typical delay < 1 day)
• Disk space: 12 months of data kept "online"
• Minimal human intervention (automatic data handling, job control and bookkeeping)
• System stability: months
• Scalability
• Price/performance


AMS Production Farm (considerations)

Considerations based on AMS01 data processing experience and Y2000-2001 MC production:
• Uniform node architecture (dual-CPU Pentiums and AMDs)
• Uniform operating system (RedHat Linux)
• Computing capacity equivalent to 400 x 450 MHz PII processors (including 20%
  contingency and reprocessing)
• Total of 10 TByte of data stored online (see the sizing sketch after this list)
• Two types of computers:
  - "Processing node" with cheap IDE disks used for transient data storage
  - "Server node" with IDE and SCSI RAID disks for persistent data storage
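A rough sizing check of these numbers (a sketch, not from the slides; it assumes the
10 TByte of online disk is meant to hold the 12-month "online" window from the
requirements slide, and treats the 20% purely as a margin):

# Rough sizing check from the numbers on the considerations and requirements slides.
ONLINE_STORAGE_TB = 10          # total data stored online
ONLINE_WINDOW_DAYS = 365        # 12 months of data kept "online"
CAPACITY_PII450 = 400           # required capacity in 450 MHz PII equivalents
CONTINGENCY = 0.20              # contingency and reprocessing margin

avg_rate_gb_per_day = ONLINE_STORAGE_TB * 1000 / ONLINE_WINDOW_DAYS
base_capacity = CAPACITY_PII450 / (1 + CONTINGENCY)

print(f"average stored-data rate ~ {avg_rate_gb_per_day:.0f} GB/day")     # ~27 GB/day
print(f"capacity before contingency ~ {base_capacity:.0f} PII-450 CPUs")  # ~333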


Y2001 milestones
• HW evaluation to choose the platform and architecture (the "official" AMS02
  simulation/reconstruction code was used for the benchmarking)
• Functional goal: AMS01 STS91 data rerun and AMS02 MC production using the
  production farm prototype and SW


AMS02 Benchmarks

Execution time of the AMS "standard" job for different CPUs, relative to the
Intel PII 450 MHz 1)

Brand, CPU, Memory                               | OS / Compiler            | "Sim" | "Rec"
Intel PII dual-CPU 450 MHz, 512 MB RAM           | RH Linux 6.2 / gcc 2.95  | 1     | 1
Intel PIII dual-CPU 933 MHz, 512 MB RAM          | RH Linux 6.2 / gcc 2.95  | 0.54  | 0.54
Compaq, quad α-ev67 600 MHz, 2 GB RAM            | RH Linux 6.2 / gcc 2.95  | 0.58  | 0.59
AMD Athlon, 1.2 GHz, 256 MB RAM                  | RH Linux 6.2 / gcc 2.95  | 0.39  | 0.34
Intel Pentium IV 1.5 GHz, 256 MB RAM             | RH Linux 6.2 / gcc 2.95  | 0.44  | 0.58
Compaq dual-CPU PIV Xeon 1.7 GHz, 2 GB RAM       | RH Linux 6.2 / gcc 2.95  | 0.32  | 0.39
Compaq dual α-ev68 866 MHz, 2 GB RAM             | Tru64 Unix / cxx 6.2     | 0.23  | 0.25
Elonex Intel dual-CPU PIV Xeon 2 GHz, 1 GB RAM   | RH Linux 7.2 / gcc 2.95  | 0.29  | 0.35
AMD Athlon 1800MP, dual-CPU 1.53 GHz, 1 GB RAM   | RH Linux 7.2 / gcc 2.95  | 0.24  | 0.23
8-CPU SUN-Fire-880, 750 MHz, 8 GB RAM            | Solaris 5.8 / C++ 5.2    | 0.52  | 0.45
24-CPU Sun Ultrasparc-III+, 900 MHz, 96 GB RAM   | RH Linux 6.2 / gcc 2.95  | 0.43  | 0.39
Compaq α-ev68 dual 866 MHz, 2 GB RAM             | RH Linux 7.1 / gcc 2.95  | 0.22  | 0.23

1) V.Choutko, A.Klimentov, AMS note 2001-11-01


AMS01 STS91 Data Rerun (performance)


AMS02 Benchmarks (summary)

• α-ev68 866 MHz and AMD Athlon MP 1800+ have nearly the same performance and are the
  best candidates for the "AMS processing node" (the price of a system based on the
  α-ev68 is twice that of a similar one based on the AMD Athlon).
• Though the PIV Xeon has lower performance, resulting in a ~15% overhead compared
  with the AMD Athlon MP 1800+, the high-reliability requirement for the "AMS server
  node" dictates the choice of a Pentium machine.
• SUN and COMPAQ SMP machines might be candidates for the AMS analysis computer
  (the choice is postponed until L-12 months).

Conclusion: the total power of the AMS02 processing farm must be equivalent to
50 AMD Athlon MP 1800+ computers (a back-of-the-envelope check follows).
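As a rough cross-check of that figure (a sketch, assuming the 400 x 450 MHz PII target
from the considerations slide and that the ~0.23-0.24 relative times in the benchmark
table are per-CPU job times):

# Back-of-the-envelope farm sizing from the preceding slides.
PII450_EQUIVALENTS = 400        # required capacity, in 450 MHz PII processors
ATHLON_RELATIVE_TIME = 0.235    # Athlon MP 1800+ job time relative to PII-450 (~0.23-0.24)
CPUS_PER_BOX = 2                # dual-CPU machines

athlon_speedup = 1 / ATHLON_RELATIVE_TIME            # one Athlon CPU ~ 4.3 PII-450 CPUs
cpus_needed = PII450_EQUIVALENTS / athlon_speedup    # ~94 Athlon CPUs
boxes_needed = cpus_needed / CPUS_PER_BOX            # ~47 dual-CPU boxes

print(f"{cpus_needed:.0f} Athlon MP CPUs ~ {boxes_needed:.0f} dual-CPU boxes")

About 47 dual-CPU boxes, consistent with the quoted 50 computers once some margin is added.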


Production Farm ("AMS processing node" architecture)

Processor                   dual-CPU 1.5+ GHz
Chip set                    currently AMD
Memory                      1 GB RAM
System disk                 LVD SCSI
Disk controller             3Ware IDE RAID
Disks (transient storage)   6 x 120+ GB IDE
Ethernet adapters           "public": 100 Mbit/sec; "AMS private": 2 x 1 Gbit/sec


Production Farm ("AMS server node" architecture)

Processor                   dual-CPU 1.4+ GHz
Chip set                    currently Intel
Memory                      1 GB RAM
System disk                 LVD SCSI
Disk controller             IPC SCSI RAID
Disks (permanent storage)   8 x 180+ GB SCSI
Disk controller             3Ware IDE RAID
Disks (transient storage)   7 x 120+ GB IDE
Ethernet adapters           "public": 100 Mbit/sec; "AMS private": 2 x 1 Gbit/sec
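A quick capacity check (a sketch; it ignores RAID and filesystem overhead, which would
raise the node count):

# Permanent storage per "AMS server node" versus the 10 TByte online target.
PERMANENT_DISKS = 8
DISK_SIZE_GB = 180              # "180+ GB" SCSI disks
ONLINE_TARGET_TB = 10           # online storage target from the considerations slide

per_node_tb = PERMANENT_DISKS * DISK_SIZE_GB / 1000
nodes_needed = ONLINE_TARGET_TB / per_node_tb
print(f"{per_node_tb:.2f} TB per server node -> ~{nodes_needed:.0f} nodes for {ONLINE_TARGET_TB} TB")

Roughly seven such server nodes cover the 10 TByte target, broadly consistent with the
eight-cell layout of the SOC computing facilities shown later.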


Production Farm HW: Tape Drive ("raw" data backup)

• IBM LTO Ultrium (connected to the "server node" prototype)
  - data transfer (write), RAID 5 array -> tape: 11 MByte/sec
  - data transfer (read), tape -> null device: 19 MByte/sec
  - data transfer (read), tape -> RAID 5 array: 11 MByte/sec
  - tape capacity: 200 GB

(see also http://cscct.home.cern.ch/cscct/ultrium)
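For scale, the measured write rate implies roughly how long one cartridge takes to fill
(simple arithmetic from the numbers above):

# Time to fill one LTO Ultrium cartridge at the measured write rate.
CAPACITY_GB = 200
WRITE_MB_S = 11                  # measured RAID 5 array -> tape rate

hours = CAPACITY_GB * 1000 / WRITE_MB_S / 3600
print(f"~{hours:.1f} hours to write a full 200 GB cartridge")   # ~5 hours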


AMS Science Operation Center Computing Facilities
(block diagram, A.Klimentov, Jan 15, 2002)

• Production Farm: eight cells (#1 ... #8), each made of dual-CPU (2 x 2 GHz+) Linux
  PCs plus a dual-CPU Linux server (2 x 2 GHz, SCSI RAID), interconnected by Gigabit
  (1 Gbit/sec) switches.
• Data Server: disk servers and tape servers for archiving and staging of AMS data,
  NASA data and metadata.
• MC Data Server: disk servers holding simulated data.
• Analysis Facilities: dual-CPU (2 x 2 GHz+) Linux PCs and 2 x SMP machines
  (Compaq, SUN).


AMS Computing Y2001 (SW)

• AMS production process/process communication and control SW (PPCC) and monitoring
  - Client/Server CORBA technology (V.Choutko)
  - Process Monitoring package (M.Boschini, V.Choutko, A.Klimentov)
• Data Handling
  - ORACLE DB to store metadata and catalogues (M.Boschini, A.Klimentov)
• Data transmission package
  - Based on bbftp (A.Elin, A.Klimentov, AMS note 2001-11-02)


AMS Production Highlights

• Excellent HW stability (uptime more than 3 months)
• AMS01 STS91 data rerun (10 Linux boxes, 19 CPUs)
• Average efficiency 95% (CPU time / elapsed time)
• Process communication and control via CORBA
• LSF for process submission
• Oracle server on an AS4100 Alpha, Oracle clients on Linux
• Oracle RDBMS:
  - Tag DB with 100M entries
  - Conditions DB with 100K entries
  - Bookkeeping: production status, runs history, file catalogues


Data Transmission SW

• High-rate data transfer between MSFC and POCC/SOC, POCC and SOC, and SOC and the
  Master Copy repository(ies) will become of paramount importance (tests with TReK
  between MIT and CERN; TReK is the best candidate for AMS commanding and for
  transferring data samples).
• What should be used for bulk data transfer? Why not FileTransferProtocol (ftp),
  ncftp, etc.?
  - to speed up data transfer
  - to encrypt sensitive data and leave bulk data unencrypted
  - to run in batch mode with automatic retry in case of failure
  - ...
• Started looking around and came up with bbftp in September (bbftp was developed in
  BaBar and is used to transmit data from SLAC to IN2P3@Lyon); adapted it for AMS and
  wrote service and control programs 1)

1) A.Elin, A.Klimentov, AMS note 2001-11-02
2) P.Fisher, A.Klimentov, AMS Note 2001-05-02


Data Transmission SW (the inside details)

• Server
  - copies data files between directories (optional)
  - scans data directories and makes a list of files to be transmitted
  - purges successfully transmitted files and does bookkeeping of transmission sessions
• Client
  - periodically connects to the server and checks whether new data are available
  - bbftp's the new data and updates the transmission status in the catalogues
    (a minimal sketch of this loop follows)
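A minimal sketch of the client side of this scheme, in Python; the server host, user
name and catalogue update are placeholders, and the bbftp flags are illustrative rather
than the exact invocation used by the AMS service programs (AMS note 2001-11-02):

"""Minimal sketch of the AMS transmission client loop (hypothetical names)."""

import subprocess
import time

SERVER = "soc-gateway.example.org"   # placeholder for the remote bbftp server
POLL_INTERVAL = 300                  # seconds between checks for new data
MAX_RETRIES = 3                      # automatic retry in case of failure


def list_new_files():
    """Ask the server which files are ready to be transmitted.

    Placeholder: the real client consults the server's transmission
    bookkeeping; here we simply return an empty list.
    """
    return []


def fetch(remote_path, local_path):
    """Fetch one file with bbftp, retrying on failure.

    The command-line flags are illustrative; the exact options depend on
    the bbftp release (2.1.2 was used in the tests quoted on a later slide).
    """
    cmd = ["bbftp", "-u", "ams", "-e", f"get {remote_path} {local_path}", SERVER]
    for attempt in range(1, MAX_RETRIES + 1):
        if subprocess.call(cmd) == 0:
            return True
        time.sleep(30 * attempt)     # back off before retrying
    return False


def update_catalogue(remote_path, ok):
    """Record the transmission status (the real bookkeeping lives in Oracle)."""
    print(f"{remote_path}: {'done' if ok else 'FAILED'}")


if __name__ == "__main__":
    while True:
        for remote in list_new_files():
            local = remote.split("/")[-1]
            update_catalogue(remote, fetch(remote, local))
        time.sleep(POLL_INTERVAL)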


Data Transmission SW (tests)

Location                  Line, Mbit/sec   Program   Rate, Mbit/sec
Prevessin -> Meyrin       10               ftp       5.8
                                           bbftp     7.8
                                           bbcp      8.0
Prevessin -> Prevessin    100              ftp       21.0
                                           bbftp     40.0
                                           bbcp      42.0
Prevessin -> Milano 1)    16               bbftp     6.0

Server and client: dual-CPU Intel PIII, Linux OS; bbftp release 2.1.2.
Transmitted AMS01 "raw" data and AMS01 data summary files (Ntuples); duration 12-24 h.

1) M.Boschini installed bbftp in INFN Milano
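As an indication of what these rates mean for one transfer session (a simple sketch;
the actual file volumes moved are not quoted on the slide):

# Data volume moved in a 12-24 h session at the measured bbftp rates.
HOURS = (12, 24)
RATES_MBIT_S = {"100 Mbit/s LAN": 40.0, "16 Mbit/s CERN-Milano line": 6.0}

for label, rate in RATES_MBIT_S.items():
    gb = [rate * 1e6 * h * 3600 / 8 / 1e9 for h in HOURS]
    print(f"{label}: {gb[0]:.0f}-{gb[1]:.0f} GB per 12-24 h session")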


AMS Computing Y2001

Y2001 milestones are fulfilled


AMS Computing Y2002

• Build the AMS02 "production cell" and use it for MC production
• Build the AMS02 "analysis cell"
• AMS02 process and data control SW (migrate from OpenSource CORBA to the licensed
  version)
• "bbftp" tests between MIT and CERN, and between GSC@MSFC and MIT/CERN
• Evaluate an archiving and staging system for AMS (Jan 2002: 4 TB)

AMS Computing Y2002 ("production cell")

Block diagram: five dual-CPU AMD processing nodes ("Processing Node #1" ... "#5", each
with IDE RAID) and one dual-CPU Intel server node ("Server Node #1", with IDE and SCSI
RAID) on a 1 Gbit/sec AMS private network, attached to the 100 Mbit/sec CERN backbone;
analysis programs read from the server node.

Processing Nodes 1-5:
• Dual-CPU Athlon 1900+
• 1 GB RAM
• 3Ware IDE RAID, 6 x 120 GB Western Digital disks
• 1 Gbit/sec Ethernet
• 2 x 100 Mbit/sec Ethernet

Server Node 1:
• Dual-CPU Xeon or PIII
• 1 GB RAM
• 3Ware IDE RAID, 7 x 120 GB Western Digital disks
• IPC SCSI RAID, 8 x 160 GB WD disks
• 1 Gbit/sec Ethernet
• 2 x 100 Mbit/sec Ethernet
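A rough capacity estimate for this cell (a sketch, assuming the Athlon MP 1900+ performs
at least as well as the 1800+ benchmarked earlier, i.e. a per-CPU relative job time of
about 0.23-0.24):

# Rough compute capacity of the "production cell" in PII-450 equivalents.
PROCESSING_NODES = 5
CPUS_PER_NODE = 2
ATHLON_RELATIVE_TIME = 0.235     # assumed per-CPU job time relative to PII-450

cell_cpus = PROCESSING_NODES * CPUS_PER_NODE
pii450_equivalents = cell_cpus / ATHLON_RELATIVE_TIME
print(f"{cell_cpus} CPUs ~ {pii450_equivalents:.0f} PII-450 equivalents "
      f"(~{pii450_equivalents / 400:.0%} of the 400-CPU target)")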


AMS Computing Y2002 ("analysis cell")

• Two dual-CPU AMD Athlon machines dedicated to AMS analysis and Geant4 simulation.
• Architecture similar to the "AMS processing node", but with a 4-channel IDE RAID
  controller and 4 x 120 GB WD HDDs.


Y2002 Milestones

• AMS computers upgrade (1Q)
• AMS "production cell" (1Q)
• AMS "analysis cell" (2Q)
• Data transmission tests (2Q)
• Evaluation of archiving and staging systems (technical meeting with CASPUR Feb/Mar;
  system choice 3Q)
• AMS data handling and PPCC SW, licensed CORBA package (3Q)


Growth of computers and data storage in Science Operation Center