A Puppet Infrastructure at CERN Steve Traylen CERN IT Department [email protected] Puppet Camp, Geneva, CH. 11 July 2012

Upload: steve-traylen

Post on 11-May-2015


Page 1: Puppet Camp CERN Geneva

A Puppet Infrastructure at CERN

Steve Traylen CERN IT Department [email protected]

Puppet Camp, Geneva, CH.

11 July 2012

Page 2: Puppet Camp CERN Geneva

Outline

•  CERN and Computing for High Energy Physics

•  Today’s CERN IT Deployment
   –  Why and what’s changing
•  Adoption of Puppet, Foreman, …
   –  Progress, Integration
   –  Difficulties
   –  Future

Puppet Camp Geneva - CERN

Page 3: Puppet Camp CERN Geneva

CERN

§  Conseil Européen pour la Recherche Nucléaire

§  aka European Laboratory for Particle Physics

§  Facilities for fundamental research

§  Between Geneva and the Jura mountains, straddling the Swiss-French border

§  Founded in 1954

Page 4: Puppet Camp CERN Geneva

The Large Hadron Collider

§  Accelerator for protons against protons – 14 TeV collision energy

§  By far the world’s most powerful accelerator

§  Tunnel of 27 km circumference, 4 m diameter, 50…150 m below ground

§  Detectors at four collision points

Page 5: Puppet Camp CERN Geneva

The LHC Computing Challenge

§  Data volume
   → 15 PetaBytes of new data each year
§  Global compute power
   → 250k CPU cores
   → 100 PB of disk storage
§  Worldwide analysis & funding
§  Distributed computing infrastructure to provide the production and analysis environments for the LHC experiments
§  Managed and operated by a worldwide collaboration between the experiments and the participating computer centres
§  Distributed for funding and sociological reasons

Page 6: Puppet Camp CERN Geneva

Motivation to Change Tools

•  CERN data centre is reaching its limits:
   –  IT staff numbers remain fixed
   –  More computing capacity is needed
•  Inefficiencies exist but the root cause cannot be easily identified
   –  Tools becoming increasingly brittle and difficult to adapt
      •  E.g. porting of tools to IPv6 would need a development project
   –  Some core components cannot be scaled up


Page 7: Puppet Camp CERN Geneva

Second CERN Data Centre

•  Wigner Institute in Budapest, Hungary
•  Hands-off facility, hardware support only
•  Deploying 2012 to 2014


Page 8: Puppet Camp CERN Geneva

Infrastructure Tools Evolution

•  We had to develop our own toolset in 2002
   –  “Extremely Large Fabric Management System” or http://cern.ch/ELFms
   –  Included Quattor for configuration
•  Nowadays,
   –  CERN compute capacity is no longer leading edge
   –  Many options available for open source fabric management
   –  We need to scale to meet the upcoming capacity increase
•  If there is a requirement which is not available through an open source tool, we should question the need
   –  If we are the first to need it, contribute it back to the open source tool


Page 9: Puppet Camp CERN Geneva

Infrastructure as a Service

•  Goals
   –  Improve repair processes with virtualisation
   –  More efficient use of our hardware
   –  Better tracking of usage
   –  Enable remote management for new data centre
   –  Support potential new use cases, e.g. Cloud
   –  Sustainable support model
•  At scale for 2015
   –  15,000 servers
   –  90% of hardware virtualized
   –  300,000 VMs needed
•  Plan = OpenStack Adoption
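
The 2015 targets on this slide imply a rough VM density per hypervisor. The arithmetic below is a back-of-envelope sketch only. It assumes "90% of hardware virtualized" means 90% of the 15,000 servers act as hypervisors, which the slide does not state explicitly.

```python
# Back-of-envelope check of the 2015 targets quoted on the slide.
servers = 15_000            # physical servers (from the slide)
virtualized_share = 0.90    # fraction of hardware virtualized (from the slide)
vms_needed = 300_000        # VMs needed (from the slide)

# Assumption: "90% of hardware" is read as 90% of the servers being hypervisors.
hypervisors = round(servers * virtualized_share)
vms_per_hypervisor = vms_needed / hypervisors

print(f"{hypervisors} hypervisors, about {vms_per_hypervisor:.1f} VMs each")
```

Under that assumption the plan works out to roughly 22 VMs per hypervisor.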


Page 10: Puppet Camp CERN Geneva

Chose Puppet for Configuration

•  The tool space has exploded in the last few years
   –  In configuration management and ops
   –  Large, shared ‘tool forges’, and lots of experience
•  Puppet and Chef are the clear leaders for the ‘core’ tool
•  Many large-scale enterprises use Puppet
   –  Its declarative approach fits better with what we are used to in Quattor.
   –  Large installations: friendly, wide-base community and commercial support and training
   –  You can buy books on it
   –  You can employ people who know puppet better than you do


Page 11: Puppet Camp CERN Geneva

Deployed System

Page 12: Puppet Camp CERN Geneva

Starting with Puppet

•  Puppet was and is trivial to set up:
   –  Anyone can do it in a day
•  Configuring something with puppet is easy
•  What’s hard:
   –  Deciding module scope and interaction with one another.
      •  Three modules editing grub.conf or one
   –  We started early 2012 with very little plan in the area of module organization


Page 13: Puppet Camp CERN Geneva

Downloading Puppet Modules

•  Expectation at start – all done for us:
   –  ssh, iptables, sysctl, apache, mysql all done
   –  example42 or similar can do everything.
•  Reality
   –  Modules often not quite correct.
      •  Too simple,
         –  e.g. I want my sshd_config to be different in two places.
      •  Too much abstraction
         –  I want to use puppet and not some abstraction of 100s of variables covering every possible case
            »  e.g. puppet with(out) passenger. I only want one
   –  Parameterized classes and Foreman don’t really work
      •  Resulting modules are not shareable – ENC globals vs params
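
One reading of "ENC globals vs params": Puppet's external node classifier (ENC) interface lets a classifier return `classes` either as a list of names or as a hash of class names to parameters, plus a `parameters` hash that becomes top-scope variables. The Python sketch below shows the two shapes side by side. The class name `ssh` and the setting `permit_root_login` are hypothetical, and the output is printed as JSON only to stay within the standard library (a real ENC emits YAML). It does not reproduce what Foreman actually sent at the time.

```python
import json


def enc_with_globals(settings):
    """ENC output that passes settings as top-scope variables ("globals")."""
    return {
        "classes": ["ssh"],              # class declared without parameters
        "parameters": dict(settings),    # visible to every class on the node
    }


def enc_with_class_params(settings):
    """ENC output that passes the same settings as class parameters."""
    return {
        "classes": {"ssh": dict(settings)},  # scoped to the one class
    }


settings = {"permit_root_login": "no"}   # hypothetical setting name
print(json.dumps(enc_with_globals(settings), indent=2))
print(json.dumps(enc_with_class_params(settings), indent=2))
```

A module written to read a top-scope variable and one written to take a class parameter expect different inputs, which is one way modules end up tied to a site's classifier.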


Page 14: Puppet Camp CERN Geneva

Sharing and Fixing Modules

•  Not as easy as it should be:
   –  Our modules are littered with CERNisms
      •  ntpservers, subnets, authorization systems, ..
      •  Adaptation to work with foreman
      •  All of us learning puppet and doing things quickly (badly)
•  Hiera is being used now:
   –  Provides the code vs data separation we had with Quattor
   –  Dozens of ways to set up and (ab)use hiera
   –  Little experience with this anywhere yet
   –  Hiera should make modules more sharable across sites
      •  Looking forward to it becoming the normal standard thing that modules use and everyone benefits from
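
The code vs data separation mentioned above can be illustrated with a priority lookup: module code asks for a key, and site-specific data supplies the value from the most specific level that defines it. This is a minimal Python sketch of the idea, not Hiera's implementation or file layout. The hierarchy level names, host names and NTP server names are all hypothetical.

```python
def lookup(key, hierarchy, data, default=None):
    """Return the value for key from the first hierarchy level that defines it."""
    for level in hierarchy:
        values = data.get(level, {})
        if key in values:
            return values[key]
    return default


# Hypothetical site data, keyed by hierarchy level.
data = {
    "common": {"ntp_servers": ["pool.ntp.org"]},
    "site/cern": {"ntp_servers": ["ntp1.example.org", "ntp2.example.org"]},
}

# Most specific level first, shared defaults last.
cern_hierarchy = ["host/node1.example.org", "site/cern", "common"]
other_hierarchy = ["site/elsewhere", "common"]

print(lookup("ntp_servers", cern_hierarchy, data))
print(lookup("ntp_servers", other_hierarchy, data))
```

The module logic is identical for both sites. Only the data differs, which is why keeping "CERNisms" in data rather than in manifests should make modules easier to share.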


Page 15: Puppet Camp CERN Geneva

Sharing Modules With All

•  A big aim is to share our modules as much as possible with everyone but in particular:
   –  CERN IT is not the only puppet deployment at CERN
      •  ATLAS Point 1 farm at CERN runs puppet
   –  ATLAS analysis in the cloud has used puppet
   –  International HEP Labs use or are switching to puppet
   –  Puppet was the “winner” at recent CHEP fabric session
      •  Presentations from CERN, BNL, PIC, ATLAS
•  We will share here but it’s early days:
   –  http://github.com/cernops


Page 16: Puppet Camp CERN Geneva

Organizing Modules On Disk

•  Started with all modules in one directory in git:
   –  Obviously wrong, great confusion for newcomers
•  Current situation, two directories in git:
   –  Modules – reusable items – e.g. firewall, apache, sysctl, ..
   –  Manifests – top level service, e.g. batch machine, public login machine
•  Future plans:
   –  Split up modules into local and downloaded
      •  Modules like puppetlabs-firewall mixed with our own junk
      •  Will allow us to track/contribute to upstream better
   –  In line with puppet’s upcoming vendor path


Page 17: Puppet Camp CERN Geneva

Configuration Complexity

•  We have many configurations of service.
   –  Puppet handles this diversity well
•  We have many administrators >= 300
   –  These admins change, are on different continents
   –  Less obvious what to do with Puppet

150 clusters ranging from 1 to 3000 hosts.


Page 18: Puppet Camp CERN Geneva

Trust Amongst SysAdmins

[Diagram: one Git repository feeds the Puppet Master(s) for SysAdmin Team A and the Puppet Master(s) for SysAdmin Team B; each set of masters serves its own team's nodes.]

All share one git repository. Rely on code review, git branches and environments.

Teams use their own puppet masters. hiera-gpg key for each team. Host ACL on puppet masters.

•  The full implications of this lack of trust between admins are unclear
   –  Interested to hear what others have done.

Page 19: Puppet Camp CERN Geneva

Change Control, Dev Cycle

•  Core team maintaining OS and basics:
   –  Hardware monitoring, ntp configuration, accounts, ..
•  Specialized teams maintaining services on top:
   –  They are ultimately responsible for service stability
   –  We don’t want NTP configured 150 different ways
•  Requirements:
   –  Some services will follow core updates
   –  Some services will choose when to take core updates
   –  Parts of services may follow latest updates
   –  LHC has physical shutdowns for doing timely updates


Page 20: Puppet Camp CERN Geneva

Change Control, Dev Cycle

•  Puppet Environments map to Git Branches:
   –  Nodes in Production, Testing and Devel branches
   –  Big new configurations being tested in feature branches
      •  A few nodes in these feature branches
   –  Some services live isolated in their own branch
      •  Risk of divergence
•  Current process:
   –  A blind weekly devel -> production merge
•  Next process:
   –  Use Atlassian’s Crucible and Fisheye products to code review puppet configuration
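
Mapping environments to branches needs a naming rule, because Puppet environment names may contain only letters, digits and underscores while git branch names allow characters such as `/` and `-`. The slides do not say how the mapping was implemented at CERN. The Python sketch below shows one possible rule (replace every disallowed character with an underscore), and the feature branch name is hypothetical.

```python
import re


def branch_to_environment(branch):
    """Map a git branch name to a name usable as a Puppet environment.

    Assumed rule: every character other than a letter, digit or
    underscore is replaced with an underscore.
    """
    if not branch:
        raise ValueError("empty branch name")
    return re.sub(r"[^A-Za-z0-9_]", "_", branch)


for branch in ["production", "testing", "devel", "feature/new-batch-config"]:
    print(branch, "->", branch_to_environment(branch))
```

A rule like this can map two different branches to the same environment name (for example `feature/a-b` and `feature/a_b`), so a real deployment would need to detect or forbid such collisions.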


Page 21: Puppet Camp CERN Geneva

Crucible Reviewing Manifest

•  Atlassian themselves use puppet and do this
   –  http://blogs.atlassian.com/2011/09/puppet_change_management_for_devops/


Page 22: Puppet Camp CERN Geneva

Hardware Provisioning

•  Up to now a homegrown tool in use:
   –  Has strong similarities to Puppet Labs’ new Razor
      •  Razor is being followed, tracked for the moment
   –  Final step of tool adds host to foreman
•  We are using foreman – happy with it:
   –  Kickstart templating is great
   –  Organising hosts into hostgroups is great
   –  We will now invest time to integrate foreman with CERN services:
      •  CERN network database, our master for switches, DNS, …
      •  AIMS kerberos managed tftp server
      •  CERN CA – We have our own CA used by other services also
         –  We will use this for puppet also


Page 23: Puppet Camp CERN Geneva

Virtual Machine Provisioning

•  Existing Microsoft Hyper-V infrastructure:
   –  3000 Virtual Machines of which 70 puppet managed
   –  VMs pre-seeded into a foreman hostgroup
   –  VMs being kickstarted onto puppet and foreman
•  Puppet managed OpenStack Nova
   –  Today aiming at 200 hypervisors with up to 4000 puppet managed VMs.
   –  Machine Images created with Oz
   –  Machines NOT pre-seeded in foreman or puppet
      •  Register at boot time
   –  amiconfig and cloud-init for contextualizing
      •  Pass puppet server and foreman hostgroup to image


Page 24: Puppet Camp CERN Geneva

Next Steps till End of Year

•  Migrate to PuppetDB
   –  (300,000 nodes => 300 GB RAM)
•  Look at puppet dashboard
•  Use mcollective for something:
   –  Necessary as node number increases
   –  Currently set up but not being used particularly
•  Check Foreman’s integration with OpenStack
•  Migrate more services from Quattor to Puppet
•  Decide a scheme for secure blob delivery:
   –  hiera-gpg or ACL’ed puppet fileserver
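
The PuppetDB figure on this slide (300,000 nodes => 300 GB RAM) amounts to a per-node memory budget. The Python sketch below makes that explicit and extrapolates it linearly. It assumes decimal units (1 GB = 1000 MB), which the slide does not specify, and real memory use need not scale linearly with node count.

```python
nodes = 300_000   # from the slide
ram_gb = 300      # from the slide

# Assumption: decimal units (1 GB = 1000 MB); the slide does not say which.
ram_per_node_mb = ram_gb * 1000 / nodes
print(f"about {ram_per_node_mb:.0f} MB of RAM per node")


def estimate_ram_gb(node_count, mb_per_node=ram_per_node_mb):
    """Linear extrapolation of the slide's ratio to another node count."""
    return node_count * mb_per_node / 1000


# Example: the 4,000 puppet managed VMs targeted for the OpenStack deployment.
print(estimate_ram_gb(4000))
```

At this ratio the budget is about 1 MB of RAM per node.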


Page 25: Puppet Camp CERN Geneva

Conclusions

•  Migrating to Puppet
   –  Largest change in our deployment for 5 years
•  Has all been fairly painless. Difficulties:
   –  Forced to integrate with existing stuff sometimes
   –  Doing things wrong first time
      •  Lack of in-house experience
•  300,000 VMs in 2015?
   –  puppet easy to scale, more hardware can be added
   –  We expect to dedicate up to 100 of cores to puppet
•  It’s a joy to work with an active community
