
Post on 26-Jul-2018


  • Ansible at Scale

    Ansible Israel, May 9, 2016

    David Melamed

    Senior Research Engineer, CTO Office, CloudLock

    dmelamed@cloudlock.com @dvdmelamed

  • Who is this guy?

  • Where is he working?

    Founded: 2011

    Corporate Headquarters: Waltham, Mass. (U.S.A.)

    R&D Headquarters: Tel Aviv

    Employees: 140 (30 in TLV)

    Trusted by major brands:

    10 M USERS

    157K APPS

    4 B ACTIVITIES

  • 01 Ansible main notions

  • What is Ansible?

    Open-source configuration automation tool
    Written in Python and easily extensible
    Agentless (only requires SSH / WinRM)
    Idempotent modules
    Ad hoc task execution
    Reusable list of tasks
    Code deployment

  • Inventory

    (Diagram labels: WEB SERVERS, DAEMON SERVERS, FILE SERVERS, COMPUTING CLUSTER, VPC)

    Static inventory

    [webservers]
    192.168.1.12
    192.168.1.13
    192.168.1.19

    [daemonservers]
    192.168.1.34
    192.168.4.24

    [vpc:children]
    webservers
    daemonservers

  • Task, play & playbook

    - name: check server is alive
      action: ping

    - name: update app configuration
      action: copy src=myapp.conf dest=/etc/myapp/prod.conf

    ...

    task

    play

    playbook
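    To make the three terms concrete, here is a minimal sketch of how the two tasks from this slide could sit inside a play, and the play inside a playbook. The file name and the play name are illustrative assumptions; `webservers` is the group from the static inventory slide.

```yaml
# playbook.yml (illustrative file name): a playbook is a list of plays
- name: Update the application configuration   # one play (name is an assumption)
  hosts: webservers                            # group from the static inventory
  tasks:
    - name: check server is alive              # one task
      action: ping

    - name: update app configuration           # another task
      action: copy src=myapp.conf dest=/etc/myapp/prod.conf
```

    Such a file would typically be run with `ansible-playbook -i <inventory file> playbook.yml`.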

  • Role

    tasks/main.yml
    handlers/main.yml
    templates/template.conf.j2
    files/file1.txt
    vars/main.yml
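    As a rough sketch of how these directories relate, the two YAML documents below show a task file that uses the role's template and static file, and the handler it notifies. The role name `myapp`, the service name and the destination paths are hypothetical and not taken from the talk.

```yaml
# roles/myapp/tasks/main.yml (hypothetical role name)
- name: Render the configuration from templates/template.conf.j2
  template:
    src: template.conf.j2           # looked up in the role's templates/ directory
    dest: /etc/myapp/prod.conf      # hypothetical destination
  notify: restart myapp             # triggers the handler only when the file changes

- name: Copy a static file from files/file1.txt
  copy:
    src: file1.txt                  # looked up in the role's files/ directory
    dest: /etc/myapp/file1.txt      # hypothetical destination

---
# roles/myapp/handlers/main.yml
- name: restart myapp
  service:
    name: myapp                     # hypothetical service name
    state: restarted
```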

  • Vault

    Put all secrets in one place
    Store secrets in git

  • 02 Our requirements

  • CloudLock requirements

    Multiple environments (AIO vs. VPC, AWS vs. AppEngine)
    Multiple environment types (local / stage / prod)
    10 different VPCs with different access levels
    VPCs with ~ 100 machines of several types
    Multiple small repos (Python packages) with dependencies
    Zero-downtime deployment as much as possible

  • Multiple stacks & environments

    Stack components:
    Web server (Angular app)
    API server (Flask app)
    Database (PostgreSQL or RDS)
    Cache server (Redis or ElastiCache)
    Message Queue (RabbitMQ)

    Environment types: LOCAL, STAGE, PRE-PROD, PROD

    Deployment targets shown in the diagram:
    My laptop (OSX)
    Your laptop (Ubuntu)
    AIO in AWS
    Multi-tier env. in AWS (shown three times)

  • 03 Ansible profiling

  • Profiling Ansible (1)

    Install the profiling callback plugin: https://github.com/jlafon/ansible-profile

    Other interesting plugins:
    Human-readable plugin: http://blog.cliffano.com/2014/04/06/human-readable-ansible-playbook-log-output-using-callback-plugin/
    Ansible-report: http://pythonhackers.com/p/sfromm/ansible-report

  • Profiling Ansible (2)

    PLAY [Deploy | Ensure database and user] ***************************
    Thursday 15 October 2015 09:51:01 +0000 (0:00:01.786) 0:00:12.318 ******
    ===============================================================================

    TASK: [storage/postgresql-database | Create | Ensure database from database variable] ***
    Thursday 15 October 2015 09:51:01 +0000 (0:00:00.011) 0:00:12.329 ******
    ok: [sandbox]

    TASK: [storage/postgresql-database | Create | Ensure database user from database.user variable] ***
    Thursday 15 October 2015 09:51:01 +0000 (0:00:00.163) 0:00:12.493 ******
    ok: [sandbox]

    TASK: [storage/pgbouncer | Start pgBouncer] ***********************************
    Thursday 15 October 2015 09:51:09 +0000 (0:00:00.242) 0:00:20.782 ******
    ok: [sandbox]

    TASK: [storage/pgbouncer | Bump file descriptor limits] ***********************
    Thursday 15 October 2015 09:51:09 +0000 (0:00:00.177) 0:00:20.960 ******
    changed: [sandbox] => (item=hard)
    changed: [sandbox] => (item=soft)

    ...

    PLAY RECAP ********************************************************************
    module1 | Install | Ensure modules ------------------------------------- 13.14s
    module2 | Install pgBouncer --------------------------------------------- 7.51s
    module3 | Install | Clean/uninstall modules ----------------------------- 6.85s
    module4 | Install | Ensure core installed ------------------------------ 4.66s
    ...
    Thursday 15 October 2015 09:52:49 +0000 (0:00:00.023) 0:02:00.236 ******
    ===============================================================================
    sandbox : ok=142 changed=82 unreachable=0 failed=0

  • 04 Tips for scale support (faster & easier to maintain)

  • Factors impacting ansible speed

    SSH connection
    Facts gathering
    Tasks performed serially
    Redundant tasks

  • Improving SSH speed

    Persistent connection (on by default for SSH)
    ControlMaster=auto
    ControlPersist=60s

    SSH pipelining (1 connection per task)
    Requires disabling requiretty in /etc/sudoers on the managed hosts
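    These settings usually live in ansible.cfg (`ssh_args` and `pipelining` under `[ssh_connection]`). As an alternative sketch, the same options can be expressed as inventory variables in YAML, assuming Ansible 2.0 or later; the file location is a hypothetical choice and this is not necessarily how CloudLock configured it.

```yaml
# group_vars/all.yml (hypothetical location)

# Extra options appended to the ssh/scp/sftp command line.
# Reuses one SSH connection per host and keeps it open for 60 seconds.
ansible_ssh_common_args: "-o ControlMaster=auto -o ControlPersist=60s"

# Send modules over the existing SSH session instead of copying files first.
# With sudo this only works if "requiretty" is disabled in /etc/sudoers.
ansible_ssh_pipelining: true
```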

  • Ansible configuration

    Commit your ansible.cfg
    Control facts gathering (gathering):
      implicit (default): always gather facts
      explicit: do not gather facts unless the play requests it
      smart: use the facts cache; gather facts only for new hosts
    Control the number of parallel processes (forks): the default is 5, we use 25
    SSH args / SSH pipelining

  • Inventory

    Make your Ansible code environment agnostic
    Machine grouping by environment or by role type
    Hierarchical inventory
    Vault per environment
    Dynamic inventory for better cloud support
    Use a dedicated machine to deploy (ansible-workstation)

  • CloudLock static inventory overview

    inventory/
    |
    |---- environments
    |     |----- allinone
    |     |----- beta
    |     |----- demo
    |     |----- dev1
    |     |----- dev2
    |     |----- qa1
    |     |----- qa2
    |
    |---- group_vars
          |----- allinone/
          |----- beta/
          |----- demo/
          |----- dev1/
          |----- dev2/
          |----- qa1/
          |----- qa2/

    + use of route53 for internal DNS
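    One way to read this layout: roles stay environment agnostic and only the files under group_vars differ per environment. The sketch below assumes hypothetical file names, variable names and host names; none of them come from the talk.

```yaml
# inventory/group_vars/allinone/vars.yml (hypothetical file and variable names)
database_host: localhost                      # PostgreSQL on the same machine
cache_host: localhost                         # local Redis

---
# inventory/group_vars/qa1/vars.yml
database_host: db.qa1.internal.example.com    # e.g. an RDS endpoint behind an internal DNS name
cache_host: cache.qa1.internal.example.com    # e.g. an ElastiCache endpoint
```

    A role would then refer only to `{{ database_host }}` and `{{ cache_host }}`, without knowing which environment it is deployed to.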

  • EC2 dynamic inventory

    Python script using boto
    List of instances + hostvars
    Use instance names or IPs
    Groups by instance tags, VPC, etc.
    List cached

    "ec2": [
      "52.",
      "52.",
      "52."
    ],
    "tag_Environment_prod": [
      "52.",
      "52..",
      "54.."
    ],
    "tag_Name_prod_bastion": [
      "54."
    ],
    "tag_Name_Report_Decryptor": [
      "52.."
    ],
    "tag_Name_devpi": [
      "52.."
    ]
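    The groups produced by the dynamic inventory can be targeted like any static group. A minimal sketch, using the `tag_Environment_prod` group from the output above; the play itself is illustrative and not from the talk.

```yaml
# check_prod.yml (illustrative file name)
- name: Check that production hosts respond
  hosts: tag_Environment_prod     # group generated from the Environment=prod instance tag
  gather_facts: false             # skip fact gathering for a quick connectivity check
  tasks:
    - name: check server is alive
      ping:
```

    Assuming the inventory script is the `ec2.py` shipped with Ansible (the slide does not name it), this would be run as `ansible-playbook -i ec2.py check_prod.yml`.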

  • Playbooks

    Tasks executed synchronously
    Segment roles/groups to leverage parallel forks
    Use tags to add modularity (e.g. config, deploy)
    Name each task
    Limit conditional execution in roles; put the conditions in the playbooks instead
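    A sketch of what these tips might look like together: one play per host group, tags on roles, and the condition placed in the playbook rather than inside the role. The `daemon` role and the `enable_nagios` variable are hypothetical; `web` and `monitor/nagios` are taken from the role tree shown later in the talk.

```yaml
# site.yml (illustrative file name)
- name: Configure and deploy web servers
  hosts: webservers               # each task runs on up to `forks` hosts at once
  roles:
    - role: web
      tags: [config, deploy]

- name: Configure daemon servers
  hosts: daemonservers
  roles:
    - role: daemon                # hypothetical role name
      tags: [config]
    - role: monitor/nagios
      when: enable_nagios | default(false)   # condition lives in the playbook, not the role
      tags: [config]
```

    With tags in place, `ansible-playbook site.yml --tags config` would run only the configuration parts.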

  • Tasks & Roles

    Make your role generic and simple
    Role should be decoupled from inventory
    Keep your configuration separate
    Tasks should be idempotent
    Use include for sub-roles
    Try to avoid redundant tasks (use an AMI)
    Share handlers with a global role
    Avoid using command and shell; use the appropriate modules instead

    - roles/
        ci/
        jenkins/
        jobs/
        monitor/
        cloudwatch/
        nagios/
        platform/
        base/
        component-a/
        component-b/
        events/
        setup/
        teardown/
        system/
        web/
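    To illustrate the advice about modules, idempotency, shared handlers and include, here is a hedged sketch of a role task file. The user name, the package, the included file and the `restart nginx` handler are all hypothetical; `include` is the directive available in Ansible 1.9/2.0 (later versions split it into `include_tasks` and `import_tasks`).

```yaml
# roles/web/tasks/main.yml (task content is hypothetical)

# Avoid: Ansible cannot tell whether anything changed, and a second run fails
# because the user already exists.
# - name: Create application user
#   shell: useradd myapp

# Prefer: the user module only acts when the user is missing.
- name: Ensure application user exists
  user:
    name: myapp
    state: present

- name: Ensure nginx is installed
  apt:
    name: nginx
    state: present
  notify: restart nginx           # assumes a handler defined once in a shared role

# Split longer roles into included task files.
- include: configure.yml          # hypothetical file next to main.yml
```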

  • Vault

    Encrypt only what is necessary
    No way to merge 2 encrypted files
    Several tools to improve vault management:
    https://github.com/building5/ansible-vault-tools
    https://gist.github.com/benzado/7bf5aa15e15d2d0d0380

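    One common way to "encrypt only what is necessary" is to keep readable variables in a plain file and have them point at values stored in a small encrypted file. The sketch below assumes hypothetical file and variable names and is not necessarily the layout CloudLock used.

```yaml
# inventory/group_vars/qa1/vars.yml: plain text, so diffs stay reviewable
database_user: myapp
database_password: "{{ vault_database_password }}"

---
# inventory/group_vars/qa1/vault.yml: the only file encrypted with ansible-vault
vault_database_password: CHANGE_ME   # placeholder value
```

    The second file would be encrypted with `ansible-vault encrypt` and unlocked at run time with `ansible-playbook --ask-vault-pass`. Keeping it small also limits the pain of the merge problem mentioned above.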

  • ansible-playbook vs ansible-pull

    Regular mode: connect to server and deploy
    Pull mode: pull from repo on remote and execute
    Syntax: ansible-pull -U git://github.com/REPO.git -d DEST_DIR
    Example of cron install using Ansible:
    https://github.com/ansible/ansible-examples/blob/master/language_features/ansible_pull.yml

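    As a rough sketch of the cron-based setup (not a copy of the linked example), a play could install a cron entry that runs ansible-pull periodically. The schedule and log path are assumptions; `REPO` and `DEST_DIR` are the placeholders from the slide.

```yaml
# install_pull_cron.yml (illustrative file name)
- name: Install a cron job that runs ansible-pull
  hosts: all
  become: true
  tasks:
    - name: Ensure ansible-pull runs every 15 minutes
      cron:
        name: ansible-pull
        minute: "*/15"                        # assumed schedule
        user: root
        # -o runs the playbook only if the repository has changed
        job: "ansible-pull -U git://github.com/REPO.git -d DEST_DIR -o >> /var/log/ansible-pull.log 2>&1"
```

    By default ansible-pull looks for a playbook named `local.yml` in the checked-out repository.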

  • CI for Ansible

    Test locally with Vagrant / Docker
    PR reviews (issue with vault changes)
    Jenkins job deploying to AIO + GitHub hook

    Coming soon: unit tests (kitchen-ansible)

    https://github.com/neillturner/kitchen-ansible

  • Ansible 1.9 vs. Ansible 2.0

    Some breaking changes
    A lot of new cloud modules (e.g. ECS, VPC)

  • Results

    Before: deployment to VPC took several hours
    After: ~ 20 min for a full deployment

  • More about Ansible

    Awesome Ansible: https://github.com/jdauphant/awesome-ansible

    Ansible for DevOps: https://leanpub.com/ansible-for-devops


  • CloudLock is looking for talent

  • Questions/feedback
