there and back again: how we drank the chef kool-aid, sobered up, and learned to cook responsibly
Charity Majors @mipsytipsy
There and back again: a Chef tale
How we drank the Kool-Aid, sobered up, and learned to cook responsibly.
Mobile apps platform
500k+ apps
AWS
MongoDB, Cassandra, MySQL, Redis
ruby & rails => golang
Our mission:
• Support relentless growth
• Ship products fast
• Solve mobile apps naively at scale
Active monthly Parse installations
API requests per second
our mission
your mission
Chef the Base System!!
• bootstrapping nodes with knife-ec2
• configuring system packages
• managing deb versions
• ec2 hostname tags from chef node names
• route53 DNS records from hostname tags
• cron jobs, batch jobs
Chef the Services!!
• haproxy configs
• generate yaml files
• generate host lists
• manage config files for Parse services
• monitoring and graphing based off roles
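Generating haproxy configs from host lists can be sketched roughly like this. This is a minimal illustration, not the Parse cookbook: the function name, backend name, hostnames, and port are all hypothetical, and in Chef the host list would come from a search or a template variable.

```python
def render_backend(name, hosts, port):
    """Return an haproxy backend stanza with one server line per host."""
    lines = [f"backend {name}", "    balance roundrobin"]
    # Sort so the rendered file is stable from one run to the next
    # (an unstable order would trigger needless haproxy reloads).
    for host in sorted(hosts):
        lines.append(f"    server {host} {host}:{port} check")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical hosts, for illustration only.
    print(render_backend("api", ["api2.example.com", "api1.example.com"], 8080))
```

Sorting the hosts keeps the generated file identical when nothing has changed, so the service is only reloaded on real membership changes.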
Chef the Databases!!
• creating/managing mongo replica sets
• provisioning & assembling RAID devices
• assigning cassandra initial tokens
• backups, snapshotting & restores
• community cookbooks for mysql, redis
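Assigning cassandra initial tokens is mostly arithmetic. A sketch, assuming the RandomPartitioner token range of 0 to 2**127 (the slides do not say which partitioner was in use) and evenly spaced nodes; `initial_tokens` is a hypothetical helper name.

```python
def initial_tokens(node_count, ring_size=2**127):
    """Return evenly spaced initial tokens for a ring of node_count nodes.

    ring_size defaults to the RandomPartitioner range (an assumption;
    other partitioners use a different token range).
    """
    if node_count < 1:
        raise ValueError("node_count must be at least 1")
    return [i * ring_size // node_count for i in range(node_count)]


if __name__ == "__main__":
    for index, token in enumerate(initial_tokens(4)):
        print(index, token)
```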
Chef the Deploys!!
• deploy Parse services?
….??????
wait …
1) Things we did with chef badly
2) Things that chef was not the right tool for
mistakes were made …
• Overloading roles with too much work
• Confusion between role vs instantiation of service
• Using definitions instead of providers
• Using lots of data bags
• One attribute per config entry instead of a hash of all entries
• Using knife search extensively
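The "hash of all entries" point, sketched: when every config entry lives in one hash, the template just iterates, and adding an entry needs no template change. The setting names below are hypothetical, and Python stands in for the Chef attribute and template mechanics.

```python
def render_config(settings):
    """Render every entry of one settings hash as 'key = value' lines."""
    return "".join(f"{key} = {value}\n" for key, value in sorted(settings.items()))


if __name__ == "__main__":
    # Hypothetical entries; with one attribute per entry, each of these
    # would need its own attribute and its own line in the template.
    settings = {"max_connections": 512, "log_level": "info"}
    print(render_config(settings), end="")
```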
mistakes were made …
• Forking + modifying community cookbooks
• Importing community cookbooks with too many custom dependencies
• Not using repo-per-cookbook / Berkshelf
• Not investing the time into vagrant, unit tests, staging environment, versioning
• Where is my source of truth?!
but these are all solvable problems.
what isn’t?
sometimes, chef just ain’t enough.
Problem areas
• Provisioning from scratch
• Service registration & discovery
• Managing software & configs
• Databases
Provisioning
bootstrapping from vanilla AMIs
launching instances with knife-ec2
Solution: bake AMI with chef, use ASGs
Service discovery
realtime search needs realtime data
Solution: zookeeper, consul, etcd, etc
avoid snowflake hosts
use distributed locking for cron jobs
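Distributed locking for cron jobs, sketched: every host carries the same cron entry, and only the host that wins the lock runs the job, so no host is a snowflake. `InMemoryLockService` is a hypothetical stand-in for zookeeper, consul, or etcd so that the example runs on its own; a real lock would also need to expire if its holder dies.

```python
import threading


class InMemoryLockService:
    """Hypothetical stand-in for a lock service: one holder per lock name."""

    def __init__(self):
        self._holders = {}
        self._mutex = threading.Lock()

    def try_acquire(self, name, holder):
        """Grant the lock if it is free; never block."""
        with self._mutex:
            if name in self._holders:
                return False
            self._holders[name] = holder
            return True

    def release(self, name, holder):
        """Release the lock, but only if this holder owns it."""
        with self._mutex:
            if self._holders.get(name) == holder:
                del self._holders[name]


def run_cron_job(locks, job_name, host, job):
    """Run job only if this host wins the lock; return True if it ran."""
    if not locks.try_acquire(job_name, host):
        return False
    try:
        job()
        return True
    finally:
        locks.release(job_name, host)


if __name__ == "__main__":
    locks = InMemoryLockService()
    print(run_cron_job(locks, "nightly-report", "host-a", lambda: None))
```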
Managing software & configs
• System software (debs, rpms)
• Developer-owned services
• Internal operations software
Managing software & configs: System software
Managing software & configs: Developer-owned services
• Do not tie code deploys to system changes
• Perform the minimal set of changes
• Configs *are* software. Version together.
Managing software & configs: Internal operations software
• Treat software engineering like software engineering
• Treat systems-y packages like systems packages
• Package and version “util” scripts
• Manage package versions with Chef
Databases at scale
Databases: DBA operations
Not really what chef is best at.
Imperative commands
Automatic remediation
Coordinating actions across nodes
Databases: DBA operations
• Create, tear down replica sets or nodes
• Verify backups
• Rolling version upgrade
• Elect new primary / switch masters
• Enable/disable query killer
• Change schemas or indexes
• Compaction, rotation
• Version replica set state
• Etc
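Operations like a rolling version upgrade are ordered, imperative steps across nodes, which is why they fit db tooling better than a convergent chef run. A sketch of only the ordering decision, assuming the usual replica set practice of upgrading secondaries first and the primary last (after it steps down); the member names and role strings are hypothetical.

```python
def rolling_upgrade_order(members):
    """Return member names in upgrade order: non-primaries first, primary last.

    members maps node name -> role string, e.g. "PRIMARY" or "SECONDARY".
    """
    secondaries = sorted(name for name, role in members.items() if role != "PRIMARY")
    primaries = sorted(name for name, role in members.items() if role == "PRIMARY")
    return secondaries + primaries


if __name__ == "__main__":
    members = {"db1": "PRIMARY", "db2": "SECONDARY", "db3": "SECONDARY"}
    print(rolling_upgrade_order(members))
```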
Databases: DBA operations
If you don’t have to do a ton of DBA ops, Chef can manage databases.
Don’t over-engineer in advance of your actual needs.
Databases: Separation of configuration and state
Base system => chef
Detect and publish state changes => chef, zk
Generate monitoring configs => chef
Imperative commands => db tooling
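"Detect and publish state changes" can be sketched as a diff between the last published state and the state just observed, so that only real changes get published. The node names and role strings are hypothetical, and the publish step (to zk, per the slide) is left out.

```python
def state_changes(previous, current):
    """Return {node: (old_role, new_role)} for every node whose role changed.

    A node that appeared or disappeared shows up with None on one side.
    """
    changes = {}
    for node in previous.keys() | current.keys():
        old, new = previous.get(node), current.get(node)
        if old != new:
            changes[node] = (old, new)
    return changes


if __name__ == "__main__":
    previous = {"db1": "PRIMARY", "db2": "SECONDARY"}
    current = {"db1": "SECONDARY", "db2": "PRIMARY", "db3": "SECONDARY"}
    for node, (old, new) in sorted(state_changes(previous, current).items()):
        print(node, old, "->", new)
```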
Databases at scale
We chef for:
• Building base AMIs
• Generating monitoring configs
• Storing encrypted secrets
• Cron jobs (with zk lock)
• Inferring and publishing db state changes
Things we still suck at
• Single source of truth (git / chef-server)
• Isolated staging environment
• Full continuous testing for cookbooks
Things we don’t chef
• Realtime data
• Internal software packaging & management
• Database administration at scale
Charity Majors
@mipsytipsy