devops down-under

Upload: robert-postill

Post on 25-Jun-2015

Category:

Technology


TRANSCRIPT

1. DevOps Down Under 2011: Sprinkling DevOps Magic in Other People's Environments

  • Robert Postill

2. How's This Gonna Go Down?

  • Everybody's got a story and this is ours
  • Our first architecture
  • Learning from failure
  • A brief aside
  • Getting better
  • Tough messages
  • Where to from here

10. C3's Story

  • There was a dream
  • It makes the Excel go into the data warehouse
  • And it's done badly
  • So we built a prototype
  • Then we made a sale

15. A little bit of how it works

16. Priorities of our first architecture

  • Works!
  • Restarts when the machine restarts
  • Remotely deploy updates
  • Not a lot of state on the VM

20. Our first architecture

22. Lesson: Most customers will accept a small selection of services if you give them a report from that service

23. create_deployment.sh

  • Poor man's Capistrano
  • A shell script that:
    • Fetched the latest from GitHub
    • Exported it to a datestamped directory
    • Made a set of symlinks point to the right places
    • Restarted the app
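The four steps listed for create_deployment.sh can be sketched roughly as follows. This is a Ruby illustration under stated assumptions, not the original shell script: the method name `create_deployment`, the `releases` and `current` directory names, and the restart block are all hypothetical, and a local directory copy stands in for the GitHub fetch so that the sketch runs without network access.

```ruby
require "fileutils"
require "tmpdir"

# Hypothetical sketch of the create_deployment.sh steps, not the original script.
# source_dir stands in for a fresh checkout; the real script fetched from GitHub.
def create_deployment(source_dir, deploy_root, now: Time.now)
  # Export the code into a datestamped release directory.
  release = File.join(deploy_root, "releases", now.strftime("%Y%m%d%H%M%S"))
  FileUtils.mkdir_p(File.dirname(release))
  FileUtils.cp_r(source_dir, release)

  # Re-point the "current" symlink: create a new link next to it, then
  # rename it over the old one so the switch is a single rename call.
  current = File.join(deploy_root, "current")
  staging_link = "#{current}.new"
  File.delete(staging_link) if File.symlink?(staging_link)
  File.symlink(release, staging_link)
  File.rename(staging_link, current)

  # Restart hook: the caller decides how the app is actually restarted.
  yield release if block_given?
  release
end

Dir.mktmpdir do |root|
  source = File.join(root, "checkout")
  FileUtils.mkdir_p(source)
  File.write(File.join(source, "app.rb"), "puts 'hello'\n")

  deployed = create_deployment(source, File.join(root, "deploy")) do |release|
    puts "restart app from #{release}"
  end
  puts File.readlink(File.join(root, "deploy", "current")) == deployed
end
```

Keeping each release in its own datestamped directory means a rollback is only a matter of pointing the symlink back at an earlier directory, which is presumably part of why the "poor man's Capistrano" description fits.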

28. Flaws

  • We knew practically nothing about what was happening on the box
  • The logs... THE LOGS FIX THOSE FREAKING LOGS!!!

30. And the worst flaw of all...

  • We started to get calls that started with:
    • "Integrity's down, what's the score?"
  • Then we'd have a look...
  • And it would be the database

32. Lesson: Things you don't own go badly wrong and the first people to know are the end users

33. A lot of sad face

34. So we revved the architecture

35. Then more stuff happened...

  • We continued to get calls that started with:
    • "Integrity's down, what's the score?"
  • Then we'd have a look...
  • And it would be the VM: mounted disks read-only

37. Lesson: Virtual machines are prone to at least a couple of novel modes of failure

38. Which started to lead to the inevitable

39. So the next problem... Us

  • New Relic gives you slow transaction reports
  • In Ruby, select, collect and friends are ways of making in-memory decisions over collections of things
  • Which works on test set sizes of ten or so
  • But doesn't on large volumes of things, like say a couple of million objects
  • We'd created a technical debt mountain
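To make the select/collect point concrete, here is a small Ruby sketch. The talk does not say how the team fixed the problem, so the lazy-enumerator approach below is only one common option, swapped in for illustration. `RowSource` is a hypothetical stand-in for a large table, and its `rows_read` counter exists purely to show how much data each style touches.

```ruby
# Hypothetical row source standing in for a large table.
# It counts how many rows have been handed out.
class RowSource
  include Enumerable
  attr_reader :rows_read

  def initialize(size)
    @size = size
    @rows_read = 0
  end

  def each
    return enum_for(:each) unless block_given?
    @size.times do |i|
      @rows_read += 1
      yield({ id: i, amount: i % 100 })
    end
  end
end

# Eager style: select builds a full intermediate array before collect runs.
eager = RowSource.new(200_000)
eager_ids = eager.select { |row| row[:amount] > 90 }
                 .collect { |row| row[:id] }
                 .first(5)

# Lazy style: rows flow through one at a time and reading stops
# as soon as five matches have been found.
lazy = RowSource.new(200_000)
lazy_ids = lazy.lazy
               .select { |row| row[:amount] > 90 }
               .collect { |row| row[:id] }
               .first(5)

puts "eager read #{eager.rows_read} rows, lazy read #{lazy.rows_read} rows"
puts eager_ids == lazy_ids
```

Both chains return the same ids, but the eager one walks every row and allocates an array of all matches first. With ten test rows nobody notices; with a couple of million objects that intermediate work dominates. Pushing the filter down into the database query is the other usual remedy, and it is not shown here.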

44. Hiring someone new

45. A brief trip to the metaworld

  • We're devops by necessity
  • There is no ops department
  • Our devs cover a lot of ground
    • Architecture
    • Operations
    • Database Administration
    • Networking
    • Support
    • Business Analysis

53. Behold the AnDevOpSuptecht

  • It used to be that a lot of places had Systems Programmers
  • Now it feels like architects are going the same way
  • Where's the limit going to be drawn on the responsibility of an individual...
  • Are we thinking about the roles we play in the wrong way?

57. Crap Maths Applied To Recruitment

  • Australian population: 21,874,900
  • Melbourne population: 3,478,138
  • 22.6% 'professionals' in 2006 census: 786,059
  • Professionals in 'information, media and telecoms': 14,246
  • Spolsky says 1 in 200 dev applicants can dev, leaving: 712
  • TIOBE Index says Ruby is used by 1.484% of devs: 10
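The funnel above is plain arithmetic, so it can be replayed in a few lines of Ruby. All inputs are the figures quoted on the slide and the variable names are illustrative. One thing the replay suggests: 14,246 divided by 200 comes to about 71, while the slide's 712 is what 1 in 20 would give. The sketch prints both rather than guessing which was intended.

```ruby
# All inputs are the figures quoted on the slide; integer arithmetic throughout.
melbourne_population = 3_478_138
professionals = melbourne_population * 226 / 1000    # 22.6% of Melbourne
ict_professionals = 14_246                            # census figure from the slide

one_in_200 = ict_professionals / 200                  # "1 in 200 can dev"
one_in_20  = ict_professionals / 20                   # what would yield the slide's 712

ruby_devs_from_slide = 712 * 1_484 / 100_000          # 1.484% of the slide's 712
ruby_devs_from_200   = one_in_200 * 1_484 / 100_000   # 1.484% of the 1-in-200 figure

puts "professionals:          #{professionals}"
puts "1 in 200 of 14,246:     #{one_in_200}"
puts "1 in 20 of 14,246:      #{one_in_20}"
puts "Ruby devs (from 712):   #{ruby_devs_from_slide}"
puts "Ruby devs (from 1/200): #{ruby_devs_from_200}"
```

Rounded down, this gives 786,059 professionals, 71 or 712 capable devs depending on the ratio, and 10 or 1 Ruby devs respectively. Either way the slide's conclusion stands: the candidate pool is tiny.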

69. So...

  • Before we look into
    • Team fit
    • Seniority
    • Skills (Ubuntu, Databases, Business intelligence...)
  • I need a lie down :(
  • Congratulations to you in Melbourne who do hire devops!
  • Do we need to think about apprenticeships?

74. Lesson: You need good people, really good people

75. Meanwhile, back at the point...

76. Looking To Get Smart

  • We wanted to start deploying to larger numbers of machines (> 10)
  • We needed a way to start automating deployment
  • Have you seen this Chef thing?
  • So we started creating recipes

80. But we had issues

  • I don't want to beat up on Chef
  • The development of our architecture was *much* slower through Chef
  • We lost our Chef database
  • We tried to run Chef server internally on two instances
  • We spent a lot of time learning things like never use the UI, only ever use data bags
  • Chef changed too fast and we also changed too fast

86. Lesson: The tools may not be mature enough and, more importantly, you may not be mature enough to use them

87. So now we...

  • Take a stock Ubuntu VM
  • Customise via Capistrano scripts
  • Snapshot, distribute
  • Update via Capistrano and create_deployment.sh
  • Distribute SSH keys via Chef

92. And the customers kept on ringing

  • In particular there was the terrible case of the wild performance swings
  • New Relic would give us 6x, 4x, 12x performance swings depending on the week
  • We'd see CPU spikes and terrible loads applied to the mongrels as users got frustrated
  • "Integrity's slow, what's the score?"
  • And we'd see... not much

95. And that got difficult

  • We had to start asking for VMware metrics
  • Our working assumption was that the same version does not pitch and roll like this
  • Let's be honest, what we're saying is we don't think you can manage your own infrastructure
  • Explicitly :(

99. A lot of thinking...

100. Little by little we ground out answers

  • We found out there wasn't a lot of separation between VMs
  • Then we found out the VMs were moving over different physical hosts (vMotion)
  • And then we started to get a handle on overcommitment

103. Lesson: Smart tools can play havoc with performance

104. Lesson: VMware (or its competitors) is not a magic well

105. Where we are now

107. There's plenty for us still to do

  • Retire create_deployment.sh
  • Automate deployment
  • Refactor the architecture to give us scalability over numerous machines
  • Deploy to only part of the architecture
  • Deploy based on need

112. Wrapping Up

  • Pushing your stuff into other people's environments is hard
  • Back yourself with the stats and share them
  • Make sure your app has sufficient canaries
  • Find good people
  • Prepare for tough conversations
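As an illustration of the "canaries" point, and of the read-only disk failure described earlier, a very small check harness might look like the sketch below. The talk does not describe how C3's canaries were built; `disk_writable?` and `run_canaries` are hypothetical names, and the database check is a placeholder lambda rather than a real connection test.

```ruby
require "tempfile"
require "tmpdir"

# Hypothetical canary: can we still write to this directory?
# A disk remounted read-only (or a missing mount) makes the write fail.
def disk_writable?(dir)
  Tempfile.create("canary", dir) do |file|
    file.write("ok")
    file.flush
  end
  true
rescue SystemCallError
  false
end

# Run named checks and report :ok or :failed for each.
# A check that raises counts as failed rather than crashing the run.
def run_canaries(checks)
  checks.each_with_object({}) do |(name, check), report|
    report[name] = begin
      check.call ? :ok : :failed
    rescue StandardError
      :failed
    end
  end
end

report = run_canaries({
  "disk writable" => -> { disk_writable?(Dir.tmpdir) },
  # Placeholder: a real check would open a database connection here.
  "database reachable" => -> { true }
})
report.each { |name, status| puts "#{name}: #{status}" }
```

The idea is that checks like these run on a schedule and alert the team, so the first report of a failure no longer comes from an end user on the phone.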

117. Questions?

  • Photo credits (in order of appearance):
    • http://www.flickr.com/photos/ricoslounge/38351363/ - ricoslounge
    • http://www.flickr.com/photos/jima/3435396513/ - jima
    • http://www.flickr.com/photos/34495711@N06/3613301938/ - Aaron Frutman
    • http://www.flickr.com/photos/dancoulter/21042744/ - Dan Coulter
    • http://www.flickr.com/photos/abennett96/2639105060/ - BenSpark
    • http://www.flickr.com/photos/bcymet/1923368669/ - bcymet