Simple steps and tips to improve IT infrastructure operations
2015-08-22 15:30-16:30 @ YAPC::Asia Tokyo 2015 (Day 2)
Yuichiro Saito (@koemu)
© Yuichiro Saito (@koemu), 2015 1
Information
• Feel free to take photos, tweet, and blog about this talk!
• Twitter hashtag: #yapcasiaE
• Slides will be uploaded to SlideShare.
Yuichiro Saito (@koemu)
• Software Engineer @ HEARTBEATS Corp. (MSP)
• Specialty: Improving engineering productivity
• Work experience: stock analysis systems, CMS, e-commerce (EC), NLP, smartphone game development, AR
What do you do?
• Software engineer?
• Operations engineer?
• Project manager?
• Other?
TOC
• 0: Background
• 1: People WANT to talk about their problems
• 2: Involve the people closest to you
• 3: Evaluating performance and beyond
About HEARTBEATS (HB)
• MSP (Managed Service Provider) [1]
• 24/7 manned monitoring.
• Stable, secure, customer-centric service.
[1] MSPAlliance, “Definition of Managed Services”, http://www.mspalliance.com/blog/definition-of-managed-services/ (Web)
In 2012
• External
• Rise of IaaS (AWS, etc.)
• Service instances are easier to handle with IaaS than on-premises.
• Internal: Environment within HB
• Manual operations.
• Oh, it's archaic :-(
1: People WANT to talk about their problems
Changes
• More instances to operate than in the on-premises era.
• Instances can be managed with software.
Problems
• More overtime work
• Manual operations in the IaaS era!
• Decreasing efficiency
• Long lead times.
• Delayed adoption of new technologies.
Wow, this sucks!
• We miss out on business opportunities.
• We must abandon our manual operations, or else we will be left behind in the IaaS era.
We need improvement
• But... improvement always meets resistance!
• But we still need to solve our problems.
• If I can make people's lives easier, they will be more receptive to improvement.
• DeMarco said: "People can't embrace change unless they feel safe." [2]
[2] Tom DeMarco, “The Deadline: A Novel About Project Management”, Dorset House, 1997
Understanding the problem
• Find hints
• We don't have comprehensive information.
• Identify "goal-reality" pairs [3]
• e.g., management principles (goal) vs. efficiency (reality)
[3] Eric G. Flamholtz, Yvonne Randle, “Growing Pains: Transitioning from an Entrepreneurship to a Professionally Managed Firm”, Jossey-Bass, 2000
Expectation
• "Inner Work Life" [4]
• Performance is affected by motivation.
• Motivation is affected by happiness.
[4] T. M. Amabile, S. J. Kramer, “Inner Work Life: Understanding the Subtext of Business Performance”, Harvard Business Review, pp. 72-83, May 2007
Two types of difficult tasks
1. Short tasks repeated many times.
2. Infrequent tasks that take a long time.
Short, but repeated TOO MANY times
Cause
• Inefficient manual operations.
• The operating environment is not automated.
Rarely repeated, but takes a LONG time
Cause
• Manual operations are required, and information is not shared.
• Replacement staff run into unexpected trouble because of poor communication during the handover.
Analysis parameters
• Two factors
• Difficulty
• COST
What's HARD?
• Mental stress
• Workload
• Short lead time
• Dependent on individual skills
• Technically infeasible
What's COST?
• Preparation cost
• Development cost
• Introduction and training cost
• Impact on the organization
Our conclusion so far
1. People will be willing to change if they understand that it will make their lives easier.
2. Identify what is making their lives difficult.
3. Prioritize according to difficulty and cost.
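The talk does not say how difficulty and cost were scored. A minimal Python sketch of the prioritization step, assuming a hypothetical 0-3 rating for each HARD and COST factor listed above, ranks tasks hardest first and, among equally hard tasks, cheapest first. The factor keys, the rating scale, and the `Task`/`prioritize` names are illustrative assumptions, not HEARTBEATS' method.

```python
from dataclasses import dataclass, field

# Factor names mirror the "What's HARD?" and "What's COST?" lists.
# The 0-3 rating scale is a hypothetical assumption, not from the talk.
HARD_FACTORS = ("mental_stress", "workload", "short_lead_time",
                "skill_dependence", "technically_infeasible")
COST_FACTORS = ("preparation", "development", "introduction_training",
                "organizational_impact")


@dataclass
class Task:
    name: str
    hard: dict[str, int] = field(default_factory=dict)  # factor -> rating 0..3
    cost: dict[str, int] = field(default_factory=dict)  # factor -> rating 0..3

    @property
    def difficulty(self) -> int:
        # Unrated factors count as 0.
        return sum(self.hard.get(f, 0) for f in HARD_FACTORS)

    @property
    def total_cost(self) -> int:
        return sum(self.cost.get(f, 0) for f in COST_FACTORS)


def prioritize(tasks: list[Task]) -> list[Task]:
    """Hardest tasks first; among equally hard tasks, the cheapest first."""
    return sorted(tasks, key=lambda t: (-t.difficulty, t.total_cost))


if __name__ == "__main__":
    # Hypothetical tasks and ratings, for illustration only.
    backlog = [
        Task("handover docs", {"skill_dependence": 2}, {"preparation": 1}),
        Task("monitoring setup",
             {"workload": 3, "short_lead_time": 3, "mental_stress": 2},
             {"development": 2}),
    ]
    for task in prioritize(backlog):
        print(task.name, task.difficulty, task.total_cost)
```

Sorting on a tuple keeps the rule explicit; a weighted score such as difficulty divided by cost would be an equally plausible reading of "prioritize according to difficulty and cost".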
Internal products we developed
• hb-agent [5]: automated server building tool
• hb-gendoc: automated documentation generation tool
• Cacti bulk configuration tool
• hb-acns [6]: today's topic!
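[5] Yuichiro Saito, “hb-agent: Introducing hb-agent, a tool that automates server building and monitoring-item detection” (in Japanese), インフラエンジニアway, 2015, http://heartbeats.jp/hbblog/2015/05/hb-agent.html (blog)
[6] Yuichiro Saito, “hb-acns: Introducing a system that automates server monitoring and metric-collection settings” (in Japanese), インフラエンジニアway, 2015, http://heartbeats.jp/hbblog/2015/06/hb-acns.html (blog)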
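The talk does not describe how hb-agent detects what to monitor. As a hypothetical illustration only, monitoring items could be derived from the TCP ports a server listens on, as printed by `ss -ltn`. The port-to-item table and the function names below are assumptions, not hb-agent's code.

```python
# Hypothetical port -> monitoring item table; not taken from hb-agent.
PORT_TO_ITEM = {
    22: "ssh",
    25: "smtp",
    80: "http",
    443: "https",
    3306: "mysql",
    5432: "postgresql",
}


def listening_ports(ss_output: str) -> set[int]:
    """Collect local TCP ports from the text printed by `ss -ltn`."""
    ports: set[int] = set()
    for line in ss_output.splitlines():
        fields = line.split()
        # Skip the header row and anything that is not a LISTEN socket.
        if len(fields) < 5 or fields[0] != "LISTEN":
            continue
        # The 4th column is the local address, e.g. "0.0.0.0:80" or "[::]:22".
        _, _, port = fields[3].rpartition(":")
        if port.isdigit():
            ports.add(int(port))
    return ports


def detect_monitoring_items(ss_output: str) -> list[str]:
    """Map listening ports to monitoring items, ignoring unknown ports."""
    return sorted(PORT_TO_ITEM[p] for p in listening_ports(ss_output)
                  if p in PORT_TO_ITEM)
```

On a Linux host with iproute2, the input could come from `subprocess.run(["ss", "-ltn"], capture_output=True, text=True).stdout`. Parsing text rather than calling the command inside the function keeps the detection logic testable without a live server.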
What's hb-acns?
• Automatically registers servers with monitoring systems (Nagios, Cacti).
• Frees ALL engineers from the pain of manual configuration.
• Preconfigured rules can be converted into scripts.
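How hb-acns turns rules into configuration is not shown in the talk. A minimal sketch, assuming a hypothetical rule table that maps monitoring items to Nagios check commands, renders one Nagios `define service` block per item. `generic-service` is the template in Nagios' sample configuration; `check_ssh`, `check_http`, and `check_mysql` follow standard plugin names but would have to exist as command definitions in your own setup.

```python
# Hypothetical rule table: monitoring item -> (service_description, check_command).
# The command names must match command definitions in your Nagios setup.
RULES = {
    "ssh": ("SSH", "check_ssh"),
    "http": ("HTTP", "check_http"),
    "mysql": ("MySQL", "check_mysql"),
}

# One Nagios service object; doubled braces are literal braces for str.format.
SERVICE_TEMPLATE = """define service {{
    use                  generic-service
    host_name            {host}
    service_description  {desc}
    check_command        {command}
}}
"""


def render_services(host: str, items: list[str]) -> str:
    """Render one Nagios service definition per item that has a rule."""
    blocks = []
    for item in items:
        rule = RULES.get(item)
        if rule is None:
            # No rule: skip it and leave the item for a human to review.
            continue
        desc, command = rule
        blocks.append(SERVICE_TEMPLATE.format(host=host, desc=desc, command=command))
    return "\n".join(blocks)


if __name__ == "__main__":
    print(render_services("web01", ["http", "ssh"]))
```

Writing the output into a directory listed as `cfg_dir` in `nagios.cfg` and checking it with `nagios -v` before reloading would be one natural way to apply it.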
Why?
• It is impossible to set up monitoring by hand for every IaaS instance.
Two targets
• Many small projects.
• Highest need for efficiency.
• A few large projects.
• Easy to standardize.
Why not use OSS?
• We have:
• Multiple datacenters.
• Many customers with varying needs.
• An established operational workflow to meet those needs.
• Existing OSS was unsuitable.
Opportunity
• We were about to start a new project.
• The CTO and I decided that this would be a pilot project.
Pilot project plan
1. Plan: specifications, preparation
2. Try: development, trial use, finding problems
3. Feedback: interviews, questionnaires
4. Spread: propagation, improvement
Tips for the pilot project
1. Get CTO support.
2. Explain the benefits.
3. Troubleshoot immediately.
Get CTO support
• Initiative from the CTO.
• Endorsement from the CTO.
• Emphasize executive leadership.
Explain the benefits
• Everyone is busy => resistant to change
• Explain the benefits, again and again.
• Which tasks will be improved?
• Which of my jobs would be improved?
• Let people know that it can be done!
Troubleshoot immediately
• There will always be unexpected trouble.
• If the response is slow, people will stop trying.
Propagate
• Documentation
• People can use the tools by themselves (DIY).
• Hands-on demonstrations [7]
• Relieve the anxiety of people who are not confident doing it themselves.
• Spread to nearly all staff (over 40 people) :-)
[7] Yuichiro Saito, “We tried in-house hands-on workshops to spread our tools” (in Japanese), インフラエンジニアway, 2014, http://heartbeats.jp/hbblog/2014/11/handson.html (blog)
(hb-acns chat ops)
• It runs every day.
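The transcript does not show the chat interface itself. A minimal sketch of the parsing side of a ChatOps command, assuming a hypothetical `acns <action> <host> [item ...]` syntax, shows how a chat bot could turn a message into a structured request before handing it to the registration logic. The trigger word, the actions, and `parse_chat_message` are illustrative names only.

```python
import shlex
from dataclasses import dataclass
from typing import Optional

# Hypothetical syntax: "acns <action> <host> [item ...]".
# The real hb-acns chat interface is not described in the talk.
TRIGGER = "acns"
ACTIONS = {"register", "unregister"}


@dataclass
class ChatCommand:
    action: str
    host: str
    items: list[str]


def parse_chat_message(text: str) -> Optional[ChatCommand]:
    """Return a ChatCommand for a recognized message, or None to ignore it."""
    try:
        words = shlex.split(text)
    except ValueError:
        # Unbalanced quotes: treat as ordinary chat, not a command.
        return None
    if len(words) < 3 or words[0] != TRIGGER or words[1] not in ACTIONS:
        return None
    return ChatCommand(action=words[1], host=words[2], items=words[3:])


if __name__ == "__main__":
    print(parse_chat_message("acns register web01 http ssh"))
```

Keeping parsing separate from execution lets the bot reject malformed messages before anything touches the monitoring configuration.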
Our conclusion so far
• Plan -> Try -> Feedback -> Spread
• Get CTO support
• Hands-on workshops lower entry barriers.
3 months
• From the beginning of the pilot project to the hands-on workshops.
1/10 the time
• Lead time for monitoring setup.
Workload time saved >> time invested in development
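One way to read this slide is as a break-even condition. Using hypothetical symbols that do not appear in the talk (n monitoring setups or updates, workload per setup t_manual before and t_auto after automation, and development time T_dev), the investment pays off once:

```latex
n \,\bigl(t_{\text{manual}} - t_{\text{auto}}\bigr) > T_{\text{dev}}
```

If the workload per setup shrank by the same factor as the lead time, $t_{\text{auto}} = t_{\text{manual}}/10$, the condition becomes $n > \frac{10\,T_{\text{dev}}}{9\,t_{\text{manual}}}$. Treating lead time and workload as shrinking by the same factor is an assumption, not a measurement from the talk.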
Cost performance
• Increase profitability
• Reduce monitoring setup and update time.
• Increase profit
• We can take on more customers with the time we save!
Side effects
• People understood that ...
• Programming will increase efficiency.
• They can write code, too! [8]
[8] Ryota Yoshikawa (@rrreeeyyy), “How to Get Started with Infrastructure as Code” (in Japanese), Open Source Conference 2014 Tokyo/Fall, 2014, https://speakerdeck.com/rrreeeyyy/osc-20141018-infra-as-code (Slide)
Future goals
• Increase operation efficiency through programming
• More in-depth study of programming.
• Solve problems through programming.
• Built on a foundation of operations skills (tuning, middleware, clarifying the problem, etc.)
Our conclusion so far
• We succeeded in improving operational productivity.
• People recognized the need for programming to improve operations.
• Programming makes operations more efficient.
Don't be afraid of change!
• People will accept growing pains if the change makes their lives easier.
• Hands-on workshops are great for reducing barriers and anxiety.
• Programming makes operations more efficient.
END