AWS re:Invent 2016: How News UK Centralized Cloud Governance Through Policy Management (DEV306)
TRANSCRIPT
© 2016, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Joe Kinsella, CTO & Founder, CloudHealth Technologies
Iain Caldwell, Head of Infrastructure, News UK & News Corp EMEA
November 30, 2016
DEV 306
News UK: Achieving Centralized
Governance Through Policy Management
Presenters
Joe Kinsella, CTO & Founder
CloudHealth Technologies
@joekinsella
Iain Caldwell, Head of Infrastructure
News UK & News Corp EMEA
@caldi100
What to expect from this session
• Overview of News Corp’s use of AWS
• Why governance is critical to cloud success
• How to drive a governance strategy
• 5 best practices
News Corp strategy
• CTO set objective to reduce data centre footprint and associated costs
• Host 75% of estate in the public cloud within the next three years
• News UK currently running at 69%, aiming to reach 75% by July 2017
• Before we started in 2011 we built our AWS Cloud data centre
• Ran a global application assessment for cloud readiness across all BUs
• Digital estate was the main contender for cloud – web-based
applications, mobile applications, test, and dev
• Migrated our enterprise systems to the cloud over the past 2 years
• Traditional newspaper, finance, and monitoring applications etc.
News Corp on AWS
• 2K+ EC2 instances
• 750+ TB S3 storage
• 300+ RDS instances
• Utilizing broad set of AWS services –
Amazon Elastic Compute Cloud (EC2), Amazon Relational Database Service,
Amazon Redshift, Amazon VPC, AWS Direct Connect, Amazon Route 53, Amazon
WorkSpaces, AWS Storage Gateway, Amazon Simple Storage Service (S3),
Amazon Glacier, Amazon CloudFront, AWS CloudFormation, AWS Config, Amazon
CloudWatch, AWS Trusted Advisor
• Key management/support tools: CloudHealth, New Relic, Puppet,
Rundeck, and more…
What is cloud governance?
• Process to ensure secure, effective,
& efficient use of IT resources
• Includes compliance to policies
& best practices
• Covers cost, security,
availability, performance, & usage
Governance needs…
• Brand protection
• Cost control
• Management of business risk
• Compliance to policies &
standards
Why governance matters: A balancing act
Agility drives…
• Quick time to market
• Innovation
• Flexibility
The challenge of cloud governance
• Rapid pace of change
• Powerful cloud services/features
• Consumption-based pricing
• IT often influencer/auditor, not owner
• Decentralized management
• Disparate management tools
• Requires integration of multiple products & sources of data
Common cloud governance issues – News Corp
• No tagging
• Reluctance to invest in Reserved Instances
• Reserved Instances underutilised
• No rightsizing
• ELB left unused
• EBS volumes left unattached
• RDS instances with no active connections
• S3 storage exponential growth
• PoC and dev environments created and left
• Not shutting dev environments down at night
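The last issue in this list, dev environments left running overnight, can be sized with a quick back-of-envelope calculation. The sketch below assumes a hypothetical schedule (weekdays only, 12 hours a day) and on-demand billing by running time; the function name and parameters are illustrative and do not come from any tool named in this deck.

```python
HOURS_PER_WEEK = 7 * 24  # 168 hours in a week


def weekly_runtime_saving(days_on: int, hours_on_per_day: int) -> float:
    """Fraction of weekly instance-hours avoided compared with running 24x7."""
    if not (0 <= days_on <= 7 and 0 <= hours_on_per_day <= 24):
        raise ValueError("schedule out of range")
    hours_on = days_on * hours_on_per_day
    return 1 - hours_on / HOURS_PER_WEEK
```

Under that assumed schedule, `weekly_runtime_saving(days_on=5, hours_on_per_day=12)` gives 1 - 60/168, or roughly 64% fewer instance-hours than an always-on environment.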
The unique challenge to the enterprise
• Ownership increasingly distributed to lines of
business that increasingly:
• Control infrastructure supporting their
businesses
• Go “rogue” to get around IT and achieve
business agility
• Do not take into account the importance of
governance, compliance, risk management
• IT increasingly influencer/auditor instead of owner
Where to start
• Establish a strategy & obtain stakeholder buy-in
• Evaluate & implement tool strategy
• Identify deliverables by stakeholder
• Implement, rinse, & repeat
Establish strategy
• Implications of competing priorities
• Digital teams require agility – speed of
products to market, embrace innovation
• Enterprise teams need to control costs,
preserve security and adhere to
governance, attract and retain good people
• What’s needed from a people perspective
• Acquiring and maintaining talent
• A focus on cloud consumption & usage
• Develop best practices
• Cloud steward
[Diagram labels: Agility, Governance, Team lead, Operations, Finance, Engineering, LOBs]
• Business group definition & implementation
• Tagging, naming conventions, metadata, etc.
• Data integrations
• Cost, budget, assets, configuration,
performance, security
• Report definitions and delivery
• Policy definition and implementation
• Analysis, recommendations, & optimization
actions
• Capacity planning, modeling, & forecasting
• Service-level reporting
Cloud steward: Responsible for ongoing cloud optimization & governance
OPERATIONS
Evaluate & implement tool strategy
• AWSGO - enforced 7 P.M. shutdowns/snooze/start
• Delete unattached volumes >=5 days
• CloudHealth – Cost management & policy management
• Consigliere – One view for all AWS accounts Trusted Advisor
• New Relic - APM
• Rundeck - Orchestration
• Puppet - Configuration
• Slack integration
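The two automation rules on this slide (7 P.M. shutdown with snooze, and deleting volumes unattached for 5 days or more) reduce to simple date comparisons. The following is a minimal sketch of that decision logic only, not the AWSGO implementation: the `Volume` record, the `detached_at` field, and both function names are hypothetical, and how the detach time is obtained from AWS is left open.

```python
from dataclasses import dataclass
from datetime import datetime, time, timedelta
from typing import List, Optional


@dataclass
class Volume:
    volume_id: str
    detached_at: Optional[datetime]  # None while the volume is attached (assumed field)


def volumes_to_delete(volumes: List[Volume], now: datetime,
                      min_age: timedelta = timedelta(days=5)) -> List[str]:
    """IDs of volumes that have been unattached for at least `min_age`."""
    return [
        v.volume_id
        for v in volumes
        if v.detached_at is not None and now - v.detached_at >= min_age
    ]


def should_stop(now: datetime, snoozed_until: Optional[datetime] = None,
                shutdown_at: time = time(19, 0)) -> bool:
    """True once the shutdown time has passed, unless a snooze is still active."""
    if snoozed_until is not None and now < snoozed_until:
        return False
    return now.time() >= shutdown_at
```

Keeping the decision separate from the API calls that act on it makes the policy easy to test and to dry-run before anything is actually stopped or deleted.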
Confidential
Identify deliverables by stakeholder
[Figure: deliverables matrix with row labels Stakeholders, Perspectives, Subscriptions, Policies, and Best Practices; four columns, listed below in slide order]
• Stakeholders
- CEO
- Global CIO / Eng: Eng, DevOps, IT Ops, Cloud Ops
- CFO: FP&A, Fin Analyst
- LOB A: Eng, DevOps, IT Ops, Cloud Ops
- LOB B: Eng, DevOps, IT Ops, Cloud Ops
• Perspectives
- Column 1: Product & Function (Production, Development, QA, Staging; Web, App, DB, Storage)
- Column 2: P&L & Department (OPEX/COGS, Product, Function, Customer)
- Columns 3 and 4: Business Unit (Product, Function, Customers)
• Subscriptions
- Columns 1, 3, and 4: Cost Pulse, Health Check Pulse, RI Utilization Pulse, Cost by Group, Usage by Reservation Type, Reservation Modifications, Usage by Instance Type, Instance Rightsizing, Volume Rightsizing
- Column 2: Cost Pulse, Health Check Pulse, RI Utilization Pulse, Cost by Group, Usage by Reservation Type
• Policies
- Columns 1, 3, and 4: Over Budget, Purchase Reservations, Modify Reservations, Underutilized Instances, Unattached Volumes, Snapshot Aging, Untagged Assets, Start / Stop Instances
- Column 2: Over Budget, Modify Reservations, Purchase RIs, Cost Per Group
Rinse & repeat: Continued improvements
• Enforced tagging – EC2, RDS, ELB,
EBS & Auto Scaling groups – delete
new instance if not tagged within 15 minutes
• Daily cleanup:
• Delete EC2 instances shut down
for >=5 days
• Delete ELBs with no traffic for >=5 days
• Delete EC2 instances with no traffic for >=5 days
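The enforced-tagging rule above (delete a new instance if it is still untagged after 15 minutes) can be expressed as a small filter. This is a sketch under stated assumptions: the required tag keys, the resource dictionary shape (`id`, `launched_at`, `tags`), and the function names are all hypothetical, and the actual deletion step is deliberately left out.

```python
from datetime import datetime, timedelta

REQUIRED_TAGS = {"owner", "environment"}  # hypothetical tag keys, not from the deck
GRACE_PERIOD = timedelta(minutes=15)      # the 15-minute window from the slide


def missing_tags(tags: dict) -> set:
    """Required tag keys that are absent or blank."""
    return {key for key in REQUIRED_TAGS if not (tags.get(key) or "").strip()}


def untagged_past_grace(resources: list, now: datetime) -> list:
    """IDs of resources still missing required tags once the grace period is over."""
    flagged = []
    for resource in resources:
        age = now - resource["launched_at"]
        if age >= GRACE_PERIOD and missing_tags(resource.get("tags", {})):
            flagged.append(resource["id"])
    return flagged
```

Returning the list of offenders rather than deleting inside the function lets the same check feed a report or a Slack notification first, with deletion as a separate, explicit action.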
Governing cost management: The total picture
• Right-sized our current estate
• Invested in Reserved Instances
• Decommissioned what we didn’t need
• Implemented automation where possible
- CloudFormation & Chef/Puppet for us
• Implemented good governance – tagging
and service transition, including change
control – in progress
• Use the AWS Trusted Advisor service
Governing security management: Key requirements
• Security groups - NACLs reviewed and
updated to allow specific access.
• IAM roles - Groups created and applied to
instance. Functions and actions restricted.
• Networking - All ports closed. Open only what
is required.
• Users not active in News are removed.
• Antivirus set up on EC2 Windows instances
automatically.
• IAM users audited and user access modified.
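The networking requirement above ("All ports closed. Open only what is required.") can be audited by scanning ingress rules for anything exposed to every address that is not on an approved list. The sketch below assumes a simplified, hypothetical rule shape (`cidr`, `from_port`, `to_port`) rather than the structure returned by the AWS API, and the allow-list of port 443 is only an example.

```python
ALLOWED_PUBLIC_PORTS = frozenset({443})  # hypothetical allow-list
ANY_ADDRESS = ("0.0.0.0/0", "::/0")      # IPv4 and IPv6 "anywhere" CIDRs


def open_to_world(rules: list, allowed_ports=ALLOWED_PUBLIC_PORTS) -> list:
    """(from_port, to_port) ranges open to any address that exceed the allow-list."""
    findings = []
    for rule in rules:
        if rule["cidr"] not in ANY_ADDRESS:
            continue  # restricted to a specific network, so not a public exposure
        ports = range(rule["from_port"], rule["to_port"] + 1)
        if any(port not in allowed_ports for port in ports):
            findings.append((rule["from_port"], rule["to_port"]))
    return findings
```

A port range is flagged if any single port in it falls outside the allow-list, so a broad rule such as 80 to 443 is reported even though 443 itself is permitted.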
Success criteria: The key metrics
• Architectural – adherence to standards/controls
• Cost – efficiency & lifecycle management, TCO, ROI
• Asset – adherence to configuration standard
• Security – compliance to best-practice configuration
• Adoption – rate of adoption
What’s next for governance
We need the equivalent of DevOps for cloud management
• Processes
• Set of roles
• Tooling
• Shared standards
5 best practices
Empower a centralized owner that
delivers real value to stakeholders
Don’t give up on agility
Create partnerships with strategic
vendors
Establish high-value policies
Automate, automate, automate
Current Security Offering
▪ Default policy for monitoring for AWS
▪ Monitors access control, network
security, application security & logging
▪ Reports violations with
recommendations
▪ Security violation management
▪ Include / exclude resources
▪ Group-based targeting
▪ Fully customizable & extensible
(including actions via Lambda)
▪ Integrates with Health Check
▪ Approval workflow for custom actions
▪ Per instance port-level reporting
▪ Alert Logic incident trend reporting
Security Policies for AWS
Security Monitoring
Security Recommendations