A presentation at IBM GTO by Sasha Czarkowski (Rosenbaum)
The future of DevOps: Sasha Rosenbaum, @DivineOps
Sasha Rosenbaum, Red Hat, @DivineOps. Background: Israeli Air Force, defense industry R&D, cloud consulting, Microsoft, GitHub; DevOpsDays Chicago since 2014
How about you?
The Past
Technology
1990s: Getting a new server for an application: 2-3 months
Backup
SQL Server release history (date: release name):
1990: SQL Server 1.1 (16-bit)
1992: SQL Server 4.2A
1993: SQL Server 4.21a
1995: SQL Server 6.0
1996: SQL Server 6.5
1998: SQL Server 7.0
2000: SQL Server 2000
2003: SQL Server 2000 64-bit
2005: SQL Server 2005
2008: SQL Server 2008
2010: Azure SQL Database
Software release cadence: a 2-3 year cycle
Merge hell: merging the development branches and completing the test procedures could take months
Maintenance windows
Expected availability: < 99%, i.e. more than 3.65 days of downtime per year
Unavailable systems were estimated to have cost American businesses $4.54 billion in 1996. Source: IBM Global Services, Improving systems availability, 1998.
Culture
Traditional IT: Dev and Ops separated by a “wall of confusion”
Speed Reliability
The problem isn’t technical. The problem isn’t people. The problem is socio-technical.
Darmok and Jalad at Tanagra
Patrick Debois and Andrew Clay Shafer at Agile 2008 in Toronto
“10+ Deploys per Day: Dev and Ops Cooperation at Flickr”, Velocity 09: John Allspaw and Paul Hammond
“Agile Infrastructure”, Velocity 09: Andrew Clay Shafer
DevOpsDays Ghent 2009: Patrick Debois
Speed Reliability
Charity Majors, CEO of Honeycomb
Nicole Forsgren, State of DevOps Report 2019
Speed Reliability
Software delivery is like a muscle. The more you use it, the stronger it gets.
Software services: need to be operated
Platform services: need to be operated
Infrastructure services: need to be operated
In the beginning…
Deployment Checklists
Scripts
OS-level APIs
PowerShell: a Windows configuration management framework and scripting language (Jeffrey Snover, 2006)
Source Control for Ops
GitHub launch: 2008
Distributed version control + Pull Request system = Global collaboration
Infrastructure-level APIs
Amazon Web Services: 2002. Amazon cloud computing: 2006
Darwinian pressure: the new models evolved due to pressure to deliver adaptable services at scale.
Netflix, Amazon, Google, and every ‘cloud native’ company built a platform, because they had to…
‘Cloud’ evolved from lessons learned building and operating these internal services
Infrastructure as code
Configuration management minimizes manual toil and infrastructure configuration drift
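Configuration management tools generally declare a desired state and repeatedly converge machines toward it. The Python sketch below assumes a flat dictionary of hypothetical settings and uses no real tool's API. It shows how comparing desired and actual state detects drift, and why an idempotent apply step is safe to run over and over.

```python
from typing import Dict

# Hypothetical model: a machine's configuration as a flat map of setting -> value.
Config = Dict[str, str]


def detect_drift(desired: Config, actual: Config) -> Config:
    """Return the desired settings that are missing or differ on the machine."""
    return {key: value for key, value in desired.items() if actual.get(key) != value}


def converge(desired: Config, actual: Config) -> Config:
    """Apply only the drifted settings; a second run changes nothing (idempotence)."""
    fixed = dict(actual)
    fixed.update(detect_drift(desired, actual))
    return fixed


# Hypothetical desired state and a machine that has drifted away from it.
desired = {"ntp_server": "time.example.com", "ssh_password_auth": "no", "log_level": "info"}
actual = {"ntp_server": "time.example.com", "ssh_password_auth": "yes"}

print(detect_drift(desired, actual))     # {'ssh_password_auth': 'no', 'log_level': 'info'}
converged = converge(desired, actual)
print(detect_drift(desired, converged))  # {}
```

This sketch leaves untouched any settings that exist on the machine but are absent from the desired state. Whether to remove them is a separate design choice.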
“The traditional model is that you take your software to the wall that separates development and operations, and throw it over and then forget about it. Not at Amazon. You build it, you run it. This brings developers into contact with the day-to-day operation of their software. It also brings them into day-to-day contact with the customer. This customer feedback loop is essential for improving the quality of the service.” –Werner Vogels, CTO Amazon 2006
Jez Humble and Dave Farley, Continuous Delivery: 2010
Continuous Integration (CI) The process of automating the build and testing of code every time a team member commits changes to version control.
Continuous Delivery (CD) The approach in which teams produce software in short cycles, ensuring that the software can be reliably released at any time.
Software services: need to be operated
Platform services: need to be operated
Infrastructure services: need to be operated
You will automate me out of a job!
Toil: the kind of work that tends to be manual, repetitive, automatable, tactical, devoid of enduring value, and that scales linearly as a service grows.
We would not be able to achieve the availability, reliability and speed we have today without automation
The problem isn’t technical. The problem isn’t people. The problem is socio-technical.
The Present
The future is already here. It’s just not evenly distributed ~ William Gibson
DevOps across Microsoft (http://aka.ms/DevOps-Stories)
105K engineers use the DevOps platform
Other figures: 4.4M, 5M, 2M, 500M, 500K
Metrics: Git commits per month, builds per month, test executions per day, work items viewed per day, work items updated per day
85,000 deployments per day
Shared with permission from Microsoft. Internal data snapshot from some time in 2019.
DevOps and SRE engineers command a higher salary
We’ve created new jobs ¯\_(ツ)_/¯
The new jobs got people higher salaries and more interesting work! ヽ(•‿•)ノ
We’ve created new disciplines!
SRE
SRE ≃ Google’s DevOps implementation
100% reliability is unattainable
Availability: 99.999%, i.e. 5.26 minutes of downtime per year
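The downtime figures in this talk follow from simple arithmetic: allowed downtime equals the length of the period times (1 - availability). The minimal Python sketch below assumes a 365-day year and ignores leap years.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes; ignores leap years


def allowed_downtime_minutes(availability: float) -> float:
    """Maximum downtime per year, in minutes, for an availability given as a fraction (0..1)."""
    return MINUTES_PER_YEAR * (1 - availability)


for target in (0.99, 0.999, 0.9999, 0.99999):
    minutes = allowed_downtime_minutes(target)
    print(f"{target:.3%}: {minutes:,.2f} min/year ({minutes / (24 * 60):.2f} days)")
```

For 99% this gives 3.65 days per year, and for 99.999% about 5.26 minutes. Each extra "nine" cuts the allowance by a factor of ten, which is why the next question is what that costs.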
How much does that cost?
Risk and error budgets: an error budget is an acceptable level of unreliability. It’s a budget. It can be allocated.
SLI, SLO, and SLA: service level terminology describes the metrics that matter, the values we want, and how we will react.
Indicators (SLI): a defined measurement of an aspect of a service.
Objectives (SLO): a target value (or range of values) as measured by an SLI.
Agreements (SLA): an explicit or implicit contract with users or customers, with consequences of meeting or missing objectives.
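To make the relationship between an SLI, an SLO, and the error budget concrete, here is a minimal Python sketch. It assumes a request-based availability SLI, and the request counts and the 99.9% objective are hypothetical. The error budget is the number of failures the SLO allows, and the sketch reports how much of it is left.

```python
from dataclasses import dataclass


@dataclass
class Window:
    """Request counts over one SLO window (e.g. 30 days); the numbers used below are hypothetical."""
    total_requests: int
    failed_requests: int


def availability_sli(w: Window) -> float:
    """SLI: the fraction of requests in the window that succeeded."""
    return (w.total_requests - w.failed_requests) / w.total_requests


def error_budget_remaining(w: Window, slo: float) -> float:
    """Fraction of the error budget left; a negative value means the SLO was missed."""
    if slo >= 1:
        raise ValueError("a 100% SLO leaves no error budget at all")
    allowed_failures = (1 - slo) * w.total_requests  # the budget, counted in requests
    return 1 - w.failed_requests / allowed_failures


window = Window(total_requests=1_000_000, failed_requests=400)
slo = 0.999  # hypothetical objective: 99.9% of requests succeed

print(f"SLI: {availability_sli(window):.4%}")
print(f"SLO met: {availability_sli(window) >= slo}")
print(f"Error budget remaining: {error_budget_remaining(window, slo):.0%}")
```

With 400 failures against a budget of 1,000, the SLO is met and 60% of the budget remains to be allocated, for example to riskier releases. An SLA would sit on top of this as a contractual consequence of missing the objective.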
Monitoring
“Monitoring is how you manage your known-unknowns, which involves checking values for predefined thresholds, creating actionable alerts and runbooks and so forth.” Charity Majors, CEO, Honeycomb
Without monitoring, you have no way to tell whether the service is even working
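Monitoring in the sense of the quote above means checking values against predefined thresholds and pointing each actionable alert at a runbook. The Python sketch below illustrates that pattern. The metric names, thresholds, and runbook URLs are all hypothetical, and a real system would rely on a monitoring stack rather than a hand-rolled loop.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Rule:
    """A predefined threshold on one metric, plus the runbook an on-call engineer should follow."""
    metric: str
    threshold: float
    runbook: str


def evaluate(rules: List[Rule], metrics: Dict[str, float]) -> List[str]:
    """Return one actionable alert message per rule that fires."""
    alerts = []
    for rule in rules:
        value = metrics.get(rule.metric)
        if value is None:
            # No data at all: we cannot tell whether the service is even working.
            alerts.append(f"{rule.metric}: no data; see {rule.runbook}")
        elif value > rule.threshold:
            alerts.append(f"{rule.metric}={value} exceeds {rule.threshold}; see {rule.runbook}")
    return alerts


rules = [
    Rule("error_rate", 0.01, "https://runbooks.example.com/error-rate"),
    Rule("p99_latency_ms", 500, "https://runbooks.example.com/latency"),
    Rule("disk_used_fraction", 0.9, "https://runbooks.example.com/disk"),
]
current = {"error_rate": 0.002, "p99_latency_ms": 820}  # hypothetical readings

for alert in evaluate(rules, current):
    print(alert)
```

Here the latency rule fires and the missing disk metric raises a "no data" alert, while the healthy error rate stays quiet. Treating silence as an alert reflects the point that, without monitoring data, there is no way to know whether the service works.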
Observability
“Observability is how you handle unknown-unknowns, by instrumenting your code and capturing the right level of detail that lets you answer any question …” Charity Majors, CEO, Honeycomb
All of this requires collecting and analyzing massive amounts of data
“If we have data, let’s look at data. If all we have are opinions, let’s go with mine.” - Jim Barksdale
Infrastructure as code?
MLOps
MLOps
• Massive amounts of data
• Data model versioning
• Model re-use
• Model decay over time
• Compliance considerations
Chaos Engineering
Chaos engineering: the discipline of experimenting on a software system in production in order to build confidence in the system’s capability to withstand turbulent and unexpected conditions.
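Real chaos experiments run against production systems. The toy simulation below only illustrates the shape of an experiment: state a steady-state hypothesis, inject a fault, and check whether the hypothesis survives. The retrying service, the 10% fault rate, and the 99% success threshold are all hypothetical, and a seeded random generator stands in for real failures so the run is reproducible.

```python
import random


def call_backend(rng: random.Random, fault_rate: float) -> bool:
    """Simulated dependency call; the injected fault makes it fail with probability fault_rate."""
    return rng.random() >= fault_rate


def handle_request(rng: random.Random, fault_rate: float, attempts: int) -> bool:
    """The system under test: retries the dependency up to `attempts` times, stopping at first success."""
    return any(call_backend(rng, fault_rate) for _ in range(attempts))


def run_experiment(fault_rate: float, attempts: int, requests: int = 10_000, seed: int = 42) -> float:
    """Inject faults and measure the steady-state metric: the request success rate."""
    rng = random.Random(seed)
    succeeded = sum(handle_request(rng, fault_rate, attempts) for _ in range(requests))
    return succeeded / requests


STEADY_STATE = 0.99  # hypothesis: at least 99% of requests succeed despite the fault

for attempts in (1, 3):
    rate = run_experiment(fault_rate=0.10, attempts=attempts)
    verdict = "holds" if rate >= STEADY_STATE else "is refuted"
    print(f"attempts={attempts}: success rate {rate:.2%}, hypothesis {verdict}")
```

Without retries, the 10% injected fault refutes the hypothesis, with a success rate of about 90%. With three attempts, only about 0.1% of requests fail every try, so the hypothesis holds. The experiment builds confidence in a specific resilience mechanism rather than in the system as a whole.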
Everybody tests in production
Incident Response
Blameless postmortems
There is no root cause
DevSecOps
More code = more problems
Cyberattacks are at an all-time high
Security must be an integral part of the software development lifecycle
There is so much more…
The industry has evolved
Open Source
Open source is defining the new industry standards: 40M developers, 2.1M businesses, 100M+ repositories, 1M+ projects. Source: https://github.com, August 2019.
90% of IT leaders are using enterprise open source. Source: Red Hat State of Open Source Report 2021
Cloud
Cloud numbers
Public cloud:
● IaaS: dominated by 6-8 clouds
● PaaS: dominated by 50-100 clouds
● SaaS: over 4,000 SaaS offerings
Data center / private cloud:
● “Only 20% is in the public cloud” - IBM
● “Only 5% is in the public cloud” - AWS
2020 worldwide public cloud revenues: ~$235B USD
2020 worldwide data center revenues: $2-4T USD
Kubernetes
85% of global IT leaders agree that Kubernetes is key to cloud-native application strategies. Source: Red Hat State of Open Source Report 2021
The Future
The future is already here. It’s just not evenly distributed ~ William Gibson
seeking advantage seeking legitimacy
Agile DevOps SRE
“DevOps is a solved problem” - Someone from Google, 2019
Source: StackOverflow Developer Survey, 2018
If knowledge were all it took, we’d all have six-pack abs.
We must spend time on making sure that the “standard of living” improves for everyone
Companies, just like people, don’t like to change
“Smart people don’t learn … because they have too much invested in proving what they know and avoiding being seen as not knowing.” - Chris Argyris
Learning requires vulnerability
The age of continuous updates
“In this era of becoming, everyone becomes a newbie. Worse, we will be newbies forever.” - Kevin Kelly
More innovation in the new DevOps disciplines
Managed Services
Spend more time on what matters: the cloud native organization (Dev, Arch, Ops)
Traditional IT, “artisanal projects”: anchored, uninspired apps; protective gatekeeping; per-project infrastructure; brittle deployments; toil-driven operations
Cloud native, “industrial products”: self-service creation; enabling constraints; collapse complexity; unchained, differentiated value; responsive innovation; platform services; ubiquitous automation; standardize infra
“This is one of the innovator’s dilemmas: Blindly following the maxim that good managers should keep close to their customers can sometimes be a fatal mistake.” - Clayton Christensen
The DevOps evolution continues, as we solve new problems every day
Good DevOps copy; great DevOps steal.
Thank you! @DivineOps
The term DevOps first appeared in 2009, and since then has been used to describe a cultural shift, an engineering job title, and many products in the Continuous Integration and Continuous Delivery space.
In this session, we will review a brief history of DevOps as a methodology, a set of technical skills, and an umbrella of technologies, and then dive into what the next 5 to 10 years are likely to look like in the DevOps space.