Future of DevOps

A presentation at IBM GTO in April 2021 by Sasha Rosenbaum

Slide 1

The future of DevOps Sasha Rosenbaum @DivineOps

Slide 2

Sasha Rosenbaum, Red Hat, @DivineOps
• Israeli Air Force
• Defense Industry R&D
• Cloud Consulting
• Microsoft
• GitHub
• DevOpsDays Chicago since 2014

Slide 3

How about you?

Slide 4

Slide 5

The Past

Slide 6

Technology

Slide 7

1990s: Getting a new server for an application: 2-3 months

Slide 8

Backup

Slide 9

Slide 10

Date  Release name
1990  SQL Server 1.1 (16-bit)
1992  SQL Server 4.2A
1993  SQL Server 4.21a
1995  SQL Server 6.0
1996  SQL Server 6.5
1998  SQL Server 7.0
2000  SQL Server 2000
2003  SQL Server 2000 64-bit
2005  SQL Server 2005
2008  SQL Server 2008
2010  Azure SQL database

Software release cadence: 2-3-year cycle

Slide 11

Merge hell: merging the development branches and completing the test procedures could take months

Slide 12

Maintenance windows

Slide 13

Expected availability: < 99% (more than 3.65 days of downtime per year)

Slide 14

Unavailable systems were estimated to have cost American businesses $4.54 billion in 1996. Source: IBM Global Services, Improving systems availability, 1998.

Slide 15

Culture

Slide 16

Slide 17

Traditional IT: Dev and Ops, separated by the wall of confusion

Slide 18

Speed Reliability

Slide 19

The problem isn’t technical. The problem isn’t people. The problem is socio-technical.

Slide 20

Darmok and Jalad at Tanagra

Slide 21

Patrick and Andrew at Agile TO 2008

Slide 22

“10+ Deploys per Day: Dev and Ops Cooperation at Flickr”, Velocity 09: John Allspaw and Paul Hammond

Slide 23

“Agile Infrastructure”, Velocity 09: Andrew Clay Shafer

Slide 24

Slide 25

DevOpsDays Ghent 2009: Patrick Debois

Slide 26

Slide 27

Speed Reliability

Slide 28

Charity Majors, CEO of Honeycomb

Slide 29

Slide 30

Nicole Forsgren. State of DevOps Report 2019

Slide 31

Speed Reliability

Slide 32

Software delivery is like a muscle. The more you use it, the stronger it gets.

Slide 33

Software Services: Needs to be Operated
Platform Services: Needs to be Operated
Infrastructure Services: Needs to be Operated

Slide 34

In the beginning…

Slide 35

Deployment Checklists

Slide 36

Scripts

Slide 37

OS-level APIs

Slide 38

Slide 39

PowerShell (Windows): configuration management framework and scripting language. Jeffrey Snover, 2006

Slide 40

Source Control for Ops

Slide 41

GitHub launch: 2008

Slide 42

Distributed version control + Pull Request system = Global collaboration

Slide 43

Infrastructure-level APIs

Slide 44

Amazon Web Services: 2002
Amazon Cloud Computing: 2006

Slide 45

Darwinian pressure: the new models evolved due to pressure to deliver adaptable services at scale.

Slide 46

Netflix, Amazon, Google, and every ‘cloud native’ company built a platform, because they had to…

Slide 47

‘Cloud’ evolved from lessons learned building and operating these internal services

Slide 48

Infrastructure as code

Slide 49

Configuration management minimizes manual toil and infrastructure configuration drift
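
The idea of configuration drift can be illustrated with a minimal sketch: the desired state is declared as data, compared against what is actually found on a machine, and then re-applied. The setting names and values below are hypothetical, and real configuration management tools track far richer state than a flat dictionary.

```python
from __future__ import annotations

from typing import Any


def detect_drift(desired: dict[str, Any], actual: dict[str, Any]) -> dict[str, tuple[Any, Any]]:
    """Return the settings whose actual value differs from the declared one.

    Each entry maps a setting name to (desired_value, actual_value).
    A setting missing from `actual` shows up with an actual value of None
    (this sketch assumes no declared value is itself None).
    """
    drift = {}
    for key, want in desired.items():
        have = actual.get(key)
        if have != want:
            drift[key] = (want, have)
    return drift


def converge(desired: dict[str, Any], actual: dict[str, Any]) -> dict[str, Any]:
    """Apply the declared state on top of the actual one.

    Applying it twice gives the same result as applying it once.
    """
    fixed = dict(actual)
    fixed.update(desired)
    return fixed


# Hypothetical declared state, as it might be kept in version control.
desired_state = {"nginx_version": "1.18", "max_connections": 1024, "tls": True}
# Hypothetical state found on a server after a manual hot fix.
actual_state = {"nginx_version": "1.18", "max_connections": 512, "tls": True}

drift = detect_drift(desired_state, actual_state)
repaired = converge(desired_state, actual_state)
print(drift)
```

Keeping the declared state in version control is what turns this from a one-off script into infrastructure as code: the repair step can run on every machine, every time, without manual toil.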

Slide 50

“The traditional model is that you take your software to the wall that separates development and operations, and throw it over and then forget about it. Not at Amazon. You build it, you run it. This brings developers into contact with the day-to-day operation of their software. It also brings them into day-to-day contact with the customer. This customer feedback loop is essential for improving the quality of the service.” –Werner Vogels, CTO Amazon 2006

Slide 51

Continuous Delivery, Jez Humble and Dave Farley: 2010

Slide 52

Continuous Integration (CI): the process of automating the build and testing of code every time a team member commits changes to version control.
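
As a rough illustration of what a CI server does on every commit, the sketch below runs a list of steps in order and stops at the first failure. The step functions and the commit identifier are hypothetical stand-ins; a real pipeline would invoke the project's actual build and test tooling.

```python
from typing import Callable, List, Tuple

Step = Tuple[str, Callable[[], bool]]


def run_pipeline(steps: List[Step]) -> Tuple[bool, List[str]]:
    """Run the steps in order, stopping at the first failure.

    Returns (succeeded, log), where log records every step that ran.
    """
    log = []
    for name, step in steps:
        ok = step()
        log.append(f"{name}: {'ok' if ok else 'FAILED'}")
        if not ok:
            return False, log
    return True, log


# Hypothetical stand-ins for a real build and a real test suite.
def build() -> bool:
    return True


def unit_tests() -> bool:
    return 2 + 2 == 4


def on_commit(commit_id: str) -> bool:
    """Triggered for every commit pushed to version control."""
    succeeded, log = run_pipeline([("build", build), ("unit tests", unit_tests)])
    print(commit_id, log)
    return succeeded


releasable = on_commit("abc123")
```

Stopping at the first failed step keeps feedback fast, which is the point of integrating on every commit rather than once per release.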

Slide 53

Continuous Delivery (CD): the approach in which teams produce software in short cycles, ensuring that the software can be reliably released at any time.

Slide 54

Software Services: Needs to be Operated
Platform Services: Needs to be Operated
Infrastructure Services: Needs to be Operated

Slide 55

Slide 56

You will automate me out of a job!

Slide 57

Toil: the kind of work that tends to be manual, repetitive, automatable, tactical, devoid of enduring value, and that scales linearly as a service grows.

Slide 58

We would not be able to achieve the availability, reliability and speed we have today without automation

Slide 59

Slide 60

The problem isn’t technical. The problem isn’t people. The problem is socio-technical.

Slide 61

The Present

Slide 62

The future is already here. It’s just not evenly distributed ~ William Gibson

Slide 63

DevOps across Microsoft (http://aka.ms/DevOps-Stories)
• 105K engineers use the DevOps platform
• 2M Git commits per month
• 4.4M builds per month
• 500M test executions per day
• 5M work items viewed per day
• 500K work items updated per day
• 85,000 deployments per day
Shared with permission from Microsoft. Internal data snapshot some time in 2019.

Slide 64

Slide 65

Slide 66

DevOps and SRE engineers command a higher salary

Slide 67

We’ve created new jobs ¯_(ツ)_/¯

Slide 68

The new jobs got people higher salaries and more interesting work! ヽ(•‿•)ノ

Slide 69

Slide 70

We’ve created new disciplines!

Slide 71

SRE

Slide 72

SRE ≃ Google’s DevOps implementation

Slide 73

100% reliability is unattainable

Slide 74

Availability: 99.999% (5.26 minutes of downtime per year)
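
The downtime figure follows directly from the availability percentage. A minimal sketch of the arithmetic, assuming a 365-day year, reproduces both this number and the 3.65 days quoted for 99% availability earlier in the talk:

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes, ignoring leap years


def downtime_minutes_per_year(availability_percent: float) -> float:
    """Maximum yearly downtime, in minutes, allowed by an availability target."""
    return MINUTES_PER_YEAR * (100.0 - availability_percent) / 100.0


for target in (99.0, 99.9, 99.99, 99.999):
    minutes = downtime_minutes_per_year(target)
    days = minutes / (24 * 60)
    print(f"{target}% -> {minutes:.2f} min/year ({days:.2f} days)")
```

Each additional "nine" cuts the allowed downtime by a factor of ten, which is why the next question on the slides is about cost.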

Slide 75

How much does that cost?

Slide 76

Risk and Error Budgets
Error budget: an acceptable level of unreliability. It’s a budget. It can be allocated.
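
One common way to make an error budget concrete is to count requests: whatever the objective leaves uncovered is the number of failures a team may "spend". The sketch below assumes a request-based objective and made-up traffic numbers; the release policy at the end is one possible way to allocate the budget, not a prescribed rule.

```python
def error_budget(slo: float, total_requests: int) -> int:
    """Number of requests allowed to fail while still meeting the objective."""
    return int(round((1.0 - slo) * total_requests))


def budget_remaining(slo: float, total_requests: int, failed_requests: int) -> float:
    """Fraction of the error budget still unspent (negative if overspent)."""
    budget = error_budget(slo, total_requests)
    if budget <= 0:
        raise ValueError("this objective leaves no error budget to spend")
    return 1.0 - failed_requests / budget


def can_ship_risky_change(slo: float, total_requests: int, failed_requests: int) -> bool:
    """Hypothetical policy: risky releases proceed only while budget remains."""
    return budget_remaining(slo, total_requests, failed_requests) > 0.0


# Hypothetical month: 1,000,000 requests against a 99.9% objective.
budget = error_budget(0.999, 1_000_000)
remaining = budget_remaining(0.999, 1_000_000, failed_requests=250)
print(budget, remaining)
```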

Slide 77

SLI, SLO, and SLA: Service Level Terminology
Describe the metrics that matter, the values we want, and how we will react.
• Indicators (SLI): defined measurement of an aspect of a service
• Objectives (SLO): target value (or range of values) as measured by an SLI
• Agreements (SLA): explicit or implicit contract with users or customers, with consequences of meeting or missing objectives
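
The relationship between an indicator and an objective can be sketched with a latency example: the SLI is a measured fraction of "good" requests, and the SLO is the target that fraction is compared against. The threshold, target, and measurements below are hypothetical.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class LatencySLO:
    threshold_ms: float  # a request counts as "good" if it finishes within this time
    target: float        # required fraction of good requests, e.g. 0.9


def latency_sli(latencies_ms: Sequence[float], threshold_ms: float) -> float:
    """SLI: fraction of requests served within the latency threshold."""
    if not latencies_ms:
        raise ValueError("no requests observed")
    good = sum(1 for value in latencies_ms if value <= threshold_ms)
    return good / len(latencies_ms)


def meets_slo(latencies_ms: Sequence[float], slo: LatencySLO) -> bool:
    """Compare the measured indicator against the objective."""
    return latency_sli(latencies_ms, slo.threshold_ms) >= slo.target


# Hypothetical measurements (milliseconds) and objective.
observed = [120, 95, 310, 150, 80, 99, 101, 450, 130, 140]
slo = LatencySLO(threshold_ms=300, target=0.9)
sli = latency_sli(observed, slo.threshold_ms)
print(sli, meets_slo(observed, slo))
```

An SLA would sit on top of this as a contract: it states what happens, for example a service credit, when the objective is missed.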

Slide 78

Monitoring

Slide 79

“Monitoring is how you manage your known-unknowns, which involves checking values for predefined thresholds, creating actionable alerts and runbooks and so forth.” Charity Majors, CEO, Honeycomb
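
Checking values against predefined thresholds can be sketched in a few lines. The metric names, limits, and runbook paths below are hypothetical; the point is that each alert names what crossed which limit and where to look next, which is what makes it actionable.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Threshold:
    metric: str
    max_value: float
    runbook: str  # where the on-call engineer should look next


def evaluate(metrics: Dict[str, float], thresholds: List[Threshold]) -> List[str]:
    """Return one alert message per rule that is violated or has no data."""
    alerts = []
    for rule in thresholds:
        value = metrics.get(rule.metric)
        if value is None:
            alerts.append(f"{rule.metric}: no data (see {rule.runbook})")
        elif value > rule.max_value:
            alerts.append(
                f"{rule.metric}={value} exceeds {rule.max_value} (see {rule.runbook})"
            )
    return alerts


# Hypothetical rules and a hypothetical metrics sample.
rules = [
    Threshold("cpu_percent", 90.0, "runbooks/cpu.md"),
    Threshold("disk_used_percent", 85.0, "runbooks/disk.md"),
]
sample = {"cpu_percent": 97.5, "disk_used_percent": 40.0}
alerts = evaluate(sample, rules)
print(alerts)
```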

Slide 80

Without monitoring, you have no way to tell whether the service is even working

Slide 81

Observability

Slide 82

“Observability is how you handle unknown-unknowns, by instrumenting your code and capturing the right level of detail that lets you answer any question …” Charity Majors, CEO, Honeycomb

Slide 83

All of this requires collecting and analyzing massive amounts of data

Slide 84

“If we have data, let’s look at data. If all we have are opinions, let’s go with mine” - Jim Barksdale

Slide 85

Infrastructure as code?

Slide 86

MLOps

Slide 87

MLOps
• Massive amounts of data
• Data model versioning
• Model re-use
• Model decay over time
• Compliance considerations

Slide 88

Chaos Engineering

Slide 89

Chaos engineering: the discipline of experimenting on a software system in production in order to build confidence in the system’s capability to withstand turbulent and unexpected conditions.
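
The shape of a chaos experiment can be sketched in miniature. The example below is an in-process simulation, not a production experiment: it injects failures into a hypothetical dependency call and checks the hypothesis that every request still gets an answer, either fresh or degraded. All names and rates are assumptions made for illustration.

```python
import random
from typing import Callable, Tuple


def with_fault_injection(
    call: Callable[[], str], failure_rate: float, rng: random.Random
) -> Callable[[], str]:
    """Wrap a dependency call so that it fails at roughly the given rate."""
    def chaotic() -> str:
        if rng.random() < failure_rate:
            raise ConnectionError("injected failure")
        return call()
    return chaotic


def fetch_with_fallback(primary: Callable[[], str], fallback: str, retries: int = 2) -> str:
    """The behaviour under test: retry a few times, then degrade gracefully."""
    for _ in range(retries + 1):
        try:
            return primary()
        except ConnectionError:
            continue
    return fallback


def run_experiment(failure_rate: float, requests: int = 1000, seed: int = 42) -> Tuple[int, int]:
    """Return (fresh, degraded) answer counts under injected failures."""
    rng = random.Random(seed)
    dependency = with_fault_injection(lambda: "fresh data", failure_rate, rng)
    fresh = degraded = 0
    for _ in range(requests):
        answer = fetch_with_fallback(dependency, fallback="cached data")
        if answer == "fresh data":
            fresh += 1
        else:
            degraded += 1
    return fresh, degraded


fresh, degraded = run_experiment(failure_rate=0.3)
print(fresh, degraded)
```

If an injected failure ever escaped as an unhandled exception, the experiment itself would crash, which is exactly the kind of weakness such experiments are meant to surface before a real outage does.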

Slide 90

Slide 91

Everybody tests in production

Slide 92

Incident Response

Slide 93

Blameless postmortems

Slide 94

There is no root cause

Slide 95

DevSecOps

Slide 96

More code = more problems

Slide 97

Cyber attacks are at an all-time high

Slide 98

Security must be an integral part of the software development lifecycle

Slide 99

There is so much more…

Slide 100

The industry has evolved

Slide 101

Open Source

Slide 102

Open source is defining the new industry standards
• 1M+ projects
• 100M+ repositories
• 40M developers
• 2.1M businesses
Source: https://github.com. August, 2019.

Slide 103

90% of IT leaders are using enterprise open source. Source: Red Hat State of Open Source Report 2021

Slide 104

Cloud

Slide 105

Cloud Numbers
Public Cloud
● IaaS - Dominated by 6-8 Clouds
● PaaS - Dominated by 50-100 Clouds
● SaaS - Over 4000 SaaS offerings
Data Center | Private Cloud
● “Only 20% is in the Public Cloud” - IBM
● “Only 5% is in the Public Cloud” - AWS
2020 Worldwide Public Cloud Revenues: ~$235B USD
2020 Worldwide Data Center Revenues: $2-4T USD

Slide 106

Kubernetes

Slide 107

85% of global IT leaders agree that Kubernetes is key to cloud-native application strategies. Source: Red Hat State of Open Source Report 2021

Slide 108

The Future

Slide 109

Slide 110

The future is already here. It’s just not evenly distributed ~ William Gibson

Slide 111

seeking advantage seeking legitimacy

Slide 112

Agile DevOps SRE

Slide 113

“DevOps is a solved problem” - Someone from Google, 2019

Slide 114

Source: StackOverflow Developer Survey, 2018

Slide 115

Source: StackOverflow Developer Survey, 2018

Slide 116

If knowledge was all it took, we’d all have six pack abs.

Slide 117

We must spend time on making sure that the “standard of living” improves for everyone

Slide 118

Companies, just like people, don’t like to change

Slide 119

“Smart people don’t learn … because they have too much invested in proving what they know and avoiding being seen as not knowing.” - Chris Argyris

Slide 120

Learning requires vulnerability

Slide 121

The age of continuous updates

Slide 122

Slide 123

“In this era of becoming, everyone becomes a newbie. Worse, we will be newbies forever.” - Kevin Kelly

Slide 124

More innovation in the new DevOps disciplines

Slide 125

Managed Services

Slide 126

Spend More Time On What Matters: The Cloud Native Organization
[Diagram comparing Traditional IT (“Artisanal Projects”) with Cloud Native (“Industrial Products”) across Dev, Arch, and Ops]
Traditional IT: anchored uninspired apps, protective gatekeeping, per-project infrastructure, brittle deployments, toil-driven operations
Cloud Native: self-service creation, enabling constraints, collapse complexity, unchained differentiated value, responsive innovation, platform services, ubiquitous automation, standardize infra

Slide 127

“This is one of the innovator’s dilemmas: Blindly following the maxim that good managers should keep close to their customers can sometimes be a fatal mistake.” - Clayton Christensen

Slide 128

The DevOps evolution continues, as we solve new problems every day

Slide 129

Good DevOps copy. Great DevOps steal.

Slide 130

Thank you! @DivineOps