DevOps

What is DevOps

In 2009, Flickr surprised the industry by reporting **10+ deploys a day** at a time when most companies deployed about once a month. By 2011, Amazon was reporting a production deploy every 11.7 seconds on average. How? Not new servers, not a magic framework - but a change in HOW people work together. That's DevOps.

**Netflix** - thousands of deploys a day thanks to Chaos Engineering and a "freedom & responsibility" culture
**Amazon** - a deploy every 11.7 seconds, transitioning from monolith to microservices through DevOps culture
**Etsy** - one of the DevOps pioneers, continuous deployment since 2010, blameless postmortems as the standard
**Google** - created SRE (Site Reliability Engineering) - their own take on DevOps with a focus on reliability and error budgets

The Birth of DevOps

In 2008, Patrick Debois, a Belgian IT consultant, was frustrated by the chasm between Dev and Ops. In 2009, inspired by the Flickr talk, he organized the first DevOpsDays conference in Ghent. On Twitter, the event's name was shortened to the hashtag #devops to save characters - and the movement got its name.

DevOps Culture and CALMS

In 2009, at the Velocity conference, John Allspaw and Paul Hammond from Flickr presented the talk **"10+ Deploys Per Day"**. The industry was shocked: most companies deployed about once a month, while Flickr deployed ten or more times a day. Their secret wasn't a magic tool - it was a **culture of collaboration** between developers and operations.

**CALMS** - a framework describing the five pillars of DevOps: **C**ulture, **A**utomation, **L**ean, **M**easurement, **S**haring.

Before DevOps, there was the so-called **"Wall of Confusion"** - a wall of misunderstanding between Dev and Ops. Developers wanted fast change, operators wanted stability. The result? A conflict of interests that slowed everyone down.

**Shift Left** - a key principle: move testing, security, and monitoring to earlier stages of development. The earlier a bug is found, the cheaper it is to fix. A commonly cited rule of thumb is that a bug found in production can cost up to 100x more to fix than one caught during code review; the exact multiplier varies, but the direction does not.

**Blameless postmortems** - incident reviews without finger-pointing. Amazon, Google, and Netflix run them after every major outage. The goal is not to punish, but to **find the systemic root cause** and prevent recurrence. If people fear punishment, they hide mistakes.

In 2001, Amazon had a monolith that took days to deploy. After migrating to a service-oriented architecture and adopting DevOps culture, they reported **a deploy every 11.7 seconds** on average (2011). It's not tool magic - it's cultural transformation.

What is the "Wall of Confusion" in the context of DevOps?

Automation: CI/CD and IaC

Culture without automation is just nice words. **Automation** is what turns DevOps principles into daily reality. Three main pillars: **CI/CD pipeline**, **Infrastructure as Code**, and **Configuration Management**.

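**CI (Continuous Integration)** - every commit is automatically built and tested. **CD** covers two related practices: **Continuous Delivery** - every change that passes the tests is automatically prepared for release, and the production deploy is triggered by a manual approval; **Continuous Deployment** - every change that passes the tests goes to production automatically. The chain: commit → build → tests → deploy. If any stage fails, the chain stops and the change does not reach production.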
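The commit → build → tests → deploy chain can be sketched as a list of stages that run in order and stop at the first failure. This is a minimal illustration under stated assumptions, not the API of any real CI system: the `Stage` type, the `run_pipeline` function, and the stage names are hypothetical, and each stage is just a function that returns `True` on success.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Stage:
    name: str
    run: Callable[[], bool]  # returns True when the stage succeeds


def run_pipeline(stages: List[Stage]) -> List[str]:
    """Run stages in order and stop at the first failure.

    Returns a log with one entry per stage that was started. Stages after
    a failure are never started, so a failed test blocks the deploy.
    """
    log = []
    for stage in stages:
        if stage.run():
            log.append(f"{stage.name}: ok")
        else:
            log.append(f"{stage.name}: FAILED")
            break
    return log


if __name__ == "__main__":
    pipeline = [
        Stage("build", lambda: True),
        Stage("unit-tests", lambda: False),  # simulate a failing test suite
        Stage("deploy", lambda: True),
    ]
    for line in run_pipeline(pipeline):
        print(line)
```

In this run the log ends at the failed `unit-tests` stage and `deploy` never starts, which is the behavior a real pipeline is expected to enforce.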

**Infrastructure as Code (IaC)** - describing infrastructure in configuration files instead of manually configuring servers. Terraform, Pulumi, and CloudFormation create identical environments with a single command.
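The idea behind declarative IaC can be sketched as a comparison between the desired state (what the files say) and the actual state (what exists), producing a list of changes. The sketch below is an illustration only and does not describe how Terraform, Pulumi, or CloudFormation are implemented; the `plan` function and the resource names are hypothetical.

```python
from typing import Dict, List, Tuple

Resources = Dict[str, dict]  # resource name -> its settings


def plan(desired: Resources, actual: Resources) -> List[Tuple[str, str]]:
    """Return the (action, resource) steps that turn `actual` into `desired`."""
    steps = []
    for name in sorted(desired):
        if name not in actual:
            steps.append(("create", name))
        elif desired[name] != actual[name]:
            steps.append(("update", name))
    for name in sorted(actual):
        if name not in desired:
            steps.append(("delete", name))
    return steps


if __name__ == "__main__":
    # Desired state: what the configuration files describe (made-up resources).
    desired = {
        "web-server": {"size": "small", "count": 2},
        "database": {"size": "medium"},
    }
    # Actual state: what currently exists in the environment.
    actual = {
        "web-server": {"size": "small", "count": 1},
        "old-cache": {"size": "small"},
    }
    for action, name in plan(desired, actual):
        print(action, name)
```

When the actual state already matches the desired state, `plan` returns an empty list. That property is what makes it safe to run the same configuration repeatedly and get identical environments.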

**Why automate?** People make mistakes in repetitive manual work. Suppose, for illustration, that 1 in 10 manual deploys goes wrong: with 100 deploys per month, that's about 10 failed deploys. Automation makes the process **repeatable, fast, and predictable**.
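The arithmetic behind this argument is short enough to write down. The sketch below takes the 1-in-10 error rate from the text as an illustrative input, not a measured value. The second function adds an assumption that is not in the text: a deploy made of several manual steps that fail independently of each other. Both function names are hypothetical.

```python
def expected_failures(deploys: int, error_rate: float) -> float:
    """Expected number of failed deploys: n * p."""
    return deploys * error_rate


def chance_of_any_error(steps: int, error_rate_per_step: float) -> float:
    """Probability that at least one of `steps` independent steps goes wrong."""
    return 1 - (1 - error_rate_per_step) ** steps


if __name__ == "__main__":
    # 100 deploys per month at a 10% error rate.
    print(round(expected_failures(100, 0.10), 1))
    # One deploy made of 5 manual steps, each with a 10% error rate.
    print(round(chance_of_any_error(5, 0.10), 3))
```

Under the independence assumption, a five-step manual deploy goes wrong about 41% of the time even though each step is right 90% of the time, which is why long manual checklists are a weak point.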

| Aspect | Manual process | Automated |
|---|---|---|
| Deploy time | 30–60 minutes | 2–5 minutes |
| Errors | ~10% of deploys | <1% of deploys |
| Rollback | "Who remembers what changed?" | git revert + auto-deploy |
| New server | 2–3 days of setup | terraform apply (5 min) |
| Documentation | Goes stale instantly | Code = documentation |

What happens in a CI/CD pipeline if unit tests fail?

Metrics: DORA and SLx

**"What gets measured gets managed."** The line is usually attributed to Peter Drucker and was said about management in general, but it's especially true in DevOps. How does a team know whether its DevOps is working? That's what **DORA metrics** are for - the most widely used benchmark in the industry.

**DORA** (DevOps Research and Assessment) - a research program, part of Google since 2018, that has surveyed 30,000+ professionals since 2014. Their conclusion: **4 key metrics** predict the software delivery performance of an IT organization.

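| Metric | Elite | High | Medium | Low |
|---|---|---|---|---|
| Deployment Frequency | On-demand (multiple times per day) | Once a week – once a month | Once a month – once every six months | Less than once every six months |
| Lead Time for Changes | < 1 hour | 1 day – 1 week | 1 week – 1 month | > 6 months |
| MTTR (Mean Time to Recovery) | < 1 hour | < 1 day | 1 day – 1 week | > 6 months |
| Change Failure Rate | 0–15% | 16–30% | 16–30% | 46–60% |

The exact bands differ between the annual State of DevOps reports, so treat these thresholds as approximate.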
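To make the four metrics concrete, here is a minimal sketch that computes them from a list of deployment records. It rests on stated assumptions: the `Deployment` record and the `dora_metrics` function are hypothetical, and recovery time is simplified to the gap between a failed deploy and the moment service was restored. Real teams usually derive these numbers from their CI/CD and incident tooling.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import median
from typing import List, Optional


@dataclass
class Deployment:
    committed_at: datetime                  # when the change was committed
    deployed_at: datetime                   # when it reached production
    failed: bool = False                    # did it cause a production failure?
    restored_at: Optional[datetime] = None  # when service was restored


def dora_metrics(deploys: List[Deployment], period_days: int) -> dict:
    """Compute the four DORA metrics for one observation period."""
    if not deploys or period_days <= 0:
        raise ValueError("need at least one deployment and a positive period")
    lead_times = [d.deployed_at - d.committed_at for d in deploys]
    failures = [d for d in deploys if d.failed]
    restore_times = [d.restored_at - d.deployed_at
                     for d in failures if d.restored_at is not None]
    mttr = (sum(restore_times, timedelta()) / len(restore_times)
            if restore_times else None)
    return {
        "deploys_per_day": len(deploys) / period_days,
        "median_lead_time": median(lead_times),
        "change_failure_rate": len(failures) / len(deploys),
        "mean_time_to_restore": mttr,
    }


if __name__ == "__main__":
    t0 = datetime(2024, 1, 1, 9, 0)  # made-up sample data
    day, hour = timedelta(days=1), timedelta(hours=1)
    history = [
        Deployment(t0, t0 + 2 * hour),
        Deployment(t0 + day, t0 + day + 4 * hour, failed=True,
                   restored_at=t0 + day + 4 * hour + timedelta(minutes=30)),
        Deployment(t0 + 2 * day, t0 + 2 * day + 6 * hour),
    ]
    for name, value in dora_metrics(history, period_days=3).items():
        print(name, value)
```

The median is used for lead time so that one unusually slow change does not dominate the figure; that choice is a design decision of this sketch.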

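Beyond DORA, DevOps engineers also track **SLI/SLO/SLA** - three layers of reliability targets. An **SLI** (Service Level Indicator) is a measured value, such as the share of successful requests. An **SLO** (Service Level Objective) is the internal target for that indicator. An **SLA** (Service Level Agreement) is the contractual promise to customers, usually set looser than the SLO.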
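The relationship between a measured SLI and its SLO and SLA targets can be sketched in a few lines. The function names and the request counts below are hypothetical, and availability is assumed to be measured as the share of successful requests, which is only one common choice of indicator.

```python
def availability_sli(good_requests: int, total_requests: int) -> float:
    """Measured indicator: the share of requests that succeeded."""
    if total_requests <= 0:
        raise ValueError("total_requests must be positive")
    return good_requests / total_requests


def check_targets(sli: float, slo: float, sla: float) -> str:
    """Compare the measured SLI with the internal SLO and the contractual SLA."""
    if sli >= slo:
        return "within SLO"
    if sli >= sla:
        return "SLO missed, SLA still met"
    return "SLA breached"


if __name__ == "__main__":
    slo, sla = 0.999, 0.995  # internal target is stricter than the contract
    for good in (999_200, 998_000, 990_000):
        sli = availability_sli(good, 1_000_000)
        print(sli, check_targets(sli, slo, sla))
```

Keeping the SLO stricter than the SLA gives the team a warning zone: the middle case is an internal problem, not yet a broken promise to customers.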

**Monitoring vs Observability.** Monitoring answers the question **"what broke?"** (CPU 100%, disk full). Observability answers **"why did it break?"** - through three pillars: **logs**, **metrics**, **traces**.
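As a rough illustration of the three pillars, the sketch below emits all three signals for a single fake request: a counter (metric), a structured log line (log), and an ID that would follow the request across services (trace). Everything here is hypothetical and uses only the standard library; real systems would use dedicated metrics, logging, and tracing tools.

```python
import json
import time
import uuid
from collections import Counter

request_count = Counter()  # metric: number of requests, keyed by status code


def handle_request(path: str, fail: bool = False) -> dict:
    """Handle one fake request and emit a metric, a log line, and a trace ID."""
    trace_id = uuid.uuid4().hex          # trace: one ID follows the request
    start = time.perf_counter()
    status = 500 if fail else 200        # stand-in for real request handling
    duration_ms = (time.perf_counter() - start) * 1000

    request_count[status] += 1           # metric: aggregated, cheap to alert on
    log_record = {                       # log: structured, carries the trace ID
        "trace_id": trace_id,
        "path": path,
        "status": status,
        "duration_ms": round(duration_ms, 3),
    }
    print(json.dumps(log_record))
    return log_record


if __name__ == "__main__":
    handle_request("/checkout")
    handle_request("/checkout", fail=True)
    print(dict(request_count))
```

The metric says that errors are happening; the log line and its `trace_id` are what let an engineer find the specific request and follow it to the cause.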

Google's SRE practice uses an **error budget** - the amount of unreliability the SLO permits. If SLO = 99.9% uptime, then error budget = 0.1% ≈ 43 minutes of downtime per 30-day month. While budget remains, the team can ship features. Once it is exhausted, only stability fixes go out.
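The error budget arithmetic can be checked with a short sketch. It assumes a 30-day month and measures the budget in minutes of downtime; the function names are hypothetical, and the policy is the simple one described above (features while budget remains, stability fixes once it is gone).

```python
def error_budget_minutes(slo: float, period_days: int = 30) -> float:
    """Allowed downtime for the period: (1 - SLO) * minutes in the period."""
    return (1 - slo) * period_days * 24 * 60


def release_policy(slo: float, downtime_minutes: float,
                   period_days: int = 30) -> str:
    """Decide what the team may ship, based on the budget that is left."""
    remaining = error_budget_minutes(slo, period_days) - downtime_minutes
    return "ship features" if remaining > 0 else "stability fixes only"


if __name__ == "__main__":
    print(round(error_budget_minutes(0.999), 1))        # budget for 99.9% uptime
    print(release_policy(0.999, downtime_minutes=10))   # budget left
    print(release_policy(0.999, downtime_minutes=50))   # budget exhausted
```

For a 99.9% SLO the budget works out to 0.001 × 43,200 minutes = 43.2 minutes, which matches the figure of roughly 43 minutes per month.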

A team deploys once a month, Lead Time = 3 weeks, MTTR = 2 days. What DORA level are they?

Sharing: DevOps as a Culture

Culture, Automation, and Measurement are covered. Now - **Sharing**, the last pillar of CALMS. This is where the most common misconception about DevOps lives: many people think DevOps is a set of tools. Docker, Kubernetes, Terraform. Buy it, install it - and DevOps is done.

**DevOps is NOT tools.** A team can run Kubernetes and still deploy once a quarter via a Jira ticket. Or deploy 50 times a day with simple bash scripts. Tools help, but culture comes first.

**Blameless culture** - no one is to blame for outages. The **system** that allowed the failure is at fault. If a developer accidentally took down production - the question isn't "why did they do that", but "why did the system allow this to happen?". No code review? No automated tests? No canary deploy?

**Shared responsibility** - "you build it, you run it". Developers are responsible for their code in production. This isn't punishment - it's **fast feedback**. The developer who wrote the code best understands how to fix it.

**Documentation as code** - documentation lives alongside code, in the same repository. README, ADRs (Architecture Decision Records), runbooks. If documentation is in a separate Wiki - it will be stale within a week. If it's in Git next to the code - it's updated together with it.

**Three pillars of knowledge sharing:** 1) Blameless postmortems after every incident. 2) Internal tech talks and demo days. 3) Documentation in Git alongside code. All three only work if people are **not afraid** to share mistakes.

**Myth:** DevOps is a separate role or team responsible for deployments and servers.

**Reality:** DevOps is a culture of shared responsibility where Dev and Ops work as one team with common goals.

Creating a separate "DevOps team" often creates another wall instead of tearing down the existing one. DevOps is a set of practices (CI/CD, IaC, monitoring, blameless culture) that the ENTIRE organization must adopt - not just one department.

A company hired a "DevOps engineer" and bought Kubernetes. Six months later, deploys are still once a month. What's the problem?

Key Takeaways

DevOps = **CALMS**: Culture, Automation, Lean, Measurement, Sharing - not a set of tools
The **"Wall of Confusion"** between Dev and Ops is broken down through shared goals and shared responsibility
**CI/CD pipeline** automates the path from commit to production, reducing human error
**DORA metrics** - 4 indicators that predict the effectiveness of an IT organization
**Blameless culture** - systems fail, people aren't to blame. Find the root cause, not the culprit
The Flickr talk "10+ Deploys Per Day" is now demystified - it's not magic, it's culture + automation + metrics

Questions for Reflection

A new DevOps engineer joins a company where Dev and Ops don't talk to each other. Where should the transformation begin?
Which DORA metrics matter most for a startup? And for a bank?
Why is blameless culture so hard to implement? What pushes people to look for someone to blame?

Related Lessons

devops-02 — Linux fundamentals are required to operate the automation pipelines introduced here
st-01-feedback-loops — CI/CD pipeline is a feedback loop that shortens the Dev-Ops cycle from months to minutes
alg-01-big-o — DORA metrics measure process efficiency the same way Big-O measures algorithm efficiency
sd-01-intro — System Design decisions shape what DevOps must automate and monitor
sec-01 — DevSecOps integrates security into the CI/CD pipeline - Shift Left principle
dist-03-fallacies
os-19-containers
