Systems Theory

Resilience: Why Some Systems Survive and Others Collapse

Yellowstone's forest recovered after fire burned about 36% of the park. Lehman Brothers collapsed within a week and never came back. Why did one giant survive while the other disappeared? The answer lies in the concept of resilience.

  • **Business**: Companies that survived COVID had different resilience strategies
  • **Technology**: Netflix survived an AWS outage thanks to Chaos Engineering
  • **Finance**: An emergency fund is literally resilience in monetary form
  • **Health**: The immune system is biological resilience

Resilience ≠ Stability

**Yellowstone, 1988**. Massive wildfires burned about 36% of the park - roughly 800,000 acres (about 320,000 hectares). A disaster? Thirty years later, the forest had recovered and become **healthier**. Meanwhile, **Lehman Brothers** collapsed within a week in 2008 and never recovered.

**Stability** - a system's ability to RESIST change and remain in its current state.

**Resilience** - a system's ability to RECOVER after a shock and return to functioning.

**Oak vs Bamboo**: An oak is rigid, resists the wind - but snaps in a hurricane. Bamboo bends to the ground - but springs back up after the storm.

A company hadn't changed its business model in 20 years. A competitor appeared and took 50% of the market. The company couldn't adapt and shut down. This is a problem of...

Three Sources of Resilience

Where does a system's ability to recover come from?

**1. Diversity** - many different elements = many paths to functioning. An ecosystem with 100 species is more resilient than one with 10.

**2. Modularity** - the system consists of relatively independent parts. The failure of one part doesn't kill the whole system.

**3. Redundancy** - duplication of critical functions. Two engines on a plane. A financial emergency fund.

| Source | Example | How it works |
|---|---|---|
| Diversity | 5 products vs 1 | A drop in one market doesn't kill the company |
| Modularity | Microservices vs monolith | A service failure doesn't take down the system |
| Redundancy | 6 months of expenses in the bank | Losing a job isn't a catastrophe |
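The effect of redundancy can be put into numbers. The sketch below assumes that the duplicated components fail independently of one another, which is a strong assumption: a shared fuel supply or a shared data center would break it. The function name and the failure rate of 1 in 1,000 are hypothetical, chosen only for illustration.

```python
def failure_probability(p_single: float, copies: int) -> float:
    """Probability that ALL `copies` components fail at once.

    Assumes every copy fails independently with probability `p_single`.
    """
    if not 0.0 <= p_single <= 1.0:
        raise ValueError("p_single must be between 0 and 1")
    if copies < 1:
        raise ValueError("copies must be at least 1")
    return p_single ** copies


# Hypothetical figure: each engine fails on 1 flight in 1,000.
for engines in (1, 2, 3):
    print(f"{engines} engine(s): {failure_probability(0.001, engines):.0e}")
```

Each extra copy multiplies the chance of total failure by the single-copy failure rate, which is why a second engine buys far more safety than making one engine slightly more reliable. The benefit shrinks quickly if the copies share a hidden common cause of failure.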

Which source of resilience helped the Yellowstone forest recover after the wildfire?

Signs of Fragility

How can one tell when a system is becoming fragile?

  1. **Over-optimization** - 'cut everything unnecessary' = cut the buffers
  2. **Hidden interdependence** - 'Who knew THAT depended on THIS?'
  3. **Compression of diversity** - one product, one supplier, one technology
  4. **Long period without stress** - '10 years without failures' = nobody knows how to respond

**Lehman Brothers - all four signs:**

| Sign | At Lehman Brothers |
|---|---|
| Over-optimization | Leverage 30:1, minimal reserves |
| Hidden connections | Derivatives linked everyone to everyone |
| No diversity | Everyone held similar mortgage-backed securities |
| No stress | 25 years of market growth, 'prices only go up' |
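The 30:1 leverage figure shows how thin the buffer was. The sketch below assumes leverage means total assets divided by equity, and uses a hypothetical balance sheet of 3,000 units of assets. The function names are illustrative, not taken from any finance library.

```python
def wipeout_drop(leverage: float) -> float:
    """Fractional fall in asset value that reduces equity to zero."""
    if leverage < 1:
        raise ValueError("leverage (assets / equity) must be at least 1")
    return 1.0 / leverage


def equity_after_drop(assets: float, leverage: float, drop: float) -> float:
    """Equity left after asset values fall by the fraction `drop`."""
    equity = assets / leverage   # the owners' own money
    debt = assets - equity       # everything else is borrowed
    return assets * (1.0 - drop) - debt


print(f"Drop that wipes out equity at 30:1: {wipeout_drop(30):.1%}")
print(f"Equity after a 5% drop: {equity_after_drop(3000.0, 30.0, 0.05):.1f}")
```

Under these assumptions, a fall of a little over 3% in asset values erases all equity, and a 5% fall leaves the firm with negative equity. At 10:1 leverage the same firm could absorb a 10% fall.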

Lehman didn't die from a single blow - it was already dead, it just didn't know it yet.

**Objection**: 'Resilience is expensive and inefficient. Better to optimize.'

**Response**: Resilience is insurance. The cost of fragility during a crisis is usually higher than the cost of maintaining reserves.

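Lehman was super-optimized. When it filed for bankruptcy it held more than $600 billion in assets, the largest bankruptcy filing in U.S. history, and the wider damage to the global economy is generally estimated to be far greater. Netflix pays for redundancy and has kept running through major AWS outages.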
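The insurance argument can be sketched as an expected-cost comparison. Every number below is hypothetical: a 10% yearly chance of a crisis, a reserve that costs 50,000 a year to maintain, and a crisis loss of 100,000 with reserves versus 2,000,000 without. The function name is illustrative.

```python
def expected_annual_cost(reserve_cost: float,
                         crisis_probability: float,
                         crisis_loss: float) -> float:
    """Yearly cost of holding reserves plus the expected crisis loss."""
    if not 0.0 <= crisis_probability <= 1.0:
        raise ValueError("crisis_probability must be between 0 and 1")
    return reserve_cost + crisis_probability * crisis_loss


# Hypothetical scenario, for illustration only.
with_reserves = expected_annual_cost(50_000, 0.10, 100_000)
without_reserves = expected_annual_cost(0, 0.10, 2_000_000)

print(f"With reserves:    {with_reserves:,.0f} per year")
print(f"Without reserves: {without_reserves:,.0f} per year")
```

In a calm year the optimized company looks cheaper, because the reserve cost is visible and the crisis loss is not. An average also understates the danger: a loss large enough to end the company cannot be made up for by good years afterwards.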

A company is proud that it 'optimized away everything unnecessary' and runs on zero inventory. This is...

Designing for Resilience

Resilience doesn't happen on its own - it must be intentionally designed:

  1. **Embrace redundancy** - yes, it's 'inefficient'. That's the price of resilience.
  2. **Loose coupling** - modules can fail independently. Clear boundaries.
  3. **Safe-to-fail experiments** - small probes instead of big bets.
  4. **Feedback loops** - learn about problems quickly. Monitoring, alerts.
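Loose coupling and feedback loops can be illustrated together in a few lines. This is a minimal sketch, not a production pattern: the module names, the `render_page` function, and the fallback text are all hypothetical. Each module is called on its own, so a failure in one is reported and replaced with a fallback instead of taking down the whole page.

```python
from typing import Callable, Dict


def render_page(modules: Dict[str, Callable[[], str]],
                fallback: str = "(unavailable)") -> Dict[str, str]:
    """Load each module independently; one failure never blocks the rest."""
    page: Dict[str, str] = {}
    for name, load in modules.items():
        try:
            page[name] = load()
        except Exception as exc:
            # Feedback loop: report the failure right away instead of hiding it.
            print(f"ALERT: module {name!r} failed: {exc}")
            # Loose coupling: degrade this one part, keep the others running.
            page[name] = fallback
    return page


def catalog() -> str:
    return "catalog ok"


def recommendations() -> str:
    raise RuntimeError("recommendation service is down")


page = render_page({"catalog": catalog, "recommendations": recommendations})
print(page)
```

The catalog still renders even though recommendations failed, and the alert gives the team the fast signal the fourth principle asks for. In a tightly coupled design, the same exception would have propagated and failed the entire request.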

**Netflix Chaos Engineering**: Netflix deliberately 'kills' its own services in production. Its Chaos Monkey tool randomly terminates running instances.

**Result**: Teams are FORCED to design for failure. When part of AWS actually goes down, Netflix is far more likely to keep running than services that have never rehearsed an outage.
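The idea behind a chaos experiment can be shown with a toy simulation. This is not Netflix's Chaos Monkey or its API: the functions, service names, and replica counts below are hypothetical. Each round terminates one randomly chosen running instance, and then we check which services have no instances left.

```python
import random
from typing import Dict, List


def chaos_round(replicas: Dict[str, int], rng: random.Random) -> str:
    """Terminate one random running instance; return the affected service."""
    running = [name for name, count in replicas.items() if count > 0]
    if not running:
        raise RuntimeError("no running instances left to terminate")
    victim = rng.choice(running)
    replicas[victim] -= 1
    return victim


def services_down(replicas: Dict[str, int]) -> List[str]:
    """Services that have lost every instance."""
    return sorted(name for name, count in replicas.items() if count == 0)


rng = random.Random(42)  # fixed seed so the run is repeatable

redundant = {"api": 3, "billing": 3}  # designed with spare instances
fragile = {"api": 1, "billing": 1}    # 'optimized': exactly one of each

chaos_round(redundant, rng)
chaos_round(fragile, rng)

print("Redundant system, services down:", services_down(redundant))
print("Fragile system, services down:  ", services_down(fragile))
```

One terminated instance leaves the redundant system fully available, while the system with a single instance per service always loses a whole service. Running such experiments on purpose exposes the second design before a real outage does.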

**Antifragility (Taleb)**: Some systems actually get STRONGER from stress. Like muscles under load. Controlled stress strengthens.

Why does Netflix deliberately 'kill' its own services in production?

Key Ideas

  • **Resilience ≠ stability**: recovery vs resistance to change
  • **Three sources**: diversity, modularity, redundancy
  • **Signs of fragility**: over-optimization, hidden connections, monoculture, absence of stress
  • **Design principles**: redundancy, loose coupling, safe-to-fail, feedback loops
  • **Antifragility**: controlled stress strengthens the system

What's Next?

We've studied the core concepts of systems. Now let's apply them to real-world systems.

  • Ecosystems — Natural systems as a model of resilience
  • Economic Systems — Markets, crises, antifragility
  • Social Systems — Society as a complex adaptive system

Questions for Reflection

  • What are the structural sources of resilience in a complex system? Where are signs of fragility most likely to appear?
  • What categories of reserves determine system resilience: capital, capabilities, or relationships?
  • What design changes increase resilience in a system at high load?

Related Lessons

  • st-07-adaptation — Adaptation determines how a system responds to shocks
  • st-01-feedback-loops — Feedback loops are the recovery mechanism
  • st-09-ecology — Ecosystems are a living model of resilience through diversity
  • st-04-leverage — Leverage points show where to build reserves
  • prob-04-bayes — Bayesian updating mirrors how a system adapts after a shock
  • dyn-08