DevOps

Service Mesh: Istio, Linkerd

Lyft's engineers were spending months adding retry logic, TLS, and tracing to each microservice individually. To move that work out of application code they built the Envoy proxy and released it as open source in 2016. In a mesh built on Envoy, such as Istio, an organization with 200+ microservices can give every service the same features in about 30 minutes by enabling the sidecar injection label.

  • **Airbnb** uses Istio for canary deployments: a new service version receives 1% of traffic, error rate is automatically monitored, and rollback happens without engineer intervention if the threshold is breached.
  • **T-Mobile** implemented mTLS via Istio and removed the requirement for VPN for internal services - every call is cryptographically authenticated, achieving zero-trust at the network level.
  • **Booking.com** uses Kiali for service dependency graph analysis during incidents - instead of hours figuring out 'who calls whom', a live map with error rates is available in 30 seconds.

Sidecar Proxy

The sidecar is a proxy container (Envoy in Istio, linkerd-proxy in Linkerd) automatically injected into every pod in a namespace where injection is enabled. Traffic redirection rules inside the pod send all inbound and outbound traffic through the proxy, so no application code changes are needed.

The sidecar pattern decouples network concerns from application logic. A Python service and a Go service both get identical mTLS, retries, and tracing without any changes to their code.
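As a minimal sketch of how injection is switched on, the snippet below builds a Namespace manifest in Python and prints it as JSON, which `kubectl apply -f` accepts alongside YAML. The namespace name `payments` and the helper `namespace_with_injection` are hypothetical; the `istio-injection` label and the `linkerd.io/inject` annotation are the documented opt-in keys for each mesh.

```python
import json


def namespace_with_injection(name: str, mesh: str = "istio") -> dict:
    """Build a Namespace manifest that opts its pods into sidecar injection."""
    metadata = {"name": name}
    if mesh == "istio":
        # Istio's injection webhook looks for this namespace label.
        metadata["labels"] = {"istio-injection": "enabled"}
    elif mesh == "linkerd":
        # Linkerd uses an annotation rather than a label.
        metadata["annotations"] = {"linkerd.io/inject": "enabled"}
    else:
        raise ValueError(f"unknown mesh: {mesh}")
    return {"apiVersion": "v1", "kind": "Namespace", "metadata": metadata}


if __name__ == "__main__":
    # "payments" is a hypothetical namespace name used for illustration.
    print(json.dumps(namespace_with_injection("payments"), indent=2))
```

Injection happens when a pod is created, so pods that were already running before the namespace was labeled need a restart to pick up the sidecar.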

What is the main advantage of the sidecar pattern in a service mesh?

Traffic Management

Istio provides VirtualService and DestinationRule for fine-grained traffic control. VirtualService defines routing rules; DestinationRule defines how to reach a destination (load balancing, circuit breaking, subsets for canary).
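To make the DestinationRule half concrete, here is a hedged sketch that assembles one as a Python dict and prints it as JSON. The host `payment-api`, the subset names, the `version` labels, and the outlier-detection thresholds are illustrative assumptions, not recommended values.

```python
import json


def destination_rule(host: str, subsets: dict) -> dict:
    """Build a DestinationRule with load balancing, circuit breaking, and subsets."""
    return {
        "apiVersion": "networking.istio.io/v1beta1",
        "kind": "DestinationRule",
        "metadata": {"name": host},
        "spec": {
            "host": host,
            "trafficPolicy": {
                # How requests are spread across healthy endpoints.
                "loadBalancer": {"simple": "ROUND_ROBIN"},
                # Circuit breaking: eject an endpoint after repeated 5xx responses.
                # The numbers below are placeholders for illustration only.
                "outlierDetection": {
                    "consecutive5xxErrors": 5,
                    "interval": "10s",
                    "baseEjectionTime": "30s",
                },
            },
            # Each subset selects pods by label; routing rules refer to them by name.
            "subsets": [
                {"name": name, "labels": labels} for name, labels in subsets.items()
            ],
        },
    }


if __name__ == "__main__":
    rule = destination_rule(
        "payment-api",
        {"stable": {"version": "v1"}, "canary": {"version": "v2"}},
    )
    print(json.dumps(rule, indent=2))
```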

A canary with Istio is more powerful than a Kubernetes rolling update: the traffic percentage is controlled independently of pod count, and rollback takes effect as soon as the new weights propagate to the proxies (change the canary/stable weights from 5/95 to 0/100).
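The weight change described above can be sketched as a small function that emits a VirtualService for a given canary percentage. It assumes a DestinationRule already defines subsets named `stable` and `canary` for the host; those names, the host `payment-api`, and the helper itself are hypothetical.

```python
import json


def virtual_service(host: str, canary_percent: int) -> dict:
    """Build a VirtualService that splits traffic between stable and canary subsets."""
    if not 0 <= canary_percent <= 100:
        raise ValueError("canary_percent must be between 0 and 100")
    return {
        "apiVersion": "networking.istio.io/v1beta1",
        "kind": "VirtualService",
        "metadata": {"name": host},
        "spec": {
            "hosts": [host],
            "http": [
                {
                    "route": [
                        {
                            "destination": {"host": host, "subset": "stable"},
                            "weight": 100 - canary_percent,
                        },
                        {
                            "destination": {"host": host, "subset": "canary"},
                            "weight": canary_percent,
                        },
                    ]
                }
            ],
        },
    }


if __name__ == "__main__":
    rollout = virtual_service("payment-api", canary_percent=5)   # 95/5 split
    rollback = virtual_service("payment-api", canary_percent=0)  # 100/0 split
    print(json.dumps(rollout, indent=2))
    print(json.dumps(rollback, indent=2))
```

Applying the second manifest is the whole rollback: no pods are rescheduled, only the routing weights change.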

How is an Istio canary deployment better than a standard Kubernetes rolling update?

Mutual TLS (mTLS)

Mutual TLS authenticates both parties in a connection: the client verifies the server certificate and the server verifies the client certificate. In Istio, calls between sidecar-injected workloads are encrypted and mutually authenticated automatically, with certificates issued and rotated by the mesh.
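The difference between the two modes comes down to one server-side setting, which this sketch shows with Python's standard `ssl` module. It is not Istio code (in a mesh the sidecars do this on the application's behalf), and the helper name and its file-path parameters are hypothetical.

```python
import ssl
from typing import Optional


def server_context(
    mutual: bool,
    cert: Optional[str] = None,
    key: Optional[str] = None,
    client_ca: Optional[str] = None,
) -> ssl.SSLContext:
    """Build a server-side TLS context, optionally requiring a client certificate."""
    ctx = ssl.create_default_context(ssl.Purpose.CLIENT_AUTH)
    if cert and key:
        # The server's own certificate, which the client verifies in both modes.
        ctx.load_cert_chain(cert, key)
    if mutual:
        # mTLS: the handshake fails unless the client presents a certificate
        # that chains to a CA the server trusts.
        ctx.verify_mode = ssl.CERT_REQUIRED
        if client_ca:
            ctx.load_verify_locations(cafile=client_ca)
    return ctx


if __name__ == "__main__":
    print(server_context(mutual=False).verify_mode)  # client certificate not requested
    print(server_context(mutual=True).verify_mode)   # client certificate required
```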

With Istio mTLS in STRICT mode, the network perimeter shrinks to the pod level. A compromised pod cannot impersonate another service - every connection requires a certificate bound to a specific workload identity.
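A hedged sketch of how STRICT mode is typically requested: a PeerAuthentication resource, built here as a Python dict and printed as JSON. Placing a policy named `default` in the root namespace (assumed to be `istio-system`, the usual default) applies it mesh-wide; the helper name is hypothetical.

```python
import json


def strict_mtls_policy(namespace: str = "istio-system") -> dict:
    """Build a PeerAuthentication that rejects plaintext traffic to workloads."""
    return {
        "apiVersion": "security.istio.io/v1beta1",
        "kind": "PeerAuthentication",
        "metadata": {"name": "default", "namespace": namespace},
        # STRICT accepts only mutual TLS; PERMISSIVE would also accept plaintext.
        "spec": {"mtls": {"mode": "STRICT"}},
    }


if __name__ == "__main__":
    print(json.dumps(strict_mtls_policy(), indent=2))
```

Passing an application namespace instead of the root namespace scopes the policy to that namespace only, which is a common way to roll STRICT mode out gradually.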

What distinguishes mTLS from standard TLS?

Observability via Service Mesh

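A service mesh provides L7 observability with almost no application work: every sidecar generates metrics, traces, and access logs for every service call. Metrics and access logs need no manual instrumentation; for traces, the application only has to forward the incoming trace headers on its outbound calls.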

The value of service mesh observability is coverage: 100% of service-to-service calls are instrumented automatically, including calls from services that predate the observability requirement.
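One caveat to that coverage is worth a sketch. The proxy sees inbound and outbound requests separately, so spans join into a single trace only if the application copies the trace headers from the request it received onto the requests it sends. The header list below covers the commonly used B3 and W3C Trace Context headers plus Envoy's request ID; which ones matter depends on the tracer configured in the mesh, and the helper name is hypothetical.

```python
# Headers commonly forwarded for mesh tracing; the exact set depends on the tracer.
TRACE_HEADERS = (
    "x-request-id",
    "x-b3-traceid",
    "x-b3-spanid",
    "x-b3-parentspanid",
    "x-b3-sampled",
    "x-b3-flags",
    "traceparent",
    "tracestate",
)


def forward_trace_headers(incoming: dict) -> dict:
    """Return only the trace headers from an inbound request, for reuse outbound."""
    # HTTP header names are case-insensitive, so normalize before matching.
    lowered = {name.lower(): value for name, value in incoming.items()}
    return {name: lowered[name] for name in TRACE_HEADERS if name in lowered}


if __name__ == "__main__":
    inbound = {"X-B3-TraceId": "abc123", "X-B3-SpanId": "def456", "Accept": "*/*"}
    # Merge the result into the headers of every outbound call made for this request.
    print(forward_trace_headers(inbound))
```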

Common misconception: a service mesh replaces the need for APM tools like Datadog or New Relic

A service mesh provides infrastructure-level L7 metrics and tracing, but APM tools add application-level context: business transactions, code-level profiling, and user session tracking.

Istio shows that payment-api has p99=500ms. Datadog APM shows that 80% of that latency is in one specific SQL query. These are complementary layers of visibility.

What is the main contribution of a service mesh to observability compared to manual instrumentation?

Key Ideas

  • **Sidecar** - Envoy proxy is injected into every pod and intercepts all traffic without code changes.
  • **Traffic Management** - VirtualService and DestinationRule provide granular control: canary 5%/95%, A/B testing, header-based routing, circuit breaking.
  • **mTLS + Observability** - mutual authentication of all calls and automatic L7 metrics/traces for every service without instrumentation.

Related Topics

Service mesh integrates with the broader observability and security stack:

  • K8s: Advanced Patterns - Istio is installed with istioctl or Helm; sidecar injection is enabled per Kubernetes namespace.
  • Distributed Tracing - Envoy creates a span for every inter-service call it proxies; the application must forward the incoming trace headers so those spans join into one trace.

Questions for Reflection

  • When is a service mesh justified, and when does it add unnecessary complexity? At what number of services does it make sense?
  • If OpenTelemetry SDK is already in all services - what additional value does Istio provide?
  • How does mTLS in Istio change the network security model compared to traditional firewall rules?

Related Lessons

  • net-49-service-mesh
  • dist-12-consistency