Parallel Computing
CSP and Channels: Go
Go was designed by Google engineers who were frustrated by C++'s complexity and Java's verbosity when building highly concurrent networked servers. The language's defining features - goroutines and channels - were not a theoretical exercise but a direct response to Google's scaling problems: serving very large numbers of concurrent connections on commodity hardware. Today, Kubernetes, Docker, Terraform, and many other cloud infrastructure tools are written in Go. The pattern they share is the same: goroutines for concurrency, channels for communication, select for coordination.
- **Kubernetes** is written almost entirely in Go and uses goroutines extensively in its controller manager, which runs many concurrent control loops (roughly one per resource type) monitoring cluster state - the CSP model maps naturally to the reconciliation loop pattern.
- **Cloudflare's authoritative DNS server** (RRDNS) is written in Go and serves a very large volume of DNS queries; goroutines and channels suit this kind of workload, where a network listener, a cache, and upstream lookups have to be coordinated.
- **Uber** runs many high-throughput Go services built on goroutine-per-request handling; Cadence, the workflow engine Uber open-sourced (later forked as Temporal), is written in Go and provides durable workflow execution across distributed systems.
Goroutines
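A goroutine is a lightweight concurrent unit of execution in Go - similar conceptually to a thread but managed by the Go runtime rather than the OS. The Go scheduler multiplexes many goroutines onto a small number of OS threads (GOMAXPROCS, default = number of logical CPUs). Goroutines yield at blocking operations, and since Go 1.14 the runtime can also preempt them asynchronously, so a tight loop cannot monopolize a thread. A new goroutine starts with a stack of about 2KB, whereas an OS thread typically reserves 1MB or more.
The Go runtime scheduler uses an M:N model: many goroutines multiplexed onto few OS threads. Each scheduling context (a P, of which there are GOMAXPROCS) has a local run queue, and the scheduler uses work stealing: when a local run queue is empty, the thread running it steals goroutines from other queues. This lets a server keep 1 million concurrent goroutines (handling 1 million connections) on an 8-core machine, where one OS thread per connection would run into memory and scheduling limits far earlier, typically somewhere in the thousands to tens of thousands of threads depending on configuration.
Go's goroutine stacks start at 2KB and grow dynamically up to a maximum of 1GB on 64-bit systems (250MB on 32-bit), a limit that can be changed with runtime/debug.SetMaxStack. Stack growth occurs automatically when the current stack is insufficient - the Go runtime copies the stack to a larger allocation. This makes stack overflows far rarer than in languages with small fixed-size stacks, although runaway recursion still crashes the program once the limit is reached.
Why can Go run 1 million goroutines on an 8-core machine when the same machine could sustain only a few thousand OS threads?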
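To make the cost claim concrete, here is a minimal sketch that starts a large number of goroutines and waits for all of them. The function name `spawnAndCount`, the trivial workload, and the figure of 100,000 are illustrative assumptions, not taken from any of the systems mentioned above.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// spawnAndCount starts n goroutines, each of which increments a shared
// atomic counter once, and returns the count after all have finished.
// The name and the workload are hypothetical, chosen only for illustration.
func spawnAndCount(n int) int64 {
	var wg sync.WaitGroup
	var done int64

	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			atomic.AddInt64(&done, 1)
		}()
	}
	wg.Wait() // block until every goroutine has called Done
	return atomic.LoadInt64(&done)
}

func main() {
	fmt.Println(spawnAndCount(100_000)) // prints 100000
}
```

Starting the same number of OS threads would usually be impractical; here the runtime schedules all of the goroutines onto a handful of threads.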
Channels
Channels are the primary communication mechanism in Go, implementing Communicating Sequential Processes (CSP, Hoare 1978). A channel is a typed pipe that goroutines can send to and receive from. Unbuffered channels (make(chan T)) synchronize sender and receiver: the sender blocks until a receiver is ready and vice versa - the rendezvous point. Buffered channels (make(chan T, N)) allow up to N items without blocking the sender.
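The difference between the two channel kinds can be sketched as follows. The helper names `rendezvous` and `fillBuffer` are hypothetical and exist only for this example.

```go
package main

import "fmt"

// rendezvous sends one value over an unbuffered channel. The send inside
// the goroutine cannot complete until the receive below is ready.
func rendezvous() string {
	ch := make(chan string) // unbuffered: capacity 0
	go func() {
		ch <- "hello"
	}()
	return <-ch
}

// fillBuffer sends n values into a buffered channel of capacity n with no
// receiver running, then closes the channel and drains it.
func fillBuffer(n int) []int {
	ch := make(chan int, n)
	for i := 0; i < n; i++ {
		ch <- i // does not block: the buffer still has room
	}
	close(ch)

	var out []int
	for v := range ch { // range stops once the closed channel is empty
		out = append(out, v)
	}
	return out
}

func main() {
	fmt.Println(rendezvous())  // prints hello
	fmt.Println(fillBuffer(3)) // prints [0 1 2]
}
```

If `fillBuffer` used an unbuffered channel, the first send would block forever because nothing is receiving yet, and the runtime would report a deadlock.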
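Go's design philosophy: 'Do not communicate by sharing memory; instead, share memory by communicating.' Instead of protecting shared state with a mutex, pass ownership of data through a channel. By convention, the sending goroutine stops using the data after the send, so the receiving goroutine is the only one accessing it - a transfer of ownership that needs no explicit locking in user code. The compiler does not enforce this convention; a sender that keeps using a pointer after sending it still causes a data race.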
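A minimal sketch of this ownership hand-off follows. The `report` type and the functions `build` and `handOff` are hypothetical names introduced for illustration; the point is only that the producer does not touch the value after sending it.

```go
package main

import "fmt"

// report is a hypothetical payload passed between goroutines.
type report struct {
	lines []string
}

// build creates a report, sends it on out, and never touches it again.
// Giving up access after the send is a convention, not something the
// compiler checks.
func build(out chan<- *report) {
	r := &report{}
	r.lines = append(r.lines, "built by producer")
	out <- r // ownership passes to whoever receives r
}

// handOff runs the producer and then mutates the report in the receiving
// goroutine, which is now the only goroutine holding a reference to it.
func handOff() []string {
	ch := make(chan *report)
	go build(ch)

	r := <-ch
	r.lines = append(r.lines, "finished by consumer")
	return r.lines
}

func main() {
	fmt.Println(handOff()) // prints [built by producer finished by consumer]
}
```

No mutex appears in this code because at any moment only one goroutine uses the report.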
Buffered channel capacity is a performance knob: capacity=0 (synchronous) uses the least memory but forces sender and receiver to meet on every operation; a small capacity such as runtime.NumCPU() lets producers and consumers run slightly out of step, absorbing short latency spikes. Too large a buffer delays the backpressure signal to producers, because they only block once the buffer is full.
What happens when a goroutine sends to an unbuffered channel with no receiver waiting?
Select Statement
The select statement in Go waits on multiple channel operations at once. It blocks until one of its cases can proceed and then runs that case; if a default case is present and no other case is ready, default runs immediately. Select is Go's mechanism for multiplexing channels and implementing timeouts, cancellation, and non-blocking channel operations. If multiple cases are ready simultaneously, select picks one by uniform pseudo-random selection, so no ready case is starved by its position in the statement.
context.Context is the idiomatic Go mechanism for propagating cancellation signals through a call tree. A parent goroutine creates a cancellable context (for example with context.WithCancel or context.WithTimeout), passes it down, and cancellation happens when cancel() is called or the deadline expires. All goroutines receiving this context check ctx.Done() via select - enabling coordinated shutdown without shared state.
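The timeout and non-blocking uses of select can be sketched as below. The helper names `recvWithTimeout` and `trySend` are hypothetical; `time.After` is the standard library call that returns a channel delivering a value after the given duration.

```go
package main

import (
	"fmt"
	"time"
)

// recvWithTimeout waits for a value on ch and gives up after timeout.
// The boolean result reports whether a value was received.
func recvWithTimeout(ch <-chan int, timeout time.Duration) (int, bool) {
	select {
	case v := <-ch:
		return v, true
	case <-time.After(timeout):
		return 0, false
	}
}

// trySend attempts a send without blocking. The default case runs when
// the send cannot proceed immediately.
func trySend(ch chan<- int, v int) bool {
	select {
	case ch <- v:
		return true
	default:
		return false
	}
}

func main() {
	ch := make(chan int, 1)
	fmt.Println(trySend(ch, 1))                           // true: buffer has room
	fmt.Println(trySend(ch, 2))                           // false: buffer is full
	fmt.Println(recvWithTimeout(ch, 50*time.Millisecond)) // 1 true
	fmt.Println(recvWithTimeout(ch, 50*time.Millisecond)) // 0 false
}
```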
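A minimal sketch of coordinated shutdown with a context follows. The `worker` and `runAndCancel` functions, the 1ms ticker, and the 5ms sleep are assumptions made for this example; only the `context`, `sync`, and `time` calls come from the standard library.

```go
package main

import (
	"context"
	"fmt"
	"sync"
	"time"
)

// worker loops until ctx is cancelled, then records its id on stopped.
func worker(ctx context.Context, id int, wg *sync.WaitGroup, stopped chan<- int) {
	defer wg.Done()
	ticker := time.NewTicker(time.Millisecond)
	defer ticker.Stop()

	for {
		select {
		case <-ctx.Done(): // closed when cancel() is called
			stopped <- id
			return
		case <-ticker.C:
			// one unit of simulated work would go here
		}
	}
}

// runAndCancel starts n workers, cancels the shared context, waits for
// all of them to exit, and returns how many reported stopping.
func runAndCancel(n int) int {
	ctx, cancel := context.WithCancel(context.Background())
	var wg sync.WaitGroup
	stopped := make(chan int, n) // buffered so workers never block on exit

	for i := 0; i < n; i++ {
		wg.Add(1)
		go worker(ctx, i, &wg, stopped)
	}

	time.Sleep(5 * time.Millisecond) // let the workers run briefly
	cancel()
	wg.Wait()
	return len(stopped)
}

func main() {
	fmt.Println(runAndCancel(3)) // prints 3
}
```

A single call to `cancel()` reaches every worker because they all select on the same `ctx.Done()` channel.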
What happens in a select statement when multiple channel cases become ready simultaneously?
Fan-In and Fan-Out Patterns
Fan-out distributes work from one source to multiple goroutines in parallel; fan-in merges multiple goroutine outputs back into one channel. Together they form the pipeline pattern: the most natural concurrency structure in Go. A URL downloader fans out to N goroutines (parallel downloads), then fans in results to a single channel for the consumer.
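The fan-out and fan-in structure can be sketched with squaring standing in for a real task such as a download. The function name `sumOfSquares` and the choice of workload are assumptions for illustration.

```go
package main

import (
	"fmt"
	"sync"
)

// sumOfSquares fans the inputs out to `workers` goroutines that all read
// from one shared jobs channel, fans their results into one results
// channel, and returns the sum of the results.
func sumOfSquares(inputs []int, workers int) int {
	jobs := make(chan int)
	results := make(chan int)

	// Fan-out: each worker pulls the next job as soon as it is idle.
	var wg sync.WaitGroup
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for n := range jobs {
				results <- n * n
			}
		}()
	}

	// Feed the jobs, then close the channel so the workers' loops end.
	go func() {
		for _, n := range inputs {
			jobs <- n
		}
		close(jobs)
	}()

	// Close results once every worker has finished, ending the loop below.
	go func() {
		wg.Wait()
		close(results)
	}()

	// Fan-in: a single consumer reads the merged results.
	sum := 0
	for r := range results {
		sum += r
	}
	return sum
}

func main() {
	fmt.Println(sumOfSquares([]int{1, 2, 3, 4}, 3)) // prints 30
}
```

Results arrive in whatever order the workers finish, which is why this sketch combines them with a sum rather than relying on order.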
Go's pipeline pattern is composable: each stage takes input channels and returns output channels. Adding a new processing step means inserting a new function in the chain. This composability, combined with context-based cancellation, is one reason Go is a popular choice for infrastructure tooling such as kubectl, terraform, and the docker CLI, which coordinate many concurrent operations.
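Composable stages might look like the following sketch. The stage names `generate`, `double`, `addOne`, and `collect` are hypothetical, and cancellation via a context is left out to keep the example short.

```go
package main

import "fmt"

// generate is the source stage: it emits nums and closes its output.
func generate(nums ...int) <-chan int {
	out := make(chan int)
	go func() {
		defer close(out)
		for _, n := range nums {
			out <- n
		}
	}()
	return out
}

// double is a stage: it reads from in and emits each value times two.
func double(in <-chan int) <-chan int {
	out := make(chan int)
	go func() {
		defer close(out)
		for n := range in {
			out <- n * 2
		}
	}()
	return out
}

// addOne is another stage with the same shape, so it can be inserted
// anywhere in the chain.
func addOne(in <-chan int) <-chan int {
	out := make(chan int)
	go func() {
		defer close(out)
		for n := range in {
			out <- n + 1
		}
	}()
	return out
}

// collect is the sink stage: it drains in into a slice.
func collect(in <-chan int) []int {
	var out []int
	for n := range in {
		out = append(out, n)
	}
	return out
}

func main() {
	fmt.Println(collect(addOne(double(generate(1, 2, 3))))) // prints [3 5 7]
}
```

Because every stage has the signature `func(<-chan int) <-chan int`, adding or reordering steps only changes the chain in `main`.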
Common misconception: channels are always better than mutexes in Go.
In reality: channels excel at passing ownership of data and coordinating goroutine lifecycles; mutexes are simpler and more efficient for protecting shared state accessed by multiple goroutines (caches, counters, maps).
The Go wiki's guidance on choosing between a mutex and a channel is to use whichever is most expressive and simplest: channels for ownership transfer and goroutine coordination, mutexes for protecting shared data structures such as a plain map or a shared cache.
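For the shared-state case, a mutex-protected map is usually the shorter solution. The sketch below uses a hypothetical `hitCounter` type and a `countConcurrently` helper invented for this example.

```go
package main

import (
	"fmt"
	"sync"
)

// hitCounter is a hypothetical shared counter map guarded by a mutex.
type hitCounter struct {
	mu   sync.Mutex
	hits map[string]int
}

// Inc increments the count for key while holding the lock.
func (c *hitCounter) Inc(key string) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.hits[key]++
}

// Get returns the current count for key while holding the lock.
func (c *hitCounter) Get(key string) int {
	c.mu.Lock()
	defer c.mu.Unlock()
	return c.hits[key]
}

// countConcurrently increments the same key from n goroutines and
// returns the final count.
func countConcurrently(n int) int {
	c := &hitCounter{hits: make(map[string]int)}
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			c.Inc("/home")
		}()
	}
	wg.Wait()
	return c.Get("/home")
}

func main() {
	fmt.Println(countConcurrently(500)) // prints 500
}
```

Doing the same with channels would need a dedicated goroutine that owns the map plus request and reply channels, which is more code for no gain in this case.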
In a Go fan-out pattern, why do multiple workers all read from the same input channel rather than from separate per-worker channels?
Key Ideas
- **Goroutines** start with a ~2KB stack vs. roughly 1MB for an OS thread and are multiplexed by Go's M:N scheduler - enabling millions of concurrent goroutines on a small number of CPU cores, because a goroutine blocked on I/O is parked by the runtime instead of holding an OS thread.
- **Channels** implement CSP: typed, goroutine-safe pipes that transfer data ownership rather than sharing memory - unbuffered channels synchronize sender and receiver; buffered channels decouple them.
- **Fan-out/fan-in** with a shared input channel provides automatic load balancing, since each idle worker pulls the next item; select enables timeout, cancellation, and channel multiplexing with uniform pseudo-random case selection.
Related Topics
CSP channels connect to the actor model and lock-free concurrency:
- Actor Model: Erlang, Akka: Both CSP channels and actors avoid shared state via message passing; actors couple the message queue to a specific entity, while CSP channels are first-class values that exist independently of any goroutine.
- Lock-Free Data Structures: Go channels are not lock-free internally; the runtime guards each channel's queue with a lock. Understanding lock-free techniques helps explain the trade-offs between channels, mutexes, and atomics for producer-consumer patterns, where a channel operation usually costs more than an uncontended mutex.
Questions for Reflection
- Design a concurrent rate limiter in Go that allows at most 100 requests per second globally across all goroutines. What Go primitives would be used and how?
- A Go HTTP server creates one goroutine per request and each request makes 5 downstream API calls. Under 10,000 req/s, how many goroutines are running, and what are the memory implications?
- When would sync.Mutex be preferable to a channel for protecting shared state in a Go program? Give a concrete example.