Software Engineering

Code Metrics

In 2019, Microsoft analyzed the Windows codebase using code hotspot metrics. The finding: 20% of files contained 80% of the defects. These files are hotspots, combining high cyclomatic complexity with high churn. Targeted refactoring of just that 20% reduced the defect rate by 35% at minimal cost.

**SonarQube** (10K+ enterprise deployments): CC, coupling, duplication - core metrics in the Quality Gate before production deploy
**CodeScene** (Adam Tornhill): behavioral code analysis via git history - used at Spotify, Ericsson to prioritize tech debt
**Google's Code Health**: internal code quality measurement program correlates with engineer retention - bad code literally drives developers away

Cyclomatic Complexity: Complexity as Path Count

NASA's Software Engineering Laboratory studied the correlation between cyclomatic complexity and defects: modules with CC > 10 have 2.5x more defects than modules with CC <= 10. There is a plausible mechanism behind the correlation: high CC means many execution paths, each of which needs its own test case, so some paths are more likely to go untested.

Cyclomatic Complexity (CC) = number of linearly independent paths through the code = E - N + 2P, where E is the number of edges in the control flow graph, N is the number of nodes, and P is the number of connected components (1 for a single function). Simplified formula: CC = 1 + (number of decisions). Each if, else if, while, for, case, &&, || adds 1; a plain else adds nothing. CC 1-10: low complexity. CC 11-20: moderate, attention needed. CC 21-50: high, refactor. CC > 50: untestable.
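Both formulas can be sketched in a few lines. The block below is a simplified illustration, not how ESLint or SonarQube count: it uses Python's standard `ast` module to count only `if`/`elif`, `while`, `for`, and the boolean operators, and ignores other branching constructs such as `match`/`case` and `except`. The function names and the `SAMPLE` snippet are hypothetical.

```python
import ast

# Statement types counted as one decision each (elif appears as a nested ast.If).
DECISION_NODES = (ast.If, ast.While, ast.For)


def cyclomatic_complexity(source: str) -> int:
    """Approximate CC of a code snippet as 1 + number of decision points."""
    decisions = 0
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, DECISION_NODES):
            decisions += 1
        elif isinstance(node, ast.BoolOp):
            # `a and b and c` holds 3 operands but only 2 operators.
            decisions += len(node.values) - 1
    return 1 + decisions


def cc_from_graph(edges: int, nodes: int, components: int = 1) -> int:
    """CC = E - N + 2P for a control flow graph."""
    return edges - nodes + 2 * components


SAMPLE = '''
def shipping_cost(weight, express, member):
    if weight <= 0:
        raise ValueError("weight must be positive")
    cost = 5
    if weight > 10 and not member:
        cost += 10
    elif express or weight > 5:
        cost += 4
    for _ in range(2):
        cost += 1
    return cost
'''

# Decisions: if, if, `and`, elif, `or`, for = 6, so CC = 7.
sample_cc = cyclomatic_complexity(SAMPLE)

# A single if/else has 4 nodes (condition, then, else, exit) and 4 edges.
if_else_cc = cc_from_graph(edges=4, nodes=4)
```

Here `sample_cc` is 7 and `if_else_cc` is 2: one decision plus one, as the simplified formula predicts.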

CC is the number of test cases that basis path testing calls for, and an upper bound on the number needed for full branch coverage. CC = 15 means planning for up to 15 tests, which links the metric directly to the cost of testing. Google's Testing Blog: functions with CC > 6 statistically require reworked tests on every change. Tools: ESLint (complexity rule), SonarQube, flake8 (via the mccabe plugin), Java's PMD.

A function has CC = 15. How many test cases does basis path testing call for, and can full branch coverage need more than that?

Coupling: The Cost of Dependencies

Michael Feathers in Working Effectively with Legacy Code: 'A highly coupled module cannot be tested in isolation.' Coupling is the degree of dependency between modules. The higher the coupling, the more a change in one module propagates through the system - the ripple effect.

Afferent coupling (Ca): how many other modules depend on this one. High Ca = a responsible module; changes to it affect many others. Efferent coupling (Ce): how many other modules this one uses. High Ce = a dependent module, vulnerable to changes in its dependencies. Instability I = Ce / (Ca + Ce): 0 = maximally stable, 1 = maximally unstable. Stable modules should not depend on unstable ones.
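As a minimal sketch, the three values can be derived from a plain map of module to the modules it uses. The module names below are hypothetical, and treating an isolated module (Ca + Ce = 0) as I = 0.0 is a convention chosen here, since the formula leaves that case undefined.

```python
from collections import defaultdict


def coupling_metrics(deps: dict[str, set[str]]) -> dict[str, tuple[int, int, float]]:
    """Return {module: (Ca, Ce, I)} for a map of module -> modules it uses."""
    modules = set(deps) | {used for targets in deps.values() for used in targets}
    afferent = defaultdict(int)
    for module, targets in deps.items():
        for used in targets:
            if used != module:  # ignore self-references
                afferent[used] += 1

    result = {}
    for module in modules:
        ce = len(deps.get(module, set()) - {module})
        ca = afferent[module]
        # Convention (assumption): an isolated module gets I = 0.0.
        instability = ce / (ca + ce) if ca + ce else 0.0
        result[module] = (ca, ce, instability)
    return result


# Hypothetical dependency map.
DEPS = {
    "CheckoutController": {"OrderService", "PaymentGateway"},
    "OrderService": {"PaymentGateway", "Money"},
    "PaymentGateway": {"Money"},
    "Money": set(),
}

metrics = coupling_metrics(DEPS)
# Money:              Ca=2, Ce=0, I=0.0 (maximally stable)
# CheckoutController: Ca=0, Ce=2, I=1.0 (maximally unstable)
```

In this map every dependency points from a less stable module to a more stable one, with `Money` as the stable core.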

Stable Dependencies Principle (SDP) - one of Uncle Bob's component coupling principles: a module should depend only on more stable modules. Its companion, the Acyclic Dependencies Principle, requires the dependency graph to be a DAG (Directed Acyclic Graph): in a cycle every module depends, directly or indirectly, on every other, so no stability hierarchy can exist. Detection tools: dependency-cruiser (JS), NDepend (.NET), Structure101 (Java).
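Both checks are small graph routines. The sketch below is an illustration only and does not reflect how the tools above work internally: `find_cycle` is a depth-first search that returns one cycle if the graph is not a DAG, and `sdp_violations` lists edges that point from a more stable module to a less stable one. The module names and instability values are hypothetical.

```python
def find_cycle(deps):
    """Return one dependency cycle as a list of modules, or None for a DAG."""
    WHITE, GRAY, BLACK = 0, 1, 2  # unvisited, on the current path, finished
    color = {}
    path = []

    def visit(module):
        color[module] = GRAY
        path.append(module)
        for used in sorted(deps.get(module, ())):
            state = color.get(used, WHITE)
            if state == GRAY:
                # Back edge: `used` is already on the current path.
                return path[path.index(used):] + [used]
            if state == WHITE:
                cycle = visit(used)
                if cycle:
                    return cycle
        path.pop()
        color[module] = BLACK
        return None

    for module in sorted(deps):
        if color.get(module, WHITE) == WHITE:
            cycle = visit(module)
            if cycle:
                return cycle
    return None


def sdp_violations(deps, instability):
    """List (user, used) edges where the used module is less stable than its user."""
    return sorted(
        (user, used)
        for user, targets in deps.items()
        for used in targets
        if instability[used] > instability[user]
    )


# Hypothetical graphs and instability values.
cyclic = {"a": {"b"}, "b": {"c"}, "c": {"a"}}
layered = {"Billing": {"ReportUI", "Core"}, "ReportUI": {"Core"}, "Core": set()}
instability = {"Billing": 0.2, "ReportUI": 0.9, "Core": 0.0}

cycle = find_cycle(cyclic)                         # ['a', 'b', 'c', 'a']
violations = sdp_violations(layered, instability)  # [('Billing', 'ReportUI')]
```

The recursive search is fine for a sketch; a very deep dependency chain would call for an iterative version.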

OrderService has Ce = 12 (depends on 12 modules) and Ca = 2 (2 modules depend on it). What is Instability I?

Cohesion: A Module That Does One Thing Well

Cohesion is the flip side of coupling: how closely the methods and fields within a module relate to each other. High cohesion = methods work with the same data and implement one concept. Low cohesion = a 'utility class' with unrelated functions, a God object. Rule: if you cannot name a class or module without using AND, cohesion is low.

LCOM (Lack of Cohesion of Methods) - a formal metric. LCOM4: the number of connected components in the graph where vertices are methods and an edge joins two methods that share a field or call one another. LCOM4 = 1: one connected component, high cohesion. LCOM4 > 1: disconnected groups of methods - the class can be split. Example: a class with read/write methods (both use the connection) and parse/format methods (both use only a shared format setting) - LCOM4 = 2.
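LCOM4 can be computed with a small union-find. This sketch takes a hand-written map of method to the fields it touches rather than parsing real source code, and the optional `calls` map covers methods that call each other. The field names `connection` and `date_format` are assumptions made so that the read/write and parse/format groups each share one field.

```python
def lcom4(method_fields, calls=None):
    """Count connected components among methods linked by shared fields or calls."""
    calls = calls or {}
    methods = list(method_fields)
    parent = {m: m for m in methods}

    def find(m):
        # Follow parent links to the representative, compressing the path.
        while parent[m] != m:
            parent[m] = parent[parent[m]]
            m = parent[m]
        return m

    def union(a, b):
        parent[find(a)] = find(b)

    # Edge: two methods touch at least one common field.
    for i, a in enumerate(methods):
        for b in methods[i + 1:]:
            if method_fields[a] & method_fields[b]:
                union(a, b)

    # Edge: one method calls another method of the same class.
    for caller, callees in calls.items():
        for callee in callees:
            if caller in parent and callee in parent:
                union(caller, callee)

    return len({find(m) for m in methods})


# Hypothetical class mixing I/O and string handling.
mixed = {
    "read": {"connection"},
    "write": {"connection"},
    "parse": {"date_format"},
    "format": {"date_format"},
}
mixed_score = lcom4(mixed)  # 2: {read, write} and {parse, format}
```

Splitting the class along the two components would give two classes that each score 1.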

High cohesion + Low coupling - the two wings of good design. They are interconnected: when a class does one thing (high cohesion), it has fewer reasons to depend on other modules (low coupling). A God object violates both principles simultaneously: does everything (low cohesion) and depends on everything (high coupling). SRP in SOLID is the high cohesion principle expressed in terms of business reasons for change.

The class OrderProcessor contains methods: processPayment(), sendEmail(), generatePDF(), updateInventory(). What does this say about cohesion?

Code Churn: Change History as a Risk Metric

Michael Feathers and Adam Tornhill (CodeScene): the correlation between a file's change frequency (churn) and defect count is one of the strongest in software engineering research. A frequently changed file with high complexity is a hotspot: the zone of maximum technical debt and maximum risk.

Churn = number of changes to a file over a period. Churn alone isn't bad: actively developed code changes. But churn + complexity creates a hotspot. Visualization: CodeScene (Adam Tornhill) draws a hotspot map of nested circles where circle radius = complexity and color = churn. Large red circles = refactoring priority. This approach directs effort to where the ROI is highest.
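A rough version of this ranking needs only a file list from git and a complexity number per file. The sketch below assumes the log text comes from `git log --name-only --pretty=format:` (one path per line, blank lines between commits) and that complexity values are supplied by some other tool. Scoring a file as churn times complexity is a simple heuristic chosen for illustration, not CodeScene's algorithm. Paths and numbers are hypothetical.

```python
from collections import Counter


def churn_from_log(log_text: str) -> Counter:
    """Count how many commits touched each file.

    Expects one path per line with blank lines between commits.
    Renamed files are counted under each name separately.
    """
    return Counter(line.strip() for line in log_text.splitlines() if line.strip())


def rank_hotspots(churn, complexity, top=3):
    """Rank files by churn * complexity (a heuristic score, an assumption)."""
    scores = {
        path: churn[path] * complexity[path]
        for path in churn
        if path in complexity
    }
    # Highest score first; ties broken alphabetically for stable output.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))[:top]


# Hypothetical log of three commits.
LOG = """src/order.ts
src/payment.ts

src/order.ts

src/order.ts
src/auth.ts
"""

# Hypothetical complexity values from a separate analysis step.
COMPLEXITY = {"src/order.ts": 24, "src/payment.ts": 30, "src/auth.ts": 3}

churn = churn_from_log(LOG)
hotspots = rank_hotspots(churn, COMPLEXITY)
# [('src/order.ts', 72), ('src/payment.ts', 30), ('src/auth.ts', 3)]
```

A product is only one way to combine the two signals; a real analysis would more likely set thresholds on each separately, so that neither a very high churn nor a very high complexity dominates on its own.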

Temporal coupling analysis over git log is a powerful way to discover hidden dependencies. If order.service.ts and payment.service.ts change together 80% of the time, they have an implicit relationship that is not expressed in code. Adam Tornhill in Software Design X-Rays calls this 'change coupling' and shows it often predicts bugs more accurately than static analysis.
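Change coupling can be estimated from commit contents alone. In this sketch each commit is a list of file paths, and the coupling of a pair is the share of commits touching either file that touched both. That is one possible definition, assumed here for simplicity; tools may normalize differently. The `min_shared` threshold and the sample history are hypothetical.

```python
from collections import Counter
from itertools import combinations


def change_coupling(commits, min_shared=2):
    """Return {(a, b): share of commits touching a or b that touched both}."""
    revisions = Counter()  # commits per file
    shared = Counter()     # commits per file pair
    for files in commits:
        unique = sorted(set(files))
        revisions.update(unique)
        shared.update(combinations(unique, 2))

    result = {}
    for (a, b), together in shared.items():
        if together >= min_shared:  # drop pairs that co-changed only by chance
            either = revisions[a] + revisions[b] - together
            result[(a, b)] = together / either
    return result


# Hypothetical history of six commits.
COMMITS = [
    ["order.service.ts", "payment.service.ts"],
    ["order.service.ts", "payment.service.ts"],
    ["order.service.ts", "payment.service.ts", "README.md"],
    ["order.service.ts", "payment.service.ts"],
    ["order.service.ts"],
    ["README.md"],
]

coupling = change_coupling(COMMITS)
# {('order.service.ts', 'payment.service.ts'): 0.8}
```

The two services changed together in 4 of the 5 commits that touched either one, which gives 0.8. The README pairs are filtered out because they co-changed only once.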

Misconception: code metrics are objective - high CC is always bad, low coupling is always good.

Reality: metrics are indicators that need context. High CC in a state machine is normal, and zero coupling can mean duplication.

A state machine with 50 states has CC = 50 by necessity; this is not technical debt. Complete decoupling is sometimes achieved through code duplication, which is worse than coupling. Metrics point to places that deserve attention; they don't auto-diagnose.

auth.service.ts has churn=95 (changed 95 times in 6 months) but CC=3. Is it a hotspot?

Key Ideas

**Cyclomatic Complexity**: independent path count = test cases for basis path testing (an upper bound for branch coverage); modules with CC > 10 have 2.5x more defects per NASA data
**Coupling**: afferent (Ca) / efferent (Ce) / instability I = Ce/(Ca+Ce); dependencies should point toward more stable modules
**Cohesion**: LCOM4 = number of connected components in method graph; >1 means the class can be split; high cohesion + low coupling = good design
**Churn**: hotspot = high churn + high complexity; temporal coupling via git log reveals hidden dependencies

Questions for Reflection

Can a class have perfect metrics (CC=2, coupling=0, LCOM4=1) but poor design? Give an example.
How do you use churn analysis in sprint planning? What decisions does it help make?
Temporal coupling: two files change together 90% of the time. Is this always bad or sometimes normal? When is each case?

Related Lessons

se-15 — Clean code principles are the qualitative foundation; metrics measure them quantitatively
se-09 — Refactoring is applied where metrics reveal problems
se-11 — Code review uses metrics as objective criteria
se-05 — SOLID principles reduce coupling and increase cohesion - metrics confirm this
alg-02 — Cyclomatic complexity is directly tied to number of paths - a graph theory concept
stat-05-hypothesis