GitOps: ArgoCD and Flux
In 2019, an engineer at a major bank accidentally ran `kubectl delete namespace production` on the live cluster. Recovery took 4 hours because nobody knew the exact state of the infrastructure before the incident. With GitOps, restoring the declared resources would have come down to `git log` to find the last known-good state and `argocd app sync` to reapply it - minutes instead of hours.
- **Intuit** migrated 4000+ microservices to ArgoCD - every production deployment requires a PR approval, and the full history of all changes is stored in Git with the engineer's name and reason.
- **Weaveworks** (the company that coined the term GitOps) used Flux to update firmware on industrial IoT devices - the same reconciliation loop as for Kubernetes, applied to edge nodes.
- **CERN** uses Flux to manage thousands of clusters for experimental equipment - any configuration change to particle detectors goes through code review in Git.
Declarative Model: Git as the Single Source of Truth
In 2017 the Weaveworks team coined the term GitOps: the entire desired state of the infrastructure lives in a Git repository, and an automated agent continuously brings the real cluster state in line with the repository. Nobody runs `kubectl apply` by hand; the system itself checks that the cluster matches Git.
**4 GitOps principles (OpenGitOps):** (1) Declarative - the desired state is described, not scripted. (2) Versioned and immutable - all manifests live in Git with full history, which makes rollback possible. (3) Pulled automatically - agents fetch approved changes from the repository and apply them without manual steps. (4) Continuously reconciled - agents keep comparing actual state with desired state and correct any divergence. **Difference from traditional CI/CD:** a traditional pipeline pushes to the cluster and therefore has to hold cluster credentials; a GitOps agent runs inside the cluster and pulls from Git, so CI never needs direct access to the cluster.
What is the key architectural security difference between GitOps and traditional CI/CD?
Reconciliation Loop: Continuous Synchronization
Reconciliation is the heart of GitOps. ArgoCD and Flux continuously compare desired state (Git) with actual state (cluster) and apply the difference. This is not a one-time deploy - it is an infinite correction cycle running every few minutes.
**Reconciliation in ArgoCD:** by default every 3 minutes (or immediately via webhook) ArgoCD compares Live State (real Kubernetes objects) with Desired State (Git). The difference is a diff. With automated sync and `selfHeal: true` the diff is applied automatically; without it, the application is only marked OutOfSync. **Flux reconciliation:** the GitRepository controller watches Git, the Kustomization/HelmRelease controllers apply changes. Each controller is independent. **What is compared:** not just resource specs, but labels and annotations too. The `metadata.managedFields` field, maintained by the Kubernetes API server, records which manager owns each field and helps tell GitOps-managed fields from those set by other controllers.
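One pass of the loop described above can be sketched in a few lines of Python. This is a toy model under stated assumptions, not ArgoCD's or Flux's actual implementation: state is represented as nested dictionaries, only fields declared in Git are compared, and the names `diff`, `apply` and `reconcile` are hypothetical.

```python
from copy import deepcopy


def diff(desired: dict, live: dict, prefix: str = "") -> list[str]:
    """Return the paths of Git-declared fields whose live value differs."""
    paths = []
    for key, want in desired.items():
        path = f"{prefix}/{key}"
        have = live.get(key)
        if isinstance(want, dict) and isinstance(have, dict):
            paths.extend(diff(want, have, path))
        elif want != have:
            paths.append(path)
    return paths


def apply(desired: dict, live: dict) -> None:
    """Overwrite the fields that Git declares; leave all other live fields alone."""
    for key, want in desired.items():
        if isinstance(want, dict) and isinstance(live.get(key), dict):
            apply(want, live[key])
        else:
            live[key] = deepcopy(want)


def reconcile(desired: dict, live: dict, self_heal: bool) -> tuple[str, dict]:
    """One pass of the loop: compare, then optionally correct the live state."""
    if not diff(desired, live):
        return "Synced", live
    if not self_heal:
        return "OutOfSync", live  # drift is reported but left in place
    healed = deepcopy(live)
    apply(desired, healed)
    return "Synced", healed


# Hypothetical example data: Git declares 2 replicas, someone edited the cluster to 3.
git_state = {"spec": {"replicas": 2, "image": "web:1.4"}}
cluster_state = {"spec": {"replicas": 3, "image": "web:1.4"}, "status": {"ready": 3}}

status, cluster_state = reconcile(git_state, cluster_state, self_heal=True)
print(status, cluster_state["spec"]["replicas"])  # Synced 2
```

A real agent runs this comparison on a timer or on a webhook event and talks to the Kubernetes API instead of mutating a dictionary; the sketch only shows the compare-then-correct shape of a single pass.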
An operator manually changed `replicas: 3` in a deployment via `kubectl edit` (Git specifies `replicas: 2`). ArgoCD has `selfHeal: true`. What happens?
Drift Detection: Catching Configuration Divergence
Configuration drift occurs when the real infrastructure state gradually diverges from the documented state. In traditional teams this accumulates over years: 'we changed that manually in prod three years ago and forgot.' GitOps solves this at the system level: any divergence shows up as Sync Status = OutOfSync at the next comparison.
**ArgoCD Sync Status:** Synced - the cluster matches Git. OutOfSync - there is a divergence (diff). Unknown - the comparison could not be performed. **Health Status:** Healthy, Progressing, Degraded, Suspended, Missing, Unknown. **What causes drift:** manual `kubectl` changes, changes made by controllers (for example, HPA modifying replicas), `kubectl rollout undo`. **Tools:** `argocd app diff my-app` shows the exact diff; the ArgoCD UI highlights divergences visually.
HPA constantly changes `spec.replicas` in a Deployment. ArgoCD sees this as OutOfSync and resets to 3 replicas from Git. How can this be fixed?
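One common way to handle a field that another controller legitimately owns is to exclude it from the comparison; ArgoCD exposes this idea through the `ignoreDifferences` setting of an Application, which accepts JSON pointers such as `/spec/replicas` (another common approach is to leave `replicas` out of the manifest in Git altogether). The Python below is only a toy model of such a comparison, assuming state is held in nested dictionaries; `find_drift` and `sync_status` are hypothetical names, not part of any tool.

```python
def find_drift(desired, live, ignore=(), prefix=""):
    """List JSON-pointer-style paths where live differs from desired, skipping ignored paths."""
    drift = []
    for key in sorted(set(desired) | set(live)):
        path = f"{prefix}/{key}"
        if path in ignore:
            continue  # this field is owned by another controller
        want, have = desired.get(key), live.get(key)
        if isinstance(want, dict) and isinstance(have, dict):
            drift.extend(find_drift(want, have, ignore, path))
        elif want != have:
            drift.append(path)
    return drift


def sync_status(desired, live, ignore=()):
    """Map the drift list onto the two main sync states."""
    return "OutOfSync" if find_drift(desired, live, ignore) else "Synced"


# Hypothetical example data: Git says 3 replicas, the HPA has scaled the Deployment to 7.
git_spec = {"spec": {"replicas": 3, "template": {"image": "api:2.0"}}}
live_spec = {"spec": {"replicas": 7, "template": {"image": "api:2.0"}}}

print(find_drift(git_spec, live_spec))  # ['/spec/replicas']
print(sync_status(git_spec, live_spec))  # OutOfSync
print(sync_status(git_spec, live_spec, ignore={"/spec/replicas"}))  # Synced
```

With the replica count ignored, the HPA and the GitOps agent stop fighting over the same field, while any other change to the Deployment is still reported as drift.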
Promotion: Advancing Changes Between Environments
GitOps changes not only deployment but also how changes are promoted between staging and production. Instead of 'run the pipeline with the ENV=prod parameter', you create a Pull Request from the staging branch to the prod branch. Code review becomes deployment review. Rollback is a `git revert`.
**Promotion strategies:** (1) Branch-based: `main` -> staging, `release` -> prod. ArgoCD watches different branches. (2) Directory-based: `clusters/staging/` and `clusters/prod/` in one repository. Promotion is a PR copying changes. (3) Kustomize overlays: `base/` with shared manifests, `overlays/staging/` and `overlays/prod/` with patches. (4) ArgoCD ApplicationSet: automatically creates an Application per environment from a template. **Image promotion:** Flux Image Automation automatically updates Docker image tags in Git when a new image appears in the registry.
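The overlay strategy and the idea of promotion as a reviewed change can be illustrated with a small Python sketch. It is a simplified model, not how Kustomize works internally: manifests are plain dictionaries, the merge replaces lists wholesale (real strategic merge patches have richer rules), and the `overlay` function, the `shop` image and the patch contents are all hypothetical.

```python
from copy import deepcopy


def overlay(base: dict, patch: dict) -> dict:
    """Return base with patch merged on top; nested dicts merge, other values are replaced."""
    result = deepcopy(base)
    for key, value in patch.items():
        if isinstance(value, dict) and isinstance(result.get(key), dict):
            result[key] = overlay(result[key], value)
        else:
            result[key] = deepcopy(value)
    return result


# base/ holds the shared manifest; each environment contributes only its differences.
base = {"spec": {"replicas": 1, "image": "shop:1.8.0", "resources": {"cpu": "250m"}}}
patches = {
    "staging": {"spec": {"image": "shop:1.9.0"}},
    "prod": {"spec": {"replicas": 4, "resources": {"cpu": "1"}}},
}

staging = overlay(base, patches["staging"])
prod_before = overlay(base, patches["prod"])

# Promotion: copy the image tag already running in staging into the prod patch.
# In a real repository this one-line change would be the content of the PR.
patches["prod"]["spec"]["image"] = staging["spec"]["image"]
prod_after = overlay(base, patches["prod"])

print(prod_before["spec"]["image"], "->", prod_after["spec"]["image"])  # shop:1.8.0 -> shop:1.9.0
```

The production-specific settings (4 replicas, more CPU) survive the promotion because they live in the prod patch, and only the image tag moves between environments.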
Misconception: GitOps only applies to Kubernetes and containers.
In fact, GitOps is an infrastructure management pattern that uses Git as the source of truth, and it is applicable to any IaC system: Terraform, Ansible, Crossplane, serverless configurations.
The Flux ecosystem includes tf-controller for reconciling Terraform, and Flux integrates with cloud services such as AWS and Azure. The reconciliation loop principle applies to any system that supports a declarative description of state.
A team uses GitOps with ArgoCD. A production deployment needs to be rolled back urgently. What is the correct GitOps way to do it?
Key Ideas
- **GitOps pull model** eliminates the need to give CI servers access to the cluster - the agent pulls desired state from Git.
- **Reconciliation loop** continuously compares Git with the cluster and automatically corrects divergences - manual kubectl changes are reverted with selfHeal.
- **Rollback in GitOps** = `git revert` - history is preserved, the change is documented, ArgoCD/Flux apply it automatically.
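Why a revert preserves history can be shown with a toy model in Python. It makes strong simplifying assumptions: the repository is a list of commits, each commit records a single desired image tag, and only the most recent commit is reverted (in which case the resulting state equals the state of the commit before it). The `Commit` class, the `revert_head` function and the `checkout` image are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Commit:
    message: str
    image: str  # the desired image tag recorded in this commit


def revert_head(history: list[Commit]) -> list[Commit]:
    """Append a commit that restores the state from before the latest commit."""
    if len(history) < 2:
        raise ValueError("nothing to revert to")
    bad, previous = history[-1], history[-2]
    return history + [Commit(f'Revert "{bad.message}"', previous.image)]


history = [
    Commit("Deploy checkout 3.1", "checkout:3.1"),
    Commit("Deploy checkout 3.2", "checkout:3.2"),  # the faulty release
]
history = revert_head(history)

desired = history[-1].image  # the agent always reconciles towards the newest commit
print(len(history), desired)  # 3 checkout:3.1
print(history[-1].message)  # Revert "Deploy checkout 3.2"
```

Nothing is deleted: the faulty release stays in the log, followed by a commit that explains the rollback, and the agent simply reconciles the cluster towards the newest commit.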
Related Topics
GitOps builds on CI/CD pipelines and IaC tooling:
- CI/CD Pipelines - CI builds images and updates tags in Git; the GitOps agent applies changes to the cluster
- Terraform: IaC Basics - GitOps principles apply to Terraform via tf-controller or Atlantis
Questions for Reflection
- How should secrets management be organized in GitOps when secrets cannot be stored in Git?
- In what scenarios can automatic selfHeal be dangerous, and when should it be disabled?
- How does GitOps change the incident response process compared to a traditional kubectl-based approach?