PostgreSQL
Installation and Configuration
Цели урока
- Install PostgreSQL on Ubuntu, macOS, or RHEL and understand what initdb does
- Configure postgresql.conf using SHOW/SET/ALTER SYSTEM
- Set up pg_hba.conf for secure access control
- Tune five key performance parameters for specific hardware
- Run production-ready PostgreSQL in Docker with volumes and healthchecks
Предварительные знания
A developer installs PostgreSQL, starts the application, and everything works. A month later, traffic grows - queries slow down, connections time out, disk I/O hits 100%. The problem is not PostgreSQL itself; it is that shared_buffers at 128 MB on a server with 32 GB of RAM is a lawnmower engine in a Ferrari.
- **A startup on Heroku** experiences a 10x traffic spike overnight - memory parameters must be reconfigured in minutes without downtime
- **A DevOps engineer** provisions PostgreSQL in Kubernetes - docker-compose, volumes, health checks, custom configuration
- **A developer** receives `FATAL: no pg_hba.conf entry` when connecting from a new IP - understanding pg_hba.conf is the only path forward
- **A team** migrates to a new server - forgot to transfer pg_hba.conf, now nobody can connect
- **A production incident** - work_mem = 512 MB with 200 concurrent complex queries consumed all RAM and the server went down
Michael Stonebraker and the POSTGRES Project
Stonebraker launched POSTGRES at UC Berkeley in 1986 as a successor to Ingres. The core idea was extensibility: user-defined types and rules that no other database offered. In 1996 the project went open-source and added SQL, renaming to PostgreSQL. Stonebraker received the Turing Award in 2014. Today PostgreSQL is the only database in the top five not owned by a corporation.
Installing PostgreSQL
Getting PostgreSQL up and running takes five minutes. A careless installation can cost hours of debugging later. This section covers installing PostgreSQL on any OS, what initdb actually does, and how to manage multiple versions side by side.
**initdb** is the command that creates a database cluster: the directory structure, system catalogs, and configuration files. On Ubuntu/Debian, initdb runs automatically during package installation. On RHEL/Fedora, it must be invoked manually.
**On Ubuntu, multiple clusters can coexist** on the same machine. The pg_lsclusters utility lists all installed clusters:
**pgenv** is a PostgreSQL version manager analogous to nvm for Node.js. Particularly convenient during development when testing against multiple versions: `pgenv install 17.2 && pgenv use 17.2`
What does the initdb command do?
postgresql.conf - the Main Configuration File
**postgresql.conf is the control center of PostgreSQL.** It hosts hundreds of parameters - from the address the server listens on, to buffer sizes and query planner behavior. The file resides in the cluster directory (`/var/lib/postgresql/17/main/postgresql.conf` on Ubuntu).
**There are three ways to change parameters** - and each has its own scope:
**ALTER SYSTEM writes to postgresql.auto.conf**, not to postgresql.conf. Parameters in auto.conf take precedence. If postgresql.conf is edited manually but the value does not apply, check whether it is being overridden in auto.conf.
**listen_addresses = 'localhost'** means PostgreSQL accepts connections only from the local machine. To allow external access, change it to `'*'` or a specific IP - but always configure pg_hba.conf accordingly to maintain security.
ALTER SYSTEM SET work_mem = '64MB' was executed. When does the change take effect?
pg_hba.conf - Who Can Connect
**pg_hba.conf (Host-Based Authentication)** is PostgreSQL's firewall. It determines who can connect, from where, and by what method. Even with a correct password, a connection is rejected if no matching rule exists in pg_hba.conf. This is why `FATAL: no pg_hba.conf entry for host` is one of the most common errors on a first deployment.
| Method | Description | When to use |
|---|---|---|
| trust | No password - unconditional trust | Local development only. NEVER in production. |
| peer | OS username must match PostgreSQL username | Local Unix sockets (sudo -u postgres psql) |
| md5 | Hashed password (legacy) | Legacy compatibility only |
| scram-sha-256 | Modern authentication (since v10) | Production - the de-facto standard |
| cert | Client SSL certificate | Maximum security, no passwords |
| reject | Reject the connection | Explicit denial for specific users or IP ranges |
**Rule order is critical!** PostgreSQL evaluates records top to bottom and applies the FIRST matching rule. If `reject` appears after `trust` for the same address range, the reject rule will never be reached.
**A common first-deployment error:** `FATAL: no pg_hba.conf entry for host "192.168.1.5"`. The client's IP address does not match any rule in pg_hba.conf. Add a rule for that IP or subnet.
pg_hba.conf contains: `host all all 0.0.0.0/0 trust`. What does this mean?
Key Configuration Parameters
Of PostgreSQL's 300+ parameters, five account for 90% of performance. PostgreSQL ships with conservative defaults intentionally - they are sized for a machine with 512 MB of RAM. On a server with 32 GB, that means shared_buffers = 128 MB uses less than 1% of available memory. The result: the server reads from disk constantly instead of from cache.
| Parameter | Default | Recommendation | What it does |
|---|---|---|---|
| shared_buffers | 128 MB | 25% RAM | Size of the main buffer cache. All data passes through it. |
| effective_cache_size | 4 GB | 50-75% RAM | A hint to the planner about available OS page cache. Does not allocate memory! |
| work_mem | 4 MB | 32-256 MB | Memory per sort/hash operation. Caution: multiplies across operations and sessions. |
| maintenance_work_mem | 64 MB | 512 MB - 2 GB | Memory for VACUUM, CREATE INDEX, ALTER TABLE. |
| max_connections | 100 | 50-200 | Maximum concurrent connections. Each connection uses ~5-10 MB of RAM. |
**work_mem is the most deceptive parameter.** A single complex query can consume work_mem multiple times - once per sort or hash operation. With work_mem = 256 MB and a query containing four sort operations, that is 1 GB of RAM for a single query. Multiply by 50 concurrent queries and the total reaches 50 GB.
**effective_cache_size does not allocate memory!** It is a hint for the query planner: "when choosing between an Index Scan and a Sequential Scan, assume that ~12 GB of data may already be cached by the OS." Increasing this value makes the planner more willing to use indexes.
**pgtune** (https://pgtune.leopard.in.ua/) is an online PostgreSQL tuning calculator. Enter server specifications and workload type (OLTP/OLAP/Mixed) to receive optimized values for all key parameters.
A server has 32 GB RAM, SSD storage, and serves an OLTP workload. What is the optimal value for shared_buffers?
PostgreSQL in Docker
**Docker is the fastest way to run PostgreSQL for development.** One command, a working server in ten seconds. There are, however, a few traps worth knowing about before starting - the kind that silently destroy data.
**Without a volume, all data will be lost!** The container stores its data internally. Running `docker rm pg-dev` will permanently destroy every database, table, and row. Always mount a volume for the `/var/lib/postgresql/data` directory.
**shm_size** is an important Docker parameter. By default, Docker allocates only 64 MB of shared memory, but PostgreSQL may require more - especially when shared_buffers is increased. The error `could not resize shared memory segment` is the signal to increase shm_size.
**For Mac M1/M2:** when building a Docker image for deployment on an x86_64 server, always specify the target platform: `docker build --platform linux/amd64`. Without this flag, the result is an `exec format error` on the server.
PostgreSQL's default settings are fine for production - just install and run
The defaults are tuned for minimal resources: shared_buffers=128MB, work_mem=4MB, max_connections=100. Production deployments require tuning these values to match the actual server hardware.
PostgreSQL intentionally ships with conservative defaults so it can start even on a machine with 512 MB of RAM. On a server with 32 GB, those defaults use less than 1% of available capacity. Without tuning, shared_buffers is so small that PostgreSQL reads from disk constantly, and a tiny work_mem causes sort operations to spill to disk.
PostgreSQL was started with `docker run` without a `-v` volume flag. What happens when `docker rm` is run?
Key Takeaways
- **Installation:** apt/brew + initdb creates the cluster. Use pg_lsclusters to manage multiple versions on Ubuntu
- **postgresql.conf** is the control center. SHOW/SET to inspect values, ALTER SYSTEM for persistent changes, pg_reload_conf() to apply without a restart
- **pg_hba.conf** is PostgreSQL's firewall. Rule order matters: the first match wins. Use scram-sha-256 in production, trust only locally
- **Five key parameters:** shared_buffers (25% RAM), effective_cache_size (75% RAM), work_mem (handle with care), maintenance_work_mem, max_connections
- **Docker:** always mount a volume, use healthcheck, pass parameters via command, shm_size >= 128 MB
- shared_buffers = 128 MB on 32 GB RAM - that is 0.4% utilization of the server's capacity
Related Topics
PostgreSQL configuration is the starting point for tuning and security:
- Data Types — Next step: what data types PostgreSQL offers and how to choose among them
- Memory Tuning — shared_buffers, work_mem and other memory parameters in detail
- Security — Advanced pg_hba.conf configuration, SSL, and Row-Level Security
Вопросы для размышления
- With 100 microservices each having a connection pool of 10 - will max_connections = 100 be sufficient? What needs to change?
- Why is the recommended shared_buffers 25% of RAM rather than 75%? Where does the rest of the memory go?
- After migrating to a new server, pg_hba.conf was not transferred. What error appears, and how is it fixed?
Связанные уроки
- pg-01-intro — PostgreSQL architecture - context for understanding configuration
- pg-03-data-types — Next step: PostgreSQL data types
- pg-43-tuning-memory — Deep dive into shared_buffers, work_mem, buffer cache
- pg-48-security — Advanced pg_hba.conf, SSL, Row-Level Security
- pg-45-tuning-connections — max_connections, PgBouncer - connection pooling in production
- db-01-intro — Foundational understanding of why a DBMS is needed
- db-03-acid