Transport Debugging: tcpdump, Wireshark, curl
'Service is not working.' That is often the only information monitoring provides. Without debugging tools, hours go into testing hypotheses. With tcpdump, it takes 30 seconds to see that SYN packets reach the server but no SYN-ACK comes back: a firewall rule changed 10 minutes ago. Incident resolved by the tool, not by intuition.
- **Cloudflare** publicly described an incident where tcpdump revealed a BGP routing loop: packets were circulating between two data centers. This could not have been diagnosed from application logs alone; it took packet capture.
- **GitHub** uses Wireshark analysis to diagnose Git protocol issues. It is especially useful for problems with large pack files and the smart HTTP protocol.
- **Netflix** enabled SSLKEYLOGFILE in development environments for Wireshark analysis of HTTPS traffic between services, which accelerated debugging of TLS-related issues.
tcpdump: Packet Capture
tcpdump captures raw network packets at the kernel level, below TLS decryption. It shows TCP handshakes (SYN, SYN-ACK, ACK), RST packets (connection refusals), retransmissions, and connection teardowns. Essential for diagnosing connectivity problems that the application never sees.
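tcpdump requires root or the CAP_NET_RAW capability (often granted together with CAP_NET_ADMIN). In Kubernetes, use kubectl debug with an ephemeral container, for example one based on the netshoot image. On AWS EC2, VPC Traffic Mirroring provides packet-level visibility without installing an agent on instances; VPC Flow Logs record only flow metadata (addresses, ports, byte counts, accept/reject decisions), not packet contents.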
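When a capture is not yet available, a plain TCP connection attempt already separates the main outcomes that tcpdump would show. The sketch below is illustrative only: probe_tcp is a hypothetical helper, and the mapping from each result to a cause is a rule of thumb, not a diagnosis.

```python
import errno
import socket


def probe_tcp(host: str, port: int, timeout: float = 3.0) -> str:
    """Classify one TCP connection attempt.

    Returns "open", "refused", "timeout", or "error: <reason>".
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.settimeout(timeout)
    try:
        sock.connect((host, port))
        return "open"        # handshake completed: SYN, SYN-ACK, ACK
    except ConnectionRefusedError:
        return "refused"     # peer answered with RST: host reachable, nothing listening
    except socket.timeout:
        return "timeout"     # no SYN-ACK within the timeout: often a silent drop on the path
    except OSError as exc:
        # Anything else (unreachable network, name resolution failure, ...)
        return f"error: {errno.errorcode.get(exc.errno, exc)}"
    finally:
        sock.close()
```

A "timeout" result is the case worth confirming with tcpdump on the server side: if the SYN arrives there and no SYN-ACK leaves, the drop happens on the host or its firewall rather than on the path.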
tcpdump shows SYN packets from the client but no SYN-ACK from the server. What does this indicate?
Wireshark: Packet Analysis
Wireshark provides GUI-based deep packet inspection. It decodes protocols (HTTP, gRPC, Kafka wire protocol), shows TCP stream reconstruction, and can decrypt TLS using SSLKEYLOGFILE. Wireshark display filters are more powerful than tcpdump BPF filters for application-level analysis.
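The SSLKEYLOGFILE environment variable is honored by Firefox (NSS), Chrome (BoringSSL), and curl; Node.js offers the equivalent --tls-keylog flag, and OpenSSL-based applications can write the same format through OpenSSL's key-log callback. For clients that honor the variable, setting it in development environments allows Wireshark to decrypt HTTPS and gRPC traffic without modifying application code. Never use it in production: the file contains the session secrets of every logged connection, so anyone who obtains it can decrypt captured traffic.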
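For a Python client, the same key log can be produced through the standard ssl module. This is a minimal sketch for a development machine, assuming Python 3.8 or newer built against OpenSSL 1.1.1 or newer; make_debug_tls_context and the file location are hypothetical names chosen for illustration.

```python
import os
import ssl
import tempfile


def make_debug_tls_context(keylog_path: str) -> ssl.SSLContext:
    """Build a client TLS context that logs session secrets for Wireshark."""
    ctx = ssl.create_default_context()
    # keylog_filename appends secrets in the NSS key log format that Wireshark reads.
    ctx.keylog_filename = keylog_path
    return ctx


# Hypothetical location; development only, never production.
keylog_path = os.path.join(tempfile.gettempdir(), "tls-keys.log")
ctx = make_debug_tls_context(keylog_path)
print(ctx.keylog_filename)
```

Connections made with this context write their secrets to the file, which can then be selected in Wireshark's TLS protocol preferences as the (Pre)-Master-Secret log file.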
Wireshark shows many [TCP Retransmission] packets. What does this indicate?
curl for HTTP Debugging
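curl -v shows the connection attempt, TLS handshake details, and request and response headers. Adding -w with the variables time_namelookup, time_connect, time_appconnect, time_starttransfer (TTFB), and time_total gives a timing breakdown: DNS resolution, TCP connection, TLS handshake, time to first byte, and total transfer time. These values are cumulative from the start of the request, so TTFB minus the time at which the TLS handshake completed (less roughly one network round trip) approximates server processing time. This isolates whether latency comes from the network or the application.
When TTFB is high (>500ms) and TCP/TLS times are normal (<100ms combined), the bottleneck is server-side. Check application traces (Jaeger) and database query execution plans. When TCP time is high, check the network path with traceroute and review firewall rules.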
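The arithmetic can be sketched as follows. The field names match curl's -w timing variables, which are cumulative seconds since the start of the request; CurlTimings, phases, and bottleneck are hypothetical helpers, and the 100 ms cutoff for a "high" TCP time is an assumption, since the text gives no number for it.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class CurlTimings:
    """Cumulative seconds since request start, as reported by curl -w."""
    time_namelookup: float
    time_connect: float
    time_appconnect: float     # 0.0 for plain HTTP (no TLS handshake)
    time_starttransfer: float  # TTFB
    time_total: float


def phases(t: CurlTimings) -> Dict[str, float]:
    """Convert cumulative timings into per-phase durations in seconds."""
    tls_done = t.time_appconnect if t.time_appconnect > 0 else t.time_connect
    return {
        "dns": t.time_namelookup,
        "tcp": t.time_connect - t.time_namelookup,
        "tls": tls_done - t.time_connect,
        # Includes about one network round trip for the request and first byte.
        "server_wait": t.time_starttransfer - tls_done,
        "transfer": t.time_total - t.time_starttransfer,
    }


def bottleneck(t: CurlTimings) -> str:
    """Apply the rule of thumb: high TTFB with normal TCP/TLS means server-side."""
    p = phases(t)
    if t.time_starttransfer > 0.5 and p["tcp"] + p["tls"] < 0.1:
        return "server"
    if p["tcp"] >= 0.1:  # assumed cutoff for a "high" TCP time
        return "network"
    return "unclear"


sample = CurlTimings(0.004, 0.024, 0.080, 1.900, 1.910)
print(phases(sample), bottleneck(sample))
```

Subtracting the cumulative values matters: reading time_starttransfer alone as server time would also count DNS, TCP, and TLS setup.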
curl shows: tcp_connect=0.001s, tls_handshake=0.05s, ttfb=2.5s, total=2.51s. Where is the bottleneck?
Message Broker Monitoring
Kafka consumer lag and RabbitMQ queue depth are the primary health metrics for async systems. Uneven lag across partitions usually indicates a stuck or slow consumer on the lagging partition (or a hot partition key). High queue depth without consumer activity indicates a consumer crash or deadlock.
Kafka consumer lag should be monitored per consumer group and per partition. Total lag hides partition imbalances. An alert on partition-level lag >10K messages with 5-minute persistence catches stuck consumers before they cause business impact.
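The per-partition rule can be sketched without any Kafka client library. The offsets are assumed to come from whatever tooling already exposes log end offsets and committed offsets per partition; partition_lag and LagAlert are hypothetical names, and treating a missing committed offset as 0 is a simplifying assumption.

```python
from typing import Dict, List

LAG_THRESHOLD = 10_000     # messages, from the rule above
PERSISTENCE_SECONDS = 300  # 5 minutes


def partition_lag(end_offsets: Dict[int, int], committed: Dict[int, int]) -> Dict[int, int]:
    """Lag per partition = log end offset minus committed offset."""
    # Assumption: a partition with no committed offset counts from 0.
    return {p: max(end - committed.get(p, 0), 0) for p, end in end_offsets.items()}


class LagAlert:
    """Fires for partitions whose lag stays above the threshold long enough."""

    def __init__(self, threshold: int = LAG_THRESHOLD,
                 persistence: float = PERSISTENCE_SECONDS) -> None:
        self.threshold = threshold
        self.persistence = persistence
        self._above_since: Dict[int, float] = {}

    def observe(self, now: float, lag: Dict[int, int]) -> List[int]:
        """Record one sample (now is in seconds) and return the firing partitions."""
        firing = []
        for partition, value in lag.items():
            if value > self.threshold:
                since = self._above_since.setdefault(partition, now)
                if now - since >= self.persistence:
                    firing.append(partition)
            else:
                # Lag recovered: reset the persistence timer.
                self._above_since.pop(partition, None)
        return sorted(firing)


lag = partition_lag({0: 50_005, 1: 80_000, 2: 50_003},
                    {0: 50_000, 1: 65_000, 2: 50_000})
print(lag)
```

In this sample the total lag of 15,008 would look like a moderate backlog spread over three partitions, while the per-partition view shows that nearly all of it sits on partition 1.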
Kafka consumer lag in partition 1 = 1000 and growing, while other partitions have lag = 5. What needs to be checked?
Transport Troubleshooting Checklist
Systematic transport debugging follows a layer-by-layer approach: physical connectivity -> DNS -> TCP -> TLS -> HTTP -> Application. Starting at the application layer wastes time if the problem is lower in the stack.
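The bottom-up order can be sketched with the Python standard library. This is an illustration under stated assumptions, not a complete health check: diagnose is a hypothetical helper, physical connectivity is left out, and treating any status below 500 as a healthy HTTP layer is an arbitrary choice.

```python
import http.client
import socket
import ssl
from typing import List, Tuple

Result = Tuple[str, bool, str]  # (layer, ok, detail)


def diagnose(host: str, port: int = 443, use_tls: bool = True,
             path: str = "/", timeout: float = 5.0) -> List[Result]:
    """Check DNS, TCP, TLS, HTTP in order and stop at the first failing layer."""
    results: List[Result] = []

    try:  # DNS: can the name be resolved at all?
        infos = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
        results.append(("dns", True, infos[0][4][0]))
    except socket.gaierror as exc:
        return results + [("dns", False, str(exc))]

    try:  # TCP: does the three-way handshake complete?
        socket.create_connection((host, port), timeout=timeout).close()
        results.append(("tcp", True, "handshake completed"))
    except OSError as exc:
        return results + [("tcp", False, str(exc))]

    if use_tls:
        try:  # TLS: does the handshake and certificate validation succeed?
            ctx = ssl.create_default_context()
            with socket.create_connection((host, port), timeout=timeout) as raw:
                with ctx.wrap_socket(raw, server_hostname=host) as tls:
                    results.append(("tls", True, tls.version() or "unknown"))
        except OSError as exc:  # ssl.SSLError is a subclass of OSError
            return results + [("tls", False, str(exc))]

    # HTTP: does the server answer a request?
    conn_cls = http.client.HTTPSConnection if use_tls else http.client.HTTPConnection
    conn = conn_cls(host, port, timeout=timeout)
    try:
        conn.request("GET", path)
        status = conn.getresponse().status
        results.append(("http", status < 500, f"status {status}"))
    except (http.client.HTTPException, OSError) as exc:
        results.append(("http", False, str(exc)))
    finally:
        conn.close()
    return results
```

The last entry in the returned list names the lowest layer that failed, or the HTTP result if every layer below it passed, which is where application-level debugging would begin.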
A large share of production transport incidents trace back to five recurring root causes: DNS misconfiguration, firewall rule changes, TLS certificate expiry, connection pool exhaustion, and message queue backup. Checking these first can rule out the most common causes in under 5 minutes.
Misconception: distributed tracing replaces tcpdump and curl for debugging.
Tracing works at the application level. tcpdump/Wireshark are needed for network problems that the application cannot see (firewall, packet loss, TLS errors).
If a packet is blocked by a firewall, no tracing span is created at all. tcpdump operates at the network level below TLS and the application - it sees what tracing cannot.
A service returns 503, but curl -v shows successful TCP and TLS. What is the next step?
Summary
- **tcpdump** is the first tool when connectivity fails: it shows SYN/SYN-ACK, RST, and retransmissions, the network problems that the application never sees.
- **curl -v plus -w timing** gives HTTP diagnostics: DNS, TCP, TLS, and TTFB each in isolation. TTFB minus connection setup time approximates server processing time.
- **Kafka lag / RabbitMQ queue depth** - first metrics when broker problems occur. Uneven lag across partitions = problem with a specific consumer.
Related Topics
Debugging tools complement observability at different stack levels:
- Distributed Tracing — Tracing shows application-level delays; tcpdump/Wireshark - network-level. Together they cover the entire stack
- Security: mTLS — TLS debugging via curl -v and Wireshark SSLKEYLOGFILE is necessary when configuring mTLS and diagnosing certificate errors
Questions for Reflection
- How can traffic be captured safely in production without risking leakage of PII?
- When is tcpdump insufficient, so that Wireshark is needed for analysis?
- How can network diagnostics be automated as part of CI/CD or health checks?