Deep Learning

Frameworks: PyTorch vs TensorFlow

2019. Google all but concedes that PyTorch is winning in research and ships TensorFlow 2.0 with a completely redesigned API. Tesla Autopilot migrates from TF to PyTorch. Hugging Face builds its Transformers library PyTorch-first. The outcome looks decided. But TFLite runs on 3 billion mobile devices, and TF Serving anchors production at Google - nobody is going to migrate that overnight.

**Meta (Facebook):** PyTorch powers the recommendation system handling one trillion inferences per day
**Google:** TensorFlow runs in Search, Gmail, Google Photos, YouTube - billions of requests daily
**Tesla Autopilot:** started on TensorFlow, migrated to PyTorch - an instructive industry migration story

Soumith Chintala and the fight for the dynamic graph

In 2016 Soumith Chintala and a small team at FAIR built PyTorch, pulling ideas from Torch (a Lua library) and Chainer (a Japanese framework with define-by-run). The team's bet: researchers do not want Sessions and placeholders - they want to write Python. The intuition was right. Within about two years of its release PyTorch took the top spot in academic publications. The classic ML pattern: the most ergonomic tool wins, not the most capable one.

Prerequisites

Backpropagation: How Neural Networks Learn

PyTorch: define-by-run

**January 2017. Facebook AI Research releases PyTorch** - a framework that flipped the script on what neural network code should look like. Instead of declaring a static computation graph up front and executing it later inside a Session (as TensorFlow 1.x demanded), PyTorch let neural networks be written as plain Python. The approach is called **define-by-run**: the computational graph is built on the fly during every forward pass.
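
To make "the graph is built during the forward pass" concrete, here is a toy scalar autograd in plain Python. It is a teaching sketch, not PyTorch's implementation; the `Value` class and `model` function are hypothetical names. Each arithmetic operation records its inputs as it executes, so an ordinary Python `if` decides which graph exists on a given call, and `backward()` walks whatever was recorded.

```python
# Toy scalar autograd (hypothetical, not PyTorch's implementation):
# every operation records its inputs while it runs, i.e. define-by-run.


class Value:
    def __init__(self, data, parents=(), local_grads=()):
        self.data = data
        self.grad = 0.0
        self._parents = parents          # nodes this value was computed from
        self._local_grads = local_grads  # d(self)/d(parent), one per parent

    def __add__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        # The graph edge is created here, during the forward pass itself.
        return Value(self.data + other.data, (self, other), (1.0, 1.0))

    def __mul__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        return Value(self.data * other.data, (self, other), (other.data, self.data))

    def backward(self):
        # Order the nodes recorded in this forward pass so that each node
        # comes after everything it was computed from.
        order, seen = [], set()

        def visit(node):
            if id(node) not in seen:
                seen.add(id(node))
                for parent in node._parents:
                    visit(parent)
                order.append(node)

        visit(self)
        self.grad = 1.0
        # Chain rule, walking from the output back to the inputs.
        for node in reversed(order):
            for parent, local in zip(node._parents, node._local_grads):
                parent.grad += local * node.grad


def model(x, w):
    # Ordinary Python control flow: the data decides which graph is built.
    if x.data > 0:
        return x * w * w  # records two multiplications
    return x + w          # records a single addition


w = Value(3.0)
y = model(Value(2.0), w)
y.backward()
print(y.data, w.grad)  # 18.0 12.0  (dy/dw = 2*x*w)
```

Calling `model` with a negative `x` records a different graph (a single addition), and the gradient follows that graph. No separate "compile the graph" step exists anywhere.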

**PyTorch's philosophy** is 'Python first'. Standard Python constructs (if, for, print) work inside the model. Breakpoints, pdb, intermediate value prints - all of it just works. For researchers this was a revolution after the 'black box' of TensorFlow 1.x. By 2023 over 80% of NeurIPS and ICML papers that release code use PyTorch.

**PyTorch ecosystem:** torchvision (computer vision), torchaudio (audio), torchtext (NLP), PyTorch Lightning (high-level wrapper), Hugging Face Transformers (pretrained models). Meta runs PyTorch in a recommendation system serving one trillion inferences per day.

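**model.train() and model.eval()** - never skip the switch! Train mode: dropout randomly zeros neurons, batch norm normalizes with the current batch's statistics and updates its running statistics. Eval mode: dropout is off, batch norm uses the running statistics accumulated during training. Forget model.eval() before validation and the metrics become noisy and dependent on batch composition, and batch norm's running statistics get overwritten with validation data.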
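
A plain-Python sketch of what the mode flag changes. `ToyModule`, `ToyDropout` and `ToyBatchNorm` are hypothetical classes that mimic the behavior described above; real `torch.nn.Dropout` and `torch.nn.BatchNorm1d` operate on tensors. Dropout uses inverted scaling as PyTorch does (survivors are multiplied by 1/(1-p) during training, so eval mode is a plain identity), and batch norm updates its running statistics with a momentum of 0.1, PyTorch's default. Simplifications: scalars instead of tensors, no learnable scale/shift, and the same (biased) variance for the batch and the running estimate.

```python
import random

# Toy modules (hypothetical, not torch.nn): only the train/eval logic is modeled.


class ToyModule:
    def __init__(self):
        self.training = True  # modules start in training mode, as in PyTorch

    def train(self):
        self.training = True
        return self

    def eval(self):
        self.training = False
        return self


class ToyDropout(ToyModule):
    def __init__(self, p=0.5, seed=0):
        super().__init__()
        self.p = p
        self.rng = random.Random(seed)

    def __call__(self, xs):
        if not self.training:
            return list(xs)  # eval: identity
        # train: zero each value with probability p, scale survivors by 1/(1-p)
        scale = 1.0 / (1.0 - self.p)
        return [0.0 if self.rng.random() < self.p else x * scale for x in xs]


class ToyBatchNorm(ToyModule):
    def __init__(self, momentum=0.1, eps=1e-5):
        super().__init__()
        self.momentum, self.eps = momentum, eps
        self.running_mean, self.running_var = 0.0, 1.0

    def __call__(self, xs):
        if self.training:
            mean = sum(xs) / len(xs)
            var = sum((x - mean) ** 2 for x in xs) / len(xs)
            # Update the running statistics that eval mode will use later.
            m = self.momentum
            self.running_mean = (1 - m) * self.running_mean + m * mean
            self.running_var = (1 - m) * self.running_var + m * var
        else:
            mean, var = self.running_mean, self.running_var
        return [(x - mean) / (var + self.eps) ** 0.5 for x in xs]


batch = [1.0, 2.0, 3.0, 4.0]
drop, bn = ToyDropout(p=0.5), ToyBatchNorm()

print(drop(batch))         # training: some entries zeroed, the rest doubled
print(drop.eval()(batch))  # eval: [1.0, 2.0, 3.0, 4.0]
bn(batch)                  # a training-mode call updates the running stats
print(bn.running_mean)     # 0.25
```

Note that the training-mode call to `bn` changed its state as a side effect: this is why running validation data through a model still in train mode silently corrupts batch norm.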

What does the define-by-run approach in PyTorch mean?

TensorFlow: from graph to Keras

**TensorFlow shipped in November 2015 out of Google Brain.** Version 1.x took the opposite philosophy from PyTorch: first describe the computational graph with a separate symbolic API, then run it inside a Session. This approach - **define-and-run** - opened the door to optimization but made debugging brutal.
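
The two-phase workflow can be mimicked in plain Python: first build a symbolic graph out of placeholder nodes (nothing is computed yet), then evaluate it with concrete inputs, in the spirit of TF 1.x's `tf.placeholder` plus `sess.run(y, feed_dict=...)`. The `Node`, `Placeholder` and `run` names below are hypothetical stand-ins, not TensorFlow's API.

```python
import operator

# Hypothetical mini-API mimicking TF 1.x's define-and-run workflow
# (placeholders + Session.run); not TensorFlow code.


class Node:
    """A symbolic operation: it records *what* to compute, not a result."""

    def __init__(self, op, *inputs):
        self.op, self.inputs = op, inputs

    def __add__(self, other):
        return Node(operator.add, self, other)

    def __mul__(self, other):
        return Node(operator.mul, self, other)


class Placeholder(Node):
    """An input slot whose value is only supplied at run time."""

    def __init__(self, name):
        super().__init__(None)
        self.name = name


def run(node, feed_dict):
    """Phase 2: evaluate a finished graph with concrete input values."""
    if isinstance(node, Placeholder):
        return feed_dict[node.name]
    if not isinstance(node, Node):  # a Python constant baked into the graph
        return node
    return node.op(*(run(i, feed_dict) for i in node.inputs))


# Phase 1: define the graph. Nothing is computed yet.
a, b = Placeholder("a"), Placeholder("b")
y = a * b + 1
print(type(y).__name__)              # Node, not a number
# Phase 2: run it, feeding values into the placeholders.
print(run(y, {"a": 2.0, "b": 3.0}))  # 7.0
```

Having the whole graph before execution is what lets a framework analyze and optimize it. The cost shows up in the first `print`: inspecting `y` yields a graph node rather than a value, which is exactly why debugging TF 1.x code was so painful.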

**TensorFlow 2.0 (2019) tore up the playbook.** Google acknowledged that PyTorch's ergonomics had won and shipped three big changes: eager execution by default, Keras as the primary high-level API, and a cleaned-up API with redundant endpoints and tf.contrib removed. Tesla Autopilot's migration from TF to PyTorch was a telling signal for the industry.

Component	Purpose	Closest PyTorch-side option
TensorFlow Core	Low-level tensor operations	torch
Keras	High-level API for models	torch.nn + Lightning
TFLite	Deploy on mobile devices	PyTorch Mobile / ExecuTorch
TF.js	Run in the browser	ONNX Runtime Web (after ONNX export)
TF Serving	Production inference server	TorchServe
TFX	ML pipeline (from data to deploy)	MLflow + Kubeflow

**TensorFlow's edge: the production ecosystem.** TFLite runs on billions of mobile devices. TF.js runs in the browser without a server. TF Serving handles millions of requests per second at Google. Search, Gmail, Google Photos, YouTube - all on TensorFlow.

**Do not mix up TF 1.x and TF 2.x** - effectively two different frameworks. Most complaints about TensorFlow target version 1.x. TF 2 with Keras is a modern, convenient tool. Check the version when reading old tutorials.

What was the main change introduced by TensorFlow 2.0?

Eager Execution: compute immediately

**Eager execution** is the mode where operations run immediately, like normal Python. Write `a + b` - get the result on the spot, not a description of some future computation. PyTorch worked this way from day one. TensorFlow flipped to it as the default in version 2.0.

**Eager execution's killer feature: debugging.** Drop print() anywhere in the model to see real values. Set breakpoints in pdb. Standard Python profiling tools just work. Critical for research, where models are experimental and crawling with bugs.

Property	Eager Execution	Graph Mode
Computation	Immediate	Deferred (compile -> run)
Debugging	print, pdb, breakpoints	Harder - needs special tools
Python control flow	if/for work natively	Need tf.cond / tf.while_loop (TF1)
Speed	Baseline	Optimized (operator fusion, etc.)
Usage	Research, prototyping	Production, deployment

**Rule of thumb:** stick with eager execution during development and debugging. Flip to graph mode (torch.compile, tf.function) once the model is production-ready. Most researchers never leave eager mode - optimization only matters at scale.

Why is eager execution more convenient for debugging neural networks?

Graph Mode: optimization for production

**Eager execution is convenient but slower.** Every operation hits the Python interpreter, allocates intermediate tensors, fires commands at the GPU one at a time. **Graph mode** ingests the full computational graph at once and optimizes it: fuses operations (operator fusion), strips redundant computations, plans memory.

**What does the compiler actually do?** Operator fusion - merging multiple operations into one (Linear + ReLU = one GPU kernel instead of two). Memory planning - recycling memory from tensors no longer needed. Constant folding - pre-computing static expressions. None of these optimizations are reachable in eager mode because the framework only sees one operation at a time.
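
A toy graph pass in plain Python showing two of these rewrites on a tiny expression graph: constant folding (a subexpression with only constant inputs is evaluated once, ahead of time) and operator fusion (an `add` feeding a `relu` becomes one fused node, standing in for one GPU kernel instead of two). The tuple-based graph format and the `fused_add_relu` op are hypothetical; real compilers such as TorchInductor or XLA work on far richer representations, and memory planning is left out to keep the sketch short.

```python
# Toy graph IR (hypothetical): ("const", v), ("input", name) or (op, *args).


def evaluate(node, inputs):
    kind = node[0]
    if kind == "const":
        return node[1]
    if kind == "input":
        return inputs[node[1]]
    vals = [evaluate(arg, inputs) for arg in node[1:]]
    if kind == "add":
        return vals[0] + vals[1]
    if kind == "mul":
        return vals[0] * vals[1]
    if kind == "relu":
        return max(vals[0], 0.0)
    if kind == "fused_add_relu":  # one "kernel" doing both steps
        return max(vals[0] + vals[1], 0.0)
    raise ValueError(f"unknown op: {kind}")


def constant_fold(node):
    """Pre-compute any subtree whose inputs are all constants."""
    if node[0] in ("const", "input"):
        return node
    args = [constant_fold(arg) for arg in node[1:]]
    folded = (node[0], *args)
    if all(arg[0] == "const" for arg in args):
        return ("const", evaluate(folded, {}))
    return folded


def fuse_add_relu(node):
    """Rewrite relu(add(a, b)) into a single fused_add_relu(a, b) node."""
    if node[0] in ("const", "input"):
        return node
    args = [fuse_add_relu(arg) for arg in node[1:]]
    if node[0] == "relu" and args[0][0] == "add":
        return ("fused_add_relu", *args[0][1:])
    return (node[0], *args)


def count_ops(node):
    """Compute ops in the graph: a stand-in for GPU kernel launches."""
    if node[0] in ("const", "input"):
        return 0
    return 1 + sum(count_ops(arg) for arg in node[1:])


# relu(x * w + 2 * 3): "2 * 3" is static, and add -> relu can be fused.
graph = ("relu", ("add", ("mul", ("input", "x"), ("input", "w")),
                         ("mul", ("const", 2.0), ("const", 3.0))))
optimized = fuse_add_relu(constant_fold(graph))

print(optimized)
# ('fused_add_relu', ('mul', ('input', 'x'), ('input', 'w')), ('const', 6.0))
print(count_ops(graph), "->", count_ops(optimized))  # 4 -> 2
env = {"x": 1.0, "w": -2.0}
print(evaluate(graph, env), evaluate(optimized, env))  # 4.0 4.0
```

Both rewrites need to see several operations at once, which is exactly what an eager framework, executing one operation at a time, cannot do.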

**torch.compile() is not a free speedup.** The first call is slower due to compilation (sometimes minutes). For small models the compile overhead may never pay off. Dynamic tensor shapes (for example, a varying batch size or sequence length) can trigger recompilation. Start without compile - add it once speed becomes the bottleneck.
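
The recompilation cost can be modeled with a toy cache that "compiles" one specialized version per input shape (here just a list length). `toy_compile` and `scale_sum` are hypothetical and the compile step is only simulated; the real torch.compile uses guards and has dynamic-shape support that can reduce recompiles, but the basic trade-off is the same: a familiar shape reuses the cached version, a new one pays the compile cost again.

```python
import functools

# Hypothetical decorator modeling shape-specialized compilation; not torch.compile.


def toy_compile(fn):
    """Keep one 'compiled' version of fn per input shape (here: list length)."""
    cache = {}

    @functools.wraps(fn)
    def wrapper(xs):
        shape = (len(xs),)         # what a shape guard would check
        if shape not in cache:
            wrapper.compiles += 1  # a real compiler might spend seconds here
            cache[shape] = fn      # placeholder for a shape-specialized kernel
        return cache[shape](xs)

    wrapper.compiles = 0
    return wrapper


@toy_compile
def scale_sum(xs):
    return sum(2.0 * x for x in xs)


# Batches of length 2, 2, 3, 2: only the first 2 and the first 3 trigger a compile.
for batch in ([1.0, 2.0], [3.0, 4.0], [1.0, 2.0, 3.0], [5.0, 6.0]):
    scale_sum(batch)

print(scale_sum.compiles)  # 2
```

Padding inputs to a handful of fixed sizes (bucketing) is a common way to keep the number of distinct shapes, and therefore compiles, small.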

**Practical advice for 2026:** new project? Start with PyTorch. Mobile deployment? Consider ONNX or ExecuTorch. Browser? TensorFlow.js or ONNX Runtime Web. Peak inference speed on NVIDIA? TensorRT. The 'framework wars' are over - pick what fits the task.

PyTorch is for research, TensorFlow is for production. Each framework only suits its own niche.

This claim is outdated. PyTorch 2.0 with torch.compile(), TorchServe, and ExecuTorch has largely closed the production gap. TensorFlow 2.x with Keras and eager execution became more convenient for research. Both frameworks can be used for the full cycle.

Historically (2016-2019), PyTorch was more convenient for experiments while TensorFlow had a more mature production ecosystem. But since 2020 both frameworks have been actively borrowing the best ideas from each other. Meta and Google use their frameworks for both scenarios.

What does torch.compile() do under the hood?

Key Ideas

**PyTorch** - define-by-run, Pythonic API, the standard for research (80%+ papers). Code reads like plain Python
**TensorFlow** - from static graph (TF1) to eager execution and Keras (TF2). Strong production ecosystem (TFLite, TF.js, TF Serving)
**Eager execution** - operations execute immediately. Convenient for debugging and research. Default in both frameworks
**Graph mode** (torch.compile, tf.function) - optimizes computations: operator fusion, memory planning. Pays off in production and at scale

Questions for reflection

Why did PyTorch win in research despite TensorFlow coming out earlier and having Google's backing?
torch.compile() and tf.function() convert eager code into an optimized graph. Why is eager mode needed at all - why not always compile?
Starting a new ML project today - which framework to choose and why?

Related lessons

dl-02 — Backpropagation is the algorithm PyTorch autograd and TF GradientTape implement differently
dl-04 — CNN architectures are built with PyTorch in subsequent lessons
ml-09-gradient-descent — Adam/SGD optimizers are concrete gradient descent implementations in these frameworks
ml-45-mlops-pipeline — Model deployment depends on framework: TorchServe vs TF Serving vs ONNX
dl-01 — Computational graph concept is introduced in the first DL lesson
ml-28-optimizers — Optimizers explain which framework API to choose
ml-25-neural-networks