
How Varnish Request Coalescing Can Save Your Backend: A Hands-On Demo

When multiple users hit the same uncached URL at once, most servers crumble under the pressure. Varnish, however, has a secret weapon: request coalescing. In this article, we explore how it works, why it matters, and how you can see its performance benefits firsthand using a simple Dockerized demo app. Whether you’re scaling a high-traffic site or just curious about request handling strategies, this guide has you covered.
by Furkan OZTURK
Full-stack Developer
Published: May 1, 2025 01:24
Modified: May 4, 2025 00:26

One of the lesser-known but powerful features of Varnish is request coalescing—a performance optimization that helps dramatically reduce backend load under bursty or high-concurrency conditions.

In this article, we’ll walk through what request coalescing is, why it matters, and how you can see it in action with a Dockerized demo application I’ve prepared.

🧠 What Is Request Coalescing?

Request coalescing is a mechanism where Varnish groups simultaneous requests for the same resource into a single request to the backend.

Without Coalescing

If 100 clients simultaneously request the same uncached resource, all 100 requests hit your backend, wasting compute and memory and possibly causing failures under load.

With Coalescing

With request coalescing, Varnish detects that the response is not yet in cache and deduplicates requests—only one request goes to the backend, and once that response is ready, Varnish shares it with all 100 waiting clients. This not only protects your backend but also improves perceived performance for users.

Request Coalescing as a Protective Barrier

Without request coalescing, identical requests that arrive at the same time all reach the backend. If 100 users request the same slow-loading page simultaneously, the backend must process the same expensive operation 100 times in parallel. This quickly creates a bottleneck, consuming CPU, memory, and I/O resources, and leading to slower response times for everyone. Request coalescing acts as a protective barrier: Varnish detects these duplicate requests, allows only the first one to pass through, and then serves the same response to the others once it’s ready. This prevents unnecessary backend load and helps keep the system fast and responsive, even during traffic spikes.
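To make the mechanism concrete, here is a minimal, illustrative Python model of coalescing. This is a sketch of the idea, not Varnish's actual implementation: concurrent callers asking for the same key share a single backend fetch.

import time
import threading
from concurrent.futures import Future, ThreadPoolExecutor

class CoalescingProxy:
    """Toy model of request coalescing: one backend fetch per key, shared by all waiters."""
    def __init__(self, fetch):
        self._fetch = fetch            # the expensive backend call
        self._lock = threading.Lock()
        self._inflight = {}            # key -> Future for the pending response

    def get(self, key):
        with self._lock:
            fut = self._inflight.get(key)
            leader = fut is None
            if leader:                 # first caller: register the one real fetch
                fut = Future()
                self._inflight[key] = fut
        if leader:
            try:
                fut.set_result(self._fetch(key))
            except Exception as exc:   # propagate failures to every waiter too
                fut.set_exception(exc)
            finally:
                with self._lock:
                    del self._inflight[key]
        return fut.result()            # every caller gets the same response

backend_calls = 0

def slow_backend(key):
    global backend_calls
    backend_calls += 1                 # only the leader ever gets here
    time.sleep(1)                      # simulate an expensive page render
    return f"rendered {key}"

proxy = CoalescingProxy(slow_backend)
with ThreadPoolExecutor(max_workers=100) as pool:
    list(pool.map(lambda _: proxy.get("/slow-page"), range(100)))
print(backend_calls)                   # prints 1: the whole burst shared one fetch

Varnish does the equivalent per cache object, so the protection applies to every URL independently.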

🔧 Try It Yourself: Dockerized Demo App

To help you observe request coalescing in action, I’ve built a minimal Dockerized app that simulates a slow backend endpoint and wraps it with Varnish.

The demo application creates a MySQL table named employees with the following columns:

  • id: integer (primary key)
  • name: string
  • bio: text

It then seeds the table with 50,000 records to simulate a realistic dataset.
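The repo handles the schema and seeding for you (in PHP); purely for illustration, here is roughly what that setup amounts to in Python. The credentials, column sizes, and filler text are my assumptions, not the repo's exact values.

import mysql.connector  # pip install mysql-connector-python

conn = mysql.connector.connect(
    host="127.0.0.1", user="root", password="secret", database="demo"  # assumed credentials
)
cur = conn.cursor()
cur.execute("""
    CREATE TABLE IF NOT EXISTS employees (
        id   INT AUTO_INCREMENT PRIMARY KEY,
        name VARCHAR(255) NOT NULL,
        bio  TEXT NOT NULL
    )
""")
rows = [(f"Employee {i}", f"Employee {i} enjoys long walks and heavy queries.")
        for i in range(50_000)]
for start in range(0, len(rows), 1000):   # insert in batches to keep statements small
    cur.executemany("INSERT INTO employees (name, bio) VALUES (%s, %s)",
                    rows[start:start + 1000])
conn.commit()
conn.close()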

To create a performance bottleneck, the PHP backend runs a LIKE query against the bio column. Because the pattern starts with a wildcard, MySQL cannot use an index, so every request triggers a full table scan and is computationally expensive (a sketch follows the list below).

  • When a single request hits the server, the query completes without issue.
  • But when multiple concurrent requests are made, MySQL slows down significantly, as it struggles to handle the repeated heavy queries in parallel.
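Here is the sketch promised above. The search term is made up, but the leading % wildcard is the important part: it rules out index use, so every request pays for a scan of all 50,000 bios.

import time
import mysql.connector  # pip install mysql-connector-python

conn = mysql.connector.connect(
    host="127.0.0.1", user="root", password="secret", database="demo"  # assumed credentials
)
cur = conn.cursor()
start = time.perf_counter()
# The leading '%' defeats any index on bio, forcing MySQL to scan every row.
cur.execute("SELECT COUNT(*) FROM employees WHERE bio LIKE %s", ("%heavy%",))
count = cur.fetchone()[0]
print(f"{count} matches in {time.perf_counter() - start:.2f}s")
conn.close()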

This is where Varnish’s request coalescing comes in:

Instead of hammering the database with the same query over and over, Varnish ensures that only the first request hits the backend, while all others wait for that result and get it from the cache. The result? Drastically reduced load and faster response times under pressure.
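You can measure this yourself by firing a burst of identical requests at the demo. The URL below is an assumption: point it at the Varnish port from the demo's docker-compose file, then at the backend directly, and compare.

import time
from concurrent.futures import ThreadPoolExecutor
from urllib.request import urlopen

URL = "http://localhost:8080/"  # assumed Varnish port; check the demo's docker-compose.yml

def timed_get(_):
    start = time.perf_counter()
    with urlopen(URL) as resp:
        resp.read()
    return time.perf_counter() - start

with ThreadPoolExecutor(max_workers=100) as pool:
    timings = sorted(pool.map(timed_get, range(100)))
print(f"fastest {timings[0]:.2f}s  median {timings[50]:.2f}s  slowest {timings[-1]:.2f}s")

Against the bare backend, the slowest requests should pile up behind 100 parallel table scans; behind Varnish, everyone finishes shortly after the single fetch completes.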

🐳 Getting Started

Clone the repo:

git clone https://github.com/itsjjfurki/varnish-nginx-php-mysql.git
cd varnish-nginx-php-mysql

Then follow the README.md for further instructions.

🔍 Under the Hood: How It Works in Varnish

Varnish uses an internal structure called the "busy object" to track in-progress cache fetches. When a fetch for a URL is already underway, subsequent requests for that URL are parked on the busy object's waiting list instead of triggering new backend fetches; when the fetch completes, all waiters are served from the fresh object. The sketch below illustrates the flow.
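As a rough mental model (Varnish implements this in C, and the real logic also handles grace, streaming, and uncacheable responses), you can picture the busy object as a placeholder entry in the cache that waiters block on:

import threading

class BusyObject:
    """Placeholder stored in the cache while a backend fetch is in flight."""
    def __init__(self):
        self.done = threading.Event()
        self.response = None

cache = {}                     # url -> BusyObject (which becomes the cached object)
cache_lock = threading.Lock()

def lookup(url, fetch_from_backend):
    with cache_lock:
        obj = cache.get(url)
        fetcher = obj is None
        if fetcher:            # miss: park a busy object so later lookups find it
            obj = BusyObject()
            cache[url] = obj
    if fetcher:
        obj.response = fetch_from_backend(url)  # exactly one backend fetch
        obj.done.set()                          # wake every waiter at once
    else:
        obj.done.wait()        # hit on a busy object: wait instead of refetching
    return obj.response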

🚀 Why It Matters

Request coalescing is especially valuable in:

  • High-traffic scenarios with frequent cache misses
  • CDN layers protecting slow/expensive origins
  • Microservices setups with shared endpoints

Because coalescing is built into Varnish and on by default, you reduce duplicate backend work, improve response times, and increase scalability with essentially no extra configuration.

🧪 Try It and Tell Me What You Think

Let me know if you try the demo or if you run into anything unexpected! You can open an issue on the GitHub repo or shoot me a message via the contact form on this site.