Optimizing Searches: Handling High Parallel Traffic with Persistent TCP Connections

Some clients send many parallel HTTP calls through one long-lived TCP connection. This may cause all traffic to be routed to the same backend server, which leads to slower responses and uneven load distribution. This page explains why this happens, how it affects performance, and what you should do to avoid it.

Overview

Our load balancer keeps each TCP connection attached to the same server instance for a minimum of 4 minutes. When a client reuses the same TCP connection for all parallel requests:

  • Every request goes to the same backend node.
  • Other nodes stay mostly idle.
  • The active node becomes overloaded.
  • Response times increase under high concurrency.

In other words:

One long-lived connection + many parallel requests = one overloaded server.
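
One way this can arise, shown purely as an illustration: HTTP/2 clients multiplex all concurrent requests over a single TCP connection. The sketch below assumes Python's httpx library with HTTP/2 enabled (the httpx[http2] extra) and a hypothetical endpoint; your client and stack may differ.

```python
import asyncio
import httpx

URL = "https://api.example.com/search"  # hypothetical endpoint

async def main():
    # Anti-pattern sketch: with HTTP/2, every concurrent request is
    # multiplexed over one long-lived TCP connection, so the load
    # balancer pins all of this traffic to a single backend instance.
    async with httpx.AsyncClient(http2=True) as client:
        responses = await asyncio.gather(
            *(client.get(URL, params={"q": i}) for i in range(100))
        )

asyncio.run(main())
```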

Why Keep-Alive Isn’t the Fix

This issue is not related to the Connection: keep-alive header.

  • Keep-Alive only controls whether the HTTP connection stays open.
  • It does not affect load balancing.
  • It does not cause or prevent the issue.

The root cause is sending all parallel requests through a single TCP socket, regardless of headers.


What You Should Do

If your application sends large batches of parallel requests, you should occasionally recreate the TCP connection so the load balancer can distribute traffic across multiple server instances. This can be done by:

  • Recreating the HTTP client between batches, for example a new Session in Python, a new Agent in Node.js, or a new HttpClient in Java (see the sketch below).
  • Using a pool of multiple connections instead of one long-lived one.
  • Avoiding scenarios where all concurrent requests go through the same socket.

You do not need to recreate the connection on every request. Only do this when dealing with high parallelism or bursts.
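
As a minimal sketch of the batch approach, assuming Python's requests library and a hypothetical endpoint: creating a fresh Session per batch closes the previous sockets, so each batch can be routed to a different backend.

```python
import requests
from concurrent.futures import ThreadPoolExecutor

URL = "https://api.example.com/search"  # hypothetical endpoint

def run_batch(queries):
    # A fresh Session per batch opens new TCP connections, giving the
    # load balancer a chance to route this batch to another backend.
    with requests.Session() as session:
        with ThreadPoolExecutor(max_workers=8) as pool:
            return list(pool.map(
                lambda q: session.get(URL, params={"q": q}, timeout=10),
                queries,
            ))

# The connections are recreated between batches, not on every request.
for batch in [["alpha", "beta"], ["gamma", "delta"]]:
    responses = run_batch(batch)
```

Closing the Session at the end of each batch drops its pooled connections, so the next batch starts on fresh sockets.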

Why It Matters

If You Keep One TCP Connection

  • All traffic is pinned to one backend instance.
  • Latency increases under load.
  • Throughput decreases.
  • You may see slowdowns, timeouts, and inconsistent performance.

If You Refresh TCP Connections Periodically

  • Requests spread across all server instances, so load is balanced more evenly.
  • Response times stay stable.
  • Scalability improves for heavy workloads.
  • Fewer spikes and failures occur during peak traffic.

Best Practices

  • If you run parallel batches, recreate the connection after each batch.
  • For ongoing high concurrency, prefer multiple HTTP client instances or a connection pool.
  • Do not rely on Keep-Alive headers to solve this problem; they do not influence the load balancer’s routing logic.
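
As a hedged example of a connection pool, again assuming Python's requests library: mounting an HTTPAdapter with a larger pool spreads concurrent requests across several sockets, each of which the load balancer can attach to a different backend instance.

```python
import requests
from requests.adapters import HTTPAdapter

session = requests.Session()

# Keep up to 16 pooled connections per host, so concurrent requests
# reuse several sockets instead of funneling through a single one.
adapter = HTTPAdapter(pool_connections=4, pool_maxsize=16)
session.mount("https://", adapter)
session.mount("http://", adapter)
```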