Network Part 4 - Traffic Distribution, Where Do You Split the Load?

Published: April 29, 2026

October 4, 2021. Facebook, Instagram, and WhatsApp went dark simultaneously for nearly six hours. No server crashed. No code was deployed. A routine maintenance command accidentally withdrew all of Facebook’s BGP routes — the instructions that tell the rest of the internet how to reach Facebook’s data centers. With no route to follow, traffic had nowhere to go. Facebook had vanished from the internet.

The servers were running. The load balancers were running. Everything was fine — except that no traffic could reach any of it. This is what happens when traffic distribution breaks at the routing level. It doesn’t matter how well you’ve built the system behind the load balancer if requests can’t find their way in.



Not all load balancers work the same way. Some look only at the outside of a packet and route it fast. Others read what’s inside before deciding where to send it. In Part 1, we established that L4 is fast because it doesn’t look inside, and L7 is slower because it does. Load balancers face the same choice. Which layer do you split traffic at?


DNS-Based Load Balancing — The Limits of the Simplest Approach


The most primitive form of load balancing starts at DNS. Register multiple server IPs under a single domain, then rotate which IP gets returned with each query. This is DNS round-robin.

┌───────────────┐
│ Client │
└───────────────┘
"What's example.com?"
┌──────────────────────────────────────┐
│ DNS Server │
│ (Returns a different IP each time) │
└──────────────────────────────────────┘
┌───────────────┼───────────────┐
[1st request] [2nd request] [3rd request]
↙ ↓ ↘
┌────────────┐ ┌────────────┐ ┌────────────┐
│ Server A │ │ Server B │ │ Server C │
│192.168.0.1 │ │192.168.0.2 │ │192.168.0.3 │
└────────────┘ └────────────┘ └────────────┘
[Structural limits]
✗ Blind to server state
→ DNS keeps returning Server A even when it's overloaded
✗ Can't detect failures
→ DNS keeps responding with Server B's IP even after it goes down
✗ TTL caching
→ once a client receives an IP, it keeps hitting that server until TTL expires
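The rotation itself is trivial. Here's a minimal sketch in Python of what the DNS server's side amounts to — the resolve function and the IPs are illustrative, matching the diagram, not a real DNS implementation:

from itertools import cycle

# A records registered under one domain (the three servers from the diagram)
SERVER_IPS = ["192.168.0.1", "192.168.0.2", "192.168.0.3"]
_rotation = cycle(SERVER_IPS)

def resolve(domain: str) -> str:
    """Answer a query with the next IP in the rotation.

    Note what is missing: no health check, no load check.
    The answer is purely by turn, blind to the state of A, B, and C.
    """
    return next(_rotation)

for n in range(1, 5):
    print(f"Query {n}: example.com -> {resolve('example.com')}")
# Query 1 -> 192.168.0.1, Query 2 -> .2, Query 3 -> .3, Query 4 -> .1 again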

The design of DNS round-robin looks flawless on paper. Imagine a theme park with three parking lots — A, B, and C. The navigation app at the entrance distributes cars in order: first car to A, second to B, third to C. Arithmetically perfect.

But the navigation app doesn’t check with the server every second. Once it gets directions, it trusts them for a fixed window of time. The answer arrives with a timer attached, “this information is valid for 10 minutes,” and the app doesn’t re-query the server until the timer runs out. That’s TTL (Time To Live), the expiration date on a piece of information.

This is where the bottleneck forms. Picture a convoy of tour buses rolling in — hundreds of cars in a single column. The moment the lead car receives directions to Lot A, every car behind it copies that information and heads straight there without ever asking the server. The answer “A is the right call right now” is already locked into every device in the convoy.

The server is ready to send the next group to B and C. But no one’s asking. Lot A is gridlocked from the entrance, while B and C sit completely empty.

The server distributed traffic correctly. The only problem is that the cars held onto their directions for 10 minutes and never let go. DNS round-robin can decide who gets what information — but it can’t control how long they hold onto it.
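To see the convoy problem in code, here's a sketch of a client-side cache that honors a 10-minute TTL. This is a simplification: in reality caching also happens at resolvers and operating systems, which is part of why the effect is so hard to control.

import time
from itertools import cycle

TTL_SECONDS = 600  # "this information is valid for 10 minutes"

# Authoritative side: the same blind rotation as before.
_rotation = cycle(["192.168.0.1", "192.168.0.2", "192.168.0.3"])
_cache = {}  # domain -> (ip, expiry timestamp)

def cached_resolve(domain: str) -> str:
    """Reuse the cached answer until its TTL expires; only then re-query."""
    now = time.time()
    if domain in _cache and now < _cache[domain][1]:
        return _cache[domain][0]   # cache hit: the DNS server is never asked
    ip = next(_rotation)           # cache miss: ask for fresh directions
    _cache[domain] = (ip, now + TTL_SECONDS)
    return ip

# The convoy: 300 requests arrive inside one TTL window.
hits = [cached_resolve("example.com") for _ in range(300)]
print(set(hits))  # {'192.168.0.1'}: every request landed on Server A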

DNS round-robin looks like load balancing. In practice, it’s just blind rotation.


L4 Load Balancer — Route Without Opening the Packet


L4 load balancers follow the same philosophy as the L4 layer from Part 1. They never look inside the packet. They check the destination address (IP) and the door number (port), then decide which server to send it to.

[Transport Layer]
┌───────────────────┐
│ Client Request │
└───────────────────┘
┌─────────────────────────────────┐
│ L4 Load Balancer │
│ │
│ ✓ IP address │
│ ✓ Port number │
│ ✗ Packet content (never opened) │
└─────────────────────────────────┘
↙ ↓ ↘
┌───────────┐ ┌───────────┐ ┌───────────┐
│ Server A │ │ Server B │ │ Server C │
└───────────┘ └───────────┘ └───────────┘
( Distributed by IP hash or least connections )

Not reading the content means processing is fast. It can handle millions of concurrent connections. Game servers — where thousands of clients are holding simple TCP connections simultaneously — are a natural fit.

The trade-off is clear. Because L4 never opens the packet, it can’t route based on what’s inside. Sending /api/payments to the payments server and /api/products to the product server isn’t possible at L4. It doesn’t know the difference.
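A minimal sketch of the L4 decision, with hypothetical backend names. Everything the balancer needs fits in the connection's addressing fields; the payload never appears:

import zlib

BACKENDS = ["server-a", "server-b", "server-c"]  # hypothetical pool

def pick_backend(src_ip: str) -> str:
    """Pick a backend by hashing the client's IP (the "IP hash" strategy).

    Hashing the address keeps a given client on the same backend.
    Nothing here reads the payload, which is exactly why routing
    /api/payments differently from /api/products is impossible here.
    """
    return BACKENDS[zlib.crc32(src_ip.encode()) % len(BACKENDS)]

print(pick_backend("203.0.113.7"))  # same client IP, same backend, every time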


L7 Load Balancer — Read the Packet, Then Decide


L7 load balancers open the packet and read it. HTTP headers, URL paths, cookies, even the request body. They understand what the request is asking for before deciding where to send it.

[Application Layer]
┌───────────────────┐
│ Client Request │
└───────────────────┘
┌─────────────────────────────────────────┐
│ L7 Load Balancer Check │
│ │
│ ✓ IP address / Port number │
│ ✓ HTTP method / URL │
│ ✓ Host header │
│ ✓ Cookies / Request Body │
└─────────────────────────────────────────┘
↙ ↓ ↘
┌─────────────────┐ ┌──────────────────┐ ┌─────────────┐
│ Payment Server │ │ Product Server │ │ User Server │
└─────────────────┘ └──────────────────┘ └─────────────┘
( Routed by URL path )

Reading the URL means /api/payments goes to the payments server and /api/products goes to the product server. Reading cookies makes Session Persistence possible. If a user’s shopping cart is stored on Server A, that user needs to keep hitting Server A — otherwise the cart disappears. L7 reads the user ID from the cookie and routes every subsequent request from that user to the same server.
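Here's a sketch of both behaviors with a hypothetical route table: the URL path picks the service, and the session cookie pins a user to one replica inside that service's pool.

import zlib

# Hypothetical route table: each path prefix maps to a pool of replicas.
POOLS = {
    "/api/payments": ["payment-1", "payment-2"],
    "/api/products": ["product-1", "product-2"],
    "/api/cart":     ["cart-1", "cart-2", "cart-3"],
}

def route(path: str, cookies: dict) -> str:
    """Content-based routing plus cookie-based session persistence."""
    for prefix, pool in POOLS.items():
        if path.startswith(prefix):
            session = cookies.get("session_id", "")
            if session:
                # Sticky: the same session always hashes to the same replica,
                # so the cart stored on that replica keeps being found.
                return pool[zlib.crc32(session.encode()) % len(pool)]
            return pool[0]  # no session yet: any replica will do
    raise LookupError(f"no route for {path}")

print(route("/api/cart/items", {"session_id": "user-42"}))
print(route("/api/cart/items", {"session_id": "user-42"}))  # same replica again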

The cost is structural. Every request has to be parsed and interpreted before it can be routed. That overhead is higher than L4 by design. As traffic scales, the cost compounds.


L4 vs L7 — The Bottleneck Tells You Which One to Use

             L4 Load Balancer            L7 Load Balancer
Sees         IP address, port number     HTTP headers, URL, cookies, request body
Speed        Fast                        Slower
Routes by    Connection count, IP hash   URL path, cookies, headers
Can do       Simple TCP distribution     Content-based routing, A/B testing,
                                         session persistence
Common use   Game servers,               API gateway, microservices
             high-volume streaming

This is where Goldratt’s Theory of Constraints — first introduced in Part 1 — applies directly. The constraint isn’t always the same. It depends on where the system is closest to 100% saturation. The same principle that told us which OSI layer was bottlenecked now tells us which load balancer to reach for.

If concurrent connections are approaching the ceiling, L4. If requests need to be routed based on their content, L7. In practice, many production systems run both in layers — L4 takes the initial traffic and splits it into server groups, L7 handles fine-grained routing within each group.
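A sketch of that layered pattern, composing the two ideas above. The group names and services are illustrative, not a real deployment:

import zlib

GROUPS = ["group-1", "group-2"]

def l4_tier(client_ip: str) -> str:
    """First hop: split raw connections across server groups by IP hash,
    without ever reading the request."""
    return GROUPS[zlib.crc32(client_ip.encode()) % len(GROUPS)]

def l7_tier(group: str, path: str) -> str:
    """Second hop: inside the chosen group, read the request and route
    by its content."""
    service = "payment" if path.startswith("/api/payments") else "product"
    return f"{group}/{service}-server"

group = l4_tier("203.0.113.7")
print(l7_tier(group, "/api/payments/checkout"))  # e.g. group-2/payment-server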


Risk Pooling — Where You Split Determines the Cost


Supply chain management has a concept called Risk Pooling. Whether you consolidate inventory in one warehouse or spread it across multiple regional locations changes the total cost of your operation.

A centralized warehouse handles all orders from one location. Inventory management is simple. Demand spikes anywhere can be absorbed by the same pool. But a customer in Miami ordering from a warehouse in Chicago waits an extra day. Efficient to manage, slow to respond.

Regional warehouses flip the trade-off. A customer in Miami gets same-day delivery from a local facility. Fast. But each warehouse needs its own inventory, its own staff, its own operations. If the Miami warehouse runs out of stock, restocking from Atlanta isn’t instant. Fast, but expensive.

Load balancers follow the same logic. Where you split the traffic determines the latency cost.

L4 is closer to the centralized model. Everything routes through one fast layer without reading the contents. Throughput is unmatched, but content-based decisions aren’t possible. Fast, but coarse.

L7 is closer to the regional model. Every request gets read and sent to the right destination. Precise, but every read costs time. Accurate, but slower.

There’s no universal answer. It depends on what you’re building.

A game server handling tens of thousands of simultaneous players needs every connection it can get. There’s no time to open packets. L4 is the answer. A microservices platform where a misrouted payment request breaks the entire checkout flow needs precision. Reading the URL and routing correctly is worth the overhead. L7 is the answer.

Speed first, or accuracy first. That’s the question L4 and L7 are each answering.


DNS round-robin assigns turns without knowing server state. L4 routes without opening the packet. L7 reads before routing. Each approach makes a different call on where to absorb the cost — and none of them is universally right.

Choosing where to split traffic isn’t a technical decision. It’s a trade-off decision. The same question this series has been asking since Part 1: where is the constraint, and what are you willing to give up to resolve it?

Know where the bottleneck is, and you’ll know where to split.

Next up: everything covered so far — OSI layers, TCP handshake costs, HTTP evolution, load balancing — comes together in real systems. Three scenarios: an e-commerce platform, a live chat service, and a payment system. Where does the bottleneck form, and which choices resolve it?