
Network Part 1 - The OSI Model as a Fault Map

Published: March 27, 2026

In a previous post, we watched a single DNS misconfiguration on one AWS server bring 3,500 companies across 60 countries to a standstill. DNS lives at Layer 7. The failure started there.

This kind of thing repeats. On June 21, 2022, a misconfigured BGP policy change at Cloudflare took 19 of its data centers offline. Those sites made up only about 4% of Cloudflare's network, yet they handled roughly 50% of its total requests. No server was overloaded. No application code had broken. The routes to those data centers were withdrawn, so packets had no path to reach them. This time, the failure was at Layer 3.

Both incidents share one thing: it took far too long to find the cause. Because no one knew which layer had failed.

The OSI model is not a taxonomy for networking textbooks. It’s a fault map — a way to pinpoint exactly where a system breaks.

Reference: Cloudflare Blog: Cloudflare outage on June 21, 2022


Before looking at the fault map, we must answer one question: Why divide the OSI model into 7 layers? Wouldn’t it be more efficient if layers could see each other?

The answer lies in Conway’s Law: any organization that designs a system will produce a design whose structure is a copy of the organization’s communication structure.

The OSI model is this law in action. Each layer talks only through a strict interface, keeping its inner workings private. Layer 4 doesn’t care if Layer 7 is using HTTP or gRPC. Layer 3 has no interest in whether Layer 4 is TCP or UDP.

This is deliberate ignorance. And that ignorance produces two benefits:

  • Freedom to change: System evolution is simplified through decoupling. A migration from HTTP/1.1 to HTTP/2 is confined to Layer 7, leaving the rest of the stack untouched.

  • Fault isolation: System stability is maintained through containment. A routing failure at Layer 3 still cuts off the traffic riding above it, but the cause and the fix stay at Layer 3. The application logic at Layer 7 does not need to change.
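To make the "strict interface" idea concrete, here is a minimal sketch in Python. Every name in it (`Transport`, `echo_transport`, `send_http1`, `send_binary`) is hypothetical, and the binary frame is only loosely in the spirit of HTTP/2, not its real format. The point is that the Layer 4 stand-in stays byte-for-byte identical while the Layer 7 encoding changes.

```python
from typing import Callable

# Hypothetical names throughout: an illustration, not a real network stack.
# Layer 4's entire contract with the layer above: opaque bytes in, opaque bytes out.
Transport = Callable[[bytes], bytes]


def echo_transport(payload: bytes) -> bytes:
    """Stand-in for Layer 4: moves bytes and never inspects them."""
    return payload


def send_http1(transport: Transport, path: str) -> bytes:
    """One Layer 7 encoding: a plain-text request line."""
    return transport(f"GET {path} HTTP/1.1\r\n\r\n".encode("ascii"))


def send_binary(transport: Transport, path: str) -> bytes:
    """A different Layer 7 encoding: a 2-byte length prefix, then the path."""
    body = path.encode("ascii")
    return transport(len(body).to_bytes(2, "big") + body)


# Swapping the Layer 7 encoding requires no change to echo_transport.
print(send_http1(echo_transport, "/a"))   # b'GET /a HTTP/1.1\r\n\r\n'
print(send_binary(echo_transport, "/a"))  # b'\x00\x02/a'
```

Both senders accept the same `Transport`, which is the whole trick: the lower layer cannot break when the upper one evolves, because it never knew what the bytes meant.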

That’s why the Cloudflare outage could be called “a Layer 3 problem” immediately. Without the layered design, the cause would have been buried somewhere in the full stack.

Each layer chose not to know the others. That ignorance is exactly what makes it possible to know which layer broke.

Reference: Martin Fowler: Conway’s Law


Goldratt’s Theory of Constraints is direct: the output of any system is capped by its weakest link. Networks are no exception. But the nature of the bottleneck changes depending on which layer you’re looking at.

Data travels down from L7 to L1 on the sender’s side, with each layer wrapping it in its own envelope. On the receiving end, it is unwrapped back up from L1 to L7. Seven layers. Seven handoffs. Under high-volume traffic, one of those handoffs will crack first. The question is: at which layer does it happen, and why?
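The wrap-and-unwrap trip can be sketched in a few lines of Python. This is a toy model under loud assumptions: the bracketed text headers are invented for illustration, real headers are binary and carry many fields, and Layer 1 in practice encodes bits on a medium rather than adding a header.

```python
# Hypothetical text headers, one per layer, in sender order (top of stack first).
LAYERS = ["L7", "L6", "L5", "L4", "L3", "L2", "L1"]


def encapsulate(data: bytes) -> bytes:
    """Sender side: walk down from L7 to L1, each layer prepending its header."""
    for layer in LAYERS:
        data = f"[{layer}]".encode("ascii") + data
    return data


def decapsulate(frame: bytes) -> bytes:
    """Receiver side: walk up from L1 to L7, each layer removing only its own header."""
    for layer in reversed(LAYERS):
        header = f"[{layer}]".encode("ascii")
        if not frame.startswith(header):
            # A layer that cannot find its own header knows the fault is at its level.
            raise ValueError(f"malformed frame: expected {layer} header")
        frame = frame[len(header):]
    return frame


wire = encapsulate(b"hi")
print(wire)               # b'[L1][L2][L3][L4][L5][L6][L7]hi'
print(decapsulate(wire))  # b'hi'
```

Note where the error is raised: each layer only checks its own envelope, so a failure names the layer it happened at.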

L4 — Speed was the goal. Awareness was the price.

Layer 4 is deliberately blind to content. It sees an IP address, a port number, a protocol — TCP or UDP — and nothing else. It never opens the packet. Think of it as a courier that delivers sealed envelopes without knowing what’s inside. That’s why it’s fast.

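But that choice has structural consequences. TCP identifies a connection by four values: source IP, source port, destination IP, and destination port. Port numbers top out at 65,535, and the default ephemeral range on Linux leaves about 28,000 of them for outgoing connections. So a single client, proxy, or load balancer can hold only about 28,000 concurrent connections to any one destination IP and port. Once it hits that ceiling, new connections to that destination fail. No exceptions. A server listening on one port does not face this particular limit, because every client arrives with a different source IP and port.

L4’s bottleneck is connection count. It doesn’t matter how light your application is; if a proxy or load balancer runs out of ports toward its backend during a flash sale, your system hits a hard wall.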
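Where does "around 28,000" come from? A short sketch of the arithmetic, assuming the common Linux default ephemeral port range of 32768 to 60999 (on Linux the live value is exposed in `/proc/sys/net/ipv4/ip_local_port_range`; other systems and tuned hosts will differ). The function names are hypothetical. Keep in mind that this range caps outgoing connections from one source IP to one destination IP and port, not the total a host can hold.

```python
def parse_port_range(text: str) -> tuple[int, int]:
    """Parse a 'low high' pair, the format Linux uses for ip_local_port_range."""
    low, high = (int(part) for part in text.split())
    return low, high


def usable_ports(low: int, high: int) -> int:
    """Count the ports in an inclusive range."""
    return high - low + 1


def max_outbound_connections(ports: int, destinations: int) -> int:
    """Upper bound for one source IP: each distinct destination IP:port
    can reuse the full ephemeral range, since the 4-tuple still differs."""
    return ports * destinations


# Assumed default; read the real file on the host you care about.
DEFAULT_LINUX_RANGE = "32768\t60999"

low, high = parse_port_range(DEFAULT_LINUX_RANGE)
ports = usable_ports(low, high)
print(ports)                               # 28232
print(max_outbound_connections(ports, 4))  # 112928
```

The second number hints at the usual escape hatches: more destination addresses, more source addresses, or connection reuse all raise the ceiling without touching the application.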

L7 — Awareness was the goal. Speed was the price.

Layer 7 sees everything: HTTP headers, URL paths, cookies, request bodies. It reads the packet, understands the context, and makes decisions accordingly. That’s enormously powerful.

But that knowledge is expensive. Parsing takes time. Authentication takes time. Decompression, routing logic, business rules: they all stack. The per-request latency at L7 is higher than anywhere below it by design. As traffic scales, those costs don’t just add up; they compound.

L7’s bottleneck is computational complexity. It isn’t limited by port numbers, but by the sheer weight of its own intelligence. Every header parsed and every cookie checked is a tax paid in CPU cycles.
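The difference in work per request shows up even in a toy comparison. The sketch below is hypothetical on every count: the function names, the pool names, and the CRC-based hashing are illustrative choices, not how any particular load balancer works. The L4 function decides from the address alone; the L7 function has to read and parse the request before it can decide.

```python
import zlib


def l4_pick_backend(src_ip: str, src_port: int, backends: list[str]) -> str:
    """L4-style decision: hash the address, never look at the payload."""
    key = f"{src_ip}:{src_port}".encode("ascii")
    return backends[zlib.crc32(key) % len(backends)]


def parse_request(raw: bytes) -> tuple[str, str, dict[str, str]]:
    """L7-style work: split the request line and every header."""
    head, _, _ = raw.partition(b"\r\n\r\n")
    lines = head.decode("iso-8859-1").split("\r\n")
    method, path, _version = lines[0].split(" ", 2)
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return method, path, headers


def l7_pick_backend(raw: bytes, routes: dict[str, str], default: str) -> str:
    """L7-style decision: route on the URL path, which requires parsing first."""
    _method, path, _headers = parse_request(raw)
    for prefix, backend in routes.items():
        if path.startswith(prefix):
            return backend
    return default


request = b"GET /api/users HTTP/1.1\r\nHost: example.com\r\n\r\n"
print(l7_pick_backend(request, {"/api": "api-pool"}, "web-pool"))  # api-pool
print(l4_pick_backend("203.0.113.7", 51000, ["a", "b"]) in ["a", "b"])  # True
```

The L4 path is one hash and one modulo regardless of request size. The L7 path grows with every header and every routing rule, and that is before authentication or decompression enters the picture.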

L4 stays blind and stays fast. L7 stays aware and pays for it. Neither is a flawed design. They made different trade-offs.

Pull back to the whole stack, and the picture looks like this:

Rate of Saturation → 100%
L7 [████████████░░] Logic Latency spikes ← felt first
L4 [███████░░░░░░░] Concurrency ceiling
L3 [█████░░░░░░░░░] Routing overhead
L1 [███░░░░░░░░░░░] Throughput saturation ← when this goes, everything goes

L7 tends to be the primary bottleneck, whereas an L1 failure represents a fundamental system collapse. Under high-volume load, there’s only one question that matters: at which layer has saturation reached 100%?
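That question can be asked mechanically. A minimal sketch, assuming you already collect one utilization figure per layer as a fraction between 0 and 1: the function names and the sample numbers below are made up for illustration, and which metric stands for each layer (CPU for L7, open connections for L4, link throughput for L1, and so on) is a choice you have to make for your own system.

```python
def bottleneck(utilization: dict[str, float]) -> str:
    """Return the layer closest to saturation."""
    if not utilization:
        raise ValueError("no layers to compare")
    return max(utilization, key=lambda layer: utilization[layer])


def saturated(utilization: dict[str, float], threshold: float = 1.0) -> list[str]:
    """Return every layer at or above the threshold, most saturated first."""
    hot = [layer for layer, value in utilization.items() if value >= threshold]
    return sorted(hot, key=lambda layer: utilization[layer], reverse=True)


# Hypothetical snapshot under load, not measured data.
snapshot = {"L7": 0.86, "L4": 0.50, "L3": 0.36, "L1": 0.21}

print(bottleneck(snapshot))      # L7
print(saturated(snapshot))       # []
print(saturated(snapshot, 0.5))  # ['L7', 'L4']
```

An empty list from `saturated` is the good case: nothing has hit the wall yet, and `bottleneck` tells you which layer will get there first.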

How to resolve L4 and L7 bottlenecks in practice — that’s Part 4 (Load Balancers).

Reference: Google SRE Book: Monitoring Distributed Systems
Reference: RFC 793: Transmission Control Protocol (obsoleted by RFC 9293)


The OSI model is more than a classification; it’s a map of potential failure points. Each layer exists as a trade-off: we either sacrifice awareness for speed or accept slowness for deeper insight.

The layer that hits 100% saturation first becomes the system’s bottleneck. These clear boundaries allow us to isolate and fix that specific constraint without disrupting the entire stack.

This is why great engineers don’t panic. They don’t fix the system; they find the layer, then fix the layer.

Next up: A deep dive into Layer 4. We’ll look at the hidden cost of TCP’s 3-way handshake — the process every connection must complete before a single byte of real data moves. Under high load, this ‘handshake’ is anything but cheap.