The Day an Invisible Cloud Limit Crashed My Server (And the Panic That Followed)

One fine holiday morning, I woke up to every developer’s worst nightmare: an early morning call. A client using our ERPGulf system was completely locked out. The server was unreachable. Well, not exactly unreachable, but mostly dead, drifting in and out of consciousness.

I jumped out of bed, fired up my laptop, and went straight to work. I checked the Ubuntu memory—plenty of space. Processors? Barely idling. Hard disk? Miles of room left. Everything inside the server looked absolutely pristine.
In a slight panic, I did what any tech guy does: I restarted the server from inside Ubuntu. Nothing changed. I went into the Oracle Cloud (OCI) console and forced a hard reboot from the infrastructure side. Still nothing. The client was panicking, and honestly, so was I.

I did what we all do when we hit a wall: I opened a high-priority support ticket with OCI. After an agonizing wait, their reply came back: "Everything looks fine on our end. Please contact your network team." That was it. No help. I spent the next two hours waiting for a more meaningful reply, watching the clock tick while my client pinged me on WhatsApp nonstop. Every beep of my phone felt like a ticking time bomb.

The Deep Dive into the Unknown
Frustrated and desperate, I dove into a hell of a lot of research. I spent hours prompting AI, digging through old Linux forums, and searching Google for anything that could explain why a completely healthy server would just stop talking to the internet.

And then, I found a clue. A tiny, hidden metric buried deep inside the cloud infrastructure settings: something called the Connection Tracking Table.
Here is what was actually happening under the hood, explained simply.

The Smart Firewall That Suffocated My Server
When you set up a cloud server, you use what’s called a Stateful firewall. It's designed to be smart. When a legitimate user connects to the server, the cloud firewall opens the front door, makes a mental note of who they are, and automatically lets their data back out the back door.
To do this, the cloud keeps a dynamic checklist of every single open conversation. Oracle assigns a maximum size to this checklist—for our server size, it was capped at 6,000 slots.
Under normal circumstances, you never even use 5% of that. But we weren't dealing with normal circumstances.
Our server was being targeted by an incredibly aggressive, automated botnet from the dark corners of the internet. These bots scanned the web, stumbled upon our custom SSH management port, and started hammering it with thousands of fake login attempts every single second.

Because our firewall was "smart" (Stateful), it tried to remember every single one of these thousands of fake bot connections. The bots would connect, fail, and drop away instantly, leaving behind a massive trail of "ghost" connections. Within minutes, the cloud's mental checklist filled up to 100% capacity.
Once that tracking table was completely full, the cloud router choked. It didn't have any memory slots left to process my real connection, so it started dropping network packets indiscriminately. The server wasn't broken; it was just blind and deaf because its firewall was overwhelmed.
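
To make the failure mode concrete, here is a toy model of a fixed-size tracking table. It is only a sketch: the 6,000-slot cap comes from the story above, but the 60-second idle timeout, the rate of 2,000 bot connections per second, and every name in the code are assumptions made for illustration, not how Oracle's network actually implements it.

```python
from dataclasses import dataclass, field


@dataclass
class ConnectionTable:
    """Toy model of a fixed-size connection tracking table."""
    capacity: int
    timeout: int  # assumed: seconds an entry lingers before it can be purged
    entries: dict = field(default_factory=dict)  # flow -> expiry time

    def admit(self, flow: tuple, now: int) -> bool:
        if flow not in self.entries and len(self.entries) >= self.capacity:
            # Table is full: purge expired entries before giving up.
            self.entries = {f: t for f, t in self.entries.items() if t > now}
            if len(self.entries) >= self.capacity:
                return False  # still full, so the packet is dropped
        self.entries[flow] = now + self.timeout
        return True


table = ConnectionTable(capacity=6000, timeout=60)

# Bots open 2,000 short-lived connections per second for three seconds.
for second in range(3):
    for i in range(2000):
        table.admit(("bot", second, i), now=second)

# A legitimate admin now tries to connect and is turned away.
admin_ok = table.admit(("admin", "203.0.113.10", 50000), now=3)
print(len(table.entries), admin_ok)  # 6000 False
```

The bot connections are long gone by the time the admin shows up, but their entries have not timed out yet, so the real connection has nowhere to go.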

The Fix: Making the Firewall Purely "Blind"
The solution felt completely counterintuitive at first: we had to make the firewall dumber.
We switched our custom SSH port rule from Stateful to Stateless.
A stateless rule doesn't keep a memory checklist. It has absolutely zero memory. It acts like a blind security guard at a gate. It looks at an incoming packet, checks a static rule ("Is this for the SSH port? Yes"), and lets it pass right through without writing anything down in a tracking table. The one catch is that a stateless rule only works in one direction, so the replies need their own matching stateless rule on the way out.

Because it doesn't write anything down, it consumes no slots in the tracking table. 100,000 bots can smash into that port simultaneously, and the table utilization from that rule stays at a perfect 0%. The bots can spam all day long and our local server software blocks them internally, but the cloud network fabric never chokes.
The moment we flipped that switch, the invisible ceiling vanished. The tracking table cleared out, packets started flowing freely, and the server instantly popped back online.
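
The "blind security guard" can be sketched as a pure function: it decides from the packet in front of it and stores nothing between calls. This is an illustration only. The `Packet` type, the `stateless_allow` function, and the port number 2222 are all made up for the example, since the post does not say which custom port was in use.

```python
from typing import NamedTuple


class Packet(NamedTuple):
    src_ip: str
    src_port: int
    dst_port: int
    protocol: str


# Hypothetical custom SSH port, chosen only for this example.
SSH_PORT = 2222


def stateless_allow(packet: Packet) -> bool:
    """Decide from the packet alone; nothing is remembered between calls."""
    return packet.protocol == "tcp" and packet.dst_port == SSH_PORT


# 100,000 bot packets aimed at the SSH port from assorted sources.
packets = [
    Packet(f"198.51.100.{i % 250}", 1024 + i % 60000, SSH_PORT, "tcp")
    for i in range(100_000)
]
allowed = sum(stateless_allow(p) for p in packets)
print(allowed)  # 100000
```

Every packet gets through to the server, where the local software can reject it, and the filter itself holds exactly as much state after 100,000 packets as it did before the first one.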

The Question I Asked Myself: What About Our Web Server?
Once the adrenaline faded and the server was breathing normally again, a glaring question hit me. If making the SSH port stateless was such a magical, zero-resource cure for bot attacks, why wouldn’t I just do the exact same thing for our HTTP and HTTPS web ports?
Our ERPGulf apps run on those ports. They get plenty of traffic and occasional web-bot scans too. Why not just make the entire server stateless and never worry about this tracking table nightmare ever again?
I dug back into the research, and that’s when the second big realization of the day hit me. It turns out, web traffic and SSH terminal traffic are two completely different beasts.
Here is why turning our web ports stateless would have actually broken our client’s system in an entirely different way.

1. The Dynamic Return-Door Problem
When you connect to a server via SSH, it's just one steady, predictable connection between your laptop and the terminal.
But when a user opens a web browser to access an ERP system, the browser behaves like a machine gun. To load a single page quickly, it opens dozens of separate, temporary paths to download images, layout styles, and data charts simultaneously. It uses random, unpredictable "exit doors" on your laptop to do this.
If I made the web ports stateless, the cloud firewall would lose its memory entirely. Every time our web server tried to send a piece of the webpage back to the client's browser, the firewall would find no record of the request and block the reply on its way out. To fix it, I would have to manually leave the whole range of random client ports open for outbound traffic to the entire internet. Managing that is a security and administrative nightmare.
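
Here is a rough sketch of that return-door problem. The port numbers, rule layout, and function names are all illustrative assumptions rather than a real firewall API. The point is only that a stateful firewall learns the browser's random ports by itself, while a stateless one needs a written rule wide enough to cover all of them.

```python
WEB_PORT = 443

# Random source ports a single browser might pick (illustrative values).
browser_ports = [49152, 51234, 53877, 60011, 65001]


def stateful_reply_allowed(tracked, client_port):
    """A stateful firewall allows a reply if it remembers the request."""
    return client_port in tracked


def stateless_reply_allowed(egress_rules, src_port, dst_port):
    """A stateless firewall allows a reply only if a written rule matches it."""
    return any(
        rule["src_port"] == src_port
        and rule["dst_min"] <= dst_port <= rule["dst_max"]
        for rule in egress_rules
    )


tracked = set(browser_ports)  # filled in automatically as requests arrive
no_rules = []
broad_rule = [{"src_port": WEB_PORT, "dst_min": 1024, "dst_max": 65535}]

stateful_ok = all(stateful_reply_allowed(tracked, p) for p in browser_ports)
blocked = not any(stateless_reply_allowed(no_rules, WEB_PORT, p) for p in browser_ports)
broad_ok = all(stateless_reply_allowed(broad_rule, WEB_PORT, p) for p in browser_ports)
print(stateful_ok, blocked, broad_ok)  # True True True
```

The only stateless setup that works is the one with a rule covering tens of thousands of destination ports, which is exactly the kind of wide-open door I did not want to maintain.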

2. The "Hanging" Website (The Ghost Dropped Packets)
The internet relies on a silent, background conversation to keep web data moving fast. If a web server tries to send a data packet that is too large for some random router along the highway to handle, that router sends a tiny message back to the server saying, "Hey, too big, shrink the packet size."

With a Stateful Firewall: The cloud remembers the web session, lets that tiny adjustment message through, the server shrinks the data size, and the website loads instantly for the user.
With a Stateless Firewall: The firewall blindly blocks that incoming adjustment message because it doesn’t match a strict, pre-written rule. The server keeps trying to blast large packets that vanish into a black hole. The result? The website randomly freezes, spins forever, and feels incredibly laggy for the end-user.
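
That "too big, shrink the packet" note is an ICMP message of type 3, code 4 (Destination Unreachable, Fragmentation Needed). The sketch below models what happens when the firewall does or does not let it in. The rule tuples, the `deliver` function, and the 1500/1400 byte sizes are assumptions chosen for illustration.

```python
ICMP_DEST_UNREACHABLE = 3  # ICMP type
ICMP_FRAG_NEEDED = 4       # ICMP code: "fragmentation needed"


def icmp_permitted(rules, icmp_type, icmp_code):
    """Check whether any written rule lets this ICMP message in."""
    return ("icmp", icmp_type, icmp_code) in rules


def deliver(packet_size, path_mtu, rules, max_tries=5):
    """Return the attempt number that got through, or None if none did."""
    feedback = icmp_permitted(rules, ICMP_DEST_UNREACHABLE, ICMP_FRAG_NEEDED)
    size = packet_size
    for attempt in range(1, max_tries + 1):
        if size <= path_mtu:
            return attempt
        if feedback:
            size = path_mtu  # the server hears the router and shrinks the packet
        # Without feedback the server retries at the same size and fails again.
    return None


web_only = [("tcp", 443)]
web_plus_icmp = [("tcp", 443), ("icmp", 3, 4)]

print(deliver(1500, 1400, web_plus_icmp))  # 2
print(deliver(1500, 1400, web_only))       # None
```

With the ICMP rule in place the second attempt succeeds. Without it, every retry is as oversized as the first, which is the spinning, half-loaded page the user sees.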

3. Web Servers Are Built for the Noise
The final piece of the puzzle was understanding how web software is built.
Our web engines (like Nginx) are naturally designed to handle thousands of rapid, short-lived requests incredibly efficiently. They use clever tricks like "Keep-Alive" connections, which allow a single open connection to process hundreds of requests from a user without burning up new slots.
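
A bit of back-of-the-envelope arithmetic shows why connection reuse matters for the tracking table. The request counts here are invented for illustration, and `slots_needed` is just a helper written for this example.

```python
import math


def slots_needed(requests: int, requests_per_connection: int) -> int:
    """Tracking slots used if each connection carries this many requests."""
    return math.ceil(requests / requests_per_connection)


# Assumed workload: one user session that makes 300 requests.
without_keepalive = slots_needed(300, 1)
with_keepalive = slots_needed(300, 100)
print(without_keepalive, with_keepalive)  # 300 3
```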

Furthermore, in heavy enterprise environments, web traffic doesn't hit the server directly. The server sits behind a Cloud Load Balancer. These are massive, industrial-grade network shields built to track hundreds of thousands of states at the cloud border, absorbing web-bot floods before they ever touch our actual server.

The Final Blueprint

By the time the sun was fully up and my coffee was cold, I had my ultimate game plan clear in my head:

SSH (Management Ports): Keep them Stateless. They handle low data volumes, have a single predictable path, and making them stateless immunizes our core management line from being choked out by random internet background noise.
HTTP / HTTPS (Web App Ports): Keep them Stateful. Web apps need a smart, remembering firewall to handle complex browser routing, heavy data streams, and external payment or API connections smoothly without breaking.
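
Written down as data, the blueprint looks roughly like this. These are plain Python dictionaries for illustration, not an OCI API payload, and the SSH port number is again a made-up placeholder.

```python
SSH_PORT = 2222  # hypothetical custom SSH port

blueprint = [
    {"name": "ssh", "port": SSH_PORT, "stateless": True},
    {"name": "http", "port": 80, "stateless": False},
    {"name": "https", "port": 443, "stateless": False},
]


def tracked_ports(rules):
    """Ports whose traffic still consumes connection tracking slots."""
    return sorted(r["port"] for r in rules if not r["stateless"])


print(tracked_ports(blueprint))  # [80, 443]
```

Only the web ports still draw on the tracking table, so a flood against the management port can no longer use it up.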

Stepping back, it’s amazing how a single morning crisis can completely reshape how you look at cloud architecture. I started the day in a panic, but I ended it with a system that is twice as resilient as it was when I woke up.
