K8s won't save you: how I over-engineered my first cloud-native app

Have you ever looked at a codebase you wrote years ago, and your only thought is: “What was I thinking?”

Back in 2020, I was diving head-first into Kubernetes and the cloud-native world to build a notification system. I was so excited to finally be able to deploy my app to the cloud, and I was determined to make it the best it could be.

I had loads of experience with monoliths and “traditional” architectures, and I had worked with some distributed systems before, but always within a team of experienced engineers.

This time I was the only one in the company with any microservices experience, and I felt like a cloud architect. Looking back at that codebase today, I see a textbook example of microservice over-engineering and concurrency chaos.

Here is a deep-dive retrospective on the technical debt I built, the code I’d never ship today, and the lessons that cost me hours of on-call debugging.

1. The “Custom Gateway” Delusion

One of the heaviest components in the project was a custom API gateway. Initially I thought I was building a simple reverse proxy, so I didn’t need any framework or existing solution like NGINX or Envoy, but before long I was building a full-fledged gateway with authentication, rate limiting, and request forwarding.
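
For a sense of scale, here is a stripped-down sketch of the idea (the names, URL, and limits are illustrative, nowhere near the real thing): a reverse proxy wrapped in hand-rolled auth and rate-limit middleware.

// Illustrative sketch only: the target URL, header check, and limits are made up.
package main

import (
    "net/http"
    "net/http/httputil"
    "net/url"

    "golang.org/x/time/rate"
)

// One global limiter for the whole gateway, hand-tuned by trial and error.
var limiter = rate.NewLimiter(100, 200)

func auth(next http.Handler) http.Handler {
    return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
        if r.Header.Get("Authorization") == "" {
            http.Error(w, "unauthorized", http.StatusUnauthorized)
            return
        }
        next.ServeHTTP(w, r)
    })
}

func rateLimit(next http.Handler) http.Handler {
    return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
        if !limiter.Allow() {
            http.Error(w, "too many requests", http.StatusTooManyRequests)
            return
        }
        next.ServeHTTP(w, r)
    })
}

func main() {
    // Forward everything to the notification service.
    target, _ := url.Parse("http://notification-service:8080")
    proxy := httputil.NewSingleHostReverseProxy(target)
    http.ListenAndServe(":8080", auth(rateLimit(proxy)))
}

Multiply that by request forwarding rules, token validation, logging, and metrics, and you end up with a service whose only job is to reimplement NGINX, badly.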

The Roast:

I was solving infrastructure problems with application code. By building a custom gateway, I created a massive maintenance burden for a feature set that NGINX or a managed service like Google API Gateway could have handled out of the box. The complexity lived in my code instead of being handled by the platform.

The 2026 Lesson:

Don’t write what you can rent. Infrastructure should be transparent. If you find yourself writing complex boilerplate just to move bytes from A to B, you are likely over-engineering the solution.

2. Unbounded Concurrency: The Go Memory Leak

In the notification service, I used a standard Go channel to process incoming messages from Pub/Sub. On paper, it looked fine. In reality, it was a ticking time bomb.

The 2020 Mistake:

// 2020 Version: The "Fire and Hope" Pattern
func (p *Processor) Start() {
    for msg := range p.PubSubChannel {
        // Spawning an unlimited number of goroutines
        // This is an OOMKiller waiting to happen
        go p.handleNotification(msg)
    }
}

The 2026 Refactor:

Today, I’d use a bounded worker pool to ensure the system stays stable under load, regardless of how many messages are in the queue.

// Bounded Concurrency
func (p *Processor) Start(workerCount int) {
    var wg sync.WaitGroup
    jobChan := make(chan Message, workerCount)
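    // Buffered with one slot per worker so the Pub/Sub loop below can hand work off smoothly.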

    for i := 0; i < workerCount; i++ {
        wg.Add(1)
        go func() {
            defer wg.Done()
            for msg := range jobChan {
                p.handleNotification(msg)
            }
        }()
    }

    for msg := range p.PubSubChannel {
        jobChan <- msg
    }
    close(jobChan)
    wg.Wait()
}

3. The “Happy Path” Implementation

I did use a message queue (Pub/Sub) to decouple the services, which was a step in the right direction. However, the implementation was incredibly fragile because it assumed the “Happy Path”.

The 2020 Mistake:

func (p *Processor) handleNotification(msg Message) {
    err := p.EmailProvider.Send(msg)
    if err != nil {
        // The message is simply logged and then... gone.
        log.Printf("failed to send: %v", err)
    }
}

The Roast:

The system had no resiliency patterns at all: no retries, no dead-letter queue. If the downstream provider failed, the message was simply gone.

The 2026 Refactor:

A senior implementation expects failure. Today I’d use exponential backoff for transient errors and a dead-letter queue (DLQ) so that permanently failing messages are preserved instead of dropped.

func (p *Processor) handleNotification(msg Message) {
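    // backoff here is a retry helper such as github.com/cenkalti/backoff.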
    retryPolicy := backoff.NewExponentialBackOff()
    
    err := backoff.Retry(func() error {
        return p.EmailProvider.Send(msg)
    }, retryPolicy)

    if err != nil {
        p.MoveToDeadLetterQueue(msg, err)
        log.Printf("permanently failed to send: %v", err)
    }
}

4. Resource Blindness in GKE

I deployed the app to GKE without setting a single resource limit or request in my YAML manifests.

The Roast:

I assumed Kubernetes was “smart” enough to manage the resources for me. Because my Go service was quite memory-hungry (remember those unbounded goroutines?), a single node would often get “starved,” leading to cascading failures across the cluster. I was treating a shared cluster like a dedicated VM with infinite resources.

The 2026 Lesson:

Resource requests and limits are non-negotiable. In a cloud-native environment, you must tell the scheduler exactly what you need. This is the difference between a stable system and one that restarts every time there’s a minor traffic spike.
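
As a minimal sketch (the numbers are placeholders; profile the real workload), this is the kind of block every container spec should have had:

# Placeholder values: tune requests and limits to the actual workload.
resources:
  requests:
    cpu: "250m"
    memory: "256Mi"
  limits:
    cpu: "500m"
    memory: "1Gi"

Requests are what the scheduler uses to place the pod; limits are what gets enforced at runtime, so an out-of-memory kill hits your pod at its limit instead of taking down the whole node.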

Conclusion: Simplicity is a Senior Feature

Fortunately, the system never had to handle sustained high traffic, but the occasional spikes were enough to destabilize the cluster, and I was the one dealing with the consequences of my over-engineering.

The project was a failure of architecture but a success in education: I learned a lot about Kubernetes, cloud-native tooling, and Go concurrency patterns. It taught me that a “senior” isn’t someone who knows how to use every feature of Kubernetes; it’s someone who knows when to keep things simple.