engineering

The Night My Own Login Broke: A Production War Story From a Solo Launch

21 April 2026 · 11 min read
Auth · Production · Supabase · Next.js · Debugging · Havnwright

A Note on Expertise

I'm not writing as an "expert" or claiming to have all the answers. I'm a builder sharing what worked, what didn't, and what I learned along the way. The tech landscape changes constantly, and with AI tools now available, the traditional notion of "expertise" is evolving. Take what resonates, verify what matters to you, and forge your own path. This is simply my experience, offered in the hope it helps fellow builders.

The most important thing I can tell you about launching a platform is that development and production are not the same thing. You can test every feature twenty times in dev, you can run end-to-end checks until you are sick of them, and production will still find something you missed. This is the story of the night I learned that, on my own, at two in the morning, with nobody else to call.

Why I launched at midnight

There is no rational reason for this. I had spent months on the platform. I had spent the entire day before the launch auditing every feature, going through session after session with the agents, compacting conversation history constantly because I was still on 200K context windows at the time. It was hectic. I was tired.

When I decided to go live, it was around midnight. I could tell you I picked that time because it was a low-traffic moment, or because I wanted to catch the early-morning crowd with a surprise launch, but that would be dressing it up. The truth is I was deep in the work and I hit the button because I wanted to hit the button. I had been building for too long without releasing, and I wanted the platform to exist in the world before I went to bed.

I also had a strange mindset that is worth naming. I pictured people waiting on the other side. As if there was a queue outside a shop door and I was about to flip the sign from "closed" to "open." There was no queue. There were no users. I had not even announced the launch anywhere. But the sense that something real was about to start put pressure on me to treat every check as if the world was watching.

In hindsight, this mindset was useful in the hour that followed.

The test that should have been trivial

Once the platform was live, I started running through the flows I had already run a hundred times. Sign-in, sign-up, dashboard loads. Everything looked okay at first. Then I got to forgot password.

Forgot password is one of those flows that feels like a rounding-error feature when you are building. It is necessary, it is not fun, and you never quite know whether it is worth testing as thoroughly as the main flows because, honestly, how often do users actually click it.

Often enough. Especially on launch day.

I created a test account. I clicked forgot password. Nothing arrived in my inbox.

This is the kind of failure that sounds small when you read it. In the moment, it is not small. A broken forgot password flow means that any user who forgets their password is locked out of the platform forever, unless they can find a manual way to contact me. On a live platform, that is not acceptable for five minutes, let alone indefinitely.

Then it got worse.

When your own session turns against you

I had initiated the forgot password flow, which meant my old password was effectively frozen, pending a reset that could not happen because the email was never arriving. I could not log back in. I was locked out of my own test account on my own platform, fifteen minutes after launch.

This is where the panic moment hit. Not loud panic. The quiet kind where you realise you are the only person who can fix this, the platform is live, the flow is broken, and if any user hit the same path right now they would have exactly the same experience you are having.

It was twelve thirty in the morning. I had no plan for this.

The 2 AM debugging tour

I had spent most of my career working on the backend, writing code. The quality assurance and production verification steps were usually someone else's job, on a different team, running the checks I did not do. This was the first time I was the entire chain. Developer, tester, operator, incident responder. All me, at a desk, alone, with the platform live.

I did not want to open the code at all. I wanted the flow to just work. But there was no path forward that did not involve going into the internals and figuring out what had broken.

What followed was about ninety minutes of moving through the stack layer by layer. Is the middleware intercepting the reset flow correctly? Is the route handler being called? Is Supabase actually trying to send the email? Is Resend actually receiving the request? Is the SMTP configuration correct? Is the email template referencing the right callback URL? Is the callback URL routing through middleware in a way that lets the reset token validate properly?

The exact fix is something I do not remember precisely, which is its own lesson about writing things down under stress. What I do remember is that the problem was at the middleware layer. The token that was supposed to validate the reset flow was being rejected by a guard that should have recognised it as a legitimate auth action. Once I found the check that was too strict, loosened it to accept the reset token path, and redeployed, the emails started arriving and the reset flow worked end to end.
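
For what it is worth, the shape of the fix looked something like the sketch below. This is a reconstruction, not the code from that night; the route names and the cookie check are hypothetical stand-ins for whatever your own app uses.

```typescript
// middleware.ts — a reconstruction of the shape of the bug, not the
// actual Havnwright code. Route names are hypothetical.
import { NextResponse, type NextRequest } from "next/server";

// Paths a not-yet-authenticated user must be able to reach so the
// reset token can be exchanged and validated.
const AUTH_EXEMPT_PATHS = ["/auth/callback", "/auth/reset-password"];

export function middleware(request: NextRequest) {
  const { pathname } = request.nextUrl;

  // The fix: exempt the reset path BEFORE the session guard runs.
  // Without this, the reset callback bounced to /login and the token
  // was never validated.
  if (AUTH_EXEMPT_PATHS.some((path) => pathname.startsWith(path))) {
    return NextResponse.next();
  }

  // Simplified session check; a real app should ask its auth provider
  // to validate the session rather than trust cookie presence.
  const hasSession = request.cookies.has("sb-access-token");
  if (!hasSession) {
    return NextResponse.redirect(new URL("/login", request.url));
  }

  return NextResponse.next();
}
```

The detail that matters is ordering: the exemption has to run before the session guard, or the reset flow never gets through.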

It was around 2 AM when the fix shipped. I tested it three times to make sure it was not a fluke. Then I sat back and realised something that had not been obvious until that moment.

What I did not understand yet

The reason this bug existed, and the reason I had missed it in development, was that I did not yet properly understand how auth flows are supposed to be structured in a multi-tenant system. I knew what the pieces were. Middleware. Tokens. Sessions. Row-level security. I knew the words. I had even wired them up. What I did not have was the mental model that tells you which piece is responsible for which guarantee, and where the handoffs between them are supposed to happen.

In development, I was always logged in as myself. My session was fresh. Middleware let me through everything because the guards I had written were accidentally permissive for my specific path. The reset-password flow has a very specific property that almost no other flow has. The user is effectively not-yet-authenticated but needs to hit an authenticated endpoint to complete the reset. That edge case was not something I had thought about at the middleware layer. So my middleware, which worked for every other flow, was catching the reset flow in a gap and refusing to let it through.

This is one of those problems that is obvious once you understand it, and invisible until you do. In development, you never encounter it because you never need to do a full reset flow on an account you are actively using. In production, it is the second thing that happens after launch.

The 14-hour Saturday that followed

The next day was a Saturday. I sat down and did not get up for 14 hours.

What I wrote during that session eventually became the foundation for the centralised authentication pattern I wrote about months later. A single auth provider at the top of the application. A query layer that enforces user scoping structurally. Database-level row security as a last line of defence. The three-layer defence model that is now in every serious web app I have looked at.

Before that Saturday, I had auth code. After that Saturday, I had an auth system. The difference is not cosmetic. A system tells you where the guarantees come from. A collection of auth code hopes that each piece does the right thing.
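
To make "where the guarantees come from" concrete, here is a compressed sketch of the three layers. Every name in it is illustrative, assumed for the example rather than taken from Havnwright:

```typescript
// A compressed sketch of the three layers. All names are illustrative.

// Layer 1: one auth provider at the top of the app answers "who is
// this user" exactly once; everything below consumes that answer.
// In JSX: <AuthProvider>{children}</AuthProvider>

// Layer 2: a query layer that makes user scoping structural. A call
// site that forgets the user does not compile.
import { createClient } from "@supabase/supabase-js";

const supabase = createClient(
  process.env.SUPABASE_URL!,
  process.env.SUPABASE_ANON_KEY!
);

export function getProjectsForUser(userId: string) {
  // The scoping lives here, once, instead of in every call site.
  return supabase.from("projects").select("*").eq("user_id", userId);
}

// Layer 3: Postgres row-level security as the last line of defence.
// Even if both layers above fail, the database refuses the rows:
//
//   alter table projects enable row level security;
//   create policy "own rows only" on projects
//     for select using (auth.uid() = user_id);
```

The layers are deliberately redundant. Any one of them can fail, and the design only leaks if all three fail at once.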

That is the structural lesson that came out of the midnight incident. The practical lesson, the thing I would tell another solo founder about to launch, is different. Let me name those separately.

What I would tell another solo founder

Do not launch at midnight after a 12-hour audit session. You will miss things. Your judgement is not sharp. If you can wait until morning, wait. If you cannot, accept that your first hour post-launch is going to be a recovery session, not a celebration.

Test the flows you never use. Forgot password is the canonical one, but the list also includes email verification, account deletion, plan cancellation: any flow that fires only once per user lifecycle. These are the ones that break in production because they are the ones you never hit in development.

Create real test accounts and use them like users would. Not admin accounts with elevated permissions. Not your personal account with a session open in seventeen tabs. Fresh accounts with no history, going through the flow cold, the way a stranger would.
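
If it helps, this is the kind of throwaway script I mean, sketched with supabase-js v2. The keys, the redirect URL, and the inbox address are placeholders; the point is that the account is cold.

```typescript
// A throwaway pre-launch check, sketched with supabase-js v2.
// Keys, URLs, and the inbox address are placeholders for your own.
import { createClient } from "@supabase/supabase-js";

const admin = createClient(
  process.env.SUPABASE_URL!,
  process.env.SUPABASE_SERVICE_ROLE_KEY! // server-side only, never shipped to a client
);

async function coldResetCheck() {
  // A plus-address on an inbox you can actually open.
  const email = `me+launch-check-${Date.now()}@example.com`;

  // A fresh account with no history, the way a stranger would arrive.
  await admin.auth.admin.createUser({
    email,
    password: "temporary-password-123",
    email_confirm: true,
  });

  // Fire the flow you never used in development.
  const { error } = await admin.auth.resetPasswordForEmail(email, {
    redirectTo: "https://your-app.example.com/auth/reset-password",
  });

  // This only proves the request was accepted. Open the inbox and
  // click all the way through, because the end of the flow is where
  // mine broke.
  console.log(
    error ? `Reset request failed: ${error.message}` : "Reset email requested"
  );
}

coldResetCheck();
```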

Understand where your middleware ends and your auth provider begins. This is the boundary that caught me. In most stacks, middleware runs before the auth provider has a chance to validate anything, and certain flows need to cross that boundary in both directions. If you do not know how your stack handles this, read the docs of whichever provider you use until you do.
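
On my stack, the pattern Supabase documents for Next.js crosses that boundary inside the middleware itself: the middleware builds a server client and asks the provider to validate the session instead of guessing from cookies. Roughly, and treating this as a sketch of the idea rather than a drop-in:

```typescript
// middleware.ts — the boundary crossing, sketched from Supabase's
// documented @supabase/ssr pattern for Next.js.
import { createServerClient } from "@supabase/ssr";
import { NextResponse, type NextRequest } from "next/server";

export async function middleware(request: NextRequest) {
  const response = NextResponse.next({ request });

  // The middleware hands its cookies to the auth provider so the
  // provider can read and refresh the session.
  const supabase = createServerClient(
    process.env.NEXT_PUBLIC_SUPABASE_URL!,
    process.env.NEXT_PUBLIC_SUPABASE_ANON_KEY!,
    {
      cookies: {
        getAll: () => request.cookies.getAll(),
        setAll: (cookies) =>
          cookies.forEach(({ name, value, options }) =>
            response.cookies.set(name, value, options)
          ),
      },
    }
  );

  // The crossing: middleware asks the provider "is this a valid
  // session" rather than inferring it from cookie presence.
  const {
    data: { user },
  } = await supabase.auth.getUser();

  if (!user && !request.nextUrl.pathname.startsWith("/auth")) {
    return NextResponse.redirect(new URL("/login", request.url));
  }

  return response;
}
```

Note the /auth exemption doing the same job as the earlier sketch: the reset callback has to be allowed to cross before the redirect fires.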

Keep notes while you debug, not after. I cannot tell you exactly which line fixed the bug that night. I can tell you the shape of the fix. Writing down the specific fix in the moment, even a terse note in a file, would have saved me future debugging on similar issues. Tired-me is not someone you want to rely on for precise recall.

Accept that production will humble you. No amount of dev testing catches everything. What it can do is make the surprises smaller. Every test you run in development is a production problem you do not have, but there will always be at least one production problem you did not predict.

The honest postscript

That launch night is a memory that sticks with me, not as a failure but as a turning point. Before it, I thought I was building a web app. After it, I knew I was running a platform, and that running a platform requires a different category of discipline.

The auth work that followed became one of the most solid parts of Havnwright. The tools and patterns I put in place that weekend still hold up today. The discipline of treating every release as a potential incident and every flow as something to verify, not assume, came from those 90 minutes of being locked out of my own test account.

If you are about to launch something solo, budget more time for production than you think you need. Not because something catastrophic will happen, but because something small and specific will, and you will be the one who has to find it.


This is part of a series about building products as a solo founder. Earlier posts cover the centralised authentication pattern that came out of this incident, Postgres RLS as a last line of defence, and the idea that every rule is a bug report. This is the fourteenth and final post of the backdated catch-up phase. Future posts will go up when there is something real to say.

About the Author

Alireza Elahi is a solo founder building products that solve real problems. Currently working on Havnwright, Publishora, and the Founder Knowledge Graph.
