
The Cloud Migration That Almost Broke Our Export Service

We had three weeks to migrate file storage from AWS S3 to Azure before our AWS contract renewed. The codebase had a clean storage abstraction built for exactly this scenario. We almost shipped without checking whether everything actually used it.

What's at stake
  • Hard deadline — AWS contract renewal in three weeks with no extension option
  • Customer-facing file exports silently broken if the migration went wrong
  • No equivalent staging environment to validate Azure behavior before go-live

File exports weren't a background feature — customers downloaded invoices, reports, and audit records through them. A broken export that fails silently doesn't produce an error page. It produces a missing file when someone needs it most.

The Scenario

You're the lead engineer at a B2B SaaS company. You have three weeks to migrate file storage from AWS S3 to Azure before your AWS contract auto-renews at a rate leadership won't approve. The codebase has a storage abstraction layer — an enum-routed system designed years ago with exactly this kind of migration in mind. It looks clean. How do you approach it?
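An enum-routed abstraction of this shape typically looks something like the sketch below. This is a minimal illustration with hypothetical names (`StorageBackend`, `StorageClient`) and in-memory dicts standing in for the real S3 and Azure SDK clients — not the actual codebase:

```python
from enum import Enum


class StorageBackend(Enum):
    S3 = "s3"
    AZURE = "azure"


class StorageClient:
    """Routes every read and write to the backend named by an enum,
    which in production would come from config."""

    def __init__(self, backend: StorageBackend):
        self.backend = backend
        # In-memory stores stand in for the real S3 / Azure clients.
        self._stores = {b: {} for b in StorageBackend}

    def put(self, key: str, data: bytes) -> None:
        # All writes dispatch on the enum -- this is the routing point.
        self._stores[self.backend][key] = data

    def get(self, key: str) -> bytes:
        return self._stores[self.backend][key]
```

The promise of this design is that flipping the enum value in config migrates every caller at once. The catch, as this case shows, is that the promise only holds for callers that actually go through `StorageClient`.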

No hints. Just judgment.

The common mistake

Most engineers start a cloud storage migration by updating the abstraction layer and deploying — it feels clean because the abstraction was designed for exactly this. But that collapses two separate risks into one deployment: whether all traffic actually flows through the abstraction, and whether the new backend behaves identically under real load. An audit addresses the first. A hard cutover leaves you exposed on the second. Both need to be solved in the design before you write a line of migration code.
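The coverage audit can be as blunt as a source scan: flag any module that imports the storage SDK directly instead of going through the abstraction. A rough sketch, assuming a Python codebase using `boto3` with the abstraction living under a `storage/` directory (both assumptions — the real audit targets whatever SDK and layout the codebase has):

```python
import re
from pathlib import Path

# Top-level directories allowed to talk to the S3 SDK directly.
ALLOWED = {"storage"}

# Matches a direct boto3 import at the start of a line.
DIRECT_SDK = re.compile(r"^\s*(import boto3|from boto3)", re.MULTILINE)


def find_bypasses(root: Path) -> list[Path]:
    """Return source files that import the S3 SDK outside the abstraction."""
    bypasses = []
    for path in root.rglob("*.py"):
        rel = path.relative_to(root)
        if rel.parts[0] in ALLOWED:
            continue  # the abstraction itself is expected to use the SDK
        if DIRECT_SDK.search(path.read_text()):
            bypasses.append(rel)
    return sorted(bypasses)
```

A scan like this is what catches an export service quietly holding its own S3 client. It's not airtight — dynamic imports or vendored clients can slip past a regex — but it turns "we assume everything uses the abstraction" into a checked claim before any migration code is written.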

Lessons
  • Abstraction layers only protect what actually flows through them — audit before any migration
  • Design for parallel backends from the start, not as a contingency after problems appear
  • Rollback should be a config change throughout the migration, not a code revert
  • Silent failures in customer-facing features are discovered by customers, not monitoring
  • Two risks in one deployment doubles your exposure — solve coverage and behavioral parity separately
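Taken together, the lessons above point at a dual-backend wrapper: write to both stores, read from whichever one config names as primary, so cutover and rollback are each a one-line config flip rather than a deployment. A minimal sketch with hypothetical names and plain dicts standing in for the real S3 and Azure clients:

```python
class DualWriteStorage:
    """Writes to both backends; reads from the configured primary."""

    def __init__(self, s3, azure, primary: str = "s3"):
        self._backends = {"s3": s3, "azure": azure}
        # Flipping this value is the cutover -- and the rollback.
        self.primary = primary

    def put(self, key: str, data: bytes) -> None:
        # Both stores stay in sync for the whole migration window.
        for backend in self._backends.values():
            backend[key] = data

    def get(self, key: str) -> bytes:
        return self._backends[self.primary][key]
```

New writes land in both stores, so once historical objects are backfilled you can flip `primary` to `"azure"`, watch the export paths, and flip back instantly if anything misbehaves — no code revert, no redeploy.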

Impact
  • Export service bypass caught before migration — zero customer-facing file failures
  • Migration completed on deadline with no production incidents
  • Parallel backend pattern adopted as the standard for future infrastructure migrations
  • Rollback capability maintained throughout — cutover was a config change, not a deployment event