
The Cloud Migration That Almost Broke Our Export Service

We had three weeks to migrate file storage from AWS S3 to Azure before our AWS contract renewed. The codebase had a clean storage abstraction built for exactly this scenario. We almost shipped without checking whether everything actually used it.

What's at stake
  • Hard deadline — AWS contract renewal in three weeks with no extension option
  • Customer-facing file exports silently broken if the migration went wrong
  • No equivalent staging environment to validate Azure behavior before go-live

File exports weren't a background feature — customers downloaded invoices, reports, and audit records through them. A broken export that fails silently doesn't produce an error page. It produces a missing file when someone needs it most.

The Scenario

You're the lead engineer at a B2B SaaS company. You have three weeks to migrate file storage from AWS S3 to Azure before your AWS contract auto-renews at a rate leadership won't approve. The codebase has a storage abstraction layer — an enum-routed system designed years ago with exactly this kind of migration in mind. It looks clean. How do you approach it?
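An enum-routed abstraction of this shape typically looks something like the sketch below. This is a minimal illustration with hypothetical names (`StorageBackend`, `StorageClient`) and in-memory dicts standing in for the real S3 and Azure SDK clients — not the actual codebase:

```python
from enum import Enum


class StorageBackend(Enum):
    S3 = "s3"
    AZURE = "azure"


class StorageClient:
    """Routes every read and write to the backend named by an enum,
    which in production would come from config."""

    def __init__(self, backend: StorageBackend):
        self.backend = backend
        # In-memory stores stand in for the real S3 / Azure clients.
        self._stores = {b: {} for b in StorageBackend}

    def put(self, key: str, data: bytes) -> None:
        # All writes dispatch on the enum -- this is the routing point.
        self._stores[self.backend][key] = data

    def get(self, key: str) -> bytes:
        return self._stores[self.backend][key]
```

The promise of this design is that flipping the enum value in config migrates every caller at once. The catch, as this case shows, is that the promise only holds for callers that actually go through `StorageClient`.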

No hints. Just judgment.

The common mistake

Most engineers start a cloud storage migration by updating the abstraction layer and deploying — it feels clean because the abstraction was designed for exactly this. But that collapses two separate risks into one deployment: whether all traffic actually flows through the abstraction, and whether the new backend behaves identically under real load. An audit addresses the first. A hard cutover leaves you exposed on the second. Both need to be solved in the design before you write a line of migration code.
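The coverage audit can be as blunt as a source scan: flag any module that imports the storage SDK directly instead of going through the abstraction. A rough sketch, assuming a Python codebase using `boto3` with the abstraction living under a `storage/` directory (both assumptions — the real audit targets whatever SDK and layout the codebase has):

```python
import re
from pathlib import Path

# Top-level directories allowed to talk to the S3 SDK directly.
ALLOWED = {"storage"}

# Matches a direct boto3 import at the start of a line.
DIRECT_SDK = re.compile(r"^\s*(import boto3|from boto3)", re.MULTILINE)


def find_bypasses(root: Path) -> list[Path]:
    """Return source files that import the S3 SDK outside the abstraction."""
    bypasses = []
    for path in root.rglob("*.py"):
        rel = path.relative_to(root)
        if rel.parts[0] in ALLOWED:
            continue  # the abstraction itself is expected to use the SDK
        if DIRECT_SDK.search(path.read_text()):
            bypasses.append(rel)
    return sorted(bypasses)
```

A scan like this is what catches an export service quietly holding its own S3 client. It's not airtight — dynamic imports or vendored clients can slip past a regex — but it turns "we assume everything uses the abstraction" into a checked claim before any migration code is written.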

Lessons
  • Abstraction layers only protect what actually flows through them — audit before any migration
  • Design for parallel backends from the start, not as a contingency after problems appear
  • Rollback should be a config change throughout the migration, not a code revert
  • Silent failures in customer-facing features are discovered by customers, not monitoring
  • Two risks in one deployment doubles your exposure — solve coverage and behavioral parity separately
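Taken together, the lessons above point at a dual-backend wrapper: write to both stores, read from whichever one config names as primary, so cutover and rollback are each a one-line config flip rather than a deployment. A minimal sketch with hypothetical names and plain dicts standing in for the real S3 and Azure clients:

```python
class DualWriteStorage:
    """Writes to both backends; reads from the configured primary."""

    def __init__(self, s3, azure, primary: str = "s3"):
        self._backends = {"s3": s3, "azure": azure}
        # Flipping this value is the cutover -- and the rollback.
        self.primary = primary

    def put(self, key: str, data: bytes) -> None:
        # Both stores stay in sync for the whole migration window.
        for backend in self._backends.values():
            backend[key] = data

    def get(self, key: str) -> bytes:
        return self._backends[self.primary][key]
```

New writes land in both stores, so once historical objects are backfilled you can flip `primary` to `"azure"`, watch the export paths, and flip back instantly if anything misbehaves — no code revert, no redeploy.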

Impact
  • Export service bypass caught before migration — zero customer-facing file failures
  • Migration completed on deadline with no production incidents
  • Parallel backend pattern adopted as the standard for future infrastructure migrations
  • Rollback capability maintained throughout — cutover was a config change, not a deployment event