A memory leak alert fired 90 minutes before the database OOM'd, but it was buried among 400 alerts a week.
Situation
You're the engineering manager responsible for platform reliability. After a 4-hour database outage, the post-mortem reveals that a memory leak alert fired 90 minutes before the OOM, but your on-call engineer had 47 unread alerts at the time. Leadership wants a plan.
Stakes
What's your immediate response to fix the alerting problem?