Who This Helps
You're a junior analyst who just saw a key number fall off a cliff. Maybe revenue per user dipped 12% overnight. Or weekly active users dropped 7 days in a row. Your boss wants answers by Friday. This guide helps you pinpoint the root cause fast and ship a clean analysis with clear recommendations.
This approach comes straight from the Data Reliability Leadership course, where reliability starts with defining what good looks like. You'll use that same discipline to diagnose a KPI drop.
Mini Case
Mei, a junior analyst at a subscription service, noticed the 7-day retention rate dropped from 34% to 29% in one week. That's a 5-point drop, or roughly a 15% relative decline. Her team panicked. Instead of guessing, she ran one focused 90-minute diagnosis session. She found the root cause: a new onboarding flow broke sign-ups for mobile users on Android 12. The fix was a one-line code change. By Friday, she shipped a one-page analysis with three clear recommendations: roll back the onboarding change, add a monitor for Android 12 sign-ups, and run a postmortem.
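Percentage points and relative percent are easy to mix up when you report a drop. Here is a minimal Python sketch of the arithmetic behind Mei's numbers. The function name `describe_drop` is made up for illustration.

```python
def describe_drop(before: float, after: float) -> tuple[float, float]:
    """Return (change in percentage points, relative change in percent)."""
    absolute_pts = after - before
    relative_pct = (after - before) / before * 100
    return absolute_pts, relative_pct


# Mei's retention figures from the case above: 34% down to 29%.
absolute_pts, relative_pct = describe_drop(before=34.0, after=29.0)
print(f"{absolute_pts:+.1f} points, {relative_pct:+.1f}% relative")
# prints "-5.0 points, -14.7% relative"
```

State both numbers in your memo so nobody reads "15%" as a 15-point drop.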
Do This Now (5 Steps)
- Lock the metric and time window. Write down the exact KPI, the date range, and the expected value. For Mei, that was 7-day retention, last 7 days, expected 34%.
- Split by segments. Break the metric by device, region, user type, or any dimension you have. Mei split by device OS and saw Android 12 users had a 22% retention rate vs. 36% for others.
- Check for data quality issues. Look for missing data, delayed pipelines, or schema changes. Mei checked the event logs and found no gaps.
- Trace the user journey. Map the steps users take before the KPI event. Mei mapped sign-up to first session to day 7. She found the Android 12 sign-up flow had a broken button.
- Write one root cause sentence. Summarize in one line: "The 7-day retention drop was caused by a broken sign-up button on Android 12, affecting 8% of new users." Then add three clear recommendations.
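The segment split in step 2 can be sketched in a few lines of Python. Everything here is an assumption for illustration: the `SegmentRow` shape, the segment names, and the counts are invented, not Mei's data. The sketch ranks segments by retention rate and reports each segment's share of the cohort, so you can judge whether a weak segment is big enough to matter.

```python
from typing import NamedTuple


class SegmentRow(NamedTuple):
    segment: str       # hypothetical segment label, e.g. device OS
    new_users: int     # users who signed up in the window
    retained_d7: int   # of those, how many counted as retained


def retention_by_segment(rows):
    """Return (segment, rate, cohort share) tuples, lowest rate first."""
    total = sum(r.new_users for r in rows)
    result = []
    for r in rows:
        if r.new_users == 0:
            continue  # skip empty segments to avoid dividing by zero
        rate = r.retained_d7 / r.new_users
        share = r.new_users / total
        result.append((r.segment, rate, share))
    return sorted(result, key=lambda item: item[1])


# Invented counts, for illustration only.
rows = [
    SegmentRow("ios", 400, 148),
    SegmentRow("android_12", 100, 20),
    SegmentRow("android_other", 300, 105),
    SegmentRow("web", 200, 72),
]

ranked = retention_by_segment(rows)
for segment, rate, share in ranked:
    print(f"{segment:<14} rate={rate:.0%} share={share:.0%}")
```

The first row of `ranked` is the segment to trace in step 4. Check its share before you write the root cause sentence: a small segment with a bad rate may explain only part of the overall drop.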
Avoid These Traps
- Chasing every possible cause. Pick the top three segments and dig deep. You don't have time to test 20 theories.
- Blindly trusting dashboards. Always verify the raw data. A dashboard might hide a filter or a bug.
- Skipping the fix. Your analysis is only useful if it leads to action. Always include a recommendation.
- Overcomplicating the output. A one-page memo with a chart and three bullets beats a 10-slide deck.
- Forgetting to check for recent changes. Ask your team: "Did we deploy anything last week?" That's how Mei found the onboarding change.
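To verify a dashboard number against raw data, recompute the metric yourself and compare. The sketch below is one way to do that in Python. It assumes 7-day retention means "share of the sign-up cohort active on day 7", which may not match your team's definition. The user IDs, the dashboard value, and the 0.5-point tolerance are all made up.

```python
def recompute_d7_retention(signups: set, active_on_day7: set) -> float:
    """Share of the sign-up cohort that was active on day 7 (assumed definition)."""
    if not signups:
        raise ValueError("empty cohort: check the date filter or the pipeline")
    return len(signups & active_on_day7) / len(signups)


def matches_dashboard(raw_rate: float, dashboard_rate: float,
                      tolerance: float = 0.005) -> bool:
    """True when the two rates agree within the tolerance (0.5 points by default)."""
    return abs(raw_rate - dashboard_rate) <= tolerance


# Invented raw data: ten sign-ups, three of them active on day 7.
signups = {f"u{i}" for i in range(1, 11)}
active_on_day7 = {"u2", "u5", "u9", "u42"}  # u42 is not in the cohort

raw_rate = recompute_d7_retention(signups, active_on_day7)
dashboard_rate = 0.29  # hypothetical value read off the dashboard

if not matches_dashboard(raw_rate, dashboard_rate):
    print(f"Mismatch: raw={raw_rate:.1%}, dashboard={dashboard_rate:.1%}")
```

A mismatch does not tell you which side is wrong. It tells you to look for a hidden filter, a different cohort definition, or a late pipeline before you trust either number.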
Your Win by Friday
By Friday, you'll have a one-page analysis that answers: what dropped, why it dropped, and what to do next. You'll look like a pro who can diagnose a KPI drop in one focused session. And you'll have a repeatable process for the next time a number goes red. That's the kind of reliability that builds trust with your stakeholders.
And hey, if you can do it in 90 minutes like Mei, you'll have time for a coffee break too.