
Junior Analyst · Data Reliability Leadership

Diagnose a KPI Drop: Use a Data Contract to Find the Root Cause

Stop guessing why your numbers are down. Pinpoint the real problem in one focused session using a structured approach from Data Reliability Leadership.

Who This Helps

Hey there, Junior Analyst. You just saw your key metric drop 15% overnight. Your stakeholder is asking why, and you need a clear answer fast. This is for anyone who needs to move from 'something's wrong' to 'here's exactly what broke and what we should do.' The Data Reliability Leadership course gives you the playbook.

Mini Case

Sam's weekly active user report showed a sudden 12% dip. Panic started. Instead of diving into a dozen dashboards, Sam first checked the data contract for that metric. The contract defined the exact source table and freshness rule. In 10 minutes, Sam found the nightly ingestion job for that table had been failing silently for 3 days. The root cause wasn't user behavior—it was a broken pipeline. Sam reported the real issue and a fix ETA, saving the team two days of wild speculation.
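To make Sam's first move concrete, here is a minimal Python sketch of a contract-driven freshness check. It assumes a contract can be reduced to a metric name, a source table, and a maximum allowed staleness, and that you have already looked up when the table last loaded (for example, the newest load timestamp in the table). `MetricContract`, `check_freshness`, and the 24-hour limit are hypothetical illustrations, not the course's contract format.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class MetricContract:
    """Hypothetical, minimal metric contract: where the data comes from and how fresh it must be."""
    metric: str
    source_table: str
    max_staleness: timedelta


def check_freshness(contract: MetricContract, last_loaded_at: datetime, now: datetime) -> tuple[bool, str]:
    """Compare the source table's last load time against the contract's freshness rule.

    `last_loaded_at` is assumed to come from your warehouse (e.g. the newest load timestamp).
    Returns (is_fresh, message) so the result can go straight into a status update.
    """
    lag_hours = (now - last_loaded_at).total_seconds() / 3600
    limit_hours = contract.max_staleness.total_seconds() / 3600
    is_fresh = lag_hours <= limit_hours
    status = "OK" if is_fresh else "STALE"
    message = f"{status}: {contract.source_table} last loaded {lag_hours:.1f}h ago (limit {limit_hours:.1f}h)"
    return is_fresh, message


if __name__ == "__main__":
    # Hypothetical values loosely modeled on Sam's case.
    contract = MetricContract(
        metric="weekly_active_users",
        source_table="user_sessions",
        max_staleness=timedelta(hours=24),
    )
    now = datetime(2024, 5, 10, 9, 0, tzinfo=timezone.utc)
    last_load = now - timedelta(days=3)  # nightly job failing silently for 3 days
    is_fresh, message = check_freshness(contract, last_load, now)
    print(message)  # STALE: user_sessions last loaded 72.0h ago (limit 24.0h)
```

The point of returning a message alongside the boolean is that the contract's own terms (table name, limit) end up in the explanation you send, rather than a bare "data is late".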

Do This Now (5 Steps)

  1. Pause the panic. Take a deep breath. Your job isn't to have an instant answer; it's to find the right answer.
  2. Locate the contract. Find the data or metric contract for the dropped KPI. If one doesn't exist, note the assumed source and calculation. (This is a core mission in Data Reliability Leadership: defining contracts for key metrics.)
  3. Verify the source. Go directly to the raw source table or system listed in the contract. Check whether data is still arriving, and look for failed jobs or missing records from the last 24-48 hours.
  4. Trace the flow. Follow one record's path from the source through each transformation. Is it getting lost? Is a join broken? A quick trace often reveals the leak.
  5. Isolate the change. Ask: What changed in the last 48 hours? A deployment? A query edit? A partner API update? Link the timeline of the drop to a change log.
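Steps 4 and 5 can also be sketched in a few lines of Python. This version assumes rows are plain dictionaries and the change log is a list of timestamped entries; `trace_lost_keys`, `changes_before`, and all sample data are hypothetical. The first function checks every key at once (a shortcut for tracing records one by one) and lists the keys an inner join would silently drop; the second pulls every change logged in the 48 hours before the drop started.

```python
from __future__ import annotations

from datetime import datetime, timedelta


def trace_lost_keys(source_rows: list[dict], lookup_rows: list[dict], key: str) -> list:
    """Return source keys that an inner join on `key` would silently drop."""
    lookup_keys = {row[key] for row in lookup_rows}
    return sorted({row[key] for row in source_rows} - lookup_keys)


def changes_before(change_log: list[dict], drop_started: datetime,
                   window: timedelta = timedelta(hours=48)) -> list[dict]:
    """Return change-log entries logged within `window` before the drop began, oldest first."""
    start = drop_started - window
    hits = [change for change in change_log if start <= change["at"] <= drop_started]
    return sorted(hits, key=lambda change: change["at"])


if __name__ == "__main__":
    # Step 4: which users never make it through the join? (hypothetical rows)
    sessions = [{"user_id": 1}, {"user_id": 2}, {"user_id": 3}]
    users = [{"user_id": 1}, {"user_id": 2}]
    print(trace_lost_keys(sessions, users, "user_id"))  # [3]

    # Step 5: what changed in the 48 hours before the drop? (hypothetical change log)
    drop_started = datetime(2024, 5, 9, 0, 0)
    change_log = [
        {"at": datetime(2024, 5, 1, 12, 0), "what": "dashboard label tweak"},
        {"at": datetime(2024, 5, 7, 18, 0), "what": "ingestion job deploy"},
        {"at": datetime(2024, 5, 8, 10, 0), "what": "edit to the metric query"},
    ]
    for change in changes_before(change_log, drop_started):
        print(change["at"], change["what"])
```

In a warehouse you would usually express the first check as an anti-join in SQL; the Python version just keeps the logic easy to see.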

Avoid These Traps

  • Chasing symptoms. Don't start by analyzing user segments or time-of-day patterns if the underlying data is rotten. Garbage in, garbage out, as they say.
  • Skipping the source. Avoid living in the final dashboard. The answer is usually upstream.
  • Blaming 'data quality' vaguely. Be specific. Is it completeness, freshness, or accuracy? Your stakeholder needs details.
  • Working in a silo. Tell your data engineer or teammate what you're checking. A quick Slack message can shortcut your search.
  • Forgetting to document. When you find the cause, write it down in a ticket or log. This turns your firefight into a learning moment for the team.
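To keep a report specific rather than a vague "data quality" complaint, it can help to label each finding by dimension. This Python sketch assumes you already have a few measurements for the affected table; `TableStats`, `name_the_problem`, and every threshold are hypothetical defaults to tune for your data, not standards.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class TableStats:
    """Hypothetical measurements for the affected table or partition."""
    row_count: int
    expected_rows: int        # assumption: e.g. a trailing 7-day average
    hours_since_load: float
    invalid_rows: int         # rows failing a validity rule, e.g. negative session length


def name_the_problem(
    stats: TableStats,
    min_completeness: float = 0.9,
    max_staleness_hours: float = 24.0,
    max_invalid_share: float = 0.01,
) -> list[str]:
    """Return findings labeled completeness, freshness, or accuracy.

    All thresholds are illustrative defaults, not standards.
    """
    findings = []
    if stats.expected_rows > 0:
        ratio = stats.row_count / stats.expected_rows
        if ratio < min_completeness:
            findings.append(
                f"completeness: {stats.row_count} rows vs ~{stats.expected_rows} expected ({ratio:.0%})"
            )
    if stats.hours_since_load > max_staleness_hours:
        findings.append(
            f"freshness: last load {stats.hours_since_load:.0f}h ago (limit {max_staleness_hours:.0f}h)"
        )
    if stats.row_count > 0:
        invalid_share = stats.invalid_rows / stats.row_count
        if invalid_share > max_invalid_share:
            findings.append(f"accuracy: {invalid_share:.1%} of rows fail validity rules")
    return findings


if __name__ == "__main__":
    stats = TableStats(row_count=8000, expected_rows=10000, hours_since_load=50, invalid_rows=10)
    for finding in name_the_problem(stats):
        print(finding)
```

Each line of output names the dimension, the measured value, and the limit, which is the level of detail a stakeholder can act on.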

Your Win by Friday

By using this contract-first approach, you'll ship a diagnosis that's clean and actionable. Instead of a vague 'KPI is down' update, you'll say: 'The `daily_active_users` KPI dropped because the `user_sessions` table ingestion has been failing since Tuesday's deploy. The pipeline owner is fixing it, and we expect data to be backfilled by noon.' You'll build trust, save everyone time, and look like the calm, competent analyst you are. Now go find that root cause. You've got this.