Who This Helps
Founders and operators who see a key metric drop and need to know why—fast. This is for anyone tired of chaotic data fire drills. It’s a core practice from the Data Reliability Leadership program.
Mini Case
Mei saw weekly active users drop 18% last Tuesday. Her team spent two days debating if it was a bug, a feature change, or bad data. By Friday, they realized a data pipeline had silently broken the prior Sunday. A reliability baseline would have flagged the broken contract in 30 minutes, saving three days of guesswork.
Do This Now (5 Steps)
- Grab your top three KPIs. Revenue, sign-ups, activation rate—whatever keeps you up at night.
- For each, ask: What’s the source? Is it Stripe, your product database, or a third-party tool? Write it down.
- Define the contract. How fresh should this data be: updated hourly, or by 9 AM daily? What’s the allowed range? (For example, sign-ups should never be negative.)
- Check the last 7 days. Does the actual data match your freshness and range rules? Spot any gaps or weird values?
- Score it. Green for good, yellow for a warning, red for broken. Your first baseline is done. (It’s easier than untangling headphone wires.)
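If you'd rather script the five steps than fill in a spreadsheet, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the `Contract` and `DailyCheck` names, the 24-hour freshness rule, the 0 to 10,000 range, the sample week of numbers, and the thresholds (one bad day is yellow, two or more is red). Swap in your own rules.

```python
from dataclasses import dataclass
from datetime import timedelta
from typing import List, Optional


@dataclass
class Contract:
    """Hypothetical contract for one KPI: where it comes from and what 'healthy' means."""
    kpi: str
    source: str
    max_age: timedelta   # freshness rule: data older than this is stale
    min_value: float     # allowed range, lower bound
    max_value: float     # allowed range, upper bound


@dataclass
class DailyCheck:
    """One day's reading. value is None when no data arrived (a gap)."""
    value: Optional[float]
    age: timedelta       # how old the data was when you looked at it


def violates(contract: Contract, check: DailyCheck) -> bool:
    """Return True if this day breaks the contract (gap, stale, or out of range)."""
    if check.value is None:
        return True
    if check.age > contract.max_age:
        return True
    return not (contract.min_value <= check.value <= contract.max_value)


def score(contract: Contract, checks: List[DailyCheck],
          yellow_at: int = 1, red_at: int = 2) -> str:
    """Count bad days and map the count to green, yellow, or red.

    The thresholds are assumptions, not a standard.
    """
    bad_days = sum(violates(contract, c) for c in checks)
    if bad_days >= red_at:
        return "red"
    if bad_days >= yellow_at:
        return "yellow"
    return "green"


# Made-up example: sign-ups from the product database, fresh within 24 hours.
signups = Contract("sign-ups", "product database",
                   timedelta(hours=24), min_value=0, max_value=10_000)

# Made-up last 7 days; the fourth day has an impossible negative value.
week = [
    DailyCheck(120, timedelta(hours=2)),
    DailyCheck(131, timedelta(hours=3)),
    DailyCheck(118, timedelta(hours=2)),
    DailyCheck(-5, timedelta(hours=2)),
    DailyCheck(125, timedelta(hours=4)),
    DailyCheck(140, timedelta(hours=2)),
    DailyCheck(133, timedelta(hours=1)),
]

print(signups.kpi, score(signups, week))  # prints "sign-ups yellow"
```

Run it once per KPI and paste the three results into your scorecard. The point is not the code; it's that the rules are written down before the next dip.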
Avoid These Traps
- Don’t start by digging into the data lake without a map. You’ll get lost.
- Don’t let the team argue about definitions during the crisis. Define contracts when things are calm.
- Don’t try to baseline everything at once. Three key metrics are enough to start.
- Don’t ignore small, consistent drifts. A 2% error every day for a week adds up to roughly a 14% problem.
- Don’t skip documenting the score. If it’s not written down, the debate starts over tomorrow.
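The drift trap above is plain arithmetic, and it's worth seeing both ways of counting it. This small sketch assumes a steady 2% daily error over 7 days; the function name `cumulative_drift` is made up for illustration. If each day's error simply stacks on the original number, you get the additive figure. If each day's error builds on the previous day's already-wrong number, it compounds and comes out slightly higher.

```python
def cumulative_drift(daily_error: float, days: int) -> tuple:
    """Return (additive, compounded) total drift as fractions.

    additive: errors stack on the original value.
    compounded: each day's error applies to the previous day's value.
    """
    additive = daily_error * days
    compounded = (1 + daily_error) ** days - 1
    return additive, compounded


additive, compounded = cumulative_drift(0.02, 7)
print(f"additive: {additive:.1%}, compounded: {compounded:.1%}")
# prints "additive: 14.0%, compounded: 14.9%"
```

Either way, a drift that looks harmless on any single day is a double-digit problem by the end of the week.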
Your Win by Friday
You’ll have a simple, one-page reliability scorecard for your most important numbers. Next time a KPI dips, you can check your baseline in five minutes to see if the data itself is trustworthy. You’ll either rule out a data issue instantly or pinpoint the broken source, turning a multi-day mystery into a one-hour fix.