A Customer Health Score That Actually Predicts Churn
Customer health scores are the most over-engineered, under-useful artifact in most B2B SaaS companies. The dashboard is green, the renewal forecast is confident, and three weeks before the renewal date the customer churns. The post-mortem always finds that the warning signs were visible, sometimes for months. The problem was not data availability. It was that the health score was measuring the wrong things, weighted in the wrong proportions, and reviewed on the wrong cadence.
TL;DR: what a predictive health score looks like
A health score that actually predicts churn is a weighted composite of three input families: usage depth, relationship signal, and commercial signal. It is recalculated weekly, reviewed in a 30-minute meeting, and tied to specific intervention playbooks. If any of those three properties is missing, the score is decoration.
Why most health scores fail
Most health scores fail because they weight what is easy to measure, not what predicts the outcome. Logins are easy to track, so logins get heavy weight. NPS surveys are easy to ship, so NPS gets heavy weight. Neither one predicts churn early enough for you to intervene. A customer logging in every day can still churn if the economic buyer changed and the new buyer is shopping for replacements. A customer who gives you a 9 on the NPS survey can still churn if their procurement team is consolidating vendors.
Predictive health scores weight the signals that lead the outcome by 60 to 90 days, not the signals that confirm it after the fact.
The three input families that actually predict churn
Usage depth, not usage breadth
Logins, seats provisioned, and feature touches are breadth metrics. Depth metrics measure whether the customer is achieving the outcome they bought the product for. For most B2B SaaS, two or three workflows account for 80% of the value. Usage depth measures the trend of those specific workflows, weighted by the seat type that performs them. A customer running the core workflow 40% less than they were six weeks ago is a high churn risk regardless of total login count.
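As a rough illustration of the depth trend described above, the sketch below compares recent runs of one core workflow with the level six weeks earlier. The function name, the two-week averaging window, and the sample counts are assumptions made for the example, and it leaves out the seat-type weighting.

```python
from statistics import mean


def usage_depth_change(weekly_runs, window=2, lookback=6):
    """Relative change in core-workflow runs versus `lookback` weeks ago.

    weekly_runs: weekly counts of one core workflow, oldest first.
    Compares the mean of the latest `window` weeks with the mean of the
    `window` weeks that ended `lookback` weeks earlier.
    Returns None when the baseline is zero (the change is undefined).
    """
    if len(weekly_runs) < lookback + window:
        raise ValueError("not enough weekly history")
    recent = mean(weekly_runs[-window:])
    baseline_end = len(weekly_runs) - lookback
    baseline = mean(weekly_runs[baseline_end - window:baseline_end])
    if baseline == 0:
        return None
    return (recent - baseline) / baseline


# Hypothetical account: the core workflow has dropped from ~100 to ~60 runs a week.
runs = [100, 100, 95, 90, 80, 70, 62, 58]
change = usage_depth_change(runs)
at_risk = change is not None and change <= -0.40  # the 40% drop from the text
```

Averaging over two weeks is one way to keep a single quiet week (a holiday, a release freeze) from triggering the flag; a longer window trades speed for stability.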
Relationship signal
The relationship signal captures the strength of the human connection to the account. It is harder to instrument than usage but more leading. Useful inputs: executive sponsor change in the last 90 days, drop in CSM meeting frequency or attendance, ticket sentiment trend, and time since last strategic review. A change in executive sponsor is the single most predictive event in most B2B churn data. Wire it in as a flag, not a slow-moving score.
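One way to treat the sponsor change as a flag rather than a slow-moving score is to let it cap the relationship score outright while the flag is active. The sketch below assumes a 0 to 100 base score built from the other inputs; the class, the field names, and the cap of 40 are hypothetical choices for illustration.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional


@dataclass
class RelationshipInputs:
    base_score: float                   # 0-100, from meeting cadence, ticket sentiment, etc.
    sponsor_changed_on: Optional[date]  # None if no sponsor change is on record


def relationship_signal(inputs, today, flag_window_days=90, flagged_cap=40.0):
    """Return (score, flagged). A recent sponsor change caps the score immediately."""
    flagged = False
    if inputs.sponsor_changed_on is not None:
        age = today - inputs.sponsor_changed_on
        flagged = timedelta(0) <= age <= timedelta(days=flag_window_days)
    score = min(inputs.base_score, flagged_cap) if flagged else inputs.base_score
    return score, flagged


# Hypothetical account with a sponsor change 47 days ago.
score, flagged = relationship_signal(
    RelationshipInputs(base_score=75.0, sponsor_changed_on=date(2024, 4, 15)),
    today=date(2024, 6, 1),
)
```

Because the cap applies the week the change is recorded, the account moves buckets at once instead of drifting down as meeting attendance slowly decays.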
Commercial signal
Commercial signals are the inputs that change the economic calculus of the renewal: recent layoffs at the customer, public budget cuts in their function, a shift in their ownership structure, or contraction in their own revenue. None of these show up inside your product, which is why most health scores ignore them. They predict the largest churn events more reliably than any internal signal.
The weighting that survives reality
A useful starting weight set for most B2B SaaS:
- Usage depth: 40 to 50%
- Relationship signal: 25 to 35%
- Commercial signal: 15 to 25%
- Reserve 5 to 10% for product-specific leading indicators unique to your workflow (integration health, data freshness, admin engagement).
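The composite itself is a plain weighted sum. The sketch below picks one point inside each of the ranges above so that the weights total 100%; the specific values, the key names, and the 0 to 100 scale for each family are assumptions for the example.

```python
# One hypothetical weight set drawn from the ranges above; it must sum to 1.0.
WEIGHTS = {
    "usage_depth": 0.45,
    "relationship": 0.30,
    "commercial": 0.20,
    "product_specific": 0.05,
}


def health_score(family_scores, weights=WEIGHTS):
    """Weighted composite of per-family scores, each on a 0-100 scale."""
    if abs(sum(weights.values()) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1.0")
    missing = set(weights) - set(family_scores)
    if missing:
        raise KeyError(f"missing family scores: {sorted(missing)}")
    return sum(weights[name] * family_scores[name] for name in weights)


# Hypothetical account: weak relationship, middling usage, healthy commercials.
example = health_score(
    {"usage_depth": 60, "relationship": 40, "commercial": 80, "product_specific": 100}
)
```

Checking that the weights sum to one matters because the top ends of the four ranges add up to more than 100%, so picking the maximum of each is not a valid weight set.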
Calibrate the weights against the last 12 months of churn events. Pull every churned account, plus a comparison set of accounts that renewed, score them weekly going back 120 days, and find the weighting that best separates churn from retention at the 60-day mark. The exercise usually cuts the number of inputs in the current scorecard by half.
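A minimal sketch of that calibration, under several assumptions: each account is reduced to its three family scores as of 60 days before renewal, "separation" is measured as the share of (retained, churned) pairs in which the retained account scores higher, and the search is a coarse grid in 5-point steps. The function names and the sample data are hypothetical, and a real exercise would use far more accounts.

```python
from itertools import product


def separation(scores, churned):
    """Share of (retained, churned) pairs where the retained account scores higher.

    Ties count as half. 1.0 means perfect separation; 0.5 is no better than chance.
    """
    retained = [s for s, c in zip(scores, churned) if not c]
    lost = [s for s, c in zip(scores, churned) if c]
    if not retained or not lost:
        raise ValueError("need both churned and retained accounts")
    wins = sum((r > l) + 0.5 * (r == l) for r in retained for l in lost)
    return wins / (len(retained) * len(lost))


def calibrate(snapshots, churned, step=5):
    """Grid-search weights (in percent) over usage, relationship, commercial.

    snapshots: (usage, relationship, commercial) scores per account,
    taken 60 days before the renewal date.
    Returns (best_separation, (usage_w, relationship_w, commercial_w)).
    """
    best = None
    for usage_w, rel_w in product(range(0, 101, step), repeat=2):
        com_w = 100 - usage_w - rel_w
        if com_w < 0:
            continue  # weights must total 100
        scores = [(usage_w * u + rel_w * r + com_w * c) / 100 for u, r, c in snapshots]
        sep = separation(scores, churned)
        if best is None or sep > best[0]:
            best = (sep, (usage_w, rel_w, com_w))
    return best


# Hypothetical 60-day snapshots: two renewed accounts, then two churned ones.
snapshots = [(80, 50, 50), (70, 40, 60), (30, 60, 50), (20, 50, 40)]
churned = [False, False, True, True]
best_sep, best_weights = calibrate(snapshots, churned)
```

With only a handful of accounts many weightings tie, and the search simply keeps the first one it finds, so treat the output as a starting point to sanity-check rather than a fitted model.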
How the score should be reviewed
A health score is only as good as the meeting it triggers. The review cadence that produces interventions: a weekly 30-minute meeting with CS leadership and one revenue ops analyst, focused only on accounts that crossed a threshold (moved from green to yellow, yellow to red, or red for two weeks running). No status updates on healthy accounts. No discussion of accounts that have already churned.
Each flagged account leaves the meeting with one of three outcomes: an intervention playbook owner and date, a decision to deprioritise (with rationale), or a re-categorisation because the data is wrong. Anything else turns the meeting into a status call and the score loses operational meaning.
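The agenda for that meeting can be generated mechanically from the weekly colors. The sketch below applies the threshold rule described above; the data shape and account names are hypothetical, and treating a direct jump from green to red as a crossing is an assumption the text does not spell out.

```python
SEVERITY = {"green": 0, "yellow": 1, "red": 2}


def flag_for_review(history):
    """Pick the accounts that belong on this week's review agenda.

    history: dict mapping account name -> list of weekly colors, oldest first.
    Returns a dict of account name -> reason for the flag.
    """
    flagged = {}
    for account, colors in history.items():
        if len(colors) < 2:
            continue  # need last week and this week to detect a crossing
        prev, curr = colors[-2], colors[-1]
        if SEVERITY[curr] > SEVERITY[prev]:
            flagged[account] = f"{prev} -> {curr}"
        elif prev == curr == "red":
            flagged[account] = "red for two weeks running"
    return flagged


# Hypothetical book of business.
agenda = flag_for_review({
    "acme": ["green", "green", "yellow"],
    "globex": ["yellow", "red", "red"],
    "initech": ["green", "green", "green"],
    "umbrella": ["red", "yellow"],
})
```

Healthy and improving accounts never reach the agenda, which is what keeps the meeting to 30 minutes.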
Wiring the score into the renewal forecast
The biggest payoff of a predictive health score is forecast accuracy, not just churn prevention. Customers in the red bucket should be assigned a renewal probability in the forecast that matches the historical renewal rate of red accounts, not the optimistic 90% the CSM is reporting. Most teams find the gap between the CSM forecast and the health-score-weighted forecast is 8 to 15 points of NRR, which is the gap that explains most of the annual surprise. See what good NRR actually looks like for the benchmarks to calibrate against.
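The arithmetic behind that comparison is short. In the sketch below every number is made up for illustration: the per-bucket renewal rates stand in for whatever your own history shows, the CSM forecast is modeled as a flat 90% on every account, and expansion revenue is ignored, so the gap is expressed in points of renewing ARR.

```python
def forecast_renewals(accounts, renewal_rate_by_bucket):
    """Expected renewed ARR: each account's ARR times its bucket's renewal rate."""
    return sum(a["arr"] * renewal_rate_by_bucket[a["bucket"]] for a in accounts)


def forecast_gap_points(accounts, csm_rates, historical_rates):
    """Gap between the two forecasts, in percentage points of total renewing ARR."""
    total_arr = sum(a["arr"] for a in accounts)
    csm = forecast_renewals(accounts, csm_rates)
    weighted = forecast_renewals(accounts, historical_rates)
    return (csm - weighted) / total_arr * 100


# Hypothetical renewal cohort and rates.
accounts = [
    {"name": "acme", "arr": 100_000, "bucket": "green"},
    {"name": "globex", "arr": 50_000, "bucket": "yellow"},
    {"name": "initech", "arr": 50_000, "bucket": "red"},
]
historical_rates = {"green": 0.95, "yellow": 0.80, "red": 0.40}
csm_rates = {"green": 0.90, "yellow": 0.90, "red": 0.90}

weighted_forecast = forecast_renewals(accounts, historical_rates)
gap = forecast_gap_points(accounts, csm_rates, historical_rates)
```

The same function works per quarter or per segment; the only input that needs real effort is the table of historical renewal rates by bucket.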
Common mistakes that make health scores useless
- Including too many inputs. More than 8 to 10 inputs and the score becomes uninterpretable. CSMs stop trusting it.
- Recalculating monthly. Churn signals move faster than that. Weekly minimum.
- Treating yellow as "we'll watch it." Yellow is where intervention is cheapest. By the time an account is red, the playbook options have collapsed.
- Hiding the score from the customer-facing team. CSMs need to see the inputs, not just the color. Without inputs they cannot prioritise interventions.
- Never re-calibrating the weights. A score calibrated 18 months ago is calibrated against a customer base that no longer exists. Re-calibrate every two quarters.
What changes when the score is real
Three things change when the health score moves from decorative to predictive. First, the gross retention number becomes a leading indicator the CFO can rely on. Second, the CS team starts allocating time by churn probability instead of ARR size, which usually re-weights the book of business in ways that surface previously hidden risk. Third, the renewal forecast stops surprising the board, which buys CS leadership political capital to fix the upstream causes (onboarding gaps, ICP misfit at sale, packaging mismatches) instead of firefighting individual accounts.
The link back to the sales motion
A surprising number of red accounts trace back to sales decisions, not CS execution. Deals closed outside the ICP, with the wrong champion, or with packaging that did not match the usage pattern churn at predictable rates. A working health score gives the sales motion the feedback loop it needs to tighten qualification at the front end. Pair the score with a quarterly review of churned accounts coded against the same framework used in your win-loss program and the patterns will be obvious within two cycles.
Where to start this week
Pull every churned account from the last 12 months. Score them weekly going back 120 days against the three input families above using whatever data you already have. Find the weighting and threshold that flagged the churn earliest without also flagging the accounts that renewed. That is your starting calibration. Ship it next Monday. Refine over the following two quarters using the same exercise.
Customer Success and Expansion is one of the eight pillars in the GTM Diagnostic. The full methodology shows how the pillar interacts with Pipeline and Forecasting, because a weak health score almost always shows up as a forecasting problem first.