TL;DR

  • Champion-challenger testing compares AI models without replacing production
  • NBFCs can improve treatment strategy without full model re-deployment
  • Test results must be statistically valid before any champion switch
  • RBI model governance requires documented challenger test methodology
  • Score band improvements show up directly in cost-to-collect metrics

Every NBFC collections head reaches the same decision point eventually. The current AI model has been running for eight months. Recovery rates are acceptable. The data science team has built a new propensity model that, in backtesting on historical data, outperforms the current model by a meaningful margin across the 30-60 days past due (DPD) bucket. The question is whether to replace the production model with the new one.

Replacing the production model based on backtest performance alone is a high-stakes move. Backtests run on historical data where the outcomes are already known. Live portfolios behave differently. A model that outperforms in testing can underperform in production if the live borrower population differs from the training window, if economic conditions have shifted since the training data was collected, or if the treatment strategies applied during the training period are not replicated in production.

Champion-challenger testing is how NBFCs answer that question without taking on that risk. The current model stays in production as the champion, driving treatment decisions for the majority of the portfolio. The challenger model runs simultaneously on a controlled slice of the portfolio, receiving its own treatment assignments based on its own scores. Both models are evaluated on live outcome metrics over a defined test window. The switch happens only when the data supports it, and only after governance sign-off.

This post covers what champion-challenger testing means in a collections context, why Indian NBFC portfolios need it more than most, how to structure a test that produces valid results, what can be tested beyond propensity models, and what RBI requires when a challenger replaces the champion.

What Champion-Challenger Testing Means in Collections

Champion-challenger testing is a controlled experiment in which the production model and a candidate replacement or variant run simultaneously on separate, randomly assigned account populations.

The champion continues to drive treatment decisions for the majority of the portfolio. The challenger runs on a defined test population, typically 10 to 20% of the relevant bucket. Both populations receive treatment decisions based on their respective model’s scores. Both are tracked on the same outcome metrics over the same window.

The challenger can be a new propensity model built on a different feature set or training window. It can also be a new score band configuration, a new channel sequencing strategy, a revised treatment intensity for a specific DPD bucket, or a new self-cure suppression threshold. Any element of the collections decision logic can be structured as a challenger test. A full propensity model replacement is the highest-stakes challenger test. Channel strategy and treatment routing tests often produce meaningful improvements with lower risk and faster governance clearance.

The structural difference from backtesting matters. A backtest evaluates a model against historical data where the outcomes are already known. The model can be tuned to perform well on that specific historical window. A champion-challenger test evaluates both models against live outcomes in the current portfolio environment, with current borrower behaviour, current economic conditions, and current treatment execution. Outperformance in backtesting that does not hold in live testing is common. The champion-challenger structure catches this before the switch is made, not after.

Why Indian NBFC Portfolios Need Challenger Testing

Four characteristics of Indian NBFC portfolios make ongoing challenger testing more valuable than periodic manual model review.

Salary Cycle Sensitivity

The 30-60 DPD bucket in a personal loan NBFC portfolio turns over rapidly in response to monthly salary credit dates. A propensity model calibrated on Q2 data may score the same behavioural pattern differently from one calibrated after a salary cycle disruption caused by public holidays, payroll processing delays, or banking system downtime. A challenger model trained on more recent data can be tested against the champion on the affected accounts before a full recalibration is committed to production.

Portfolio Composition Change

Indian NBFCs grow quickly, and their portfolio composition changes faster than that of most Western lenders. A model trained when the portfolio was 60% two-wheeler loans may underperform when the portfolio has shifted to 50% personal loans and 30% MSME loans. The borrower profiles, income structures, and payment behaviour norms across these product types differ materially. A challenger model retrained on the current composition can be evaluated against the champion specifically on the segments where the composition shift has been largest, before any full model replacement.

Seasonal Variation

Agricultural income seasonality, festival spending cycles, and year-end financial behaviour create periods where payment propensity shifts materially from the historical baseline. A champion model calibrated on full-year data may be systematically less accurate during Diwali spending months or post-harvest payment months for agricultural borrowers. A challenger model built to incorporate seasonal adjustment can be tested against the champion during the specific period it is designed to handle, producing outcome data that is directly relevant to the decision.

Regulatory Environment Changes

RBI regulatory updates can shift borrower behaviour and treatment constraints at the same time. A challenger model built to reflect the post-update operating environment can be tested against the champion to determine whether its adjustments produce better outcomes under the new rules, before the NBFC commits to a full model change governance process.

How to Structure a Valid Champion-Challenger Test

Five elements determine whether a champion-challenger test produces results that are actionable and defensible under RBI governance.

Population Assignment

Accounts must be randomly assigned to champion and challenger populations. Assignment must happen at the account level, before any treatment is applied in the test window. Assigning accounts based on any characteristic that the models also use as a predictor introduces selection bias that makes the results uninterpretable. If balance, DPD, bureau score, or any other predictor variable is used to split the test and control populations, the test cannot isolate the model’s contribution to the outcome difference.
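The assignment rule above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not a production routine: the function name `assign_arms`, the 15% challenger share, the seed value, and the synthetic account IDs are all hypothetical. The point is that the split depends only on a seeded shuffle of account IDs, never on balance, DPD, bureau score, or any other predictor.

```python
import random


def assign_arms(account_ids, challenger_share=0.15, seed=20240701):
    """Randomly split accounts into champion and challenger arms.

    Only the account ID and a recorded seed drive the split, so no model
    predictor can leak into the assignment.
    """
    if not 0.0 < challenger_share < 1.0:
        raise ValueError("challenger_share must be strictly between 0 and 1")
    rng = random.Random(seed)
    shuffled = sorted(account_ids)  # fixed starting order, so the seed reproduces the split
    rng.shuffle(shuffled)
    cut = round(len(shuffled) * challenger_share)
    challenger = set(shuffled[:cut])
    return {
        acc: ("challenger" if acc in challenger else "champion")
        for acc in account_ids
    }


# Hypothetical 20,000-account bucket with synthetic IDs.
accounts = [f"ACC{n:06d}" for n in range(20000)]
arms = assign_arms(accounts, challenger_share=0.15, seed=20240701)
challenger_count = sum(1 for arm in arms.values() if arm == "challenger")
print(challenger_count, len(accounts) - challenger_count)
```

Recording the seed alongside the account list is one way to let the split be regenerated later when the assignment methodology is reviewed.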

The test population must also be large enough to detect a meaningful difference in outcome metrics. For a collections propensity model test targeting a 5 percentage point improvement in payment rate at 95% confidence and a conventional 80% power, the requirement runs from several hundred to well over a thousand accounts per arm, depending on baseline payment rates and outcome variance in the portfolio. A test run on 200 accounts from a 20,000-account portfolio will not produce statistically valid conclusions within a standard test window.
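The sizing logic above can be made concrete with the standard two-proportion sample size approximation. This is a sketch rather than a prescribed method: the function name `accounts_per_arm`, the 80% power default, and the baseline payment rates in the usage lines are assumptions chosen for illustration.

```python
from math import ceil
from statistics import NormalDist


def accounts_per_arm(baseline_rate, uplift, alpha=0.05, power=0.80):
    """Approximate accounts needed per arm to detect `uplift` in payment rate.

    Uses the normal approximation for comparing two proportions with a
    two-sided test at significance level `alpha`.
    """
    p1 = baseline_rate
    p2 = baseline_rate + uplift
    if not (0.0 < p1 < 1.0 and 0.0 < p2 < 1.0) or uplift == 0:
        raise ValueError("rates must lie in (0, 1) and uplift must be non-zero")
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided critical value
    z_beta = NormalDist().inv_cdf(power)
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return ceil((z_alpha + z_beta) ** 2 * variance / uplift ** 2)


# Hypothetical baselines: the same 5-point uplift needs very different arm sizes.
low_baseline = accounts_per_arm(baseline_rate=0.10, uplift=0.05)
high_baseline = accounts_per_arm(baseline_rate=0.30, uplift=0.05)
print(low_baseline, high_baseline)
```

Under these assumptions, a 10% baseline payment rate needs roughly 680 accounts per arm and a 30% baseline roughly 1,370, which is why a 200-account test cannot support a switch decision.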

Test Window Definition

The test window must be long enough to capture at least one full payment cycle for the accounts in the test population. For a 30-60 DPD personal loan portfolio with monthly payment cycles, a minimum four-week window is required to observe whether the challenger’s treatment assignments produce payment completion outcomes rather than contact and response signals only.

Tests run on shorter windows can capture differences in right-party contact rate and promise-to-pay (PTP) rate. These are useful secondary signals, but payment outcome data is the primary decision criterion for a model switch. The test window should also avoid periods of known payment behaviour shift, such as festival months and salary cycle disruption periods, unless the challenger is specifically designed to perform during those periods.

Outcome Metrics

Define the outcome metrics before the test begins. The primary metric for a collections propensity model should be payment rate within the test window. Secondary metrics include right-party contact rate, PTP fulfilment rate, cost per recovery, and self-cure rate for accounts the challenger scores above the self-cure suppression threshold.

Defining metrics after the test has run introduces the risk of selecting the metric on which the challenger happened to outperform, rather than the metric that reflects actual business value. An NBFC that selects its success metric after seeing the results cannot defend the champion switch decision under RBI model change governance.
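One way to make "defined before the test begins" verifiable is to freeze the test plan and record a fingerprint of it at registration time. The sketch below is an illustration only: the class `ChallengerTestPlan`, its field names, and the example values are hypothetical, and hashing the plan is an assumed technique, not something the RBI framework prescribes.

```python
import hashlib
import json
from dataclasses import asdict, dataclass, replace


@dataclass(frozen=True)
class ChallengerTestPlan:
    """Test criteria fixed before the first account is treated."""
    test_id: str
    primary_metric: str
    secondary_metrics: tuple
    confidence_level: float
    min_accounts_per_arm: int
    window_days: int

    def fingerprint(self) -> str:
        # Hash a canonical JSON form so any later edit changes the digest.
        payload = json.dumps(asdict(self), sort_keys=True)
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()


plan = ChallengerTestPlan(
    test_id="dpd30-60-propensity-v2",  # hypothetical identifier
    primary_metric="payment_rate",
    secondary_metrics=(
        "rpc_rate",
        "ptp_fulfilment_rate",
        "cost_per_recovery",
        "self_cure_rate",
    ),
    confidence_level=0.95,
    min_accounts_per_arm=1400,
    window_days=28,
)
registered = plan.fingerprint()

# Swapping the primary metric after the fact yields a different fingerprint.
edited = replace(plan, primary_metric="ptp_fulfilment_rate")
print(registered == plan.fingerprint(), registered == edited.fingerprint())
```

Storing the registered fingerprint with a timestamp before the test starts gives the governance file a simple way to show that the success metric was not chosen after the results were seen.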

Statistical Validity Threshold

Set a statistical significance threshold before the test runs. For NBFC portfolios of moderate size, a 95% confidence threshold is achievable within a standard test window with appropriate population sizing. The challenger’s outperformance must clear this threshold before a switch is recommended. A result that appears directionally positive but does not reach statistical significance is not grounds for replacing the production model.
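A two-proportion z-test is one common way to check whether a payment rate difference clears the 95% threshold. The sketch below is illustrative: the function name and the payment counts are hypothetical, and a real programme may prefer a different test agreed with its model risk function.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_z_test(paid_champion, n_champion, paid_challenger, n_challenger):
    """Return (z, two-sided p-value) for the difference in payment rates."""
    p_champion = paid_champion / n_champion
    p_challenger = paid_challenger / n_challenger
    pooled = (paid_champion + paid_challenger) / (n_champion + n_challenger)
    std_err = sqrt(pooled * (1 - pooled) * (1 / n_champion + 1 / n_challenger))
    if std_err == 0:
        raise ValueError("no variation in outcomes; the test is undefined")
    z = (p_challenger - p_champion) / std_err
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


ALPHA = 0.05  # corresponds to the 95% confidence threshold

# Hypothetical counts: the same 30% vs 35% payment rates at two arm sizes.
z_small, p_small = two_proportion_z_test(60, 200, 70, 200)
z_large, p_large = two_proportion_z_test(420, 1400, 490, 1400)
print(p_small < ALPHA, p_large < ALPHA)
```

With 200 accounts per arm the 5-point gap does not reach significance, while the identical gap on 1,400 accounts per arm does. That is the difference between a directional result and a defensible one.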

Directional results from underpowered tests are a common source of premature champion switches. The challenger appears to outperform across two or three weeks. The team recommends a switch. The switch is made without waiting for statistical validity. The challenger underperforms in production for reasons the small test population did not capture.

Champion Switch Governance

The decision to replace the champion with the challenger is a model change under RBI’s framework. Required documentation includes the test design and population assignment methodology, the outcome metrics and statistical validity criteria defined before the test began, the full test results with statistical validation, and sign-off from the model risk committee or equivalent governance body. This documentation must be retained and available for supervisory review.
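The documentation requirement can be enforced as a simple gate in the promotion workflow. The sketch below is an assumption about how such a gate might look: the artefact keys and function names are hypothetical labels for the four items listed above, not an official RBI checklist.

```python
# Hypothetical keys mirroring the four documentation items described above.
REQUIRED_ARTEFACTS = (
    "test_design_and_assignment_methodology",
    "pre_registered_metrics_and_validity_criteria",
    "test_results_with_statistical_validation",
    "governance_sign_off",
)


def missing_artefacts(evidence):
    """List required artefacts that are absent or empty in the evidence pack."""
    return [name for name in REQUIRED_ARTEFACTS if not evidence.get(name)]


def can_promote(evidence):
    """Allow a champion switch only when every artefact is on file."""
    return not missing_artefacts(evidence)


# Hypothetical evidence pack with the sign-off still outstanding.
evidence = {
    "test_design_and_assignment_methodology": "docs/test-design.pdf",
    "pre_registered_metrics_and_validity_criteria": "docs/test-plan.json",
    "test_results_with_statistical_validation": "docs/results.pdf",
    "governance_sign_off": None,
}
print(can_promote(evidence), missing_artefacts(evidence))
```

Blocking the switch in code until the sign-off is attached keeps an outperforming challenger from reaching production ahead of its paperwork.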

A challenger that outperforms the champion but is promoted to production without formal governance sign-off creates an audit gap under RBI model risk management requirements. The NBFC has made a material change to its collections decision logic without the documentation that demonstrates the change was evaluated, validated, and approved.

[Infographic: champion-challenger testing governance flow for Indian NBFCs in five stages: population assignment, test window definition, outcome metrics, statistical validity threshold, and champion switch governance.]

What Can Be Tested as a Challenger

Champion-challenger infrastructure in NBFC collections is broader than propensity model comparison. Four test types produce meaningful collections improvements without requiring a full model replacement and the governance overhead that comes with it.

Score Band Reconfiguration

The current champion may score accounts into four bands with defined treatment assignments for each. A challenger tests whether a five-band configuration, with a separate band for accounts sitting close to the threshold between high and mid propensity, produces better treatment differentiation and lower cost-to-collect. This is a treatment matrix change, not a model change, and typically carries a lighter governance footprint than replacing the underlying scoring model.
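A band reconfiguration of this kind leaves the scoring model untouched and changes only the cut-offs applied to its output. The sketch below illustrates that separation; the cut-off values, band labels, and the helper `make_band_lookup` are hypothetical.

```python
from bisect import bisect_right


def make_band_lookup(cutoffs, labels):
    """Build a score-to-band function from ascending cut-offs.

    A score equal to a cut-off falls into the higher band.
    """
    if len(labels) != len(cutoffs) + 1:
        raise ValueError("need exactly one more label than cut-offs")
    if list(cutoffs) != sorted(cutoffs):
        raise ValueError("cut-offs must be in ascending order")

    def band_for(score):
        return labels[bisect_right(cutoffs, score)]

    return band_for


# Champion: four bands. Challenger: a fifth band around the mid/high boundary.
champion_band = make_band_lookup(
    [0.25, 0.50, 0.75], ["very_low", "low", "mid", "high"]
)
challenger_band = make_band_lookup(
    [0.25, 0.50, 0.70, 0.80], ["very_low", "low", "mid", "borderline", "high"]
)

for score in (0.72, 0.78, 0.90):
    print(score, champion_band(score), challenger_band(score))
```

Accounts scoring just either side of the champion's mid/high cut-off land in the new borderline band under the challenger, so they can be given their own treatment and measured separately.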

Channel Sequencing Strategy

The current champion routes mid-propensity accounts to a voice-first, SMS-follow-up sequence. A challenger tests whether a WhatsApp-first, voice-escalation sequence produces better right-party contact rates and payment completion for the same score band in the same portfolio segment. For many Indian NBFC borrower populations, WhatsApp carries materially higher response rates than voice as the lead channel. A challenger test produces live evidence for this specific portfolio before changing the production strategy.

Treatment Intensity

The current champion assigns two contact attempts within a seven-day window for mid-propensity accounts. A challenger tests whether three attempts within five days, with shorter gaps, produce better outcomes or mainly drive higher opt-out rates and borrower friction. The answer varies by portfolio segment, DPD bucket, and channel. A challenger test produces segment-specific evidence rather than a policy decision based on general assumptions.

Self-Cure Threshold

The current champion suppresses accounts above a defined self-cure propensity score from active outreach. A challenger tests whether raising that threshold, suppressing more accounts from the contact queue, reduces cost per recovery without materially affecting the overall payment rate. This test is particularly useful in early bucket portfolios where a proportion of accounts will self-cure regardless of contact, and where the cost of unnecessary outreach is measurable.
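The trade-off in this test is arithmetic: fewer contacts lower the outreach cost, and the question is whether payments hold up. The sketch below uses entirely hypothetical figures (arm sizes, contact counts, payments, cost per contact, and tolerance) to show how a pre-registered decision rule might be expressed.

```python
def arm_summary(accounts, contacted, paid, cost_per_contact):
    """Payment rate and outreach cost per recovered account for one arm."""
    if paid <= 0:
        raise ValueError("cost per recovery is undefined with no payments")
    return {
        "payment_rate": paid / accounts,
        "cost_per_recovery": contacted * cost_per_contact / paid,
    }


def passes_pre_registered_rule(champion, challenger, max_payment_rate_drop):
    """Challenger passes if it is cheaper and payment rate stays within tolerance."""
    cheaper = challenger["cost_per_recovery"] < champion["cost_per_recovery"]
    rate_drop = champion["payment_rate"] - challenger["payment_rate"]
    return cheaper and rate_drop <= max_payment_rate_drop


# Hypothetical arms: the challenger suppresses more accounts from outreach.
champion = arm_summary(accounts=1000, contacted=800, paid=400, cost_per_contact=25.0)
challenger = arm_summary(accounts=1000, contacted=650, paid=395, cost_per_contact=25.0)
print(champion, challenger)
print(passes_pre_registered_rule(champion, challenger, max_payment_rate_drop=0.01))
```

A rule like this only summarises the comparison. The observed differences still need to clear the statistical threshold set for the test before any change to the production threshold.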

RBI Governance Requirements for Challenger Testing

Champion-challenger testing sits within RBI’s model risk management framework in two places: as an ongoing model monitoring practice, and as a model change mechanism.

As a Monitoring Practice

RBI expects NBFCs to test whether deployed models remain fit for purpose as portfolio conditions change. Structured challenger testing is a documented method for meeting this expectation. Running a challenger at defined intervals, such as quarterly for material collections models, provides ongoing evidence that the champion model is still performing at an acceptable level relative to available alternatives. This is a stronger governance position than periodic manual model review, because it produces outcome data rather than parameter inspection.

As a Model Change Mechanism

When a challenger test results in a champion switch, the switch requires full model change documentation. The test design, population assignment methodology, outcome metrics, and statistical validity criteria must all be recorded before the test runs. The test results, statistical validation, and the governance body’s sign-off must be documented before the switch is implemented in production.

Vendor Model Governance

When the champion and challenger are both vendor-supplied models, the NBFC retains full ownership of the governance process. RBI places validation and governance responsibility on the institution. The NBFC is responsible for the test design, the outcome measurement, the statistical validation, and the governance sign-off. A vendor that provides test design support or outcome reporting is contributing to the NBFC’s process, not substituting for it.

Documentation Retention

All champion-challenger test documentation must be retained and available for supervisory review. This includes the original test design, the population assignment records, the outcome data, the statistical analysis, and the governance sign-off. An NBFC that has run challenger tests and cannot produce this documentation has a model governance gap that will surface during RBI examination.

A Better Model Sitting in Backtesting Is Not Improving Your Portfolio

A challenger model that outperforms in backtesting but has never been tested against the production environment is not yet a better model. It is a candidate. The champion-challenger framework is the mechanism for determining whether the candidate actually outperforms in the environment where collections decisions are made and outcomes are measured.

For Indian NBFCs, the value of this discipline is compounded by portfolio characteristics that make live performance divergence from backtest performance more likely: rapid portfolio composition changes, salary cycle sensitivity, seasonal variation, and a regulatory environment that continues to evolve. A quarterly challenger testing programme, structured correctly and governed under RBI’s model risk management framework, builds a continuous improvement cycle into the collections operation rather than relying on periodic model replacements driven by backtest results alone.

Five markers of a well-run champion-challenger programme for Indian NBFCs:

  • Random account-level population assignment with no predictor variable used in the split
  • Test window covering at least one full payment cycle, avoiding known seasonal disruption periods
  • Primary and secondary outcome metrics defined before the test begins, not selected after results are seen
  • Statistical significance threshold set before the test runs, with a documented minimum population size per arm
  • Full model change governance documentation retained for every champion switch, regardless of whether the challenger was vendor-supplied

iTuring’s AI collections platform includes built-in champion-challenger testing infrastructure with configurable population splits, automated outcome tracking against pre-defined metrics, and native documentation output for RBI model change governance.