TL;DR

  • Self-learning models update their scoring logic as new data arrives
  • Model drift occurs when portfolio behaviour diverges from training data
  • Three types of drift each require a different monitoring response
  • RBI model governance requires documented drift thresholds and retraining triggers
  • Monitoring must be continuous, not limited to periodic manual review

A propensity model deployed on a personal loan portfolio in January was trained on 18 months of historical payment data. At deployment, its scores were accurate. Recovery rates improved through the first quarter.

By August, the picture has changed. Payment rates in the mid-propensity band have dropped. The model continues to route accounts to the same treatments it assigned in January, but the accounts scoring as mid-propensity in August are behaving more like the low-propensity accounts from January. The model has not been retrained. Its scoring logic reflects the portfolio as it existed at training time, which is now more than eight months in the past.

This is model drift. The portfolio has shifted. New borrower segments have entered the book. Salary credit patterns have changed for a portion of the customer base. The macroeconomic environment has tightened in ways that affect payment behaviour. The model does not know any of this. Every treatment routing decision it makes is based on feature relationships that were accurate in January and are less accurate today.

Self-learning models close this gap by continuously updating their scoring logic as new outcome data arrives. This blog explains what that means technically, what the three types of drift are, how monitoring systems detect and respond to each, and what RBI requires before a self-learning model goes into an NBFC production environment.

What Self-Learning Means in a Collections AI Context

A self-learning model updates its parameters continuously or at defined intervals as new labelled outcome data becomes available, without requiring a full manual retraining cycle each time.

In collections, labelled outcome data is generated constantly. Every contact attempt produces a response or no-response signal within a defined window. Every treatment assignment produces a payment or no-payment signal. Every promise-to-pay produces a fulfilment or breach signal. Every account that the model suppressed from outreach either self-cures or does not within the outcome window. Each of these signals is a label: a ground truth observation about how a borrower with a specific set of features actually behaved under a specific treatment.

A static model is trained once on a historical dataset, deployed, and left unchanged until a manual retraining cycle is initiated. Its parameters reflect the portfolio as it existed during the training window. Every month that passes without retraining is a month during which the model’s scoring logic drifts further from the current portfolio reality.

A self-learning model ingests ongoing outcome signals and updates its feature weights within a governed update framework. The update framework controls how much the model’s parameters can shift per update cycle, how frequently updates run, and the conditions under which an update is paused pending human review.

Three approaches to self-learning are used in collections AI:

Online learning updates model parameters after every new observation or small batch of observations. The model always reflects the most recent data, so each parameter update is kept small and tightly constrained. Without those constraints, the model can move quickly in the wrong direction when recent data is unrepresentative, which is why this approach requires careful governance controls.

Periodic retraining rebuilds the model at defined intervals, typically weekly or monthly, on a rolling training window that drops older data and adds recent data. The model is not continuously learning, but each version is trained on a window that reflects recent portfolio conditions. This is the most common approach in NBFC collections because it balances accuracy with governance manageability.

Triggered retraining initiates a retraining cycle when a monitoring system detects that drift has exceeded a documented threshold, rather than on a calendar schedule. This approach is efficient for portfolios where conditions are stable for extended periods and then change quickly, because it directs retraining effort to the moments where it produces the most improvement.
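
The parameter-shift constraint that governs online learning can be illustrated with a minimal sketch. It assumes a logistic-regression scorer updated by one stochastic-gradient step per observation. The function names, `learning_rate`, and `max_shift` are hypothetical and do not describe any particular platform.

```python
import math

def predict(weights, features):
    """Logistic score: estimated payment probability for one account."""
    z = sum(w * x for w, x in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))

def online_update(weights, features, paid, learning_rate=0.05, max_shift=0.01):
    """One gradient step on log loss for a single observation.

    Each weight's movement is clipped to +/- max_shift, which stands in
    for the governed limit on how far parameters may move per update.
    """
    error = predict(weights, features) - (1.0 if paid else 0.0)
    new_weights = []
    for w, x in zip(weights, features):
        step = -learning_rate * error * x
        step = max(-max_shift, min(max_shift, step))  # governance cap
        new_weights.append(w + step)
    return new_weights

# Toy observation: two features, borrower paid.
weights = [0.0, 0.0]
updated = online_update(weights, [1.0, 2.0], paid=True)
```

With both weights starting at zero, the unclipped steps would be 0.025 and 0.05. The cap holds each to 0.01, so a single unrepresentative observation cannot move the model far.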


What Model Drift Means for NBFC Collections

Model drift is the degradation in a model’s predictive accuracy caused by a growing mismatch between the conditions at training time and the conditions at scoring time. Three types of drift are relevant to NBFC collections, and each requires a different monitoring and response approach.

[Figure: The three types of model drift in collections systems: data drift, concept drift, and label drift.]

Data Drift (Covariate Shift)

Data drift occurs when the distribution of input features in the live portfolio changes from the distribution the model was trained on. The model’s feature weights were calibrated on the training distribution. When the live distribution shifts, those weights produce scores that do not reflect the actual risk profile of the current population.

For Indian NBFCs, data drift is commonly triggered by portfolio composition changes. A model trained when the portfolio was primarily two-wheeler and personal loans may face a live portfolio that now includes a significant MSME loan component. The distribution of loan-to-income ratios, tenor profiles, income source types, and historical payment patterns across the scoring population has shifted materially. The model is applying weights designed for one distribution to a different distribution.

Data drift can also occur without portfolio composition changes. If the NBFC has expanded into new geographies, entered new borrower income segments, or changed its origination criteria, the population being scored today may differ from the training population even if the product mix has not changed.

Concept Drift

Concept drift occurs when the relationship between input features and payment outcomes changes, not just the distribution of the features themselves. This is the most damaging type of drift for collections AI because it can reverse the direction of a prediction.

An example relevant to Indian NBFC portfolios: salary delay has historically been a short-term DPD signal followed by self-cure within five to seven days of funds clearing. The model scores salary-delay indicators as self-cure signals and suppresses these accounts from active outreach. Over time, as macroeconomic conditions tighten, salary delay in a particular borrower segment begins correlating with prolonged delinquency rather than brief DPD spikes. The feature relationship has reversed. The model continues to suppress accounts based on a signal that now predicts the opposite outcome.

Concept drift cannot be detected by looking at input feature distributions alone. It requires tracking whether high-scoring accounts continue to produce better payment outcomes than low-scoring accounts over time.
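
One way to run that check is to compare realised payment rates across score bands in each monitoring window. The sketch below is illustrative only: the band labels, the input shape (pairs of band and paid flag), the toy counts, and the function names are all assumptions.

```python
from collections import defaultdict

def payment_rate_by_band(accounts):
    """accounts: iterable of (band, did_pay) pairs -> {band: payment rate}."""
    totals = defaultdict(int)
    paid = defaultdict(int)
    for band, did_pay in accounts:
        totals[band] += 1
        paid[band] += int(did_pay)
    return {band: paid[band] / totals[band] for band in totals}

def rank_order_holds(rates, order=("high", "mid", "low")):
    """True if each band pays strictly better than the band below it."""
    present = [rates[band] for band in order if band in rates]
    return all(upper > lower for upper, lower in zip(present, present[1:]))

def make_window(high_paid, mid_paid, low_paid, size=10):
    """Build a toy window with `size` accounts per band."""
    window = []
    for band, n_paid in (("high", high_paid), ("mid", mid_paid), ("low", low_paid)):
        window += [(band, True)] * n_paid + [(band, False)] * (size - n_paid)
    return window

january = payment_rate_by_band(make_window(8, 5, 2))
august = payment_rate_by_band(make_window(7, 2, 3))

january_ok = rank_order_holds(january)  # high > mid > low
august_ok = rank_order_holds(august)    # mid now pays worse than low
```

In the toy August window the mid band pays worse than the low band, which is the kind of ordering break that input feature distributions alone would not reveal.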

Label Drift

Label drift occurs when the distribution of outcome labels in the training data no longer reflects the current portfolio. A model trained during a period of economic expansion, when payment rates were materially higher, has a training dataset where positive payment outcomes are overrepresented relative to the current environment. The model’s baseline propensity estimates are calibrated too high for the current portfolio.

Label drift in the opposite direction is also possible. A model trained during a period of economic stress may underestimate current payment propensity when conditions improve, routing accounts to higher-intensity treatments than their actual risk profile warrants.
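
Label drift of either kind shows up as a gap between the average predicted payment probability and the payment rate actually observed. A minimal sketch, with a hypothetical function name and toy numbers:

```python
def calibration_gap(predicted_probs, outcomes):
    """Mean predicted payment probability minus observed payment rate.

    Positive: the model overestimates propensity (trained in better times).
    Negative: the model underestimates propensity (trained under stress).
    """
    if not predicted_probs or len(predicted_probs) != len(outcomes):
        raise ValueError("need equal-length, non-empty inputs")
    mean_predicted = sum(predicted_probs) / len(predicted_probs)
    observed_rate = sum(outcomes) / len(outcomes)
    return mean_predicted - observed_rate

# Toy window: the model expects 60% to pay, but only half do.
gap = calibration_gap([0.6, 0.7, 0.5, 0.6], [1, 0, 0, 1])
```

A persistently positive gap across monitoring windows suggests the baseline is calibrated too high for the current portfolio. A persistently negative gap suggests the opposite.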

How Self-Learning Models Detect and Respond to Drift

Three monitoring checks form the core of a production drift detection system for NBFC collections models.

Score Distribution Monitoring

Track the proportion of accounts falling into each score band over time. Suppose the score distribution was 30% high propensity, 45% mid propensity, and 25% low propensity at deployment, and has shifted to 18% high, 40% mid, and 42% low eight months later. A shift of that size indicates that the portfolio has changed in ways the model was not trained to reflect. The larger low-propensity share does not by itself show that the portfolio has deteriorated: it can also arise because the input feature distributions have shifted and the model is applying training-time weights to a different population.

Score distribution monitoring detects data drift early, before it fully degrades the model’s rank-ordering accuracy.
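
The band shares quoted above can be turned into a simple check. This is a sketch under stated assumptions: the 10-point tolerance is a hypothetical value chosen for illustration, not a figure from any regulation, and the names are invented for the example.

```python
def band_shifts(baseline, current):
    """Percentage-point change in each band's share (current minus baseline)."""
    return {band: current[band] - baseline[band] for band in baseline}

# Shares in percent, taken from the example in the text.
deployment = {"high": 30.0, "mid": 45.0, "low": 25.0}
month_eight = {"high": 18.0, "mid": 40.0, "low": 42.0}

shifts = band_shifts(deployment, month_eight)
largest_shift = max(abs(change) for change in shifts.values())

TOLERANCE_POINTS = 10.0  # hypothetical documented tolerance
needs_data_review = largest_shift > TOLERANCE_POINTS
```

Here the high band loses 12 points, the mid band loses 5, and the low band gains 17, so the largest shift exceeds the assumed tolerance and would route to a data review.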

Rank-Order Stability Monitoring

Track whether high-scoring accounts continue to produce better payment outcomes than low-scoring accounts. The Gini coefficient and the Kolmogorov-Smirnov statistic measure the model’s discriminatory power across the full score distribution. A declining Gini coefficient over successive monitoring windows indicates the model is losing its ability to rank-order accounts by payment probability. This is the primary signal of concept drift.

A Gini decline of more than 5 percentage points from the deployment baseline is a widely used threshold for triggering a retraining cycle. The specific threshold should be documented before deployment and should reflect the NBFC’s tolerance for scoring inaccuracy in the collections context.
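
For reference, the Gini coefficient equals 2 × AUC − 1, where AUC is the probability that a randomly chosen paying account scores higher than a randomly chosen non-paying account. The sketch below computes it by direct pairwise comparison for clarity. The function names and toy data are assumptions, and a production system would typically use a sort-based or library implementation for large portfolios.

```python
def gini(scores, outcomes):
    """Gini = 2 * AUC - 1, with AUC from pairwise comparison (ties count 0.5)."""
    payers = [s for s, y in zip(scores, outcomes) if y == 1]
    non_payers = [s for s, y in zip(scores, outcomes) if y == 0]
    if not payers or not non_payers:
        raise ValueError("need at least one payer and one non-payer")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in payers
        for n in non_payers
    )
    auc = wins / (len(payers) * len(non_payers))
    return 2.0 * auc - 1.0

def gini_breach(baseline, current, max_decline_points=5.0):
    """True if Gini has fallen by more than the documented number of points."""
    return (baseline - current) * 100.0 > max_decline_points

# Toy window: six accounts, three of which paid.
window_gini = gini([0.9, 0.8, 0.7, 0.6, 0.4, 0.2], [1, 1, 0, 1, 0, 0])

retrain = gini_breach(baseline=0.62, current=0.55)  # 7-point decline
hold = gini_breach(baseline=0.62, current=0.60)     # 2-point decline
```

In the toy window, payers outrank non-payers in 8 of 9 pairs, giving an AUC of 8/9 and a Gini of 7/9.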

Feature Drift Monitoring

Track the distribution of individual input features over time using Population Stability Index. PSI measures how much a feature’s distribution has shifted between the training period and the current monitoring period.

PSI below 0.1 indicates a stable distribution. PSI between 0.1 and 0.2 indicates moderate shift that warrants monitoring. PSI above 0.2 on a key feature indicates that the model’s weight for that feature was calibrated on a distribution that no longer reflects the live portfolio, and the model should be reviewed. For models where salary credit date alignment, DPD trajectory, or contact response rate are among the top predictors, PSI monitoring on these features provides an early warning of the Indian NBFC-specific data drift patterns described above.
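
PSI is computed by binning the feature, then summing (actual share - expected share) × ln(actual share / expected share) over the bins. The sketch below applies that formula and the thresholds quoted above. The bin edges, the toy DPD values, the small floor used to avoid taking the log of zero, and the function names are assumptions made for illustration.

```python
import math
from bisect import bisect_right

def bin_shares(values, edges, floor=1e-6):
    """Share of values in each bin defined by sorted edges.

    Empty bins are floored at a tiny share so the log term stays defined.
    """
    counts = [0] * (len(edges) + 1)
    for value in values:
        counts[bisect_right(edges, value)] += 1
    return [max(count / len(values), floor) for count in counts]

def psi(expected_values, actual_values, edges):
    """Population Stability Index between training and live samples."""
    expected = bin_shares(expected_values, edges)
    actual = bin_shares(actual_values, edges)
    return sum((a - e) * math.log(a / e) for a, e in zip(actual, expected))

def psi_status(value):
    """Map a PSI value to the bands described in the text."""
    if value < 0.1:
        return "stable"
    if value <= 0.2:
        return "monitor"
    return "review"

# Toy days-past-due values; bin edges at 5, 15, and 30 days.
edges = [5, 15, 30]
training = [2] * 50 + [10] * 30 + [20] * 15 + [45] * 5
live = [2] * 30 + [10] * 30 + [20] * 25 + [45] * 15

dpd_psi = psi(training, live, edges)
dpd_status = psi_status(dpd_psi)
```

In this toy sample the live portfolio has moved towards higher DPD bins, the PSI comes out above 0.2, and the feature would be flagged for review.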

Documented Retraining Triggers

Each monitoring check should have a documented threshold that initiates a defined response. Score distribution shift beyond a defined tolerance triggers a data review. A Gini decline beyond the documented threshold triggers a retraining cycle. PSI above 0.2 on a top-five feature triggers a model review. These thresholds must be set and documented before deployment. Ad hoc retraining decisions made in response to observed performance issues, without documented trigger criteria, are not defensible under RBI’s model governance framework.
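
Documenting thresholds before deployment can be as simple as fixing them in a versioned configuration object that the monitoring job reads. The sketch below is one possible shape, not a prescribed format: the class, the field names, the metric keys, and the 10-point band tolerance are hypothetical, while the 5-point Gini and 0.2 PSI values follow the text.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class DriftThresholds:
    """Thresholds fixed before deployment (frozen so they cannot be edited in place)."""
    max_band_shift_points: float = 10.0   # hypothetical tolerance
    max_gini_decline_points: float = 5.0  # from the text
    max_psi_top_feature: float = 0.2      # from the text

def evaluate_triggers(metrics, thresholds=DriftThresholds()):
    """Return the documented responses triggered by one monitoring run."""
    actions = []
    if metrics["max_band_shift_points"] > thresholds.max_band_shift_points:
        actions.append("data_review")
    if metrics["gini_decline_points"] > thresholds.max_gini_decline_points:
        actions.append("retraining_cycle")
    if metrics["max_psi_top5"] > thresholds.max_psi_top_feature:
        actions.append("model_review")
    return actions

breached = evaluate_triggers(
    {"max_band_shift_points": 17.0, "gini_decline_points": 7.0, "max_psi_top5": 0.26}
)
quiet = evaluate_triggers(
    {"max_band_shift_points": 3.0, "gini_decline_points": 2.0, "max_psi_top5": 0.05}
)
```

Because the response follows mechanically from the recorded metrics and the frozen thresholds, the same inputs always produce the same decision, which is what makes the trigger defensible after the fact.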

A Gini coefficient that is declining over successive monitoring windows is the clearest signal that the model is losing its ability to separate accounts that will pay from accounts that will not.

RBI Requirements for Self-Learning Model Governance

Self-learning models create governance requirements that go beyond those for static models, because the model’s parameters are changing after deployment. RBI’s model risk management framework applies to each material parameter update, not just the original deployment.

Update Framework Documentation

The update framework must be documented before deployment. This includes the update frequency or the trigger conditions for triggered retraining, the size of the rolling training window, the constraints on how much any parameter can shift per update cycle, and the conditions under which an automatic update is paused and routed to human review. A self-learning model that updates without a documented governance framework is changing the NBFC’s collections decision logic in ways that are not auditable.
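
The elements listed above can be captured in a small, machine-readable record that the update job checks before applying new parameters. This is a sketch under assumptions: the field names, the relative-shift rule, the minimum outcome count, and the feature names are illustrative and not drawn from any RBI text.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class UpdateFramework:
    update_frequency_days: int  # how often updates run
    training_window_days: int   # size of the rolling training window
    max_relative_shift: float   # cap on per-parameter change per cycle
    min_new_outcomes: int       # pause if fewer new labels than this

def review_update(framework, old_params, new_params, n_new_outcomes):
    """Return reasons to pause for human review; an empty list means apply."""
    reasons = []
    if n_new_outcomes < framework.min_new_outcomes:
        reasons.append("too few new outcomes")
    for name, old in old_params.items():
        shift = abs(new_params[name] - old) / max(abs(old), 1e-9)
        if shift > framework.max_relative_shift:
            reasons.append(f"{name} shifted beyond limit")
    return reasons

framework = UpdateFramework(
    update_frequency_days=30,
    training_window_days=540,
    max_relative_shift=0.10,
    min_new_outcomes=1000,
)
old = {"dpd_trajectory": -0.80, "salary_alignment": 0.50}
new = {"dpd_trajectory": -0.84, "salary_alignment": 0.65}

pause_reasons = review_update(framework, old, new, n_new_outcomes=2500)
```

In this example the `dpd_trajectory` weight moves by 5% and passes, while `salary_alignment` moves by 30% and sends the update to human review.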

Continuous Monitoring Records

Monitoring results must be recorded continuously and retained for supervisory review. The Gini coefficient, PSI values for key features, and score distribution statistics must be logged at each monitoring interval, not just at periodic manual review points. An NBFC that monitors informally and cannot produce a time series of monitoring metrics across the life of the model cannot demonstrate to RBI examiners that the model was performing within acceptable bounds between formal review cycles.
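
A time series of monitoring metrics is easiest to produce if every monitoring run appends one structured record. The sketch below writes JSON lines to a stream. The record fields, the model identifier, the dates, and the metric values are invented for illustration, and a production system would write to an append-only file or table rather than an in-memory buffer.

```python
import io
import json
from datetime import date

def log_monitoring_record(stream, model_id, as_of, gini, psi_by_feature, band_shares):
    """Append one monitoring observation as a JSON line and return it."""
    record = {
        "model_id": model_id,
        "as_of": as_of.isoformat(),
        "gini": gini,
        "psi_by_feature": psi_by_feature,
        "band_shares": band_shares,
    }
    stream.write(json.dumps(record, sort_keys=True) + "\n")
    return record

# In-memory buffer keeps the example self-contained.
log = io.StringIO()
log_monitoring_record(
    log, "pl_propensity_v1", date(2024, 1, 31), 0.62,
    {"dpd_trajectory": 0.04}, {"high": 0.30, "mid": 0.45, "low": 0.25},
)
log_monitoring_record(
    log, "pl_propensity_v1", date(2024, 2, 29), 0.60,
    {"dpd_trajectory": 0.07}, {"high": 0.28, "mid": 0.45, "low": 0.27},
)

# Reading the log back yields the time series an examiner would ask for.
history = [json.loads(line) for line in log.getvalue().splitlines()]
```

Each line is self-describing, so the full history can be reconstructed at any point without relying on the state of the monitoring tool itself.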

Retraining Trigger Documentation

The thresholds that trigger retraining must be documented before deployment and applied consistently. If a retraining cycle is initiated outside the documented trigger criteria, the reason must be documented and reviewed through the model change governance process. Selective or inconsistent application of trigger thresholds creates an audit gap.

Model Inventory Currency

When a self-learning model undergoes a material parameter update, the model inventory entry must be updated to reflect the new training window, the update date, and the monitoring metrics at the time of update. A model inventory entry that still reflects the original deployment state of a model that has been retrained multiple times does not meet RBI’s model documentation requirements.

Vendor Governance Responsibility

If the self-learning model is supplied by a third-party vendor, the NBFC retains governance responsibility. The vendor can provide monitoring infrastructure and retraining capabilities. The NBFC must own the threshold documentation, the monitoring records, the retraining decisions, and the model inventory updates. RBI will examine the NBFC’s governance records, not the vendor’s platform documentation.

A Model That Cannot Learn Is a Model That Is Falling Behind

A static model deployed on an Indian NBFC portfolio operates in a credit environment that changes faster than most. Salary cycle patterns shift. Portfolio composition evolves. Macroeconomic conditions move. Regulatory rules change. Each of these changes widens the gap between what the model was trained on and what the portfolio looks like today.

Self-learning models close that gap systematically rather than waiting for a manual retraining cycle to catch up. The governance requirements are more demanding precisely because the model is changing after deployment. Those requirements exist to ensure that the learning process is controlled, documented, and auditable, not to prevent it.

For Indian NBFCs operating under RBI’s model risk management framework, a self-learning collections model with properly documented update governance, continuous monitoring records, and a defensible retraining trigger framework is a more accurate scoring system than a static model reviewed annually, and it puts the NBFC in a stronger governance position.

Five markers of a well-governed self-learning collections model for Indian NBFCs:

  • Update framework documented before deployment: frequency or trigger conditions, training window size, parameter shift constraints, and pause conditions
  • Continuous monitoring records retained for Gini coefficient, PSI on key features, and score distribution at each monitoring interval
  • Retraining trigger thresholds documented before deployment and applied consistently, with any out-of-trigger retraining decisions recorded and reviewed
  • Model inventory updated after each material parameter update with training window, update date, and monitoring metrics at time of update
  • Vendor governance responsibility clearly allocated: vendor provides infrastructure, NBFC owns threshold documentation, monitoring records, retraining decisions, and inventory currency

iTuring’s AI collections platform runs continuous model monitoring with automated Gini, PSI, and score distribution tracking, triggered retraining workflows, and native RBI model governance documentation output at each update cycle.