
March 24, 2026

Building a Propensity Model for Collections: How US Banks Predict Payment Behaviour Before Default

18 min read

Collections & Recovery


TL;DR

Propensity models predict payment likelihood before default occurs
Default propensity and payment propensity are two distinct model outputs
Behavioural signals outperform static credit attributes in collections
Explainability is non-negotiable under SR 11-7 and ECOA
Survivorship bias and target leakage are the two most common build failures

Once an account transitions to default, the probability of curing it drops to 7%.

That single figure, drawn from research on consumer credit default transitions, is the entire business case for propensity modelling in collections. The probability of a current account transitioning to default sits at 23%. The probability of recovering an account that has already defaulted sits at 7%. The math is unambiguous: intervening before default is roughly three times more likely to produce a positive outcome than recovering after one.

Yet most US bank collections teams still build their operational workflows around days-past-due buckets. They react to a missed payment, assign the account to a queue, and begin the recovery process. The DPD model is a lagging indicator. It tells you what already happened. A propensity model tells you what is about to happen, and it gives you time to intervene before the cost of doing so multiplies.

This article is a practical guide to what collections propensity models actually predict, which signals make them work, how to build and validate them to SR 11-7 and ECOA standards, and how to turn a propensity score into a contact strategy that recovers more at lower cost.

What a Collections Propensity Model Actually Predicts

The term “propensity model” covers two distinct predictions in a collections context, and treating them as interchangeable is a common and expensive error.

Default propensity predicts the probability that a currently performing account will miss a payment in the next 30, 60, or 90 days. This is the early warning signal. It applies to accounts that have not yet entered the collections workflow and enables pre-delinquency interventions: proactive payment reminders, hardship plan offers, and pre-emptive contact before an account becomes a collections case. The INFORMS research on propensity to pay models describes this as predicting “whether the customer will pay on time, before 30 days or 60 days from the due date,” with the model applied at various intervals in the account lifecycle.

Payment propensity predicts the probability that an account already in delinquency will make a voluntary payment in the next 30 days, given the current state of the account and the available contact and treatment options. This is the collections strategy signal. It does not ask whether an account will eventually pay. It asks which delinquent accounts, if contacted today through which channel and with which message, are most likely to self-cure before the cost of the recovery process escalates.
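The two targets can be made concrete as two labelling functions over the same account snapshot. This is a minimal sketch: the field names (`days_past_due`, `missed_payment_within_days`, `voluntary_payment_within_days`) and the 30-day default horizon are illustrative assumptions, not a schema from any particular system.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AccountSnapshot:
    """Hypothetical account state at the scoring date, plus outcomes observed afterwards."""
    days_past_due: int                            # 0 means currently performing
    missed_payment_within_days: Optional[int]     # days until first missed payment, None if none seen
    voluntary_payment_within_days: Optional[int]  # days until first voluntary payment, None if none seen


def default_propensity_label(acct: AccountSnapshot, horizon_days: int = 30) -> Optional[int]:
    """1 if a performing account misses a payment within the horizon, else 0.

    Returns None for accounts that are already delinquent, because they are
    outside the population this model scores.
    """
    if acct.days_past_due > 0:
        return None
    missed = acct.missed_payment_within_days
    return int(missed is not None and missed <= horizon_days)


def payment_propensity_label(acct: AccountSnapshot, horizon_days: int = 30) -> Optional[int]:
    """1 if a delinquent account makes a voluntary payment within the horizon, else 0.

    Returns None for performing accounts, which belong to the default
    propensity population instead.
    """
    if acct.days_past_due == 0:
        return None
    paid = acct.voluntary_payment_within_days
    return int(paid is not None and paid <= horizon_days)
```

Each account receives a label from exactly one of the two functions at any scoring date, which is one way of seeing why the two models need separate training populations.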

It is also worth distinguishing both from a churn prediction model. A churn prediction model forecasts whether a customer will voluntarily disengage from a product or service, closing an account or switching providers. A collections propensity model targets an involuntary behaviour: payment failure driven by financial stress. The signal sets, intervention timing, and regulatory validation requirements are fundamentally different for each.

Both models matter. But they operate at different points in the delinquency lifecycle, they use different feature sets, and they require different validation approaches. Most banks that deploy only one are leaving significant recovery opportunity on the table. The business logic is clear: default propensity captures accounts before they cost money to recover; payment propensity optimises how that recovery cost is allocated once delinquency has occurred.

[Pull quote: default propensity and payment propensity answer different questions, and each requires its own model.]

The Input Signals That Actually Matter

A credit scorecard and a collections propensity model share a superficial resemblance. Both ingest customer data. Both produce a risk score. The critical difference is in the signals they prioritise.

A credit scorecard is built on relatively static credit bureau attributes: payment history, utilisation, age of accounts, inquiries. These attributes change slowly. They are well-suited to underwriting decisions made at a single point in time. Traditional credit risk scoring frameworks rely on precisely these stable, point-in-time signals, which is why they fall short when applied to the dynamic, week-to-week behavioural patterns that determine collections outcomes.

A collections propensity model needs signals that reflect current and changing behaviour. The accounts it scores are people whose financial circumstances are shifting, whose engagement with their bank is a real-time indicator of intent, and whose payment likelihood can change week to week.

Four categories of signals consistently outperform traditional credit attributes in collections propensity contexts.

[Figure: the four signal categories that outperform traditional credit attributes: payment velocity and momentum, engagement signals, financial stress indicators, and behavioural channel signals.]

Payment velocity and momentum. How the pattern of payments has changed in the last 30 to 90 days, not just whether a payment was missed. An account that has progressively reduced payment amounts across three consecutive cycles is a different risk profile from an account that missed a single payment after years of consistent full payment. The trajectory matters as much as the current state.
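One way to turn trajectory into model inputs is to track the share of the amount due that was actually paid in each cycle. The sketch below computes two features from that series; the function names and the choice of a least-squares slope are assumptions made for illustration, not a prescribed feature definition.

```python
from statistics import mean


def payment_ratios(payments, amounts_due):
    """Share of the amount due that was paid in each billing cycle (oldest first)."""
    if len(payments) != len(amounts_due) or len(payments) < 2:
        raise ValueError("need at least two cycles of matching length")
    return [paid / due for paid, due in zip(payments, amounts_due)]


def ratio_slope(ratios):
    """Least-squares slope of the payment ratio per cycle; negative means deteriorating."""
    xs = list(range(len(ratios)))
    x_bar, y_bar = mean(xs), mean(ratios)
    numerator = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ratios))
    denominator = sum((x - x_bar) ** 2 for x in xs)
    return numerator / denominator


def consecutive_declines(ratios):
    """Number of most recent cycle-over-cycle drops in the payment ratio."""
    count = 0
    for prev, curr in zip(reversed(ratios[:-1]), reversed(ratios[1:])):
        if curr < prev:
            count += 1
        else:
            break
    return count


# Progressive erosion across three cycles versus one miss after full payments.
eroding = payment_ratios([200, 150, 100], [200, 200, 200])
one_off = payment_ratios([200, 200, 200, 0], [200, 200, 200, 200])
```

Both series have a negative slope, so the slope alone does not separate them. The count of consecutive declines does (2 for the eroding account, 1 for the one-off miss), which is why trajectory is usually encoded as several features rather than one.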

Engagement signals. Whether the customer has logged into their account recently, opened payment communications, initiated contact with the bank, or clicked through to payment options in digital channels. Research on intelligent collections strategy consistently identifies customer-initiated engagement as one of the strongest predictors of voluntary payment. A delinquent customer who opened a payment reminder email yesterday is fundamentally different from one who has not engaged with any communication in 45 days.

Financial stress indicators. Patterns across the customer’s broader relationship with the institution that indicate stress: declining balances across accounts, increasing utilisation on revolving credit, missed payments on multiple products simultaneously, or recent overdraft activity. These cross-product signals often appear two to four weeks before a collections-stage missed payment.

Behavioural channel signals. Which contact channels the customer has responded to historically, at what times of day, and with what outcomes. A customer who has never responded to a voice call but has a 40 percent email response rate requires a fundamentally different contact approach from one with the inverse pattern.
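Channel response history can be summarised as a per-customer response rate for each channel. The following is a sketch under assumed inputs: the contact log layout of `(channel, responded)` pairs and the `min_attempts` floor are hypothetical choices.

```python
from collections import defaultdict


def channel_response_rates(contact_log, min_attempts=3):
    """Response rate per channel for one customer.

    contact_log is an iterable of (channel, responded) pairs. Channels with
    fewer than min_attempts contacts are dropped, since a rate computed from
    one or two attempts is mostly noise.
    """
    attempts = defaultdict(int)
    responses = defaultdict(int)
    for channel, responded in contact_log:
        attempts[channel] += 1
        responses[channel] += int(bool(responded))
    return {
        channel: responses[channel] / n
        for channel, n in attempts.items()
        if n >= min_attempts
    }


def best_channel(rates):
    """Channel with the highest observed response rate, or None with no usable history."""
    return max(rates, key=rates.get) if rates else None


# Five emails with two responses, four voice calls with none.
log = [("email", True), ("email", False), ("email", True), ("email", False),
       ("email", False), ("voice", False), ("voice", False), ("voice", False),
       ("voice", False)]
rates = channel_response_rates(log)
```

For this log the email rate is 0.4 and the voice rate is 0.0, matching the pattern described above. In a real feature store these rates would typically be computed over a fixed lookback window so that they remain comparable across customers.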

Model Architecture: Why Explainability Is Non-Negotiable

The collections propensity model architecture question often gets framed as a performance versus interpretability trade-off. It is not quite that simple.

Gradient boosting models, including implementations like XGBoost and LightGBM, consistently perform well in collections propensity contexts. Research specifically on LightGBM in financial services propensity modelling demonstrates strong performance on both accuracy and computational efficiency for large collections portfolios. They handle mixed feature types, manage missing data gracefully, and produce probability outputs that, once checked for calibration, map cleanly to propensity score bands.

Neural network architectures can outperform gradient boosting on raw predictive accuracy, particularly when engagement signal data is rich and high-dimensional. The ArXiv deep learning research on consumer default prediction shows consistent improvements over traditional scoring models. The tradeoff is interpretability: neural networks produce predictions that are substantially harder to explain at the individual account level.

For US bank collections AI, that interpretability gap is a compliance problem, not just a technical preference.

SR 11-7 expects a model to be documented and validated thoroughly enough that qualified parties who did not build it can understand how it operates, and it calls for evaluation of conceptual soundness and ongoing monitoring of model behaviour. ECOA, through Regulation B, requires that adverse action notices give specific reasons for credit-related decisions. Shapley values (SHAP) provide individual-level explanations that can support those adverse action reasons, and they have become a widely used way of demonstrating explainability in model risk reviews.

A collections propensity model that cannot produce account-level SHAP explanations is a regulatory liability, regardless of how accurately it predicts. The practical solution for most US banks is a gradient boosting architecture, where tree-specific SHAP algorithms compute attributions exactly and efficiently, rather than a neural network whose attributions must be approximated to satisfy examination requirements.
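To show what an account-level Shapley explanation actually contains, the sketch below computes exact Shapley values for a single prediction by enumerating every feature coalition. It is deliberately not the `shap` library: `toy_risk`, its coefficients, and the feature names are hypothetical, and brute-force enumeration only works for a handful of features. A production gradient boosting model would normally rely on a tree-specific explainer instead.

```python
from itertools import combinations
from math import factorial


def exact_shapley(predict, x, baseline):
    """Exact Shapley values for one prediction.

    Features outside a coalition are set to their baseline value. The cost
    grows as 2^n in the number of features, so this is for illustration only.
    """
    names = list(x)
    n = len(names)

    def value(coalition):
        point = {f: (x[f] if f in coalition else baseline[f]) for f in names}
        return predict(point)

    phi = {}
    for f in names:
        others = [g for g in names if g != f]
        total = 0.0
        for k in range(n):  # coalition sizes 0 .. n-1
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                s = set(subset)
                total += weight * (value(s | {f}) - value(s))
        phi[f] = total
    return phi


def toy_risk(acct):
    """Hypothetical risk score with one interaction term (not a fitted model)."""
    score = 0.10
    score += 0.30 * acct["missed_cycles"]
    score += 0.20 * acct["days_since_login"] / 30
    if acct["missed_cycles"] > 0 and acct["overdraft"]:
        score += 0.15
    return score


account = {"missed_cycles": 1, "days_since_login": 45, "overdraft": 1}
baseline = {"missed_cycles": 0, "days_since_login": 0, "overdraft": 0}
phi = exact_shapley(toy_risk, account, baseline)
```

The contributions sum to the gap between the account's score and the baseline score, which is the property that lets each feature's value be read as a ranked reason for the prediction. The interaction term is split evenly between `missed_cycles` and `overdraft`.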

Model Risk Management and Validation Requirements for Collections Propensity Models

Model risk management for collections propensity models under SR 11-7 typically rests on four validation components, each of which addresses a different potential failure mode.

Out-of-time testing. Standard cross-validation splits training and test data randomly. Out-of-time testing splits them chronologically, holding out the most recent time period as the test set. This matters for collections models because payment behaviour patterns shift with economic conditions. A model that validates well on random splits but performs poorly on out-of-time data is overfitted to a historical period that may no longer reflect current conditions. For OCC examination purposes, out-of-time test results are a core component of the conceptual soundness validation.
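A chronological split is simple to express. The sketch below assumes each training row carries a `snapshot_date` field (a hypothetical name) and holds out everything on or after a cut-off date.

```python
from datetime import date


def out_of_time_split(rows, cutoff, date_key="snapshot_date"):
    """Split rows chronologically: train strictly before the cutoff, test on or after it."""
    train = [r for r in rows if r[date_key] < cutoff]
    test = [r for r in rows if r[date_key] >= cutoff]
    if not train or not test:
        raise ValueError("cutoff leaves one side of the split empty")
    return train, test


rows = [
    {"account_id": "A1", "snapshot_date": date(2024, 3, 31), "paid": 1},
    {"account_id": "A2", "snapshot_date": date(2024, 9, 30), "paid": 0},
    {"account_id": "A3", "snapshot_date": date(2025, 3, 31), "paid": 1},
    {"account_id": "A4", "snapshot_date": date(2025, 6, 30), "paid": 0},
]
train, test = out_of_time_split(rows, cutoff=date(2025, 1, 1))
```

Unlike a random split, no row in the test set predates any row in the training set, so the test measures how the model holds up on a period it has not seen.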

Champion-challenger deployment. After initial validation, a defined proportion of production accounts is routed through a challenger version that runs alongside the production model, the champion. This champion-challenger structure provides continuous empirical evidence of performance under live conditions. It is particularly important for self-updating architectures, where the version examined during initial validation may differ meaningfully from the version running three months later. SR 11-7 specifically contemplates ongoing performance benchmarking against alternative estimates as part of the ongoing monitoring requirement.
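Routing a fixed share of accounts to the challenger is often done with a deterministic hash so that an account stays on the same model for the whole test. This is a sketch of that idea; the 10 percent share and the `salt` value are placeholder assumptions.

```python
import hashlib


def assign_model(account_id: str, challenger_share: float = 0.10,
                 salt: str = "cc-test-1") -> str:
    """Deterministically route an account to 'champion' or 'challenger'.

    The same account_id and salt always produce the same assignment.
    Changing the salt reshuffles the population for a new test.
    """
    if not 0.0 <= challenger_share <= 1.0:
        raise ValueError("challenger_share must be between 0 and 1")
    digest = hashlib.sha256(f"{salt}:{account_id}".encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") / 2**64  # uniform in [0, 1)
    return "challenger" if bucket < challenger_share else "champion"


assignments = [assign_model(f"ACC-{i:05d}") for i in range(10_000)]
observed_share = assignments.count("challenger") / len(assignments)
```

Hash-based assignment needs no lookup table and can be reproduced later from the account identifier alone, which helps when documenting which model produced which decision.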

SHAP value analysis across the feature set. Validating that the features driving predictions are conceptually sound, legally permissible under ECOA, and stable across demographic segments. A model that assigns high predictive weight to a feature correlated with protected class membership requires scrutiny regardless of whether that feature is itself a protected attribute. SHAP analysis surfaces these correlations at the feature importance level, allowing the model risk team to evaluate them before deployment rather than after an adverse finding.

Disparate impact testing. Running the model’s output score distributions and downstream contact strategy decisions across protected classes to test for differential treatment outcomes. Under ECOA, a disparate impact finding can arise from a facially neutral model if its practical effect is disproportionate treatment of protected groups. This testing must be documented, submitted as part of the validation package, and repeated at each revalidation cycle.
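A common first screen is to compare the rate of a favourable outcome (for example, being offered a hardship plan) across groups. The sketch below computes each group's rate relative to a reference group. The 0.8 threshold is the "four-fifths" heuristic borrowed from employment testing and is used here only as an assumed screening level, not as an ECOA legal standard; the group labels and counts are invented.

```python
def adverse_impact_ratios(outcomes, reference_group):
    """Favourable-outcome rate of each group divided by the reference group's rate.

    outcomes maps group -> (favourable_count, total_count).
    """
    ref_favourable, ref_total = outcomes[reference_group]
    ref_rate = ref_favourable / ref_total
    return {
        group: (favourable / total) / ref_rate
        for group, (favourable, total) in outcomes.items()
    }


def groups_below_threshold(ratios, threshold=0.8):
    """Groups whose ratio falls below the screening threshold, in sorted order."""
    return sorted(group for group, ratio in ratios.items() if ratio < threshold)


# Invented counts: accounts offered a hardship plan, out of accounts scored.
outcomes = {"group_a": (300, 1000), "group_b": (210, 1000), "group_c": (280, 1000)}
ratios = adverse_impact_ratios(outcomes, reference_group="group_a")
flagged = groups_below_threshold(ratios)
```

A ratio below the threshold is a prompt for closer statistical and legal review, not a finding in itself.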

Translating Propensity Scores Into Collections Strategy

A propensity model produces a score. That score needs to drive a decision. The decision is which account gets which contact treatment, through which channel, at what intensity, and with what message. Getting this translation right is where the ROI from propensity modelling is actually realised.

The standard approach is score banding: dividing the score distribution into segments that receive differentiated treatment. Three to five bands typically cover the range of meaningful treatment variation.

[Figure: accounts segmented into four propensity bands (very low or zero, low, medium, and high), each guiding a different treatment strategy.]

High propensity accounts are accounts with a strong probability of voluntary payment given appropriate contact. These accounts need timely, low-friction outreach: digital channels, short messaging, and easy payment options. The objective is not persuasion; it is removing the barriers to payment for someone who already intends to pay. Deploying expensive agent time on these accounts is a misallocation of collections resources.

Medium propensity accounts are accounts where payment is possible but not certain. These accounts benefit from personalised outreach that acknowledges their specific situation, offers flexible payment options, and uses the channel they have historically engaged with. This is where the model’s engagement signal features drive the most differentiated treatment.

Low propensity accounts are accounts where voluntary payment in the near term is unlikely. Intensive manual contact on these accounts produces high cost and low recovery. The appropriate strategy depends on account balance and product type: for smaller balances, automated low-cost outreach cycles; for larger balances, structured hardship assessment and escalation planning.

Very low or zero propensity accounts require a fundamentally different approach from collections operations. These are accounts headed toward charge-off, early legal action, or external referral. Continuing to deploy standard collections contact cycles is both costly and potentially a source of compliance exposure under the contact-frequency limits in Regulation F, which implements the FDCPA, where those rules apply to the party making contact.
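The banding step itself is a lookup from score to band to treatment. The sketch below uses four bands; the cut-off values in `BAND_EDGES` are placeholders chosen for illustration and would in practice be set from the portfolio's own score distribution and recovery economics.

```python
from bisect import bisect_right

# Hypothetical cut-offs on the payment propensity score (a probability in [0, 1]).
BAND_EDGES = [0.05, 0.25, 0.60]
BAND_NAMES = ["very_low", "low", "medium", "high"]

# Treatment summaries paraphrased from the band descriptions above.
TREATMENTS = {
    "high": "low-friction digital outreach with easy payment options",
    "medium": "personalised outreach on the customer's preferred channel",
    "low": "automated low-cost cycles, or hardship assessment for larger balances",
    "very_low": "charge-off, legal, or external referral review",
}


def score_band(probability: float) -> str:
    """Map a propensity score to its band; a score equal to an edge joins the higher band."""
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must be between 0 and 1")
    return BAND_NAMES[bisect_right(BAND_EDGES, probability)]


def treatment_for(probability: float) -> str:
    """Treatment summary for the band that a score falls into."""
    return TREATMENTS[score_band(probability)]
```

Keeping the edges in one list makes it straightforward to version the banding alongside the model and to test alternative cut-offs without touching the routing logic.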

Three Propensity Model Failures That Quietly Destroy Performance

Target leakage. Target leakage occurs when a model is trained on features that contain information about the outcome the model is trying to predict, but would not be available at the time the prediction needs to be made in production. A common example in collections is including features derived from post-default resolution behaviour (such as whether a payment plan was offered or whether the account was charged off) in the training data. The model learns patterns that do not exist in the live decision environment, and performance collapses in production. Rigorous feature engineering review and a strict cut-off date for feature construction are the standard mitigations.
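The cut-off date mitigation can be enforced in code by filtering every event source through the scoring date before any feature is computed. This is a minimal sketch; the event layout (`event_date`, `event_type`) is an assumed structure.

```python
from datetime import date


def events_visible_at(events, scoring_date):
    """Keep only events that had already occurred when the score would have been produced."""
    return [e for e in events if e["event_date"] <= scoring_date]


def count_events(events, scoring_date, event_type):
    """Feature: number of events of one type known as of the scoring date."""
    visible = events_visible_at(events, scoring_date)
    return sum(1 for e in visible if e["event_type"] == event_type)


history = [
    {"event_date": date(2025, 1, 10), "event_type": "missed_payment"},
    {"event_date": date(2025, 2, 10), "event_type": "missed_payment"},
    # These happened after the scoring date and must not reach the model.
    {"event_date": date(2025, 3, 5), "event_type": "payment_plan_offered"},
    {"event_date": date(2025, 6, 1), "event_type": "charged_off"},
]
scoring_date = date(2025, 2, 15)
missed = count_events(history, scoring_date, "missed_payment")
plans = count_events(history, scoring_date, "payment_plan_offered")
```

Here `missed` is 2 and `plans` is 0: the payment plan offer exists in the warehouse but was not knowable on the scoring date, so it is excluded from the feature.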

Survivorship bias. Training a payment propensity model only on accounts that were contacted through your historical collections process introduces a systematic bias. Accounts that were never contacted, or were contacted very early, are excluded from the training data. The model learns payment patterns from a non-representative sample, overestimating the effectiveness of specific contact strategies because it has only seen the accounts those strategies were applied to. The correction requires building training datasets that explicitly include the full account population, not just the accounts that received treatment.

Ignoring the contact feedback loop. In collections, the model decides who to contact. The contact strategy changes how borrowers behave. That changed behaviour becomes the next training dataset. The model is therefore, in part, learning from behaviour it caused. This feedback loop is a well-documented source of model instability in interventional machine learning systems. Standard monitoring approaches that simply track aggregate performance metrics often miss this problem entirely until the drift becomes severe.

[Pull quote: a propensity model trained only on contacted accounts reflects the existing contact strategy rather than true customer behaviour.]

How iTuring Addresses This

iTuring’s collections AI platform deploys both default propensity and payment propensity models simultaneously, treating them as complementary components of a single decisioning architecture rather than separate tools applied sequentially.

The platform’s feature store contains over 25,000 pre-built signals, including payment velocity indicators, cross-product engagement signals, and financial stress markers, all maintained with the data lineage documentation SR 11-7 requires. Champion-challenger testing is built into the deployment architecture by default, providing the continuous independent performance comparison examiners look for during model risk reviews.

SHAP explainability is generated at the individual account level for every prediction, enabling both ECOA-compliant adverse action documentation and the OCC examination-ready model explanation packages SR 11-7 requires.

If you are building or reviewing a collections propensity modelling capability, iTuring’s team can walk through how the dual-model architecture performs on your specific portfolio composition and product mix.


Regulatory Disclaimer
This article is for informational purposes only and does not constitute legal or compliance advice. SR 11-7 model risk management and ECOA compliance requirements vary based on institution type, asset size, regulatory charter, and supervisory relationship. The information here reflects general industry practice and publicly available regulatory guidance as of the publication date. Consult qualified legal and compliance professionals for guidance specific to your institution.

Sources:
ArXiv: Predicting Consumer Default, A Deep Learning Approach
INFORMS: Propensity Modeling to Minimize Collections Churn
FICO: Debt Collection Predictive Analytics
McKinsey: The Seven Pillars of Collections Wisdom
InDebted: Evolving Your Collections Strategy
OAJAIML: LightGBM Propensity Model for Financial Services
Pace Analytics: ECOA Adverse Actions and Explainable AI
Frontiers in AI: Fair Lending and Machine Learning Under ECOA
KPMG: Model Risk Management
Federal Reserve SR 11-7
LinkedIn: Survivorship Bias in AI
AICorporation: Concept Drift in Interventional ML
teaminnovatics: AI Compliance

Frequently Asked Questions

What is the difference between default propensity and payment propensity in a collections AI model for US banks?

Default propensity predicts whether a currently performing account will miss a payment in the next 30 to 90 days, enabling pre-delinquency intervention. Payment propensity predicts whether an already delinquent account will self-cure if contacted today. Each requires its own model, feature set, and validation approach. Deploying only one leaves significant recovery opportunity unaddressed.

Why do behavioural signals outperform static credit bureau attributes in a collections propensity model?

Static credit attributes change slowly and reflect a historical point in time. Behavioural signals, including payment velocity trends, digital login frequency, cross-product stress indicators, and channel response patterns, reflect current and shifting financial circumstances. In collections, payment likelihood can change week to week. Models built on behavioural signals capture that movement where bureau-only models remain blind to it.

What input signals should a collections propensity model use to predict payment behaviour before default in a US bank portfolio?

Four signal categories consistently outperform static credit attributes: payment velocity and momentum across rolling 30 to 90-day windows, digital engagement signals such as login frequency and payment link clicks, cross-product financial stress indicators including revolving utilisation spikes and overdraft activity, and historical channel response patterns showing which contact method each customer has previously responded to.

How do you validate a collections propensity model under SR 11-7 to satisfy OCC model risk management requirements?

Validation requires four components: out-of-time testing using chronological rather than random splits, champion-challenger deployment providing continuous live performance benchmarking, SHAP value analysis confirming feature conceptual soundness and legal permissibility under ECOA, and disparate impact testing across protected classes. All four must be documented in the model validation package and repeated at each revalidation cycle.

What is survivorship bias in a collections propensity model and how does it silently destroy model performance in production?

Survivorship bias occurs when a payment propensity model trains only on accounts that received historical collections treatment, excluding accounts that were never contacted. The model learns payment patterns from a non-representative sample, overestimating specific contact strategies because it has only seen accounts those strategies were applied to. The correction requires training datasets that include the full account population regardless of treatment history.

Why does a collections propensity model need SHAP explainability to satisfy both SR 11-7 and ECOA compliance requirements?

SR 11-7 expects model logic to be understandable to qualified individuals who did not build the model. ECOA requires specific reasons for adverse credit-related decisions, stated in language a consumer can understand. SHAP values provide individual-level feature contribution scores that support both, mapping technical model outputs to the principal reasons that ECOA adverse action notices require.

What is target leakage in a collections propensity model and how does it cause production performance collapse?

Target leakage occurs when training features contain information about the outcome being predicted that would not be available at decision time in production. A common example is including post-default resolution features, such as whether a payment plan was offered, in the training data. The model learns patterns that do not exist in the live environment, producing strong validation metrics that disappear entirely when the model goes to production.

What is a propensity model and how is it used in banking collections?

A propensity model in banking collections predicts the probability of a specific customer behaviour, either the likelihood of missing a payment (default propensity) or the likelihood of self-curing if contacted (payment propensity). Banks use these scores to prioritise contact queues, differentiate treatment strategies, and intervene before the cost of recovery multiplies post-default.

How do US banks build a propensity-to-pay model before default?

US banks build propensity-to-pay models by training on behavioural signals, payment velocity trends, digital engagement frequency, cross-product financial stress indicators, and historical channel response rates, rather than relying solely on static credit risk scoring attributes. The model is validated using out-of-time testing and champion-challenger deployment to satisfy SR 11-7 model risk management requirements.

How does a propensity model differ from a churn prediction model?

A churn prediction model forecasts voluntary customer disengagement, whether a customer will close an account or switch providers. A collections propensity model predicts an involuntary behaviour: the probability of payment failure or self-cure driven by financial stress. The signal sets, intervention timing, and regulatory validation requirements under SR 11-7 and ECOA are fundamentally different for each.

What feature engineering techniques improve propensity model accuracy?

The most impactful feature engineering techniques for collections propensity models include rolling payment velocity calculations across 30 to 90-day windows, cross-product stress aggregation, digital engagement signal extraction, and lag features capturing trajectory changes rather than point-in-time account status. SHAP analysis validates which engineered features add genuine predictive signal versus noise before production deployment.

What is a champion challenger model in the context of collections AI?

A champion challenger model setup runs a proven production model, the champion, alongside a retrained or alternative architecture, the challenger, simultaneously. A defined proportion of live accounts routes through the challenger. Performance comparison is continuous. When the challenger demonstrably outperforms the champion, a governed transition replaces it under full documentation, satisfying SR 11-7's ongoing monitoring and independent review requirements.

How is model risk management applied to propensity models in banking?

Model risk management for propensity models requires independent validation, out-of-time performance testing, SHAP explainability analysis, and disparate impact testing under ECOA. SR 11-7 calls for ongoing monitoring after deployment, which banks commonly implement through champion-challenger benchmarking and regular performance reporting to senior management and the board. For self-learning collections models that update between validation cycles, documented retraining governance that distinguishes parameter updates from material model changes is an additional element to have in place.

How do banks detect and manage model drift in propensity models over time?

Banks detect propensity model drift by monitoring the population stability index on key inputs, tracking the Gini coefficient and KS statistic against validation benchmarks, and running SHAP analysis to identify declining feature importance. Pre-specified threshold breaches trigger documented escalation and champion-challenger retraining protocols. Under SR 11-7's governance expectations, drift events and the responses to them should be logged and escalated through the bank's model risk reporting line, up to the board's risk committee where material.
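The population stability index compares how accounts are distributed across the same bins at validation time and in production. The sketch below is a minimal implementation under assumed inputs: the bin counts are invented, and the `eps` floor for empty bins is one common workaround rather than a fixed convention.

```python
from math import log


def population_stability_index(expected_counts, actual_counts, eps=1e-6):
    """PSI = sum over bins of (actual% - expected%) * ln(actual% / expected%).

    expected_counts come from the validation or development sample and
    actual_counts from the current production population, using identical bins.
    Bin shares are floored at eps so an empty bin does not produce log(0).
    """
    if len(expected_counts) != len(actual_counts):
        raise ValueError("both inputs must use the same bins")
    expected_total = sum(expected_counts)
    actual_total = sum(actual_counts)
    psi = 0.0
    for expected, actual in zip(expected_counts, actual_counts):
        expected_pct = max(expected / expected_total, eps)
        actual_pct = max(actual / actual_total, eps)
        psi += (actual_pct - expected_pct) * log(actual_pct / expected_pct)
    return psi


# Invented bin counts for one input feature.
validation_bins = [100, 200, 400, 200, 100]
production_bins = [50, 150, 400, 250, 150]
psi_shifted = population_stability_index(validation_bins, production_bins)
psi_stable = population_stability_index(validation_bins, validation_bins)
```

Identical distributions give a PSI of zero, and the value rises as the production population moves away from the validation sample. Alert thresholds are a policy choice; values around 0.1 and 0.25 are frequently quoted rules of thumb rather than regulatory limits.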

About the Author

Amit Kumar

Co-Founder & VP Product Engineering

Amit Kumar is Co-Founder and Vice President of Product Engineering at iTuring.ai.

He writes about building enterprise-grade AI infrastructure, designing platforms for reliability and scale, integrating AI with legacy banking systems, and the architectural decisions that separate proof-of-concepts from production-ready solutions.

Amit believes great engineering is invisible because it works, every time.
