At its core, the lending business is all about extending credit with the minimum risk of loss. Bankers have devised multiple ways to constrain credit risk over the centuries.
Accepting collateral, varying the time horizon for exposures, and varying the permitted amount of interest rate fluctuation (fixed versus floating) have endured even as the process for identifying the probability of default at the portfolio level has become a numbers game. Historical loss rates for specific instruments and portfolios are now paired with a growing amount of alternative data culled from various digital sources.
It is a complicated process to get right under the best of circumstances. The pandemic requires risk managers to up their game urgently regarding default probability assessments. As discussed below, the pandemic particularly requires risk managers to incorporate more public policy considerations and alternative data points concerning public policy.
The Promise of Using Artificial Intelligence
Machine learning and artificial intelligence considerably enhance the ability to assess correlations and covariances across a broader range of data sets and so identify more precisely the probability of default for any given set of exposures. Because machine learning does not require rule-based programming to draw conclusions about relationships across data sets, it offers the potential for better insights into embedded risk relationships, which enables bankers to make smarter decisions about potential default rates.
But offloading default probability calculations to AI systems is not a shortcut. Because the automated analysis runs thousands, if not millions, of iterations per cycle and because many of the analytical processes occur in a hidden layer, successfully deploying AI in default probability assessments requires great care.
Enthusiasts eager to evangelize often gloss over the most important step in any machine learning project: carefully choosing the data sets used to train the AI systems. The issue is not merely about choosing the data.
The harder questions, which even today usually only humans can answer, are whether the data has been labelled properly and whether it has been accurately converted into the format used for training AI systems. For example: if a data set does not systematically and correctly identify which borrowers defaulted in the past, the automated analysis conducted within the AI system will start from a faulty foundation. Incorrect estimates are inevitable.
Mistakes in data sets come in many flavors. Old-fashioned human error associated with manual input processes is decreasing as the digital age advances. More current sources of input errors include incorrect definitions of when a loan is (or is not) performing based on current accounting regulations and regulatory requirements. When the rules change, loan performance patterns change. If the underlying technology processes do not reflect the new rules, then all subsequent data collection produces incorrect designations. Using this data within the machine learning process amplifies the mistake by misleading the computer processes regarding credit asset performance patterns.
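One way to picture the labelling problem is a pre-training audit that recomputes each loan's performing status under the rule in force on the observation date and compares it with the stored label. The sketch below is illustrative only: the rule history, the days-past-due thresholds, the field names and the sample records are all hypothetical assumptions, not actual regulation or real data.

```python
from datetime import date

# Hypothetical rule history, sorted by effective date:
# (effective_from, days-past-due threshold for "non-performing").
# Dates and thresholds are illustrative assumptions, not actual regulation.
RULE_HISTORY = [
    (date(2015, 1, 1), 90),
    (date(2020, 4, 1), 180),
]


def threshold_in_force(as_of):
    """Return the days-past-due threshold effective on the given date."""
    applicable = [limit for start, limit in RULE_HISTORY if start <= as_of]
    if not applicable:
        raise ValueError(f"no rule in force on {as_of}")
    return applicable[-1]


def expected_label(record):
    """Label a loan non-performing under the rule in force when it was observed."""
    return record["days_past_due"] >= threshold_in_force(record["observed_on"])


def find_mislabelled(records):
    """Return ids of records whose stored label disagrees with the rule in force."""
    return [r["loan_id"] for r in records if r["non_performing"] != expected_label(r)]


# Made-up records for illustration only.
records = [
    {"loan_id": "A", "observed_on": date(2019, 6, 30), "days_past_due": 120, "non_performing": True},
    {"loan_id": "B", "observed_on": date(2020, 6, 30), "days_past_due": 120, "non_performing": True},
    {"loan_id": "C", "observed_on": date(2020, 6, 30), "days_past_due": 200, "non_performing": True},
]

mislabelled = find_mislabelled(records)
print(mislabelled)  # loan B was labelled under the old threshold
```

In this toy case, loan B carries a label produced by the superseded rule. Records like it would need review before the data set is used for training.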
Great care is also required when choosing which data sets to incorporate into AI training data. Vast data processing capabilities create the temptation to feed an AI system any and every kind of alternative data alongside historical default data and then hope that the automated system identifies previously unknown correlations and risk characteristics.
The problem is that even if the system identifies correlated risk characteristics occurring naturally within the data sets, the correlation may not signal a causal link, which is crucial to understanding the shape of default probability.
For example, one would not choose to use aircraft leasing data sets to estimate default probabilities on retail credit card portfolios. Nor would one choose to use automobile route data to estimate interest rate swap counterparty default rates. Default rates and behavior might both change during a financial crisis, for example, but similarity in risk profiles across these data sets would shed no light on the probability that a portfolio would experience abnormally high default rates in the absence of a crisis.
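The point can be made concrete with a small, entirely invented example. Two unrelated series that both jump during a crisis look strongly correlated when all periods are pooled, yet show no relationship at all within the calm periods. The numbers below are made up for illustration and the helper function is a plain Pearson correlation written out by hand.

```python
from math import sqrt


def pearson(xs, ys):
    """Plain Pearson correlation coefficient for two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Invented values: series A and B are unrelated in calm periods,
# but both shift upward when a crisis hits.
calm_a, calm_b = [1, 2, 1, 2], [1, 1, 2, 2]
crisis_a, crisis_b = [9, 10, 9, 10], [9, 9, 10, 10]

pooled = pearson(calm_a + crisis_a, calm_b + crisis_b)
calm_only = pearson(calm_a, calm_b)

print(f"pooled correlation:    {pooled:.2f}")
print(f"calm-only correlation: {calm_only:.2f}")
```

The pooled figure is close to 1 only because the crisis moves both series at once. A model trained on the pooled data could mistake that shared shock for a relationship that holds in normal times.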
Which brings us to 2020 and the Pandemic Problem. The pandemic amplifies further the potential for misleading analysis regarding default probabilities. Both humans and automated systems are at risk for mis-estimating default probabilities not just now but also in the future.
The Pandemic Problem
Consider our crazy 2020 COVID-19 spring. Central banks, financial regulators and finance ministries spent 8-10 weeks in a frantic scramble to keep the economy and financial system functioning amid the first (and hopefully only) simultaneous shuttering of all G7 economies. It looked like this at the monthly level:
But the big picture shows a different story. The aggregate year-to-date time series shows a rapid, steep effort to address the situation... and a sustained high level of activity for every month since the spring. In fact, the data show a slight increase in activity during July. The uptick is consistent with policymakers pivoting to address proactively the nascent indicators of a potential second wave of infections as mobility restrictions were lifted:
Policymakers threw the kitchen sink at the pandemic. They created massive government guarantee programs that substituted G7 sovereign credit risk for private credit risk. They committed central bank programs to purchase a broad range of assets in the open market (government bonds, corporate bonds, corporate equities). Moratoria were issued, creating debt service holidays for certain types of assets in some countries. Financial regulators publicly encouraged banks to be lenient and flexible when determining if a loan was performing. Fiscal authorities delivered salary subsidies and additional cash payments to individuals.
By stepping in to prevent systemic collapse, policymakers have also effectively invalidated historical data regarding default probabilities for private sector assets. Assessing exposure to the probability of default requires far more explicit assessments of exposure to public policy risks.
Efforts to estimate default probabilities using AI-powered scenario analysis as well as more traditional extrapolation methods cannot generate meaningful risk measurements without addressing this break in the time series.
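A first, very simple step toward addressing the break is to stop pooling observations from before and after the policy interventions. The sketch below splits observations at an assumed break date and reports the default rate for each regime separately. The break date, field names and sample observations are hypothetical and chosen only to illustrate the mechanics.

```python
from datetime import date

# Hypothetical break date marking the start of large-scale policy intervention.
POLICY_BREAK = date(2020, 3, 1)


def default_rate(observations):
    """Share of observations flagged as defaulted, or None if there are none."""
    if not observations:
        return None
    return sum(1 for o in observations if o["defaulted"]) / len(observations)


def rates_by_regime(observations, break_date=POLICY_BREAK):
    """Compute default rates separately before and after the break date."""
    pre = [o for o in observations if o["as_of"] < break_date]
    post = [o for o in observations if o["as_of"] >= break_date]
    return {"pre": default_rate(pre), "post": default_rate(post)}


# Made-up observations for illustration only.
observations = [
    {"as_of": date(2019, 9, 30), "defaulted": True},
    {"as_of": date(2019, 12, 31), "defaulted": False},
    {"as_of": date(2019, 12, 31), "defaulted": False},
    {"as_of": date(2020, 1, 31), "defaulted": False},
    {"as_of": date(2020, 6, 30), "defaulted": False},
    {"as_of": date(2020, 6, 30), "defaulted": False},
    {"as_of": date(2020, 7, 31), "defaulted": False},
    {"as_of": date(2020, 7, 31), "defaulted": False},
]

rates = rates_by_regime(observations)
pooled_rate = default_rate(observations)
print(rates, pooled_rate)
```

In this invented sample, the pooled rate blends two regimes and describes neither. Whether the post-break rate reflects borrower health or policy support is exactly the judgment the analyst still has to make.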
In addition, the pandemic creates twin data deficits that complicate any effort to generate robust training data for AI systems. Relaxed or suspended regulatory reporting requirements deliver less data to governments for aggregation, and governments in turn deliver less complete data sets to markets. This is a short-run situation, but if machine learning methods are used without adjusting for shifts in the quality of routine data components, model error risks increase silently.
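A basic guard against this kind of silent degradation is to track how complete each reporting batch is relative to a baseline period. The following sketch is one possible approach, not a prescribed method: the field names, the batches and the 10 percentage point tolerance are all assumptions made for illustration.

```python
def missing_share(rows, fields):
    """Fraction of expected field values that are absent or None in a batch."""
    total = len(rows) * len(fields)
    if total == 0:
        raise ValueError("batch and field list must be non-empty")
    missing = sum(1 for row in rows for f in fields if row.get(f) is None)
    return missing / total


def completeness_alerts(batches, fields, baseline_period, tolerance=0.10):
    """Flag periods whose missing share exceeds the baseline by more than tolerance."""
    baseline = missing_share(batches[baseline_period], fields)
    alerts = {}
    for period, rows in batches.items():
        share = missing_share(rows, fields)
        if share - baseline > tolerance:
            alerts[period] = share
    return alerts


# Made-up reporting batches for illustration only.
batches = {
    "2019Q4": [{"income": 50, "arrears": 0}, {"income": 60, "arrears": 1}],
    "2020Q2": [{"income": None, "arrears": 0}, {"income": None, "arrears": None}],
}

alerts = completeness_alerts(batches, ["income", "arrears"], baseline_period="2019Q4")
print(alerts)
```

A flagged period does not mean the data must be discarded. It signals that the batch should be reviewed, reweighted or excluded before it reaches the training set.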
Default probability parameterization must now also expressly incorporate both sovereign risk and regulatory rule changes implemented in the spring. The policy shifts implemented in the spring of 2020 were designed to be temporary. The policies that expired in June have already been renewed to year-end. Policies expiring in September will also likely be renewed for the most part to year-end.
The question then becomes whether -- or how -- pandemic-era policies will be extended through 2021 and beyond as the economy adapts not only to the pandemic but also (hopefully) its end. For the next 18 months, public policy will play an abnormally large role in determining whether, how, and which borrowers default... and which do not.