Financial institutions rely on credit risk scores to make lending decisions on unsecured credit, such as whether to approve a loan, what interest to charge and how long the repayment tenure should be. However, only 25% of the Indian population has a credit history and, with it, access to formal lending. While credit scores serve larger banks and financial institutions well, small and medium-sized companies that lend to students and early-stage professionals, many of whom have no credit history, may find it challenging to assess borrowers.
At DLDC 2020, Mounika Mydukur, head of analytics at mPokket, took the audience through some of the challenges the company faces with credit scoring and how it uses artificial intelligence and machine learning to overcome them and identify new opportunities. In her current role, she works with business stakeholders to identify business needs and translate them into analytical models that predict customer behaviour, return on investment and collections.
In her talk, she highlighted these challenges from the perspective of mPokket, an instant loan app that provides loans to college students and young working professionals. Borrowers can take loans ranging from ₹500 to ₹20,000, and the amount is credited instantly to their bank account or Paytm wallet. The borrowing limit may be increased depending on multiple factors, including timely repayment. But how is credit risk assessed?
Data Points To Measure Credit Risk
Credit reporting and scoring is an extensive process carried out by major financial bodies, such as credit bureaus, based on a range of data points. These include the total level of debt, the types of loans taken in the past, the number of open accounts, the repayment history, the amount of credit available, credit utilisation, debts in collection and public records such as tax liens, bankruptcies and foreclosures.
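To make a few of these data points concrete, here is a minimal sketch of how bureau-style features might be summarised from a borrower's list of credit accounts. The `Tradeline` fields and the `bureau_features` helper are hypothetical illustrations, not the format of any real bureau report.

```python
from dataclasses import dataclass


@dataclass
class Tradeline:
    """One credit account, with hypothetical fields loosely modelled on a bureau report."""
    loan_type: str        # e.g. "credit_card", "personal_loan"
    credit_limit: float   # sanctioned limit or loan amount
    balance: float        # amount currently outstanding
    is_open: bool
    days_past_due: int    # 0 if the account is current


def bureau_features(tradelines: list[Tradeline]) -> dict:
    """Summarise tradelines into a few classic credit-report features."""
    open_lines = [t for t in tradelines if t.is_open]
    total_debt = sum(t.balance for t in open_lines)
    total_limit = sum(t.credit_limit for t in open_lines)
    return {
        "total_debt": total_debt,
        "num_open_accounts": len(open_lines),
        # Types of loans ever taken, including closed accounts.
        "loan_types": sorted({t.loan_type for t in tradelines}),
        # Credit utilisation: share of available credit on open accounts currently in use.
        "credit_utilisation": total_debt / total_limit if total_limit else 0.0,
        # Repayment behaviour: accounts (open or closed) reported as past due.
        "num_delinquent": sum(1 for t in tradelines if t.days_past_due > 0),
    }
```

For a student applicant, most of these inputs simply do not exist, which is the gap the rest of the talk addresses.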
However, smaller businesses that lend to students often cannot obtain these data points, and without them, processing unsecured loans such as student loans is hard. Risk assessment in such cases involves the following steps:
- Validation of the user’s identity
- Validation of the user’s address
- Validation of the user’s income and education details
- The user’s credit history (if available)
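The preliminary checks listed above can be sketched as a simple gate. This is an illustrative assumption, not mPokket's actual logic: the `Applicant` fields are hypothetical, and the sketch treats a missing credit history as informational rather than as a reason to reject, since many applicants in this segment have none.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Applicant:
    # Hypothetical fields standing in for the outcome of KYC and document checks.
    id_verified: bool
    address_verified: bool
    declared_income: Optional[float]  # None if no income proof was provided
    education_verified: bool
    bureau_score: Optional[int]       # None for applicants with no credit history


def preliminary_checks(a: Applicant) -> dict:
    """Return the result of each mandatory check plus whether a credit history exists."""
    return {
        "identity": a.id_verified,
        "address": a.address_verified,
        "income_and_education": a.declared_income is not None and a.education_verified,
        # Informational only: a missing credit history is not treated as a failure.
        "has_credit_history": a.bureau_score is not None,
    }


def passes_preliminary(checks: dict) -> bool:
    """An applicant moves on only if every mandatory check passed."""
    mandatory = ("identity", "address", "income_and_education")
    return all(checks[name] for name in mandatory)
```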
After this preliminary risk assessment, the request goes through regulatory, technological and operational checks to decide on loan approval, interest rate, repayment tenure and other terms.
Mydukur also shared some of the alternative data points used to assess credit risk for the unbanked. These include social media activity, location information, online shopping history, travel data, utility bills and, most importantly, phone data such as contacts and SMS messages. The company examines these data points to verify that the applicant is genuine and is not using a fake account or phone. Mydukur stressed, however, that this information is accessed only with the user’s consent.
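As an illustration of how phone data might be turned into features, here is a minimal, consent-gated sketch. Everything in it is an assumption: the `alternate_features` function, the chosen signals (distinct contacts, bank credit alerts) and the `CREDIT_SMS` pattern are hypothetical, and real bank SMS formats vary widely.

```python
import re
from typing import Optional

# Hypothetical pattern for "amount credited" bank alerts, e.g. "INR 12,500.00 credited to a/c ...".
CREDIT_SMS = re.compile(r"(?:INR|Rs\.?|₹)\s*([\d,]+(?:\.\d+)?)\s+credited", re.IGNORECASE)


def alternate_features(consented: bool, contacts: list[str], sms: list[str]) -> Optional[dict]:
    """Derive simple signals from phone data, but only if the user has consented."""
    if not consented:
        return None  # no consent, no data access
    credits = []
    for message in sms:
        match = CREDIT_SMS.search(message)
        if match:
            credits.append(float(match.group(1).replace(",", "")))
    return {
        "num_contacts": len(set(contacts)),  # a tiny contact list can hint at a throwaway phone
        "num_credit_sms": len(credits),
        "total_credited": sum(credits),      # rough proxy for observed inflows
    }
```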
Using AI/ML Algorithms To Detect Potential Repayment Capacity Of Users
Mydukur shared that the company uses machine learning models to assess users’ repayment capacity and identify potential defaulters. Describing the steps, she said the process begins with exploratory data analysis of alternative data, transactional data and unstructured personal data.
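A first pass of exploratory data analysis often amounts to checking missing values, basic distributions and how the default rate varies across segments. The sketch below shows that idea with the standard library only; the column names (`monthly_income`, `occupation`, `defaulted`) are hypothetical.

```python
from collections import defaultdict
from statistics import mean, median


def summarise(rows: list[dict], column: str) -> dict:
    """Count, missing values, mean and median for one numeric column."""
    values = [r[column] for r in rows if r.get(column) is not None]
    return {
        "count": len(values),
        "missing": len(rows) - len(values),
        "mean": mean(values) if values else None,
        "median": median(values) if values else None,
    }


def default_rate_by(rows: list[dict], segment: str, target: str = "defaulted") -> dict:
    """Share of defaulters within each value of a categorical column."""
    totals, defaults = defaultdict(int), defaultdict(int)
    for r in rows:
        totals[r[segment]] += 1
        defaults[r[segment]] += int(r[target])
    return {seg: defaults[seg] / totals[seg] for seg in totals}
```

Large gaps in default rate between segments are an early hint that a column is worth engineering into a feature.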
The next step is usually clustering, for which they use tools and algorithms such as KNIME and DBSCAN to segment users into groups such as first-time and repeat users. This is followed by feature engineering and selection, one of the most crucial steps in estimating the probability of a user being a defaulter or a good customer, which uses techniques such as logistic regression.
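DBSCAN groups points that sit in dense regions and labels isolated points as noise, without needing the number of clusters in advance. The following is a minimal, standard-library sketch of the algorithm, assuming each user is described by two hypothetical features (loans taken so far, average days late); in practice a library implementation and scaled features would be used.

```python
import math

NOISE = -1


def dbscan(points: list[tuple[float, ...]], eps: float, min_pts: int) -> list[int]:
    """Plain DBSCAN: returns a cluster label per point, NOISE (-1) for outliers."""
    labels: list = [None] * len(points)  # None means "not visited yet"

    def neighbours(i: int) -> list[int]:
        # All points within eps of point i, including i itself.
        return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

    cluster = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE  # may later be claimed as a border point
            continue
        labels[i] = cluster  # i is a core point: start a new cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: joins the cluster but is not expanded
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbours = neighbours(j)
            if len(j_neighbours) >= min_pts:  # only core points grow the cluster
                queue.extend(j_neighbours)
        cluster += 1
    return labels
```

With three low-activity users, three heavy repeat borrowers and one extreme outlier, `dbscan(users, eps=2, min_pts=2)` separates the two groups and marks the outlier as noise.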
Finally comes model selection, which considers techniques such as deep learning, random forests, neural networks and gradient boosting machines (GBM). Together, these steps estimate each user’s probability of default and assign them a credit score.
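The talk does not say how a probability of default becomes a credit score. One widely used convention in scorecard building is "points to double the odds" scaling, sketched below; the base score of 600 at good:bad odds of 50:1 and the 20-point PDO are arbitrary illustrative choices, not mPokket's parameters.

```python
import math


def pd_to_score(p_default: float, base_score: float = 600,
                base_odds: float = 50, pdo: float = 20) -> float:
    """Map a probability of default to a score where every `pdo` points doubles the good:bad odds."""
    if not 0 < p_default < 1:
        raise ValueError("p_default must be strictly between 0 and 1")
    factor = pdo / math.log(2)
    offset = base_score - factor * math.log(base_odds)
    odds_good = (1 - p_default) / p_default  # good:bad odds implied by the model
    return offset + factor * math.log(odds_good)
```

With these settings, a borrower whose odds are 50:1 (a default probability of 1/51) scores 600, and one at 100:1 scores 620. The scaling is monotonic, so it changes the presentation of the model output, not its ranking of borrowers.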
Mydukur added that for the credit scorecard model, features are selected based on Weight of Evidence (WoE) and Information Value (IV), although this approach is usually effective only if the data shows a linear relationship. She explained that WoE is a measure of the predictive power of an independent variable in relation to the target variable. It measures the extent to which a specific feature can differentiate between the target classes, “which in our case is good and bad customers,” she said.
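Using one common convention, for each bin $i$ of a feature, $\text{WoE}_i = \ln(\%\text{good}_i / \%\text{bad}_i)$ and $\text{IV} = \sum_i (\%\text{good}_i - \%\text{bad}_i)\,\text{WoE}_i$, where the percentages are each bin's share of all good and all bad customers. A minimal sketch follows; the additive `smoothing` for empty bins is an assumption, as conventions vary, and some texts define WoE with the ratio flipped.

```python
import math


def woe_iv(bins: dict, smoothing: float = 0.5) -> tuple[dict, float]:
    """bins maps a bin label to (goods, bads). Returns (WoE per bin, Information Value)."""
    total_good = sum(g for g, _ in bins.values())
    total_bad = sum(b for _, b in bins.values())
    woe, iv = {}, 0.0
    for label, (goods, bads) in bins.items():
        # Smoothing keeps a bin with zero goods or bads from producing log(0).
        pct_good = (goods + smoothing) / (total_good + smoothing * len(bins))
        pct_bad = (bads + smoothing) / (total_bad + smoothing * len(bins))
        woe[label] = math.log(pct_good / pct_bad)
        iv += (pct_good - pct_bad) * woe[label]
    return woe, iv
```

Each term of the IV sum is non-negative, so a feature whose bins all have the same good/bad mix scores 0, while one that cleanly separates the classes scores high. Because WoE is on the log-odds scale, WoE-transformed features pair naturally with logistic regression, which models log-odds linearly.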
Classification algorithms such as random forest and gradient boosting have contributed significantly to predicting risky users and their probability of default. “However, there still remain significant challenges such as algorithms having hidden biases if underlying data is biased, which we try to resolve with best abilities,” said Mydukur, concluding her talk.
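Detecting such biases is a prerequisite to addressing them. One simple check, not necessarily the one mPokket uses, compares approval rates across groups and takes the ratio of the lowest to the highest rate; values well below 1 (the "four-fifths" heuristic from US employment-selection guidance uses 0.8) flag a disparity worth investigating. The group labels here are hypothetical.

```python
from collections import Counter


def approval_rates(decisions: list[tuple[str, bool]]) -> dict:
    """decisions: (group, approved) pairs. Returns the approval rate per group."""
    totals, approved = Counter(), Counter()
    for group, ok in decisions:
        totals[group] += 1
        approved[group] += int(ok)
    return {g: approved[g] / totals[g] for g in totals}


def disparate_impact_ratio(rates: dict) -> float:
    """Lowest group approval rate divided by the highest."""
    highest = max(rates.values())
    if highest == 0:
        raise ValueError("no group has any approvals")
    return min(rates.values()) / highest
```

A low ratio does not by itself prove the model is unfair, since groups may differ in legitimate risk factors, but it tells the team where to look.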