Financial institutions are in the business of extending credit (loans) to their customers. These lenders use various factors, such as (but not limited to) the approved loan amount, credit score, annual income, debt-to-income ratio, loan tenure, central bank rates, the prime rate (the lowest interest rate at which financial institutions lend to their most creditworthy customers), the number of products utilized, and the number of accounts, to determine the appropriate interest rate for each customer.
Assessing customer risk profiles in order to determine the required interest rate has been practiced for many years by lending institutions.
This project used a readily available dataset from Kaggle to estimate interest rates for customers based on their individual risk factors. It is intended to kick-start the machine learning process and to demonstrate how easily this solution can be implemented.
Step 1: Upload Data
Raw data downloaded from Kaggle were uploaded as a .csv file.
Fig1: View of the raw data uploaded
Step 2: Analyze and Wrangle the data
The data were inspected for missing values and edited (wrangled) to convert columns to the right data types as necessary. For example, commas (",") were removed from numbers by changing the data type from "Text" to "Number" so the values could be interpreted by the system.
Fig.2 Analyzing and wrangling data
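The cleaning described above can be sketched in pandas; the column names and sample values here are hypothetical stand-ins for the raw Kaggle file:

```python
import pandas as pd

# Hypothetical sample mimicking the raw file: numbers stored as text
# with thousands separators, plus a missing value.
raw = pd.DataFrame({
    "loan_amount": ["15,000", "8,500", "22,750"],
    "annual_income": ["65,000", None, "120,000"],
    "credit_score": [710, 655, 780],
})

# Strip commas and convert the text columns to numeric types.
for col in ["loan_amount", "annual_income"]:
    raw[col] = pd.to_numeric(raw[col].str.replace(",", ""))

# Inspect missing values before deciding how to handle them.
print(raw.isna().sum())
```

The same idea applies whatever tool is used: the system can only learn from values it can read as numbers.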
Step 3: Define Datasets
There are three stages in defining a dataset. The first is the "Training & Target Features" tab, where I selected the target (the output to predict, i.e. the interest rate) and then the input/predictor variables (i.e. features), as depicted below in Fig.3.
The "Find Feature Importance" function shows the relative importance (as scores) of each input variable (feature) in predicting the target.
Fig. 3: Choosing the target and features and finding feature importance
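A common way to compute such scores outside the platform is a tree-based model's impurity-based importances. This is a sketch using scikit-learn on synthetic data; the feature names and the toy relationship are illustrative assumptions, not the real dataset:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(42)
n = 500

# Synthetic stand-ins for three risk factors.
credit_score = rng.uniform(500, 850, n)
dti = rng.uniform(0.05, 0.6, n)          # debt-to-income ratio
loan_amount = rng.uniform(1_000, 50_000, n)

# Toy target: an interest rate driven by credit score and DTI,
# with loan_amount deliberately irrelevant.
interest_rate = 25 - 0.02 * credit_score + 10 * dti + rng.normal(0, 0.5, n)

X = np.column_stack([credit_score, dti, loan_amount])
model = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, interest_rate)

# Importance scores sum to 1; higher means more predictive.
for name, score in zip(["credit_score", "dti", "loan_amount"], model.feature_importances_):
    print(f"{name}: {score:.3f}")
```

On this toy data, credit score and DTI dominate while loan amount scores near zero, which is exactly the kind of signal the importance view surfaces.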
The next stage is feature preprocessing, where different built-in algorithms are applied to the features that will be used in the machine learning model, as shown in Fig.4.
Fig.4 View of system algorithms being applied to selected features.
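The platform applies its preprocessing automatically, but the equivalent idea can be sketched with scikit-learn, assuming illustrative column names: imputation and scaling for numeric features, one-hot encoding for categorical ones:

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Illustrative columns; the real dataset's columns would be listed here.
numeric_cols = ["loan_amount", "credit_score"]
categorical_cols = ["home_ownership"]

preprocess = ColumnTransformer([
    ("num", Pipeline([("impute", SimpleImputer(strategy="median")),
                      ("scale", StandardScaler())]), numeric_cols),
    ("cat", OneHotEncoder(handle_unknown="ignore"), categorical_cols),
])

df = pd.DataFrame({
    "loan_amount": [10000.0, 20000.0, None],
    "credit_score": [700, 650, 800],
    "home_ownership": ["RENT", "OWN", "RENT"],
})

# 2 scaled numeric columns + 2 one-hot columns = 4 output columns.
X = preprocess.fit_transform(df)
print(X.shape)
```

Keeping these transforms in one pipeline object means the identical preprocessing is replayed at prediction time.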
The last stage of defining the dataset is naming it, which is done on the "Review and Save" tab. In this project, the name is "riskpricing_D06", as depicted in Fig.5. I then generated the dataset by clicking "Define Dataset".
Fig 5: Save and generate the dataset.
Step 4: Cross validate dataset
Split the dataset into training and testing (validation) sub-datasets. The training dataset is used for learning, and the resulting model is then validated against the testing dataset. A common rule of thumb for splitting data into train/test sub-datasets is 80/20, which provides enough data for the algorithm to learn from. The split is done on the "Cross Validation Dataset" tab by clicking the "Generate Dataset" button, as shown in Fig.6.
Fig.6 Split train-test sub-datasets.
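The 80/20 split described above corresponds to a one-liner in scikit-learn; the feature matrix here is a random placeholder:

```python
import numpy as np
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 5))   # placeholder feature matrix
y = rng.normal(size=1000)        # placeholder interest-rate target

# Hold out 20% of rows for validation, matching the 80/20 rule of thumb.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)
print(len(X_train), len(X_test))  # 800 200
```

Fixing `random_state` makes the split reproducible, so repeated runs validate against the same held-out rows.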
The dataset is now ready for creating the machine learning model, which is explained in the next step.
Step 5: Create a machine learning model
The broader business goal of this exercise is to place individual customers into three (3) interest-rate tiers (e.g. High, Moderate, and Low) based on their individual risk factors.
NB: Because the model predicts a continuous interest rate, this is a regression exercise, so the "Regression" tab was selected.
I created a model container, "Regress_Mod01", for the dataset. I chose an algorithm and clicked "Create New Model". One machine learning model was created, but I wanted to see whether there was a better one, so I clicked "Auto Pilot", which generated many more models with different built-in algorithms. I compared the performance metrics of the different models and chose the one with the highest performance score. I then published my chosen model to my supervisor for review and acceptance.
Fig.7 Select a model and publish it
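The Auto Pilot step, trying several built-in algorithms and keeping the best performer, can be approximated by hand. This sketch compares three scikit-learn regressors on synthetic data using the R² score; the algorithm choices are illustrative, not the platform's actual candidates:

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.linear_model import Ridge
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

# Synthetic regression problem standing in for the risk-pricing data.
X, y = make_regression(n_samples=600, n_features=6, noise=10.0, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

candidates = {
    "ridge": Ridge(),
    "random_forest": RandomForestRegressor(random_state=0),
    "gradient_boosting": GradientBoostingRegressor(random_state=0),
}

# Fit each candidate and score it on the held-out test set.
scores = {}
for name, model in candidates.items():
    model.fit(X_tr, y_tr)
    scores[name] = r2_score(y_te, model.predict(X_te))

best = max(scores, key=scores.get)
print(best, round(scores[best], 3))
```

Comparing all candidates on the same held-out set is what makes the "higher performance score" comparison fair.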
Step 6: Model Governance
The system is designed so that when a selected model is published for peer review, the reviewer is notified by email. The reviewer verifies that the model is performing as required, and once the model is accepted or rejected, the modeler is notified by email as well. My supervisor accepted the model, which led me to the next step: deploying the model.
Step 7: Deploy the model
It was now time to deploy the production model. All I did at this step was create an app, RiskPrice_App, and click "Deploy". This automatically created an Application Programming Interface (API) which can be called from an application.
Fig.8: Deploy the model
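Calling the generated API from code might look like the following sketch. The endpoint URL, payload field names, and response schema are assumptions for illustration; the real values come from the platform's deployment page for RiskPrice_App:

```python
import json
import urllib.request

# Hypothetical endpoint; the real URL and any auth headers come from
# the deployment page.
API_URL = "https://example.com/api/riskprice_app/predict"

# Hypothetical feature payload for one customer.
payload = {
    "loan_amount": 15000,
    "credit_score": 710,
    "annual_income": 65000,
    "debt_to_income": 0.28,
}

def predict_interest_rate(url: str, features: dict) -> float:
    """POST the feature dict as JSON and return the predicted rate."""
    req = urllib.request.Request(
        url,
        data=json.dumps(features).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["prediction"]
```

Any application that can make an HTTP POST request can score a customer this way, which is the point of exposing the model as an API.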
Step 8: Dashboard
The dashboard was created automatically, and RiskPrice_App can be accessed from it.
Fig.9: Run the App from the dashboard and click predict.
Based on the features (input variables) provided, the estimated interest rate for this customer is 11.28.
Developers can build applications that interface with such models for use in their corporate businesses.