top of page

Bank Loan Case Study

Project Description:

This case study aims to give you an idea of applying EDA in a real business scenario. In this case study, apart from applying the techniques that you have learnt in the EDA module, you will also develop a basic understanding of risk analytics in banking and financial services and understand how data is used to minimize the risk of losing money while lending to customers.


Business Objective:

This case study aims to identify patterns that indicate if an applicant will repay their instalments which may be used for taking further actions such as denying the loan, reducing the amount of the loan, lending at a higher interest rate, etc. This will make sure that the applicants capable of repaying the loan are not rejected. Recognition of such aspirants using Exploratory Data Analysis (EDA) techniques is the main focus of this case study.


Approach:

All the questions are clearly explained and can be solved using Microsoft Excel only.


Tech-Stack Used:

Microsoft Excel for Microsoft Office 365


Dataset:


Insights:

1. Columns having Null value:

  • Number of columns having null value more than 50% : 41 Nos

  • These columns should be dropped. Number of columns having null value less than 15% : 13 Nos

  • These columns shall be imputed with suitable values which shall be explained subsequently. For analysis of imputation selected 7 variables.


2. Data Imputation analysis for columns having <15% null value:

For categorical variable the value which should be imputed should be the maximum in frequency. So the value to be imputed are:

NAME_TYPE_SUITE: Unaccompanied

OBS_30_CNT_SOCIAL_CIRCLE: 0.0

DEF_30_CNT_SOCIAL_CIRCLE: 0.0

OBS_60_CNT_SOCIAL_CIRCLE: 0.0

DEF_60_CNT_SOCIAL_CIRCLE: 0.0



  • Continuous variables: 'EXT_SOURCE_2‘ 'AMT_GOODS_PRICE‘

  • Categorical variables: 'OBS_30_CNT_SOCIAL_CIRCLE',' OBS_60_CNT_SOCIAL_CIRCLE', 'DEF_60_CNT_SOCIAL_CIRCLE', ‘DEF_30_CNT_SOCIAL_CIRCLE',' NAME_TYPE_SUITE

  • For 'EXT_SOURCE_2' there is no outliers present and missing values can be imputed with mean or median (median: 0.565)

  • There are high number of outliers present in the AMT_GOODS_PRICE data. Hence it is recommended to impute data with Median value i.e. 450000


3. Checking the outlier for numerical variables:

  • The first quartile almost missing for CNT_CHILDREN that means most of the data are present in the first quartile.

  • There is single high value data point as outlier present in AMT_INCOME_TOTAL and DAYS_EMPLOYED. Removal this point will drastically impact the box plot for further analysis.

  • The first quartiles is slim compare to third quartile for AMT_CREDIT,AMT_ANNUITY, DAYS_REGISTRATION. This mean data are skewed towards first quartile.


4. Univariate analysis for categorical variables:

AMT_INCOME_RANGE:

The people having 100000- 200000 are having higher number of loan and also having higher in defaulter.

The income segment having >500000 are having less defaulter.


AMT_CREDIT_RANGE:

The people having <100000 loan are less defaulter. Income having more than >100000 are almost equal % of loan defaulter.



NAME_INCOME_TYPE:

Student pensioner and business have higher percentage of loan repayment.

Working, State servant and Commercial associates have higher default percentage.

Maternity category is significantly higher problem in repayment.


NAME_CONTRACT_TYPE:

For contract type ‘cash loans’ is having higher number of credits than ‘Revolving loans’ contract type.

From the graphs we can see that the Revolving loans are small amount compared to Cash loans but the % of non-payment for the revolving loans are comparatively high.



CODE_GENDER:


The % of defaulters are more in Male than Female.


FLAG_OWN_CAR:


The person owning car is having higher percentage of defaulter.





5. Univariate analysis for continuous variables:

  • Days Birth: The people having higher age are having higher probability of repayment.

  • Some outliers are observed in In 'AMT_ANNUITY','AMT_GOODS_P RICE','DAYS_EMPLOYED', DAYS_LAST_PHONE_CHANGE in the dataset.

  • Less outlier observed in Days Birth

  • DAYS_EMPLOYED. Removal this point will drastically impact the box plot for further analysis.

  • Less outlier observed in DAYS_ID_PUBLISH

  • 1st quartile is smaller than third quartile in In 'AMT_ANNUITY','AMT_GOODS_P RICE', DAYS_LAST_PHONE_CHANGE.

  • In DAYS_ID_PUBLISH: people changing ID in recent days are relatively prone to be default.


6. Bivariate analysis for numerical variables – Target 0 (Client having no payment difficulties):

Family status of 'civil marriage', 'marriage' and 'separated' of Academic degree education are having higher number of credits than others.

Also, higher education of family status of 'marriage', 'single' and 'civil marriage' are having more outliers.


Civil marriage for Academic degree is having most of the credits in the third quartile.


In Education type 'Higher education' the income amount is mostly equal with family status. It does contain many outliers.


Less outlier are having for Academic degree but there income amount is little higher that Higher education.


Lower secondary of civil marriage family status are have less income amount than others.



7. Bivariate analysis for numerical variables – Target 1 (Client having payment difficulties):

Observations are Quite similar with Target 0

Family status of 'civil marriage', 'marriage' and 'separated' of Academic degree education are having higher number of credits than others.

Most of the outliers are from Education type 'Higher education' and 'Secondary'.

Civil marriage for Academic degree is having most of the credits in the third quartile.


There is also have some similarity with Target 0


Education type 'Higher education' the income amount is mostly equal with family status.


Less outlier are having for Academic degree but there income amount is little higher that Higher education.

Lower secondary are have less

income amount than others.


8. Correlation:

From the correlation analysis it is inferred that the highest correlation (1.0) is between (OBS_60_CNT_SOCIAL_CIRCLE with OBS_30_CNT_SOCIAL_CIRCLE) and (FLOORSMAX_MEDI with FLOORSMAX_AVG) which is same for both the data set.






9. Univariate analysis for combined dataset (Distribution of contract status with purpose):

Most rejection of loans came from purpose 'repairs'. For education purposes we have equal number of approves and rejection Paying other loans and buying a new car is having significant higher rejection than approves.


10. Univariate analysis for combined dataset (Distribution of the purpose with target):

Loan purposes with 'Repairs' are facing more difficulties in payment on time. There are few places where loan payment is significant higher than facing difficulties. They are 'Buying a garage', 'Business development', 'Buying land', 'Buying a new car' and 'Education' Hence we can focus on these purposes for which the client is having for minimal payment difficulties.


11. Bivariate analysis for a combined dataset:

  • The credit amount of Loan purposes like 'Buying a home', 'Buying a land', 'Buying a new car' and 'Building a house' is higher.

  • Income type of state servants have a significant amount of credit applied

  • Money for third person or a Hobby is having less credits applied.


For Housing type, office apartment is having higher credit of target 0 and co-op apartment is having higher credit of target 1.

So, we can conclude that bank should avoid giving loans to the housing type of co-op apartment as they are having difficulties in payment.

Bank can focus mostly on housing type with parents or House\apartment or municipal apartment for successful payments.


Result:

  • Banks should focus more on contract types 'Student', 'pensioner' and 'Businessman' with housing types other than ‘Co-op apartment’ for successful payments.

  • Banks should focus less on income type ‘Working’ as they are having most number of unsuccessful payments.

  • In loan purpose ‘Repairs’: - Although having higher number of rejection in loan purposes with 'Repairs' there are observed difficulties in payment on time. - There are few places where loan payment is delay is significantly high. - Banks should continue to be cautious while giving loans for this purpose.

  • Bank should avoid giving loans to the housing type of co-op apartment as they are having difficulties in payment.

  • Bank can focus mostly on housing type ‘with parents’ , ‘House\apartment’ and ‘municipal apartment’ for successful payments.

コメント


bottom of page