Has anyone tried to build a credit scoring model in RapidMiner?

Adiletkgz · April 2019

If yes, could you please share your experience and what kind of models you created?

yyhuang · April 2019

Are you looking for a classification or regression model?
No matter what kind of model you built in the training stage, you can store it. In a separate process built for scoring, just retrieve the model object, apply the same pre-processing, and apply the model to get a prediction!
Please check out this awesome scoring demo created by Jeff:
https://academy.rapidminer.com/learn/video/scoring-demo?utm_source=studio
Inside RapidMiner, you can also search for operators, tutorial processes, and self-paced Academy videos.
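The store-then-retrieve workflow described above can be sketched outside RapidMiner too. The snippet below is only an illustration in plain Python, not a RapidMiner process: the nearest-centroid "model", the JSON file name, and the two input columns are all assumptions made for the example. The point it tries to show is that the pre-processing parameters are saved together with the model so the scoring step reuses them unchanged.

```python
import json
import os
import tempfile


def fit_scaler(rows):
    """Learn per-column min and max so scoring can reuse the same scaling."""
    cols = list(zip(*rows))
    return {"min": [min(c) for c in cols], "max": [max(c) for c in cols]}


def scale(row, scaler):
    """Apply the stored min-max scaling to one row."""
    return [
        (x - lo) / (hi - lo) if hi > lo else 0.0
        for x, lo, hi in zip(row, scaler["min"], scaler["max"])
    ]


def train(rows, labels):
    """Training process: fit pre-processing, then a nearest-centroid classifier."""
    scaler = fit_scaler(rows)
    scaled = [scale(r, scaler) for r in rows]
    centroids = {}
    for label in sorted(set(labels)):
        members = [r for r, y in zip(scaled, labels) if y == label]
        centroids[label] = [sum(c) / len(members) for c in zip(*members)]
    return {"scaler": scaler, "centroids": centroids}


def score(bundle, row):
    """Scoring process: same pre-processing, then apply the stored model."""
    x = scale(row, bundle["scaler"])

    def dist(label):
        return sum((a - b) ** 2 for a, b in zip(x, bundle["centroids"][label]))

    return min(bundle["centroids"], key=dist)


# Hypothetical training data: [monthly income in USD, average payment delay in days]
train_rows = [[400, 30], [600, 25], [4000, 3], [3500, 5]]
train_labels = ["bad", "bad", "good", "good"]

# "Store" step: write scaler and model parameters to disk as one bundle.
model_path = os.path.join(tempfile.mkdtemp(), "credit_model.json")
with open(model_path, "w") as f:
    json.dump(train(train_rows, train_labels), f)

# "Retrieve and apply" step, as it might run in a separate scoring process.
with open(model_path) as f:
    loaded = json.load(f)

prediction = score(loaded, [3800, 4])
```

Keeping the scaler and the model in one stored object is a design choice for the sketch; it avoids the common mistake of re-fitting the pre-processing on the scoring data.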

Image: https://us.v-cdn.net/6030995/uploads/editor/sm/nw9wjne62eqw.png

Enjoy!

rfuentealba · April 2019

Hello, @Adiletkgz,

I just wanted to add something to the excellent question by the fabulous super data scientist extraordinaire @yyhuang (hellooo!!!) and tell you a bit about my experience.

First, let me begin by telling you that credit scoring models are inherently classification problems. The same goes for fraud prevention, but holy moly, that is way more complex.

Basically, you take a bunch of historical operations and analyse variables (each financial entity has its own set of variables) to assign a label. Through Rule Induction, you can extract the rules to classify new operations.

A simple example set of variables may be: Age, Salary, Business/Activity, Contract Type, Average Payment Delay, Average Debt, Credit, Loans.

Let's say that a 20-year-old student who earns 400 USD on a temporary contract and has no previous banking history carries more risk than a 30-year-old professional who earns 4000 USD in a stable job held for 3 years, and whose average payment delay is less than 5 days after taking 5-10 loans.

Once you define the data you want to use for scoring, you can balance it using weighting or sampling (I've seen good results with both, but your mileage may vary; if you decide to go with a tree that doesn't take weighting into account, you may want to try SMOTE sampling). This helps give the classes a similar representation. If you want to know whether it's necessary, a simple k-Means or x-Means can help you check what kind of data you have.
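To make the two balancing options a bit more concrete, here is a minimal plain-Python sketch. It assumes the common inverse-frequency formula for class weights and uses simple random oversampling, which duplicates existing minority rows; it is not SMOTE, which synthesises new rows by interpolation. The data and function names are made up for illustration.

```python
import random
from collections import Counter


def class_weights(labels):
    """Inverse-frequency weights: n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {label: n / (k * c) for label, c in counts.items()}


def oversample(rows, labels, seed=0):
    """Randomly duplicate minority-class rows until every class matches the largest."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_rows, out_labels = list(rows), list(labels)
    for label, count in counts.items():
        pool = [r for r, y in zip(rows, labels) if y == label]
        for _ in range(target - count):
            out_rows.append(rng.choice(pool))
            out_labels.append(label)
    return out_rows, out_labels


# Hypothetical, deliberately imbalanced labels: 8 good payers, 2 bad payers.
labels = ["good"] * 8 + ["bad"] * 2
rows = [[i] for i in range(len(labels))]

weights = class_weights(labels)                 # option 1: weighting
balanced_rows, balanced_labels = oversample(rows, labels)  # option 2: sampling
```

Whichever option is used, it should be applied to the training data only, so that the evaluation still reflects the real class proportions.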

Next, you have to apply a classification algorithm such as a Decision Tree, CN2, a Random Forest, or CHAID. I've seen efforts using Linear and Logistic Regression, but the project was abandoned before I could analyse it, so I can't comment on them.

Once you get your rules, you can use the pre-trained model to score new entries, although that may at times be expensive in terms of network and processing. What I've done instead (and I know my good old friend @kypexin and I were discussing this somewhere on this community, but I cannot find the link) is to analyse the history, adjust the Chilean market values and retrain the model, and then rebuild the model on a Rule Engine that is made available to the commercial team. They normally see data science as black magic, so the rule engine helps them build and tear down things in a more human-readable way, for example to make adjustments such as promotions that help certain user profiles engage in more business.
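As a rough idea of what "rules in a rule engine" can look like, here is a small Python sketch. It is not the engine described above: the field names, thresholds, and labels are hypothetical, loosely based on the student/professional example earlier in this post. The rules are kept as plain data so that a non-programmer could read or edit them.

```python
import operator

# Supported comparison operators for rule conditions.
OPS = {
    "<": operator.lt,
    "<=": operator.le,
    ">": operator.gt,
    ">=": operator.ge,
    "==": operator.eq,
}

# Hypothetical rules: (list of conditions, resulting label). First match wins.
RULES = [
    ([("avg_payment_delay_days", ">", 30)], "high risk"),
    ([("salary_usd", "<", 500), ("contract_type", "==", "temporary")], "high risk"),
    ([("salary_usd", ">=", 3000), ("avg_payment_delay_days", "<=", 5)], "low risk"),
]
DEFAULT_LABEL = "manual review"


def classify(applicant, rules=RULES, default=DEFAULT_LABEL):
    """Return the label of the first rule whose conditions all hold."""
    for conditions, label in rules:
        if all(OPS[op](applicant[field], value) for field, op, value in conditions):
            return label
    return default


student = {
    "age": 20, "salary_usd": 400,
    "contract_type": "temporary", "avg_payment_delay_days": 0,
}
professional = {
    "age": 30, "salary_usd": 4000,
    "contract_type": "permanent", "avg_payment_delay_days": 4,
}

student_label = classify(student)
professional_label = classify(professional)
```

Because the rules are data rather than code, a change such as a temporary promotion for one profile is just another entry placed ahead of the existing ones.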

I have created quite a large number of credit scoring models for different customers with this method. If you have any specific question, just ping me!

Hope this helps,

Rodrigo.

kypexin · April 2019

Hi @Adiletkgz

I have been building models with RapidMiner for scorecard development. Traditionally, these are simple linear models (linear or logistic regression) that are later transformed into a 'classic' scorecard format. Because of regulatory requirements, one of the main concerns is interpretability, which means each model decision should be explainable. The most important part, however, is not the modelling process itself (it's pretty simple in the case of linear models) but variable selection: a credit score model usually starts with hundreds (if not thousands) of variables, of which only the 20-30 most relevant are left at the end. The feature selection process is also a bit different in credit scoring, as it is usually based on metrics like weight of evidence (WoE) and information value (IV), which are relatively rarely used in other areas of predictive modelling.
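For readers who have not met weight of evidence and information value before, here is a minimal Python sketch of how they are commonly computed for one binned variable. It assumes the convention WoE = ln(share of goods / share of bads) per bin and IV = sum over bins of (share of goods - share of bads) * WoE; some references flip the sign of WoE. The bin boundaries and counts are invented for the example.

```python
import math


def woe_iv(bins):
    """bins maps bin name -> (good_count, bad_count). Returns (woe_per_bin, iv)."""
    total_good = sum(g for g, _ in bins.values())
    total_bad = sum(b for _, b in bins.values())
    woe, iv = {}, 0.0
    for name, (good, bad) in bins.items():
        # Assumes every bin has at least one good and one bad case; real data
        # would need smoothing or bin merging to avoid log(0).
        dist_good = good / total_good
        dist_bad = bad / total_bad
        woe[name] = math.log(dist_good / dist_bad)
        iv += (dist_good - dist_bad) * woe[name]
    return woe, iv


# Hypothetical counts for an "average payment delay" variable.
delay_bins = {
    "0-5 days": (600, 20),
    "6-30 days": (300, 40),
    "30+ days": (100, 40),
}

woe, iv = woe_iv(delay_bins)
```

In variable selection, IV is typically computed for every candidate variable and used to rank them, with low-IV variables dropped before modelling.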

Overall, credit scoring is one of the most conservative areas of predictive modelling, and the techniques used there haven't changed much in decades. In my experience, RapidMiner is good for data preparation and for building and evaluating credit scoring models, but it still lacks many built-in methods and metrics used in credit scoring (for example, the above-mentioned WoE and IV metrics). Many parts of traditional scorecard development require a substantial amount of manual work in RapidMiner, which in the end results in some long and complicated processes. So I ended up with a combination of RapidMiner and Python scripts to design scorecards, and I would say I tend to do most of the job in Python rather than in RM.
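As an illustration of the kind of small Python helper scorecard work tends to involve (this is not my actual script), the sketch below shows the widely used "points to double the odds" scaling that turns a model's good:bad odds into scorecard points. The base score of 600, base odds of 50:1, and 20 points to double the odds are arbitrary example values.

```python
import math


def scaling_params(base_score=600, base_odds=50, pdo=20):
    """Return (factor, offset) so that base_odds maps to base_score and the
    score rises by pdo points each time the good:bad odds double."""
    factor = pdo / math.log(2)
    offset = base_score - factor * math.log(base_odds)
    return factor, offset


def score_from_probability(p_good, factor, offset):
    """Convert a model's probability of 'good' into scorecard points."""
    odds = p_good / (1.0 - p_good)
    return offset + factor * math.log(odds)


factor, offset = scaling_params()

# Odds of 50:1 correspond to p_good = 50/51; odds of 100:1 to p_good = 100/101.
score_at_base = score_from_probability(50 / 51, factor, offset)
score_at_double = score_from_probability(100 / 101, factor, offset)
```

With a logistic regression on WoE-coded variables, the same factor and offset can be distributed across the bins of each variable to produce the per-attribute points of a classic scorecard.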

It's not easy to summarise the overall experience in a short forum post anyway.

If you have any specific problems to solve in the scope of credit scoring models, please let us know the details and we'll gladly help with them.

kypexin · April 2019

Hi @Adiletkgz

One more thing I forgot to mention: RapidMiner has a sample process for predicting credit risk. You can find it in the Samples folder of the repository:

Image: https://us.v-cdn.net/6030995/uploads/editor/sx/xpcjix3f2s2t.png

Image: https://us.v-cdn.net/6030995/uploads/editor/0b/25fane6dtenu.png

However, I should mention that this is just a classification model and not the traditional scorecard I mentioned earlier.
