Implementing a New Feature Selection Algorithm for a Credit Scoring System

hsalimi · July 2013

I am working on a credit scoring system that predicts the riskiness of a bank's new customers based on the history of its current customers. I have gathered a set of features for each customer, e.g. age, sex, time at address, marital status, account status, etc. These features are organized into 5 groups, each containing similar features. Each group draws on various sources and represents a different, independent type of information (e.g., demographics, inquiries, previous performance, trades); together the groups form a risk profile.

For feature selection, I need to choose the set of features that gives the best fit for my logistic regression model, with the constraint that at least 1-2 features from each group appear in the selected set. This type of feature selection is supported in SAS Enterprise Miner by using PROC LOGISTIC with the “SEQUENTIAL=” and “INCLUDE=” options. The “START=” option additionally starts the stepwise regression with the first x variables specified.

How can this type of selection be implemented in RapidMiner?

MariusHelf · July 2013

Do you want to implement this feature selection algorithm yourself in Java code and integrate it into RapidMiner, or do you want to find a combination of RapidMiner operators that provides this algorithm? The latter will be very complicated at best, because so far we don't support the described algorithm out of the box.

Best regards,
Marius

hsalimi · July 2013

Hi Marius,

Isn't it possible to support this kind of selection based on the existing selection methods? If not, how can I implement this algorithm in Java and integrate it into RapidMiner?

BR
Hadi
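
For readers who want to see the constraint spelled out, here is a minimal sketch of the selection described above: greedy forward selection that first forces the required number of features from each group and then adds further features only while the score improves. It is not the SAS procedure and not a RapidMiner operator. The function name `select_with_group_minimums`, the `score` callback, the feature names, and the toy weights are all hypothetical; in practice `score` would be something like the cross-validated AUC of a logistic regression fitted on the given features.

```python
from typing import Callable, Dict, List, Sequence


def select_with_group_minimums(
    groups: Dict[str, Sequence[str]],
    score: Callable[[List[str]], float],
    min_per_group: int = 1,
    tol: float = 1e-9,
) -> List[str]:
    """Greedy forward selection with a minimum number of features per group.

    `score` is a hypothetical callback: it takes a list of feature names and
    returns a number where higher is better.
    """
    selected: List[str] = []

    # Phase 1: force the per-group minimum, picking the best feature of each
    # group given what has already been selected.
    for features in groups.values():
        for _ in range(min(min_per_group, len(features))):
            candidates = [f for f in features if f not in selected]
            best = max(candidates, key=lambda f: score(selected + [f]))
            selected.append(best)

    # Phase 2: ordinary forward selection over all remaining features,
    # stopping as soon as the best candidate no longer improves the score.
    remaining = [f for fs in groups.values() for f in fs if f not in selected]
    current = score(selected)
    while remaining:
        best = max(remaining, key=lambda f: score(selected + [f]))
        new_score = score(selected + [best])
        if new_score - current <= tol:
            break
        selected.append(best)
        remaining.remove(best)
        current = new_score
    return selected


# Toy setup (all values invented for illustration only).
groups = {
    "demographics": ["age", "sex", "time_at_address"],
    "inquiries": ["inquiries_6m", "inquiries_12m"],
    "performance": ["late_payments"],
    "trades": ["open_trades"],
}
weights = {
    "age": 0.30, "sex": 0.05, "time_at_address": 0.10,
    "inquiries_6m": 0.20, "inquiries_12m": 0.18,
    "late_payments": 0.40, "open_trades": 0.02,
}


def toy_score(features: List[str]) -> float:
    # Stand-in for model quality: summed weights minus a per-feature penalty.
    return sum(weights[f] for f in features) - 0.06 * len(features)


result = select_with_group_minimums(groups, toy_score, min_per_group=1)
print(result)
```

In this toy run every group ends up represented, including the weak `open_trades` feature that an unconstrained search would skip, while `sex` is left out because it does not improve the score. The same two-phase logic could be ported to Java if it is to be wrapped in a custom RapidMiner operator.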
