RapidMiner

Implementing a New Feature Selection Algorithm for a Credit Scoring System

Contributor

I am working on a credit scoring system that predicts the riskiness of new bank customers based on the history of the bank's current customers. I have gathered a set of features for each customer, e.g. age, sex, time at address, marital status, account status, etc. These features are categorized into 5 groups of similar features. Each group contains features from various sources and represents a different, independent type of information (e.g., demographics, inquiries, previous performance, trades, etc.); together the groups form a risk profile.

For feature selection, I need to choose the set of features that gives the best fit for my logistic regression model, with the constraint that at least 1-2 features from each group appear in the selected set. SAS Enterprise Miner supports this type of feature selection through PROC LOGISTIC with the “SEQUENTIAL=” and “INCLUDE=” options. The “START=” option additionally starts the stepwise regression with the first x specified variables already in the model.
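The constraint can be made concrete with a minimal Python sketch. It is neither SAS nor RapidMiner code: the function name `grouped_forward_selection`, the `score` callback, the grouping of features and the toy gain values are all assumptions made for illustration. In a real setup, `score` would be something like the cross-validated performance of the logistic regression on the given feature subset.

```python
from typing import Callable, Dict, FrozenSet, List


def grouped_forward_selection(
    groups: Dict[str, List[str]],
    score: Callable[[FrozenSet[str]], float],
    min_per_group: int = 1,
) -> List[str]:
    """Greedy forward selection that keeps at least `min_per_group`
    features from every group (hypothetical helper, not a library API)."""
    selected: List[str] = []

    # Phase 1: satisfy the per-group minimum, one group at a time,
    # always adding the feature that gives the best score so far.
    for features in groups.values():
        for _ in range(min(min_per_group, len(features))):
            candidates = [f for f in features if f not in selected]
            best = max(candidates, key=lambda f: score(frozenset(selected + [f])))
            selected.append(best)

    # Phase 2: ordinary forward selection over all remaining features,
    # stopping as soon as no candidate improves the score.
    remaining = [f for fs in groups.values() for f in fs if f not in selected]
    current = score(frozenset(selected))
    while remaining:
        best = max(remaining, key=lambda f: score(frozenset(selected + [f])))
        best_score = score(frozenset(selected + [best]))
        if best_score <= current:
            break
        selected.append(best)
        remaining.remove(best)
        current = best_score
    return selected


if __name__ == "__main__":
    # Toy data: made-up per-feature gains and a small penalty per feature.
    groups = {
        "demographics": ["age", "marital_status"],
        "performance": ["account_status", "time_at_address"],
    }
    gain = {"age": 0.30, "marital_status": 0.02,
            "account_status": 0.25, "time_at_address": 0.10}

    def toy_score(subset: FrozenSet[str]) -> float:
        return sum(gain[f] for f in subset) - 0.05 * len(subset)

    print(grouped_forward_selection(groups, toy_score))
```

The first phase plays the role of a forced-inclusion step, except that the forced features are chosen per group instead of being fixed in advance. The second phase is plain forward selection, so a group may end up with more than the minimum number of features if they improve the score.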

How could this type of selection be implemented in RapidMiner?
2 REPLIES
Super Contributor

Re: Implementing a New Feature Selection Algorithm for a Credit Scoring System

Do you want to implement this feature selection algorithm yourself in Java and integrate it into RapidMiner, or do you want to find a combination of RapidMiner operators that provides it? The latter will be very complicated at best, because we do not currently support the described algorithm out of the box.

Best regards,
Marius
Contributor

Re: Implementing a New Feature Selection Algorithm for a Credit Scoring System

Hi Marius,

Isn't it possible to support this kind of selection based on the existing selection methods? If not, how can I implement this algorithm in Java and integrate it into RapidMiner?

BR
Hadi