Implementing a New Feature Selection Algorithm for a Credit Scoring System
I am working on a credit scoring system that predicts the risk of a bank's new customers based on the history of its current customers. I have gathered a set of features for each customer, e.g. age, sex, time at address, marital status, account status, etc. These features are organized into 5 groups of similar features. Each group contains features from various sources and represents a different, independent type of information (e.g., demographics, inquiries, previous performance, trades, etc.); together, the groups form a risk profile.
For feature selection, I need to select the set of features that gives the best logistic regression model, with the constraint that the selected set must contain at least 1-2 features from each group. This type of feature selection is supported in SAS Enterprise Miner by using PROC LOGISTIC with the “SEQUENTIAL=” and “INCLUDE=” options. The “START=” option additionally starts the stepwise regression from the first x variables specified.
How could this type of selection be implemented in RapidMiner?
Re: Implementing a New Feature Selection Algorithm for a Credit Scoring System
Do you want to implement this feature selection algorithm yourself in Java code and integrate it into RapidMiner, or do you want to find a combination of RapidMiner operators that provides this algorithm? The latter would be very complicated at the least, because so far we do not support the described algorithm out of the box.
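To make the requirement concrete before deciding between the two routes, here is a rough, language-neutral sketch of the selection logic written in Python. It is not the SAS procedure and it does not use any RapidMiner API. It assumes the groups are disjoint, and that a hypothetical `score` callable returns a quality measure for a feature subset (in practice this might be a cross-validated performance of the logistic regression). The feature names and the `toy_gain` values are invented purely for illustration.

```python
from typing import Callable, Dict, List, Sequence


def select_with_group_minimums(
    groups: Dict[str, Sequence[str]],
    score: Callable[[List[str]], float],
    min_per_group: int = 1,
    max_features: int = 10,
) -> List[str]:
    """Greedy forward selection that keeps at least min_per_group features per group."""
    selected: List[str] = []

    # Phase 1: force the best-scoring feature(s) of every group into the set,
    # even if they do not improve the score (this is the group constraint).
    for members in groups.values():
        for _ in range(min(min_per_group, len(members))):
            candidates = [f for f in members if f not in selected]
            if not candidates:
                break
            selected.append(max(candidates, key=lambda f: score(selected + [f])))

    # Phase 2: ordinary forward selection over all remaining features.
    # Stop when nothing improves the score or the size cap is reached.
    remaining = [f for m in groups.values() for f in m if f not in selected]
    current = score(selected)
    while remaining and len(selected) < max_features:
        best = max(remaining, key=lambda f: score(selected + [f]))
        best_score = score(selected + [best])
        if best_score <= current:
            break
        selected.append(best)
        remaining.remove(best)
        current = best_score
    return selected


# Illustrative data only: hypothetical feature names and made-up gains.
groups = {
    "demographics": ["age", "sex"],
    "inquiries": ["inquiries_6m", "inquiries_12m"],
    "performance": ["late_payments"],
    "trades": ["open_trades"],
}
toy_gain = {
    "age": 0.30, "sex": 0.02,
    "inquiries_6m": 0.20, "inquiries_12m": 0.05,
    "late_payments": 0.40, "open_trades": 0.01,
}


def toy_score(features: List[str]) -> float:
    # Stand-in for model quality: summed gains minus a per-feature penalty.
    return sum(toy_gain[f] for f in features) - 0.03 * len(features)


chosen = select_with_group_minimums(groups, toy_score)
print(chosen)
```

Note that the first phase adds a feature from every group even when it lowers the score, which is the point of the constraint; the second phase behaves like plain forward selection. Under these assumptions, the same two-phase idea could be ported to a custom Java operator.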