Feature ranking
Hi all,
I apologise if this has come up before; I did a quick search but couldn't find anything specifically addressing my issue.
I have a dataset which consists of a number of variables: continuous, date, multinomial and binomial. The data label is binomial.
There are a number of examples and tutorials for running subset selection in order to find the most informative variables in the data. However, I would like to start with something simpler: merely ranking the variables (i.e. ranking the features by a given metric).
Is there an easy way to do this using an operator, i.e. to feed my dataset into a method and get an ordered list of variables out? Of course, the added complication is that I have different types of variable (e.g. continuous vs. categorical), but I suppose ranking by p-value would allow me to fuse the outputs.
Thanks in advance for any help you can give.
Answers
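The approach you are looking for is called a "filter" in the area of feature subset selection: each attribute is weighted by a metric, independently of any learner, and sorting by weight gives you the ranking. RapidMiner provides a good number of operators for this. See Modelling -> Attribute Weighting on the left-hand side.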
greetings,
steffen
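For readers who want to see the filter idea outside RapidMiner, here is a minimal Python sketch. It is not the implementation behind any RapidMiner operator. It assumes information gain as the single ranking metric and equal-width binning for numeric columns, so that continuous and nominal attributes end up on one comparable scale (a substitute for the p-value fusion suggested in the question). The function names, the toy data and the choice of three bins are all hypothetical, and date attributes would first need converting to numbers.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def discretize(values, bins=3):
    """Equal-width binning so numeric columns can share the nominal metric."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0] * len(values)
    width = (hi - lo) / bins
    return [min(int((v - lo) / width), bins - 1) for v in values]


def information_gain(column, labels):
    """Reduction in label entropy after grouping examples by column value."""
    groups = {}
    for value, label in zip(column, labels):
        groups.setdefault(value, []).append(label)
    remainder = sum(len(g) / len(labels) * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


def rank_features(columns, labels, bins=3):
    """Return (name, weight) pairs sorted from most to least informative."""
    weights = {}
    for name, column in columns.items():
        is_numeric = all(
            isinstance(v, (int, float)) and not isinstance(v, bool) for v in column
        )
        if is_numeric:
            column = discretize(column, bins)
        weights[name] = information_gain(column, labels)
    return sorted(weights.items(), key=lambda item: item[1], reverse=True)


# Hypothetical toy data: one numeric, one nominal and one uninformative column.
columns = {
    "age": [22, 25, 31, 47, 52, 58, 63, 50],
    "plan": ["basic", "basic", "basic", "gold", "gold", "gold", "gold", "basic"],
    "noise": ["a", "b", "a", "b", "a", "b", "a", "b"],
}
labels = ["no", "no", "no", "yes", "yes", "yes", "yes", "no"]

ranking = rank_features(columns, labels)
for name, weight in ranking:
    print(f"{name}: {weight:.3f}")
```

On this toy data the nominal "plan" column separates the label perfectly and so ranks first, the binned "age" column comes second, and "noise" gets a weight of zero. Bear in mind that the number of bins affects the weights of numeric attributes, so it is worth trying a few values before trusting the order.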