"Is weight by Information Gain the right operator for me?"
mohammadreza
Member Posts: 23 Contributor II
Hi all,
I am using the operator "Weight by Information Gain" to select the most predictive attributes from a data set with 218000 attributes and 60000 examples. (This is the example set that resulted from RapidMiner text processing.)
I have been waiting for 4 days so far and the process is still running on a PC with 32 GB of RAM. I am afraid this is not the right operator for my problem. Would you please explain whether I have done something wrong?
BTW, as far as I understand, the computational complexity of calculating information gain might be proportional to "number of attributes" * "number of examples", which in my case is 218000 * 60000 calculations. Do you think this might not be tractable on a PC? If so, I would appreciate it if you could propose an alternative solution.
Thanks in advance
Answers
200,000 attributes is really a lot. Even in text mining you usually have fewer.
You might want to batch it: work on one subset of the attributes at a time, write the weights to a file, and use them afterwards. Working on a sample of the examples might also be a good solution. Don't forget to use Materialize Data after Select Attributes.
Cheers,
Martin
Dortmund, Germany
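The batching and sampling idea above can be sketched outside RapidMiner. This is a minimal, standard-library Python illustration, not the operator's actual implementation: it assumes each attribute is treated as binary (term present or absent), and the names `information_gain`, `weights_in_batches`, `batch_size`, `sample_size`, and `out_path` are hypothetical. For scale, 218000 * 60000 is roughly 1.3 * 10^10 cell visits, so a real run would likely need a sparse or vectorized representation rather than plain lists.

```python
import csv
import math
import random
from collections import Counter


def entropy(counts):
    """Shannon entropy in bits of a collection of class counts."""
    total = sum(counts)
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)


def information_gain(column, labels):
    """Gain of one attribute, treated as binary: present (> 0) or absent.

    Assumption: this binary split is for illustration only; it is not
    necessarily how RapidMiner discretizes numerical attributes.
    """
    present, absent = Counter(), Counter()
    for value, label in zip(column, labels):
        (present if value > 0 else absent)[label] += 1
    n = len(labels)
    n_present = sum(present.values())
    conditional = (
        n_present * entropy(present.values())
        + (n - n_present) * entropy(absent.values())
    ) / n
    return entropy(Counter(labels).values()) - conditional


def weights_in_batches(rows, labels, batch_size, out_path,
                       sample_size=None, seed=0):
    """Score attributes one batch at a time and write the weights to a CSV."""
    # Optional row sampling: fewer examples means less work per attribute.
    if sample_size is not None and sample_size < len(rows):
        keep = random.Random(seed).sample(range(len(rows)), sample_size)
        rows = [rows[i] for i in keep]
        labels = [labels[i] for i in keep]

    n_attributes = len(rows[0])
    with open(out_path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["attribute_index", "information_gain"])
        for start in range(0, n_attributes, batch_size):
            stop = min(start + batch_size, n_attributes)
            for j in range(start, stop):
                column = [row[j] for row in rows]
                writer.writerow([j, information_gain(column, labels)])
```

Each attribute is visited once per example, so the cost of this sketch grows with attributes * examples, as the question suspects. Writing the weights per batch means a long run can be inspected or resumed from the file instead of being held entirely in memory.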
Would you please explain what materialized data is?
Thanks again
In RapidMiner an example set is usually held in memory only once. If you select attributes, you do not delete the others; you just deselect them. To get a real copy in memory you need to use the Materialize Data operator.
This is usually not needed. But in this special case you want to be sure to have an example set without those attributes, so I would recommend using it.
Cheers,
Martin
Dortmund, Germany
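The difference between deselecting attributes and materializing a copy can be illustrated with a small Python analogy. This is only a sketch of the idea described above, not RapidMiner's internal data model; the class `AttributeView` and its methods are hypothetical.

```python
class AttributeView:
    """A selection over a shared table.

    Deselected columns are hidden by the view but still stored in `table`.
    """

    def __init__(self, table, selected):
        self.table = table              # shared reference, nothing is copied
        self.selected = list(selected)  # indices of the visible columns

    def row(self, i):
        """Read one row through the view, showing only selected columns."""
        return [self.table[i][j] for j in self.selected]

    def materialize(self):
        """Build an independent table that holds only the selected columns."""
        return [self.row(i) for i in range(len(self.table))]


full = [[1, 2, 3], [4, 5, 6]]
view = AttributeView(full, selected=[0, 2])
copy = view.materialize()

# Changing the original table shows through the view but not the copy.
full[0][0] = 99
```

After the last line, `view.row(0)` reads the changed value from the shared table, while `copy` keeps the old one and no longer carries the deselected column at all.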
Thanks in advance,