Applying an operation to a large example set
Hi,
I have an example set with 10,000 examples and 3,800 attributes. It holds document file names and the TF-IDF values of 3,800 terms in those documents. I want to raise each TF-IDF value to the power of 0.75. Is there a simple, fast way to do this?
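Stated as a formula (the symbols are introduced here for illustration only): if $x_{ij}$ is the TF-IDF weight of term $j$ in document $i$, the requested transform is applied to each cell independently:

```latex
x'_{ij} = x_{ij}^{0.75}, \qquad i = 1,\dots,10\,000, \quad j = 1,\dots,3\,800
```

That is 10,000 × 3,800 = 38,000,000 values. Since no cell depends on any other, the operation needs no joins between intermediate results.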
What I have tried is looping over the attributes and generating, for each one, a new attribute holding the TF-IDF value raised to the power of 0.75. I then loop over the resulting collection and use the Recall, Join, and Remember operators to join each example set in the collection to the ones before it. The problem is that this slows down as the iterations increase and the joined example set grows, and it eventually stalls or crashes. So I am wondering if there is a more efficient way to do the (seemingly) simple thing of applying one operation to every value in the example set.
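A rough cost model may explain the slowdown. Assume (this is not confirmed for RapidMiner's Join operator) that every join materializes a new table containing all attributes accumulated so far. Then the k-th iteration copies k columns, and n attributes cost 1 + 2 + … + n = n(n+1)/2 column copies instead of n. This minimal Python sketch counts cells under that assumption; the function names are hypothetical.

```python
def cells_copied_incremental(n_examples: int, n_attributes: int) -> int:
    """Cells copied if each join rebuilds the accumulated table (assumed model)."""
    total = 0
    for k in range(1, n_attributes + 1):
        # After the k-th join the accumulated table holds k derived attributes,
        # and under this model all of them are copied again.
        total += n_examples * k
    return total


def cells_written_single_pass(n_examples: int, n_attributes: int) -> int:
    """Cells written if every value is transformed exactly once, in place."""
    return n_examples * n_attributes


if __name__ == "__main__":
    n_ex, n_att = 10_000, 3_800
    print(cells_copied_incremental(n_ex, n_att))   # 72219000000
    print(cells_written_single_pass(n_ex, n_att))  # 38000000
```

Under this model the loop-and-join approach copies about 1,900 times as many cells as a single pass, and the per-iteration cost keeps growing, which would match a process that slows down as it goes.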
I should also mention that I looked at the Generate Function Set operator. This looks like what I want, except that the specific operation I want to do is not included as one of the choices in that operator.
Thanks in advance for your help.
Answers
Groovy is the answer. Use the Script operator with this code. I did an experiment with 10,000 examples by 3,800 attributes, and it took 2 minutes on my laptop. Obviously, others' results may vary.
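The Groovy script itself is not reproduced in the thread, so what follows is not Andrew's code. It is a minimal Python sketch of the same idea: one pass over every numeric attribute, overwriting each value in place, with no intermediate example sets or joins. The table layout (a dict mapping attribute names to lists of values) and the `document` column name are hypothetical stand-ins for the example set.

```python
from typing import Dict, List, Tuple


def power_transform(table: Dict[str, List], exponent: float = 0.75,
                    skip: Tuple[str, ...] = ("document",)) -> None:
    """Raise every value of every non-skipped column to `exponent`, in place."""
    for name, column in table.items():
        if name in skip:
            continue  # leave identifier columns such as file names untouched
        for i, value in enumerate(column):
            # TF-IDF weights are non-negative, so value ** exponent stays real.
            column[i] = value ** exponent


if __name__ == "__main__":
    # Hypothetical two-document, two-term example set.
    table = {
        "document": ["a.txt", "b.txt"],
        "term_1": [16.0, 0.0],
        "term_2": [1.0, 81.0],
    }
    power_transform(table)
    print(table)
```

Each of the 38 million values is visited once, so the work grows linearly with the size of the example set rather than with the number of joins performed.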
regards
Andrew
Thanks! I think that will work for me.
mikeb