Save results of operators which have a long runtime in one process

MoWei Member Posts: 18 Maven

Hello everyone,

I have a large data set with about 10000 examples and about 40 attributes. All attributes are numeric (real and integer). I used the "Weight by SVM" operator to weight the attributes, and afterwards I used the "Select by Weight" operator to continue with the top ten attributes. Now I want to extend the process to predict the label attribute, so I have to try different operators such as Decision Tree and so on. The problem is that "Weight by SVM" needs about 20 minutes every time I run the process, so I have to wait a long time whenever I run the process from the beginning.

Now the question: What is the best way to save the results of the "Weight by SVM" operator? At the moment I only want to change the operators that come after "Weight by SVM" and "Select by Weight", so the attributes selected for prediction are always the same.

My solution at the moment: I select the attributes and store the reduced data set in one process, and in another process I retrieve the reduced data set and try to predict the label there.

Is it somehow possible to put all the operations in one process without waiting a long time while "Weight by SVM" is running? A cache or something like that?

Thank you very much.

Best regards

Moritz

Answers

  • varunm1 Moderator, Member Posts: 1,207 Unicorn
    Hello @MoWei

    There are multiple ways to deal with this:

    1. You can build everything in a single process using the Subprocess operator. Connect the output of Select by Weight, which picks the attributes, to several subprocesses, and use the reduced data to train and test a different predictive algorithm in each one. You can use the Store operator to save each model's performance results in the RapidMiner repository. In the image below, I am training 6 models in a single process with the help of the Subprocess operator. Inside these subprocesses are the relevant operators, such as cross-validation, model, performance, store, etc.


    2. You can store the Weight by SVM results in the repository using the Store operator and then access them in the other processes you build. You don't need to run Weight by SVM multiple times, because the results are stored in the repository; you just need to drag and drop them into the current process and use them in Select by Weight.
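
The store-once, reuse-later idea from option 2 can be sketched outside RapidMiner as follows. This is only an illustration in Python, not RapidMiner code: the function names `store_weights`, `retrieve_weights`, and `select_top_k`, the file name, and the weight values are all hypothetical stand-ins for the Store, Retrieve, and Select by Weight operators.

```python
import json
import tempfile
from pathlib import Path


def store_weights(weights: dict, path: Path) -> None:
    """Write attribute weights to disk (stands in for the Store operator)."""
    path.write_text(json.dumps(weights))


def retrieve_weights(path: Path) -> dict:
    """Read previously stored weights (stands in for the Retrieve operator)."""
    return json.loads(path.read_text())


def select_top_k(weights: dict, k: int) -> list:
    """Return the names of the k attributes with the largest weights."""
    ranked = sorted(weights, key=weights.get, reverse=True)
    return ranked[:k]


if __name__ == "__main__":
    # Made-up weights for illustration only; in practice they would come
    # from the slow weighting step, which is run a single time.
    example_weights = {"att1": 0.9, "att2": 0.1, "att3": 0.5}
    weights_file = Path(tempfile.gettempdir()) / "svm_weights_example.json"
    store_weights(example_weights, weights_file)

    # A later run only reads the file and never repeats the slow step.
    print(select_top_k(retrieve_weights(weights_file), k=2))
```

The point of the split is that everything downstream depends only on the stored weights, so the downstream part can be changed and rerun as often as needed.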

    Hope this helps. Please let me know if you need more information.
    Regards,
    Varun
    https://www.varunmandalapu.com/

  • MoWei Member Posts: 18 Maven

    Hey @varunm1

    Thank you for your answer.

    To 1: I know that I can use the "Subprocess" operator, but every time I click on "Run", all the subprocesses run as well, don't they? Then "Weight by SVM" also runs, even if I put it in a subprocess.

    To 2: Yes, that is what I do at the moment, but I wanted to know whether there is a way to do it without the "Store" and "Retrieve" operators.

    The following picture shows what I want to have, but without "Weight by SVM" running every time I click on "Run":


    The following picture shows what I do at the moment. Whenever I change something in the "preprocessing subprocess", I run the process once with the "Weight by SVM" and "Store Weight by SVM" operators enabled. The next time I run the process, I use "Retrieve", which gives me the "Weight by SVM" results. It is okay, but I thought there might be a nicer way. I don't want to have more processes (at the moment); I want to see everything in one process. Hopefully you understand what I mean.


    But thank you for now. If you have an idea for solving my problem, that would be nice, but the way I do it at the moment is also okay.

    Best regards

    Moritz

  • MoWei Member Posts: 18 Maven
    Hey @varunm1 @yyhuang

    Perfect, that's what I've been looking for. And it is also pretty easy with the "Select Subprocess" operator. I could have figured it out myself. :(
    The following two screenshots show how I do it now:

    Operator "Select Subprocess":


    So I just have to set the "Select which" parameter of the "Select Subprocess" operator to "1" to start the "Weight by SVM" operator and store "the new results" or set "2" to use "the old results". Perfect!
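
For readers without the screenshots, the switch described above can be sketched in Python. This is a rough analogy rather than RapidMiner code: `weights_with_switch`, `compute`, and `cache_file` are hypothetical names, and `select_which` only mirrors the role of the "Select which" parameter.

```python
import json
from pathlib import Path
from typing import Callable


def weights_with_switch(
    select_which: int,
    compute: Callable[[], dict],
    cache_file: Path,
) -> dict:
    """Mimic a two-branch switch.

    1 -> run the slow computation and store its result
    2 -> skip the computation and load the stored result
    """
    if select_which == 1:
        weights = compute()  # the slow step, e.g. attribute weighting
        cache_file.write_text(json.dumps(weights))
        return weights
    if select_which == 2:
        # Fails with FileNotFoundError if branch 1 has never been run.
        return json.loads(cache_file.read_text())
    raise ValueError("select_which must be 1 or 2")
```

Branch 1 has to be run once, and again whenever the preprocessing changes, because branch 2 returns whatever was stored last, even if it is out of date.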

    Thank you very much!

    Best regards

    Moritz

