Tutorial for Creating Custom Operators

BalazsBarany (Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert)
edited March 2020 in Knowledge Base

This is a tutorial for creating a custom operator from a process, and putting it into a new extension.

Make sure that you have the Custom Operators extension installed from the Marketplace, and that you're working with a Java Development Kit (JDK), not just a Java Runtime Environment (JRE). The JRE doesn't contain the development tools needed to build the extension.

The operator we build here will automatically optimize a Decision Tree and return the optimized tree as a model, together with its validation results.

Create a folder structure in the repository to store the new extension and the operator processes.


The first operator will be called “Optimized Decision Tree”. Save a new process under this name in the extension subfolder. (The folder names are arbitrary; this is just how I structure my extensions.)

Let's work with a sample example set as the process input, set in the Context panel (if you haven't activated that panel yet, use the View / Show Panel menu). We will remove this input later, but we need data to test the process. We're going with Iris here (very creative, I know).


This is what the process looks like. Note that the input connector is “filled” because a data input is specified in the process context. The model output of Optimize Parameters is the first output of the process.


Inside Optimize Parameters there is a Cross Validation with a Decision Tree.


Then we set up Optimize Parameters with the parameter values to try. This tests a lot of combinations, so it will be slow on large datasets: 1,728 combinations × 11 models each (10 fold models from the 10-fold cross validation plus one model trained on the complete data) means 19,008 decision trees to train.
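To see where the cost comes from, here is a small Python sketch of the arithmetic. The parameter names and value counts are hypothetical (the tutorial doesn't list the exact grid); they are chosen only so that the grid has 1,728 combinations. Each combination is assumed to run one cross validation that trains one model per fold plus one final model on the complete data.

```python
from math import prod

# Hypothetical grid: the exact parameters and values aren't given in the tutorial,
# so these counts only reproduce 1,728 combinations (12 * 12 * 12).
grid_sizes = {
    "maximal_depth": 12,
    "minimal_gain": 12,
    "minimal_leaf_size": 12,
}

def count_model_fits(grid_sizes: dict[str, int], folds: int) -> tuple[int, int]:
    """Return (combinations, decision trees trained) for a full grid search.

    Assumes each combination runs one cross validation: one model per fold
    plus one final model trained on the complete data.
    """
    combinations = prod(grid_sizes.values())
    fits = combinations * (folds + 1)
    return combinations, fits

combinations, fits = count_model_fits(grid_sizes, folds=10)
print(f"{combinations} combinations -> {fits} decision trees")  # 1728 combinations -> 19008 decision trees
```

Because the cost is a product, removing a few values from each parameter shrinks the total much faster than removing many values from a single parameter.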


Feel free to try a different dataset, and consider reducing the number of combinations to test.

Test the process and make sure that it works as expected. It should also work when you replace Iris with Labor Negotiations or Sonar in the process context. Thorough testing is important!

Once the process has been tested, it's time to turn it into a custom operator. First, remove the test input from the process context!

Create a new folder on the hard disk (not in a RapidMiner repository!). This folder will contain all operators that go into the new extension. I recommend saving a file inputs.txt (the name is arbitrary) in this folder to record what you enter in the Custom Operator window. You don't want to type the entire description again every time you save a new version of the operator process!

Now select File / Create Custom Operator. (If you don't have this menu entry, check if the Custom Operators extension is properly installed.)


Enter the data into the appropriate text boxes. Here we expose two parameters: log_all_criteria from Optimize Parameters and number_of_folds from Cross Validation. When the form is complete, click Save and select the folder on the hard disk where the custom operator file should be stored.


Create the extension by clicking Extensions / Create Custom Extension.


Make sure to select the folder where you saved the custom operator. You can also select a color for your extension; it applies to every operator in this extension. It's a good idea to activate Force dependencies if you want to share your extension with others.

You only need the Jars folder if your extension uses external software libraries that should be included in the extension. This is normally not the case.

Click OK. Now RapidMiner will work for a while and then ask if you want to restart with the new extension. After the restart, you see the new extension:


It has the icon that was specified when creating the custom operator. The operator offers the selected parameters:


It's a good idea to create a test process with the operator and apply it to multiple example sets.


There is a surprise when executing the process with the custom operator: when we developed the original process, the log table from the parameter optimization was visible. Now it's not. It seems that the Custom Operators extension doesn't expose the process logs, which also means that the log_all_criteria parameter doesn't do anything. So we'll remove that parameter the next time we create the custom operator.

We have also seen that the operator takes a long time on medium-sized or large datasets, so we'd like to add an option that makes it faster by reducing the number of parameter combinations.

First we add a Category Parameter Macro operator from the Custom Operators extension and define the choices “fast” and “slow” there. We name the macro “speed”.

We also need to expose the number_of_folds parameter as before. However, there are now two cross validations (one inside each parameter optimization), so we want a single macro that configures both. We therefore use Set Macro as a simple parameter input that defines the folds macro.


We create a branch to execute the optimization depending on the speed macro value.

The Fast optimization operator is set up with a reduced number of combinations. It might not find as good a model as the slow one, but it will be much faster.
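The branching can be pictured as a small Python sketch: the value of the speed macro decides which grid is searched, and the grid size drives the run time. The grid sizes below are hypothetical (the tutorial doesn't list the values for either optimization), and each cross validation is again assumed to train one model per fold plus one final model on the complete data.

```python
from math import prod

# Hypothetical value counts per parameter; the real grids aren't given in the tutorial.
SLOW_GRID = {"maximal_depth": 12, "minimal_gain": 12, "minimal_leaf_size": 12}
FAST_GRID = {"maximal_depth": 4, "minimal_gain": 3, "minimal_leaf_size": 3}

def select_grid(speed: str) -> dict[str, int]:
    """Mimic the Branch: choose the grid based on the speed macro value."""
    if speed == "fast":
        return FAST_GRID
    if speed == "slow":
        return SLOW_GRID
    # The category parameter only offers "fast" and "slow"; anything else is an error here.
    raise ValueError(f"unknown speed value: {speed!r}")

def tree_count(speed: str, folds: int) -> int:
    """Decision trees trained: combinations * (one per fold + one final model)."""
    return prod(select_grid(speed).values()) * (folds + 1)

for speed in ("fast", "slow"):
    print(speed, tree_count(speed, folds=10))  # prints "fast 396", then "slow 19008"
```

With these made-up grids the fast branch trains about 2% of the trees of the slow one, which is the kind of trade-off the speed parameter is meant to offer.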

The cross validations inside the parameter optimizations now use the %{folds} macro for their number_of_folds parameter.
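Macros work by text substitution: every %{folds} in a parameter value is replaced with the value defined in Set Macro, so one input controls both cross validations. Here is a minimal Python sketch of that idea. It is not RapidMiner's implementation; the operator names are hypothetical, and raising an error for an undefined macro is a choice made for this sketch only.

```python
import re

# Matches %{name} and captures the macro name.
MACRO_PATTERN = re.compile(r"%\{([^}]+)\}")

def resolve_macros(value: str, macros: dict[str, str]) -> str:
    """Replace every %{name} with its macro value (sketch: fail on undefined macros)."""
    def substitute(match: re.Match) -> str:
        name = match.group(1)
        if name not in macros:
            raise KeyError(f"undefined macro: {name}")
        return macros[name]
    return MACRO_PATTERN.sub(substitute, value)

# One Set Macro value feeds both cross validations (operator names are hypothetical).
macros = {"folds": "5"}
cross_validations = {
    "Cross Validation (Fast)": {"number_of_folds": "%{folds}"},
    "Cross Validation (Slow)": {"number_of_folds": "%{folds}"},
}
for name, params in cross_validations.items():
    print(name, resolve_macros(params["number_of_folds"], macros))
```

Changing the single entry in macros changes the fold count of both cross validations at once, which is exactly why the macro is used instead of exposing each number_of_folds separately.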

So this is how the new parameters are set up:


Save the custom operator in the same folder as before, and recreate the extension with the same settings as before.

When you create a new version of your extension, make sure to enter the same or a higher version number as before. RapidMiner will load the latest version of an extension, and that is what you want to work with.
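Version numbers need a little care once a component reaches two digits. Assuming versions are compared numerically, component by component (the tutorial only says that RapidMiner loads the latest version; how it compares versions is an assumption here), 1.10.0 is newer than 1.9.0, even though a plain text comparison sorts it lower. A small Python sketch of the difference:

```python
def parse_version(version: str) -> tuple[int, ...]:
    """Split a dotted version like '1.10.2' into integers.

    Assumption: versions are plain dotted integers without suffixes.
    """
    return tuple(int(part) for part in version.split("."))

def latest(versions: list[str]) -> str:
    """Return the highest version, comparing numerically component by component."""
    return max(versions, key=parse_version)

installed = ["1.0.0", "1.9.0", "1.10.0"]
print(latest(installed))  # 1.10.0
print(max(installed))     # 1.9.0 (plain string comparison picks the wrong one)
```

If you are unsure how versions are compared, raising a lower component instead (for example going from 1.9.0 to 2.0.0) avoids the ambiguity entirely.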

Testing the new process shows the updated operator parameters:


Congratulations! You've learned how to create a custom operator from a RapidMiner process.

Comments

  • hbajpai (Member)
    BalazsBarany. I was planning to create a few too, and I'm waiting for the 9.6 release so that I can use Python to build operators and wrap them into an extension. I am really psyched about it.
    Best,
    Harshit
  • pallav (Member)
    @sgenzer - Does the "Python Operator Framework" exist yet? I was able to create a new operator using a Python script, but I'm not sure how to add parameters.
    For example, suppose I want an operator that adds inputs A and B, but with a parameter that alters input A first: given 2 or 3, it multiplies the value of input A by 2 or 3. How can we define such a parameter using Python?
  • sgenzer (Administrator, Moderator, Employee, RapidMiner Certified Analyst, Community Manager, Member, University Professor, PM Moderator)
    Hi @pallav, the Python Operator Framework is still in beta. Reach out to @bhupendra_patil.

    Scott
  • anaRodrigues (Member)
    Hi @BalazsBarany can this be used to create a superoperator with subprocesses?
  • BalazsBarany (Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert)
    Hi @anaRodrigues,

    no, that's unfortunately not possible with this extension. 

    Regards,
    Balázs