Creating operators and extensions inside RapidMiner, without programming

BalazsBarany Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert Posts: 373   Unicorn
edited February 13 in Knowledge Base
The Custom Operators extension was presented at RapidMiner Wisdom 2020 and is published in the Marketplace.
It allows RapidMiner users to convert existing processes into operators and to bundle these operators into RapidMiner extensions. The extensions can be easily shared with others and even put on the Marketplace when they are mature enough.

An initial set of extensions created with this technique is already available:
  • Database Envy: Implements two operators for features known from databases and now available in RapidMiner: window functions (groupwise aggregates and rankings) and expression-based joins, which join example sets on criteria like inequality or even mathematical expressions.
  • GeoProcessing: Finally, geodata processing in RapidMiner! Import data from Shapefiles and GIS databases, transform the geometries and calculate measures like area and distance on them, join on geometry relations and more.
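For readers unfamiliar with the two database concepts behind Database Envy, here is a minimal plain-Python sketch of a groupwise ranking (a window function) and a join on an arbitrary condition. It only illustrates the general ideas; it is not how the extension's operators are implemented, and the function names and row layout are made up for this example.

```python
from itertools import groupby
from operator import itemgetter


def rank_within_groups(rows, group_key, order_key):
    """Window-function idea: number the rows inside each group.

    Rows are ranked by order_key in descending order; ties get
    consecutive positions (row-number style).
    """
    ranked = []
    ordered = sorted(rows, key=itemgetter(group_key))
    for _, group in groupby(ordered, key=itemgetter(group_key)):
        members = sorted(group, key=itemgetter(order_key), reverse=True)
        for position, row in enumerate(members, start=1):
            ranked.append({**row, "rank": position})
    return ranked


def join_on_condition(left, right, condition):
    """Expression-based join idea: keep every pair for which condition holds."""
    return [{**l, **r} for l in left for r in right if condition(l, r)]
```

An inequality join would then be a call like join_on_condition(values, ranges, lambda l, r: r["lo"] <= l["x"] <= r["hi"]), which matches each value with the range it falls into.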

How to create your own extension

Install the Custom Operators extension in your Studio. After restarting Studio, you will see new operators (Extensions/Custom Operators/Parameter Helper) and menu entries (File/Create Custom Operator and Extensions/Create Custom Extension). 

Open the process you want to turn into an operator. You'll most likely want to define some parameters. These can be operator parameters in your process, or macros created with Set Macro or Boolean/Category Parameter Macro from the Custom Operators extension. 
Test the process thoroughly with different inputs. If your future operator acts on some input, you will likely connect the inp port to an operator inside the process. You can use test data by selecting an input in the process context. Just remember to remove it before publishing the operator.

When you're ready to create your first operator from the process, click File/Create Custom Operator in the menu. Create a folder on your computer to save the custom operator into. Fill out the Create Custom Operator form. It's a good idea to record your inputs (especially the description and the parameter help texts) for later, for example in a separate text file or on a Wiki page. 
Select an icon from the icon search site. The icons there are licensed for usage in RapidMiner. 
Go through the list of required parameters that should be exposed. Activate the checkbox for each parameter that you'd like to expose. You can enter the default value for the parameter, and the parameter's short description (optionally including HTML code).

When you're done, click Save. A new file with the extension .cusop will be created in the selected folder. It contains the process definition and the data you entered.

Repeat for a new process if you want to create another operator. If the operators belong together into the same extension, put them into the same folder.

When you're ready to create the extension, select Extensions/Create Custom Extension in the menu. Fill in the data. It's again a good idea to record the inputs somewhere else for the next time. 
The extension name is an arbitrary, human readable name. It will be displayed in the Extensions folder in the operator list. 
The custom operator folder is the folder you put the custom operators in. If you need to bundle Java modules (JAR files) with the extension, put them into a separate folder and enter the path under Jars folder. 
Activate "Force dependencies" if you want to put the extension on the Marketplace later.

Click on OK. It will take about a minute to create the extension. RapidMiner will offer to restart with the new extension when it's done. Enjoy your new extension! 
It's a good idea to create some test processes with test data to verify that the new version is still working correctly. 

When you right-click on a custom extension in RapidMiner Studio, there's an option to open the defining process. You and other users of the custom extension can always look inside to understand how the process works.

You might want to put an icon into the extension. Open the generated extension .jar file with a Zip archive manager and add a PNG file named icon.png to the META-INF folder inside the jar archive. 
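If you prefer a script over a Zip archive manager, the same step can be sketched in Python with the standard zipfile module. This is only an illustration under the assumptions stated in the comments: the function name is made up, and the paths are whatever your generated jar and your PNG file are called.

```python
import zipfile
from pathlib import Path


def add_icon_to_jar(jar_path, icon_path):
    """Add a PNG file to an existing jar as META-INF/icon.png.

    Hypothetical helper: jar_path is the generated extension jar,
    icon_path is any PNG file on disk.
    """
    jar_path = Path(jar_path)
    icon_path = Path(icon_path)
    # Mode "a" appends to the existing archive instead of overwriting it.
    with zipfile.ZipFile(jar_path, mode="a") as jar:
        if "META-INF/icon.png" in jar.namelist():
            raise ValueError("jar already contains META-INF/icon.png")
        jar.write(icon_path, arcname="META-INF/icon.png")
```

It's safest to run this on a copy of the jar, for example add_icon_to_jar("my_extension-1.0.000.jar", "logo.png"), and then check in Studio that the icon shows up.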

Publishing an extension in the Marketplace

When your extension is ready and you want to share it with the RapidMiner community, go to the Marketplace and register there. (This user account is not integrated with the other RapidMiner services.) After registering and confirming the email address, you'll see a link Interested in Sharing? Contact us on the main page. 

Be careful to enter the correct product ID, because it can't be changed later. It is composed of "rmx_" and the name of the extension file in your RapidMiner extensions folder, without the version number and the .jar ending. If the file is called my_extension-1.0.000.jar on your computer, the product ID will be rmx_my_extension. 
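Since the product ID can't be corrected afterwards, it may help to derive it mechanically. The following Python sketch assumes that extension files are always named like the example above, that is name, hyphen, dotted version number, .jar. That pattern is generalized from the single example here, and the function name is made up.

```python
import re
from pathlib import Path


def product_id_from_jar(jar_name):
    """Derive the Marketplace product ID from an extension jar file name.

    Assumes the pattern <name>-<version>.jar, e.g. my_extension-1.0.000.jar.
    """
    file_name = Path(jar_name).name
    match = re.fullmatch(r"(?P<name>.+)-\d+(?:\.\d+)*\.jar", file_name)
    if match is None:
        raise ValueError(f"unexpected extension file name: {file_name}")
    return "rmx_" + match.group("name")
```

For the file from the example, product_id_from_jar("my_extension-1.0.000.jar") gives rmx_my_extension.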
Fill out the other input fields and confirm that you read the developer agreement (after you read it). 

Somebody at RapidMiner will check your submission, usually in a few days. They might contact you with questions.
When the approval process is done, you can select the extension in the menu on the top of the Marketplace page and upload your extension. After the upload you click the Activate link, and your extension is available in the Marketplace! It will be found by the operator search, just like other extensions. 

Comments

  • kayman Member Posts: 452   Unicorn
    Really excited about this, but some examples with images in the documentation would help, as I'm struggling a bit to understand exactly how I can use all the different options to create my own operators. 

    So something like an A to Z example of a simple (even dummy) conversion of an existing process into a single operator or a set of operators.
  • BalazsBarany Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert Posts: 373   Unicorn
    You are right, @kayman.
    I guess I'll create a step-by-step guide with pictures. Do you have an idea for a new operator that would be simple enough for a tutorial but actually helpful?

    Balázs
  • kayman Member Posts: 452   Unicorn
    One that's on my shortlist would be some of our NLP workflows. For example, I have processes generating wordlists from text, so a potential operator on our side would be: dataset in > define which attribute you'd like to analyse > wordlist as example set out. This way 'normal people' are not directly exposed to the internal workflow (attribute to text - process documents (all lowercase - tokenize - stopwords, etc.)).

    In this case I'm wondering whether (and how) I could use parameters of my new operator to set parameters of the operators used inside it, like the prune setting.

    But this already goes fairly deep, so a simple process with one or two dummy operators, and an explanation of how their parameters can be defined (if this is an option to start with), would already be pretty useful.
  • mschmitz Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, University Professor Posts: 2,245  RM Data Scientist
    Hi @kayman,
    isn't this what Text Vectorization is doing?

    Best,
    Martin
    - Head of Data Science Services at RapidMiner -
    Dortmund, Germany
  • kayman Member Posts: 452   Unicorn
    Partially. I miss a few important options in Text Vectorization (like converting everything to lower case, filtering out small and big tokens, n-gram generation, stopwords and so on).

    Granted, for my example Text Vectorization would indeed do the trick, but in reality I need more options, so let's say I'd like to create my own improved operator similar to that one.
  • BalazsBarany Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert Posts: 373   Unicorn
    Would this be a good example operator? Optimized Decision Tree. It would consist of Optimize Parameters, Cross Validation and Decision Tree. Some options would be exposed and it would return the model, the performance and the parameters.