RapidMiner

Set parameters dynamically in a RapidMiner process

by RMStaff a week ago - edited a week ago

Question

Can we dynamically select/deselect ‘enable parallel execution’ for a loop based on the time of day it is scheduled to run?

Answer

Absolutely.

 

That is a smart way to avoid hitting API rate limits during rush hour on RapidMiner Server. Suppose the job is scheduled on a weekday between 9 a.m. and 5 p.m. The server is busy during that time, so running the loop without parallelization may be the better choice. For jobs scheduled on Friday evenings or weekends, we would like to use all the available computing power and enable parallel execution of the loop.
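The rule described above can be sketched outside RapidMiner to check the edge cases. This is a minimal illustration in Python, not part of the process itself; the function name `enable_parallel` and the exact busy window (Monday to Friday, 09:00 up to but not including 17:00) are assumptions made for the example.

```python
from datetime import datetime


def enable_parallel(ts: datetime) -> bool:
    """Return True when the loop should run in parallel.

    Assumed busy window: Monday-Friday, 09:00 (inclusive) to 17:00 (exclusive).
    """
    is_weekday = ts.weekday() < 5        # Monday=0 ... Friday=4
    in_busy_hours = 9 <= ts.hour < 17    # server is assumed busy in this range
    return not (is_weekday and in_busy_hours)


print(enable_parallel(datetime(2017, 5, 19, 10, 0)))  # Friday morning -> False
print(enable_parallel(datetime(2017, 5, 19, 19, 0)))  # Friday evening -> True
print(enable_parallel(datetime(2017, 5, 20, 10, 0)))  # Saturday       -> True
```

Note that the sample process further down only looks at the hour; a weekday check like the one sketched here would have to be added to the expression separately.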

 

First things first: you will need the "Operator Toolbox" extension, which you can install from the Marketplace or download and install manually from

https://marketplace.rapidminer.com/UpdateServer/faces/product_details.xhtml?productId=rmx_operator_t...

 

In the process, you can easily get the timestamp of the execution and use it together with the operator "Set Parameters from ExampleSet" from the Toolbox.

Screen Shot 2017-05-19 at 9.58.52 AM.png

 

You can generate a macro with the built-in date_now() function and put its value into an example set. The input data for "Set Parameters from ExampleSet" needs at least three columns: 1. operator name, 2. parameter name, 3. value of that parameter.
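To make the expected input shape concrete, here is a small Python sketch of the three-column table. It only illustrates the data layout; the helper name `parameter_rows` is hypothetical, while the column names, operator name and parameter key are the ones used in the sample process in this post.

```python
from typing import Dict, List


def parameter_rows(parallel: bool) -> List[Dict[str, str]]:
    """Build one row per parameter to set: operator, parameter, value."""
    return [
        {
            "operator": "Loop Attributes",             # name of the target operator
            "parameter": "enable_parallel_execution",  # parameter key to change
            "value": str(parallel).lower(),            # "true" or "false"
        }
    ]


for row in parameter_rows(False):
    print(row["operator"], row["parameter"], row["value"])
```

Each additional parameter you want to control would simply be another row with the same three columns.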

 

A sample process is attached here.

Screen Shot 2017-05-19 at 10.01.32 AM.png

 

Make sure the execution order is right: "Set Parameters from ExampleSet" must run before the operators (e.g. the loop) whose parameters you want to set dynamically.

 

<?xml version="1.0" encoding="UTF-8"?><process version="7.5.000">
  <context>
    <input/>
    <output/>
    <macros/>
  </context>
  <operator activated="true" class="process" compatibility="7.5.000" expanded="true" name="Process">
    <process expanded="true">
      <operator activated="true" class="generate_macro" compatibility="7.5.000" expanded="true" height="68" name="Generate Macro" width="90" x="45" y="34">
        <list key="function_descriptions">
          <parameter key="timestamp" value="date_now()"/>
          <parameter key="hour" value="date_get(date_now(),DATE_UNIT_HOUR)"/>
        </list>
      </operator>
      <operator activated="true" class="generate_data_user_specification" compatibility="7.5.000" expanded="true" height="68" name="Generate Data by User Specification" width="90" x="246" y="34">
        <list key="attribute_values">
          <parameter key="operator" value="&quot;Loop Attributes&quot;"/>
          <parameter key="parameter" value="&quot;enable_parallel_execution&quot;"/>
          <parameter key="value" value="if(eval(%{hour})&gt;=9 &amp;&amp; eval(%{hour})&lt;17,false,true)"/>
        </list>
        <list key="set_additional_roles"/>
      </operator>
      <operator activated="true" class="operator_toolbox:set_parameter_from_ES" compatibility="0.3.000" expanded="true" height="82" name="Set Parameters from ExampleSet" width="90" x="447" y="34">
        <parameter key="Operator name column" value="operator"/>
        <parameter key="Parameter name column" value="parameter"/>
        <parameter key="Value column" value="value"/>
      </operator>
      <operator activated="true" class="retrieve" compatibility="7.5.000" expanded="true" height="68" name="Retrieve Iris" width="90" x="581" y="34">
        <parameter key="repository_entry" value="//Samples/data/Iris"/>
      </operator>
      <operator activated="true" class="concurrency:loop_attributes" compatibility="7.5.000" expanded="true" height="82" name="Loop Attributes" width="90" x="715" y="34">
        <process expanded="true">
          <connect from_port="input 1" to_port="output 1"/>
          <portSpacing port="source_input 1" spacing="0"/>
          <portSpacing port="source_input 2" spacing="0"/>
          <portSpacing port="sink_output 1" spacing="0"/>
          <portSpacing port="sink_output 2" spacing="0"/>
        </process>
      </operator>
      <operator activated="false" class="concurrency:loop" compatibility="7.5.000" expanded="true" height="68" name="Loop" width="90" x="715" y="136">
        <process expanded="true">
          <portSpacing port="source_input 1" spacing="0"/>
          <portSpacing port="sink_output 1" spacing="0"/>
        </process>
      </operator>
      <connect from_op="Generate Data by User Specification" from_port="output" to_op="Set Parameters from ExampleSet" to_port="exampleset"/>
      <connect from_op="Retrieve Iris" from_port="output" to_op="Loop Attributes" to_port="input 1"/>
      <portSpacing port="source_input 1" spacing="0"/>
      <portSpacing port="sink_result 1" spacing="0"/>
    </process>
  </operator>
</process>