Automating a RM5 Process

CleoCleo Member Posts: 44  Guru
edited November 2018 in Help
Hello,

I am interested in automating an RM5 process.

I have created a very simple RM5 process with the operators Read Model and Read Database, which both feed into Apply Model, followed by Write Database.

What I need is a way to automate this process. My database gets updated once a minute, and I have 10 different data sources and models. I also write the predictions to 10 different tables.

I would like to loop through this process continuously and change only three parameters: the input file of Read Model, the SQL statement of Read Database, and the output table of Write Database.

I have looked at the various process control loops without success. Should I create 10 different processes with the various configurations and load them through a Scheduled Task (the Windows equivalent of a cron job), or is there a better solution?

I am also working with RM5 Beta, and have been unable to load this application through the command line.  (OS is Windows XP)

Thanks in advance,
Cleo

Answers

  • landland RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 2,531   Unicorn
    Hi,
    there will be something like an enterprise server which, among many other things, supports you by performing scheduled tasks. It will be released during the first quarter of 2010.
    Executing RapidMiner from the command line should work again with the final version of RapidMiner 5.

    Besides this, you could wrap your process inside a loop operator. You could then enter 2,000,000,000 as the number of executions. This will execute the inner process 2 billion times. If the inner process runs for one second, this will keep the thing running for about 63 years.
    If your inner process executes too fast, you could insert a scripting operator which just waits a little bit. For example, the following script would wait 2 seconds and return the first input as the first output:
    synchronized(this) {
    wait(2000);
    }
    return input[0];
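
The runtime estimate above can be checked with a few lines of arithmetic. This is only an illustrative sketch in Python, not RapidMiner code; the function name `loop_runtime_years` and the assumption of an average year of 365.25 days are choices made here for the example.

```python
# Average year length in seconds (365.25 days, an assumption for this sketch).
SECONDS_PER_YEAR = 365.25 * 24 * 60 * 60

def loop_runtime_years(iterations, seconds_per_iteration):
    """Total wall-clock time of a fixed-count loop, expressed in years."""
    return iterations * seconds_per_iteration / SECONDS_PER_YEAR

# 2 billion iterations at one second each, as in the post.
years_without_wait = loop_runtime_years(2_000_000_000, 1.0)
# The same loop when each pass also waits 2 seconds in the script.
years_with_wait = loop_runtime_years(2_000_000_000, 1.0 + 2.0)

print(round(years_without_wait, 1), round(years_with_wait, 1))
```

With a one-second inner process this comes out at roughly 63.4 years, and the 2-second wait triples that, so the loop count is not a practical limit either way.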
    Greetings,
     Sebastian
  • CleoCleo Member Posts: 44  Guru
    Hello Sebastian,
    Thank you for the quick response and the suggestions.  I have really enjoyed working with RM.
    I have successfully wrapped my process within a loop operator and added the script. This solves my problem of continuously loading my data.

    Problem 1:
    I have 10 different models, input data sets and prediction tables. My current setup has a loop operator. Inside it there is a Read Model operator with the model file “model i” and a Read Database operator with the query file “query i”. These two operators feed an Apply Model operator, which connects to a Write Database operator with the table name “output_i”. “i” should iterate from 1 to 10.
    I can think of two ways to accomplish this:

    Setup 1: Run 10 different instances of RM.
    Setup 2: Somehow iterate “i” with one of the loop operators or the script. I would prefer this setup, but I am not sure which operator or combination of operators would accomplish this.

    Unrelated script question:
    I had not looked at the scripting operator before this, but I think it could perform some custom preprocessing more easily than my current method of using triggers to run SQL queries within my MySQL DB. Is the RM scripting language VBScript? Can I set breakpoints within the script, or what debugging strategies do you use? Have you seen a sample script which does any preprocessing?

    Thanks again,
    Cleo
  • landland RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 2,531   Unicorn
    Hi Cleo,
    let me first address your problem. You could simply add a Loop Parameters operator. If you insert a Set Macro operator as a child, you can define a list of values for its value parameter; these are inserted one after another into the Set Macro operator during the iteration. You can then use this macro to insert the index into the file name. Please take a look at the following example process:
    <?xml version="1.0" encoding="UTF-8" standalone="no"?>
    <process version="5.0">
      <context>
        <input>
          <location/>
        </input>
        <output>
          <location/>
        </output>
        <macros/>
      </context>
      <operator activated="true" class="process" expanded="true" name="Process">
        <process expanded="true" height="280" width="346">
          <operator activated="true" class="loop_parameters" expanded="true" height="76" name="Loop Parameters" width="90" x="112" y="30">
            <list key="parameters">
              <parameter key="Set Macro.value" value="1,2,3,4,5,6,7,8,9,10"/>
            </list>
            <process expanded="true" height="527" width="922">
              <operator activated="true" class="set_macro" expanded="true" height="76" name="Set Macro" width="90" x="112" y="30">
                <parameter key="macro" value="index"/>
                <parameter key="value" value="will be replaced"/>
              </operator>
              <operator activated="true" class="read_model" expanded="true" height="60" name="Read Model" width="90" x="246" y="30">
                <parameter key="model_file" value="your model file %{index}.mod"/>
              </operator>
              <connect from_port="input 1" to_op="Set Macro" to_port="through 1"/>
              <connect from_op="Set Macro" from_port="through 1" to_port="performance"/>
              <portSpacing port="source_input 1" spacing="0"/>
              <portSpacing port="source_input 2" spacing="0"/>
              <portSpacing port="sink_performance" spacing="0"/>
              <portSpacing port="sink_result 1" spacing="0"/>
            </process>
          </operator>
          <portSpacing port="source_input 1" spacing="0"/>
          <portSpacing port="sink_result 1" spacing="0"/>
        </process>
      </operator>
    </process>
    This should give you an idea of how to solve the problem.
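
To make the macro idea concrete, here is a small stand-alone sketch in Python (used only for illustration, it is not RapidMiner code) of what the `%{index}` substitution does on each pass of the loop. The file and table naming scheme and the helper names `expand_macros` and `settings_for` are hypothetical; substitute whatever names your models, queries and tables really have.

```python
import re

def expand_macros(template, macros):
    """Replace every %{name} placeholder with its value from `macros`."""
    return re.sub(r"%\{(\w+)\}", lambda m: str(macros[m.group(1)]), template)

# Hypothetical naming scheme for the three parameters that change per iteration.
TEMPLATES = {
    "model_file": "model_%{index}.mod",
    "query_file": "query_%{index}.sql",
    "table_name": "output_%{index}",
}

def settings_for(index):
    """Parameter values for one pass of the loop."""
    return {key: expand_macros(value, {"index": index})
            for key, value in TEMPLATES.items()}

# One settings dictionary per value of the macro, 1 to 10.
all_settings = [settings_for(i) for i in range(1, 11)]
for settings in all_settings:
    print(settings)
```

In the process itself, the same effect should come from using `%{index}` in the parameter values of Read Model, Read Database and Write Database inside the Loop Parameters operator.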

    To your scripting question: the language we use is Groovy, which is rather Java-like. There are no breakpoints available, so debugging scripts is more complex. There is also the possibility of writing your own RapidMiner operators; I'm just writing a tutorial on how to do this.
    But I guess that RapidMiner's own operators already cope with almost all the preprocessing steps you might need. What are you missing? Maybe you just need to insert a chain of RapidMiner operators...

    Greetings,
      Sebastian
  • CleoCleo Member Posts: 44  Guru
    Hello Sebastian,

    Thanks again! With a couple of simple modifications, your RM5 file suits my needs perfectly.

    The data I am working with is a time series, and I am attempting to preprocess the data in three ways.
    1) Moving average (average of two consecutive values of col2 and col3) [Average(col2(t), col2(t+1), col3(t), col3(t+1))]
    2) Percent change ([Col1(t)-Col1(t-1)]/Col1(t)*100)
    3) Custom binomial result based on: if (col1(t)-col2(t+x))>const1 before (col1(t)-col3(t+x))>const2 then Result=Yes else Result=No, i.e. whichever statement becomes true first
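
The three cases can be pinned down with a short sketch. It is written in Python purely to make the formulas unambiguous, not as RapidMiner code. The formulas follow the post literally (including dividing by Col1(t) in case 2). For case 3 the sketch makes three assumptions that the post does not state: x starts at 1, condition 1 wins if both conditions become true at the same x, and the result is No if neither condition ever holds. All function names are hypothetical.

```python
def moving_average(col2, col3, t):
    """Case 1: mean of col2 and col3 at rows t and t+1, as written in the post."""
    values = [col2[t], col2[t + 1], col3[t], col3[t + 1]]
    return sum(values) / len(values)

def percent_change(col1, t):
    """Case 2: (col1[t] - col1[t-1]) / col1[t] * 100, as written in the post."""
    return (col1[t] - col1[t - 1]) / col1[t] * 100

def first_hit_label(col1, col2, col3, t, const1, const2):
    """Case 3: 'Yes' if condition 1 becomes true before condition 2, else 'No'."""
    for x in range(1, len(col1) - t):      # assumption: look ahead from x = 1
        if col1[t] - col2[t + x] > const1:
            return "Yes"                   # assumption: condition 1 wins a tie
        if col1[t] - col3[t + x] > const2:
            return "No"
    return "No"                            # assumption: neither condition ever holds

# Made-up sample data, columns of equal length.
col1 = [10.0, 12.0, 15.0, 15.0]
col2 = [9.0, 11.0, 8.0, 7.0]
col3 = [8.0, 5.0, 9.0, 6.0]

print(moving_average(col2, col3, 0))
print(percent_change(col1, 1))
print(first_hit_label(col1, col2, col3, 0, 1.5, 4.0))
```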

    So far I have done this and other preprocessing in the database, but I think RM would be better at it.  I believe cases 1 and 2 could be achievable with standard RM operators but I feel case 3 will require custom coding.

    I have unsuccessfully tried to implement a “Hello world” Groovy example from http://groovy.codehaus.org/.
    If possible, I would appreciate a small example script in Groovy. I have included the pseudo code of an example which I could adapt to my personal needs. Assume the Execute Script operator is working with an ExampleSet loaded by a Read Excel operator from a file with one sheet, containing the numbers 1,2,3,4,5 in column A.

    Step 1) Load the ExampleSet from the Read Excel operator
    Step 2) Create a loop over the rows:
    Step 2a) Print the value of the attribute Column A for the row (to a log file, to the screen, or anywhere else, for debugging purposes)
    Step 2b) Call a function, passing it the row value,
    and have the function return the result row+10
    Step 3) Return to RM the new ExampleSet containing two columns A and A+10, i.e. (1,11),(2,12),(3,13),(4,14),(5,15)
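
For reference, the control flow of these steps can be sketched outside RapidMiner. The following is plain Python, not Groovy and not the RapidMiner API; a list stands in for the ExampleSet, and the names `add_ten` and `extend_with_sum` are made up for this illustration.

```python
def add_ten(value):
    """Step 2b: the helper that receives one row value and returns value + 10."""
    return value + 10

def extend_with_sum(column_a, log=print):
    """Steps 2 and 3: walk the rows, log each value, return (A, A+10) pairs."""
    rows = []
    for value in column_a:
        log(f"A = {value}")                 # Step 2a: debug output
        rows.append((value, add_ten(value)))
    return rows

# Step 1: stand-in for the ExampleSet delivered by Read Excel.
column_a = [1, 2, 3, 4, 5]
result = extend_with_sum(column_a)
print(result)
```

Passing the logger as a parameter keeps the debug output swappable, for example to write to a file instead of the screen.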

    If you would like, I could give you some feedback on the tutorial you are writing.

    Thanks  again,
    Cleo
  • landland RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 2,531   Unicorn
    Hi Cleo,
    the first two steps would be perfectly covered by the operators of the time series extension, which we will publish with the final version.
    The last step can be done without scripting, just using the Construct Attribute operator. It handles if-conditions, and even nested conditions, so it should be possible to derive the nominal target value with it.

    As an example for your script, I will quote the still unfinished tutorial:

    Let’s assume we have the following situation: We get data from a machine that counts the seconds since it was switched on. Each entry in its log file carries this time stamp. Unfortunately, the other data sources we are going to use don’t have this relative time stamp, so we have to transform the relative format into a regular date and time format. Since RapidMiner doesn’t provide an operator for this particular problem, we decide to write a small script. The problem doesn’t seem to be worth the effort of building a complete extension, because we can’t believe there are many other such stupid machines around that don’t have an integrated clock. We build a simple process, which should do the trick:

    Image 1: A simple process for applying a script
    As a first step we are going to load the data and then directly apply our script. As a last step we will do some date adjustment, but we will come back to this later. After loading, we have an ExampleSet consisting of a number of attributes describing the machine’s state. They are called att1, att2 to att500. The time stamp is contained in an attribute named relative time. In the script we can ignore the state attributes and focus on this single attribute.
    And here's the resulting code after two types of explanations:

    import com.rapidminer.tools.Ontology;

    // The first input of the script operator is the loaded ExampleSet.
    ExampleSet exampleSet = input[0];
    Attributes attributes = exampleSet.getAttributes();
    Attribute sourceAttribute = attributes.get("relative time");

    // Create a date_time attribute that reuses the data column of the source attribute.
    String newName = "date(" + sourceAttribute.getName() + ")";
    Attribute targetAttribute = AttributeFactory.createAttribute(newName, Ontology.DATE_TIME);
    targetAttribute.setTableIndex(sourceAttribute.getTableIndex());
    attributes.addRegular(targetAttribute);
    attributes.remove(sourceAttribute);

    // Both attributes share one column, so this reads the original seconds
    // and overwrites them with milliseconds.
    for (Example example : exampleSet) {
        double timeStampValue = example.getValue(targetAttribute);
        example.setValue(targetAttribute, timeStampValue * 1000);
    }

    return exampleSet;
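
The script only converts seconds to milliseconds; the date adjustment mentioned as the last step of the process is not shown in the quoted excerpt. As a sketch of what that adjustment might amount to, assuming it means adding the moment the machine was switched on, here is the calculation in plain Python (for illustration only, not RapidMiner code). The switch-on time and the function name `to_epoch_millis` are hypothetical.

```python
from datetime import datetime, timedelta, timezone

def to_epoch_millis(relative_seconds, switched_on_at):
    """Absolute time in milliseconds since the Unix epoch."""
    absolute = switched_on_at + timedelta(seconds=relative_seconds)
    return round(absolute.timestamp() * 1000)

# Hypothetical switch-on time of the machine.
switched_on_at = datetime(2010, 1, 1, 8, 0, 0, tzinfo=timezone.utc)

# Relative time stamps in seconds, as the machine would log them.
relative_times = [0, 90, 3600]
absolute_millis = [to_epoch_millis(s, switched_on_at) for s in relative_times]
print(absolute_millis)
```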

    Hope that will help you.

    Greetings,
      Sebastian
  • CleoCleo Member Posts: 44  Guru
    This works perfectly.

    Thanks for your help.

    Cheers,
    Cleo