
"R scripts"

frankie Member Posts: 26 Contributor II
edited May 2019 in Help
Hello,

pardon my ignorance, but I can't seem to import a dataset and do a simple computation on it using the "Execute R script" operator.

- First, can I connect any RM dataset directly to the R operator?
- Is the logic behind the input that if I name it "MyData", I can reference any variable from this dataset with the usual "MyData$variable_name" syntax, i.e., the way it would be done in R?
- I tried this, but the tutorial video is very brief and hard to follow, so I can't work out what I'm doing wrong. All I get is the error "The data delivered by R in the variable {0} was not in the correct format for importing as an ExampleSet"

Thanks in advance,
Frankie

Answers

    land RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 2,531 Unicorn
    Hi Frankie,
    that's what the operator is designed for.

    You can export all data sets, but R might see the data in a different way. For example, date/time attributes are exported as milliseconds since 1 January 1970 (the Java epoch), I think, so you have to be careful with them.
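Here is a minimal Python sketch of turning such a millisecond count back into a date. It assumes that the values count milliseconds from 1 January 1970 UTC, the usual Java/Unix epoch. The exact reference date is not confirmed here, so check one known value before relying on it. `EPOCH` and `millis_to_datetime` are illustrative names and are not part of RapidMiner or R.

```python
from datetime import datetime, timedelta, timezone

# Assumed reference point: the Java/Unix epoch. Change EPOCH if the exported
# values turn out to use a different origin.
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

def millis_to_datetime(ms: float) -> datetime:
    """Convert a millisecond count (as exported to R) back into a UTC datetime."""
    return EPOCH + timedelta(milliseconds=ms)

if __name__ == "__main__":
    print(millis_to_datetime(0))           # 1970-01-01 00:00:00+00:00
    print(millis_to_datetime(86_400_000))  # 1970-01-02 00:00:00+00:00
```

Under the same assumption, the equivalent conversion inside the R script itself would be roughly `as.POSIXct(x / 1000, origin = "1970-01-01", tz = "UTC")`.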
    The data is then exported as a data frame under the variable name you enter. If you name it "MyData" you can use the normal methods of accessing a data frame called "MyData" in R.
    If you import data back from the R script, the variable you name must refer to a data frame containing only vectors and factors. If the variable is called "MyImport", you can also define "MyImport.label", which should hold the name of the column to be used as the label.
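The "not in the correct format" error usually points at this rule. Below is a rough Python analogue of it, with a data frame modelled as a dict that maps column names to lists. `looks_importable` is a hypothetical helper. It only mirrors the constraint as described above: flat columns of equal length, plus an optional label that must name an existing column. It is not how the R extension actually validates its input.

```python
from typing import Any, Mapping, Optional

# Scalar types standing in for R vector/factor elements (None stands in for NA).
SCALARS = (bool, int, float, str)

def looks_importable(frame: Mapping[str, Any], label: Optional[str] = None) -> bool:
    """Hypothetical check: True if `frame` resembles a data frame of plain vectors/factors."""
    lengths = set()
    for values in frame.values():
        # Each column must be a flat list; nested lists or dicts are rejected.
        if not isinstance(values, list):
            return False
        if not all(v is None or isinstance(v, SCALARS) for v in values):
            return False
        lengths.add(len(values))
    # All columns must have the same number of rows.
    if len(lengths) > 1:
        return False
    # An optional label must name one of the columns (like "MyImport.label").
    return label is None or label in frame

if __name__ == "__main__":
    ok = {"a1": [5.1, 4.9], "label": ["setosa", "setosa"]}
    nested = {"a1": [[5.1], [4.9]]}
    print(looks_importable(ok, label="label"))  # True
    print(looks_importable(nested))             # False
```

In R itself, a similar sanity check before handing `x` back would be roughly `is.data.frame(x) && all(sapply(x, is.atomic))`, because factors count as atomic vectors there.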

    Greetings,
      Sebastian
    frankie Member Posts: 26 Contributor II
    Could someone please provide a simple example of how to use R code on a RapidMiner dataset? An example is so much easier to understand, so please, if somebody could find the time...

    I've been trying to build a simple process that:

    1. retrieves the Iris dataset that comes bundled with RM
    2. uses R code to sum two of the variables, say a1 + a2


    Thanks
    IngoRM Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, Community Manager, RMResearcher, Member, University Professor Posts: 1,751 RM Founder
    Hi,

    sure, no problem. Here you go:

    <?xml version="1.0" encoding="UTF-8" standalone="no"?>
    <process version="5.0">
      <context>
        <input/>
        <output/>
        <macros/>
      </context>
      <operator activated="true" class="process" compatibility="5.0.8" expanded="true" name="Process">
        <process expanded="true" height="296" width="413">
          <operator activated="true" class="retrieve" compatibility="5.0.8" expanded="true" height="60" name="Retrieve" width="90" x="45" y="30">
            <parameter key="repository_entry" value="//Samples/data/Iris"/>
          </operator>
          <operator activated="true" class="multiply" compatibility="5.0.8" expanded="true" height="94" name="Multiply" width="90" x="179" y="30"/>
          <operator activated="true" class="generate_attributes" compatibility="5.0.8" expanded="true" height="76" name="Generate Attributes" width="90" x="313" y="120">
            <list key="function_descriptions">
              <parameter key="Sum" value="a1+a2"/>
            </list>
          </operator>
          <operator activated="true" class="r:execute_script_r" compatibility="5.0.1" expanded="true" height="76" name="Execute Script (R)" width="90" x="313" y="30">
            <parameter key="script" value="x &lt;- data&#10;x$Sum &lt;- data$a1 + data$a2"/>
            <enumeration key="inputs">
              <parameter key="name_of_variable" value="data"/>
            </enumeration>
            <list key="results">
              <parameter key="x" value="Data Table"/>
            </list>
          </operator>
          <connect from_op="Retrieve" from_port="output" to_op="Multiply" to_port="input"/>
          <connect from_op="Multiply" from_port="output 1" to_op="Execute Script (R)" to_port="input 1"/>
          <connect from_op="Multiply" from_port="output 2" to_op="Generate Attributes" to_port="example set input"/>
          <connect from_op="Generate Attributes" from_port="example set output" to_port="result 2"/>
          <connect from_op="Execute Script (R)" from_port="output 1" to_port="result 1"/>
          <portSpacing port="source_input 1" spacing="0"/>
          <portSpacing port="sink_result 1" spacing="0"/>
          <portSpacing port="sink_result 2" spacing="72"/>
          <portSpacing port="sink_result 3" spacing="0"/>
        </process>
      </operator>
    </process>
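To check the numbers outside RapidMiner, here is a plain Python sketch of what both branches of this process compute: a new "Sum" column equal to a1 + a2. The data frame is modelled as a dict that maps column names to lists. The function name and the sample values are illustrative only and are not taken from the Iris data.

```python
from typing import Dict, List

# A data frame modelled as column name -> list of values, roughly how R sees
# the ExampleSet handed to the R operator.
Frame = Dict[str, List[float]]

def add_sum_column(frame: Frame, left: str, right: str, new_name: str = "Sum") -> Frame:
    """Return a copy of `frame` with a new column holding left + right row by row."""
    if len(frame[left]) != len(frame[right]):
        raise ValueError("columns must have the same length")
    result = {name: list(values) for name, values in frame.items()}
    result[new_name] = [a + b for a, b in zip(frame[left], frame[right])]
    return result

if __name__ == "__main__":
    sample = {"a1": [5.0, 4.5], "a2": [3.5, 3.0]}  # illustrative values
    print(add_sum_column(sample, "a1", "a2")["Sum"])  # [8.5, 7.5]
```

In R, the direct equivalent using the `$` access asked about in the question is `data$Sum <- data$a1 + data$a2`.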
    Cheers,
    Ingo