"R scripts"

frankie Member Posts: 26 Contributor II
edited May 2019 in Help
Hello,

pardon my ignorance, but I can't seem to import a dataset and do a simple computation on it using the "Execute R script" operator.

- First, can I connect any RM dataset directly to the R operator?
- Is the logic behind the input that if I name it "MyData", I can refer to any variable in this dataset with the normal "MyData$variable_name" syntax, i.e. the way it would be done in R?
- I have tried to do this, but the tutorial video is very brief and hard to follow, so I cannot work out what I'm doing wrong. All I get is the error "The data delivered by R in the variable {0} was not in the correct format for importing as an ExampleSet"

Thanks in advance,
Frankie
Answers

  • land RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 2,531 Unicorn
    Hi Frankie,
    that's what the operator is designed for.

    You can export any data set, but R may represent the data differently. For example, a time attribute is exported as a number of milliseconds since 1970 (the Unix epoch), I think, so you have to be careful with such columns.
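As a rough illustration of the point about time attributes, here is a minimal R sketch for turning such numbers back into date-times. It assumes the values are milliseconds since 1970-01-01 UTC, which is the usual Java convention but is not confirmed in this thread, so check it against a date you know. The `ms` vector is made-up sample data.

```r
# Hypothetical timestamps as they might arrive from RapidMiner:
# numeric milliseconds since 1970-01-01 00:00:00 UTC (an assumption).
ms <- c(0, 86400000, 1262304000000)

# as.POSIXct() counts in seconds, so divide by 1000 first.
timestamps <- as.POSIXct(ms / 1000, origin = "1970-01-01", tz = "UTC")

# Show the converted values in a readable form.
print(format(timestamps, "%Y-%m-%d %H:%M:%S"))
```

If the converted dates look decades off, the epoch or the unit assumed here is probably wrong for your version of the extension.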
    The data is then exported as a data frame under the variable name you enter. If you name it "MyData" you can use the normal methods of accessing a data frame called "MyData" in R.
    When you import data back from the R script, the variable you name must reference a data frame containing only vectors and factors. If the variable is called "MyImport", you can also define "MyImport.label", which can hold the name of the column that is used as the label.
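To make the naming rules above concrete, here is a small R sketch that can be run outside RapidMiner. The `MyData` data frame is a hand-built stand-in for what the operator would export, and the column names `a1`, `a2`, and `label` are assumptions for illustration only. How `MyImport.label` is actually interpreted depends on the extension version, so treat that line as a guess based on the description above.

```r
# Stand-in for the data frame RapidMiner would export under the input
# name "MyData" (made-up values, hypothetical column names).
MyData <- data.frame(
  a1 = c(5.1, 4.9, 6.3),
  a2 = c(3.5, 3.0, 3.3),
  label = c("Iris-setosa", "Iris-setosa", "Iris-virginica"),
  stringsAsFactors = TRUE
)

# Columns are accessed exactly as in plain R.
total <- MyData$a1 + MyData$a2

# Build the result as a data frame of plain vectors and factors only.
MyImport <- data.frame(MyData, Sum = total)

# Optionally name the column that should become the label (assumption:
# the operator reads a variable called "<result name>.label").
MyImport.label <- "label"

# Sanity check: every column is an atomic vector or a factor, not a list
# or a nested data frame.
stopifnot(all(vapply(MyImport, function(col) is.atomic(col) || is.factor(col), logical(1))))
```

A result that contains list columns or nested data frames is one plausible cause of the "not in the correct format for importing as an ExampleSet" error, though the thread does not confirm that.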

    Greetings,
      Sebastian
  • frankie Member Posts: 26 Contributor II
    Could someone please provide a simple example of how I can use R code on a RapidMiner dataset? An example is so much easier to understand, so please, if somebody could find the time..

    I've been trying to build a simple process that:

    1. Retrieves the Iris dataset that comes bundled with RM
    2. Uses R code to sum two of the variables, say a1+a2


    Thanks
  • IngoRM Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, Community Manager, RMResearcher, Member, University Professor Posts: 1,751 RM Founder
    Hi,

    sure, no problem. Here you go:

    <?xml version="1.0" encoding="UTF-8" standalone="no"?>
    <process version="5.0">
      <context>
        <input/>
        <output/>
        <macros/>
      </context>
      <operator activated="true" class="process" compatibility="5.0.8" expanded="true" name="Process">
        <process expanded="true" height="296" width="413">
          <operator activated="true" class="retrieve" compatibility="5.0.8" expanded="true" height="60" name="Retrieve" width="90" x="45" y="30">
            <parameter key="repository_entry" value="//Samples/data/Iris"/>
          </operator>
          <operator activated="true" class="multiply" compatibility="5.0.8" expanded="true" height="94" name="Multiply" width="90" x="179" y="30"/>
          <operator activated="true" class="generate_attributes" compatibility="5.0.8" expanded="true" height="76" name="Generate Attributes" width="90" x="313" y="120">
            <list key="function_descriptions">
              <parameter key="Sum" value="a1+a2"/>
            </list>
          </operator>
          <operator activated="true" class="r:execute_script_r" compatibility="5.0.1" expanded="true" height="76" name="Execute Script (R)" width="90" x="313" y="30">
            <parameter key="script" value="x &lt;- as.data.frame(c(data, data[1] + data[2]))&#10;colnames(x)[6] = &quot;Sum&quot;"/>
            <enumeration key="inputs">
              <parameter key="name_of_variable" value="data"/>
            </enumeration>
            <list key="results">
              <parameter key="x" value="Data Table"/>
            </list>
          </operator>
          <connect from_op="Retrieve" from_port="output" to_op="Multiply" to_port="input"/>
          <connect from_op="Multiply" from_port="output 1" to_op="Execute Script (R)" to_port="input 1"/>
          <connect from_op="Multiply" from_port="output 2" to_op="Generate Attributes" to_port="example set input"/>
          <connect from_op="Generate Attributes" from_port="example set output" to_port="result 2"/>
          <connect from_op="Execute Script (R)" from_port="output 1" to_port="result 1"/>
          <portSpacing port="source_input 1" spacing="0"/>
          <portSpacing port="sink_result 1" spacing="0"/>
          <portSpacing port="sink_result 2" spacing="72"/>
          <portSpacing port="sink_result 3" spacing="0"/>
        </process>
      </operator>
    </process>
    Cheers,
    Ingo
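For readers who find the escaped "script" parameter hard to read, here it is unescaped and run against a stand-in data frame in plain R. The `data` frame below is made up, with five columns; the real export of the Iris sample may contain more columns (for example an id) or order them differently, in which case the hardcoded index 6 and the positions `data[1]` and `data[2]` would need adjusting. The by-name variant at the end is an alternative sketch, not part of the original process. It assumes the columns are called a1 and a2, as in the Generate Attributes expression.

```r
# Stand-in for the Iris ExampleSet as seen from R (made-up rows,
# five columns assumed: a1..a4 plus label).
data <- data.frame(
  a1 = c(5.1, 4.9), a2 = c(3.5, 3.0), a3 = c(1.4, 1.4), a4 = c(0.2, 0.2),
  label = factor(c("Iris-setosa", "Iris-setosa"))
)

# The script from the process above, unescaped: append the sum of the
# first two columns as a sixth column, then rename that column.
x <- as.data.frame(c(data, data[1] + data[2]))
colnames(x)[6] = "Sum"

# Alternative by-name variant (not in the original process): it does not
# depend on column positions or on the total number of columns.
y <- data
y$Sum <- data$a1 + data$a2

# Both variants give the same Sum column on this stand-in data.
stopifnot(isTRUE(all.equal(x$Sum, y$Sum)))
print(x)
```

In the process itself, the result name "x" in the "results" list is what tells the operator which R variable to import as a Data Table.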