"Get and set values using Groovy Script"

wessel Member Posts: 537  Guru
edited May 23 in Help
Dear All,

I want to learn how to use Groovy scripts.

How do I get the example values for a particular attribute?
And how do I set the example values for a particular attribute?

The example by Ingo, pasted below, only shows how to iterate over all attributes and all examples.
This is nice, but what if you only want to iterate over all examples of attribute 1,
calculate the running sum, and place the result in attribute 2?
How would you do this?

For example, you start with:

att1 att2
0.1 NaN
0.2 NaN
0.3 NaN
0.4 NaN

And the result will be:
att1 att2
0.1 0.1
0.2 0.3
0.3 0.6
0.4 1.0

Best regards,

Wessel

ExampleSet exampleSet = operator.getInput(ExampleSet.class);

exampleSet.recalculateAllAttributeStatistics();

for (Attribute attribute : exampleSet.getAttributes()) {
    double mean = exampleSet.getStatistics(attribute, Statistics.AVERAGE);
    String name = attribute.getName();
    for (Example example : exampleSet) {
        example[name] = example[name] - mean;
    }
}

return exampleSet;

Answers

  • colo Member Posts: 236  Guru
    Hi wessel,

    I guess you don't have the tutorial for extension development at hand. It starts with an example of using the script operator. Since other documentation is scarce, I built a small process for your example (just with different values). I added a third attribute to demonstrate how to create new attributes. The whole process is appended at the end of this post; here is just the content of the script operator:
    import com.rapidminer.tools.Ontology;

    ExampleSet exampleSet = input[0];

    Attributes attributes = exampleSet.getAttributes();
    Attribute att1 = attributes.get("att1");
    Attribute att2 = attributes.get("att2");

    // generate additional attribute
    Attribute att3 = AttributeFactory.createAttribute("att3", Ontology.REAL);
    // add new attribute to example set
    attributes.addRegular(att3);
    // insert new column into example set's data table
    exampleSet.getExampleTable().addAttribute(att3);

    double sum = 0.0;

    for (Example example : exampleSet) {
        double currentValue = example.getValue(att1);
        sum += currentValue;
        example.setValue(att2, sum);
        example.setValue(att3, sum);
    }

    return exampleSet;
    I hope this makes things a bit clearer.

    Best regards
    Matthias
    <?xml version="1.0" encoding="UTF-8" standalone="no"?>
    <process version="5.1.008">
      <context>
        <input/>
        <output/>
        <macros/>
      </context>
      <operator activated="true" class="process" compatibility="5.1.008" expanded="true" name="Process">
        <process expanded="true" height="116" width="547">
          <operator activated="true" class="generate_data" compatibility="5.1.008" expanded="true" height="60" name="Generate Data" width="90" x="45" y="30">
            <parameter key="number_examples" value="5"/>
            <parameter key="number_of_attributes" value="2"/>
            <parameter key="attributes_lower_bound" value="0.0"/>
          </operator>
          <operator activated="true" class="select_attributes" compatibility="5.1.008" expanded="true" height="76" name="Select Attributes" width="90" x="179" y="30">
            <parameter key="attribute_filter_type" value="single"/>
            <parameter key="attribute" value="label"/>
            <parameter key="invert_selection" value="true"/>
            <parameter key="include_special_attributes" value="true"/>
          </operator>
          <operator activated="true" class="loop_examples" compatibility="5.1.008" expanded="true" height="76" name="Loop Examples" width="90" x="313" y="30">
            <process expanded="true" height="607" width="740">
              <operator activated="true" class="set_data" compatibility="5.1.008" expanded="true" height="76" name="Set Data" width="90" x="45" y="30">
                <parameter key="example_index" value="%{example}"/>
                <parameter key="attribute_name" value="att2"/>
                <parameter key="value" value="NaN"/>
                <list key="additional_values"/>
              </operator>
              <connect from_port="example set" to_op="Set Data" to_port="example set input"/>
              <connect from_op="Set Data" from_port="example set output" to_port="example set"/>
              <portSpacing port="source_example set" spacing="0"/>
              <portSpacing port="sink_example set" spacing="0"/>
              <portSpacing port="sink_output 1" spacing="0"/>
            </process>
          </operator>
          <operator activated="true" class="execute_script" compatibility="5.1.008" expanded="true" height="76" name="Execute Script" width="90" x="447" y="30">
            <parameter key="script" value="import com.rapidminer.tools.Ontology;&#10;&#10;ExampleSet exampleSet = input[0];&#10;&#10;Attributes attributes = exampleSet.getAttributes();&#10;Attribute att1 = attributes.get(&quot;att1&quot;);&#10;Attribute att2 = attributes.get(&quot;att2&quot;);&#10;&#10;// generate additional attribute&#10;Attribute att3 = AttributeFactory.createAttribute(&quot;att3&quot;, Ontology.REAL);&#10;// add new attribute to example set&#10;attributes.addRegular(att3);&#10;// insert new column into example set's data table&#10;exampleSet.getExampleTable().addAttribute(att3);&#10;&#10;double sum = 0.0;&#10;&#10;for (Example example : exampleSet) {&#10;&#9;double currentValue = example.getValue(att1);&#10;&#9;sum += currentValue;&#10;&#9;example.setValue(att2, sum);&#10;&#9;example.setValue(att3, sum);&#10;}&#10;&#10;return exampleSet;"/>
          </operator>
          <connect from_op="Generate Data" from_port="output" to_op="Select Attributes" to_port="example set input"/>
          <connect from_op="Select Attributes" from_port="example set output" to_op="Loop Examples" to_port="example set"/>
          <connect from_op="Loop Examples" from_port="example set" to_op="Execute Script" to_port="input 1"/>
          <connect from_op="Execute Script" from_port="output 1" to_port="result 1"/>
          <portSpacing port="source_input 1" spacing="0"/>
          <portSpacing port="sink_result 1" spacing="0"/>
          <portSpacing port="sink_result 2" spacing="0"/>
        </process>
      </operator>
    </process>
  • dgibbons Member Posts: 5 Contributor II
    Hi all,

    Thank you for the help above, very useful.

    I have a simple script (shown below) that adds 300 to each value of a numerical attribute. The script produces the correct output, but it also changes the data in all the preceding operators of my process. Is it possible to make the script operator work in one direction only, i.e. not affect earlier results in the process?

    Many Thanks,
    David


    ExampleSet exampleSet2 = input[0];

    Attributes attributes = exampleSet2.getAttributes();
    Attribute att2 = attributes.get("Midterm Exam");

    String name = att2.getName();
    for (Example example : exampleSet2) {
        example[name] = example[name] + 300;
    }

    return exampleSet2;

    <?xml version="1.0" encoding="UTF-8" standalone="no"?>
    <process version="5.1.011">
      <context>
        <input/>
        <output/>
        <macros/>
      </context>
      <operator activated="true" class="process" compatibility="5.1.011" expanded="true" name="Process">
        <process expanded="true" height="417" width="480">
          <operator activated="true" class="read_excel" compatibility="5.1.011" expanded="true" height="60" name="Read Excel" width="90" x="45" y="30">
            <parameter key="excel_file" value="\\SBSSRV\Users\david.gibbons\My Documents\MarkA.xls"/>
            <parameter key="imported_cell_range" value="A1:B13"/>
            <parameter key="first_row_as_names" value="false"/>
            <list key="annotations">
              <parameter key="0" value="Name"/>
            </list>
            <list key="data_set_meta_data_information">
              <parameter key="0" value="Midterm Exam.true.integer.attribute"/>
              <parameter key="1" value="Final Exam.true.integer.attribute"/>
            </list>
          </operator>
          <operator activated="true" class="multiply" compatibility="5.1.011" expanded="true" height="94" name="Multiply" width="90" x="180" y="30"/>
          <operator activated="true" class="execute_script" compatibility="5.1.011" expanded="true" height="76" name="Script" width="90" x="380" y="210">
            <parameter key="script" value="&#10;&#10;ExampleSet exampleSet2 = input[0];&#10;&#10;&#10;Attributes attributes = exampleSet2.getAttributes();&#10;Attribute att2 = attributes.get(&quot;Midterm Exam&quot;);&#10;&#10;String name = att2.getName();&#9;&#10;  for (Example example : exampleSet2) {&#10;  &#10;  &#9;example[name] = example[name] + 300;&#10;  }&#10;&#10;&#10;return exampleSet2;&#10;&#10;"/>
          </operator>
          <connect from_op="Read Excel" from_port="output" to_op="Multiply" to_port="input"/>
          <connect from_op="Multiply" from_port="output 1" to_port="result 1"/>
          <connect from_op="Multiply" from_port="output 2" to_op="Script" to_port="input 1"/>
          <connect from_op="Script" from_port="output 1" to_port="result 2"/>
          <portSpacing port="source_input 1" spacing="0"/>
          <portSpacing port="sink_result 1" spacing="0"/>
          <portSpacing port="sink_result 2" spacing="0"/>
          <portSpacing port="sink_result 3" spacing="0"/>
        </process>
      </operator>
    </process>
  • dgibbons Member Posts: 5 Contributor II
    Hi,

    I think that this RapidMiner description which I found:

    "The process logic which RapidMiner uses is not "linear", but recursive. We dont apply operators linearly, one after another."

    explains my query a bit. Could anybody expand on this description please?

    Thanks a lot,
    David
  • MariusHelf RapidMiner Certified Expert, Member Posts: 1,869   Unicorn
    If you change the actual data, rather than just meta-data such as attribute names or roles, or the set of attributes (adding or removing them), the changes are also visible in earlier results. By default, each operator copies only the meta-data, not the actual data. This is necessary for performance and memory reasons.

    You can, however, create a deep copy of an example set before your script with the Materialize operator. That way the changes won't get propagated backwards.

    Best, Marius
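
The sharing behaviour described above can be illustrated with a minimal, self-contained Java sketch. It does not use RapidMiner's classes: `View`, `shallowCopy`, `materialize` and `addToAll` are hypothetical stand-ins, chosen only to show why a change made through one wrapper shows up in another when both point at the same underlying values, and why a deep copy avoids that.

```java
import java.util.Arrays;

public class SharedDataSketch {

    /** Hypothetical stand-in for an example set: a thin wrapper around one data column. */
    static final class View {
        final double[] data; // reference to the underlying values, not a copy

        View(double[] data) {
            this.data = data;
        }

        /** Shallow copy: a new wrapper that still points at the same array. */
        View shallowCopy() {
            return new View(data);
        }

        /** Deep copy: a new wrapper with its own array (the idea behind materializing). */
        View materialize() {
            return new View(Arrays.copyOf(data, data.length));
        }

        /** Adds a constant to every value, like the script in the question. */
        void addToAll(double amount) {
            for (int i = 0; i < data.length; i++) {
                data[i] += amount;
            }
        }
    }

    public static void main(String[] args) {
        View original = new View(new double[] {55.0, 70.0});

        // Both wrappers share one array, so the original sees the change.
        View shared = original.shallowCopy();
        shared.addToAll(300.0);
        System.out.println("after shallow copy: " + Arrays.toString(original.data));

        // The materialized wrapper owns its array, so the original stays as it was.
        View independent = original.materialize();
        independent.addToAll(300.0);
        System.out.println("after deep copy:    " + Arrays.toString(original.data));
        System.out.println("the deep copy:      " + Arrays.toString(independent.data));
    }
}
```

The trade-off is the one named in the post: the shallow copy costs almost nothing, while the deep copy duplicates every value in memory.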
  • dgibbons Member Posts: 5 Contributor II
    Thank you Marius, that works. My issue regarding the script operator is [SOLVED].

    That explanation makes sense. I will be more careful with the operators. By any chance, is there a way to turn off the default behaviour (only meta-data is copied) for a process?
  • MariusHelf RapidMiner Certified Expert, Member Posts: 1,869   Unicorn
    No, that's not possible.
  • dgibbons Member Posts: 5 Contributor II
    Hello,

    Could you please let me know whether it is possible to read the value of a specific example for a given attribute?

    For example, the value of the first example for the attribute "Midterm Exam".

    Attributes attributes = exampleSet.getAttributes();
    Attribute exam = attributes.get("Midterm Exam");

    float ff = exam.getElementAt[0];


    Many Thanks.
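
One possible reading of this question: an attribute describes a column and is not itself indexed by position, so a value is reached through an example (a row) first. Assuming the RapidMiner methods `ExampleSet.getExample(int)` and `Example.getValue(Attribute)`, the first value would be read roughly as `exampleSet.getExample(0).getValue(exam)`, which yields a double. The self-contained Java sketch below mirrors that row-first lookup; `Table` and `valueAt` are hypothetical stand-ins, not part of any RapidMiner API, and the numbers are made up for illustration.

```java
import java.util.Arrays;
import java.util.List;

public class FirstValueSketch {

    /** Hypothetical stand-in for a data table: attribute names plus one double[] per example. */
    static final class Table {
        final List<String> attributeNames;
        final double[][] rows;

        Table(List<String> attributeNames, double[][] rows) {
            this.attributeNames = attributeNames;
            this.rows = rows;
        }

        /** Picks the example (row) first, then the column that belongs to the named attribute. */
        double valueAt(int exampleIndex, String attributeName) {
            int column = attributeNames.indexOf(attributeName);
            if (column < 0) {
                // A misspelled attribute name ends up here.
                throw new IllegalArgumentException("Unknown attribute: " + attributeName);
            }
            return rows[exampleIndex][column];
        }
    }

    public static void main(String[] args) {
        // Illustrative values only.
        Table table = new Table(
                Arrays.asList("Midterm Exam", "Final Exam"),
                new double[][] {{55.0, 60.0}, {70.0, 65.0}});

        double first = table.valueAt(0, "Midterm Exam");
        System.out.println("First Midterm Exam value: " + first);
    }
}
```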