YAGGA - attribute constructions

danjeharry Member Posts: 20 Contributor II
edited November 2018 in Help
Hey,

How do I apply the attribute constructions generated by YAGGA to a new test dataset if some of the constructions are based on generated attributes that are no longer in the example set? (E.g. gensym100 = Attribute1 + gensym99, but gensym99 is not defined in the attribute construction data.)

Thanks.

Answers

  • MariusHelf RapidMiner Certified Expert, Member Posts: 1,869 Unicorn
    Hi,

    Below is a small sample process that first generates some attributes, then stores their constructions in a file, reads them back, and applies them to another dataset. Be sure to adjust the paths in the Write Constructions and Read Constructions operators.
    <?xml version="1.0" encoding="UTF-8" standalone="no"?>
    <process version="5.1.011">
      <context>
        <input/>
        <output/>
        <macros/>
      </context>
      <operator activated="true" class="process" compatibility="5.1.011" expanded="true" name="Process">
        <process expanded="true" height="505" width="547">
          <operator activated="true" class="generate_data" compatibility="5.1.011" expanded="true" height="60" name="Generate Data" width="90" x="45" y="75">
            <parameter key="target_function" value="random classification"/>
          </operator>
          <operator activated="true" class="optimize_by_generation_yagga" compatibility="5.1.011" expanded="true" height="94" name="Generate" width="90" x="179" y="75">
            <parameter key="reciprocal_value" value="false"/>
            <process expanded="true" height="527" width="725">
              <operator activated="true" class="naive_bayes" compatibility="5.1.011" expanded="true" height="76" name="Naive Bayes" width="90" x="112" y="30"/>
              <operator activated="true" class="apply_model" compatibility="5.1.011" expanded="true" height="76" name="Apply Model" width="90" x="313" y="30">
                <list key="application_parameters"/>
              </operator>
              <operator activated="true" class="performance" compatibility="5.1.011" expanded="true" height="76" name="Performance" width="90" x="514" y="30"/>
              <connect from_port="example set source" to_op="Naive Bayes" to_port="training set"/>
              <connect from_op="Naive Bayes" from_port="model" to_op="Apply Model" to_port="model"/>
              <connect from_op="Naive Bayes" from_port="exampleSet" to_op="Apply Model" to_port="unlabelled data"/>
              <connect from_op="Apply Model" from_port="labelled data" to_op="Performance" to_port="labelled data"/>
              <connect from_op="Performance" from_port="performance" to_port="performance sink"/>
              <portSpacing port="source_example set source" spacing="0"/>
              <portSpacing port="sink_performance sink" spacing="0"/>
            </process>
          </operator>
          <operator activated="true" class="write_constructions" compatibility="5.1.011" expanded="true" height="60" name="Write Constructions" width="90" x="380" y="30">
            <parameter key="attribute_constructions_file" value="C:\Users\mhelf\Documents\tmp\constructions"/>
          </operator>
          <operator activated="true" class="generate_data" compatibility="5.1.011" expanded="true" height="60" name="Generate Data (2)" width="90" x="45" y="255">
            <parameter key="target_function" value="random classification"/>
          </operator>
          <operator activated="true" class="read_constructions" compatibility="5.1.011" expanded="true" height="60" name="Read Constructions" width="90" x="380" y="255">
            <parameter key="attribute_constructions_file" value="C:\Users\mhelf\Documents\tmp\constructions"/>
          </operator>
          <connect from_op="Generate Data" from_port="output" to_op="Generate" to_port="example set in"/>
          <connect from_op="Generate" from_port="example set out" to_op="Write Constructions" to_port="input"/>
          <connect from_op="Generate" from_port="attribute weights out" to_port="result 2"/>
          <connect from_op="Write Constructions" from_port="through" to_port="result 1"/>
          <connect from_op="Generate Data (2)" from_port="output" to_op="Read Constructions" to_port="example set"/>
          <connect from_op="Read Constructions" from_port="example set" to_port="result 3"/>
          <portSpacing port="source_input 1" spacing="0"/>
          <portSpacing port="sink_result 1" spacing="0"/>
          <portSpacing port="sink_result 2" spacing="0"/>
          <portSpacing port="sink_result 3" spacing="0"/>
          <portSpacing port="sink_result 4" spacing="0"/>
        </process>
      </operator>
    </process>
  • danjeharry Member Posts: 20 Contributor II
    Thanks for the info, Marius, but I set up my process exactly like yours. The issue is that I have one attribute, gensym100, which is constructed from att1 and gensym99. The generated example set contains att1 but not gensym99, which appears to have been generated earlier in the evolutionary process. So when I save the constructions, gensym99 is not defined, and I can no longer generate the correct attributes on a new test dataset.
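
The dependency problem described above (gensym100 refers to gensym99, which is not in the saved constructions) can be illustrated outside RapidMiner. The following is only a minimal Python sketch, not a RapidMiner feature: it assumes the definitions of all intermediate attributes can be recovered somehow, for example by logging them during the optimization, and that constructions are plain infix expressions. The `expand` helper and the definition given for gensym99 are hypothetical. The idea is to substitute each intermediate attribute by its own definition until only original attributes remain.

```python
import re

# Identifiers such as att1 or gensym99; the lookbehind avoids matching
# in the middle of a number or a longer name.
IDENT = re.compile(r"(?<![\w.])[A-Za-z_]\w*")


def expand(name, constructions, _stack=()):
    """Rewrite `name` in terms of attributes that have no entry in
    `constructions`, i.e. the original (non-generated) attributes."""
    if name not in constructions:
        return name  # original attribute: nothing to expand
    if name in _stack:
        raise ValueError(f"cyclic construction involving {name!r}")
    expr = constructions[name]

    def substitute(match):
        token = match.group(0)
        if token in constructions:
            # Parenthesize so operator precedence is preserved.
            return "(" + expand(token, constructions, _stack + (name,)) + ")"
        return token

    return IDENT.sub(substitute, expr)


# Hypothetical data: the definition of gensym99 is invented for illustration.
constructions = {
    "gensym99": "att2 * att3",
    "gensym100": "att1 + gensym99",
}

expanded = expand("gensym100", constructions)
print(expanded)  # att1 + (att2 * att3)
```

With the expanded form, gensym100 depends only on attributes present in the new test dataset. If the definition of gensym99 cannot be recovered at all, this approach does not help.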