"RM 4.3 Feature Generation problem"

keith Member Posts: 157 Maven
edited May 2019 in Help

I've installed the new RM 4.3 EE release, but I am having problems with the Feature Generation not recognizing generated features when used in later steps.

This example adds one to the first attribute (successfully) to generate a new "plusone" attribute.  Then, it tries to use "plusone" in a subsequent step, but RM returns an error that "plusone" doesn't exist.
<operator name="Root" class="Process" expanded="yes">
    <operator name="ExampleSetGenerator" class="ExampleSetGenerator">
        <parameter key="attributes_lower_bound" value="0.0"/>
        <parameter key="attributes_upper_bound" value="1.0"/>
        <parameter key="target_function" value="random"/>
    </operator>
    <operator name="FeatureGeneration" class="FeatureGeneration">
        <list key="functions">
          <parameter key="plusone" value="+(att1,const[1]())"/>
          <parameter key="nextval" value="+(plusone,att2)"/>
        </list>
        <parameter key="keep_all" value="true"/>
    </operator>
</operator>
I then tried splitting the computation up across two FeatureGeneration nodes, but it yields the same error:

<operator name="Root" class="Process" expanded="yes">
    <operator name="ExampleSetGenerator" class="ExampleSetGenerator">
        <parameter key="attributes_lower_bound" value="0.0"/>
        <parameter key="attributes_upper_bound" value="1.0"/>
        <parameter key="target_function" value="random"/>
    </operator>
    <operator name="FeatureGeneration" class="FeatureGeneration">
        <list key="functions">
          <parameter key="plusone" value="+(att1,const[1]())"/>
        </list>
        <parameter key="keep_all" value="true"/>
    </operator>
    <operator name="FeatureGeneration (2)" class="FeatureGeneration">
        <list key="functions">
          <parameter key="nextval" value="+(plusone,att2)"/>
        </list>
        <parameter key="keep_all" value="true"/>
    </operator>
</operator>
This appears to be a regression from 4.2, as I had a working process in RM 4.2 that now fails.  Also note that I had to use the prefix notation to get the computation to work, not the infix notation that is supposed to be in RM 4.3.  Either my installation is messed up (perhaps from an incomplete uninstall?), or there's a bug in RM 4.3. 

Thanks,
Keith

Answers

  • IngoRM Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, Community Manager, RMResearcher, Member, University Professor Posts: 1,751 RM Founder
    Hi Keith,

    you are right: we had to remove the ability to directly re-use freshly created attributes within the FeatureGeneration operator, since it unfortunately caused bugs in other settings. Since it was quite unpredictable (even for us, who always work on predictions ;) ) in which cases it worked and in which it did not, we decided to remove this feature.

    This was also supported by the decision to create a new operator, "AttributeConstruction", which should now be preferred over the FeatureGeneration operator for constructing new attributes. This new operator is also the one that supports infix formulas and nicer constants (just "1" instead of "const[1]()") as well as many new functions, including a very nice if-function, e.g.

    if (attribute1 > 5, sin(attribute2), cos(attribute3))

    which will create a new attribute with the value sin(attribute2) if the value of attribute1 is larger than 5 and cos(attribute3) otherwise. Even nominal values are supported in this if-statement, e.g.

    if (attribute1 == "dog", attribute2 + attribute3, 42)
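
    As a rough sketch, the process from the original question might be rewritten with the new operator along the following lines. The list key "function_descriptions" is an assumption and should be checked against the operator reference for your release. Because this thread does not confirm whether AttributeConstruction can re-use an attribute created in the same list, "nextval" is written directly in terms of the original attributes rather than in terms of "plusone".

    ```xml
    <operator name="Root" class="Process" expanded="yes">
        <operator name="ExampleSetGenerator" class="ExampleSetGenerator">
            <parameter key="attributes_lower_bound" value="0.0"/>
            <parameter key="attributes_upper_bound" value="1.0"/>
            <parameter key="target_function" value="random"/>
        </operator>
        <!-- Assumption: the list key is "function_descriptions"; verify in the operator reference. -->
        <operator name="AttributeConstruction" class="AttributeConstruction">
            <list key="function_descriptions">
              <!-- Infix notation with a plain constant instead of const[1]() -->
              <parameter key="plusone" value="att1 + 1"/>
              <!-- Expanded so that it does not depend on the freshly created "plusone" -->
              <parameter key="nextval" value="att1 + 1 + att2"/>
            </list>
            <parameter key="keep_all" value="true"/>
        </operator>
    </operator>
    ```

    Expanding the formula costs a little duplication, but the process then runs regardless of whether freshly created attributes are visible to later formulas.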


    So why did we keep the old FeatureGeneration operator at all (it is not even marked as deprecated)? The reason is simple: it is faster on really large datasets, so we decided to keep it, but we unfortunately had to remove the ability to re-use just-created attributes.

    Hope that clarifies things about feature construction a bit.

    Cheers,
    Ingo
  • keith Member Posts: 157 Maven
    Thanks for the clarification, Ingo.  That makes sense, although it's unfortunate that backward compatibility couldn't be maintained.  I now have to rewrite several existing processes that no longer run properly under RM 4.3. :(
  • IngoRM Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, Community Manager, RMResearcher, Member, University Professor Posts: 1,751 RM Founder
    Hello Keith,

    yes, that's a PITA, sorry about that. If you have not yet rewritten all of your processes, we could also try to include the old functionality of the FeatureGeneration operator under a new name, e.g. "FeatureGenerationDeprecated", and deliver it with the next EE update. Then it is simply a matter of replacing all "FeatureGeneration" with "FeatureGenerationDeprecated", which might be easier. Of course, the deprecated operator will be removed in some future version, but it would give you some more time to update your processes. Please let me know if this would be useful; I would then ask one of our developers to add this "new" (old) operator.

    Cheers,
    Ingo
  • keith Member Posts: 157 Maven
    Thanks for the offer, Ingo.  At the moment, I'll probably just bite the bullet and rewrite my processes now.  I'd much rather have you guys working on new features!  But if it's possible in the future to preserve some modicum of backward compatibility when making other changes, it would be appreciated.  :-)

    Keith
  • IngoRM Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, Community Manager, RMResearcher, Member, University Professor Posts: 1,751 RM Founder

    But if it's possible in the future to preserve some modicum of backward compatibility when making other changes, it would be appreciated.
    We will do our best  :)

    Cheers,
    Ingo