Keep original attributes and ignore attributes

Legacy User Member Posts: 0 Newbie
edited June 2019 in Help
Please add a control for whether the operators remove the original columns from the example set.

For example, in the experiment ExampleSource-Normalization-PrincipalComponentGenerator, I would like to tick a checkbox and see all the columns in the output:
original attributes, z-transformed attributes and PCs - all in one table.

Also, please add the ability to mark a column as "inactive" or "pass-thru", so that it shows in the example set but is ignored by the operators, like the id type is now.

Thanks!

Answers

  • IngoRM Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, Community Manager, RMResearcher, Member, University Professor Posts: 1,751 RM Founder
    Hi Victor,

    thanks for sending in these suggestions. We intend to revise some of the preprocessing operators anyway to better support views, and we will keep the "original attributes" request in mind for that.

    For example, in the experiment ExampleSource-Normalization-PrincipalComponentGenerator, I would like to tick a checkbox and see all the columns in the output:
    original attributes, z-transformed attributes and PCs - all in one table.
    You can already achieve this by multiplying the data set, materializing the copy, applying the preprocessing to it, and joining the two sets back together. Here is the basic idea:

    <operator name="Root" class="Process" expanded="yes">
        <operator name="ExampleSetGenerator" class="ExampleSetGenerator">
            <parameter key="target_function" value="sum classification"/>
        </operator>
        <!-- Add an id attribute so the two copies can be joined later -->
        <operator name="IdTagging" class="IdTagging">
        </operator>
        <!-- Multiply the example set -->
        <operator name="IOMultiplier" class="IOMultiplier">
            <parameter key="io_object" value="ExampleSet"/>
        </operator>
        <!-- Materialize the copy before preprocessing it -->
        <operator name="MaterializeDataInMemory" class="MaterializeDataInMemory">
        </operator>
        <!-- Apply the preprocessing to the materialized copy -->
        <operator name="Normalization" class="Normalization">
        </operator>
        <!-- Join the preprocessed copy back with the original attributes -->
        <operator name="ExampleSetJoin" class="ExampleSetJoin">
            <parameter key="remove_double_attributes" value="false"/>
        </operator>
    </operator>
    Of course, a simple parameter would be much easier ;)

    Also, please add the ability to mark a column as "inactive" or "pass-thru", so that it shows in the example set but is ignored by the operators, like the id type is now.
    This is actually already possible: just use the ChangeAttributeRole operator to change the role of the attribute to something arbitrary like "pass-through", and the learner will disregard it. Here is an example:

    <operator name="Root" class="Process" expanded="yes">
        <operator name="ExampleSetGenerator" class="ExampleSetGenerator">
            <parameter key="target_function" value="sum"/>
        </operator>
        <!-- Give att1 an arbitrary user-defined role so the learner skips it -->
        <operator name="ChangeAttributeRole" class="ChangeAttributeRole">
            <parameter key="name" value="att1"/>
            <parameter key="target_role" value="pass-through"/>
        </operator>
        <operator name="LinearRegression" class="LinearRegression">
        </operator>
    </operator>
    As you can see, the first attribute, whose role was changed to "pass-through", is not used by the model.

    Cheers,
    Ingo
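
The copy, preprocess, and join idea from the post above can also be sketched outside RapidMiner. The following plain-Python illustration is not RapidMiner code: the helper names `z_transform` and `join_on_id`, the `_z` suffix for clashing column names, and the use of the population standard deviation are all assumptions made for this sketch.

```python
from statistics import mean, pstdev


def z_transform(rows, columns):
    """Return copies of the rows with the given columns z-transformed."""
    stats = {c: (mean(r[c] for r in rows), pstdev(r[c] for r in rows))
             for c in columns}
    transformed = []
    for row in rows:
        new_row = dict(row)  # work on a copy so the original rows stay untouched
        for c in columns:
            mu, sigma = stats[c]
            new_row[c] = (row[c] - mu) / sigma if sigma else 0.0
        transformed.append(new_row)
    return transformed


def join_on_id(left, right, suffix="_z"):
    """Inner join on 'id'; column names already present in left get a suffix."""
    right_by_id = {r["id"]: r for r in right}
    joined = []
    for left_row in left:
        right_row = right_by_id.get(left_row["id"])
        if right_row is None:
            continue
        row = dict(left_row)
        for key, value in right_row.items():
            if key == "id":
                continue
            row[key + suffix if key in left_row else key] = value
        joined.append(row)
    return joined


original = [
    {"id": 1, "att1": 1.0, "att2": 10.0},
    {"id": 2, "att1": 2.0, "att2": 20.0},
    {"id": 3, "att1": 3.0, "att2": 30.0},
]
normalized = z_transform(original, ["att1", "att2"])
combined = join_on_id(original, normalized)
```

Each row of `combined` then holds the id, the original attributes, and their z-transformed counterparts side by side, which is the single table the original request asks for.
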
  • IngoRM Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, Community Manager, RMResearcher, Member, University Professor Posts: 1,751 RM Founder
    By the way: I just split this topic off from the general thread to give it its own headline.

    Cheers,
    Ingo
  • keith Member Posts: 157 Maven
    I ran into this annoyance again today, and so wanted to bump this old feature request and see if it's going to make the cut in the near future.  Related to that is a secondary feature request that would complement the original, and a possible bug in the existing functionality.

    Feature request:

    I'd like to be able to mark multiple attributes to be ignored by learners, but kept in the example set.  I can do this for a single attribute using a user-defined role, but if I have multiple attributes I want to ignore, I need to create a unique role for each one, as you can't assign multiple attributes to a single user-defined role.  Since the attributes I want to disregard all have the same purpose (a.k.a. role), namely to be passed along without being used in learning, it makes more logical sense to me for them all to have a single role.

    Possible solutions to this would include:

    1) Allow user-defined roles to support multiple attributes assigned (perhaps optionally, so the existing behavior could be retained).
    2) If changing how user-defined roles behave is too difficult, then having a built-in special role for "ignore" which supports multiple attributes could be an option.

    Secondary request: 

    Allowing ChangeAttributeRole to take a regular expression instead of just a string would make this extra useful.  You could set multiple attributes to "ignore" in one step.

    Possible bug: 

    Also, I noticed that if you change one attribute's role to a user-defined role, and then change a second attribute's role to the same user-defined role, the first attribute apparently disappears from the example set.  I'm not sure if this is a bug or intended behavior.  If the latter, I'd ask why it was designed that way.  Seems like changing an attribute's metadata (role) shouldn't lead to accidental deletion of another attribute.

    Thanks,
    Keith
  • land RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 2,531 Unicorn
    Hi Keith,
    thank you for your suggestion. We will add a feature like that in the upcoming version.
    There are some reasons for not allowing multiple attributes to have the same role. For example, this restriction is needed to ensure that a second model applier removes the already existing prediction attribute before creating a new one. Nevertheless, the solution will fulfill your needs.

    For the time being, I can offer a solution that should already work. In the process below I used a FeatureIterator to change multiple attributes to an ignore role using regular expressions.
    <operator name="Root" class="Process" expanded="yes">
        <operator name="ExampleSetGenerator" class="ExampleSetGenerator">
            <parameter key="target_function" value="sum classification"/>
            <parameter key="number_examples" value="200"/>
            <parameter key="number_of_attributes" value="6"/>
        </operator>
        <!-- Loop over every attribute whose name matches the regular expression -->
        <operator name="FeatureIterator" class="FeatureIterator" expanded="yes">
            <parameter key="filter" value="att1|att2"/>
            <parameter key="work_on_input" value="false"/>
            <operator name="ChangeAttributeRole" class="ChangeAttributeRole">
                <!-- %{loop_feature} holds the name of the current attribute -->
                <parameter key="name" value="%{loop_feature}"/>
                <!-- Appending the %{a} macro gives each attribute its own role name -->
                <parameter key="target_role" value="ignore%{a}"/>
            </operator>
        </operator>
    </operator>
    Greetings,
      Sebastian
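
To make the one-attribute-per-role restriction and the workaround above concrete, here is a toy Python model. It is only an illustration and not RapidMiner's implementation: the class `ExampleSetMeta`, the function `ignore_matching`, and the numbered role names are hypothetical, and the assumption that a displaced attribute is simply dropped is taken from the behavior Keith describes.

```python
import re


class ExampleSetMeta:
    """Toy model: regular attributes plus special attributes, one per role."""

    def __init__(self, attributes):
        self.regular = list(attributes)
        self.special = {}  # role name -> attribute name (at most one per role)

    def change_role(self, name, role):
        """Move a regular attribute into a role and return any displaced attribute."""
        self.regular.remove(name)
        displaced = self.special.get(role)
        self.special[role] = name  # the previous holder of the role is dropped
        return displaced


def ignore_matching(meta, pattern, prefix="ignore"):
    """Give every regular attribute matching the regex its own numbered role."""
    regex = re.compile(pattern)
    matches = [a for a in meta.regular if regex.fullmatch(a)]
    for count, name in enumerate(matches, start=1):
        meta.change_role(name, f"{prefix}{count}")
    return matches


# Reusing one role name displaces the first attribute.
clash = ExampleSetMeta(["att1", "att2", "att3"])
clash.change_role("att1", "pass-through")
lost = clash.change_role("att2", "pass-through")

# Unique role names per attribute keep every attribute in the set.
safe = ExampleSetMeta([f"att{i}" for i in range(1, 7)])
ignored = ignore_matching(safe, "att1|att2")
```

In this model `lost` is `"att1"`, which matches the disappearing attribute reported earlier in the thread, while `safe` keeps both ignored attributes because each one gets a distinct role name.
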
  • keith Member Posts: 157 Maven
    Thanks, Sebastian.  Glad to hear that the request will make it into a future version of RM.  I see now that there are cases where only allowing one attribute to a specific role (like prediction) would make sense, but I think having some roles support multiple attributes will make sense, too.  I figured that you could use a macro to loop through a set of variables to create custom ignore roles for each one.  It's a good workaround.