Keep original attributes and ignore attributes

Legacy User · July 2008

Please add an option to control whether the operators remove the original columns from the example set.

For example, in the experiment ExampleSource-Normalization-PrincipalComponentGenerator, I would like to click a checkbox and see all the columns in the output:
original attributes, z-transformed attributes and PCs, all in one table.

Also, add the ability to mark a column "inactive" or "pass-thru", so that it shows in the example set but is ignored by the operators, like the id type does now.

Thanks!

IngoRM · July 2008

Hi Victor,

thanks for sending in these suggestions. We intend to revise some of the preprocessing operators anyway to better support views and will have the "original attributes" request in mind for that.

For example, in the experiment ExampleSource-Normalization-PrincipalComponentGenerator, I would like to click a checkbox and see all the columns in the output:
original attributes, z-transformed attributes and PCs, all in one table.

Of course, you could achieve this by first tagging the examples with an id, multiplying the data set, materializing the copy, applying the preprocessing to it, and joining the two sets back together. Here is the basic idea:


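<operator name="Root" class="Process" expanded="yes">
    <!-- Generate a sample data set -->
    <operator name="ExampleSetGenerator" class="ExampleSetGenerator">
        <parameter key="target_function" value="sum classification"/>
    </operator>
    <!-- Add an id attribute so the two copies can be joined later -->
    <operator name="IdTagging" class="IdTagging">
    </operator>
    <!-- Duplicate the example set -->
    <operator name="IOMultiplier" class="IOMultiplier">
        <parameter key="io_object" value="ExampleSet"/>
    </operator>
    <!-- Create an independent in-memory copy of the data -->
    <operator name="MaterializeDataInMemory" class="MaterializeDataInMemory">
    </operator>
    <!-- Normalize the copy; the other example set keeps the original values -->
    <operator name="Normalization" class="Normalization">
    </operator>
    <!-- Join both sets, keeping the original and the normalized columns -->
    <operator name="ExampleSetJoin" class="ExampleSetJoin">
        <parameter key="remove_double_attributes" value="false"/>
    </operator>
</operator>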

Of course, a simple parameter would be much easier.
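
For readers who want to see the copy, transform and join idea outside RapidMiner, here is a minimal Python sketch of the same steps. It is only an illustration of the technique, not how the operators are implemented. The function names, the "_z" suffix and the use of the population standard deviation are assumptions made for this example.

```python
import copy
import statistics


def z_transform(rows, columns):
    """Return an independent copy of rows with the given columns z-transformed.

    The deep copy stands in for the multiply + materialize steps, so the
    original rows are left untouched. Using the population standard
    deviation is an assumption of this sketch.
    """
    out = copy.deepcopy(rows)
    for col in columns:
        values = [row[col] for row in rows]
        mean = statistics.mean(values)
        std = statistics.pstdev(values)
        for row in out:
            # A constant column has std 0; map it to 0.0 instead of dividing by zero.
            row[col] = (row[col] - mean) / std if std else 0.0
    return out


def join_on_id(original, transformed, suffix="_z"):
    """Inner join on 'id', keeping both versions of every column.

    Columns from the transformed set get a suffix (hypothetical naming)
    so they do not overwrite the original ones.
    """
    by_id = {row["id"]: row for row in transformed}
    joined = []
    for row in original:
        match = by_id.get(row["id"])
        if match is None:
            continue  # inner join: skip rows without a partner
        merged = dict(row)
        for key, value in match.items():
            if key != "id":
                merged[key + suffix] = value
        joined.append(merged)
    return joined


data = [{"id": 1, "att1": 1.0}, {"id": 2, "att1": 3.0}]
combined = join_on_id(data, z_transform(data, ["att1"]))
```

Here `combined` holds one row per id with both `att1` and `att1_z`, which is the "all in one table" result asked for above.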

Also, add the ability to mark a column "inactive" or "pass-thru", so that it shows in the example set but is ignored by the operators, like the id type does now.

This is actually already possible: just use the ChangeAttributeRole operator to change the role of the attribute to something arbitrary like "pass-through", and the learner will ignore it. Here is an example:


<operator name="Root" class="Process" expanded="yes">
    <!-- Generate a sample data set with a numerical label -->
    <operator name="ExampleSetGenerator" class="ExampleSetGenerator">
        <parameter key="target_function" value="sum"/>
    </operator>
    <!-- Give att1 an arbitrary, user-defined role -->
    <operator name="ChangeAttributeRole" class="ChangeAttributeRole">
        <parameter key="name" value="att1"/>
        <parameter key="target_role" value="pass-through"/>
    </operator>
    <!-- The learner only uses the remaining regular attributes -->
    <operator name="LinearRegression" class="LinearRegression">
    </operator>
</operator>

As you can see, the first attribute, whose role was changed to "pass-through", is not used by the model.

Cheers,
Ingo

IngoRM · July 2008

By the way: I just split this topic off from the general thread to give it its own headline.

Cheers,
Ingo

keith · June 2009

I ran into this annoyance again today, and so wanted to bump this old feature request and see if it's going to make the cut in the near future. Related to that is a secondary feature request that would complement the original, and a possible bug in the existing functionality.

Feature request:

I'd like to be able to mark multiple attributes to be ignored by learners, but kept in the example set. I can do this for a single attribute using a user-defined role, but if I have multiple attributes I want to ignore, I need to create a unique role for each one, as you can't assign multiple attributes to a single user-defined role. Since the attributes I want to disregard all have the same purpose (a.k.a. role), namely to be passed along without being used in learning, it makes more logical sense to me for them all to have a single role.

Possible solutions to this would include:

1) Allow user-defined roles to support multiple attributes assigned (perhaps optionally, so the existing behavior could be retained).
2) If changing how user-defined roles behave is too difficult, then having a built-in special role for "ignore" which supports multiple attributes could be an option.

Secondary request:

Allowing ChangeAttributeRole to take a regular expression instead of just a string would make this extra useful. You could set multiple attributes to "ignore" in one step.

Possible bug:

Also, I noticed that if you change one attribute's role to a user-defined role, and then change a second attribute's role to the same user-defined role, the first attribute apparently disappears from the example set. I'm not sure if this is a bug or intended behavior. If the latter, I'd ask why it was designed that way. Seems like changing an attribute's metadata (role) shouldn't lead to accidental deletion of another attribute.

Thanks,
Keith

land · June 2009

Hi Keith,
thank you for your suggestion. We will add a feature like that in the upcoming version.
There are some reasons for not allowing multiple attributes to have the same role. For example, this is needed to ensure that a second model applier removes the already existing prediction attribute before creating a new one. Nevertheless, the solution will fulfill your needs.
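
To illustrate the "one attribute per role" rule and the disappearing attribute Keith reported, here is a toy Python model. It only mirrors the behavior described in this thread and is not RapidMiner's actual implementation. The class and method names are hypothetical.

```python
class ExampleSetRoles:
    """Toy model: regular attributes plus special roles.

    Each special role holds at most one attribute (an assumption taken
    from the behavior described in this thread).
    """

    def __init__(self, attributes):
        self.regular = list(attributes)
        self.special = {}  # role name -> attribute name

    def set_role(self, attribute, role):
        if attribute in self.regular:
            self.regular.remove(attribute)
        elif attribute in self.special.values():
            # The attribute gives up its previous special role.
            self.special = {r: a for r, a in self.special.items() if a != attribute}
        else:
            raise KeyError(f"unknown attribute: {attribute}")
        # Assigning the role replaces whatever attribute held it before,
        # so the previous holder drops out of the example set.
        self.special[role] = attribute

    def all_attributes(self):
        return self.regular + list(self.special.values())


roles = ExampleSetRoles(["att1", "att2", "att3"])
roles.set_role("att1", "ignore")
roles.set_role("att2", "ignore")   # att1 is displaced and no longer listed
remaining = roles.all_attributes()
```

In this model the replacement is what lets a second model applier discard an old prediction attribute, and it is also why reusing "ignore" for att2 makes att1 vanish.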

For the time being, here is a workaround that should already work. In the process below I used a FeatureIterator to change multiple attributes, selected by a regular expression, to an ignore role.

<operator name="Root" class="Process" expanded="yes">
    <!-- Generate a sample data set with six attributes -->
    <operator name="ExampleSetGenerator" class="ExampleSetGenerator">
        <parameter key="target_function" value="sum classification"/>
        <parameter key="number_examples" value="200"/>
        <parameter key="number_of_attributes" value="6"/>
    </operator>
    <!-- Loop over the attributes whose names match the filter expression -->
    <operator name="FeatureIterator" class="FeatureIterator" expanded="yes">
        <parameter key="filter" value="att1|att2"/>
        <parameter key="work_on_input" value="false"/>
        <!-- %{loop_feature} is the current attribute; %{a} makes each role name unique -->
        <operator name="ChangeAttributeRole" class="ChangeAttributeRole">
            <parameter key="name" value="%{loop_feature}"/>
            <parameter key="target_role" value="ignore%{a}"/>
        </operator>
    </operator>
</operator>
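
The logic of this loop can be sketched in a few lines of Python: select the attribute names that match a regular expression and give each one its own numbered ignore role. This is only an illustration. The function name is hypothetical, the counter stands in for what the `%{a}` macro appears to contribute in the process above, and matching the whole name (rather than a substring) is an assumption.

```python
import re


def assign_ignore_roles(attribute_names, pattern, prefix="ignore"):
    """Map every attribute whose full name matches pattern to a unique role.

    Roles are numbered in order of appearance (ignore1, ignore2, ...),
    because each special role may hold only one attribute.
    """
    regex = re.compile(pattern)
    roles = {}
    count = 0
    for name in attribute_names:
        if regex.fullmatch(name):  # assumption: the whole name must match
            count += 1
            roles[name] = f"{prefix}{count}"
    return roles


names = ["att1", "att2", "att3", "att4", "att5", "att6", "label"]
ignored = assign_ignore_roles(names, "att1|att2")
```

With the filter from the process above, `ignored` maps att1 and att2 to two distinct roles and leaves the other attributes regular.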

Greetings,
Sebastian

keith · June 2009

Thanks, Sebastian. Glad to hear that the request will make it into a future version of RM. I see now that there are cases where allowing only one attribute per role (like prediction) makes sense, but I think having some roles support multiple attributes will make sense, too. I had figured that you could use a macro to loop through a set of attributes and create a custom ignore role for each one. It's a good workaround.
