What is the correct way to replace all instances of 999 in data with empty?

martyns Member Posts: 15  Maven
edited November 2018 in Help
I have a dataset where all missing values are represented by 999.

If I try to replace them with:
<operator name="FeatureIterator" class="FeatureIterator" expanded="yes">
    <parameter key="work_on_input" value="false"/>
    <operator name="Mapping" class="Mapping">
        <parameter key="attributes" value="%{loop_feature}"/>
        <list key="value_mappings">
        </list>
        <parameter key="replace_what" value="999"/>
        <parameter key="replace_by" value="?"/>
    </operator>
</operator>
then everything goes horribly wrong: when I send the data through a model, it fails badly on the nominal variables.

If I set the filter to numeric only, the order of the attribute list changes, which breaks model application with this warning:
May 28, 2009 10:12:33 AM: [Warning] W-J48: The order of attributes is not equal for the training and the application example set. This might lead to problems for some models.

A great helper on the list suggested leaving replace_by empty when replacing 999, but then nothing appears to happen at all.

I have unticked work_on_input, as that seems to pass the modified example set along to the model applier further down the process. Should I be ticking work_on_input and placing the model applier differently?

So, what is the correct way to replace all values of 999 in a dataset with empty or blank values?
And how about just for numeric values?

Thanks!

Answers

  • haddock Member Posts: 849  Guru
    G'Day!

    Bit long-winded, but this does it: convert the numbers to nominal, replace 999, then convert them back to numbers.

    <operator name="Root" class="Process" expanded="yes">
        <operator name="ExampleSetGenerator" class="ExampleSetGenerator">
            <parameter key="target_function" value="random"/>
        </operator>
        <operator name="SetData" class="SetData">
            <parameter key="attribute_name" value="att1"/>
            <parameter key="example_index" value="1"/>
            <parameter key="value" value="999"/>
        </operator>
        <operator name="Numerical2FormattedNominal" class="Numerical2FormattedNominal">
        </operator>
        <operator name="Replace" class="Replace">
            <parameter key="attributes" value=".*"/>
            <parameter key="replace_what" value="999"/>
        </operator>
        <operator name="NominalNumbers2Numerical" class="NominalNumbers2Numerical">
        </operator>
        <operator name="Numerical2Real" class="Numerical2Real">
        </operator>
    </operator>
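
For readers who want to see the logic outside RapidMiner, here is a minimal sketch of the same idea in plain Python. It is not RapidMiner code and does not use any RapidMiner API. It assumes the data is a list of dicts (one per example), and the function name `replace_sentinel` and the sample rows are hypothetical. It compares whole cell values, so a number such as 1999 is left alone, and it rebuilds each row in its original key order, so the attribute order seen by a model does not change.

```python
import math

SENTINEL = 999  # value this dataset uses to encode "missing" (from the question)


def replace_sentinel(rows, sentinel=SENTINEL, numeric_only=True):
    """Return copies of the rows with whole-cell sentinel values made missing.

    Numeric cells equal to the sentinel become NaN. If numeric_only is False,
    text cells whose entire content is the sentinel become None as well.
    Keys are copied in their original order, so column order is preserved.
    """
    cleaned = []
    for row in rows:
        new_row = {}
        for name, value in row.items():
            # bool is a subclass of int in Python, so exclude it explicitly
            is_number = isinstance(value, (int, float)) and not isinstance(value, bool)
            if is_number and value == sentinel:
                new_row[name] = math.nan
            elif (not numeric_only and isinstance(value, str)
                  and value.strip() == str(sentinel)):
                new_row[name] = None
            else:
                new_row[name] = value
        cleaned.append(new_row)
    return cleaned


# Hypothetical sample data: two numeric attributes and one nominal attribute
rows = [
    {"att1": 999, "att2": 1999.0, "label": "999"},
    {"att1": 0.5, "att2": 999.0, "label": "yes"},
]
numeric_cleaned = replace_sentinel(rows)                   # numeric cells only
all_cleaned = replace_sentinel(rows, numeric_only=False)   # text cells as well
```

With `numeric_only=True` the nominal attribute is untouched, which corresponds to the "just for numeric values" case in the question; passing `numeric_only=False` also blanks text cells that consist solely of 999.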