What is the correct way to replace all instances of 999 in data with empty?

martynsmartyns Member Posts: 15 Maven
edited November 2018 in Help
I have a dataset where all missing values are represented by 999.

If I try to replace them with:
<operator name="FeatureIterator" class="FeatureIterator" expanded="yes">
    <parameter key="work_on_input" value="false"/>
    <operator name="Mapping" class="Mapping">
        <parameter key="attributes" value="%{loop_feature}"/>
        <list key="value_mappings">
        </list>
        <parameter key="replace_what" value="999"/>
        <parameter key="replace_by" value="?"/>
    </operator>
</operator>
then everything goes horribly wrong: when I send the data through a model, it fails badly on the nominal variables.

If I set the filter to numeric only, the order of the attribute list changes, which breaks model application and produces this warning:
May 28, 2009 10:12:33 AM: [Warning] W-J48: The order of attributes is not equal for the training and the application example set. This might lead to problems for some models.

A great helper on the list suggested leaving replace_by unset when replacing 999, but then nothing appears to happen at all.

I have unticked work_on_input because ticking it seems to pass the modified example set along to the model applier further down the process. Should I be ticking work_on_input and placing the model applier differently?

So, what is the correct way to replace all values of 999 in a dataset with empty or blank values?
And how about just for numeric values?

Thanks!

Answers

  • haddockhaddock Member Posts: 849 Maven
    G'Day!

    A bit long-winded, but this does it:

    <operator name="Root" class="Process" expanded="yes">
        <operator name="ExampleSetGenerator" class="ExampleSetGenerator">
            <parameter key="target_function" value="random"/>
        </operator>
        <operator name="SetData" class="SetData">
            <parameter key="attribute_name" value="att1"/>
            <parameter key="example_index" value="1"/>
            <parameter key="value" value="999"/>
        </operator>
        <operator name="Numerical2FormattedNominal" class="Numerical2FormattedNominal">
        </operator>
        <operator name="Replace" class="Replace">
            <parameter key="attributes" value=".*"/>
            <parameter key="replace_what" value="999"/>
        </operator>
        <operator name="NominalNumbers2Numerical" class="NominalNumbers2Numerical">
        </operator>
        <operator name="Numerical2Real" class="Numerical2Real">
        </operator>
    </operator>
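
To check the result you are after independently of RapidMiner, here is a minimal Python sketch of the intended outcome. It is not RapidMiner code and does not reproduce the operator chain above. The names `replace_sentinel`, `SENTINEL` and the sample rows are hypothetical and exist only for illustration. It assumes the data is a list of rows keyed by attribute name, marks numeric cells equal to 999 as missing (NaN), leaves nominal values alone, and keeps the attribute order unchanged.

```python
import math

SENTINEL = 999  # hypothetical name: the value that stands for "missing"


def replace_sentinel(rows, sentinel=SENTINEL):
    """Return copies of the rows with numeric sentinel cells set to NaN.

    Attribute order is preserved and nominal (string) cells are untouched.
    """
    cleaned = []
    for row in rows:
        new_row = {}
        for name, value in row.items():  # dicts keep insertion order
            is_number = isinstance(value, (int, float)) and not isinstance(value, bool)
            if is_number and value == sentinel:
                new_row[name] = math.nan  # mark as missing
            else:
                new_row[name] = value
        cleaned.append(new_row)
    return cleaned


# Made-up sample data: two numeric attributes and one nominal attribute.
rows = [
    {"att1": 999, "att2": 1.5, "label": "999"},
    {"att1": 0.25, "att2": 999.0, "label": "positive"},
]
cleaned = replace_sentinel(rows)
```

Because only numeric cells are compared against the sentinel, a nominal value that happens to read "999" is kept, and the attributes come out in the same order they went in, which is the property the attribute-order warning is about.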