
What is the correct way to replace all instances of 999 in data with empty?

martyns · Member · Posts: 15 · Maven
edited November 2018 in Help
I have a dataset where all missing values are represented by 999.

If I try to replace them with:
<operator name="FeatureIterator" class="FeatureIterator" expanded="yes">
    <parameter key="work_on_input" value="false"/>
    <!-- Applied to each attribute in turn via the loop macro -->
    <operator name="Mapping" class="Mapping">
        <parameter key="attributes" value="%{loop_feature}"/>
        <list key="value_mappings">
        </list>
        <parameter key="replace_what" value="999"/>
        <parameter key="replace_by" value="?"/>
    </operator>
</operator>
then everything goes horribly wrong: when I send the data through a model, it fails badly on the nominal attributes.

If I restrict the filter to numeric attributes only, the order of the attribute list changes, which breaks model application with this warning:
May 28, 2009 10:12:33 AM: [Warning] W-J48: The order of attributes is not equal for the training and the application example set. This might lead to problems for some models.
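The warning appears because the attributes reach the model in a different order than during training. As a minimal illustration in plain Python (not RapidMiner's API; `align_to_training_order` and the sample attributes are hypothetical), looking values up by attribute name rather than by position restores the training order:

```python
from typing import Dict, List, Sequence


def align_to_training_order(
    header: Sequence[str],
    rows: List[List[object]],
    training_order: Sequence[str],
) -> List[List[object]]:
    """Reorder each row's values to follow training_order (hypothetical helper)."""
    missing = [name for name in training_order if name not in header]
    if missing:
        raise ValueError(f"attributes missing at application time: {missing}")
    # Map each attribute name to its current column position.
    position: Dict[str, int] = {name: i for i, name in enumerate(header)}
    # Rebuild every row by name, in the order the model was trained on.
    return [[row[position[name]] for name in training_order] for row in rows]


# Filtering numeric attributes first moved "age" and "income" ahead of "colour".
training_order = ["colour", "age", "income"]
applied_header = ["age", "income", "colour"]
applied_rows = [[42.0, None, "red"], [None, 1200.0, "blue"]]

aligned = align_to_training_order(applied_header, applied_rows, training_order)
print(aligned)  # [['red', 42.0, None], ['blue', None, 1200.0]]
```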

A helpful person on the list suggested leaving replace_by empty for 999, but then nothing appears to happen at all.

I have unticked work_on_input, because with it unticked the modified example set seems to be passed along to the ModelApplier further down. Should I tick work_on_input instead and place the ModelApplier differently?

So, what is the correct way to replace all values of 999 in a dataset with empty or blank values?
And how about just for numeric values?
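The operation being asked for is a value-by-value substitution that never adds, drops or reorders attributes. A minimal Python sketch of that idea, assuming rows are plain lists and `None` stands for a missing value (`replace_sentinel` is a hypothetical helper, not a RapidMiner operator):

```python
from typing import List, Optional

Value = Optional[object]


def replace_sentinel(rows: List[List[Value]], sentinel: int = 999,
                     numeric_only: bool = False) -> List[List[Value]]:
    """Return a copy of rows with sentinel values replaced by None (missing).

    Each value is replaced where it sits, so the attribute order never
    changes. With numeric_only=True, nominal (string) values such as
    "999" are left untouched.
    """
    cleaned: List[List[Value]] = []
    for row in rows:
        new_row: List[Value] = []
        for value in row:
            is_number = isinstance(value, (int, float)) and not isinstance(value, bool)
            if is_number and value == sentinel:
                new_row.append(None)  # numeric 999 or 999.0 -> missing
            elif (not numeric_only and isinstance(value, str)
                  and value.strip() == str(sentinel)):
                new_row.append(None)  # nominal "999" -> missing (whole value only)
            else:
                new_row.append(value)
        cleaned.append(new_row)
    return cleaned


rows = [[999.0, "red", "999"], [1999.0, "999", 3.5]]
print(replace_sentinel(rows))
# [[None, 'red', None], [1999.0, None, 3.5]]
print(replace_sentinel(rows, numeric_only=True))
# [[None, 'red', '999'], [1999.0, '999', 3.5]]
```

Because positions are preserved, the cleaned rows keep exactly the attribute order the model was trained on; only whole values equal to 999 are touched, so 1999 survives.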



    haddock · Member · Posts: 849 · Maven

    A bit long-winded, but this does it:

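    <operator name="Root" class="Process" expanded="yes">
        <!-- Generate random test data and plant a 999 in att1 of example 1 -->
        <operator name="ExampleSetGenerator" class="ExampleSetGenerator">
            <parameter key="target_function" value="random"/>
        </operator>
        <operator name="SetData" class="SetData">
            <parameter key="attribute_name" value="att1"/>
            <parameter key="example_index" value="1"/>
            <parameter key="value" value="999"/>
        </operator>
        <!-- Turn the numbers into text so Replace can operate on them -->
        <operator name="Numerical2FormattedNominal" class="Numerical2FormattedNominal">
        </operator>
        <!-- No replace_by is given, so 999 is replaced by nothing -->
        <operator name="Replace" class="Replace">
            <parameter key="attributes" value=".*"/>
            <parameter key="replace_what" value="999"/>
        </operator>
        <!-- Convert the text back to numbers; the emptied entries end up missing -->
        <operator name="NominalNumbers2Numerical" class="NominalNumbers2Numerical">
        </operator>
        <operator name="Numerical2Real" class="Numerical2Real">
        </operator>
    </operator>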
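    The chain above works by round-tripping through text: numbers become nominal strings, the string "999" is replaced by nothing, and the strings are parsed back into numbers, where the emptied entries become missing values. A rough Python sketch of that round trip, assuming the `"g"` number format as a stand-in for whatever Numerical2FormattedNominal actually produces (`round_trip_replace` is a hypothetical name):

```python
import re
from typing import List, Optional


def round_trip_replace(values: List[float],
                       pattern: str = r"^999(\.0+)?$") -> List[Optional[float]]:
    """Mimic the numeric -> nominal -> replace -> numeric chain on one attribute."""
    # Step 1: format each number as text ("g" is an assumed stand-in format).
    as_text = [format(v, "g") for v in values]
    # Step 2: replace text matching the pattern with an empty string.
    replaced = [re.sub(pattern, "", t) for t in as_text]
    # Step 3: parse back to float; empty text becomes None (missing).
    return [float(t) if t else None for t in replaced]


values = [999.0, 1999.0, 0.5]
print(round_trip_replace(values))                 # [None, 1999.0, 0.5]
print(round_trip_replace(values, pattern="999"))  # [None, 1.0, 0.5]
```

    The second call shows a possible pitfall: if the replacement matches substrings, as an unanchored regular expression would, 1999 turns into 1. Whether RapidMiner's Replace behaves this way is worth checking in its operator documentation; an anchored pattern like the default above avoids the issue either way.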