"Introducing Missing Values"

cherokee Member Posts: 82  Guru
edited June 11 in Help
Hi!

How can I mark some value as missing?

I have some data where each feature (real-valued) of each instance has a value. Now I have to mark some of these values as missing. How can I do this in RM?

Best regards,
chero

Answers

  • land RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 2,525   Unicorn
    Hi,
    that depends on when you want to change the data: before importing it into RapidMiner, or inside a RapidMiner process?

    Greetings,
      Sebastian
  • cherokee Member Posts: 82  Guru
    Hi!

    I want to change it inside a RM process.

    Best regards,
    chero
  • land RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 2,525   Unicorn
    Hi,
    I would use the Generate Attributes operator for this. You can define conditions there to decide when a value should become unknown. Unfortunately there is no direct way to write a missing value in an expression, but you can enter 0/0 to define a value as missing. Here's a sample process:
    <?xml version="1.0" encoding="UTF-8" standalone="no"?>
    <process version="5.0">
      <context>
        <input>
          <location/>
        </input>
        <output>
          <location/>
          <location/>
        </output>
        <macros/>
      </context>
      <operator activated="true" class="process" expanded="true" name="Process">
        <parameter key="parallelize_main_process" value="true"/>
        <process expanded="true" height="595" width="366">
          <operator activated="true" class="generate_data" expanded="true" height="60" name="Generate Data" width="90" x="98" y="74"/>
          <operator activated="true" class="generate_attributes" expanded="true" height="76" name="Generate Attributes" width="90" x="246" y="75">
            <list key="function_descriptions">
              <parameter key="att1_new" value="if (att1 &gt; 3, 0/0, att1)"/>
            </list>
          </operator>
          <connect from_op="Generate Data" from_port="output" to_op="Generate Attributes" to_port="example set input"/>
          <connect from_op="Generate Attributes" from_port="example set output" to_port="result 1"/>
          <portSpacing port="source_input 1" spacing="0"/>
          <portSpacing port="sink_result 1" spacing="0"/>
          <portSpacing port="sink_result 2" spacing="0"/>
        </process>
      </operator>
    </process>
    Greetings,
      Sebastian
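
The logic of the expression `if (att1 > 3, 0/0, att1)` can also be sketched outside the process XML. The following is a minimal Java illustration, assuming that the expression parser treats 0/0 as a floating-point division that yields NaN, which is what the suggestion above relies on. The names `ConditionalMissing` and `maskAbove` are hypothetical and not part of RapidMiner.

```java
class ConditionalMissing {

    // Plain-Java counterpart of: if (att1 > 3, 0/0, att1)
    // Values above the threshold become NaN (the missing marker); others are kept.
    static double maskAbove(double value, double threshold) {
        return value > threshold ? Double.NaN : value;
    }

    public static void main(String[] args) {
        double[] att1 = {1.5, 3.0, 4.2};
        for (double v : att1) {
            System.out.println(v + " -> " + maskAbove(v, 3.0));
        }
    }
}
```

Note that in plain Java the integer expression `0 / 0` throws an ArithmeticException, while the floating-point `0.0 / 0.0` evaluates to NaN, so the sketch uses `Double.NaN` directly.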

  • cherokee Member Posts: 82  Guru
    Hi,

    this is not a solution to my problem. I use a program to encode specific data. This program doesn't allow marking data as missing; furthermore, there is no value I can use to mark a value as missing. The resulting data is written in a specific format.

    I wrote my own RM operator to import this data. Now I have to say something like this: attribute 1 of example 13 is missing. This can NOT be encoded in the data.

    The only solution I see is importing it into RM and exporting it to CSV (or similar). In that file I could give those attributes specific values, which I could then recode in RM the way you suggested. Is there an easier way?

    Best regards,
    chero
  • steffen Member Posts: 347  Guru
    Hello

    Since you have written your own input operator, setting missing values should be a piece of cake for you ;):

    // Inside your import operator, once you have the example to modify:
    Example example = <get-example-from-anywhere>;
    // RapidMiner stores a missing value as Double.NaN
    example.setValue(<attribute>, Double.NaN);
    Did you try that? No guarantee that it works, as I am currently not able to test it.

    steffen
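
One detail worth keeping in mind with this approach: once a value is stored as NaN, it cannot be found again with `==`, because NaN is not equal to itself. Below is a small self-contained sketch of marking and detecting missing cells. It uses a plain `double[][]` as a stand-in for the example table, and the class and helper names are made up for illustration; none of this is RapidMiner API.

```java
class MissingValueSketch {

    // Mark one cell (example = row, attribute = column) as missing.
    static void markMissing(double[][] data, int row, int col) {
        data[row][col] = Double.NaN;
    }

    // NaN != NaN, so a missing cell must be tested with Double.isNaN.
    static boolean isMissing(double[][] data, int row, int col) {
        return Double.isNaN(data[row][col]);
    }

    // Count all missing cells in the table.
    static int countMissing(double[][] data) {
        int count = 0;
        for (double[] row : data) {
            for (double value : row) {
                if (Double.isNaN(value)) {
                    count++;
                }
            }
        }
        return count;
    }

    public static void main(String[] args) {
        double[][] data = {
            {0.1, 0.2},
            {0.3, 0.4},
            {0.5, 0.6}
        };
        markMissing(data, 2, 0); // first attribute of the third example
        System.out.println("missing at (2,0): " + isMissing(data, 2, 0));
        System.out.println("equals itself:    " + (data[2][0] == data[2][0]));
        System.out.println("missing cells:    " + countMissing(data));
    }
}
```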
  • cherokee Member Posts: 82  Guru
    Hi!

    I haven't tried it yet, but it seems to be the easiest solution. I'll give it a try in the next few days.

    Best regards,
    chero