RapidMiner

Impute Missing Values by KNN

Contributor II

Hi Experts,

I walked through the tutorial for the 'Impute Missing Values' operator. The tutorial process uses a k-NN scheme, with the "iterate" and "learn on complete cases" parameters ticked. Is k-NN the default scheme this operator uses for imputation?

Thanks,

Derek

2 REPLIES
Elite III

Re: Impute Missing Values by KNN

Hi Derek,

For the tutorial process, k-NN with its default of k = 1 is useful because it simply copies the value from the nearest record (as determined by a distance measure) into the record with the missing value. It's a pretty logical choice for a default.
However, you are not limited to k-NN. Here is an example that uses a Decision Tree for nominal attributes and a Neural Net for numerical attributes, alongside a separate k-NN branch.

<?xml version="1.0" encoding="UTF-8"?><process version="7.5.001">
  <context>
    <input/>
    <output/>
    <macros/>
  </context>
  <operator activated="true" class="process" compatibility="6.0.002" expanded="true" name="Process">
    <process expanded="true">
      <operator activated="true" class="retrieve" compatibility="7.5.001" expanded="true" height="68" name="Labor-Negotiations" width="90" x="112" y="85">
        <parameter key="repository_entry" value="//Samples/data/Labor-Negotiations"/>
      </operator>
      <operator activated="true" class="multiply" compatibility="7.5.001" expanded="true" height="103" name="Multiply" width="90" x="246" y="85"/>
      <operator activated="true" class="materialize_data" compatibility="7.5.001" expanded="true" height="82" name="DT then NN" width="90" x="380" y="34"/>
      <operator activated="true" class="materialize_data" compatibility="7.5.001" expanded="true" height="82" name="kNN" width="90" x="447" y="187"/>
      <operator activated="true" class="impute_missing_values" compatibility="7.3.001" expanded="true" height="68" name="Impute Missing Values" width="90" x="514" y="34">
        <parameter key="attribute_filter_type" value="value_type"/>
        <parameter key="value_type" value="nominal"/>
        <process expanded="true">
          <operator activated="true" class="concurrency:parallel_decision_tree" compatibility="7.5.001" expanded="true" height="82" name="Decision Tree" width="90" x="380" y="34"/>
          <connect from_port="example set source" to_op="Decision Tree" to_port="training set"/>
          <connect from_op="Decision Tree" from_port="model" to_port="model sink"/>
          <portSpacing port="source_example set source" spacing="0"/>
          <portSpacing port="sink_model sink" spacing="0"/>
        </process>
      </operator>
      <operator activated="true" class="impute_missing_values" compatibility="7.3.001" expanded="true" height="68" name="Impute Missing Values (2)" width="90" x="648" y="34">
        <parameter key="attribute_filter_type" value="value_type"/>
        <parameter key="value_type" value="numeric"/>
        <process expanded="true">
          <operator activated="true" class="neural_net" compatibility="7.5.001" expanded="true" height="82" name="Neural Net" width="90" x="179" y="34">
            <list key="hidden_layers"/>
          </operator>
          <connect from_port="example set source" to_op="Neural Net" to_port="training set"/>
          <connect from_op="Neural Net" from_port="model" to_port="model sink"/>
          <portSpacing port="source_example set source" spacing="0"/>
          <portSpacing port="sink_model sink" spacing="0"/>
        </process>
      </operator>
      <operator activated="true" class="impute_missing_values" compatibility="7.3.001" expanded="true" height="68" name="Impute Missing Values (3)" width="90" x="581" y="187">
        <parameter key="value_type" value="nominal"/>
        <process expanded="true">
          <operator activated="true" class="k_nn" compatibility="7.5.001" expanded="true" height="82" name="k-NN" width="90" x="112" y="34"/>
          <connect from_port="example set source" to_op="k-NN" to_port="training set"/>
          <connect from_op="k-NN" from_port="model" to_port="model sink"/>
          <portSpacing port="source_example set source" spacing="0"/>
          <portSpacing port="sink_model sink" spacing="0"/>
        </process>
      </operator>
      <connect from_op="Labor-Negotiations" from_port="output" to_op="Multiply" to_port="input"/>
      <connect from_op="Multiply" from_port="output 1" to_op="DT then NN" to_port="example set input"/>
      <connect from_op="Multiply" from_port="output 2" to_op="kNN" to_port="example set input"/>
      <connect from_op="DT then NN" from_port="example set output" to_op="Impute Missing Values" to_port="example set in"/>
      <connect from_op="kNN" from_port="example set output" to_op="Impute Missing Values (3)" to_port="example set in"/>
      <connect from_op="Impute Missing Values" from_port="example set out" to_op="Impute Missing Values (2)" to_port="example set in"/>
      <connect from_op="Impute Missing Values (2)" from_port="example set out" to_port="result 1"/>
      <connect from_op="Impute Missing Values (3)" from_port="example set out" to_port="result 2"/>
      <portSpacing port="source_input 1" spacing="0"/>
      <portSpacing port="sink_result 1" spacing="0"/>
      <portSpacing port="sink_result 2" spacing="0"/>
      <portSpacing port="sink_result 3" spacing="0"/>
    </process>
  </operator>
</process>
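
The settings from the question ("iterate", "learn on complete cases") and the k = 1 default mentioned above are not visible in the process XML, since parameters left at their defaults are not written out. As a rough sketch, the k-NN branch could be spelled out explicitly as below. The parameter keys `iterate`, `learn_on_complete_cases` and `k` are assumptions written from memory and do not appear in the process above, so please check them against the operator's Parameters panel before relying on them.

```xml
<!-- Sketch only: drop-in variant of "Impute Missing Values (3)" from the process above. -->
<!-- The three parameter keys below are assumed names, not taken from the original process. -->
<operator activated="true" class="impute_missing_values" compatibility="7.3.001" expanded="true" height="68" name="Impute Missing Values (3)" width="90" x="581" y="187">
  <!-- assumed key for the "iterate" checkbox -->
  <parameter key="iterate" value="true"/>
  <!-- assumed key for the "learn on complete cases" checkbox -->
  <parameter key="learn_on_complete_cases" value="true"/>
  <process expanded="true">
    <!-- the learner placed here decides the imputation scheme -->
    <operator activated="true" class="k_nn" compatibility="7.5.001" expanded="true" height="82" name="k-NN" width="90" x="112" y="34">
      <!-- assumed key; k = 1 copies the value of the single nearest record -->
      <parameter key="k" value="1"/>
    </operator>
    <connect from_port="example set source" to_op="k-NN" to_port="training set"/>
    <connect from_op="k-NN" from_port="model" to_port="model sink"/>
    <portSpacing port="source_example set source" spacing="0"/>
    <portSpacing port="sink_model sink" spacing="0"/>
  </process>
</operator>
```

The operator name is kept as "Impute Missing Values (3)" so that the existing connections in the process still resolve. Swapping the inner `k_nn` operator for another learner changes the scheme, as the Decision Tree and Neural Net branches show.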

-- Training, Consulting, Sales in China, Hong Kong & Taiwan --
www.RapidMinerChina.com
Contributor II

Re: Impute Missing Values by KNN

When I tried to apply this operator with either a Decision Tree or k-NN as the learner, I got the same error message in both cases: "Missing attributes: Input ExampleSet has no attributes. Learning schemes cannot be applied without at least one valide attribute." Did I miss something when setting up these algorithms?

Thanks,
Derek