RapidMiner

Delete Examples with 2 missing attributes

SOLVED
Contributor I t_liebe

Hello,

 

I know several ways to delete missing values in a column. However, I only want to remove the Examples in which both attributes are missing:

 

Size    Item 1    Item 2
1       ?         milk
2       cookie    milk
2       ?         ?
2       cookie    chocolate
2       cookie    crackers
2       cookie    ?
2       cookie    raspberries

 

After that, I would like to combine the two tables to find out how often cookies and milk occur together, both as a percentage and as an absolute frequency.

How can I use FP-Growth for this?

 

Thank you in advance!

<?xml version="1.0" encoding="UTF-8"?><process version="9.0.002">
  <context>
    <input/>
    <output/>
    <macros/>
  </context>
  <operator activated="true" class="process" compatibility="9.0.002" expanded="true" name="Process">
    <process expanded="true">
      <operator activated="true" class="read_excel" compatibility="9.0.002" expanded="true" height="68" name="Read Excel" width="90" x="112" y="136">
        <parameter key="excel_file" value="\\ADS.DLH.DE\LHuser$\LHT\HAM98\U717465\Documents\02_Data_Mining\01_rapidminer\closed_events_q-star.xlsx"/>
        <list key="annotations"/>
        <parameter key="date_format" value="MMM d, yyyy h:mm:ss a z"/>
        <list key="data_set_meta_data_information">
          <parameter key="0" value="Event ID.true.polynominal.attribute"/>
          <parameter key="1" value="Event Title.true.polynominal.attribute"/>
          <parameter key="2" value="Event Description.true.polynominal.attribute"/>
          <parameter key="3" value="Event resp\. dept\..true.polynominal.attribute"/>
          <parameter key="4" value="Risk Level.true.polynominal.attribute"/>
          <parameter key="5" value="Severity Level.true.polynominal.attribute"/>
          <parameter key="6" value="Severity Driver.true.polynominal.attribute"/>
          <parameter key="7" value="Closed date event.true.date_time.attribute"/>
          <parameter key="8" value="Total Event Time.true.integer.attribute"/>
          <parameter key="9" value="Total Investigation Time.true.integer.attribute"/>
          <parameter key="10" value="Total Implement\. Time.true.integer.attribute"/>
          <parameter key="11" value="Resp\. for coordination.true.polynominal.attribute"/>
          <parameter key="12" value="Resp\. for investigation.true.polynominal.attribute"/>
          <parameter key="13" value="Source.true.polynominal.attribute"/>
          <parameter key="14" value="Event type.true.polynominal.attribute"/>
          <parameter key="15" value="Investigation type.true.polynominal.attribute"/>
          <parameter key="16" value="Related requirements.true.polynominal.attribute"/>
          <parameter key="17" value="CNQ.true.integer.attribute"/>
          <parameter key="18" value="A/C Reg.true.polynominal.attribute"/>
          <parameter key="19" value="Engine type.true.polynominal.attribute"/>
          <parameter key="20" value="PNR.true.polynominal.attribute"/>
          <parameter key="21" value="Customer/ Operator.true.polynominal.attribute"/>
          <parameter key="22" value="MOR relevant.true.polynominal.attribute"/>
          <parameter key="23" value="Repetitive Event.true.polynominal.attribute"/>
          <parameter key="24" value="Reason for no or discont\. Investigation.true.polynominal.attribute"/>
          <parameter key="25" value="Implemented CA/PA.true.polynominal.attribute"/>
          <parameter key="26" value="Implemented Correction.true.polynominal.attribute"/>
          <parameter key="27" value="Date of report.true.date_time.attribute"/>
          <parameter key="28" value="Coordination closed date.true.date_time.attribute"/>
          <parameter key="29" value="Investigation closed date.true.date_time.attribute"/>
        </list>
        <parameter key="read_not_matching_values_as_missings" value="false"/>
      </operator>
      <operator activated="true" class="select_attributes" compatibility="9.0.002" expanded="true" height="82" name="Select Attributes (2)" width="90" x="313" y="136">
        <parameter key="attribute_filter_type" value="subset"/>
        <parameter key="attributes" value="Risk Level|Severity Level"/>
      </operator>
      <connect from_op="Read Excel" from_port="output" to_op="Select Attributes (2)" to_port="example set input"/>
      <connect from_op="Select Attributes (2)" from_port="example set output" to_port="result 1"/>
      <portSpacing port="source_input 1" spacing="0"/>
      <portSpacing port="sink_result 1" spacing="0"/>
      <portSpacing port="sink_result 2" spacing="0"/>
    </process>
  </operator>
</process>
2 REPLIES
Unicorn
Solution

Re: Delete Examples with 2 missing attributes

Hello, @t_liebe,

 

Use the Filter Examples operator with the following configuration:

 

[Screenshot of the Filter Examples configuration: Screen Shot 2018-10-08 at 06.32.06.png]

Notice the Match all option at the bottom left. You must select it, because it combines the conditions with AND. Otherwise, the filter will also match records in which only one of the two attributes meets its condition.
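
To make the difference between AND and OR concrete, here is a minimal Python sketch of the same logic outside RapidMiner. It is not the operator's implementation, and since the screenshot is not reproduced here, the exact conditions are an assumption ("is missing" on both Item 1 and Item 2). The `rows` list and the helper names `all_missing` and `any_missing` are hypothetical stand-ins built from the example table in the question.

```python
# Hypothetical stand-in for the example table; None marks a missing value ("?").
rows = [
    {"Size": 1, "Item 1": None, "Item 2": "milk"},
    {"Size": 2, "Item 1": "cookie", "Item 2": "milk"},
    {"Size": 2, "Item 1": None, "Item 2": None},
    {"Size": 2, "Item 1": "cookie", "Item 2": "chocolate"},
    {"Size": 2, "Item 1": "cookie", "Item 2": "crackers"},
    {"Size": 2, "Item 1": "cookie", "Item 2": None},
    {"Size": 2, "Item 1": "cookie", "Item 2": "raspberries"},
]


def all_missing(row, attributes):
    """AND logic (like Match all): true only if every listed attribute is missing."""
    return all(row[a] is None for a in attributes)


def any_missing(row, attributes):
    """OR logic (like Match any): true if at least one listed attribute is missing."""
    return any(row[a] is None for a in attributes)


attributes = ["Item 1", "Item 2"]

# Drop only the rows where BOTH attributes are missing (what the question asks for).
kept_with_and = [r for r in rows if not all_missing(r, attributes)]

# For contrast: OR logic also drops rows with a single missing attribute.
kept_with_or = [r for r in rows if not any_missing(r, attributes)]

print(len(kept_with_and), "rows kept with AND logic")
print(len(kept_with_or), "rows kept with OR logic")
```

With AND logic only the row where both items are missing is dropped, while OR logic would also drop the rows with a single missing item.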

Rodrigo Fuentealba Cartes
Senior Software Developer & Data Scientist at The Pegasus Group Company S. A. - Chile
https://www.pegasus.cl/
Contributor I t_liebe

Re: Delete Examples with 2 missing attributes

Thank you for your quick answer! :)