How to compare data before and after missing values handling?

EdisonLee Member Posts: 3 Contributor I
edited July 25 in Help

Dear everyone, 

 

I'm learning RapidMiner using an NBA dataset from data.world. I noticed that there are missing values in the 3P% column. I isolated these 11 rows by selecting missing_attributes in the top right. 

[Screenshot 2018-02-24 14.29.50.png]

 

So I used Replace Missing Values to set the missing values to 0. The process ran successfully, but what I want to know is: how can I show only these 11 rows after replacing the missing values with 0? After the replacement, I can no longer filter the data by selecting missing_attributes. 

 

Can anyone help me with this? I've been stuck for several days. Do I need to change my process, or is there another solution? 

 

My process: 

<?xml version="1.0" encoding="UTF-8"?><process version="8.0.001">
<operator activated="true" class="retrieve" compatibility="8.0.001" expanded="true" height="68" name="Retrieve" width="90" x="45" y="34">
<parameter key="repository_entry" value="//PredictNBARookie/Data/nba_logreg"/>
</operator>
</process>
<?xml version="1.0" encoding="UTF-8"?><process version="8.0.001">
<operator activated="true" class="replace_missing_values" compatibility="8.0.001" expanded="true" height="103" name="Replace Missing Values" width="90" x="179" y="34">
<parameter key="return_preprocessing_model" value="false"/>
<parameter key="create_view" value="false"/>
<parameter key="attribute_filter_type" value="single"/>
<parameter key="attribute" value="3P%"/>
<parameter key="attributes" value=""/>
<parameter key="use_except_expression" value="false"/>
<parameter key="value_type" value="attribute_value"/>
<parameter key="use_value_type_exception" value="false"/>
<parameter key="except_value_type" value="time"/>
<parameter key="block_type" value="attribute_block"/>
<parameter key="use_block_type_exception" value="false"/>
<parameter key="except_block_type" value="value_matrix_row_start"/>
<parameter key="invert_selection" value="false"/>
<parameter key="include_special_attributes" value="false"/>
<parameter key="default" value="zero"/>
<list key="columns"/>
</operator>
</process>

Thanks in advance!

Best, Lee

Best Answer

  • lionelderkrikor Posts: 757   Unicorn
    Solution Accepted

    Hi @EdisonLee,

     

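    I took a copy of your 3P% attribute before the replacement, using the original output port of Replace Missing Values and the Generate Attributes operator, and named it 3P%_back_up. I then gave both example sets an ID with Generate ID and used the Join operator to attach the backup attribute to your cleaned dataset. Filtering on missing values in 3P%_back_up then shows exactly the rows that were replaced.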

    Here are the results after filtering:

    [NBA_missing_value.png]

    You can find the process here:

    <?xml version="1.0" encoding="UTF-8"?><process version="8.1.000">
    <context>
    <input/>
    <output/>
    <macros/>
    </context>
    <operator activated="true" class="process" compatibility="8.1.000" expanded="true" name="Process">
    <process expanded="true">
    <operator activated="true" class="read_csv" compatibility="8.1.000" expanded="true" height="68" name="Read CSV" width="90" x="112" y="34">
    <parameter key="csv_file" value="C:\Users\Lionel\Documents\Formations_DataScience\Rapidminer\Tests_Rapidminer\NBA_missing_values\nba_logreg.csv"/>
    <parameter key="column_separators" value=","/>
    <parameter key="first_row_as_names" value="false"/>
    <list key="annotations">
    <parameter key="0" value="Name"/>
    </list>
    <parameter key="encoding" value="windows-1252"/>
    <list key="data_set_meta_data_information">
    <parameter key="0" value="Name.true.polynominal.attribute"/>
    <parameter key="1" value="GP.true.integer.attribute"/>
    <parameter key="2" value="MIN.true.real.attribute"/>
    <parameter key="3" value="PTS.true.real.attribute"/>
    <parameter key="4" value="FGM.true.real.attribute"/>
    <parameter key="5" value="FGA.true.real.attribute"/>
    <parameter key="6" value="FG%.true.real.attribute"/>
    <parameter key="7" value="3P Made.true.real.attribute"/>
    <parameter key="8" value="3PA.true.real.attribute"/>
    <parameter key="9" value="3P%.true.real.attribute"/>
    <parameter key="10" value="FTM.true.real.attribute"/>
    <parameter key="11" value="FTA.true.real.attribute"/>
    <parameter key="12" value="FT%.true.real.attribute"/>
    <parameter key="13" value="OREB.true.real.attribute"/>
    <parameter key="14" value="DREB.true.real.attribute"/>
    <parameter key="15" value="REB.true.real.attribute"/>
    <parameter key="16" value="AST.true.real.attribute"/>
    <parameter key="17" value="STL.true.real.attribute"/>
    <parameter key="18" value="BLK.true.real.attribute"/>
    <parameter key="19" value="TOV.true.real.attribute"/>
    <parameter key="20" value="TARGET_5Yrs.true.real.attribute"/>
    </list>
    </operator>
    <operator activated="true" class="replace_missing_values" compatibility="8.1.000" expanded="true" height="103" name="Replace Missing Values" width="90" x="246" y="34">
    <list key="columns">
    <parameter key="3P%" value="zero"/>
    </list>
    </operator>
    <operator activated="true" class="generate_id" compatibility="8.1.000" expanded="true" height="82" name="Generate ID" width="90" x="648" y="34"/>
    <operator activated="true" class="generate_attributes" compatibility="8.1.000" expanded="true" height="82" name="Generate Attributes" width="90" x="447" y="187">
    <list key="function_descriptions">
    <parameter key="3P%_back_up" value="[3P%]"/>
    </list>
    </operator>
    <operator activated="true" class="select_attributes" compatibility="8.1.000" expanded="true" height="82" name="Select Attributes" width="90" x="581" y="187">
    <parameter key="attribute_filter_type" value="single"/>
    <parameter key="attribute" value="3P%_back_up"/>
    </operator>
    <operator activated="true" class="generate_id" compatibility="8.1.000" expanded="true" height="82" name="Generate ID (2)" width="90" x="715" y="187"/>
    <operator activated="true" class="concurrency:join" compatibility="8.1.000" expanded="true" height="82" name="Join" width="90" x="849" y="34">
    <list key="key_attributes"/>
    </operator>
    <connect from_op="Read CSV" from_port="output" to_op="Replace Missing Values" to_port="example set input"/>
    <connect from_op="Replace Missing Values" from_port="example set output" to_op="Generate ID" to_port="example set input"/>
    <connect from_op="Replace Missing Values" from_port="original" to_op="Generate Attributes" to_port="example set input"/>
    <connect from_op="Generate ID" from_port="example set output" to_op="Join" to_port="left"/>
    <connect from_op="Generate Attributes" from_port="example set output" to_op="Select Attributes" to_port="example set input"/>
    <connect from_op="Select Attributes" from_port="example set output" to_op="Generate ID (2)" to_port="example set input"/>
    <connect from_op="Generate ID (2)" from_port="example set output" to_op="Join" to_port="right"/>
    <connect from_op="Join" from_port="join" to_port="result 1"/>
    <portSpacing port="source_input 1" spacing="0"/>
    <portSpacing port="sink_result 1" spacing="0"/>
    <portSpacing port="sink_result 2" spacing="0"/>
    </process>
    </operator>
    </process>
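
    For readers who want to see the logic outside RapidMiner, here is a minimal plain-Python sketch of the same idea: keep a backup of the column, replace the missing values, then filter on the backup. It is not RapidMiner code. The helper names, the toy rows, and the use of None for a missing value are all assumptions made for illustration, and the copy-and-join steps are collapsed into a single pass.

```python
# Illustrative sketch only: plain Python, not RapidMiner.
# The helper names and the toy rows below are hypothetical.

def add_backup_and_replace(rows, column, fill_value=0.0):
    """Return new rows where missing values (None) in `column` are replaced
    by `fill_value`. Each new row also gets an `id` and a
    `<column>_back_up` field that keeps the value from before replacement."""
    result = []
    for row_id, row in enumerate(rows, start=1):
        original = row[column]
        new_row = dict(row)                       # do not modify the input row
        new_row["id"] = row_id
        new_row[column + "_back_up"] = original   # copy taken before replacement
        new_row[column] = fill_value if original is None else original
        result.append(new_row)
    return result


def rows_that_were_missing(rows, column):
    """Keep only the rows whose backup value is missing."""
    return [row for row in rows if row[column + "_back_up"] is None]


# Hypothetical toy data; None stands for a missing value.
players = [
    {"Name": "Player A", "3P%": 25.0},
    {"Name": "Player B", "3P%": None},
    {"Name": "Player C", "3P%": 0.0},
    {"Name": "Player D", "3P%": None},
]

cleaned = add_backup_and_replace(players, "3P%")
replaced = rows_that_were_missing(cleaned, "3P%")
for row in replaced:
    print(row["id"], row["Name"], row["3P%"], row["3P%_back_up"])
```

    Filtering on the backup rather than on the value 0 matters here: a player with a genuine 3P% of 0 (Player C in the toy data) is not mistaken for a replaced row.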

    Does this process meet your needs?

     

    Regards,

     

    Lionel

Answers

  • EdisonLee Member Posts: 3 Contributor I

    Hi @lionelderkrikor

     

    Thank you for helping me. This is a very nice way to achieve my goal, and I can easily understand how you did it. However, I couldn't get your process to run on my computer. How should I connect the operators? 

    [Screenshot 2018-02-24 19.37.03.png]

     

    Thanks, 

    Lee

     

     

  • lionelderkrikor Moderator, RapidMiner Certified Analyst, Member Posts: 757   Unicorn

    Hi @EdisonLee,

     

    That's odd. It seems that RapidMiner considers the Join operator "deprecated".

    Try the following steps:

     - Delete this Join operator.

     - Search for the Join operator using the operator search box.

     - Drag and drop the Join operator into the process window.

     - Manually connect the Join operator to the two Generate ID operators.

     

    I hope it helps,

     

    Best regards,

     

    Lionel

     

     

  • EdisonLee Member Posts: 3 Contributor I

    Dear @lionelderkrikor

     

    The process worked after I followed your instructions. Your solution really answers my question. Thanks again for showing me a different way to approach data processing in RapidMiner. :)

     

    Best Regards, 

    Lee
