How to compare data before and after missing values handling?

EdisonLee Member Posts: 3 Contributor I
edited July 2019 in Help

Dear everyone, 

 

I'm learning RapidMiner using an NBA dataset from data.world. I noticed that there are missing values in the 3P% column. I isolated these 11 rows by clicking missing_attributes in the top right.

[Image: Screenshot 2018-02-24 14.29.50.png]

 

So I used Replace Missing Values to set the missing values to 0. The process ran successfully, but what I want to know is: how can I show only these 11 rows after replacing the missing values with 0? After the replacement, I can no longer filter the data by selecting missing_attributes.

 

Can anyone help me with this? I've been stuck for several days. Do I need to change my process, or is there another solution?

 

My process: 

<?xml version="1.0" encoding="UTF-8"?><process version="8.0.001">
<operator activated="true" class="retrieve" compatibility="8.0.001" expanded="true" height="68" name="Retrieve" width="90" x="45" y="34">
<parameter key="repository_entry" value="//PredictNBARookie/Data/nba_logreg"/>
</operator>
</process>
<?xml version="1.0" encoding="UTF-8"?><process version="8.0.001">
<operator activated="true" class="replace_missing_values" compatibility="8.0.001" expanded="true" height="103" name="Replace Missing Values" width="90" x="179" y="34">
<parameter key="return_preprocessing_model" value="false"/>
<parameter key="create_view" value="false"/>
<parameter key="attribute_filter_type" value="single"/>
<parameter key="attribute" value="3P%"/>
<parameter key="attributes" value=""/>
<parameter key="use_except_expression" value="false"/>
<parameter key="value_type" value="attribute_value"/>
<parameter key="use_value_type_exception" value="false"/>
<parameter key="except_value_type" value="time"/>
<parameter key="block_type" value="attribute_block"/>
<parameter key="use_block_type_exception" value="false"/>
<parameter key="except_block_type" value="value_matrix_row_start"/>
<parameter key="invert_selection" value="false"/>
<parameter key="include_special_attributes" value="false"/>
<parameter key="default" value="zero"/>
<list key="columns"/>
</operator>
</process>

Thanks in advance!

Best, Lee

Best Answer

  • lionelderkrikor Posts: 1,051 Unicorn
    Solution Accepted

    Hi @EdisonLee,

     

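    I used the Generate Attributes operator to create a copy of your attribute 3P%, named 3P%_back_up, from the original data (before the missing values are replaced).

    Then I used the Join operator to join this new attribute to your dataset, with an ID generated on both branches so that the rows can be matched.

    Here are the results after filtering: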

    [Image: NBA_missing_value.png]
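For readers who want to see the logic outside RapidMiner, here is a minimal sketch of the same "back up, replace, join, filter" idea in plain Python. It is only an illustration, not the RapidMiner process itself: the helper names (`add_ids`, `replace_missing`, `backup_column`, `join_on_id`) and the four sample rows are hypothetical, and `None` stands in for a missing value.

```python
# Illustrative sketch only: helper names and sample rows are made up.
# None represents a missing value.

def add_ids(rows):
    """Attach a 1-based row id, similar in spirit to a Generate ID step."""
    return [{"id": i, **row} for i, row in enumerate(rows, start=1)]

def replace_missing(rows, column, default=0.0):
    """Return copies of the rows with missing values in `column` replaced."""
    return [
        {**row, column: default if row[column] is None else row[column]}
        for row in rows
    ]

def backup_column(rows, column, backup_name):
    """Keep only a copy of `column` from the untouched data, under a new name."""
    return [{backup_name: row[column]} for row in rows]

def join_on_id(left, right):
    """Inner join of two id-tagged row lists on the shared "id" key."""
    right_by_id = {row["id"]: row for row in right}
    return [
        {**l_row, **right_by_id[l_row["id"]]}
        for l_row in left
        if l_row["id"] in right_by_id
    ]

original = [
    {"Name": "Player A", "3P%": 25.0},
    {"Name": "Player B", "3P%": None},
    {"Name": "Player C", "3P%": 0.0},   # a genuine zero, not a missing value
    {"Name": "Player D", "3P%": None},
]

# Branch 1: cleaned data. Branch 2: backup of the original column.
cleaned = add_ids(replace_missing(original, "3P%"))
backup = add_ids(backup_column(original, "3P%", "3P%_back_up"))
joined = join_on_id(cleaned, backup)

# Filter on the backup column to recover the rows that were originally missing.
was_missing = [row for row in joined if row["3P%_back_up"] is None]
for row in was_missing:
    print(row["Name"], row["3P%"])
```

Filtering on the backup column rather than on `3P% == 0` matters here: in this made-up sample, Player C has a genuine 0, so a value-based filter would not be able to tell it apart from the replaced rows.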

    You can find the process here:

    <?xml version="1.0" encoding="UTF-8"?><process version="8.1.000">
    <context>
    <input/>
    <output/>
    <macros/>
    </context>
    <operator activated="true" class="process" compatibility="8.1.000" expanded="true" name="Process">
    <process expanded="true">
    <operator activated="true" class="read_csv" compatibility="8.1.000" expanded="true" height="68" name="Read CSV" width="90" x="112" y="34">
    <parameter key="csv_file" value="C:\Users\Lionel\Documents\Formations_DataScience\Rapidminer\Tests_Rapidminer\NBA_missing_values\nba_logreg.csv"/>
    <parameter key="column_separators" value=","/>
    <parameter key="first_row_as_names" value="false"/>
    <list key="annotations">
    <parameter key="0" value="Name"/>
    </list>
    <parameter key="encoding" value="windows-1252"/>
    <list key="data_set_meta_data_information">
    <parameter key="0" value="Name.true.polynominal.attribute"/>
    <parameter key="1" value="GP.true.integer.attribute"/>
    <parameter key="2" value="MIN.true.real.attribute"/>
    <parameter key="3" value="PTS.true.real.attribute"/>
    <parameter key="4" value="FGM.true.real.attribute"/>
    <parameter key="5" value="FGA.true.real.attribute"/>
    <parameter key="6" value="FG%.true.real.attribute"/>
    <parameter key="7" value="3P Made.true.real.attribute"/>
    <parameter key="8" value="3PA.true.real.attribute"/>
    <parameter key="9" value="3P%.true.real.attribute"/>
    <parameter key="10" value="FTM.true.real.attribute"/>
    <parameter key="11" value="FTA.true.real.attribute"/>
    <parameter key="12" value="FT%.true.real.attribute"/>
    <parameter key="13" value="OREB.true.real.attribute"/>
    <parameter key="14" value="DREB.true.real.attribute"/>
    <parameter key="15" value="REB.true.real.attribute"/>
    <parameter key="16" value="AST.true.real.attribute"/>
    <parameter key="17" value="STL.true.real.attribute"/>
    <parameter key="18" value="BLK.true.real.attribute"/>
    <parameter key="19" value="TOV.true.real.attribute"/>
    <parameter key="20" value="TARGET_5Yrs.true.real.attribute"/>
    </list>
    </operator>
    <operator activated="true" class="replace_missing_values" compatibility="8.1.000" expanded="true" height="103" name="Replace Missing Values" width="90" x="246" y="34">
    <list key="columns">
    <parameter key="3P%" value="zero"/>
    </list>
    </operator>
    <operator activated="true" class="generate_id" compatibility="8.1.000" expanded="true" height="82" name="Generate ID" width="90" x="648" y="34"/>
    <operator activated="true" class="generate_attributes" compatibility="8.1.000" expanded="true" height="82" name="Generate Attributes" width="90" x="447" y="187">
    <list key="function_descriptions">
    <parameter key="3P%_back_up" value="[3P%]"/>
    </list>
    </operator>
    <operator activated="true" class="select_attributes" compatibility="8.1.000" expanded="true" height="82" name="Select Attributes" width="90" x="581" y="187">
    <parameter key="attribute_filter_type" value="single"/>
    <parameter key="attribute" value="3P%_back_up"/>
    </operator>
    <operator activated="true" class="generate_id" compatibility="8.1.000" expanded="true" height="82" name="Generate ID (2)" width="90" x="715" y="187"/>
    <operator activated="true" class="concurrency:join" compatibility="8.1.000" expanded="true" height="82" name="Join" width="90" x="849" y="34">
    <list key="key_attributes"/>
    </operator>
    <connect from_op="Read CSV" from_port="output" to_op="Replace Missing Values" to_port="example set input"/>
    <connect from_op="Replace Missing Values" from_port="example set output" to_op="Generate ID" to_port="example set input"/>
    <connect from_op="Replace Missing Values" from_port="original" to_op="Generate Attributes" to_port="example set input"/>
    <connect from_op="Generate ID" from_port="example set output" to_op="Join" to_port="left"/>
    <connect from_op="Generate Attributes" from_port="example set output" to_op="Select Attributes" to_port="example set input"/>
    <connect from_op="Select Attributes" from_port="example set output" to_op="Generate ID (2)" to_port="example set input"/>
    <connect from_op="Generate ID (2)" from_port="example set output" to_op="Join" to_port="right"/>
    <connect from_op="Join" from_port="join" to_port="result 1"/>
    <portSpacing port="source_input 1" spacing="0"/>
    <portSpacing port="sink_result 1" spacing="0"/>
    <portSpacing port="sink_result 2" spacing="0"/>
    </process>
    </operator>
    </process>

    Does this process meet your needs?

     

    Regards,

     

    Lionel

Answers

  • EdisonLee Member Posts: 3 Contributor I

    Hi @lionelderkrikor

     

    Thank you for helping me. This is a very nice way to achieve my goal, and I can easily understand how you did it. But I don't know why I can't get your process to run on my computer. How should I connect the operators?

    [Image: Screenshot 2018-02-24 19.37.03.png]

     

    Thanks, 

    Lee

     

     

  • lionelderkrikor Moderator, RapidMiner Certified Analyst, Member Posts: 1,051 Unicorn

    Hi @EdisonLee,

     

    That's strange: it seems that the Join operator is considered "deprecated" by RapidMiner.

    Try the following steps:

     - Delete this Join operator.

     - Search for the Join operator using the operator search box.

     - Drag and drop the Join operator into the process window.

     - Manually connect the Join operator to the two Generate ID operators.

     

    I hope it helps,

     

    Best regards,

     

    Lionel

     

     

  • EdisonLee Member Posts: 3 Contributor I

    Dear @lionelderkrikor

     

    The process worked after I followed your instructions. Your solution really answers my question. Thanks again for showing me a different way to think about data processing in RapidMiner. :)

     

    Best Regards, 

    Lee
