Options

"join by ID doesn't work..."

Fred12Fred12 Member Posts: 344 Unicorn
edited June 2019 in Help

hi,

I did 2 (3) different outlier detection techniques and want to join my rows by ID, and add the outlier column from the 3 detectors into 1 dataset by doing a JOIN by ID...

<?xml version="1.0" encoding="UTF-8"?><process version="7.2.001">
<context>
<input/>
<output/>
<macros/>
</context>
<operator activated="true" class="process" compatibility="6.0.002" expanded="true" name="Process">
<process expanded="true">
<operator activated="true" class="retrieve" compatibility="7.2.001" expanded="true" height="68" name="Retrieve Master3Klassen_nominal" width="90" x="45" y="30">
<parameter key="repository_entry" value="../../../data/Master3Klassen_nominal"/>
</operator>
<operator activated="true" class="generate_id" compatibility="7.2.001" expanded="true" height="82" name="Generate ID" width="90" x="112" y="165"/>
<operator activated="true" class="normalize" compatibility="7.1.001" expanded="true" height="103" name="Normalize" width="90" x="246" y="165">
<parameter key="method" value="range transformation"/>
<parameter key="min" value="-1.0"/>
</operator>
<operator activated="true" class="multiply" compatibility="7.2.001" expanded="true" height="103" name="Multiply" width="90" x="380" y="136"/>
<operator activated="true" class="anomalydetection:Local Outlier Factor (LOF)" compatibility="2.3.002" expanded="true" height="103" name="Local Outlier Factor (LOF)" width="90" x="581" y="255">
<parameter key="k_min (MinPtsLB)" value="3"/>
<parameter key="k_max (MinPtsUB)" value="30"/>
<parameter key="numerical_measure" value="CamberraDistance"/>
<parameter key="parallelize evaluation process" value="true"/>
</operator>
<operator activated="true" class="select_attributes" compatibility="7.2.001" expanded="true" height="82" name="Select Attributes" width="90" x="581" y="391">
<parameter key="attribute_filter_type" value="subset"/>
<parameter key="attribute" value="outlier"/>
<parameter key="attributes" value="id|outlier"/>
</operator>
<operator activated="true" class="anomalydetection:k-NN Global Anomaly Score" compatibility="2.3.002" expanded="true" height="103" name="k-NN Global Anomaly Score (2)" width="90" x="514" y="75">
<parameter key="k" value="5"/>
<parameter key="use k-th neighbor distance only (no average)" value="true"/>
<parameter key="numerical_measure" value="CamberraDistance"/>
<parameter key="parallelize evaluation process" value="true"/>
</operator>
<operator activated="true" class="rename" compatibility="7.2.001" expanded="true" height="82" name="Rename" width="90" x="380" y="391">
<parameter key="old_name" value="outlier"/>
<parameter key="new_name" value="outlier2"/>
<list key="rename_additional_attributes"/>
</operator>
<operator activated="true" class="join" compatibility="7.2.001" expanded="true" height="82" name="Join" width="90" x="514" y="544">
<parameter key="remove_double_attributes" value="false"/>
<list key="key_attributes"/>
</operator>
<connect from_op="Retrieve Master3Klassen_nominal" from_port="output" to_op="Generate ID" to_port="example set input"/>
<connect from_op="Generate ID" from_port="example set output" to_op="Normalize" to_port="example set input"/>
<connect from_op="Normalize" from_port="example set output" to_op="Multiply" to_port="input"/>
<connect from_op="Multiply" from_port="output 1" to_op="k-NN Global Anomaly Score (2)" to_port="example set"/>
<connect from_op="Multiply" from_port="output 2" to_op="Local Outlier Factor (LOF)" to_port="example set"/>
<connect from_op="Local Outlier Factor (LOF)" from_port="example set" to_op="Select Attributes" to_port="example set input"/>
<connect from_op="Local Outlier Factor (LOF)" from_port="model" to_port="result 2"/>
<connect from_op="Select Attributes" from_port="example set output" to_op="Join" to_port="left"/>
<connect from_op="k-NN Global Anomaly Score (2)" from_port="example set" to_op="Rename" to_port="example set input"/>
<connect from_op="k-NN Global Anomaly Score (2)" from_port="model" to_port="result 1"/>
<connect from_op="Rename" from_port="example set output" to_op="Join" to_port="right"/>
<connect from_op="Join" from_port="join" to_port="result 3"/>
<portSpacing port="source_input 1" spacing="0"/>
<portSpacing port="sink_result 1" spacing="0"/>
<portSpacing port="sink_result 2" spacing="0"/>
<portSpacing port="sink_result 3" spacing="0"/>
<portSpacing port="sink_result 4" spacing="0"/>
<description align="center" color="yellow" colored="false" height="105" resized="false" width="180" x="236" y="425">Type your comment</description>
</process>
</operator>
</process>

however, there is always only 1 outlier column produced... I don't understand why, I want them joined by ID but it doesnt work..

Tagged:

Best Answers

  • Options
    bhupendra_patilbhupendra_patil Administrator, Employee, Member Posts: 168 RM Data Scientist
    Solution Accepted

    oh sorry, I now understand what you mean, that is happening, because outlier2 role is "Outlier" and it gets dropped because it is on the right table.

     

    The key attributes will always be taken from the left ExampleSet. Please note that this check for double attributes will only be applied for regular attributes. Special attributes of the right ExampleSet which do not exist in the left ExampleSet will simply be added. If they already exist they are simply skipped.

     

    You can simply attach a "Set Role" and change the Role to Regular for that column

    See the attached process

  • Options
    MartinLiebigMartinLiebig Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, University Professor Posts: 3,507 RM Data Scientist
    Solution Accepted

    it's working for me, both outlier scores are in the example set - see attached screenshotoutlier.png

    - Sr. Director Data Solutions, Altair RapidMiner -
    Dortmund, Germany
  • Options
    Fred12Fred12 Member Posts: 344 Unicorn
    Solution Accepted

    you were right, I did not see outlier2 as it was not coloured as I thought ;)

Answers

  • Options
    bhupendra_patilbhupendra_patil Administrator, Employee, Member Posts: 168 RM Data Scientist

    Hello @Fred12

     

    I tried by replaclng your input with the golf dataset.

    Seems to be working fine to me.

    I get all the outputs, can you see if the input has the data you expect

  • Options
    Fred12Fred12 Member Posts: 344 Unicorn

    input has the data, but I tried with golf dataset, however in the join result, there is also only 1 column "outlier",

    but I want "outlier", and "outlier2" for each ID, from the 2 different detect outlier algorithms into one dataset side by side near eachother ;) ... 

    and this doesnt work as I get only 1 column "outlier" ??

  • Options
    Fred12Fred12 Member Posts: 344 Unicorn

    that's weird, I used your process, but it is still only 1 column outlier present, altough I changed the role, but that didn't affect anything, is the set role operator broken?

  • Options
    MartinLiebigMartinLiebig Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, University Professor Posts: 3,507 RM Data Scientist

    Can you post an example process showing this?

     

    ~Martin

    - Sr. Director Data Solutions, Altair RapidMiner -
    Dortmund, Germany
  • Options
    Fred12Fred12 Member Posts: 344 Unicorn

    here it is:

    I set Role operator after renaming outlier to outlier2:

    <?xml version="1.0" encoding="UTF-8"?><process version="7.2.001">
    <context>
    <input/>
    <output/>
    <macros/>
    </context>
    <operator activated="true" class="process" compatibility="6.0.002" expanded="true" name="Process">
    <process expanded="true">
    <operator activated="true" class="retrieve" compatibility="7.2.001" expanded="true" height="68" name="Retrieve Golf" width="90" x="45" y="34">
    <parameter key="repository_entry" value="//Samples/data/Golf"/>
    </operator>
    <operator activated="true" class="generate_id" compatibility="7.2.001" expanded="true" height="82" name="Generate ID" width="90" x="112" y="165"/>
    <operator activated="true" class="normalize" compatibility="7.1.001" expanded="true" height="103" name="Normalize" width="90" x="246" y="165">
    <parameter key="method" value="range transformation"/>
    <parameter key="min" value="-1.0"/>
    </operator>
    <operator activated="true" class="multiply" compatibility="7.2.001" expanded="true" height="103" name="Multiply" width="90" x="380" y="136"/>
    <operator activated="true" class="anomalydetection:Local Outlier Factor (LOF)" compatibility="2.3.002" expanded="true" height="103" name="Local Outlier Factor (LOF)" width="90" x="581" y="238">
    <parameter key="k_min (MinPtsLB)" value="3"/>
    <parameter key="k_max (MinPtsUB)" value="30"/>
    <parameter key="numerical_measure" value="CamberraDistance"/>
    <parameter key="parallelize evaluation process" value="true"/>
    <parameter key="number of threads" value="8"/>
    </operator>
    <operator activated="true" class="select_attributes" compatibility="7.2.001" expanded="true" height="82" name="Select Attributes" width="90" x="581" y="391">
    <parameter key="attribute_filter_type" value="subset"/>
    <parameter key="attribute" value="outlier"/>
    <parameter key="attributes" value="id|outlier"/>
    </operator>
    <operator activated="true" class="anomalydetection:k-NN Global Anomaly Score" compatibility="2.3.002" expanded="true" height="103" name="k-NN Global Anomaly Score (2)" width="90" x="514" y="75">
    <parameter key="k" value="5"/>
    <parameter key="use k-th neighbor distance only (no average)" value="true"/>
    <parameter key="numerical_measure" value="CamberraDistance"/>
    <parameter key="parallelize evaluation process" value="true"/>
    <parameter key="number of threads" value="8"/>
    </operator>
    <operator activated="true" class="rename" compatibility="7.2.001" expanded="true" height="82" name="Rename" width="90" x="246" y="493">
    <parameter key="old_name" value="outlier"/>
    <parameter key="new_name" value="outlier2"/>
    <list key="rename_additional_attributes"/>
    </operator>
    <operator activated="true" class="set_role" compatibility="7.2.001" expanded="true" height="82" name="Set Role" width="90" x="380" y="493">
    <parameter key="attribute_name" value="outlier2"/>
    <list key="set_additional_roles">
    <parameter key="outlier2" value="regular"/>
    </list>
    </operator>
    <operator activated="true" class="join" compatibility="7.2.001" expanded="true" height="82" name="Join" width="90" x="581" y="544">
    <parameter key="remove_double_attributes" value="false"/>
    <list key="key_attributes"/>
    <parameter key="keep_both_join_attributes" value="true"/>
    </operator>
    <connect from_op="Retrieve Golf" from_port="output" to_op="Generate ID" to_port="example set input"/>
    <connect from_op="Generate ID" from_port="example set output" to_op="Normalize" to_port="example set input"/>
    <connect from_op="Normalize" from_port="example set output" to_op="Multiply" to_port="input"/>
    <connect from_op="Multiply" from_port="output 1" to_op="k-NN Global Anomaly Score (2)" to_port="example set"/>
    <connect from_op="Multiply" from_port="output 2" to_op="Local Outlier Factor (LOF)" to_port="example set"/>
    <connect from_op="Local Outlier Factor (LOF)" from_port="example set" to_op="Select Attributes" to_port="example set input"/>
    <connect from_op="Local Outlier Factor (LOF)" from_port="model" to_port="result 2"/>
    <connect from_op="Select Attributes" from_port="example set output" to_op="Join" to_port="left"/>
    <connect from_op="k-NN Global Anomaly Score (2)" from_port="example set" to_op="Rename" to_port="example set input"/>
    <connect from_op="k-NN Global Anomaly Score (2)" from_port="model" to_port="result 1"/>
    <connect from_op="Rename" from_port="example set output" to_op="Set Role" to_port="example set input"/>
    <connect from_op="Set Role" from_port="example set output" to_op="Join" to_port="right"/>
    <connect from_op="Join" from_port="join" to_port="result 3"/>
    <portSpacing port="source_input 1" spacing="0"/>
    <portSpacing port="sink_result 1" spacing="0"/>
    <portSpacing port="sink_result 2" spacing="0"/>
    <portSpacing port="sink_result 3" spacing="0"/>
    <portSpacing port="sink_result 4" spacing="0"/>
    </process>
    </operator>
    </process>
Sign In or Register to comment.