Sign parity / calculated data to weight

qwertz2 Member Posts: 49 Guru
edited November 2018 in Help

Dear community,

I am looking for a hint on how to realize the following algorithm in Rapidminer:

Given is an example set with a label and a few attributes:

label att1 att2
1 2 -1
3 2 -2
-1 1 -3


Next I want to calculate sign parity (i.e. the ratio of examples in which label and attribute have the same sign):
Sign parity label / att1 => +/+ and +/+ and -/+ => 2 of 3 match => 0.66
Sign parity label / att2 => +/- and +/- and -/- => 1 of 3 match => 0.33


Finally, the results shall be assigned as weights to each attribute.
Weight att1 = 0.66
Weight att2 = 0.33


So far I have managed to calculate sign parity with a simple expression (e.g. label * att1 >= 0), which I evaluate for every example before dividing the number of matches by the number of examples. But how do I transfer this result back to attribute weights?
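
To make the arithmetic concrete outside RapidMiner, here is a minimal Python sketch of the calculation described above. It is only an illustration under stated assumptions: the function name `sign_parity` and the list-of-dicts data layout are made up for this example and are not part of RapidMiner. It uses the same product test as the expression above, so a zero in either column counts as a match.

```python
# Example set from the post: one label column and two attribute columns.
rows = [
    {"label": 1, "att1": 2, "att2": -1},
    {"label": 3, "att1": 2, "att2": -2},
    {"label": -1, "att1": 1, "att2": -3},
]


def sign_parity(rows, attribute, label="label"):
    """Return the fraction of rows where label and attribute share a sign.

    Hypothetical helper: applies the test label * attribute >= 0,
    so a zero value in either column is counted as a match.
    """
    matches = sum(1 for row in rows if row[label] * row[attribute] >= 0)
    return matches / len(rows)


# One weight per attribute (not per example).
weights = {name: sign_parity(rows, name) for name in ("att1", "att2")}
print(weights)
```

The result is one number per attribute, which is the shape that an attribute-weights object needs, as opposed to the one-weight-per-example shape produced by setting a weight role.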



Best regards
Sachs
Answers

  • Thomas_Ott RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 1,761 Unicorn

    The Set Role operator lets you select an attribute column and set it to weight. Just select "weight" in the drop-down menu and RapidMiner will recognize it as a weight.

  • qwertz2 Member Posts: 49 Guru
    Hi Thomas,

    Thank you for your input. The Set Role operator allows me to define one attribute as the weight. This gives me one weight per example.

    In contrast, I want one weight per attribute (based on how often the sign of an attribute's examples equals the sign of the corresponding label examples).


    Kind regards
    Sachs
  • Thomas_Ott RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 1,761 Unicorn

    OK, I see. So label = 0.66 * att1 + 0.33 * att2? Will the weights of the attributes sum to 1? Did you try the Weights to Data or Data to Weights operator?

  • qwertz2 Member Posts: 49 Guru

     

    Hi Thomas,

     

    Maybe my process description was a bit misleading. Let me try again in other words:

     

    1) Determine the weight for each attribute.

    This is done by comparing the sign of each example of an attribute with the sign of the corresponding label example. Then the overall ratio is computed, so that I get a statement like "75% of the examples of label and attX have the same sign".

     

    2) Assign weight to attribute.

    The calculated values (e.g. 75% for attX) shall then be assigned as weights to the corresponding attributes.

    (The final step would be to select top n attributes with "select by weights" operator.)

     

    The "Data to Weights" operator sounded good, but it actually does nothing other than assign a weight of "1" to each attribute, and there is no way to feed in the weights I determined.

     

     

    Best regards

    Sachs

     

  • Telcontar120 Moderator, RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 1,635 Unicorn

    Why not do this via the Generate Attributes operator?

     

    Brian T.
    Lindon Ventures 
    Data Science Consulting from Certified RapidMiner Experts
  • qwertz2 Member Posts: 49 Guru

    Hi Brian,

     

    Thank you for trying to help! Your post came just a second after my last one, where I tried to give a better description of what the result should be.

     

    The Generate Attributes operator is indeed what I use to do the comparison on the examples (label n * att1 n >= 0). But how do I accumulate the results and transform them into a weight of the ATTRIBUTE?

     

     

    Kind regards

    Sachs

  • Telcontar120 Moderator, RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 1,635 Unicorn

    Once you have the values for every example, a simple Aggregate using the average function should do the trick; it will give you one overall value per attribute. Then, if you want, you can transpose the resulting data to get a table of overall values per attribute (each attribute becoming one example in the transposed data), which can then be sorted so that the top N can be selected.
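
As a rough illustration of that pipeline (indicator per example, average, transpose, sort), here is a small Python sketch. It only mirrors the data flow; it is not how the RapidMiner operators are implemented, and the `_weight` column suffix, the `top_n` variable, and the dict-based table layout are assumptions made for this example.

```python
rows = [
    {"label": 1, "att1": 2, "att2": -1},
    {"label": 3, "att1": 2, "att2": -2},
    {"label": -1, "att1": 1, "att2": -3},
]
attributes = ["att1", "att2"]

# Step 1 (Generate Attributes): a 0/1 indicator per example and attribute.
indicators = [
    {f"{a}_weight": 1 if row["label"] * row[a] >= 0 else 0 for a in attributes}
    for row in rows
]

# Step 2 (Aggregate with average): a single row, one mean per indicator column.
aggregated = {
    column: sum(r[column] for r in indicators) / len(indicators)
    for column in indicators[0]
}

# Step 3 (Transpose): each former column becomes one row (one "example").
transposed = [{"attribute": c, "value": v} for c, v in aggregated.items()]

# Step 4 (Sort, then keep the top N rows); top_n is an illustrative choice.
top_n = 1
ranked = sorted(transposed, key=lambda r: r["value"], reverse=True)
selected = ranked[:top_n]
print(selected)
```

Computing all indicator columns before aggregating is what puts every overall value into the same single row, which is the precondition for the transpose step.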

    Brian T.
  • qwertz2 Member Posts: 49 Guru
    Hi Brian,

    Thank you for your contribution. In my attempt to implement your suggestion, the Aggregate operator with the average function on a generated attribute does calculate the desired value. I can also imagine what the transpose will look like. But currently I am stuck in the middle of this process.

    Aggregate now calculates the "weight" for att1, but the operator's result is a separate one-row example set. Only when the weights of all attributes end up in the same example row can I start with the transpose.


    Best regards
    Sachs
  • Telcontar120 Moderator, RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 1,635 Unicorn

    Perhaps if you can post a small dataset with some examples and your process then it would be easier to try to work this through?  

    Brian T.
  • qwertz2 Member Posts: 49 Guru

     

    Hi Brian,

     

    Here is a process that calculates the values I want to use as weights. The point where I am struggling now is how to use this information to filter the original list of attributes.

     

    Best regards

    Sachs

     

     

    <?xml version="1.0" encoding="UTF-8"?>
    <process version="7.5.000">
      <context>
        <input/>
        <output/>
        <macros/>
      </context>
      <operator activated="true" class="process" compatibility="7.5.000" expanded="true" name="Process">
        <process expanded="true">
          <operator activated="true" class="generate_data" compatibility="7.5.000" expanded="true" height="68" name="Generate Data" width="90" x="45" y="34">
            <parameter key="number_examples" value="75"/>
            <parameter key="number_of_attributes" value="10"/>
          </operator>
          <operator activated="true" class="concurrency:loop_attributes" compatibility="7.5.000" expanded="true" height="103" name="Loop Attributes" width="90" x="179" y="34">
            <parameter key="include_special_attributes" value="true"/>
            <process expanded="true">
              <operator activated="true" class="generate_attributes" compatibility="7.5.000" expanded="true" height="82" name="Generate Attributes" width="90" x="45" y="34">
                <list key="function_descriptions">
                  <parameter key="%{loop_attribute}_weight" value="if([label]*eval(%{loop_attribute})&gt;=0,1,0)"/>
                </list>
              </operator>
              <operator activated="true" class="aggregate" compatibility="7.5.000" expanded="true" height="82" name="Aggregate" width="90" x="246" y="34">
                <list key="aggregation_attributes">
                  <parameter key="%{loop_attribute}_weight" value="average"/>
                </list>
              </operator>
              <connect from_port="input 1" to_op="Generate Attributes" to_port="example set input"/>
              <connect from_op="Generate Attributes" from_port="example set output" to_op="Aggregate" to_port="example set input"/>
              <connect from_op="Aggregate" from_port="example set output" to_port="output 2"/>
              <connect from_op="Aggregate" from_port="original" to_port="output 1"/>
              <portSpacing port="source_input 1" spacing="0"/>
              <portSpacing port="source_input 2" spacing="0"/>
              <portSpacing port="sink_output 1" spacing="0"/>
              <portSpacing port="sink_output 2" spacing="0"/>
              <portSpacing port="sink_output 3" spacing="0"/>
            </process>
          </operator>
          <connect from_op="Generate Data" from_port="output" to_op="Loop Attributes" to_port="input 1"/>
          <connect from_op="Loop Attributes" from_port="output 1" to_port="result 1"/>
          <portSpacing port="source_input 1" spacing="0"/>
          <portSpacing port="sink_result 1" spacing="0"/>
          <portSpacing port="sink_result 2" spacing="0"/>
        </process>
      </operator>
    </process>
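
The open question in this post, filtering the original attribute list by the calculated values, can be sketched in Python as follows. This is only an illustration of the intended result, not a RapidMiner solution: the helper `select_top_attributes`, its `keep` parameter, and the hard-coded weights (taken from the small example in the first post) are assumptions made for this sketch.

```python
def select_top_attributes(rows, weights, n, keep=("label",)):
    """Keep the n highest-weighted attributes plus the columns named in keep.

    Hypothetical helper: rows is a list of dicts (one per example),
    weights maps attribute name -> weight.
    """
    ranked = sorted(weights, key=weights.get, reverse=True)
    chosen = set(ranked[:n]) | set(keep)
    return [{k: v for k, v in row.items() if k in chosen} for row in rows]


rows = [
    {"label": 1, "att1": 2, "att2": -1},
    {"label": 3, "att1": 2, "att2": -2},
    {"label": -1, "att1": 1, "att2": -3},
]
# Sign-parity weights from the example in the first post.
weights = {"att1": 2 / 3, "att2": 1 / 3}

filtered = select_top_attributes(rows, weights, n=1)
print(filtered)
```

The label is kept explicitly here rather than ranked. If the label itself were included in the loop, its parity with itself would be 1 under the `>= 0` test, so it would always rank first.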