[Solved] Weighting examples

qwertz Member Posts: 130 Contributor II
edited June 2019 in Help
Dear all,

Does anyone happen to know whether there is a good way to weight examples?
I would like newer examples to receive higher weights, for example:


ID     att1     att2     weight
a        12       45          1
b        10       27          2
c        33       17          3


I tried to loop over all examples and write the iteration macro into a newly generated attribute, but that didn't work.

<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<process version="5.2.003">
 <context>
   <input/>
   <output/>
   <macros/>
 </context>
 <operator activated="true" class="process" compatibility="5.2.003" expanded="true" name="Process">
   <process expanded="true" height="458" width="640">
     <operator activated="true" class="generate_data" compatibility="5.2.003" expanded="true" height="60" name="Generate Data" width="90" x="45" y="30"/>
     <operator activated="true" class="loop_examples" compatibility="5.2.003" expanded="true" height="76" name="Loop Examples" width="90" x="179" y="30">
       <process expanded="true" height="476" width="640">
         <operator activated="true" class="generate_attributes" compatibility="5.2.003" expanded="true" height="76" name="Generate Attributes" width="90" x="45" y="30">
           <list key="function_descriptions">
             <parameter key="weight" value="%{example}"/>
           </list>
         </operator>
         <connect from_port="example set" to_op="Generate Attributes" to_port="example set input"/>
         <connect from_op="Generate Attributes" from_port="example set output" to_port="example set"/>
         <portSpacing port="source_example set" spacing="0"/>
         <portSpacing port="sink_example set" spacing="0"/>
         <portSpacing port="sink_output 1" spacing="0"/>
       </process>
     </operator>
     <operator activated="true" class="set_role" compatibility="5.2.003" expanded="true" height="76" name="Set Role" width="90" x="313" y="30">
       <parameter key="name" value="weight"/>
       <parameter key="target_role" value="weight"/>
       <list key="set_additional_roles"/>
     </operator>
     <connect from_op="Generate Data" from_port="output" to_op="Loop Examples" to_port="example set"/>
     <connect from_op="Loop Examples" from_port="example set" to_op="Set Role" to_port="example set input"/>
     <connect from_op="Set Role" from_port="example set output" to_port="result 1"/>
     <portSpacing port="source_input 1" spacing="0"/>
     <portSpacing port="sink_result 1" spacing="0"/>
     <portSpacing port="sink_result 2" spacing="0"/>
   </process>
 </operator>
</process>


Thank you for sharing your ideas...
Sachs

Answers

  • Nils_Woehler Member Posts: 463 Maven
    Hi,

    You can try the Generate ID operator:


    <?xml version="1.0" encoding="UTF-8" standalone="no"?>
    <process version="5.2.008">
      <context>
        <input/>
        <output/>
        <macros/>
      </context>
      <operator activated="true" class="process" compatibility="5.2.008" expanded="true" name="Process">
        <process expanded="true" height="458" width="640">
          <operator activated="true" class="generate_data" compatibility="5.2.008" expanded="true" height="60" name="Generate Data" width="90" x="45" y="30"/>
          <operator activated="true" class="generate_id" compatibility="5.2.008" expanded="true" height="76" name="Generate ID" width="90" x="179" y="30"/>
          <operator activated="true" class="rename" compatibility="5.2.008" expanded="true" height="76" name="Rename" width="90" x="313" y="30">
            <parameter key="old_name" value="id"/>
            <parameter key="new_name" value="weight"/>
            <list key="rename_additional_attributes"/>
          </operator>
          <operator activated="true" class="set_role" compatibility="5.2.008" expanded="true" height="76" name="Set Role" width="90" x="447" y="30">
            <parameter key="name" value="weight"/>
            <parameter key="target_role" value="weight"/>
            <list key="set_additional_roles"/>
          </operator>
          <connect from_op="Generate Data" from_port="output" to_op="Generate ID" to_port="example set input"/>
          <connect from_op="Generate ID" from_port="example set output" to_op="Rename" to_port="example set input"/>
          <connect from_op="Rename" from_port="example set output" to_op="Set Role" to_port="example set input"/>
          <connect from_op="Set Role" from_port="example set output" to_port="result 1"/>
          <portSpacing port="source_input 1" spacing="0"/>
          <portSpacing port="sink_result 1" spacing="0"/>
          <portSpacing port="sink_result 2" spacing="0"/>
        </process>
      </operator>
    </process>
    Best,
    Nils
  • qwertz Member Posts: 130 Contributor II

    Hi Nils,

    Thank you for your idea. It seems that I was not precise enough in my wording. The weight is not necessarily supposed to increase by one for each example, but by a step that can differ each time the process is run.

    So weights like these might also have to be applied:

    ID    att1    att2    weight
    a        12      45          2
    b        10      27          4
    c        33      17          6


    Bye for now & take care
    Sachs
  • haddock Member Posts: 849 Maven
    Just add a "Generate Attributes" operator to Nils' answer, then you can have whatever you want.

    Best
    H

  • qwertz Member Posts: 130 Contributor II


    OK, that means I have to create an ID first. In a second step I can then generate another attribute that is a function of the ID.
    Thank you very much :)

    Kind regards
    Sachs

    <?xml version="1.0" encoding="UTF-8" standalone="no"?>
    <process version="5.2.008">
      <context>
        <input/>
        <output/>
        <macros/>
      </context>
      <operator activated="true" class="process" compatibility="5.2.008" expanded="true" name="Process">
        <process expanded="true" height="458" width="882">
          <operator activated="true" class="generate_data" compatibility="5.2.008" expanded="true" height="60" name="Generate Data" width="90" x="45" y="30"/>
          <operator activated="true" class="generate_id" compatibility="5.2.008" expanded="true" height="76" name="Generate ID" width="90" x="179" y="30"/>
          <operator activated="true" class="rename" compatibility="5.2.008" expanded="true" height="76" name="Rename" width="90" x="313" y="30">
            <parameter key="old_name" value="id"/>
            <parameter key="new_name" value="temp"/>
            <list key="rename_additional_attributes"/>
          </operator>
          <operator activated="true" class="generate_attributes" compatibility="5.2.008" expanded="true" height="76" name="Generate Attributes" width="90" x="447" y="30">
            <list key="function_descriptions">
              <parameter key="weight" value="temp*2"/>
            </list>
          </operator>
          <operator activated="true" class="set_role" compatibility="5.2.008" expanded="true" height="76" name="Set Role" width="90" x="581" y="30">
            <parameter key="name" value="weight"/>
            <parameter key="target_role" value="weight"/>
            <list key="set_additional_roles"/>
          </operator>
          <operator activated="true" class="select_attributes" compatibility="5.2.008" expanded="true" height="76" name="Select Attributes" width="90" x="715" y="30">
            <parameter key="attribute_filter_type" value="single"/>
            <parameter key="attribute" value="temp"/>
            <parameter key="invert_selection" value="true"/>
            <parameter key="include_special_attributes" value="true"/>
          </operator>
          <connect from_op="Generate Data" from_port="output" to_op="Generate ID" to_port="example set input"/>
          <connect from_op="Generate ID" from_port="example set output" to_op="Rename" to_port="example set input"/>
          <connect from_op="Rename" from_port="example set output" to_op="Generate Attributes" to_port="example set input"/>
          <connect from_op="Generate Attributes" from_port="example set output" to_op="Set Role" to_port="example set input"/>
          <connect from_op="Set Role" from_port="example set output" to_op="Select Attributes" to_port="example set input"/>
          <connect from_op="Select Attributes" from_port="example set output" to_port="result 1"/>
          <portSpacing port="source_input 1" spacing="0"/>
          <portSpacing port="sink_result 1" spacing="0"/>
          <portSpacing port="sink_result 2" spacing="0"/>
        </process>
      </operator>
    </process>
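
    A possible variation, not taken from the thread: since the step may differ from run to run, the hard-coded factor in `temp*2` could be read from a process macro instead. The fragment below is only a sketch. It assumes a macro named `factor` (a hypothetical name) defined in the process context, and it shows only the two parts of the process above that would change.

    ```xml
    <!-- Sketch only: "factor" is a hypothetical macro name, not part of the original process. -->

    <!-- 1. In the <context> block, replace <macros/> with: -->
    <macros>
      <macro>
        <key>factor</key>
        <!-- Change this value before a run to set the weight step. -->
        <value>2</value>
      </macro>
    </macros>

    <!-- 2. In the Generate Attributes operator, replace the weight expression with: -->
    <list key="function_descriptions">
      <parameter key="weight" value="temp*%{factor}"/>
    </list>
    ```

    With `factor` set to 2, this should give the same weights as the process above (2, 4, 6 for the first three examples); another value should change the step without editing the operator.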
  • qwertz Member Posts: 130 Contributor II


    When I generate a new ID, the former ID is removed. Therefore, I first have to assign another role to the former ID, then generate the new ID, set the role of the new ID to weight, and finally set the role of the former ID back to ID. Just wanted to share that...

    All the best
    Sachs
  • haddock Member Posts: 849 Maven
    No, just set the role of your new attribute to weight.

  • qwertz Member Posts: 130 Contributor II


    Hi haddock,

    I tried your proposal and found that it works if the former id is a number.
    However, in my data set the id is a date, and in this case it doesn't work. No idea why.

    The attached sample process implements your proposal. The result is that the "date" id attribute is missing.
    Connect and activate the two "Set Role" operators as described in my last post and it works.
    Seems to be a bug related to the date type.


    http://datahost.bplaced.net/sample4.xls

    <?xml version="1.0" encoding="UTF-8" standalone="no"?>
    <process version="5.2.008">
      <context>
        <input/>
        <output/>
        <macros/>
      </context>
      <operator activated="true" class="process" compatibility="5.2.008" expanded="true" name="Process">
        <process expanded="true" height="405" width="1016">
          <operator activated="true" class="read_excel" compatibility="5.2.008" expanded="true" height="60" name="Read Excel" width="90" x="45" y="30">
            <parameter key="excel_file" value="C:\sample4.xls"/>
            <parameter key="imported_cell_range" value="A1:G100"/>
            <parameter key="first_row_as_names" value="false"/>
            <list key="annotations">
              <parameter key="0" value="Name"/>
            </list>
            <parameter key="date_format" value="dd.MM.yyyy"/>
            <list key="data_set_meta_data_information">
              <parameter key="0" value="date.true.date.id"/>
              <parameter key="1" value="a.true.real.attribute"/>
              <parameter key="2" value="b.true.real.attribute"/>
              <parameter key="3" value="c.true.real.attribute"/>
              <parameter key="4" value="d.true.real.attribute"/>
              <parameter key="5" value="e.true.real.attribute"/>
              <parameter key="6" value="f.true.real.attribute"/>
            </list>
          </operator>
          <operator activated="false" class="set_role" compatibility="5.2.008" expanded="true" height="76" name="Set Role (2)" width="90" x="179" y="75">
            <parameter key="name" value="date"/>
            <list key="set_additional_roles"/>
          </operator>
          <operator activated="true" class="generate_id" compatibility="5.2.008" expanded="true" height="76" name="Generate ID" width="90" x="313" y="30"/>
          <operator activated="true" class="generate_attributes" compatibility="5.2.008" expanded="true" height="76" name="Generate Attributes" width="90" x="447" y="30">
            <list key="function_descriptions">
              <parameter key="weight" value="2*id"/>
            </list>
          </operator>
          <operator activated="true" class="set_role" compatibility="5.2.008" expanded="true" height="76" name="Set Role" width="90" x="581" y="30">
            <parameter key="name" value="weight"/>
            <parameter key="target_role" value="weight"/>
            <list key="set_additional_roles"/>
          </operator>
          <operator activated="true" class="select_attributes" compatibility="5.2.008" expanded="true" height="76" name="Select Attributes" width="90" x="715" y="30">
            <parameter key="attribute_filter_type" value="single"/>
            <parameter key="attribute" value="id"/>
            <parameter key="invert_selection" value="true"/>
            <parameter key="include_special_attributes" value="true"/>
          </operator>
          <operator activated="false" class="set_role" compatibility="5.2.008" expanded="true" height="76" name="Set Role (3)" width="90" x="849" y="75">
            <parameter key="name" value="date"/>
            <parameter key="target_role" value="id"/>
            <list key="set_additional_roles"/>
          </operator>
          <connect from_op="Read Excel" from_port="output" to_op="Generate ID" to_port="example set input"/>
          <connect from_op="Generate ID" from_port="example set output" to_op="Generate Attributes" to_port="example set input"/>
          <connect from_op="Generate Attributes" from_port="example set output" to_op="Set Role" to_port="example set input"/>
          <connect from_op="Set Role" from_port="example set output" to_op="Select Attributes" to_port="example set input"/>
          <connect from_op="Select Attributes" from_port="example set output" to_port="result 1"/>
          <portSpacing port="source_input 1" spacing="0"/>
          <portSpacing port="sink_result 1" spacing="0"/>
          <portSpacing port="sink_result 2" spacing="0"/>
        </process>
      </operator>
    </process>


    Best regards
    Sachs
  • haddock Member Posts: 849 Maven
    Hi again,

    Don't want to sound like the Thought Police, so here are some tips towards RM Nirvana.

    1. Treat dates as dates!
    2. Observe Marius' etiquette on questions.
    3. Be careful about bug calling.

    That being said, here's some code.
    <?xml version="1.0" encoding="UTF-8" standalone="no"?>
    <process version="5.2.003">
     <context>
       <input/>
       <output/>
       <macros/>
     </context>
     <operator activated="true" class="process" compatibility="5.2.003" expanded="true" name="Process">
       <process expanded="true" height="405" width="1016">
         <operator activated="true" class="read_excel" compatibility="5.2.003" expanded="true" height="60" name="Read Excel" width="90" x="45" y="30">
           <parameter key="excel_file" value="/home/cjfpainter/Downloads/sample4.xls"/>
           <parameter key="imported_cell_range" value="A1:G100"/>
           <parameter key="first_row_as_names" value="false"/>
           <list key="annotations">
             <parameter key="0" value="Name"/>
           </list>
           <parameter key="date_format" value="dd.MM.yyyy"/>
           <list key="data_set_meta_data_information">
             <parameter key="0" value="date.true.date_time.attribute"/>
             <parameter key="1" value="a.true.numeric.attribute"/>
             <parameter key="2" value="b.true.numeric.attribute"/>
             <parameter key="3" value="c.true.numeric.attribute"/>
             <parameter key="4" value="d.true.numeric.attribute"/>
             <parameter key="5" value="e.true.numeric.attribute"/>
             <parameter key="6" value="f.true.real.attribute"/>
           </list>
         </operator>
         <operator activated="true" class="generate_id" compatibility="5.2.003" expanded="true" height="76" name="Generate ID" width="90" x="179" y="30"/>
         <operator activated="true" class="generate_attributes" compatibility="5.2.003" expanded="true" height="76" name="Generate Attributes" width="90" x="313" y="30">
           <list key="function_descriptions">
             <parameter key="weight" value="2*id"/>
           </list>
         </operator>
         <operator activated="true" class="set_role" compatibility="5.2.003" expanded="true" height="76" name="Set Role" width="90" x="447" y="30">
           <parameter key="name" value="weight"/>
           <parameter key="target_role" value="weight"/>
           <list key="set_additional_roles"/>
         </operator>
         <operator activated="true" class="select_attributes" compatibility="5.2.003" expanded="true" height="76" name="Select Attributes" width="90" x="581" y="30">
           <parameter key="attribute_filter_type" value="single"/>
           <parameter key="attribute" value="id"/>
           <parameter key="invert_selection" value="true"/>
           <parameter key="include_special_attributes" value="true"/>
         </operator>
         <connect from_op="Read Excel" from_port="output" to_op="Generate ID" to_port="example set input"/>
         <connect from_op="Generate ID" from_port="example set output" to_op="Generate Attributes" to_port="example set input"/>
         <connect from_op="Generate Attributes" from_port="example set output" to_op="Set Role" to_port="example set input"/>
         <connect from_op="Set Role" from_port="example set output" to_op="Select Attributes" to_port="example set input"/>
         <connect from_op="Select Attributes" from_port="example set output" to_port="result 1"/>
         <portSpacing port="source_input 1" spacing="0"/>
         <portSpacing port="sink_result 1" spacing="0"/>
         <portSpacing port="sink_result 2" spacing="0"/>
       </process>
     </operator>
    </process>
    Best

    H



  • qwertz Member Posts: 130 Contributor II

    Hi haddock,

    You are right, calling it a bug was probably a little too hasty.

    Coming back to the issue: in my process the date column has type "date" and role "id". My understanding is therefore that it is already treated as a date. Consequently, I don't understand why, in the given setup, a column cannot have both type "date" and role "id" at the same time. (For the code, see my last post.)


    All the best
    Sachs

  • haddock Member Posts: 849 Maven
    Hi

    Fair enough, you can always declare "date" as an "id" later. That avoids your double id issue and saves an operator, because you can use the "no double ids" property to advantage, like this:
    <?xml version="1.0" encoding="UTF-8" standalone="no"?>
    <process version="5.2.003">
     <context>
       <input/>
       <output/>
       <macros/>
     </context>
     <operator activated="true" class="process" compatibility="5.2.003" expanded="true" name="Process">
       <process expanded="true" height="440" width="567">
         <operator activated="true" class="read_excel" compatibility="5.2.003" expanded="true" height="60" name="Read Excel" width="90" x="45" y="30">
           <parameter key="excel_file" value="/home/cjfpainter/Downloads/sample4.xls"/>
           <parameter key="imported_cell_range" value="A1:G100"/>
           <parameter key="first_row_as_names" value="false"/>
           <list key="annotations">
             <parameter key="0" value="Name"/>
           </list>
           <parameter key="date_format" value="dd.MM.yyyy"/>
           <list key="data_set_meta_data_information">
             <parameter key="0" value="date.true.date_time.attribute"/>
             <parameter key="1" value="a.true.numeric.attribute"/>
             <parameter key="2" value="b.true.numeric.attribute"/>
             <parameter key="3" value="c.true.numeric.attribute"/>
             <parameter key="4" value="d.true.numeric.attribute"/>
             <parameter key="5" value="e.true.numeric.attribute"/>
             <parameter key="6" value="f.true.real.attribute"/>
           </list>
         </operator>
         <operator activated="true" class="generate_id" compatibility="5.2.003" expanded="true" height="76" name="Generate ID" width="90" x="179" y="30"/>
         <operator activated="true" class="generate_attributes" compatibility="5.2.003" expanded="true" height="76" name="Generate Attributes" width="90" x="313" y="30">
           <list key="function_descriptions">
             <parameter key="weight" value="2*id"/>
           </list>
         </operator>
         <operator activated="true" class="set_role" compatibility="5.2.003" expanded="true" height="76" name="Set Role" width="90" x="447" y="30">
           <parameter key="name" value="weight"/>
           <parameter key="target_role" value="weight"/>
           <list key="set_additional_roles">
             <parameter key="date" value="id"/>
           </list>
         </operator>
         <connect from_op="Read Excel" from_port="output" to_op="Generate ID" to_port="example set input"/>
         <connect from_op="Generate ID" from_port="example set output" to_op="Generate Attributes" to_port="example set input"/>
         <connect from_op="Generate Attributes" from_port="example set output" to_op="Set Role" to_port="example set input"/>
         <connect from_op="Set Role" from_port="example set output" to_port="result 1"/>
         <portSpacing port="source_input 1" spacing="0"/>
         <portSpacing port="sink_result 1" spacing="0"/>
         <portSpacing port="sink_result 2" spacing="0"/>
       </process>
     </operator>
    </process>
    I spend most of my time CUDA programming, and am probably a bit obsessed by speed and clarity!

    Best

    H

    PS Ignore ( nearly always ) the warnings, they are only warnings, just press the green j!
    PPS Green if running on RA, Blue on RM.
  • qwertz Member Posts: 130 Contributor II

    So it seems to be a kind of hidden feature that RapidMiner allows only a single ID in the data set and removes any others automatically.


    Thanks & have a nice day
    Sachs
  • haddock Member Posts: 849 Maven
    G'Day!

    Indeedy, data doesn't make much sense when it has more than one identity, bit like humans  ;) On the other hand we each contributed to a neat solution, so grouping is cool  8)

    Best

    H