"How to get the last row after windowing"

WinKad Member Posts: 9 Contributor II
edited May 2019 in Help
Hi everybody,
I use the following test data [as you can see, all is in matrix form - (row,column)]:
1,1;1,2;1,3;1,4;1,5
2,1;2,2;2,3;2,4;2,5
3,1;3,2;3,3;3,4;3,5
4,1;4,2;4,3;4,4;4,5
5,1;5,2;5,3;5,4;5,5
6,1;6,2;6,3;6,4;6,5
7,1;7,2;7,3;7,4;7,5
8,1;8,2;8,3;8,4;8,5
9,1;9,2;9,3;9,4;9,5
10,1;10,2;10,3;10,4;10,5

After windowing with window size = 3 to build the data for processing, I want to take the last row of the data windowed with window size = 2 and feed it into the process as unlabelled data.

Perhaps this question has already been posted in another form, but I didn't find it.

Here is my code:
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<process version="5.0">
  <context>
    <input/>
    <output/>
    <macros/>
  </context>
  <operator activated="true" class="process" compatibility="5.0.11" expanded="true" name="Process">
    <process expanded="true" height="521" width="415">
      <operator activated="true" class="read_csv" compatibility="5.0.11" expanded="true" height="60" name="Read CSV" width="90" x="45" y="30">
        <parameter key="file_name" value="D:\Eigene Dateien\Meine Projekte\Lotto\Rapidminer\Test.csv"/>
        <parameter key="encoding" value="windows-1252"/>
        <parameter key="trim_lines" value="true"/>
        <parameter key="use_first_row_as_attribute_names" value="false"/>
        <list key="data_set_meta_data_information">
          <parameter key="0" value="attribute_0.true.1.regular"/>
          <parameter key="1" value="attribute_1.true.1.regular"/>
          <parameter key="2" value="attribute_2.true.1.regular"/>
          <parameter key="3" value="attribute_3.true.1.regular"/>
          <parameter key="4" value="attribute_4.true.1.regular"/>
        </list>
        <parameter key="attribute_names_already_defined" value="true"/>
      </operator>
      <operator activated="true" class="rename_by_replacing" compatibility="5.0.11" expanded="true" height="76" name="Rename by Replacing" width="90" x="179" y="30">
        <parameter key="replace_what" value="(attribute_)"/>
        <parameter key="replace_by" value="Z"/>
      </operator>
      <operator activated="true" class="multiply" compatibility="5.0.11" expanded="true" height="112" name="Multiply" width="90" x="45" y="120"/>
      <operator activated="true" class="series:windowing" compatibility="5.0.2" expanded="true" height="76" name="Windowing" width="90" x="179" y="165">
        <parameter key="window_size" value="3"/>
      </operator>
      <operator activated="true" class="series:windowing" compatibility="5.0.2" expanded="true" height="76" name="Windowing (2)" width="90" x="179" y="255">
        <parameter key="window_size" value="2"/>
      </operator>
      <connect from_op="Read CSV" from_port="output" to_op="Rename by Replacing" to_port="example set input"/>
      <connect from_op="Rename by Replacing" from_port="example set output" to_op="Multiply" to_port="input"/>
      <connect from_op="Multiply" from_port="output 1" to_port="result 1"/>
      <connect from_op="Multiply" from_port="output 2" to_op="Windowing" to_port="example set input"/>
      <connect from_op="Multiply" from_port="output 3" to_op="Windowing (2)" to_port="example set input"/>
      <connect from_op="Windowing" from_port="example set output" to_port="result 2"/>
      <connect from_op="Windowing (2)" from_port="example set output" to_port="result 3"/>
      <portSpacing port="source_input 1" spacing="0"/>
      <portSpacing port="sink_result 1" spacing="90"/>
      <portSpacing port="sink_result 2" spacing="36"/>
      <portSpacing port="sink_result 3" spacing="162"/>
      <portSpacing port="sink_result 4" spacing="0"/>
    </process>
  </operator>
</process>

Is there a special operator for this?

Answers

  • wessel Member Posts: 537 Maven
    What do you want to achieve with this last row?

    Do you want to make a prediction for the last window in your data?
  • WinKad Member Posts: 9 Contributor II
    Hello wessel,
    yes, that's what I want.
    But when I tried to use both windowed outputs together (one with window size = 3 and the other with window size = 2), RM refused to handle them. I suppose there is a problem with the names and/or the order of the columns.
    I have just looked at the output, which contains 9,1 10,1 9,2 10,2 9,3 10,3 ... 9,5 10,5, and that is exactly what I want to get. Do I have to rename the columns (with a macro iterator)?
    Ciao
    Winkad
  • wessel Member Posts: 537 Maven
    I did something very similar, but I was not happy with my solution, so I hope someone else can suggest something better.

    If you have a dataset, let's say:
    x
    1
    2
    3
    4
    5
    6
    7
    8
    9

    and you have windowSize = 3, horizon = 2, you get
    x-2 x-1 x-0 label  (where label is x+2)
    1  2  3  5
    2  3  4  6
    3  4  5  7
    4  5  6  8
    5  6  7  9

    so what you want is
    7  8  9  ?

    you can get this with Filter Example Range, keeping rows 7 to 9,
    which gives
    x
    7
    8
    9

    if you do windowing on this dataset without a horizon you get
    x-2 x-1 x-0
    7  8  9

    RapidMiner automatically adds the label attribute. It will give a warning that the label is missing, but it will work.
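
The windowing-with-horizon idea from this post can be sketched in plain Python. This is only an illustration of the arithmetic, not RapidMiner's implementation: `window_rows` and `last_window` are hypothetical helper names, and the sketch assumes a step size of 1.

```python
def window_rows(series, window_size, horizon=0):
    """Slide a window over `series` one step at a time.

    Returns (window, label) pairs. With horizon > 0 the label is the value
    `horizon` steps after the last element of the window; otherwise it is None.
    """
    rows = []
    last_start = len(series) - window_size - horizon
    for start in range(last_start + 1):
        window = series[start:start + window_size]
        label = series[start + window_size - 1 + horizon] if horizon > 0 else None
        rows.append((window, label))
    return rows


def last_window(series, window_size):
    """The most recent window, i.e. the row whose label is not known yet."""
    return series[-window_size:]


x = list(range(1, 10))                       # the example series 1..9
training = window_rows(x, window_size=3, horizon=2)
to_predict = last_window(x, window_size=3)   # the "7 8 9 ?" row

for window, label in training:
    print(window, label)
print(to_predict, "?")
```

Taking the last `window_size` values directly is equivalent to filtering the example range first and then windowing without a horizon.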
  • WinKad Member Posts: 9 Contributor II
    Hi everybody,
    oh, how stupid of me. I thought that Filter Example Range filtered on the content of the rows...
    Here is what I have found out so far:

    <?xml version="1.0" encoding="UTF-8" standalone="no"?>
    <process version="5.0">
      <context>
        <input/>
        <output/>
        <macros/>
      </context>
      <operator activated="true" class="process" compatibility="5.0.10" expanded="true" name="Process">
        <process expanded="true" height="386" width="480">
          <operator activated="true" class="subprocess" compatibility="5.0.10" expanded="true" height="130" name="Subprocess" width="90" x="45" y="30">
            <parameter key="parallelize_nested_chain" value="true"/>
            <process expanded="true" height="431" width="567">
              <operator activated="true" class="read_csv" compatibility="5.0.10" expanded="true" height="60" name="Read CSV" width="90" x="45" y="30">
                <parameter key="file_name" value="D:\Eigene Dateien\Meine Projekte\Lotto\Rapidminer\Test.csv"/>
                <parameter key="encoding" value="windows-1252"/>
                <parameter key="trim_lines" value="true"/>
                <parameter key="use_first_row_as_attribute_names" value="false"/>
                <list key="data_set_meta_data_information">
                  <parameter key="0" value="attribute_0.true.1.regular"/>
                  <parameter key="1" value="attribute_1.true.1.regular"/>
                  <parameter key="2" value="attribute_2.true.1.regular"/>
                  <parameter key="3" value="attribute_3.true.1.regular"/>
                  <parameter key="4" value="attribute_4.true.1.regular"/>
                </list>
                <parameter key="attribute_names_already_defined" value="true"/>
              </operator>
              <operator activated="true" class="rename_by_replacing" compatibility="5.0.11" expanded="true" height="76" name="Rename by Replacing" width="90" x="179" y="30">
                <parameter key="attribute_filter_type" value="subset"/>
                <parameter key="attributes" value="attribute_4|attribute_3|attribute_2|attribute_1|attribute_0"/>
                <parameter key="regular_expression" value="(attribute_)"/>
                <parameter key="replace_what" value="(attribute_)"/>
                <parameter key="replace_by" value="col"/>
              </operator>
              <operator activated="true" class="multiply" compatibility="5.0.11" expanded="true" height="94" name="Multiply" width="90" x="45" y="165"/>
              <operator activated="true" class="series:windowing" compatibility="5.0.2" expanded="true" height="76" name="Windowing2" width="90" x="179" y="255">
                <parameter key="window_size" value="2"/>
              </operator>
              <operator activated="true" class="filter_example_range" compatibility="5.0.11" expanded="true" height="76" name="Filter Example Range" width="90" x="313" y="255">
                <parameter key="first_example" value="4"/>
                <parameter key="last_example" value="4"/>
              </operator>
              <operator activated="true" class="series:windowing" compatibility="5.0.2" expanded="true" height="76" name="Windowing3" width="90" x="179" y="165">
                <parameter key="window_size" value="3"/>
              </operator>
              <operator activated="true" class="set_role" compatibility="5.0.11" expanded="true" height="76" name="Set Role" width="90" x="313" y="165">
                <parameter key="name" value="col0-0"/>
                <parameter key="target_role" value="label"/>
              </operator>
              <operator activated="true" class="naive_bayes" compatibility="5.0.11" expanded="true" height="76" name="Naive Bayes" width="90" x="447" y="165"/>
              <operator activated="true" class="apply_model" compatibility="5.0.11" expanded="true" height="76" name="Apply Model" width="90" x="447" y="255">
                <list key="application_parameters"/>
              </operator>
              <connect from_op="Read CSV" from_port="output" to_op="Rename by Replacing" to_port="example set input"/>
              <connect from_op="Rename by Replacing" from_port="example set output" to_op="Multiply" to_port="input"/>
              <connect from_op="Multiply" from_port="output 1" to_op="Windowing3" to_port="example set input"/>
              <connect from_op="Multiply" from_port="output 2" to_op="Windowing2" to_port="example set input"/>
              <connect from_op="Windowing2" from_port="example set output" to_op="Filter Example Range" to_port="example set input"/>
              <connect from_op="Filter Example Range" from_port="example set output" to_op="Apply Model" to_port="unlabelled data"/>
              <connect from_op="Windowing3" from_port="example set output" to_op="Set Role" to_port="example set input"/>
              <connect from_op="Set Role" from_port="example set output" to_op="Naive Bayes" to_port="training set"/>
              <connect from_op="Naive Bayes" from_port="model" to_op="Apply Model" to_port="model"/>
              <connect from_op="Apply Model" from_port="labelled data" to_port="out 1"/>
              <connect from_op="Apply Model" from_port="model" to_port="out 2"/>
              <portSpacing port="source_in 1" spacing="0"/>
              <portSpacing port="sink_out 1" spacing="234"/>
              <portSpacing port="sink_out 2" spacing="0"/>
              <portSpacing port="sink_out 3" spacing="0"/>
              <portSpacing port="sink_out 4" spacing="0"/>
              <portSpacing port="sink_out 5" spacing="0"/>
            </process>
          </operator>
          <connect from_op="Subprocess" from_port="out 1" to_port="result 1"/>
          <connect from_op="Subprocess" from_port="out 2" to_port="result 2"/>
          <connect from_op="Subprocess" from_port="out 3" to_port="result 3"/>
          <connect from_op="Subprocess" from_port="out 4" to_port="result 4"/>
          <portSpacing port="source_input 1" spacing="0"/>
          <portSpacing port="sink_result 1" spacing="0"/>
          <portSpacing port="sink_result 2" spacing="0"/>
          <portSpacing port="sink_result 3" spacing="0"/>
          <portSpacing port="sink_result 4" spacing="0"/>
          <portSpacing port="sink_result 5" spacing="0"/>
        </process>
      </operator>
    </process>
    The 'Naive Bayes' operator makes no sense here, but I wanted to check whether the filtered row would be accepted by the 'Apply Model' operator.
    But now, what is the meaning of
    PM WARNING: SimpleDistribution: The number of regular attributes of the given example set does not fit the number of attributes of the training example set, training: 14, application: 10
    PM WARNING: SimpleDistribution: The given example set does not contain a regular attribute with name 'col0-2'. This might cause problems for some models depending on this particular attribute.
    ?
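
One way to read these two warnings is to list the column names each windowing step produces. The sketch below is plain Python, not RapidMiner code; `windowed_names` is a hypothetical helper, and the `<column>-<lag>` naming is assumed from the attribute names that appear in this thread (`col0-0`, `col0-2`).

```python
def windowed_names(columns, window_size):
    """Names of the windowed attributes: "<column>-<lag>", oldest lag first."""
    return [
        f"{col}-{lag}"
        for col in columns
        for lag in range(window_size - 1, -1, -1)
    ]


columns = [f"col{i}" for i in range(5)]       # col0 .. col4 after renaming

training = windowed_names(columns, 3)         # window size 3 -> 15 attributes
application = windowed_names(columns, 2)      # window size 2 -> 10 attributes

label = "col0-0"                              # set as label by Set Role
training_regular = [name for name in training if name != label]

# Regular training attributes that the application set does not provide.
missing = [name for name in training_regular if name not in application]

print(len(training_regular), len(application))
print(missing)
```

Under these assumptions the counts come out as 14 and 10, matching the first warning, and every `-2` attribute (starting with `col0-2`) is absent from the window-size-2 data, which is what the second warning reports.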

  • WinKad Member Posts: 9 Contributor II
    Additional question: how can I get the number of the last row?
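
For a sliding window, the number of the last row follows from the row count and the window size. A minimal sketch of that arithmetic, assuming a step size of 1 by default and rows numbered from 1; `window_count` is a hypothetical helper, not a RapidMiner function.

```python
def window_count(n_rows, window_size, horizon=0, step=1):
    """Number of rows left after windowing `n_rows` examples."""
    usable = n_rows - window_size - horizon
    if usable < 0:
        return 0          # the series is too short for even one window
    return usable // step + 1


# With rows numbered from 1, the last windowed row has this number.
last_row = window_count(10, 2)
print(last_row)
```

For the 10-row test data and window size 2 this gives 9, so under these assumptions that would be the value to use for both ends of the example range.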
  • WinKad Member Posts: 9 Contributor II
    Hi,
    Note: I suppose there is an error! Let's see...
    For an original data set with 2 columns, labelled C0 and C1, windowing with window size = 3 gives the following header and rows (cells in Excel notation):
    C0-2  C0-1  C0-0  C1-2  C1-1  C1-0
    A1    A2    A3    B1    B2    B3
    A2    A3    A4    B2    B3    B4
    A3    A4    A5    B3    B4    B5

    Windowing with window size = 2 gives:
    C0-1  C0-0  C1-1  C1-0
    A1    A2    B1    B2
    A2    A3    B2    B3
    A3    A4    B3    B4
    A4    A5    B4    B5

    These 2 example sets, with the second one used as unlabelled data, don't match in Apply Model.
    It's a great pity!
    I cannot make head or tail of it.

    Ciao
    WinKad