How to select a range of observations

fritmore Member Posts: 90 Contributor II
edited November 2018 in Help
How do I select only a certain range of observations in RapidMiner? I only found operators for selecting attributes and for splitting data for validation.
thx

Answers

  • haddock Member Posts: 849 Maven
    Hi there,

    You have to know the jargon... Observation->Example. So "Filter Example Range" could be handy.

    Good weekend to all!

  • fritmore Member Posts: 90 Contributor II
    hi
    Filter Example Range does not select the desired subset of examples (observations). It gives an error telling me to change the range and sets first example to -2 and last example to -1.

    The closest operator is Sample, but with it I cannot choose an arbitrary range, e.g. n...n+1500.

    cheerz
  • fritmore Member Posts: 90 Contributor II
    hmmm, it looks like it doesn't work with Read CSV, but it does work with Retrieve. I would call it a bug. What do you say?
  • haddock Member Posts: 849 Maven
    fritmore wrote:

    Filter Example Range does not select a desired subset of examples(observations). It gives error to change the range and puts first example -2 and last example -1.
    ???
    <?xml version="1.0" encoding="UTF-8" standalone="no"?>
    <process version="5.1.003">
      <context>
        <input/>
        <output/>
        <macros/>
      </context>
      <operator activated="true" class="process" compatibility="5.1.003" expanded="true" name="Process">
        <process expanded="true" height="-20" width="-50">
          <operator activated="true" class="generate_data" compatibility="5.1.003" expanded="true" height="60" name="Generate Data" width="90" x="34" y="14"/>
          <operator activated="true" class="filter_example_range" compatibility="5.1.003" expanded="true" height="76" name="Filter Example Range" width="90" x="236" y="16">
            <parameter key="first_example" value="1"/>
            <parameter key="last_example" value="10"/>
          </operator>
          <connect from_op="Generate Data" from_port="output" to_op="Filter Example Range" to_port="example set input"/>
          <connect from_op="Filter Example Range" from_port="example set output" to_port="result 1"/>
          <portSpacing port="source_input 1" spacing="0"/>
          <portSpacing port="sink_result 1" spacing="0"/>
          <portSpacing port="sink_result 2" spacing="0"/>
        </process>
      </operator>
    </process>
    I would call it bug. what do u say?
    :-\
  • fritmore Member Posts: 90 Contributor II
    This does work, but NOT with the Read CSV operator!
  • IngoRM Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, Community Manager, RMResearcher, Member, University Professor Posts: 1,751 RM Founder
    Yes, it does!

    <?xml version="1.0" encoding="UTF-8" standalone="no"?>
    <process version="5.1.003">
      <context>
        <input/>
        <output/>
        <macros/>
      </context>
      <operator activated="true" class="process" compatibility="5.1.003" expanded="true" name="Process">
        <process expanded="true" height="161" width="547">
          <operator activated="true" class="retrieve" compatibility="5.1.003" expanded="true" height="60" name="Retrieve" width="90" x="45" y="30">
            <parameter key="repository_entry" value="//Samples/data/Iris"/>
          </operator>
          <operator activated="true" class="write_csv" compatibility="5.1.003" expanded="true" height="60" name="Write CSV" width="90" x="179" y="30">
            <parameter key="csv_file" value="C:\iris_test.csv"/>
          </operator>
          <operator activated="true" class="read_csv" compatibility="5.1.003" expanded="true" height="60" name="Read CSV" width="90" x="313" y="30">
            <parameter key="csv_file" value="C:\iris_test.csv"/>
            <list key="annotations"/>
            <list key="data_set_meta_data_information"/>
          </operator>
          <operator activated="true" class="filter_example_range" compatibility="5.1.003" expanded="true" height="76" name="Filter Example Range" width="90" x="447" y="30">
            <parameter key="first_example" value="10"/>
            <parameter key="last_example" value="29"/>
          </operator>
          <connect from_op="Retrieve" from_port="output" to_op="Write CSV" to_port="input"/>
          <connect from_op="Read CSV" from_port="output" to_op="Filter Example Range" to_port="example set input"/>
          <connect from_op="Filter Example Range" from_port="example set output" to_port="result 1"/>
          <portSpacing port="source_input 1" spacing="0"/>
          <portSpacing port="sink_result 1" spacing="0"/>
          <portSpacing port="sink_result 2" spacing="0"/>
        </process>
      </operator>
    </process>
    The process above works like a charm (beware that it is going to write a file "C:\iris_test.csv"). Import the process and run it and you will see that you end up with 20 examples between 10 and 29 (instead of the 150 of the original data set).
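    A minimal Python sketch of the range semantics described above, assuming (as the 10-to-29 example and the first_example="1" default suggest) that first_example and last_example are 1-based and both inclusive. The function name filter_example_range is hypothetical: it only mirrors the operator's two parameters and is not RapidMiner's API.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def filter_example_range(examples: Sequence[T], first_example: int, last_example: int) -> List[T]:
    """Return examples first_example..last_example (assumed 1-based, inclusive).

    Hypothetical helper mimicking the parameters of the Filter Example Range
    operator; it is not part of any RapidMiner API.
    """
    n = len(examples)
    # Runtime check: only at this point is the real data set size known.
    if not 1 <= first_example <= last_example <= n:
        raise ValueError(f"range {first_example}..{last_example} is outside 1..{n}")
    # Convert the 1-based inclusive bounds into a 0-based half-open slice.
    return list(examples[first_example - 1:last_example])


if __name__ == "__main__":
    iris_like = list(range(1, 151))  # stand-in for the 150 Iris example rows
    subset = filter_example_range(iris_like, 10, 29)
    print(len(subset), subset[0], subset[-1])
```

    For the n...n+1500 case from the question, first_example=n and last_example=n+1500 would yield 1501 examples under this assumption, because both bounds are inclusive.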

    I think the reason you believe it is not working is that the Problem view shows you two errors. But take a close look: it states that those are potential problems. And that's actually perfectly right: we cannot get the necessary metadata from the CSV file without reading it, and for exactly that reason it is not available at design time but only at execution time. If you use the configuration wizard, we will already know the attribute metadata, but still not the data set size.
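    The difference between design time and execution time can be sketched as a check that only knows the example set size when metadata is available (as with Retrieve from the repository) and otherwise can at most report a potential problem. This is a minimal Python sketch under that assumption; check_range_at_design_time and its message strings are hypothetical and are not how RapidMiner implements its metadata checks.

```python
from typing import Optional


def check_range_at_design_time(
    first_example: int, last_example: int, known_size: Optional[int]
) -> Optional[str]:
    """Hypothetical design-time check for an assumed 1-based, inclusive range.

    known_size is None when no metadata is available yet, e.g. for data that
    is only read from a CSV file at execution time. Returns a message, or
    None if nothing can be flagged.
    """
    if first_example < 1 or last_example < first_example:
        return "error: invalid range"
    if known_size is None:
        # Size unknown before execution: warn, but do not block the run.
        return "potential problem: example set size unknown until execution"
    if last_example > known_size:
        return f"error: last_example {last_example} exceeds size {known_size}"
    return None


if __name__ == "__main__":
    print(check_range_at_design_time(10, 29, None))  # like Read CSV: no metadata yet
    print(check_range_at_design_time(10, 29, 150))   # like Retrieve: size known, in range
    print(check_range_at_design_time(10, 200, 150))  # size known, range too large
```

    The real out-of-range check then has to happen when the process runs and the actual number of rows is known.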

    You should use the repository as often as possible for data storage in order to avoid this type of confusion - we are always suggesting this!

    Cheers,
    Ingo
  • haddock Member Posts: 849 Maven
    fritmore wrote:

    this does work but NOT with read CSV operator!
    This is not the case on my machine. What happens is that the 'Filter Example Range' errors tab shows 'two potential errors', each saying that a parameter value exceeds the example set size; however, if I press the start button, everything works as expected.

    And you know what? That is exactly what should happen!!! How can the operator know how many examples are in the file? The file could be a dud. What you are being told is exactly that there are potential errors, which is perfectly correct.

    PS... This is bizarre, I've just seen the post of the pointy one, seems rather similar... Which one of us studied under zu Guttenberg?

  • IngoRM Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, Community Manager, RMResearcher, Member, University Professor Posts: 1,751 RM Founder

    haddock wrote:

    PS... This is bizarre, I've just seen the post of the pointy one, seems rather similar... Which one of us studied under zu Guttenberg?
    Dr. Who?

    ;)
  • haddock Member Posts: 849 Maven
    :D
  • fritmore Member Posts: 90 Contributor II
    Haddock, Ingo, glad you're having phun ;D

    I see both your points.

    I still think this should not show up as an ugly red error message; it should rather inform that the indexes may be outside the matrix dimensions, or just throw a runtime error if that is indeed the case.

    (ehm, I did not know I could just ignore it and execute; I don't like error messages and am a total newb to RM)

    Thank you both :-*
  • haddock Member Posts: 849 Maven
    Hi Fritmore,

    Actually I think you are right when you say 'it should rather inform' - over the years I've noticed that people are freaked out unnecessarily by the metadata messages (info about the dataflow). If you are an old slob like me, you just bash the start button and watch the springs fall out of the back of the machine.

    Perhaps there should be a paranoia setting in the preferences, so that these pesky messages can be served only on demand.

  • fritmore Member Posts: 90 Contributor II
    ;D ;D do not crack me up like that around sleepy time

    You know, maybe along with the paranoia setting people would appreciate a difficulty setting in the preferences too, like "Hurt me plenty" or "Hey, not too rough".