Data partitioning, subsequent plotting

trhaynes Member Posts: 2 Contributor I
edited November 2018 in Help
I am new to RM, but am very excited by its possibilities! I've read through lots of forum messages, documentation, and videos, but cannot seem to figure out how to do something that seems fairly simple. Any help would be wonderful!

Here is an example of my data:

id1  id2  period  value
1    a    2       .11
2    a    3       .23
3    a    4       .12
4    b    3       .14
5    b    4       .11
6    b    5       .19
7    b    6       .18

I would like RM to partition the data according to the value in id2.  In this simple example, it would split this data into two tables based on the values of 'a' and 'b' in id2:

id1  id2  period  value
1    a    2       .11
2    a    3       .23
3    a    4       .12

id1  id2  period  value
4    b    3       .14
5    b    4       .11
6    b    5       .19
7    b    6       .18

Then I'd like to generate plots for these 2 tables (either separate plots or overlaid on the same plot) with value on the Y-axis and period on the X-axis.

The original data will actually be partitioned into about 100 tables as a result, with varying numbers of records per table.  Tables with more periods will have more records, for example.

I have managed to get RM to read the data from SQL 2008R2, but can't seem to figure out how to get it to split the data up into separate example sets.

Thanks for any help!!

Answers

  • IngoRM Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, Community Manager, RMResearcher, Member, University Professor Posts: 1,751 RM Founder
    Hello,

    "I would like RM to partition the data according to the value in id2.  In this simple example, it would split this data into two tables based on the values of 'a' and 'b' in id2:"

    Well, as almost always, there are multiple ways to achieve this within RapidMiner. Which one is best often depends on what you are doing afterwards.


    Option 1: Sequence of "Filter Examples"

    You could use a sequence of "Filter Examples" operators to divide the data into subsets. The process below shows this on the Golf data set, which is split according to Wind=true vs. Wind=false. You can then use, for example, the Reporting Extension or RapidAnalytics to create the plots automatically.

    <?xml version="1.0" encoding="UTF-8" standalone="no"?>
    <process version="5.1.006">
      <context>
        <input/>
        <output/>
        <macros/>
      </context>
      <operator activated="true" class="process" compatibility="5.1.006" expanded="true" name="Process">
        <process expanded="true" height="206" width="480">
          <operator activated="true" class="retrieve" compatibility="5.1.006" expanded="true" height="60" name="Retrieve" width="90" x="45" y="30">
            <parameter key="repository_entry" value="//Samples/data/Golf"/>
          </operator>
          <operator activated="true" class="filter_examples" compatibility="5.1.006" expanded="true" height="76" name="Filter Examples" width="90" x="179" y="30">
            <parameter key="condition_class" value="attribute_value_filter"/>
            <parameter key="parameter_string" value="Wind=true"/>
          </operator>
          <operator activated="true" class="filter_examples" compatibility="5.1.006" expanded="true" height="76" name="Filter Examples (2)" width="90" x="313" y="120">
            <parameter key="condition_class" value="attribute_value_filter"/>
            <parameter key="parameter_string" value="Wind=false"/>
          </operator>
          <connect from_op="Retrieve" from_port="output" to_op="Filter Examples" to_port="example set input"/>
          <connect from_op="Filter Examples" from_port="example set output" to_port="result 1"/>
          <connect from_op="Filter Examples" from_port="original" to_op="Filter Examples (2)" to_port="example set input"/>
          <connect from_op="Filter Examples (2)" from_port="example set output" to_port="result 2"/>
          <portSpacing port="source_input 1" spacing="0"/>
          <portSpacing port="sink_result 1" spacing="0"/>
          <portSpacing port="sink_result 2" spacing="72"/>
          <portSpacing port="sink_result 3" spacing="0"/>
        </process>
      </operator>
    </process>

    Option 2: "Loop Values" plus Macro plus "Filter Examples"

    This option is probably more suitable if you have many (i.e. more than two) different values. So after trying Option 1 for a couple of values you will probably end up with Option 2. As with Option 1, you could then use RapidAnalytics or the Reporting Extension to create the desired plots automatically. Below is a process showing the basic loop:

    <?xml version="1.0" encoding="UTF-8" standalone="no"?>
    <process version="5.1.006">
      <context>
        <input/>
        <output/>
        <macros/>
      </context>
      <operator activated="true" class="process" compatibility="5.1.006" expanded="true" name="Process">
        <process expanded="true" height="206" width="480">
          <operator activated="true" class="retrieve" compatibility="5.1.006" expanded="true" height="60" name="Retrieve" width="90" x="45" y="30">
            <parameter key="repository_entry" value="//Samples/data/Golf"/>
          </operator>
          <operator activated="true" class="loop_values" compatibility="5.1.006" expanded="true" height="76" name="Loop Values" width="90" x="179" y="30">
            <parameter key="attribute" value="Wind"/>
            <process expanded="true" height="744" width="887">
              <operator activated="true" breakpoints="after" class="filter_examples" compatibility="5.1.006" expanded="true" height="76" name="Filter Examples" width="90" x="45" y="30">
                <parameter key="condition_class" value="attribute_value_filter"/>
                <parameter key="parameter_string" value="Wind=%{loop_value}"/>
              </operator>
              <connect from_port="example set" to_op="Filter Examples" to_port="example set input"/>
              <connect from_op="Filter Examples" from_port="example set output" to_port="out 1"/>
              <portSpacing port="source_example set" spacing="0"/>
              <portSpacing port="sink_out 1" spacing="0"/>
              <portSpacing port="sink_out 2" spacing="0"/>
            </process>
          </operator>
          <connect from_op="Retrieve" from_port="output" to_op="Loop Values" to_port="example set"/>
          <connect from_op="Loop Values" from_port="out 1" to_port="result 1"/>
          <portSpacing port="source_input 1" spacing="0"/>
          <portSpacing port="sink_result 1" spacing="0"/>
          <portSpacing port="sink_result 2" spacing="0"/>
        </process>
      </operator>
    </process>
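
    Adapted to the data in the question, the same loop might look roughly like the sketch below. It assumes the data has already been stored in the repository; the entry name "//Local Repository/data/my_data" is a placeholder, not a real path, and "id2" is the grouping attribute from the question. The breakpoints="after" attribute on the inner "Filter Examples" operator is left out here so that the loop runs through all values without pausing.

    ```xml
    <?xml version="1.0" encoding="UTF-8" standalone="no"?>
    <process version="5.1.006">
      <context>
        <input/>
        <output/>
        <macros/>
      </context>
      <operator activated="true" class="process" compatibility="5.1.006" expanded="true" name="Process">
        <process expanded="true" height="206" width="480">
          <!-- Placeholder repository entry (assumption): point this at your own data -->
          <operator activated="true" class="retrieve" compatibility="5.1.006" expanded="true" height="60" name="Retrieve" width="90" x="45" y="30">
            <parameter key="repository_entry" value="//Local Repository/data/my_data"/>
          </operator>
          <!-- One iteration per distinct value of id2 ('a', 'b', ...) -->
          <operator activated="true" class="loop_values" compatibility="5.1.006" expanded="true" height="76" name="Loop Values" width="90" x="179" y="30">
            <parameter key="attribute" value="id2"/>
            <process expanded="true" height="744" width="887">
              <!-- Keep only the rows whose id2 equals the current loop value -->
              <operator activated="true" class="filter_examples" compatibility="5.1.006" expanded="true" height="76" name="Filter Examples" width="90" x="45" y="30">
                <parameter key="condition_class" value="attribute_value_filter"/>
                <parameter key="parameter_string" value="id2=%{loop_value}"/>
              </operator>
              <connect from_port="example set" to_op="Filter Examples" to_port="example set input"/>
              <connect from_op="Filter Examples" from_port="example set output" to_port="out 1"/>
              <portSpacing port="source_example set" spacing="0"/>
              <portSpacing port="sink_out 1" spacing="0"/>
              <portSpacing port="sink_out 2" spacing="0"/>
            </process>
          </operator>
          <connect from_op="Retrieve" from_port="output" to_op="Loop Values" to_port="example set"/>
          <connect from_op="Loop Values" from_port="out 1" to_port="result 1"/>
          <portSpacing port="source_input 1" spacing="0"/>
          <portSpacing port="sink_result 1" spacing="0"/>
          <portSpacing port="sink_result 2" spacing="0"/>
        </process>
      </operator>
    </process>
    ```

    The result delivered at "result 1" should then be a collection with one example set per value of id2, regardless of how many distinct values (about 100 in the question) the attribute has.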

    Option 3: Reporting with RapidAnalytics

    Another quite elegant option, especially if you want to visualize the data on the web or embed the results into other solutions, is the Enterprise Edition of RapidAnalytics. It allows you to create interactive reports showing exactly this kind of visualization, where users can select the group, or where plots for all values are created automatically.


    Probably there are more options but those should be sufficient for the beginning  ;D

    Cheers,
    Ingo
  • trhaynes Member Posts: 2 Contributor I
    Thanks very much for the ideas, Ingo!  I'll play around with them and see what I can accomplish.  Thanks again!
  • pmcnally Member Posts: 4 Contributor I
    I know this is an old topic, but I just wanted to reply that I successfully used the code Ingo supplied for Option 2. There was one hiccup: when I imported the XML, it automatically set up a breakpoint inside the loop. So it did not exit the inner process and iterate through to the second loop value. Instead it just hung. After removing the breakpoint, it executed successfully and I was able to modify it for my particular use.