Fitting 'by groups'

stereotaxon Member Posts: 10 Contributor II
edited November 2018 in Help
Hi,

I have grouped data and I'd like to fit a model to each group's data.  To do this in SAS, I use a BY <grouping var> statement, and in SPSS it's a split file statement.  Is there a way to do this in RapidMiner?  Right now I'm writing out each subgroup's data separately, which is extremely slow and difficult to maintain.

Thanks for your help,

Mike

Answers

  • IngoRM Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, Community Manager, RMResearcher, Member, University Professor Posts: 1,751 RM Founder
    Hi Mike,

    Yes, this is possible with the ExampleFilter operator. Say you have three groups "A", "B", and "C", which are specified in the attribute (column) "groups". You can combine the filter operator with the ParameterIteration operator, but then you have to define the groups manually. If you use the latest CVS version, this is much more convenient with the new ValueSubgroupIterator operator, as in the following example:

    <operator name="Root" class="Process" expanded="yes">
        <operator name="DataCreation" class="OperatorChain" expanded="no">
            <operator name="ExampleSetGenerator" class="ExampleSetGenerator">
                <parameter key="target_function" value="sum classification"/>
            </operator>
            <operator name="BinDiscretization" class="BinDiscretization">
                <parameter key="number_of_bins" value="3"/>
                <parameter key="use_long_range_names" value="false"/>
            </operator>
            <operator name="AttributeValueMapper" class="AttributeValueMapper">
                <parameter key="attributes" value="att1"/>
                <parameter key="replace_by" value="A"/>
                <parameter key="replace_what" value="range1"/>
            </operator>
            <operator name="AttributeValueMapper (2)" class="AttributeValueMapper">
                <parameter key="attributes" value="att1"/>
                <parameter key="replace_by" value="B"/>
                <parameter key="replace_what" value="range2"/>
            </operator>
            <operator name="AttributeValueMapper (3)" class="AttributeValueMapper">
                <parameter key="attributes" value="att1"/>
                <parameter key="replace_by" value="C"/>
                <parameter key="replace_what" value="range3"/>
            </operator>
            <operator name="ChangeAttributeName" class="ChangeAttributeName">
                <parameter key="new_name" value="groups"/>
                <parameter key="old_name" value="att1"/>
            </operator>
        </operator>
        <operator name="ValueSubgroupIterator" class="ValueSubgroupIterator" expanded="yes">
            <list key="attributes">
              <parameter key="groups" value="all"/>
            </list>
            <operator name="DecisionTree" class="DecisionTree">
            </operator>
        </operator>
    </operator>
    The first operator chain is only used for data creation. Please note that this requires the latest CVS version; how to access it is described here: http://rapid-i.com/content/view/25/48/
    Alternatively, you can simply wait, since we will probably make a new release next week.
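
    For readers who want to see the idea outside RapidMiner, here is a minimal plain-Python sketch of what iterating over value subgroups amounts to: split the rows on the "groups" column, then fit one model per subset. It is not RapidMiner code. The function names, the "x"/"y" columns, and the least-squares line standing in for the DecisionTree learner are all assumptions made for illustration.

```python
from collections import defaultdict


def split_by_group(rows, group_key):
    """Partition rows (dicts) into one list per distinct value of group_key."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[group_key]].append(row)
    return dict(groups)


def fit_line(rows, x_key, y_key):
    """Least-squares fit of y = slope * x + intercept (a stand-in learner).

    Assumes the x values in `rows` are not all identical.
    """
    n = len(rows)
    mean_x = sum(r[x_key] for r in rows) / n
    mean_y = sum(r[y_key] for r in rows) / n
    sxx = sum((r[x_key] - mean_x) ** 2 for r in rows)
    sxy = sum((r[x_key] - mean_x) * (r[y_key] - mean_y) for r in rows)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def fit_by_group(rows, group_key, x_key, y_key):
    """Fit one model per group and return {group value: (slope, intercept)}."""
    subsets = split_by_group(rows, group_key)
    return {g: fit_line(sub, x_key, y_key) for g, sub in subsets.items()}


# Hypothetical toy data with a "groups" column, as in the example process.
rows = [
    {"groups": "A", "x": 0.0, "y": 1.0},
    {"groups": "A", "x": 1.0, "y": 3.0},
    {"groups": "A", "x": 2.0, "y": 5.0},
    {"groups": "B", "x": 0.0, "y": 0.0},
    {"groups": "B", "x": 2.0, "y": -2.0},
]
models = fit_by_group(rows, "groups", "x", "y")
```

    The groups are discovered from the data in a single pass, so nothing has to be listed by hand or written to separate files.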

    Cheers,
    Ingo
  • stereotaxon Member Posts: 10 Contributor II
    Thanks for your help.  I got it working before you responded by doing something like what you suggest.  I have a ton of groups, so I created a dummy grouping variable (1-n) and then used an IteratingOperatorChain and the {a} macro to select each distinct group in turn and run the analysis repeatedly.
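
    A rough plain-Python sketch of that loop, in case it helps someone else (not RapidMiner code; the function names, the "site"/"value" columns, and the per-group mean used as the analysis are hypothetical). Each distinct group gets a 1-based index, and a counter `a` selects one group per iteration:

```python
def index_groups(labels):
    """Map each distinct label to a 1-based index, in order of first appearance."""
    index = {}
    for label in labels:
        if label not in index:
            index[label] = len(index) + 1
    return index


def run_per_group(rows, group_key, analyse):
    """Run `analyse` once per group, selecting the group by its dummy index."""
    index = index_groups(r[group_key] for r in rows)
    label_of = {i: label for label, i in index.items()}
    results = {}
    # `a` plays the role of the iteration counter: 1, 2, ..., n.
    for a in range(1, len(index) + 1):
        subset = [r for r in rows if index[r[group_key]] == a]
        results[label_of[a]] = analyse(subset)
    return results


# Hypothetical data; the "analysis" here is just the mean of "value".
rows = [
    {"site": "north", "value": 2.0},
    {"site": "south", "value": 10.0},
    {"site": "north", "value": 4.0},
]
means = run_per_group(
    rows, "site", lambda sub: sum(r["value"] for r in sub) / len(sub)
)
```

    Filtering this way rescans every row on each iteration, which is fine for a modest number of groups but gets slow as the group count grows.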
    -Mike