Keeping only the last data generated.

Shubha Member Posts: 139 Maven
edited November 2018 in Help
Hi,

I am applying an operation per group using "ValueSubgroupIterator", and I have also enabled the option "apply_on_complete_set". So the number of output ExampleSets generated is (number of groups in the grouping attribute + 1). Now I want to keep only the ExampleSet that is generated because "apply_on_complete_set" is enabled. How do I do this automatically, for any number of groups in the grouping attribute, using IOSelector?

Thanks for your help,
Shubha

Answers

  • Shubha Member Posts: 139 Maven
    Does the option "Iteration_macro" in the ValueSubgroupIterator help?
  • IngoRM Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, Community Manager, RMResearcher, Member, University Professor Posts: 1,751 RM Founder
    One option would be to use the IOStorage + IORetriever mechanism. Another would be to use a counter macro created with the new operator MacroConstruction (available with RM 4.4).

    Cheers,
    Ingo
  • haddock Member Posts: 849 Maven
    I am applying an operation by groups by using "ValueSubgroupIterator". I have clicked for the option, "apply_on_complete_set" too. So, the number of output examplesets generated are (number of groups in the grouping attribute + 1). Now, I want to keep only that ExampleSet, which is generated by clicking the "apply_on_complete_set".
    I'm a bit puzzled by the question. Wouldn't that example set be exactly the same as the one produced by just applying the operator to the original example set? Here is an example:
    <operator name="Root" class="Process" expanded="yes">
        <description text="#ylt#p#ygt#In cases where a learning scheme cannot handle numerical attributes it might be necessary to apply a discretization step. In this process we use FrequencyDiscretization which tries to identify split points in a way that all bins contain the same number of examples.#ylt#/p#ygt#"/>
        <operator name="ExampleSource" class="ExampleSource">
            <parameter key="attributes" value="../data/sonar.aml"/>
        </operator>
        <operator name="FrequencyDiscretization" class="FrequencyDiscretization">
            <parameter key="range_name_type" value="short"/>
        </operator>
        <operator name="IdTagging" class="IdTagging" breakpoints="after">
        </operator>
        <operator name="ValueSubgroupIterator" class="ValueSubgroupIterator" expanded="yes">
            <parameter key="apply_on_complete_set" value="true"/>
            <list key="attributes">
              <parameter key="attribute_1" value="all"/>
            </list>
            <parameter key="filter_attribute" value="false"/>
            <operator name="IdTagging (2)" class="IdTagging">
            </operator>
        </operator>
    </operator>
    The examples are converted to polynominals and given IDs, making Example Set 1. Then they are grouped by the value of attribute_1 and given new IDs, making Example Sets 2 & 3. Finally, because "apply_on_complete_set" is enabled, the whole example set is given IDs, making Example Set 4.

    Being a bear of little brain I can't see what the difference would ever be between Example Set 1 and Example Set 4. So I've obviously missed the point of the question, perhaps you could elaborate?
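
The behaviour discussed above (one result per group, plus one extra result for the complete set, of which only the last is wanted) can be sketched in plain Python. This is only an illustration of the idea, not RapidMiner code: `iterate_value_subgroups` and `tag_ids` are hypothetical helpers, and the sample rows are made up.

```python
from collections import defaultdict


def iterate_value_subgroups(rows, group_key, apply, on_complete_set=True):
    """Hypothetical stand-in for the iterator: apply `apply` to each
    subgroup and, optionally, once more to the complete set."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[group_key]].append(row)
    results = [apply(members) for members in groups.values()]
    if on_complete_set:
        # The complete-set result is appended last.
        results.append(apply(rows))
    return results


def tag_ids(rows):
    """Hypothetical stand-in for IdTagging: number the rows from 1."""
    return [dict(row, id=i) for i, row in enumerate(rows, start=1)]


# Made-up sample data with two groups.
rows = [
    {"Group": "a", "x1": 1.0},
    {"Group": "b", "x1": 2.0},
    {"Group": "a", "x1": 3.0},
]

results = iterate_value_subgroups(rows, "Group", tag_ids)

# Whatever the number of groups, the complete-set result is the last one.
complete_only = results[-1]
```

For an operation like ID tagging, which does not depend on earlier passes, `complete_only` is the same as `tag_ids(rows)`, which is the point haddock raises.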
  • Shubha Member Posts: 139 Maven
    haddock wrote:
    I'm a bit puzzled by the question, wouldn't that example set be exactly the same as would be produced by just applying the operator to the original example set?


    If I am correct, this may not hold when the operator applied is "Normalization". Please check with the code below (the data I used is attached):


    <operator name="Root" class="Process" expanded="yes">
        <operator name="ExampleSource" class="ExampleSource">
            <parameter key="attributes" value="C:\Documents and Settings\shubhak\Desktop\try.aml"/>
        </operator>
        <operator name="Complete_Norm" class="Normalization">
            <parameter key="method" value="Range-Transformation"/>
            <parameter key="min" value="-1.0"/>
        </operator>
        <operator name="ValueSubgroupIterator" class="ValueSubgroupIterator" expanded="yes">
            <parameter key="apply_on_complete_set" value="true"/>
            <list key="attributes">
              <parameter key="Group" value="all"/>
            </list>
            <parameter key="filter_attribute" value="false"/>
            <operator name="AttributeSubsetPreprocessing" class="AttributeSubsetPreprocessing" expanded="yes">
                <parameter key="attribute_name_regex" value="x.*"/>
                <parameter key="condition_class" value="is_numerical"/>
                <operator name="Normalization" class="Normalization">
                    <parameter key="method" value="Range-Transformation"/>
                    <parameter key="min" value="-1.0"/>
                </operator>
            </operator>
        </operator>
    </operator>


    I expected the results of "Complete_Norm" and of the "Normalization" inside the iterator (as applied to the complete set) to be the same, but they are not. Am I missing something?


    Thanks, Shubha

    [attachment deleted by admin]
  • haddock Member Posts: 849 Maven
    Hi,

    I think the reason that you are not getting what you expect is that you normalize values that you have already normalized!  :-\

    To do what you intend you need to make copies of the data and compare the results. I've laid out an alternative version below which acts predictably. I think you will benefit from working through the examples, as the flow of control in RapidMiner can play mirage tricks!
    <operator name="Spot the difference?" class="Process" expanded="yes">
        <operator name="Datafile:- Group as Polynominal and x1 as numeric" class="ExampleSource">
            <parameter key="attributes" value="C:\Documents and Settings\Alien\My Documents\rm_workspace\t.aml"/>
        </operator>
        <operator name="Make a copy example set for the comparison" class="IOMultiplier">
            <parameter key="io_object" value="ExampleSet"/>
        </operator>
        <operator name="Firstly,  Global Normalization" class="OperatorChain" expanded="yes">
            <operator name="Grab examples copy 1" class="IOSelector">
                <parameter key="io_object" value="ExampleSet"/>
            </operator>
            <operator name="Z it" class="Normalization" breakpoints="after">
            </operator>
        </operator>
        <operator name="Then normalization by Group and Globally" class="OperatorChain" expanded="yes">
            <operator name="Grab examples copy 2" class="IOSelector">
                <parameter key="io_object" value="ExampleSet"/>
                <parameter key="select_which" value="2"/>
            </operator>
            <operator name="Foreach group and for all examples" class="ValueSubgroupIterator" expanded="yes">
                <parameter key="apply_on_complete_set" value="true"/>
                <list key="attributes">
                  <parameter key="Group" value="all"/>
                </list>
                <parameter key="filter_attribute" value="false"/>
                <operator name="Do those Zs!" class="Normalization">
                    <parameter key="create_view" value="true"/>
                </operator>
            </operator>
        </operator>
    </operator>
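
The "normalizing already normalized values" effect mentioned above can be reproduced with a small Python sketch. It assumes that each pass overwrites the same underlying values, so the complete-set pass sees data that the per-group passes have already rescaled. That assumption follows haddock's diagnosis and is not a confirmed description of RapidMiner internals. `range_transform` is a hypothetical helper mimicking a range transformation to [-1, 1], and the data is made up.

```python
def range_transform(values, lo=-1.0, hi=1.0):
    """Linearly rescale `values` so that min -> lo and max -> hi."""
    vmin, vmax = min(values), max(values)
    if vmax == vmin:
        return [lo for _ in values]  # degenerate case: constant column
    scale = (hi - lo) / (vmax - vmin)
    return [lo + (v - vmin) * scale for v in values]


# Made-up data: two groups on very different scales.
groups = ["a", "a", "a", "b", "b", "b"]
x1 = [0.0, 5.0, 10.0, 100.0, 150.0, 200.0]

# Step 1: global normalization (the "Complete_Norm" step).
global_norm = range_transform(x1)

# Step 2: per-group normalization, overwriting the shared values
# (assumed in-place behaviour, see the note above).
working = list(global_norm)
for g in sorted(set(groups)):
    idx = [i for i, name in enumerate(groups) if name == g]
    scaled = range_transform([working[i] for i in idx])
    for i, v in zip(idx, scaled):
        working[i] = v

# Step 3: the complete-set pass now sees group-normalized data.
after_complete_pass = range_transform(working)

print("global only:        ", global_norm)
print("after complete pass:", after_complete_pass)
```

Under that assumption, every group already spans [-1, 1] by the time the complete set is processed, so the final pass changes almost nothing and the result keeps the per-group scaling instead of matching the global normalization. Working on separate copies, as in the process above, avoids this.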