SOMDimensionalityReduction and split attribute set

brianbaker Member Posts: 24 Maven
edited July 2019 in Help
I want to create an SOM classification and keep the original attributes flowing through the process. When I create the SOM, I get back only the ID, the label, and the SOM dimensions. How do I keep the other attributes?

More generally, is there a way to duplicate the example set, send the copies into two separate sub-processes, and join them back into one set? I've found the join operators, but I haven't found a split operator.

Thanks!

    <operator name="som" class="SOMDimensionalityReduction" breakpoints="after">
        <parameter key="return_preprocessing_model" value="true"/>
        <parameter key="number_of_dimensions" value="1"/>
        <parameter key="net_size" value="20"/>
        <parameter key="training_rounds" value="60"/>
    </operator>

Answers

    land RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 2,531 Unicorn
    Hi,
    Just use the IOMultiplier operator to copy the ExampleSet before applying the SOM. Afterwards, join the original and the new ExampleSet with ExampleSetJoin. The IDs are used to identify examples that belong together.

    Greetings,
      Sebastian
    brianbaker Member Posts: 24 Maven
    Thank you, that is very helpful!

    Now I can merge the SOM dimension field back in:

        <operator name="build som" class="OperatorChain" expanded="yes">
            <operator name="IOMultiplier" class="IOMultiplier">
                <parameter key="io_object" value="ExampleSet"/>
                <parameter key="multiply_type" value="multiply_all"/>
            </operator>
            <operator name="SOMDimensionalityReduction" class="SOMDimensionalityReduction">
                <parameter key="number_of_dimensions" value="1"/>
                <parameter key="net_size" value="15"/>
            </operator>
            <operator name="ExampleSetJoin" class="ExampleSetJoin">
            </operator>
        </operator>
    However, I'd like to split the stream, apply different manipulations to each copy, and then merge them. When I try, the first data set always goes into the chain.

        <operator name="featureFix" class="OperatorChain" breakpoints="after" expanded="no">
            <description text="create derived features, transform nominal to numeric, and remove correlated  ones"/>
            <operator name="Normalization" class="Normalization">
                <parameter key="return_preprocessing_model" value="true"/>
                <parameter key="create_view" value="true"/>
            </operator>
            <operator name="Nominal2Binominal" class="Nominal2Binominal">
            </operator>
            <operator name="FeatureNameFilter" class="FeatureNameFilter">
                <parameter key="skip_features_with_name" value="gender = F"/>
            </operator>
            <operator name="Nominal2Numerical" class="Nominal2Numerical">
            </operator>
            <operator name="AttributeConstruction" class="AttributeConstruction">
                <list key="function_descriptions">
                  <parameter key="ageAdjPushup" value="pushupPre / age"/>
                  <parameter key="ageAdjSitup" value="situpPre / age"/>
                </list>
            </operator>
            <operator name="RemoveCorrelatedFeatures" class="RemoveCorrelatedFeatures">
            </operator>
        </operator>
        <operator name="build som" class="OperatorChain" breakpoints="after" expanded="yes">
            <operator name="SOMDimensionalityReduction" class="SOMDimensionalityReduction" breakpoints="after">
                <parameter key="number_of_dimensions" value="1"/>
                <parameter key="net_size" value="15"/>
            </operator>
        </operator>
        <operator name="ExampleSetJoin" class="ExampleSetJoin" breakpoints="after">
        </operator>
    Is there a way to reorder the data sets so that I can run each through a different set of operations?
    land RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 2,531 Unicorn
    Hi,
    Yes. You can use the IOSelector operator to move one of the data sets to the top of the object stack, so that the next operator works on that one.

    Or simply wait for RapidMiner 5. It will make all of this unnecessary, and much more intuitive, because of its explicit flow layout. :)

    Greetings,
      Sebastian
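A minimal sketch of the IOSelector approach for the split-and-merge case, written in the same RapidMiner 4.x process XML used above. It assumes the stack behavior discussed in this thread (an operator takes the most recently produced object of the type it needs), and it assumes the IOSelector parameters are named `io_object` and `select_which`, with index 1 meaning the most recent ExampleSet; verify both against the operator reference of your version. The operator names "split and merge", "branch 1" and "branch 2" are hypothetical, and the branch contents (a bare Normalization and the SOM settings used earlier) are placeholders for your own operators, such as the featureFix chain.

```xml
<operator name="split and merge" class="OperatorChain" expanded="yes">
    <!-- Duplicate the incoming ExampleSet so that two copies are available -->
    <operator name="IOMultiplier" class="IOMultiplier">
        <parameter key="io_object" value="ExampleSet"/>
        <parameter key="multiply_type" value="multiply_all"/>
    </operator>
    <!-- Branch 1 (hypothetical name): consumes the copy on top of the stack -->
    <operator name="branch 1" class="OperatorChain" expanded="yes">
        <operator name="Normalization" class="Normalization">
        </operator>
    </operator>
    <!-- Bring the untouched copy back to the top.
         ASSUMPTION: parameter names io_object / select_which and the
         index order (1 = most recent) may differ in your version. -->
    <operator name="IOSelector" class="IOSelector">
        <parameter key="io_object" value="ExampleSet"/>
        <parameter key="select_which" value="2"/>
    </operator>
    <!-- Branch 2 (hypothetical name): now works on the untouched copy -->
    <operator name="branch 2" class="OperatorChain" expanded="yes">
        <operator name="SOMDimensionalityReduction" class="SOMDimensionalityReduction">
            <parameter key="number_of_dimensions" value="1"/>
            <parameter key="net_size" value="15"/>
        </operator>
    </operator>
    <!-- Join the two results; examples are matched by their ID attribute -->
    <operator name="ExampleSetJoin" class="ExampleSetJoin">
    </operator>
</operator>
```

If the wrong copy ends up in the second branch, try the other `select_which` value. Because ExampleSetJoin matches examples by ID, both branches must keep the ID attribute.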
    brianbaker Member Posts: 24 Maven
    Nice! When will it be released?
    land RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 2,531 Unicorn
    Hi,
    We are going to publish the final version in mid-December, since it's definitely something you have to put under the Christmas tree. :)

    Greetings,
      Sebastian