
Difference operator?

Legacy User Member Posts: 0 Newbie
edited November 2018 in Help
I have a table with N rows of data. Is there any operator that would convert it into a table of N-1 rows, where each row is a difference of two consecutive rows of the original table?

That is, input:

X  Y
a  b
c  d
e  f

Desired output:

X      Y
c-a  d-b
e-c  f-d

If there is no such operator, can you treat this as a feature request?

Thank you!


Answers

  • steffen Member Posts: 347 Maven
    Hello Victor

    As far as I know, there is no such operator. Nevertheless, here is a workaround (thanks to RapidMiner for its little operators, which can be combined in powerful ways), tested with the iris dataset, which is also part of RM.
    <operator name="Root" class="Process" expanded="yes">
        <operator name="load_data" class="ExampleSource">
            <parameter key="attributes" value="iris.aml"/>
        </operator>
        <operator name="remove_label" class="FeatureNameFilter">
            <parameter key="filter_special_features" value="true"/>
            <parameter key="skip_features_with_name" value="label"/>
        </operator>
        <operator name="copy_data" class="IOMultiplier">
            <parameter key="io_object" value="ExampleSet"/>
        </operator>
        <operator name="process_minuend" class="OperatorChain" expanded="no">
            <operator name="remove_first_row" class="ExampleFilter">
                <parameter key="condition_class" value="attribute_value_filter"/>
                <parameter key="invert_filter" value="true"/>
                <parameter key="parameter_string" value="id=id_1"/>
            </operator>
            <operator name="remove_ID" class="FeatureNameFilter">
                <parameter key="filter_special_features" value="true"/>
                <parameter key="skip_features_with_name" value="id"/>
            </operator>
            <operator name="addID" class="IdTagging">
            </operator>
        </operator>
        <operator name="select_second_set" class="IOSelector">
            <parameter key="io_object" value="ExampleSet"/>
            <parameter key="select_which" value="2"/>
        </operator>
        <operator name="process_subtrahend" class="OperatorChain" expanded="yes">
            <operator name="remove_last_row" class="ExampleFilter">
                <parameter key="condition_class" value="attribute_value_filter"/>
                <parameter key="invert_filter" value="true"/>
                <parameter key="parameter_string" value="id=id_150"/>
            </operator>
            <operator name="remove_ID (2)" class="FeatureNameFilter">
                <parameter key="filter_special_features" value="true"/>
                <parameter key="skip_features_with_name" value="id"/>
            </operator>
            <operator name="addID (2)" class="IdTagging">
            </operator>
        </operator>
        <operator name="ExampleSetJoin" class="ExampleSetJoin">
            <parameter key="remove_double_attributes" value="false"/>
        </operator>
        <operator name="FeatureGeneration" class="FeatureGeneration">
            <list key="functions">
              <parameter key="new_a1" value="-(a1,a1_from_ES2)"/>
              <parameter key="new_a2" value="-(a2,a2_from_ES2)"/>
            </list>
        </operator>
    </operator>
    Unfortunately, you have to define your functions manually (in "FeatureGeneration"), so this is only suitable for a small number of attributes.

    Note: To make this work in your situation, you must change the dataset-specific parameters (primarily the names of the attributes).
    I hope this setup is self-explanatory; if not, feel free to ask.
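
For readers who want to see the idea behind the process outside RapidMiner, here is a minimal Python sketch of the same shift-and-join approach. It is only an illustration: the function name `consecutive_differences` and the sample values are hypothetical and not part of the process above. One copy of the data loses its first row, the other loses its last row, both are re-numbered from 1, and the two are then joined on the new id and subtracted.

```python
def consecutive_differences(rows, attributes):
    """Mimic the shift-and-join workaround on a list of dict rows (illustrative only)."""
    # Minuend: drop the first row, then re-number from 1
    # (the role of remove_first_row followed by addID).
    minuend = {i: row for i, row in enumerate(rows[1:], start=1)}
    # Subtrahend: drop the last row, then re-number from 1
    # (the role of remove_last_row followed by addID (2)).
    subtrahend = {i: row for i, row in enumerate(rows[:-1], start=1)}
    # Join on the fresh id and subtract attribute by attribute.
    return [
        {name: minuend[i][name] - subtrahend[i][name] for name in attributes}
        for i in sorted(minuend)
    ]


# Made-up sample data with two numeric attributes.
rows = [
    {"a1": 5.0, "a2": 3.0},
    {"a1": 7.0, "a2": 2.0},
    {"a1": 4.0, "a2": 6.0},
]
diffs = consecutive_differences(rows, ["a1", "a2"])
print(diffs)
```

Because the sketch loops over a list of attribute names, it does not need one hand-written function per attribute, which is the main limitation of the FeatureGeneration step.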

    hope this was helpful

    Steffen
  • TobiasMalbrecht Moderator, Employee, Member Posts: 295 RM Product Management
    Hi Victor, hi Steffen,

    wow, what a process, I did not imagine this was even possible .. ;) ... just joking!

    Well, there is indeed currently no operator that accomplishes this task. Although Ingo wrote a meta operator, RelativeRegression, which allows regressing on the difference of label values, there is not yet a general operator for building differences of attribute values. For time series models, this would certainly be a nice-to-have operator. So at some point one of us will certainly write such an operator... which, by the way, should not be all too complicated!

    Regards,
    Tobias

  • Legacy User Member Posts: 0 Newbie
    Hmm... There is no way I could come up with this sequence by myself.

    If you are going to add this operator, can you make it with a choice of the function:

    1. Differences
    2. Ratios
    3. Ratios - 1
    4. ln(ratios)

    This seems to cover all typical cases.
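
The four variants listed above can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not a RapidMiner operator: the names `TRANSFORMS` and `transform_consecutive` and the sample values are hypothetical.

```python
import math

# Each function takes the previous and the current value of one attribute.
TRANSFORMS = {
    "difference": lambda prev, cur: cur - prev,
    "ratio": lambda prev, cur: cur / prev,
    "ratio_minus_one": lambda prev, cur: cur / prev - 1.0,
    "log_ratio": lambda prev, cur: math.log(cur / prev),
}


def transform_consecutive(values, kind):
    """Apply the chosen function to every pair of consecutive values."""
    fn = TRANSFORMS[kind]
    # zip pairs each value with its successor, so N values yield N-1 results.
    return [fn(prev, cur) for prev, cur in zip(values, values[1:])]


# Made-up sample column.
values = [2.0, 4.0, 8.0]
for kind in TRANSFORMS:
    print(kind, transform_consecutive(values, kind))
```

Note that the ratio-based variants require non-zero previous values, and the logarithm additionally requires the ratio to be positive.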

    Thank you for the great product and the fast response!
    Victor

  • TobiasMalbrecht Moderator, Employee, Member Posts: 295 RM Product Management
    Hi Victor,
    Victor wrote:

    If you are going to add this operator, can you make it with a choice of the function:

    1. Differences
    2. Ratios
    3. Ratios - 1
    4. ln(ratios)

    This seems to cover all typical cases.

    As I tried to imply, there is quite a lot on our schedule at the moment, so we will not have enough time in the short term. But we will keep this in mind. I think you summarized the requirements for such an operator nicely. Thanks!

    Regards,
    Tobias
  • fjcuberos Member Posts: 18 Maven
    I have an operator (part of a plugin) that computes the difference of adjacent attributes.
    I don't know if it is too late, but I can send you the sources if you want to extend it to examples.

    F.J. Cuberos

  • Kolodziej Member Posts: 18 Contributor II
    Hello,
    I have the same problem. I need the difference of two rows.
    Is there an operator that can do this? If not, how can I solve this problem?

  • MariusHelf RapidMiner Certified Expert, Member Posts: 1,869 Unicorn
    Hi,

    You can use the Differentiate operator from the Series extension.

    Best, Marius