How can RM identify sequences in dataset?

olandesino Member Posts: 19 Maven
edited November 2018 in Help
Hi all,

I've got a problem that I can't seem to solve by myself  :( .

I tried to use RM for a sequence analysis, since I have a dataset containing many logs of test results.
My CSV dataset contains 3 columns (3 attributes) and thousands of rows of values.
The problem is that each test case spans 50 rows, so how can I tell RM that
each block of 50 rows represents an independent group,
so that I can find interesting patterns "inside" each test case?
Note that there are 8800 test cases in my dataset, so creating 8800 separate files is not practical.

I hope that is clear.
Thanks in advance.

A.Florio

Answers

  • haddock Member Posts: 849 Maven
    Bonsoir!

    I think the MultivariateSeries2WindowExamples operator may be what you need. Here's an example of this bad boy at work on a mock-up of your problem: 8800 entries, turned into 176 rows, each holding a window of 50 values per attribute.
    <operator name="Root" class="Process" expanded="yes">
        <operator name="ExampleSetGenerator" class="ExampleSetGenerator">
            <parameter key="target_function" value="random"/>
            <parameter key="number_examples" value="8800"/>
            <parameter key="number_of_attributes" value="3"/>
        </operator>
        <operator name="MultivariateSeries2WindowExamples" class="MultivariateSeries2WindowExamples">
            <parameter key="window_size" value="50"/>
            <parameter key="step_size" value="50"/>
        </operator>
    </operator>
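As a rough illustration of what `window_size=50` with `step_size=50` does in the process above, here is a minimal Python sketch. It is not RapidMiner code: it assumes the operator flattens each window of rows into one wide example, and the function name `window_examples` and the attribute-major column order are hypothetical choices made for the illustration.

```python
def window_examples(rows, window_size, step_size):
    """Flatten each window of rows into a single wide example.

    rows is a list of equal-length tuples (one tuple per original row).
    Windows that would run past the end of the data are dropped.
    """
    examples = []
    for start in range(0, len(rows) - window_size + 1, step_size):
        window = rows[start:start + window_size]
        # Attribute-major order (an assumption): all values of attribute 0,
        # then all values of attribute 1, and so on.
        flat = [row[a] for a in range(len(rows[0])) for row in window]
        examples.append(flat)
    return examples


# Mock data: 8800 rows with 3 attributes each.
rows = [(i, i * 10, i * 100) for i in range(8800)]
examples = window_examples(rows, window_size=50, step_size=50)
print(len(examples), len(examples[0]))  # 176 150
```

With a step equal to the window size the windows do not overlap, so every 50-row block ends up in exactly one example.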
  • olandesino Member Posts: 19 Maven
    Thanks for the advice, but it seems that (after the preprocessing) the rows are still not grouped into "sequences".
    Besides, there is another problem: RM complains that an attribute must have the same value type throughout, which is not the case in my data :-(
    Any other suggestions?

    thank you anyway!

    A.Florio
  • haddock Member Posts: 849 Maven
    "so how can I tell RM that each 50 rows represent an independent group?"
    What exactly did you mean by "group"?

  • olandesino Member Posts: 19 Maven
    A group like this:
    Series1:
    50 elements of attr 1
    50 elements of attr 2
    50 elements of attr 3

    Series2:
    |
    |
    SeriesN

    so that RM can apply its algorithms not to ALL values at once, but to each single series.
    Example: find patterns with Apriori inside each series, and afterwards maybe compare them.
    I know my problem is not easy to understand, but I'm trying to explain it as best I can.

  • haddock Member Posts: 849 Maven
    Hmm, the previous example produces 176 rows, each containing 50 consecutive values for each of the 3 attributes, on the assumption that each 50-row clump is distinct, just like your series. If you meant instead that each example should be made up of the last 50 values of each attribute, then change the step size to one, as below, where we just look for sequence patterns in att3.

    <operator name="Root" class="Process" expanded="yes">
        <operator name="ExampleSetGenerator" class="ExampleSetGenerator">
            <parameter key="target_function" value="random"/>
            <parameter key="number_examples" value="8800"/>
            <parameter key="number_of_attributes" value="3"/>
        </operator>
        <operator name="FeatureNameFilter" class="FeatureNameFilter">
            <parameter key="skip_features_with_name" value="att1|att2"/>
        </operator>
        <operator name="BinDiscretization" class="BinDiscretization">
            <parameter key="range_name_type" value="short"/>
        </operator>
        <operator name="MultivariateSeries2WindowExamples" class="MultivariateSeries2WindowExamples">
            <parameter key="window_size" value="50"/>
            <parameter key="step_size" value="1"/>
        </operator>
        <operator name="W-Apriori" class="W-Apriori">
        </operator>
    </operator>
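The difference a step size of one makes can be sketched in plain Python. This is only an illustration of the idea, not what RapidMiner does internally: the equal-width binning, the choice of 2 bins, the `rangeN` labels, and both function names are assumptions made for the example.

```python
import random


def bin_discretize(values, number_of_bins):
    """Equal-width binning: map each number to a label such as 'range1'."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / number_of_bins or 1.0  # fall back for constant data
    labels = []
    for v in values:
        # The maximum value would land one past the last bin, so clamp it.
        index = min(int((v - lo) / width), number_of_bins - 1)
        labels.append(f"range{index + 1}")
    return labels


def sliding_windows(values, window_size, step_size=1):
    """Return every full window, moving the start by step_size each time."""
    return [values[s:s + window_size]
            for s in range(0, len(values) - window_size + 1, step_size)]


random.seed(0)
att3 = [random.uniform(-10, 10) for _ in range(8800)]  # mock numeric attribute
labels = bin_discretize(att3, number_of_bins=2)
windows = sliding_windows(labels, window_size=50, step_size=1)
print(len(windows))  # 8751 overlapping windows
```

Each window overlaps its neighbour by 49 values, so this layout describes "the last 50 values at every point in time" rather than independent 50-row groups.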
  • olandesino Member Posts: 19 Maven
    I know that I'm close (thanks to your help), but it is still not quite there.
    Let's put it simply: I have 1 attribute with 150 elements (rows),
    and I want to see in result mode, in the 'data view', Series1, Series2, Series3,
    each with 50 values of the attribute under it.

    If I do this:
    <operator name="OperatorChain (2)" class="OperatorChain" expanded="yes">
        <operator name="ExampleSource" class="ExampleSource">
            <parameter key="attributes" value="~/minim.aml"/>
        </operator>
        <operator name="MultivariateSeries2WindowExamples" class="MultivariateSeries2WindowExamples" breakpoints="after">
            <parameter key="horizon" value="1"/>
            <parameter key="window_size" value="50"/>
            <parameter key="step_size" value="50"/>
            <parameter key="add_incomplete_windows" value="true"/>
        </operator>
    </operator>
    then the output (in result mode -> data view) is 3 examples with 50 attributes each, which is wrong: I have 1 attribute with 3 x 50 values.
    I also tried other series preprocessing operators such as "index series" and "Single2series", but the result is still not what I want.
    Meanwhile, I want to say that I really appreciate your help.
    A.Florio
  • haddock Member Posts: 849 Maven
    "Let's put it simply: I have 1 attribute with 150 elements (rows), and I want to see in result mode, in the 'data view', Series1, Series2, Series3, each with 50 values of the attribute under it."
    Does the following do it?
    <operator name="Root" class="Process" expanded="yes">
        <operator name="ExampleSetGenerator" class="ExampleSetGenerator">
            <parameter key="target_function" value="random"/>
            <parameter key="number_examples" value="150"/>
            <parameter key="number_of_attributes" value="1"/>
        </operator>
        <operator name="FeatureNameFilter" class="FeatureNameFilter" breakpoints="after">
            <parameter key="filter_special_features" value="true"/>
            <parameter key="skip_features_with_name" value="label"/>
        </operator>
        <operator name="MultivariateSeries2WindowExamples" class="MultivariateSeries2WindowExamples">
            <parameter key="window_size" value="3"/>
            <parameter key="step_size" value="3"/>
        </operator>
        <operator name="ChangeAttributeNamesReplace" class="ChangeAttributeNamesReplace">
            <parameter key="replace_what" value="att.*-"/>
            <parameter key="replace_by" value="Series_"/>
        </operator>
    </operator>
    Hope so! Good weekend.
  • olandesino Member Posts: 19 Maven
    This way I get 3 columns (OK), but the first one doesn't contain
    the first 50 values of my dataset. The values are spread out like
    a matrix index (1st row, 2nd row, ...). How can I tell it to take the first 50 values
    and put them in the 1st column (1st series), take the second 50 values and put them in the 2nd column (2nd series), and so on?
    Thanks a lot for your help.

    A.Florio
  • haddock Member Posts: 849 Maven
    OK, now I see what you mean, at least I hope so! What about this?

    <operator name="Root" class="Process" expanded="yes">
        <operator name="ExampleSetGenerator" class="ExampleSetGenerator">
            <parameter key="target_function" value="random"/>
            <parameter key="number_examples" value="150"/>
            <parameter key="number_of_attributes" value="1"/>
        </operator>
        <operator name="MultivariateSeries2WindowExamples" class="MultivariateSeries2WindowExamples">
            <parameter key="window_size" value="50"/>
            <parameter key="step_size" value="50"/>
        </operator>
        <operator name="ExampleSetTranspose" class="ExampleSetTranspose">
        </operator>
        <operator name="ChangeAttributeNamesReplace" class="ChangeAttributeNamesReplace">
            <parameter key="replace_what" value="att"/>
            <parameter key="replace_by" value="Series"/>
            <parameter key="apply_on_special" value="false"/>
        </operator>
    </operator>
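The window-then-transpose idea in the process above can be checked on plain numbers. The following is a minimal Python sketch of that layout, assuming a single attribute stored as a flat list; `series_columns` and the `SeriesN` header names are hypothetical and only mirror the renaming step loosely.

```python
def series_columns(values, series_length):
    """Split a flat list into consecutive chunks, then transpose into columns."""
    chunks = [values[s:s + series_length]
              for s in range(0, len(values) - series_length + 1, series_length)]
    # zip(*chunks) makes chunk k into column k:
    # row i then holds the i-th value of every series.
    rows = [list(r) for r in zip(*chunks)]
    header = [f"Series{k + 1}" for k in range(len(chunks))]
    return header, rows


values = list(range(150))  # stand-in for 150 rows of one attribute
header, rows = series_columns(values, 50)
print(header)     # ['Series1', 'Series2', 'Series3']
print(rows[0])    # [0, 50, 100]
print(len(rows))  # 50
```

The first column holds values 0 to 49, the second 50 to 99, and the third 100 to 149, which is the "first 50 values in the 1st column" layout asked for.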
  • olandesino Member Posts: 19 Maven
    I get this error message when I use my simple dataset with just 1 column (only 1 attribute):
    AttributeTypeException
    Process failed Message: Cannot map index of nominal attribute to nominal value: index 0 is out of bounds!
    Even after a few changes to my dataset I always get the same error, and it doesn't tell me where exactly in the operator tree it occurs.
    What does it mean?
  • haddock Member Posts: 849 Maven
    Without seeing the data there is not much I can say.
  • olandesino Member Posts: 19 Maven
    This is just a piece of 1 attribute of my dataset.
    To make things easier, I have ignored the other attributes for now.
    It is a series of operations, numerical and nominal, nothing special.

    [attachment deleted by admin]
  • haddock Member Posts: 849 Maven
    Hi,

    I noticed a blank line at the end of your file, so I took that out and then copied and pasted the data 6 times to end up with 140 rows. In the following example I'm treating a series as 20 rows, so we should get 7 identical columns as series, and we do  ;D
    <operator name="Root" class="Process" expanded="yes">
        <operator name="ExampleSource" class="ExampleSource" activated="no">
            <parameter key="attributes" value="C:\Program Files (x86)\Rapid-I\RapidMiner-4.3\simple2"/>
        </operator>
        <operator name="CSVExampleSource" class="CSVExampleSource">
            <parameter key="filename" value="C:\Users\CJFP\Documents\rm_workspace\simple-2.txt"/>
            <parameter key="read_attribute_names" value="false"/>
        </operator>
        <operator name="MultivariateSeries2WindowExamples" class="MultivariateSeries2WindowExamples">
            <parameter key="window_size" value="20"/>
            <parameter key="step_size" value="20"/>
        </operator>
        <operator name="ExampleSetTranspose" class="ExampleSetTranspose">
        </operator>
        <operator name="ChangeAttributeNamesReplace" class="ChangeAttributeNamesReplace">
            <parameter key="replace_what" value="att"/>
            <parameter key="replace_by" value="Series"/>
        </operator>
    </operator>
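The clean-up described above (dropping the trailing blank line, then checking that the row count is a whole number of series) can be sketched in Python. The thread does not confirm that the blank line caused the earlier error, and the mock data below is invented because the attachment is not available; `read_single_column` is a hypothetical helper.

```python
import io


def read_single_column(handle):
    """Read one value per line, skipping blank lines such as a trailing one."""
    return [line.strip() for line in handle if line.strip()]


# Stand-in for the data file: 20 mixed numeric/nominal values repeated
# 7 times, followed by a stray blank line.
block = [f"op{i}" if i % 2 else str(i) for i in range(20)]
text = "\n".join(block * 7) + "\n\n"

values = read_single_column(io.StringIO(text))
series_length = 20
if len(values) % series_length != 0:
    raise ValueError("row count is not a multiple of the series length")

series = [values[s:s + series_length]
          for s in range(0, len(values), series_length)]
print(len(values), len(series))             # 140 7
print(all(s == series[0] for s in series))  # True
```

Checking the row count up front makes an incomplete last series fail loudly instead of being dropped or padded silently.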

    [attachment deleted by admin]