How can RM identify sequences in dataset?

olandesino · May 2009

Hi all,

I've got a problem that I can't seem to solve by myself.

I tried to use RM for a sequence analysis, since I have a dataset containing many logs of test results.
My CSV dataset contains 3 columns (3 attributes) and thousands of rows holding all the values.
The problem is that each test case spans 50 rows, so how can I tell RM that
each block of 50 rows represents an independent group, so that I can find interesting patterns "inside" each test case?
Note that there are 8800 test cases in my dataset, so creating 8800 files is not practical.

I hope that is clear.
Thanks in advance.

A.Florio

haddock · May 2009

Bonsoir!

I think that the MultivariateSeries2WindowExamples operator may be what you need. Here's an example of this bad boy at work on a mock-up of your problem: 8800 entries, turned into 176 rows that each hold 50 values per attribute.

<operator name="Root" class="Process" expanded="yes">
    <operator name="ExampleSetGenerator" class="ExampleSetGenerator">
        <parameter key="target_function"	value="random"/>
        <parameter key="number_examples"	value="8800"/>
        <parameter key="number_of_attributes"	value="3"/>
    </operator>
    <operator name="MultivariateSeries2WindowExamples" class="MultivariateSeries2WindowExamples">
        <parameter key="window_size"	value="50"/>
        <parameter key="step_size"	value="50"/>
    </operator>
</operator>
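
For readers without RapidMiner at hand, here is a rough Python sketch of what non-overlapping windowing (window size 50, step size 50) does to 8800 rows. It only mimics the idea and is not how the operator is implemented; `window_rows` and the mock data are made up for illustration.

```python
# Sketch only: mimics non-overlapping windowing in plain Python.
# It is NOT RapidMiner's implementation; names and data are hypothetical.

def window_rows(rows, window_size, step_size):
    """Return windows of `window_size` consecutive rows, one starting
    every `step_size` rows. Incomplete trailing windows are dropped."""
    windows = []
    for start in range(0, len(rows) - window_size + 1, step_size):
        windows.append(rows[start:start + window_size])
    return windows

# Mock data: 8800 rows with 3 attributes each (placeholder values).
rows = [(i, i * 2, i * 3) for i in range(8800)]

windows = window_rows(rows, window_size=50, step_size=50)

print(len(windows))     # number of windows, one per 50-row block
print(len(windows[0]))  # rows inside each window
print(windows[1][0])    # first row of the second block
```

With a step size equal to the window size, no row appears in more than one window, which is what "each 50 rows is an independent group" asks for.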

olandesino · May 2009

Thanks for the advice, but it seems that (after the preprocessing) the rows are still not grouped into "sequences".
Besides, there is also a problem when RM says that an attribute must have the same type of value... and that is not my case :-(
Any other suggestions?

Thank you anyway!

A.Florio

haddock · May 2009

so how can I tell RM that
each block of 50 rows represents an independent group?

What exactly did you mean by "group" ?

olandesino · May 2009

A group like:
Series1:
50 elements of attr 1
50 elements of attr 2
50 elements of attr 3.

Series2:
|
|
SeriesN

So that RM can apply its algorithm not to ALL values, but to each single series.
Example: find patterns with Apriori inside each series, and afterwards maybe compare them.
I know that my problem is not so easy to understand, but I'm trying to explain it as best I can.

haddock · May 2009

Hmm, the previous example produces 176 rows which contain the previous 50 values for each of the 3 attributes, based on the notion that each 50-row clump is distinct, so just like your series. If you meant that each example is made up of the last 50 values for each attribute, then change the step size to one, like this, where we just look for sequence patterns in att3.


<operator name="Root" class="Process" expanded="yes">
    <operator name="ExampleSetGenerator" class="ExampleSetGenerator">
        <parameter key="target_function"	value="random"/>
        <parameter key="number_examples"	value="8800"/>
        <parameter key="number_of_attributes"	value="3"/>
    </operator>
    <operator name="FeatureNameFilter" class="FeatureNameFilter">
        <parameter key="skip_features_with_name"	value="att1|att2"/>
    </operator>
    <operator name="BinDiscretization" class="BinDiscretization">
        <parameter key="range_name_type"	value="short"/>
    </operator>
    <operator name="MultivariateSeries2WindowExamples" class="MultivariateSeries2WindowExamples">
        <parameter key="window_size"	value="50"/>
        <parameter key="step_size"	value="1"/>
    </operator>
    <operator name="W-Apriori" class="W-Apriori">
    </operator>
</operator>
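
As a rough illustration of the steps in this process (discretize one attribute, then slide a 50-value window forward one row at a time), here is a Python sketch. It assumes equal-width binning with 2 bins and uses placeholder labels such as `range1`; the helper names and the mock `att3` values are hypothetical and this is not RapidMiner's own code.

```python
# Sketch only: equal-width binning followed by overlapping (step 1) windows.
# Bin count, label names and data are assumptions made for illustration.

def equal_width_bins(values, number_of_bins):
    """Map each numeric value to a label 'range1'..'rangeN' (equal-width bins)."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / number_of_bins
    if width == 0:  # all values identical: everything lands in one bin
        return ["range1"] * len(values)
    labels = []
    for v in values:
        # The maximum value would fall just past the last bin, so clamp it.
        index = min(int((v - lo) / width), number_of_bins - 1)
        labels.append(f"range{index + 1}")
    return labels

def sliding_windows(items, window_size, step_size=1):
    """Return every full window of `window_size` items, advancing by `step_size`."""
    return [items[s:s + window_size]
            for s in range(0, len(items) - window_size + 1, step_size)]

# Mock att3: 8800 placeholder values between 0 and 100.
att3 = [(i * 37) % 101 for i in range(8800)]

labels = equal_width_bins(att3, number_of_bins=2)
windows = sliding_windows(labels, window_size=50, step_size=1)

print(len(windows))                         # one window per possible start row
print(windows[0][1:] == windows[1][:-1])    # neighbouring windows overlap by 49 values
```

Note the trade-off: a step size of 1 yields far more examples than a step size of 50, but those examples overlap heavily and cross the boundaries between test cases.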

olandesino · May 2009

I know that I'm close (thanks to your help), but it is still not sufficient.
Let's put it in a simple way... I have 1 attribute with 150 elements (rows),
and I want to see, in result mode in the 'data view', Series1, Series2, Series3
with 50 values of the attribute under each of them.

If I do this:

<operator name="OperatorChain (2)" class="OperatorChain" expanded="yes">
        <operator name="ExampleSource" class="ExampleSource">
            <parameter key="attributes"	value="~/minim.aml"/>
        </operator>
        <operator name="MultivariateSeries2WindowExamples" class="MultivariateSeries2WindowExamples" breakpoints="after">
            <parameter key="horizon"	value="1"/>
            <parameter key="window_size"	value="50"/>
            <parameter key="step_size"	value="50"/>
            <parameter key="add_incomplete_windows"	value="true"/>
        </operator>
</operator>

then the output (in result mode -> data view) is 3 examples with 50 attributes (wrong! I have 1 attribute and 3x50 values).
I tried other series preprocessing operators like "index series" or "Single2series", but it is still not what I want.
Meanwhile, I want to say that I really appreciate your help.
A.Florio

haddock · May 2009

Let's put it in a simple way... I have 1 attribute with 150 elements (rows),
and I want to see, in result mode in the 'data view', Series1, Series2, Series3
with 50 values of the attribute under each of them.

Does the following do it?

<operator name="Root" class="Process" expanded="yes">
    <operator name="ExampleSetGenerator" class="ExampleSetGenerator">
        <parameter key="target_function"	value="random"/>
        <parameter key="number_examples"	value="150"/>
        <parameter key="number_of_attributes"	value="1"/>
    </operator>
    <operator name="FeatureNameFilter" class="FeatureNameFilter" breakpoints="after">
        <parameter key="filter_special_features"	value="true"/>
        <parameter key="skip_features_with_name"	value="label"/>
    </operator>
    <operator name="MultivariateSeries2WindowExamples" class="MultivariateSeries2WindowExamples">
        <parameter key="window_size"	value="3"/>
        <parameter key="step_size"	value="3"/>
    </operator>
    <operator name="ChangeAttributeNamesReplace" class="ChangeAttributeNamesReplace">
        <parameter key="replace_what"	value="att.*-"/>
        <parameter key="replace_by"	value="Series_"/>
    </operator>
</operator>

Hope so! Good weekend.

olandesino · May 2009

This way I get 3 columns (ok), but the first one doesn't contain
the first 50 values of my dataset. The values are spread like
a matrix index (1st row, 2nd row, ...). How can I tell it to take the first 50 values and
put them in the 1st column (1st series), then the second 50 values in the 2nd column (2nd series), and so on?
Thanks a lot for your help.

A.Florio

haddock · May 2009

OK, now I see what you mean, at least I hope so! What about this?


<operator name="Root" class="Process" expanded="yes">
    <operator name="ExampleSetGenerator" class="ExampleSetGenerator">
        <parameter key="target_function"	value="random"/>
        <parameter key="number_examples"	value="150"/>
        <parameter key="number_of_attributes"	value="1"/>
    </operator>
    <operator name="MultivariateSeries2WindowExamples" class="MultivariateSeries2WindowExamples">
        <parameter key="window_size"	value="50"/>
        <parameter key="step_size"	value="50"/>
    </operator>
    <operator name="ExampleSetTranspose" class="ExampleSetTranspose">
    </operator>
    <operator name="ChangeAttributeNamesReplace" class="ChangeAttributeNamesReplace">
        <parameter key="replace_what"	value="att"/>
        <parameter key="replace_by"	value="Series"/>
        <parameter key="apply_on_special"	value="false"/>
    </operator>
</operator>
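
To make the difference between the two layouts concrete, here is a small Python sketch using 150 placeholder values. It contrasts windowing with size 3 (where the first column picks up every third value) against windowing with size 50 followed by a transpose (where each column is one contiguous series). The `chunk` helper and the data are hypothetical and only mimic the reshaping, not the operators themselves.

```python
# Sketch only: two ways to reshape 150 values into 50 rows x 3 columns.
values = list(range(1, 151))  # placeholder data: 1, 2, ..., 150

def chunk(items, size):
    """Split `items` into consecutive pieces of length `size`."""
    return [items[i:i + size] for i in range(0, len(items), size)]

# Layout A: window 3, step 3. Each row holds 3 neighbouring values,
# so the first column contains values 1, 4, 7, ... (interleaved).
rows_of_3 = chunk(values, 3)
first_column_a = [row[0] for row in rows_of_3]

# Layout B: window 50, step 50, then transpose. Each window becomes a
# column, so the first column is the first 50 values in order.
series = chunk(values, 50)
transposed = [list(row) for row in zip(*series)]
first_column_b = [row[0] for row in transposed]

print(first_column_a[:4])                    # interleaved values
print(first_column_b[:4])                    # contiguous values
print(len(transposed), len(transposed[0]))   # rows, columns
```

Both layouts have the same shape, which is why the first attempt looked right at a glance; only the second keeps each 50-value series together in its own column.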

olandesino · May 2009

I get this error message when I use my simple dataset with just 1 column (only 1 attribute):
AttributeTypeException
Process failed Message: Cannot map index of nominal attribute to nominal value: index 0 is out of bounds!
Even after a few changes to my dataset I always get the same error, without it telling me where exactly in the tree it occurs.
What does it mean?

haddock · May 2009

Without seeing the data there is not much I can say.

olandesino · May 2009

This is just a piece of the 1 attribute of my dataset.
To make things easier, I have ignored the other attributes (for now).
It is a series of operations: numerical and nominal, nothing special.

[attachment deleted by admin]

haddock · May 2009

Hi,

I noticed a blank line at the end of your file, so I took that out and then copied and pasted 6 times to end up with 140 rows. In the following example I'm saying a series is 20 rows, so we should have 7 identical columns as series, and we do ;D

<operator name="Root" class="Process" expanded="yes">
    <operator name="ExampleSource" class="ExampleSource" activated="no">
        <parameter key="attributes"	value="C:\Program Files (x86)\Rapid-I\RapidMiner-4.3\simple2"/>
    </operator>
    <operator name="CSVExampleSource" class="CSVExampleSource">
        <parameter key="filename"	value="C:\Users\CJFP\Documents\rm_workspace\simple-2.txt"/>
        <parameter key="read_attribute_names"	value="false"/>
    </operator>
    <operator name="MultivariateSeries2WindowExamples" class="MultivariateSeries2WindowExamples">
        <parameter key="window_size"	value="20"/>
        <parameter key="step_size"	value="20"/>
    </operator>
    <operator name="ExampleSetTranspose" class="ExampleSetTranspose">
    </operator>
    <operator name="ChangeAttributeNamesReplace" class="ChangeAttributeNamesReplace">
        <parameter key="replace_what"	value="att"/>
        <parameter key="replace_by"	value="Series"/>
    </operator>
</operator>
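
The same preparation can be sketched in Python: drop blank lines, then cut the remaining rows into series of 20 and check that the 7 series are identical. Since the attachment is gone, the file content below is mock data (a made-up mix of numeric and nominal entries), and whether the trailing blank line actually caused the AttributeTypeException is not established in this thread.

```python
# Sketch only: mock file content stands in for the deleted attachment.
# 20 placeholder rows mixing numeric and nominal entries.
block = [f"op{i}" if i % 2 else str(i) for i in range(20)]

# The original rows plus 6 pasted copies give 140 rows; the file
# also ends with a stray blank line.
text = "\n".join(block * 7) + "\n\n"

# Keep only non-blank lines so an empty row cannot become an empty value.
lines = [line for line in text.splitlines() if line.strip()]

# One series per 20 rows.
series_length = 20
series = [lines[i:i + series_length]
          for i in range(0, len(lines), series_length)]

print(len(lines))                             # rows after removing blanks
print(len(series))                            # number of series
print(all(s == series[0] for s in series))    # are all series identical?
```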

[attachment deleted by admin]
