"time series attribute selection dimensionality problem"

wessel Member Posts: 537 Maven
edited May 2019 in Help
Dear All,

I wish to use CFSFeatureSetEvaluator to remove a large number of irrelevant attributes.
Because I have a dataset with more than 20 attributes, and I'm using MultivariateSeries2WindowExamples with a window size of 96,
I end up with more than 20 * 96 = 1920 windowed attributes.

Problem:
CFSFeatureSetEvaluator cannot handle that many attributes.

Solution?
Apply CFS 20 times, once for each group of windowed attributes derived from the same original attribute.
So, for example, first on all attributes named attribute_one-.*
Then again on all attributes named attribute_two-.*
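The per-series grouping described above can be sketched outside RapidMiner. This is a minimal Python illustration, assuming every windowed attribute is named `<series>-<lag>` as in the examples above; the function name `group_by_series` is made up for this sketch and is not part of RapidMiner.

```python
import re
from collections import defaultdict

def group_by_series(attribute_names):
    """Group windowed attribute names such as 'attribute_one-3' by their series name."""
    groups = defaultdict(list)
    for name in attribute_names:
        match = re.fullmatch(r"(.+)-(\d+)", name)
        if match:  # attributes without a '-<lag>' suffix (e.g. the label) are skipped
            groups[match.group(1)].append(name)
    return dict(groups)

names = ["attribute_one-0", "attribute_one-1",
         "attribute_two-0", "attribute_two-1", "label"]
groups = group_by_series(names)
for series, members in groups.items():
    print(series, members)
```

Each group could then be passed to the feature selection separately, so that no single run sees more than one window's worth of attributes.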


I have been trying out different XML set-ups, but I don't want to post them just yet, because that might be confusing.


Thanks in advance,

Regards,

Wessel

Answers

  • land RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 2,531 Unicorn
    Hi Wessel,
    that's probably a nice process-building exercise, but I doubt it will yield very good results. As far as I know, CFS works on correlations? Then attributes from neighboring time points are probably highly correlated with each other, so most of them will be removed as redundant.
    At the very least you should use WindowExamples2ModelingData to transform your data into relative changes instead of absolute values. This is always worth a try in series prediction.
    But I personally prefer using learning algorithms and XValidation to evaluate a feature subset instead of heuristics...

    Generally, you have to consider whether removing an attribute that reflects the value x days earlier is of much use. If the value at day -6 is important, that same value sits at day -7 in the next example...
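To make the point about correlated neighbors concrete, here is a small Python sketch of the subset merit used by correlation-based feature selection as defined by Hall. Whether RapidMiner's CFSFeatureSetEvaluator computes exactly this quantity is an assumption, and the correlation values in the example are invented for illustration.

```python
from math import sqrt

def cfs_merit(k, mean_feature_label_corr, mean_feature_feature_corr):
    """CFS merit of a subset of k features (Hall's formulation).

    High correlation with the label raises the merit;
    high correlation among the features themselves lowers it.
    """
    numerator = k * mean_feature_label_corr
    denominator = sqrt(k + k * (k - 1) * mean_feature_feature_corr)
    return numerator / denominator

# Three lags of one series: equally predictive, but strongly inter-correlated.
redundant = cfs_merit(3, 0.6, 0.9)
# Three attributes with the same predictive power but little overlap.
diverse = cfs_merit(3, 0.6, 0.1)
print(redundant, diverse)
```

With equal label correlation, the strongly inter-correlated subset scores lower, which is why adjacent lags of the same series tend to be dropped.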


    Greetings,
      Sebastian
  • wessel Member Posts: 537 Maven
    All attributes from near time points, attribute_name-[1...24], are already removed, since I'm doing 24-hours-ahead prediction.

    CFS does yield good results when I do it by hand.
    Problem: I don't understand how to automate it in RapidMiner.

    WindowExamples2ModelingData, yes good idea.
    But I want to try CFS first :(

    Regards,

    Wessel
  • land RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 2,531 Unicorn
    Hi Wessel,
    I think this problem can be solved using the ForwardSelectionOperator, but I assume it's not included in 4.4 and will become part of the upcoming 4.5.
    Until then, you could solve this with AttributeSubsetPreprocessing. Here is a process:
    <operator name="Root" class="Process" expanded="yes">
        <operator name="ExampleSetGenerator" class="ExampleSetGenerator">
            <parameter key="target_function" value="sum classification"/>
            <parameter key="number_examples" value="500"/>
        </operator>
        <operator name="MultivariateSeries2WindowExamples" class="MultivariateSeries2WindowExamples">
            <parameter key="horizon" value="1"/>
            <parameter key="window_size" value="5"/>
            <parameter key="label_attribute" value="att1"/>
        </operator>
        <operator name="IOStorer (2)" class="IOStorer">
            <parameter key="name" value="ExampleSet"/>
            <parameter key="io_object" value="ExampleSet"/>
        </operator>
        <operator name="IOConsumer" class="IOConsumer">
        </operator>
        <operator name="ParameterIteration" class="ParameterIteration" expanded="yes">
            <list key="parameters">
              <parameter key="AttributeSubsetPreprocessing.attribute_name_regex" value="att1.*|label,att2.*|label,att3.*|label,att4.*|label,att5.*|label"/>
            </list>
            <operator name="IORetriever" class="IORetriever">
                <parameter key="name" value="ExampleSet"/>
                <parameter key="io_object" value="ExampleSet"/>
            </operator>
            <operator name="AttributeSubsetPreprocessing" class="AttributeSubsetPreprocessing" expanded="no">
                <parameter key="condition_class" value="attribute_name_filter"/>
                <parameter key="attribute_name_regex" value="att5.*|label"/>
                <parameter key="process_special_attributes" value="true"/>
                <operator name="ChangeAttributeRole" class="ChangeAttributeRole">
                    <parameter key="name" value="label"/>
                    <parameter key="target_role" value="label"/>
                </operator>
                <operator name="FeatureSelection" class="FeatureSelection" expanded="yes">
                    <operator name="CFSFeatureSetEvaluator" class="CFSFeatureSetEvaluator">
                    </operator>
                </operator>
            </operator>
            <operator name="IOStorer" class="IOStorer">
                <parameter key="name" value="ExampleSet"/>
                <parameter key="io_object" value="ExampleSet"/>
            </operator>
        </operator>
        <operator name="IORetriever (2)" class="IORetriever">
            <parameter key="name" value="ExampleSet"/>
            <parameter key="io_object" value="ExampleSet"/>
        </operator>
    </operator>

    Greetings,
      Sebastian
  • wessel Member Posts: 537 Maven
    Hey Land,

    I got your parameter iteration to work when attributes are really nicely named,
    but I can't practically implement it on my problem.

    I uploaded my dataset here:
    http://student.science.uva.nl/~wluijben/workfile.csv

    I tried to make my xml file as nice as I possibly could.
    It makes a prediction for wind, using a very small window size of 49 hours, with 23 horizon attributes removed.
    If I want to make the window size bigger, attribute selection gets really really slow! :(

    Any suggestions?
    Maybe your earlier idea, derivative + smoothing, to reduce the number of attributes?

    Regards,

    Wessel

    Current output:
    absolute_error: 4.095 +/- 3.096 (mikro: 4.095 +/- 3.096)

    Weights:
    wk1_kn-27 1.0
    wk1_kn-26 1.0
    wk1_kn-25 1.0
    wk1_kn-24 1.0
    wind-24 1.0
    dampdruk-48 1.0
    dampdruk-44 1.0
    gewasverdamping-46 1.0
    gewasverdamping-45 1.0
    gewasverdamping-44 1.0
    gewasverdamping-35 1.0
    systime_kn_week 1.0
    systime_kn_month 1.0
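All windowed attributes in the weight list above have a lag of 24 or more, which matches the `skip_features_with_name` pattern `.*-([0-9]|1[0-9]|2[0-3])` used in this process. The pattern can be checked in Python; this sketch assumes the filter matches the whole attribute name (modelled here with `re.fullmatch`), and `keep_attribute` is a hypothetical helper, not a RapidMiner function.

```python
import re

# Pattern copied from the FeatureNameFilter operator in the process.
HORIZON_PATTERN = re.compile(r".*-([0-9]|1[0-9]|2[0-3])")

def keep_attribute(name):
    """Return True if the name is NOT matched by the skip pattern."""
    return HORIZON_PATTERN.fullmatch(name) is None

names = ["wind-0", "wind-9", "wind-23", "wind-24",
         "dampdruk-48", "systime_kn_week"]
kept = [n for n in names if keep_attribute(n)]
print(kept)
```

Lags 0 to 23 are skipped, while lag 24 and above and the date-derived attributes survive.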



    <operator name="Root" class="Process" expanded="yes">
        <operator name="bug fix" class="CSVExampleSource" activated="no">
            <parameter key="filename" value="D:\wessel\Desktop\please.csv"/>
        </operator>
        <operator name="CSVExampleSource" class="CSVExampleSource">
            <parameter key="filename" value="workfile.csv"/>
        </operator>
        <operator name="51-550" class="ExampleRangeFilter">
            <parameter key="first_example" value="51"/>
            <parameter key="last_example" value="550"/>
        </operator>
        <operator name="convert systime_kn string to time object" class="Nominal2Date">
            <parameter key="attribute_name" value="systime_kn"/>
            <parameter key="date_type" value="date_time"/>
            <parameter key="date_format" value="yyyy-MM-dd HH:mm:ss"/>
            <parameter key="time_zone" value="Europe/Brussels"/>
        </operator>
        <operator name="id: systime" class="ChangeAttributeRole">
            <parameter key="name" value="systime_kn"/>
            <parameter key="target_role" value="id"/>
        </operator>
        <operator name="window size 26" class="MultivariateSeries2WindowExamples">
            <parameter key="horizon" value="1"/>
            <parameter key="window_size" value="49"/>
            <parameter key="create_single_attributes" value="false"/>
        </operator>
        <operator name="label: wind-0" class="ChangeAttributeRole">
            <parameter key="name" value="wind-0"/>
            <parameter key="target_role" value="label"/>
        </operator>
        <operator name="remove horizon attributes 0-23" class="FeatureNameFilter">
            <parameter key="skip_features_with_name" value=".*-([0-9]|1[0-9]|2[0-3])"/>
        </operator>
        <operator name="hour relative to day" class="Date2Numerical">
            <parameter key="attribute_name" value="systime_kn"/>
            <parameter key="time_unit" value="hour"/>
            <parameter key="keep_old_attribute" value="true"/>
        </operator>
        <operator name="week relative to year" class="Date2Numerical">
            <parameter key="attribute_name" value="systime_kn"/>
            <parameter key="time_unit" value="week"/>
            <parameter key="keep_old_attribute" value="true"/>
        </operator>
        <operator name="month relative to year" class="Date2Numerical">
            <parameter key="attribute_name" value="systime_kn"/>
            <parameter key="time_unit" value="month"/>
            <parameter key="keep_old_attribute" value="true"/>
        </operator>
        <operator name="day relative to month" class="Date2Numerical">
            <parameter key="attribute_name" value="systime_kn"/>
            <parameter key="time_unit" value="day"/>
            <parameter key="keep_old_attribute" value="true"/>
        </operator>
        <operator name="FeatureSelection" class="FeatureSelection" expanded="yes">
            <operator name="CFSFeatureSetEvaluator" class="CFSFeatureSetEvaluator">
            </operator>
        </operator>
        <operator name="28 atts" class="SlidingWindowValidation" expanded="yes">
            <parameter key="training_window_width" value="240"/>
            <parameter key="test_window_width" value="1"/>
            <parameter key="horizon" value="24"/>
            <parameter key="average_performances_only" value="false"/>
            <operator name="LinearRegression" class="LinearRegression">
            </operator>
            <operator name="OperatorChain" class="OperatorChain" expanded="yes">
                <operator name="ModelApplier" class="ModelApplier">
                    <parameter key="keep_model" value="true"/>
                    <list key="application_parameters">
                    </list>
                </operator>
                <operator name="RegressionPerformance" class="RegressionPerformance">
                    <parameter key="keep_example_set" value="true"/>
                    <parameter key="absolute_error" value="true"/>
                    <parameter key="correlation" value="true"/>
                </operator>
                <operator name="ExceptionHandling" class="ExceptionHandling" expanded="yes">
                    <operator name="IORetriever (2)" class="IORetriever">
                        <parameter key="name" value="fullset"/>
                        <parameter key="io_object" value="ExampleSet"/>
                    </operator>
                </operator>
                <operator name="ExampleSetMerge" class="ExampleSetMerge">
                </operator>
                <operator name="IOStorer (2)" class="IOStorer">
                    <parameter key="name" value="fullset"/>
                    <parameter key="io_object" value="ExampleSet"/>
                </operator>
            </operator>
        </operator>
        <operator name="ouput me" class="IORetriever">
            <parameter key="name" value="fullset"/>
            <parameter key="io_object" value="ExampleSet"/>
        </operator>
        <operator name="regular: systime" class="ChangeAttributeRole">
            <parameter key="name" value="systime_kn"/>
        </operator>
    </operator>
  • land RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 2,531 Unicorn
    Hi,
    the growing slowness probably results from an algorithm whose runtime is quadratic in the number of attributes... Unfortunately, you cannot do anything about that besides buying a faster computer...
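If the cost is indeed dominated by pairwise attribute correlations (an assumption, in line with the guess above), the effect of window size can be estimated by counting attribute pairs. The sketch below uses the 20 series and window size 96 from the original question.

```python
def pair_count(n):
    """Number of distinct attribute pairs among n attributes."""
    return n * (n - 1) // 2

series, window = 20, 96

# One selection run over all windowed attributes at once.
all_at_once = pair_count(series * window)
# Twenty separate runs, one per original series.
per_series = series * pair_count(window)

print(all_at_once, per_series, all_at_once / per_series)
```

Under this assumption, running the selection per series touches roughly a twentieth of the pairs, at the price of ignoring redundancy between different series.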

    Greetings,
      Sebastian
  • wessel Member Posts: 537 Maven
    @ AttributeSubsetPreprocessing
    Ehm, or be smarter? :P
    Is there any way I can use AttributeSubsetPreprocessing to take the first n attributes?
    Or to split the attributes into n subsets?
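One way to picture such a split is to cut the attribute list into n nearly equal chunks and turn each chunk into a regular expression that could serve as an `attribute_name_regex` value. This Python sketch is only an illustration; `split_into_subsets` and `to_regex` are hypothetical helpers, and whether RapidMiner accepts the escaped names unchanged is an assumption.

```python
import re

def split_into_subsets(names, n_subsets):
    """Split names into n_subsets contiguous chunks of nearly equal size."""
    size, remainder = divmod(len(names), n_subsets)
    subsets, start = [], 0
    for i in range(n_subsets):
        end = start + size + (1 if i < remainder else 0)
        subsets.append(names[start:end])
        start = end
    return subsets

def to_regex(subset):
    """Build an alternation that matches exactly the names in subset."""
    return "|".join(re.escape(name) for name in subset)

names = ["a-1", "a-2", "a-3", "b-1", "b-2"]
subsets = split_into_subsets(names, 2)
patterns = [to_regex(s) for s in subsets]
print(subsets)
```

Taking the first n attributes is the special case `names[:n]` followed by `to_regex`.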


    Can I save my attribute selection from the last go?
    So I don't have to run it every time?
  • land RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 2,531 Unicorn
    Hi,
    hmm, it is not designed to do this, but you might use regular expressions for specifying the attributes. Perhaps this already suits your needs?

    Hmm, you could save the resulting example set and later on merge all of the sets. Just use the ExampleSetWriter inside the loop. If you add the predefined macro %{a} to the filename, it will be replaced with the number of times the current operator has been applied. That way you avoid overwriting previous results in loops.
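The save-per-iteration idea can be mimicked in plain Python to show the intended flow: one file per loop pass, with the pass number in the filename, and a merge afterwards. This is an analogy to the ExampleSetWriter with %{a}, not RapidMiner code; the file naming and helper functions are invented for the sketch.

```python
import tempfile
from pathlib import Path

def save_selection(directory, iteration, attribute_names):
    """Write one iteration's selected attribute names to its own file."""
    path = Path(directory) / f"selection_{iteration}.txt"
    path.write_text("\n".join(attribute_names))
    return path

def merge_selections(directory):
    """Concatenate all saved selections (files are read in name order)."""
    merged = []
    for path in sorted(Path(directory).glob("selection_*.txt")):
        merged.extend(path.read_text().splitlines())
    return merged

with tempfile.TemporaryDirectory() as tmp:
    save_selection(tmp, 1, ["wind-24"])
    save_selection(tmp, 2, ["dampdruk-48", "dampdruk-44"])
    merged = merge_selections(tmp)

print(merged)
```

Because each pass writes to a different file, a later run can reuse the stored selections instead of repeating the selection step.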

    Greetings,
      Sebastian