Process Log

brennie Member Posts: 7 Contributor II
edited November 2018 in Help
Hi
I have a process set up in which a GridParameterOptimization evaluates each parameter set with a SlidingWindowValidation. I would like to capture various performance statistics from BinominalClassificationPerformance as an average over each full run through the sliding window, i.e. the average for each parameter set. With ProcessLog, I can only seem to capture either the last window of the last parameter set or every window of every parameter set.

Is there any way to collect the average performance?  Going a bit further, is it possible to collect the best and worst performing window within a parameter set?
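
Suppose a ProcessLog records one row per window and that log is exported. The best and worst window of each parameter set can then be picked out afterwards. Below is a minimal Python sketch. It assumes hypothetical rows of (C, window index, f_measure) filled with made-up values. `best_and_worst_windows` is an illustrative helper, not a RapidMiner function.

```python
from collections import defaultdict

# Hypothetical per-window log rows: (SVM C, window index, f_measure).
# The values are made up for illustration only.
rows = [
    (0.0, 0, 0.91), (0.0, 1, 0.97), (0.0, 2, 0.88),
    (5.0, 0, 0.93), (5.0, 1, 0.90), (5.0, 2, 0.95),
]


def best_and_worst_windows(rows):
    """Map each C value to ((best_window, best_f), (worst_window, worst_f))."""
    by_c = defaultdict(list)
    for c, window, f_measure in rows:
        by_c[c].append((window, f_measure))
    return {
        c: (max(windows, key=lambda w: w[1]), min(windows, key=lambda w: w[1]))
        for c, windows in by_c.items()
    }


for c, (best, worst) in sorted(best_and_worst_windows(rows).items()):
    print(f"C={c}: best window {best[0]} (f={best[1]}), "
          f"worst window {worst[0]} (f={worst[1]})")
```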

Thanks for your help.

Brent

Answers

  • IngoRM Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, Community Manager, RMResearcher, Member, University Professor Posts: 1,750 RM Founder
    Hi Brent,

    Hmm, I am not sure I fully got the point. Could you please post the XML process setup here (from the XML tab)? Then I can see what you intend to do and check whether and how this is possible.

    Thanks and cheers,
    Ingo
  • brennie Member Posts: 7 Contributor II
    Hi
    Please see the XML below. To clarify: I have a training set with 1140 examples, a training window of 240 and a test window of 60. The total number of test examples is therefore 1140 - 240 = 900, and there are 15 iterations of the window (900 / 60). In the process log I would like to see the TP, FP, TN and FN counts over all 900 test examples for each parameter set. (Apologies if I caused confusion by saying "average" in my first post.)
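
    The window arithmetic above can be checked with a small Python sketch. It assumes that each window advances by the test width (60), which is what produces the 15 iterations. `sliding_window_splits` is a hypothetical helper, not a RapidMiner API.

```python
def sliding_window_splits(n_examples, train_width, test_width, step=None):
    """Yield (train_range, test_range) index pairs, oldest window first.

    Assumption: the window advances by test_width unless another step is given.
    """
    step = test_width if step is None else step
    if step <= 0:
        raise ValueError("step must be positive")
    start = 0
    while start + train_width + test_width <= n_examples:
        train = range(start, start + train_width)
        test = range(start + train_width, start + train_width + test_width)
        yield train, test
        start += step


splits = list(sliding_window_splits(1140, 240, 60))
print(len(splits))                           # 15 windows
print(sum(len(test) for _, test in splits))  # 900 test examples in total
print(splits[-1][1])                         # range(1080, 1140)
```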

    The process below only puts what appears to be the last window into the process log: TP, FP, TN and FN add up to 60. If I move the ProcessLog operator directly under the classification performance operator, I get log entries for every window of every parameter set (15 windows × 5 parameter sets = 75 entries). Summing those entries per parameter set gives me what I am ultimately after, but it slows things down a bit.
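
    Once the log is exported, summing the per-window entries per parameter set is simple. Below is a minimal Python sketch. It assumes hypothetical rows of (C, TP, FP, TN, FN) with made-up counts, using three windows of 60 per C instead of the real 15. Precision, recall and F-measure are then recomputed from the pooled counts. This only illustrates pooling the counts; it does not claim that RapidMiner computes its values this way internally.

```python
from collections import defaultdict

# Hypothetical per-window log rows: (C, TP, FP, TN, FN).
# Made-up counts; every window has 60 test examples.
rows = [
    (0.0, 25, 3, 30, 2), (0.0, 28, 1, 29, 2), (0.0, 24, 2, 31, 3),
    (5.0, 26, 2, 29, 3), (5.0, 27, 3, 28, 2), (5.0, 23, 1, 33, 3),
]


def pooled_metrics(rows):
    """Sum the confusion counts per C, then compute metrics from the totals."""
    totals = defaultdict(lambda: [0, 0, 0, 0])
    for c, *counts in rows:
        totals[c] = [t + n for t, n in zip(totals[c], counts)]
    result = {}
    for c, (tp, fp, tn, fn) in totals.items():
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f_measure = (2 * precision * recall / (precision + recall)
                     if precision + recall else 0.0)
        result[c] = {"TP": tp, "FP": fp, "TN": tn, "FN": fn,
                     "precision": precision, "recall": recall,
                     "f_measure": f_measure}
    return result


for c, m in sorted(pooled_metrics(rows).items()):
    total = m["TP"] + m["FP"] + m["TN"] + m["FN"]
    print(f"C={c}: {total} test examples, precision={m['precision']:.4f}, "
          f"recall={m['recall']:.4f}, f_measure={m['f_measure']:.4f}")
```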

    I hope this makes it clear.  Thanks for your help.

    By the way, how do I put the XML into a code box like the ones I see in other posts?

    Brent

    <operator name="Root" class="Process" expanded="yes">
        <parameter key="logverbosity" value="status"/>
        <operator name="ExcelExampleSource" class="ExcelExampleSource" breakpoints="after">
            <parameter key="excel_file" value="C:\Documents and Settings\BRENT\My Documents\rm_workspace\Training data\RM training ^GSPC v01.xls"/>
            <parameter key="first_row_as_names" value="true"/>
            <parameter key="id_column" value="1"/>
            <parameter key="label_column" value="2"/>
        </operator>
        <operator name="GridParameterOptimization" class="GridParameterOptimization" expanded="yes">
            <list key="parameters">
              <parameter key="LibSVMLearner.C" value="[0.0;20.0;4;linear]"/>
            </list>
            <operator name="SlidingWindowValidation" class="SlidingWindowValidation" expanded="yes">
                <parameter key="test_window_width" value="60"/>
                <parameter key="training_window_width" value="240"/>
                <operator name="LibSVMLearner" class="LibSVMLearner">
                    <parameter key="C" value="20.0"/>
                    <list key="class_weights">
                    </list>
                    <parameter key="gamma" value="1.0"/>
                </operator>
                <operator name="OperatorChain" class="OperatorChain" expanded="yes">
                    <operator name="ModelApplier" class="ModelApplier">
                        <list key="application_parameters">
                        </list>
                    </operator>
                    <operator name="BinominalClassificationPerformance" class="BinominalClassificationPerformance">
                        <parameter key="f_measure" value="true"/>
                        <parameter key="false_negative" value="true"/>
                        <parameter key="false_positive" value="true"/>
                        <parameter key="main_criterion" value="f_measure"/>
                        <parameter key="precision" value="true"/>
                        <parameter key="recall" value="true"/>
                        <parameter key="true_negative" value="true"/>
                        <parameter key="true_positive" value="true"/>
                    </operator>
                </operator>
            </operator>
            <operator name="ProcessLog" class="ProcessLog">
                <list key="log">
                  <parameter key="precision" value="operator.BinominalClassificationPerformance.value.precision"/>
                  <parameter key="recall" value="operator.BinominalClassificationPerformance.value.recall"/>
                  <parameter key="f measure" value="operator.BinominalClassificationPerformance.value.f_measure"/>
                  <parameter key="TP" value="operator.BinominalClassificationPerformance.value.true_positive"/>
                  <parameter key="FP" value="operator.BinominalClassificationPerformance.value.false_positive"/>
                  <parameter key="TN" value="operator.BinominalClassificationPerformance.value.true_negative"/>
                  <parameter key="FN" value="operator.BinominalClassificationPerformance.value.false_negative"/>
                  <parameter key="SVM - C" value="operator.LibSVMLearner.parameter.C"/>
                  <parameter key="SVM - gamma" value="operator.LibSVMLearner.parameter.gamma"/>
                </list>
            </operator>
        </operator>
    </operator>
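
    The grid setting "[0.0;20.0;4;linear]" for LibSVMLearner.C appears to follow the pattern min;max;steps;scale. That reading matches the five C values (0, 5, 10, 15, 20) in the log posted later in this thread. Below is a Python sketch of that interpretation. `linear_grid` is a hypothetical helper, and it assumes "steps" counts intervals, so steps + 1 values come out. Five values × 15 windows gives the 75 per-window runs mentioned above.

```python
def linear_grid(spec):
    """Expand a '[min;max;steps;linear]' string into a list of values.

    Assumption: 'steps' is the number of intervals, so steps + 1 values result.
    """
    lo, hi, steps, scale = spec.strip("[]").split(";")
    if scale != "linear":
        raise ValueError("only the linear scale is sketched here")
    lo, hi, steps = float(lo), float(hi), int(steps)
    return [lo + i * (hi - lo) / steps for i in range(steps + 1)]


values = linear_grid("[0.0;20.0;4;linear]")
print(values)            # [0.0, 5.0, 10.0, 15.0, 20.0]
print(len(values) * 15)  # 75 per-window runs with 15 windows per C
```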
  • IngoRM Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, Community Manager, RMResearcher, Member, University Professor Posts: 1,750 RM Founder
    Hi,

    Still not sure if I got you right (sorry), but are you looking for the logged cumulative performance values for each parameter combination? That means values accumulated over all 900 data points, i.e. only one value for the whole set. In that case, when only C is optimized, the result should look like this:

    precision            recall               f_measure            SVM-C   SVM-gamma
    0.9455537425537426   0.9610703843618139   0.9525370970632324   0.0     1.0
    0.9428607226107226   0.9531385952706045   0.94727680367819     5.0     1.0
    0.9324453056797544   0.940922229744281    0.935745777848474    10.0    1.0
    0.9340506858209258   0.9291517779738292   0.9303849062070936   15.0    1.0
    0.9266490907172111   0.9194995150116083   0.9216341147207373   20.0    1.0


    I have only tried it with RM 4.2, but the following setup produced this:

    <operator name="Root" class="Process" expanded="yes">
        <operator name="ExampleSetGenerator" class="ExampleSetGenerator">
            <parameter key="number_examples" value="1140"/>
            <parameter key="number_of_attributes" value="1"/>
            <parameter key="target_function" value="sum classification"/>
        </operator>
        <operator name="NoiseGenerator" class="NoiseGenerator">
            <list key="noise">
            </list>
            <parameter key="random_attributes" value="1"/>
        </operator>
        <operator name="GridParameterOptimization" class="GridParameterOptimization" expanded="yes">
            <list key="parameters">
              <parameter key="LibSVMLearner.C" value="[0.0;20.0;4;linear]"/>
            </list>
            <operator name="OperatorChain (2)" class="OperatorChain" expanded="yes">
                <operator name="SlidingWindowValidation" class="SlidingWindowValidation" expanded="yes">
                    <parameter key="test_window_width" value="60"/>
                    <parameter key="training_window_width" value="240"/>
                    <operator name="LibSVMLearner" class="LibSVMLearner">
                        <parameter key="C" value="20.0"/>
                        <list key="class_weights">
                        </list>
                        <parameter key="gamma" value="1.0"/>
                    </operator>
                    <operator name="OperatorChain" class="OperatorChain" expanded="yes">
                        <operator name="ModelApplier" class="ModelApplier">
                            <list key="application_parameters">
                            </list>
                        </operator>
                        <operator name="BinominalClassificationPerformance" class="BinominalClassificationPerformance">
                            <parameter key="f_measure" value="true"/>
                            <parameter key="main_criterion" value="f_measure"/>
                            <parameter key="precision" value="true"/>
                            <parameter key="recall" value="true"/>
                        </operator>
                    </operator>
                </operator>
                <operator name="ProcessLog" class="ProcessLog">
                    <list key="log">
                      <parameter key="precision" value="operator.SlidingWindowValidation.value.performance1"/>
                      <parameter key="recall" value="operator.SlidingWindowValidation.value.performance2"/>
                      <parameter key="f measure" value="operator.SlidingWindowValidation.value.performance"/>
                      <parameter key="SVM - C" value="operator.LibSVMLearner.parameter.C"/>
                      <parameter key="SVM - gamma" value="operator.LibSVMLearner.parameter.gamma"/>
                    </list>
                </operator>
            </operator>
        </operator>
    </operator>
    Please note, however, that only three criteria can be logged this way. If you want to log more than three, you would have to repeat the process for the other criteria.

    Another side note: unless you have already applied some windowing or introduced time lags in your data source, you might consider embedding a windowing step inside the validation.
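
    "Windowing" here means turning a time series into examples whose attributes are the preceding values and whose label is a later value. Below is a minimal Python sketch of the idea, using a hypothetical `make_lag_examples` helper and a toy series. It is not the RapidMiner windowing operator itself.

```python
def make_lag_examples(series, window, horizon=1):
    """Turn a 1-D series into (features, label) pairs.

    features: `window` consecutive values ending just before index `end`
    label:    the value `horizon` steps after the last feature value
    """
    examples = []
    for end in range(window, len(series) - horizon + 1):
        features = series[end - window:end]
        label = series[end + horizon - 1]
        examples.append((features, label))
    return examples


print(make_lag_examples([1, 2, 3, 4, 5, 6], window=3))
# [([1, 2, 3], 4), ([2, 3, 4], 5), ([3, 4, 5], 6)]
```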

    As for copying the XML into a code box like in other posts: the message editor should have an icon with a "#" symbol on it. Pressing it inserts the tags {code} and {/code} (the real tags have to be written with [ and ] instead of { and }). Just put your XML code in between.

    Cheers,
    Ingo
  • brennie Member Posts: 7 Contributor II
    Hi Ingo
    Thanks for the quick response. You asked whether I am looking for the logged cumulated performance values for each parameter combination (cumulated over all 900 data points, i.e. only one value for the whole set). Yes, this is correct. I ran your process on 4.2 and got the same precision and recall numbers, though the f-measure is different. I then added TP, FP, TN and FN to the process log, expecting them to total 900 for each parameter set. Instead the total is 60, which leads me to believe that these values relate only to the last sliding window. I confirmed this by logging every window and matching up the last window of each parameter set. This also makes me suspect that precision, recall, etc. are calculated on the last window only instead of on all 900 data points.

    Let me know if I'm wrong here.

    Thanks

    Brent
  • IngoRM Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, Community Manager, RMResearcher, Member, University Professor Posts: 1,750 RM Founder
    Hi Brent,

    but you did notice the slight difference between the setups, right? I admit it is sooo subtle... ;)

    I was not logging the results of the BinominalClassificationPerformance but those of the SlidingWindowValidation (via the generic performance names "performance1", "...2", and "...3"). The sliding window validation reports the total, while the performance operator only reports the last calculated value (what else should it report?). Please adapt your setup accordingly and you should get the total results as in my example above.
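
    The difference can be mimicked with a toy Python loop. A value that is overwritten in every window (like the inner performance operator's logged value) ends up holding only the last window. A running total (like the validation's result) covers all of them. The per-window counts are made up, and the sketch only illustrates the idea rather than RapidMiner's internals.

```python
# Made-up confusion counts (TP, FP, TN, FN) for 15 windows of 60 test examples each.
windows = [(26, 3, 28, 3)] * 14 + [(24, 4, 30, 2)]

last_window = None         # overwritten each iteration, like the inner operator's value
cumulative = [0, 0, 0, 0]  # accumulated across iterations, like the validation's result
for counts in windows:
    last_window = counts
    cumulative = [total + c for total, c in zip(cumulative, counts)]

print(sum(last_window))  # 60
print(sum(cumulative))   # 900
```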

    Cheers,
    Ingo
  • brennie Member Posts: 7 Contributor II
    Hi
    I think I understand now: the performance operator only reports the last calculated window.

    Thanks
    Brent