"Process log with 2 performance vectors"

Christian Member Posts: 2 Contributor I
edited May 23 in Help
Hi,

I have a problem using the ProcessLog operator to log values from 2 performance vectors. I use a SlidingWindowValidation to validate the performance of a learner on an iteratively growing training set. In each iteration step I want to log the current number of training samples and the performance of the model learned from the current training samples.

This is my process tree (iris data is just an example and the Sorting operator is only used here to remove the sorted-by-class order):

<operator name="Root" class="Process" expanded="yes">
    <operator name="ArffExampleSource" class="ArffExampleSource">
        <parameter key="data_file" value="iris.arff"/>
        <parameter key="label_attribute" value="class"/>
    </operator>
    <operator name="Shuffle sample order" class="Sorting">
        <parameter key="attribute_name" value="sepalwidth"/>
    </operator>
    <operator name="Validation" class="SlidingWindowValidation" expanded="yes">
        <parameter key="keep_example_set" value="true"/>
        <parameter key="training_window_width" value="10"/>
        <parameter key="training_window_step_size" value="1"/>
        <parameter key="test_window_width" value="5"/>
        <parameter key="cumulative_training" value="true"/>
        <operator name="OperatorChain" class="OperatorChain" expanded="yes">
            <operator name="Number of samples" class="Data2Performance" activated="no">
                <parameter key="keep_example_set" value="true"/>
            </operator>
            <operator name="NaiveBayes" class="NaiveBayes">
            </operator>
        </operator>
        <operator name="OperatorChain (2)" class="OperatorChain" expanded="yes">
            <operator name="ModelApplier" class="ModelApplier">
                <list key="application_parameters">
                </list>
            </operator>
            <operator name="Performance" class="Performance">
            </operator>
            <operator name="Log window validation" class="ProcessLog">
                <parameter key="filename" value="validation.log"/>
                <list key="log">
                  <parameter key="sw-acc" value="operator.Performance.value.performance"/>
                  <parameter key="sw-nt" value="operator.Number of samples.value.performance"/>
                </list>
                <parameter key="persistent" value="true"/>
            </operator>
        </operator>
    </operator>
</operator>

The problem is that the two performance vectors (created by 'Number of samples' and 'Performance') seem to get merged: the 2 values 'sw-acc' and 'sw-nt' logged by the ProcessLog operator are always identical, and both equal the number of training samples.

If I disable the operator 'Number of samples', then the logged value 'sw-acc' holds the accuracy result from the current validation, which is what I want, but of course 'sw-nt' is not set then.

2 questions:
1) Why does the process log save the same value for both fields ('sw-acc' and 'sw-nt'), even though it explicitly names two different operators to read the values from?
2) How can I log the number of samples and the validation result at the same time?

Thanks and regards,
Christian

Answers

  • haddock Member Posts: 849 Guru
    Hi Christian,

    Is this the sort of thing you were after?
    <operator name="Root" class="Process" expanded="yes">
        <operator name="ExampleSetGenerator" class="ExampleSetGenerator">
            <parameter key="target_function" value="random classification"/>
        </operator>
        <operator name="Validation" class="SlidingWindowValidation" expanded="yes">
            <parameter key="keep_example_set" value="true"/>
            <parameter key="training_window_width" value="10"/>
            <parameter key="training_window_step_size" value="1"/>
            <parameter key="test_window_width" value="5"/>
            <parameter key="cumulative_training" value="true"/>
            <operator name="OperatorChain" class="OperatorChain" expanded="yes">
                <operator name="DataMacroDefinition" class="DataMacroDefinition">
                    <parameter key="macro" value="Exs"/>
                </operator>
                <operator name="NaiveBayes" class="NaiveBayes">
                </operator>
            </operator>
            <operator name="OperatorChain (2)" class="OperatorChain" expanded="yes">
                <operator name="ModelApplier" class="ModelApplier">
                    <list key="application_parameters">
                    </list>
                </operator>
                <operator name="Performance" class="Performance">
                </operator>
                <operator name="Log window validation" class="ProcessLog">
                    <parameter key="filename" value="validation.log"/>
                    <list key="log">
                      <parameter key="Performance" value="operator.Performance.value.performance"/>
                      <parameter key="Examples" value="operator.DataMacroDefinition.value.macro_value"/>
                    </list>
                    <parameter key="persistent" value="true"/>
                </operator>
            </operator>
        </operator>
    </operator>
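
    For comparison with the original process, here is a minimal sketch of the two places where the process above differs. It is a fragment rather than a complete process, and it uses only the operator, parameter, and key names already shown above. The comments describe the apparent intent and are an assumption about why this works, not a statement about RapidMiner internals.

    ```xml
    <!-- Training chain: DataMacroDefinition takes the place of the
         Data2Performance operator ('Number of samples'), so the example
         count is held in the macro "Exs" rather than in a performance vector. -->
    <operator name="DataMacroDefinition" class="DataMacroDefinition">
        <parameter key="macro" value="Exs"/>
    </operator>

    <!-- ProcessLog: the accuracy is still read from the Performance operator,
         while the example count is read from the macro operator's value. -->
    <list key="log">
      <parameter key="Performance" value="operator.Performance.value.performance"/>
      <parameter key="Examples" value="operator.DataMacroDefinition.value.macro_value"/>
    </list>
    ```

    Everything else (the SlidingWindowValidation parameters, the learner, the ModelApplier, and the Performance operator) is unchanged from the original process apart from the data source.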
  • Christian Member Posts: 2 Contributor I
    Hi haddock,

    Yes, using the DataMacroDefinition operator to access the number of samples solved my problem.

    Many thanks!

    Christian