RapidMiner

Bug in Saving Performance Vector

Contributor II

Hi all,

I work in a Windows XP environment and am trying to evaluate the performance of a tree classifier using cross-validation (XValidation).
Everything runs fine as long as I don't try to save the performance vector. When I do, RapidMiner crashes and refuses to continue working. This also happens with some of the provided samples.
However, the files seem to get saved properly.

Is there a way to fix this, and if so, how?

Best

Norbert
11 REPLIES
Regular Contributor

Re: Bug in Saving Performance Vector

Hello Norbert

From this point of view it is quite hard to analyse the error. Could you please post some more details?
Here are some hints:

  • Switch to the XML tab and copy its content to the forum so that we can see the process setup

  • In the menu bar, go to Tools->Preferences and activate rapidminer.general.debugmode. Then run the process again; this should produce a more detailed error message, which you can send as a bug report or post here, too



greetings

Steffen
Moderator

Re: Bug in Saving Performance Vector

Hi Norbert,

In addition to Steffen's remarks, I would like to ask which RapidMiner version you are using. As far as I remember, older releases saved quite a lot of data in a performance vector, which resulted in long runtimes or other inconvenient behaviour. It is therefore possible that updating to the newest RapidMiner version solves your problem, if you have not already done so.

Regards,
Tobias
Contributor II

Re: Bug in Saving Performance Vector

Hi Steffen and Tobias,

Thank you for the quick response.
Below you will find the layout of the process:
<operator name="Root" class="Process" expanded="yes">
    <operator name="ExampleSource" class="ExampleSource" breakpoints="after">
        <parameter key="attributes" value="D:\ZID_daten\weka\GemBRD\EqSizeBin\NormVar\BRD_NormGemClusterOutL1.aml"/>
        <parameter key="sample_ratio" value="0.1"/>
    </operator>
    <operator name="MultipleLabelIterator" class="MultipleLabelIterator" expanded="yes">
        <operator name="XValidation" class="XValidation" expanded="yes">
            <parameter key="create_complete_model" value="true"/>
            <parameter key="sampling_type" value="shuffled sampling"/>
            <operator name="DecisionTree" class="DecisionTree">
            </operator>
            <operator name="OperatorChain" class="OperatorChain" expanded="yes">
                <operator name="ModelApplier" class="ModelApplier">
                    <list key="application_parameters">
                    </list>
                </operator>
                <operator name="Performance" class="Performance">
                </operator>
            </operator>
        </operator>
    </operator>
    <operator name="AverageBuilder" class="AverageBuilder">
    </operator>
</operator>

> in addition to Steffens remarks I would like to ask which RapidMiner version do you use?

I use version 4.1. However, the performance vector has a size of 6 MB, which looks rather huge to me compared to the amount of data available in the output (confusion matrix, kappa statistics and overall accuracy).

> go to header->Tools->Preferences->and activate rapidminer.general.debugmode. Then run the process again, which lead to a more detailed error message you can send as bug report or post here,too


I did as proposed. RM still refuses to transmit a single bit of data after the "save" button of the performance vector has been pushed. Therefore I cannot give you a more detailed bug report.

Best Norbert
RMStaff

Re: Bug in Saving Performance Vector

Hello,

I would like to suggest that you try out the latest version 4.2 which is available on our web site now:

http://rapid-i.com


The written performance vectors are much smaller now and as far as I remember the writing mode was also changed. So probably this problem is no longer there in the latest version (at least I cannot reproduce it).

Cheers,
Ingo
Contributor II

Re: Bug in Saving Performance Vector

Hi,

I downloaded 4.2 and tested it on several of our computers, but the problem is at least partially still present.
If I save the performance vector manually, RM gets stuck in an endless loop and the resulting .per file has a size of 6 MB.
If I use the respective IO container and integrate the saving process in the program, everything runs smoothly and the .per file has a size of 23 KB.

Perhaps this information will help you to find the bug.

Best,

Norbert

Moderator

Re: Bug in Saving Performance Vector

Hi Norbert,

If I understand you correctly, the behaviour is correct when you save the performance vector via the [tt]IOContainerWriter[/tt], but not when you save it manually by clicking the button in the GUI (when the performance is shown)? What happens when you save the performance vector with the [tt]PerformanceWriter[/tt] operator?

Normally, at least the last two ways should produce the same files with the same sizes. If they do not, this is indeed a bug and we will try to fix it as soon as possible.

Regards,
Tobias
Contributor II

Re: Bug in Saving Performance Vector

Hi Tobias,

Sorry for using the wrong terminology, but essentially you got my point.
Everything is fine when I use the PerformanceWriter, but when I try to save manually via the GUI, the system crashes.

Norbert
Moderator

Re: Bug in Saving Performance Vector

Hi Norbert,

No need to apologize, I just wanted to check whether I understood you correctly! ;-)
We will have a look at the problem and post again once we have solved it. Until then, please use the workaround of saving the performance vector with the [tt]PerformanceWriter[/tt].
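
For illustration, the workaround could look roughly like the fragment below in the process XML. This is only a sketch: the parameter key performance_file and the file name results.per are assumptions and should be checked against the operator's parameter list in your RapidMiner version.

```xml
<!-- Sketch only: place this after the operator that delivers the
     performance vector (for example after AverageBuilder). -->
<operator name="PerformanceWriter" class="PerformanceWriter">
    <!-- Assumed parameter key and placeholder file name; adjust both to your setup. -->
    <parameter key="performance_file" value="results.per"/>
</operator>
```

Written this way, the file is produced during the process run, so the save button in the results view is not needed.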

Regards,
Tobias
RMStaff

Re: Bug in Saving Performance Vector

Hi,

We have tried both possibilities: using the PerformanceWriter operator and pressing the "Save..." button in the results view. It worked perfectly well in both cases. Does the problem also occur if you deactivate "use_example_weights" in the performance operator?
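
In the process XML, this test might look like the sketch below. It assumes that the operator in question is the Performance operator from the process posted earlier in this thread, and that the parameter takes "false" in the same way create_complete_model takes "true" there.

```xml
<!-- Sketch only: the Performance operator from the posted process,
     with example weights switched off for this test. -->
<operator name="Performance" class="Performance">
    <!-- The value "false" is an assumption based on the boolean parameters in the posted XML. -->
    <parameter key="use_example_weights" value="false"/>
</operator>
```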

Cheers,
Ingo