
"Performance Operator presents wrong value in the table view and description"

Alexey Member Posts: 3 Contributor I
Somehow I ran into the following situation:
after applying a model trained with a decision tree inside a cross validation, the performance result looks very "strange". In the precision and recall view the rows and columns are suddenly swapped, so that every precision and recall value x is shown as (100% - x).
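
Just to make the (100% - x) pattern concrete, here is a tiny Python sketch with made-up numbers (this is not my actual process or data): when the predictions are counted against the wrong class, i.e. rows and columns are swapped, a real class recall of 90% is reported as 10%.

```python
def class_recall(y_true, y_pred, cls):
    """Fraction of examples of class `cls` that were predicted as `cls`."""
    hits  = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p == cls)
    total = sum(1 for t in y_true if t == cls)
    return hits / total

# Made-up results: 80 true gammas (72 predicted correctly), 20 true protons.
labels      = ["gamma"] * 80 + ["proton"] * 20
predictions = ["gamma"] * 72 + ["proton"] * 8 + ["proton"] * 15 + ["gamma"] * 5

print(class_recall(labels, predictions, "gamma"))   # 0.9

# Same predictions, but interpreted with the class values swapped:
swapped = ["proton" if p == "gamma" else "gamma" for p in predictions]
print(class_recall(labels, swapped, "gamma"))       # 0.1  -> "100% - x"
```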

I have attached screenshots of the views. This happened with the most recent RapidMiner version downloaded from the website, on my MacBook Pro running Mac OS X 10.10.1.

[Screenshot: Accuracy view]

[Screenshot: Recall view]

[Screenshot: Description view]

Answers

    MartinLiebig Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, University Professor Posts: 3,507 RM Data Scientist
    Hi Alexey,

    Could you try the following:

    1. Reorder Attributes right in front of Apply Model and the training of the model.
    2. Remap Binominals before the cross validation (the idea is sketched below).
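
    The idea behind point 2, sketched in plain Python/pandas rather than with the actual operator (column and value names are made up): pin the two label values to one fixed order, so that every example set reaching the learner and Apply Model uses the same internal positive/negative mapping.

    ```python
    import pandas as pd

    # Hypothetical example sets; "label" is the binominal class column.
    train = pd.DataFrame({"label": ["gamma", "proton", "gamma"]})
    test  = pd.DataFrame({"label": ["proton", "gamma"]})

    # Fix the value order explicitly (e.g. proton = negative, gamma = positive),
    # instead of letting it depend on which value happens to be seen first.
    fixed_order = ["proton", "gamma"]
    for df in (train, test):
        df["label"] = pd.Categorical(df["label"], categories=fixed_order)

    print(list(train["label"].cat.categories))   # ['proton', 'gamma']
    print(list(test["label"].cat.categories))    # same mapping in both sets
    ```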

    Do you do anything special inside the x-val which could change the meta-data (Append, Union,...)?

    By the way - are you working on an IACT like HESS, Magic or Veritas?

    Best,

    Martin
    - Sr. Director Data Solutions, Altair RapidMiner -
    Dortmund, Germany
    Alexey Member Posts: 3 Contributor I
    Hey,

    I've tried both, reordering the attributes and remapping the binominals before the cross validation. In neither case did the output change.

    Inside the cross validation I just learn the decision tree, apply the model, select the recall, apply the threshold and then calculate the performance. I suspect the problem is caused by Sample (Bootstrapping). Since we have fewer examples of one class, I tried to use bootstrapping to get a roughly similar amount of both classes and then train the model. This was just an experiment and it didn't work as well as expected, but never mind. The workflow was as follows: take all examples of one class, apply Sample (Bootstrapping), use Union with the remaining data, and then sample the training data from that combined set. Only when I use bootstrapping do I get this strange result.
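
    Roughly what that part of the process does, written out in plain Python/pandas just to make it easy to follow (this is not the actual RapidMiner process; class names and sizes are placeholders):

    ```python
    import pandas as pd

    # Placeholder data: "label" holds the two classes.
    data = pd.DataFrame({
        "label": ["gamma"] * 100 + ["proton"] * 40,
        "x":     range(140),
    })

    # 1. Take all examples of the minority class and bootstrap them
    #    (sample with replacement) up to the size of the majority class.
    protons = data[data["label"] == "proton"]
    boosted = protons.sample(n=100, replace=True, random_state=42)

    # 2. "Union" the bootstrapped protons with the untouched gammas.
    gammas   = data[data["label"] == "gamma"]
    combined = pd.concat([gammas, boosted], ignore_index=True)

    # 3. Sample the training data from the combined set.
    train = combined.sample(frac=0.8, random_state=42)
    print(train["label"].value_counts())
    ```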

    Yes, I'm working with the FACT data, partly based on the work of Marius Helf. ;) I'm asking in English, as everything here is in English and maybe someone else will run into the same configuration.
    MartinLiebig Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, University Professor Posts: 3,507 RM Data Scientist
    Hi

    The problem is not the sample but the union. Union changes the meta data, and it might be that the labels are switched in their internal representation. You could put a Remap Binominals after the Union and before the x-val and map them by hand to the internal positive/negative values.

    If that does not work, try the simple Sample operator. There you can use "balance classes" and define the ratios for gamma and proton separately. This should do the trick :).
    Did you try to use weights? That should be fine for a decision tree.
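
    The weights idea, sketched outside of RapidMiner with scikit-learn just to show the principle (data and parameters are made up): give the rarer class a larger weight in the tree learner instead of resampling.

    ```python
    import numpy as np
    from sklearn.tree import DecisionTreeClassifier

    # Made-up imbalanced data, roughly 100k "gamma" vs 40k "proton" examples.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(140_000, 5))
    y = np.array(["gamma"] * 100_000 + ["proton"] * 40_000)

    # class_weight="balanced" weights each class inversely to its frequency,
    # so the tree does not simply favour the majority class - no resampling needed.
    tree = DecisionTreeClassifier(class_weight="balanced", max_depth=8)
    tree.fit(X, y)
    ```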

    Ohh, it's FACT. :-) I looove the project. As you might know, I did my PhD on IceCube but was "a bit" involved in the data analysis of FACT (because of the coffee machine).
    Are you at Wolfgang's or Katharina's chair? In the physics department it is always useful to talk with Tim about such problems.


    Best,

    Martin
    - Sr. Director Data Solutions, Altair RapidMiner -
    Dortmund, Germany
    MariusHelf RapidMiner Certified Expert, Member Posts: 1,869 Unicorn
    Hi Alexey, nice to see that my work is finally being reused. So investing in your scholarship finally pays off :)

    Good luck with your thesis and happy mining!
    ~Marius
    Alexey Member Posts: 3 Contributor I
    Martin Schmitz wrote:

    The problem is not the sample but the union. Union changes the meta data, and it might be that the labels are switched in their internal representation. You could put a Remap Binominals after the Union and before the x-val and map them by hand to the internal positive/negative values.
    I've tried this trick, but it doesn't solve the problem.
    Martin Schmitz wrote:

    If that does not work, try the simple Sample operator. There you can use "balance classes" and define the ratios for gamma and proton separately. This should do the trick :).
    Did you try to use weights? That should be fine for a decision tree.
    I've already used the "normal" sampling. Using bootstrapping was just an idea for getting a similar amount of proton data. There are about 100k gamma examples and 40k proton examples. I was thinking of using more examples while still keeping gamma and proton at the same level (50/50). But this seems to lead to some problems, and I'm still not sure why they happen.
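    Spelled out with those rough numbers (just a plain pandas sketch, not the actual process), the two ways to get a 50/50 split would be:

    ```python
    import pandas as pd

    # Approximate class sizes from above.
    gamma  = pd.DataFrame({"label": ["gamma"]  * 100_000})
    proton = pd.DataFrame({"label": ["proton"] *  40_000})

    # Option A: downsample gamma to the proton count -> 80k examples, 50/50.
    small = pd.concat([gamma.sample(n=len(proton), random_state=1), proton])

    # Option B: bootstrap proton up to the gamma count -> 200k examples, 50/50,
    # but every proton event is reused about 2.5 times on average.
    large = pd.concat([gamma, proton.sample(n=len(gamma), replace=True, random_state=1)])

    print(len(small), len(large))   # 80000 200000
    ```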
    Martin Schmitz wrote:

    Ohh, it's FACT. :-) I looove the project. As you might know, I did my PhD on IceCube but was "a bit" involved in the data analysis of FACT (because of the coffee machine).
    Are you at Wolfgang's or Katharina's chair? In the physics department it is always useful to talk with Tim about such problems.
    I'm new to this project, but I've heard of it before and find it pretty amazing. :)
    Marius wrote:

    Hi Alexey, nice to see that my work is finally being reused. So investing in your scholarship finally pays off :)
    I'm not the first one reusing your work! ;) The scholarship was definitely a big help, though I wasn't able to manage an internship or anything similar. This is still not my thesis, just work for now. But who knows how it will all end up!