Training set ordering matters??

simplyfat · December 2009

Hi guys,
I'm pretty new to RapidMiner and data mining in general, and I wanted to ask about some strange behavior I'm seeing with RapidMiner 5.0 beta:

As you can see from the screenshot, I am fetching a training set from a MySQL database and passing it to a neural network. So far so good.

You can see the training set in the other screenshot:

It is labeled with a value between 0 and 1, which should be predicted afterwards. Now the strange thing: I copied and pasted the training set SQL query from somewhere else, and at the end there was still an "ORDER BY `label` DESC". I thought this was rather useless (I am not limiting the number of SQL results), so I removed the ORDER BY clause. But that made my prediction worse! Far worse! I could not believe it, so I reproduced it many times...

Can somebody tell me why the neural network depends on the ordering of its training set, and why it is better with DESC than with ASC?

land · December 2009

Hi,
Neural networks depend on the ordering of their training examples. Training works as follows:
1. For each training example x:
1.1. Feed x forward.
1.2. Calculate the error.
1.3. Propagate the error back and adapt the weights of the neurons.

So you see that the result depends heavily on the ordering, because the weights are adapted after every single example. That one sort direction gives better accuracy than the other is probably just chance. To check this, you could add a noise attribute with the Noise operator and use the Sort operator to sort by that attribute. You could then repeat the run several times, varying the random seed of the Noise operator, and compare how the performance behaves.
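The per-example update loop above can be sketched in a few lines of Python. This is only a minimal illustration, not RapidMiner's actual neural net implementation: it assumes a single linear neuron, a made-up toy data set, and hypothetical names (`train_online`, `mse`, `orderings`). It trains once on each ordering, including a few seeded shuffles that play the role of sorting by a noise attribute, so the resulting weights can be compared.

```python
import random

def train_online(examples, epochs=1, lr=0.1):
    """Per-example (online) gradient descent for one linear neuron y = w*x + b."""
    w, b = 0.0, 0.0
    for _ in range(epochs):
        for x, y in examples:
            pred = w * x + b      # 1.1 feed x forward
            err = pred - y        # 1.2 calculate the error
            w -= lr * err * x     # 1.3 adapt the weights
            b -= lr * err
    return w, b

def mse(examples, w, b):
    """Mean squared error of the neuron on a list of (x, y) pairs."""
    return sum((w * x + b - y) ** 2 for x, y in examples) / len(examples)

# Toy data (assumption): one attribute, noisy label roughly between 0 and 1.
rng = random.Random(0)
data = [(i / 20, 0.5 * (i / 20) + 0.2 + rng.gauss(0, 0.05)) for i in range(20)]

orderings = {
    "label DESC": sorted(data, key=lambda e: e[1], reverse=True),
    "label ASC": sorted(data, key=lambda e: e[1]),
}
# Seeded shuffles stand in for "sort by a noise attribute with a given seed".
for seed in range(3):
    shuffled = data[:]
    random.Random(seed).shuffle(shuffled)
    orderings[f"shuffled seed={seed}"] = shuffled

results = {name: train_online(ex) for name, ex in orderings.items()}
for name, (w, b) in results.items():
    print(f"{name:18s} w={w:.4f} b={b:.4f} mse={mse(data, w, b):.5f}")
```

Rerunning with a larger `epochs` value shows whether the gap between orderings shrinks for this toy problem; a real multi-layer network need not behave the same way.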

By the way, this is one of the properties of neural nets that keeps me from using them...

Greetings,
Sebastian

simplyfat · December 2009

Ah! Thank you very much. I thought the ordering dependency would go away if you have more training cycles than training set entries.

What you do with noise is the same as an "ORDER BY RAND()", right? I tried it and it was bad, too.

Which learner would you recommend instead for my purpose? You can see the data set in the screenshot. It is about proteins.
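As a rough illustration of why the two approaches match: sorting rows by an attached random value yields a random permutation of them, which is what a shuffle does. The sketch below assumes made-up rows and a hypothetical helper `sort_by_noise`. The fixed seed makes the order repeatable, whereas an unseeded ORDER BY RAND() gives a different order on every query.

```python
import random

def sort_by_noise(rows, seed):
    """Attach a random 'noise' value to each row and sort by it."""
    rng = random.Random(seed)
    keyed = [(rng.random(), row) for row in rows]
    keyed.sort(key=lambda pair: pair[0])   # sort by the noise value only
    return [row for _, row in keyed]

# Hypothetical rows: (id, label)
rows = [("p1", 0.91), ("p2", 0.75), ("p3", 0.40), ("p4", 0.22), ("p5", 0.05)]

order_1 = sort_by_noise(rows, seed=1)
order_1_again = sort_by_noise(rows, seed=1)
order_2 = sort_by_noise(rows, seed=2)

print(order_1)
print(order_2)
```

The same seed always reproduces the same ordering, which makes it possible to compare several training runs in a controlled way.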

land · December 2009

Hi,
I would recommend using LibSVM. You will probably have to optimize the kernel type and its dependent parameters, but in nearly all cases a tuned SVM is at least as good as a neural net.
And although linear regression is a very simple learner, it's always worth a try in combination with feature generation. It's quick, so many iterations can be performed to find a suitable attribute set. And compared with heavyweight learning schemes like SVM and neural nets, it produces understandable results.
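The idea of pairing linear regression with feature generation can be sketched as follows. This is a toy example under stated assumptions, not a RapidMiner process: the data is invented so that the label depends on the square of the raw attribute, and `fit_line`, `mse`, and `candidates` are hypothetical names. Each candidate attribute is fitted with closed-form least squares and scored, and the fitted slope and intercept stay directly readable.

```python
def fit_line(xs, ys):
    """Closed-form least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def mse(xs, ys, slope, intercept):
    """Mean squared error of the fitted line."""
    return sum((slope * x + intercept - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

# Toy data (assumption): the label is the square of the raw attribute.
raw = [i / 10 for i in range(11)]
label = [x * x for x in raw]

# Candidate attributes: the raw one and one generated feature.
candidates = {
    "raw": raw,
    "raw squared": [x * x for x in raw],
}

scores = {}
for name, feature in candidates.items():
    slope, intercept = fit_line(feature, label)
    scores[name] = mse(feature, label, slope, intercept)
    print(f"{name:12s} slope={slope:.3f} intercept={intercept:.3f} mse={scores[name]:.6f}")
```

Because each fit is cheap, many generated attributes can be tried this way before moving on to a heavier learner.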

Greetings,
Sebastian
