"Neural Network functioning"

hagen85 Member Posts: 18 Contributor II
edited June 2019 in Help
Hi there,

I was doing some testing with the neural network operator and was wondering whether I am mistaken in my understanding of how it works:
Let's assume you have 1000 data instances (rows) in which the target variable is "yes" for every woman from a certain region above the age of 25, and "no" for every man from the same region and age group. These are followed by another 1000 instances in which it is the other way round. If I train the neural network (NNW) on the first 1500 instances in their original order (no X-Validation) and switch off shuffling in the operator, shouldn't the network somehow "forget" that women lead to "yes"? Apparently it does not, because when I apply the model to the remaining 500 instances, the classification rate is very poor. I have tried different learning rates and momentum values.
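A minimal sketch of the data layout described above, assuming the region and age conditions hold for every row so that only gender varies. The class and method names (`TwoConceptData`, `build`) are hypothetical. It makes the ordered 1500/500 split explicit: the training part holds all 1000 concept-A rows followed by 500 concept-B rows, while the test part holds only concept-B rows.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical reconstruction of the experiment:
// rows 0..999 follow concept A (woman -> "yes", man -> "no"),
// rows 1000..1999 follow concept B (woman -> "no", man -> "yes").
class TwoConceptData {

    // One row; region and age (> 25) are assumed identical for all rows.
    record Row(boolean woman, String label) {}

    static List<Row> build(int perConcept) {
        List<Row> rows = new ArrayList<>();
        for (int concept = 0; concept < 2; concept++) {
            for (int i = 0; i < perConcept; i++) {
                boolean woman = (i % 2 == 0);           // alternate women and men
                boolean yes = (concept == 0) == woman;  // A: woman -> yes; B: man -> yes
                rows.add(new Row(woman, yes ? "yes" : "no"));
            }
        }
        return rows;
    }

    public static void main(String[] args) {
        List<Row> rows = build(1000);
        // Ordered split without shuffling: first 1500 rows train, last 500 rows test.
        List<Row> train = rows.subList(0, 1500);
        List<Row> test = rows.subList(1500, rows.size());
        long conceptBInTrain = train.stream().skip(1000).count();
        System.out.println("train=" + train.size() + " (concept B rows: " + conceptBInTrain
                + "), test=" + test.size());
    }
}
```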

Thank you in advance for your ideas.


    hagen85 Member Posts: 18 Contributor II
    Hi again,

    Maybe I should ask this differently... Normally, an NNW can be trained either in batch mode or in online mode. How can I use the operator in RapidMiner in online mode (presenting one example at a time)?
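For reference, the difference between the two modes can be sketched with a single sigmoid unit trained on the cross-entropy loss. This is a generic illustration, not the RapidMiner operator; `BatchVsOnline` and its methods are hypothetical names. In batch mode the gradient is accumulated over all examples and the weights change once per pass; in online mode the weights change after every single example.

```java
// Minimal sketch (not RapidMiner code): one sigmoid unit, cross-entropy loss,
// trained once in batch mode and once in online mode.
class BatchVsOnline {

    static double sigmoid(double z) { return 1.0 / (1.0 + Math.exp(-z)); }

    static double predict(double[] w, double[] xi) {
        double z = 0;
        for (int j = 0; j < w.length; j++) z += w[j] * xi[j];
        return sigmoid(z);
    }

    // Batch mode: accumulate the gradient over all examples, update once per epoch.
    static double[] trainBatch(double[][] x, double[] y, double lr, int epochs) {
        double[] w = new double[x[0].length];
        for (int e = 0; e < epochs; e++) {
            double[] grad = new double[w.length];
            for (int i = 0; i < x.length; i++) {
                double err = predict(w, x[i]) - y[i];
                for (int j = 0; j < w.length; j++) grad[j] += err * x[i][j];
            }
            for (int j = 0; j < w.length; j++) w[j] -= lr * grad[j] / x.length;
        }
        return w;
    }

    // Online mode: update the weights immediately after each example.
    static double[] trainOnline(double[][] x, double[] y, double lr, int epochs) {
        double[] w = new double[x[0].length];
        for (int e = 0; e < epochs; e++) {
            for (int i = 0; i < x.length; i++) {
                double err = predict(w, x[i]) - y[i];
                for (int j = 0; j < w.length; j++) w[j] -= lr * err * x[i][j];
            }
        }
        return w;
    }

    public static void main(String[] args) {
        // Inputs are {bias, x}; the target is y = 1 for x = 1 and y = 0 for x = 0.
        double[][] x = {{1, 0}, {1, 1}, {1, 0}, {1, 1}};
        double[] y = {0, 1, 0, 1};
        System.out.println("batch:  p(x=1) = " + predict(trainBatch(x, y, 1.0, 500), x[1]));
        System.out.println("online: p(x=1) = " + predict(trainOnline(x, y, 1.0, 500), x[1]));
    }
}
```

On consistent data like this both modes end up in a similar place; the difference matters when the order of examples carries information, as in the concept-drift scenario of this thread.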

    MariusHelf RapidMiner Certified Expert, Member Posts: 1,869 Unicorn
    Hi Hagen,

    RapidMiner does not (yet) support online or stream learning, but we are planning to release a stream mining framework in the future. After that, the operators and algorithms will still have to be adapted to handle streams.

    Currently, only the Naive Bayes model supports online learning.
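Naive Bayes lends itself to online learning because, for nominal attributes, the model consists only of counts, which can be incremented one example at a time. The following is a generic sketch of that idea with Laplace smoothing, assuming the number of possible values per attribute is known up front; it is not RapidMiner's implementation, and all names are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

// Generic incremental Naive Bayes for nominal attributes (illustration only).
class IncrementalNaiveBayes {
    private final int[] valuesPerAttribute;                       // assumed known in advance
    private final Map<String, Integer> classCounts = new HashMap<>();
    private final Map<String, Integer> featureCounts = new HashMap<>(); // key: "label|attr=value"
    private int total = 0;

    IncrementalNaiveBayes(int... valuesPerAttribute) {
        this.valuesPerAttribute = valuesPerAttribute;
    }

    // Online update: absorb one example by incrementing counts; the example can then be discarded.
    void update(String[] x, String label) {
        total++;
        classCounts.merge(label, 1, Integer::sum);
        for (int j = 0; j < x.length; j++) {
            featureCounts.merge(label + "|" + j + "=" + x[j], 1, Integer::sum);
        }
    }

    // Log-posterior up to a constant, with Laplace smoothing.
    double score(String[] x, String label) {
        int nc = classCounts.getOrDefault(label, 0);
        double s = Math.log((nc + 1.0) / (total + classCounts.size()));
        for (int j = 0; j < x.length; j++) {
            int n = featureCounts.getOrDefault(label + "|" + j + "=" + x[j], 0);
            s += Math.log((n + 1.0) / (nc + valuesPerAttribute[j]));
        }
        return s;
    }

    String predict(String[] x) {
        String best = null;
        double bestScore = Double.NEGATIVE_INFINITY;
        for (String label : classCounts.keySet()) {
            double s = score(x, label);
            if (s > bestScore) { bestScore = s; best = label; }
        }
        return best;
    }

    public static void main(String[] args) {
        IncrementalNaiveBayes nb = new IncrementalNaiveBayes(2); // one attribute, two values
        // Examples arrive one at a time and are absorbed immediately.
        for (int i = 0; i < 10; i++) {
            nb.update(new String[] {"female"}, "yes");
            nb.update(new String[] {"male"}, "no");
        }
        System.out.println(nb.predict(new String[] {"female"}));
    }
}
```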

    hagen85 Member Posts: 18 Contributor II
    Thanks for your reply. I have made an observation that seems very strange to me:
    I have a dataset that contains two significantly different concepts.
    I use Sliding-Window-Validation with cumulative learning and a neural net inside.
    If I switch off "shuffle" in the neural net operator, it classifies my data almost perfectly, i.e. it adapts to the concept drift.

    I do not understand that at all :). Isn't the error minimized over the whole dataset, which would mean that the effects of the two concepts cancel each other out?
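One plausible explanation, sketched under simplifying assumptions: per-example (online) updates with a fixed learning rate do not minimize the average error over the whole dataset exactly. Without shuffling, every training cycle ends with the most recent examples, so the final weights are pulled towards whatever concept those examples follow. The sketch uses a single sigmoid unit with one-hot inputs (isWoman, isMan) and no bias, not RapidMiner's neural net; `RecencyDemo` and its parameters are hypothetical. On 100 ordered rows of concept A followed by 100 rows of concept B, the batch gradient is exactly zero at the start, so batch training stays at p = 0.5 for both groups, whereas online training ends each cycle biased towards concept B, the concept of the last rows.

```java
// Minimal sketch (not RapidMiner's neural net): one sigmoid unit, one-hot inputs
// {isWoman, isMan}, no bias, cross-entropy loss, ordered data with two concepts.
class RecencyDemo {

    static double sigmoid(double z) { return 1.0 / (1.0 + Math.exp(-z)); }

    // Women at even positions, men at odd positions.
    static boolean isWoman(int i) { return i % 2 == 0; }

    // Rows [0, perConcept) follow concept A (woman -> 1, man -> 0),
    // rows [perConcept, 2 * perConcept) follow concept B (woman -> 0, man -> 1).
    static double label(int i, int perConcept) {
        boolean conceptA = i < perConcept;
        return (conceptA == isWoman(i)) ? 1.0 : 0.0;
    }

    // Returns {wWoman, wMan}; with one-hot inputs, p(yes | woman) = sigmoid(wWoman).
    static double[] train(int perConcept, double lr, int epochs, boolean online) {
        double[] w = new double[2];
        int n = 2 * perConcept;
        for (int e = 0; e < epochs; e++) {
            double[] grad = new double[2];
            for (int i = 0; i < n; i++) {          // fixed order, no shuffling
                int j = isWoman(i) ? 0 : 1;        // index of the active one-hot input
                double err = sigmoid(w[j]) - label(i, perConcept);
                if (online) {
                    w[j] -= lr * err;              // update right after this example
                } else {
                    grad[j] += err;                // batch: only accumulate
                }
            }
            if (!online) {
                w[0] -= lr * grad[0] / n;
                w[1] -= lr * grad[1] / n;
            }
        }
        return w;
    }

    public static void main(String[] args) {
        double[] batch = train(100, 0.1, 50, false);
        double[] online = train(100, 0.1, 50, true);
        System.out.printf("batch : p(yes|woman)=%.3f p(yes|man)=%.3f%n",
                sigmoid(batch[0]), sigmoid(batch[1]));
        System.out.printf("online: p(yes|woman)=%.3f p(yes|man)=%.3f%n",
                sigmoid(online[0]), sigmoid(online[1]));
    }
}
```

In a cumulative sliding-window setup, the test window directly follows the last training rows, which usually belong to the same (current) concept, so this recency bias works in favor of the classification rate. With shuffling, the last examples of a cycle are a random mix of both concepts, so the systematic bias should largely disappear.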

    I would be very grateful for ideas on that.
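For context, the splits produced by a sliding-window validation with cumulative training can be sketched as follows. This is a generic illustration; the method name and parameters (`trainWidth`, `testWidth`, `step`, `cumulative`) are hypothetical and not necessarily the operator's parameter names. With cumulative training, every training set starts at row 0 and grows, and the test window always directly follows the last training rows.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of sliding-window splits on time-ordered data.
class SlidingWindowSplits {

    // Half-open index ranges: training [trainFrom, trainTo), test [testFrom, testTo).
    record Split(int trainFrom, int trainTo, int testFrom, int testTo) {}

    static List<Split> splits(int n, int trainWidth, int testWidth, int step, boolean cumulative) {
        List<Split> result = new ArrayList<>();
        for (int end = trainWidth; end + testWidth <= n; end += step) {
            // Cumulative: training always starts at row 0; otherwise it is a fixed-width window.
            int start = cumulative ? 0 : end - trainWidth;
            result.add(new Split(start, end, end, end + testWidth));
        }
        return result;
    }

    public static void main(String[] args) {
        for (Split s : splits(2000, 500, 100, 100, true)) {
            System.out.println(s);
        }
    }
}
```

Combined with unshuffled per-example updates, the rows just before each test window are the last ones the network sees during every training cycle.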

    hagen85 Member Posts: 18 Contributor II
    Hi there, me again :-)
    Sorry for pushing this, but I am using RapidMiner for my thesis and therefore need to make sure that I understand how it works. Regarding my last post: this is an excerpt from the method public void train(... in the source code of ImprovedNeuralNetModel.java:

        // optimization loop
        for (int cycle = 0; cycle < maxCycles; cycle++) {
            double error = 0;
            int maxSize = exampleSet.size();
            for (int index = 0; index < maxSize; index++) {
                int exampleIndex = index;
                if (exampleIndices != null) {
                    exampleIndex = exampleIndices[index];
                }
                Example example = exampleSet.getExample(exampleIndex);
                resetNetwork();
                calculateValue(example);
                // optional example weight scales the learning rate
                double weight = 1.0;
                if (weightAttribute != null) {
                    weight = example.getValue(weightAttribute);
                }
                double tempRate = learningRate * weight;
                // with decay, the rate shrinks as 1 / (cycle + 1)
                if (decay) {
                    tempRate /= cycle + 1;
                }
                error += calculateError(example) / numberOfClasses * weight;
                // update() is called once per example, inside the example loop
                update(example, tempRate, momentum);
            }
            // ... (rest of the cycle loop not shown)
    I noticed that update() is called after each example. Does this mean that all network weights are updated after every single example? I would be very grateful if someone could confirm whether I am right.
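That reading matches how online backpropagation generally works: one update step on a single example changes every weight with a nonzero gradient, which in a fully connected net is usually all of them. The sketch below is a hypothetical 2-2-1 sigmoid network with textbook backpropagation (squared-error loss) and momentum, not RapidMiner's `update()`. The fixed initial weights, the names, and the momentum rule (new change = rate × gradient term + momentum × previous change) are assumptions for illustration.

```java
// Hypothetical 2-2-1 sigmoid network: one backpropagation step on a single example
// changes every weight. Generic textbook backprop with momentum, not RapidMiner code.
class OneExampleUpdate {

    static double sigmoid(double z) { return 1.0 / (1.0 + Math.exp(-z)); }

    double[][] wHidden = {{0.1, -0.2, 0.05}, {0.3, 0.1, -0.1}}; // [hidden][in0, in1, bias]
    double[] wOut = {0.2, -0.3, 0.1};                             // [h0, h1, bias]
    double[][] dHiddenPrev = new double[2][3];                    // previous changes (momentum)
    double[] dOutPrev = new double[3];

    void update(double[] x, double target, double rate, double momentum) {
        // Forward pass.
        double[] h = new double[2];
        for (int k = 0; k < 2; k++) {
            h[k] = sigmoid(wHidden[k][0] * x[0] + wHidden[k][1] * x[1] + wHidden[k][2]);
        }
        double out = sigmoid(wOut[0] * h[0] + wOut[1] * h[1] + wOut[2]);

        // Backward pass (squared error, sigmoid units); uses wOut before it is changed.
        double deltaOut = (target - out) * out * (1 - out);
        double[] deltaHidden = new double[2];
        for (int k = 0; k < 2; k++) {
            deltaHidden[k] = deltaOut * wOut[k] * h[k] * (1 - h[k]);
        }

        // Output-layer weights.
        double[] hb = {h[0], h[1], 1.0};
        for (int k = 0; k < 3; k++) {
            dOutPrev[k] = rate * deltaOut * hb[k] + momentum * dOutPrev[k];
            wOut[k] += dOutPrev[k];
        }
        // Hidden-layer weights.
        double[] xb = {x[0], x[1], 1.0};
        for (int k = 0; k < 2; k++) {
            for (int i = 0; i < 3; i++) {
                dHiddenPrev[k][i] = rate * deltaHidden[k] * xb[i] + momentum * dHiddenPrev[k][i];
                wHidden[k][i] += dHiddenPrev[k][i];
            }
        }
    }

    public static void main(String[] args) {
        OneExampleUpdate net = new OneExampleUpdate();
        double before = net.wHidden[0][0];
        net.update(new double[] {1.0, 0.5}, 1.0, 0.3, 0.2);
        System.out.println("wHidden[0][0]: " + before + " -> " + net.wHidden[0][0]);
    }
}
```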