
# Trying to understand MLP output

herbert12345 (Contributor I, Posts: 3)

Hi,

I am currently trying to understand the output of the W-MultilayerPerceptron operator. Let us consider a toy model without hidden layers. The output might look like this:

Linear Node 0

| Inputs | Weights |
| --- | --- |
| Threshold | 0.4052907755005098 |
| Attrib O3 | -0.2617907901506467 |
| Attrib NO2 | -0.05083306647141619 |
| Attrib Altitude | -0.14881316186685326 |
| Attrib z | 0.35660878655615114 |
| Attrib sza_rad | -0.44846864905805994 |

Class

Input

Node 0

From my understanding this should be equivalent to a linear regression. So I trained a LinearRegression model on the same input data, using the predictions of the above "MLP" as the label (in order to rule out differences in the fitting algorithm). The results show that the linear model indeed reproduces the output of the "MLP" perfectly. The coefficients, however, are completely different:

- 0.0000070221 * O3

- 0.0000717637 * NO2

- 0.0004435178 * Altitude

+ 0.0003188475 * z

- 0.0040543204 * SZA*pi/180.

+ 0.0145570907

I assume that this is because of the normalization done in the MLP operator. So here is the question: assuming I want to implement the above "MLP" in my own code, how must I process my data and the results?

Thanks for your reply


## Answers

(Maven, Posts: 537)

A single-layer perceptron is trained iteratively:

1. Start with random weights.
2. Take a single data point.
3. Propagate the input forward through the network.
4. Calculate the error.
5. Compute the gradient of the error with respect to the weights.
6. Move the weights in the direction that reduces the error (against the gradient), scaled by the learning rate.
7. Repeat.

Linear regression calculates the optimal weights in closed form.
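To make the contrast concrete, here is a minimal pure-Python sketch of both routes for a single attribute. It is not the WEKA or RapidMiner implementation; the function names, the learning rate, the epoch count, and the toy data are all assumptions chosen for illustration.

```python
import random


def fit_closed_form(xs, ys):
    """Ordinary least squares for y = w*x + b, solved in closed form."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    w = cov_xy / var_x
    b = mean_y - w * mean_x
    return w, b


def fit_sgd(xs, ys, learning_rate=0.05, epochs=500, seed=0):
    """Single linear unit trained one data point at a time."""
    rng = random.Random(seed)
    # Step 1: random starting weights.
    w = rng.uniform(-0.5, 0.5)
    b = rng.uniform(-0.5, 0.5)
    data = list(zip(xs, ys))
    for _ in range(epochs):
        rng.shuffle(data)
        for x, y in data:
            prediction = w * x + b      # forward pass
            error = prediction - y      # signed error
            # Gradient of 0.5 * error**2 is error*x for w and error for b;
            # step against the gradient, scaled by the learning rate.
            w -= learning_rate * error * x
            b -= learning_rate * error
    return w, b


# Hypothetical noise-free toy data on an already normalised attribute.
xs = [-1.0, -0.5, 0.0, 0.5, 1.0]
ys = [0.3 * x - 0.1 for x in xs]

w_cf, b_cf = fit_closed_form(xs, ys)
w_sgd, b_sgd = fit_sgd(xs, ys)
print("closed form:", w_cf, b_cf)
print("sgd:        ", w_sgd, b_sgd)
```

On well-scaled, noise-free data like this, the two routes should end up with practically the same weights; on real data the iterative result depends on the learning rate, the number of epochs, and the normalisation.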

Regarding data normalisation: the Neural Net operator has an option to turn off the data normalisation.

I think you could also normalise your data yourself, so that nothing changes, using: (value - min) / (max - min)

(Contributor I, Posts: 3)

I understand that they might take different routes to obtain their weights. But assuming a fair amount of convergence, the weights should end up being about the same, up to normalization that is. Indeed, I managed to make them the same by turning on the "I" and "C" options in the W-MLP operator.

I think I have managed to understand how things work by now. The problem was in part caused by a misunderstanding of mine as to how things work. Still it troubles me that the W-MLP output is not complete in the sense that the normalization employed is not documented. (I believe now that it normalizes both attributes and labels to the interval [-1,1] using 2*(value-min)/(max-min)-1).

What bothers me though is that my final model (i.e. with hidden layers) appears to have a certain bias. Well, I guess I can fix that.

Thanks for helping
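If the hypothesis above holds (attributes and label both scaled to [-1, 1] with 2*(value-min)/(max-min)-1), the no-hidden-layer model could be reimplemented roughly as follows. This is a sketch under that assumption, plus the assumption that the threshold is simply added to the weighted sum. The weights are the ones posted in the question; every min/max range and the sample row are made-up placeholders, since the real training ranges are not given in this thread.

```python
def scale(value, lo, hi):
    """Map value from [lo, hi] to [-1, 1]."""
    return 2.0 * (value - lo) / (hi - lo) - 1.0


def unscale(scaled, lo, hi):
    """Inverse of scale: map from [-1, 1] back to [lo, hi]."""
    return (scaled + 1.0) / 2.0 * (hi - lo) + lo


def predict_normalized(row, weights, threshold, ranges, label_range):
    """Scale the inputs, apply the linear node, unscale the output."""
    node = threshold
    for name, w in weights.items():
        lo, hi = ranges[name]
        node += w * scale(row[name], lo, hi)
    return unscale(node, *label_range)


def to_raw_coefficients(weights, threshold, ranges, label_range):
    """Fold the scaling into the weights to get raw-unit coefficients."""
    y_lo, y_hi = label_range
    half_span_y = (y_hi - y_lo) / 2.0
    coefs = {}
    node_const = threshold
    for name, w in weights.items():
        lo, hi = ranges[name]
        # scale(x) = 2/(hi-lo) * x - (hi+lo)/(hi-lo)
        coefs[name] = half_span_y * w * 2.0 / (hi - lo)
        node_const -= w * (hi + lo) / (hi - lo)
    intercept = (node_const + 1.0) * half_span_y + y_lo
    return coefs, intercept


# Weights as posted in the question.
threshold = 0.4052907755005098
weights = {
    "O3": -0.2617907901506467,
    "NO2": -0.05083306647141619,
    "Altitude": -0.14881316186685326,
    "z": 0.35660878655615114,
    "sza_rad": -0.44846864905805994,
}

# HYPOTHETICAL training min/max values, for illustration only.
ranges = {
    "O3": (200.0, 500.0),
    "NO2": (0.0, 10.0),
    "Altitude": (0.0, 3.0),
    "z": (0.0, 20.0),
    "sza_rad": (0.0, 1.5),
}
label_range = (0.0, 0.05)

# HYPOTHETICAL input row.
row = {"O3": 320.0, "NO2": 2.5, "Altitude": 1.2, "z": 7.0, "sza_rad": 0.6}

coefs, intercept = to_raw_coefficients(weights, threshold, ranges, label_range)
via_scaling = predict_normalized(row, weights, threshold, ranges, label_range)
via_raw = intercept + sum(coefs[name] * row[name] for name in coefs)
print(via_scaling, via_raw)
```

The two predictions should agree up to rounding, which is why a linear regression on raw data can reproduce the "MLP" output exactly while showing very different coefficients: each raw coefficient is the node weight multiplied by (label span) / (attribute span).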

(Maven, Posts: 537)

2*(value-min)/(max-min)-1 maps the data to [-1, 1].

When the standard sigmoid, 1 / (1 + exp(-x)), is used, the data is normalised to [0, 1] with

(value-min)/(max-min)

This is indeed poorly documented.

Should I take a look in WEKA's source code? Or the RM source code?

What do you mean when you say that your final model has a certain bias?

Don't all learners have a certain bias?

Edit: this link mentions normalisation very briefly:

http://en.wikiversity.org/wiki/Learning_and_neural_networks

(Contributor I, Posts: 3)

About the bias: Looking closer, I see that for some reason the prediction is actually off by a linear map; that is, I get good correlations (as in 0.999...) but scatter plots show that the model is rather off. This could of course easily be fixed by applying a linear model as a post-processing step, but I think it is strange.

Edit: My fault. Shouldn't wonder about offsets if training data and validation data are processed in different ways ... :-[
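The effect described here can be reproduced with a small sketch. It assumes a [-1, 1] min-max scaling and uses a made-up linear model and made-up data; the point is only that scaling the validation data with its own min/max, instead of the training min/max, shifts and stretches the predictions by a linear map while leaving them perfectly correlated with the correct ones.

```python
def fit_range(values):
    """Return the (min, max) pair used for scaling."""
    return min(values), max(values)


def scale(value, lo, hi):
    """Map value from [lo, hi] to [-1, 1]."""
    return 2.0 * (value - lo) / (hi - lo) - 1.0


def model(scaled_x):
    # Hypothetical trained linear node acting on the scaled attribute.
    return 0.8 * scaled_x + 0.1


# Hypothetical data: the validation set covers a narrower range.
train_x = [0.0, 2.0, 4.0, 6.0, 8.0, 10.0]
valid_x = [3.0, 4.0, 5.0, 7.0]

train_range = fit_range(train_x)
valid_range = fit_range(valid_x)

# Consistent: validation data scaled with the TRAINING range.
consistent = [model(scale(x, *train_range)) for x in valid_x]
# Mismatched: validation data scaled with its OWN range.
mismatched = [model(scale(x, *valid_range)) for x in valid_x]

# Slope of the linear map from consistent to mismatched predictions,
# computed between neighbouring points.
slopes = [
    (mismatched[i + 1] - mismatched[i]) / (consistent[i + 1] - consistent[i])
    for i in range(len(valid_x) - 1)
]
print(consistent)
print(mismatched)
print(slopes)
```

All slopes come out equal, so the mismatched predictions are an exact linear function of the consistent ones. Storing the training min/max alongside the weights and reusing them for every later data set avoids the offset.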