
"Linear regression beats ANN"

chaosbringer Member Posts: 21 Contributor II
edited June 2019 in Help
Hi,
I have a dataset consisting of 1000 samples and 19 attributes. The data is housing data (living area, presence of heating, bath, neighborhood characteristics, etc.). The target value is the house price. The dataset has 8 binary attributes.
If I apply linear regression to this dataset, the results are far superior to an ANN, although from my understanding the data is too complex for linear regression.
Decision trees and SVMs are also inferior to linear regression.
Do you have any advice on how I can validate the results and check why linear regression performs so well?


Thank you very much.

Answers

  • wessel Member Posts: 537 Maven
    Use cross validation.
    A linear model with 19 parameters is still a fairly complex model.
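    For illustration, a minimal sketch of such a cross-validated comparison in Python with scikit-learn (the file name, column names and network size are placeholder assumptions, not taken from this thread; in RapidMiner the equivalent would be running both learners inside a cross validation):

        # Hypothetical sketch: 10-fold cross-validated comparison of linear regression
        # and a small neural network. File name, column names and network size are
        # placeholders, not taken from this thread.
        import pandas as pd
        from sklearn.linear_model import LinearRegression
        from sklearn.neural_network import MLPRegressor
        from sklearn.pipeline import make_pipeline
        from sklearn.preprocessing import StandardScaler
        from sklearn.model_selection import cross_val_score

        data = pd.read_csv("housing.csv")                       # placeholder path
        X = data.drop(columns=["price"])                        # 19 attributes, 8 of them binary
        y = data["price"]

        models = {
            "lin. reg.": LinearRegression(),
            # Feature scaling matters a lot for neural nets; without it the ANN often looks worse.
            "ANN": make_pipeline(StandardScaler(),
                                 MLPRegressor(hidden_layer_sizes=(10,),
                                              max_iter=2000, random_state=0)),
        }

        for name, model in models.items():
            mse = -cross_val_score(model, X, y, cv=10, scoring="neg_mean_squared_error")
            print(f"{name}: mean MSE = {mse.mean():.3f}")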
  • chaosbringer Member Posts: 21 Contributor II
    Hi,
    thank you for your answer.

    Yes, the data is still complex. But I still do not understand why the ANN is so bad.
    Even with cross validation I get:
    MSQE with lin. reg.: 0.34
    MSQE with ANN: 0.54

    Is there an explanation for this? How can I shed some light on the details? Why is the ANN so bad in comparison to lin. reg.?

    Thank you very much.

  • wessel Member Posts: 537 Maven
    Make a convergence plot.
    E.g. measure the RMSE at every iteration.
    Maybe you need to train your network for many more iterations.
    With 19 inputs, your network gets very big, very fast, so you have lots of weights to optimize.
    An alternative problem could be premature convergence, e.g. getting stuck in local optima.

    Best regards,

    Wessel
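    A rough sketch of such a convergence plot in Python with scikit-learn's MLPRegressor (used here as a stand-in for the RapidMiner neural net; file name, column names, network size, epoch count and learning rate are all placeholders):

        # Hypothetical convergence-plot sketch: test RMSE after every training epoch.
        import numpy as np
        import pandas as pd
        import matplotlib.pyplot as plt
        from sklearn.model_selection import train_test_split
        from sklearn.preprocessing import StandardScaler
        from sklearn.neural_network import MLPRegressor
        from sklearn.metrics import mean_squared_error

        data = pd.read_csv("housing.csv")                       # placeholder path
        X, y = data.drop(columns=["price"]), data["price"]

        X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
        scaler = StandardScaler().fit(X_train)
        X_train, X_test = scaler.transform(X_train), scaler.transform(X_test)

        net = MLPRegressor(hidden_layer_sizes=(10,), learning_rate_init=0.01,
                           random_state=0)
        test_rmse = []
        for epoch in range(500):
            net.partial_fit(X_train, y_train)                   # one pass over the training data
            test_rmse.append(np.sqrt(mean_squared_error(y_test, net.predict(X_test))))

        plt.plot(test_rmse)
        plt.xlabel("epoch")
        plt.ylabel("test RMSE")
        plt.title("ANN convergence")
        plt.show()

    If the curve is still falling at the last epoch, more iterations are needed; if it flattens early at a high value, the network is likely stuck.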
  • chaosbringer Member Posts: 21 Contributor II
    Hi,
    thank you, that helped. Fiddling with the parameters improved the situation significantly.
    However, another problem arises:
    The t-test says that the means are the same (p = 1.0).
    If I test-wise modify the parameters of the neural net to produce a really bad result, the t-test still returns 1.
    How can it be that the t-test returns 1 even though the RMSEs are very different (0.5 vs. 0.34)?


    Thank you
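    For what it's worth, a hypothetical sketch in Python (scipy/scikit-learn) of a paired t-test on per-fold RMSEs, which is the kind of comparison such a significance test makes; the data, models and fold count are placeholder assumptions, not the RapidMiner T-Test operator itself:

        # Hypothetical sketch: paired t-test on per-fold RMSEs of two models.
        import numpy as np
        import pandas as pd
        from scipy.stats import ttest_rel
        from sklearn.linear_model import LinearRegression
        from sklearn.neural_network import MLPRegressor
        from sklearn.pipeline import make_pipeline
        from sklearn.preprocessing import StandardScaler
        from sklearn.model_selection import KFold, cross_val_score

        data = pd.read_csv("housing.csv")                       # placeholder path
        X, y = data.drop(columns=["price"]), data["price"]

        models = {
            "lin. reg.": LinearRegression(),
            "ANN": make_pipeline(StandardScaler(),
                                 MLPRegressor(hidden_layer_sizes=(10,),
                                              max_iter=2000, random_state=0)),
        }

        cv = KFold(n_splits=10, shuffle=True, random_state=0)   # identical folds for both models
        rmse = {name: np.sqrt(-cross_val_score(m, X, y, cv=cv,
                                               scoring="neg_mean_squared_error"))
                for name, m in models.items()}

        t, p = ttest_rel(rmse["lin. reg."], rmse["ANN"])
        print(f"paired t-test: t = {t:.2f}, p = {p:.3f}")
        # A large p-value only means the per-fold differences are small relative to
        # their spread across folds; it is not proof that the mean RMSEs are equal.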
  • fikio Member Posts: 3 Contributor I
    chaosbringer wrote:

    Hi,
    I have a dataset consisting of 1000 samples and 19 attributes. The data is housing data (living area, presence of heating, bath, neighborhood characteristics, etc.). The target value is the house price. The dataset has 8 binary attributes.
    If I apply linear regression to this dataset, the results are far superior to an ANN, although from my understanding the data is too complex for linear regression.
    Decision trees and SVMs are also inferior to linear regression.
    Do you have any advice on how I can validate the results and check why linear regression performs so well?


    Thank you very much.
    I come from a statistical background. To clarify: your dataset has 1000 observations and 19 variables, 8 of which are binary? Why do you believe that the data is too complex for linear regression? Have you looked at the variables univariately with scatterplots, or done some modeling, to determine that there are nonlinear relationships?

    I am used to evaluating models with AUCs; what numbers are you getting? Is MSQE the mean squared error?
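    On the scatterplot suggestion, a hypothetical sketch in Python with pandas/matplotlib; the file and column names are placeholders, not taken from this thread:

        # Hypothetical sketch: scatterplot of each non-binary attribute against the
        # target, to spot obviously nonlinear relationships.
        import pandas as pd
        import matplotlib.pyplot as plt

        data = pd.read_csv("housing.csv")                       # placeholder path
        X, y = data.drop(columns=["price"]), data["price"]

        numeric_cols = [c for c in X.columns if X[c].nunique() > 2]   # skip the binary attributes
        n_rows = (len(numeric_cols) + 2) // 3
        fig, axes = plt.subplots(nrows=n_rows, ncols=3, figsize=(12, 3 * n_rows))
        for ax, col in zip(axes.ravel(), numeric_cols):
            ax.scatter(X[col], y, s=5, alpha=0.5)
            ax.set_xlabel(col)
            ax.set_ylabel("price")
        plt.tight_layout()
        plt.show()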