I'm running a linear regression on a data set that has about 500k rows and 66 attributes. I'm using RapidMiner on Windows, on a 2.4 GHz Xeon processor, with 8 GB of memory allocated to RapidMiner alone. These are my problems:
First: the process takes about 40 minutes to finish, which seems like a long time compared with other tools I've used.
Second, and more important: for the p-values, standard errors and some other metrics I get a "?" (question mark). I don't know what that means, and I'm starting to think that something is wrong with RapidMiner. I'm including a picture with the results.
RapidMiner's Linear Regression does more than the actual regression: it also eliminates collinear features, performs feature selection, and so on. These extra steps can take quite some time and often improve the model quality, but you can try switching them off to see how the runtime is affected. Out of curiosity, which other tools are you using, and how long do they need for your dataset?
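To give a rough idea of why the extra steps cost time, here is a minimal Python/NumPy sketch of collinear feature elimination followed by a plain least-squares fit. This is not RapidMiner's actual implementation; the function names `drop_collinear` and `fit_ols`, the tolerance, and the synthetic data are all assumptions made for illustration. The point is only that a collinearity check needs repeated passes over the data before the regression itself even starts.

```python
import numpy as np

def drop_collinear(X, tol=1e-8):
    """Greedily keep columns that are not linear combinations of
    columns already kept (hypothetical helper, not RapidMiner's method)."""
    kept = []
    for j in range(X.shape[1]):
        candidate = X[:, kept + [j]]
        # The column adds information only if it raises the rank by one.
        if np.linalg.matrix_rank(candidate, tol=tol) == len(kept) + 1:
            kept.append(j)
    return kept

def fit_ols(X, y):
    """Ordinary least squares with an intercept; returns [intercept, slopes...]."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef

# Synthetic data (assumption): 3 independent attributes plus one
# attribute that is an exact linear combination of the first two.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
X = np.column_stack([X, X[:, 0] + 2 * X[:, 1]])
y = 1.0 + X[:, 0] - 3 * X[:, 2] + rng.normal(scale=0.1, size=200)

kept = drop_collinear(X)        # the redundant last column is dropped
coef = fit_ols(X[:, kept], y)   # regression on the remaining attributes
print("kept columns:", kept)
print("coefficients:", coef)
```

In this sketch every candidate column triggers another rank computation, so the preprocessing alone can cost many times more than the single least-squares solve, which is one plausible reason a run with these options enabled is slower than a bare regression in another tool.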
The issue with the missing values (the "?" entries) has been forwarded to our development team.