I ran multiple regression on a dataset with 15 variables, first using the "forward selection" nested operator and then using the "backward elimination" nested operator, and got dramatically different models: the first had 3 independent variables, the second had 8. Why such a big difference? I realize that the serial addition or elimination of IVs may yield local optima, but is it common to get such wildly different "optimal" models for the same dataset? How can training yield such dramatically different trained models?
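To make the "local optima" point concrete, here is a minimal, self-contained sketch (pure NumPy, my own toy greedy implementations, not the actual nested operators) of why the two search directions can disagree. With two nearly collinear predictors whose *difference* drives the response, neither helps alone, so greedy forward selection adds nothing, while backward elimination, starting from the full model, keeps both:

```python
import numpy as np

def rss(X, y):
    """Residual sum of squares of an OLS fit with intercept."""
    A = np.column_stack([np.ones(len(y)), X]) if X.shape[1] else np.ones((len(y), 1))
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    return float(resid @ resid)

def forward_select(X, y, improve=0.10):
    """Greedily add the predictor that most reduces RSS, while the
    best addition still cuts RSS by at least `improve` (toy criterion)."""
    selected, remaining = [], list(range(X.shape[1]))
    current = rss(X[:, []], y)
    while remaining:
        best, j = min((rss(X[:, selected + [j]], y), j) for j in remaining)
        if best < (1 - improve) * current:
            selected.append(j)
            remaining.remove(j)
            current = best
        else:
            break
    return set(selected)

def backward_eliminate(X, y, worsen=0.10):
    """Greedily drop the predictor whose removal hurts RSS least, while
    the best removal inflates RSS by less than `worsen` (toy criterion)."""
    selected = list(range(X.shape[1]))
    current = rss(X, y)
    while selected:
        best, j = min((rss(X[:, [k for k in selected if k != j]], y), j)
                      for j in selected)
        if best < (1 + worsen) * current:
            selected.remove(j)
            current = best
        else:
            break
    return set(selected)

rng = np.random.default_rng(0)
n = 200
x1 = rng.normal(size=n)
x2 = x1 + 0.1 * rng.normal(size=n)      # nearly collinear with x1
x3 = rng.normal(size=n)                 # pure noise predictor
y = 10.0 * (x1 - x2) + 0.1 * rng.normal(size=n)  # only the difference matters
X = np.column_stack([x1, x2, x3])

print(forward_select(X, y))      # forward stalls: no single predictor helps alone
print(backward_eliminate(X, y))  # backward keeps the collinear pair, drops noise
```

Both procedures are "greedy" in opposite directions, so each can get stuck at a different local optimum of the same selection criterion; real stepwise operators use fancier criteria (p-values, AIC) but have the same failure mode.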
thanks in advance,