Logistic Regression on large datasets: RapidMiner vs. SAS
Background: I am a consultant working with a customer to replace SAS with RapidMiner Studio (not Server). Most of the analysts develop marketing scorecards (logistic regression and decision trees).
I have read most of the informative blog posts, but please excuse me for re-posting some niggling questions:
1. SAS vs. RapidMiner: Predictions from the two tools will not match exactly because the underlying techniques differ. How does the customer validate historical predictions going forward (model developed in SAS, validation done in RapidMiner)?
2. Prediction vs. explanation: My customer uses the beta coefficients and odds ratios to derive insights. In RapidMiner, how should they read and interpret the weights of the explanatory variables?
3. Small vs. large data sets: The customer currently has 1 million records and 3,000 attributes, analysed on a Dell Inspiron 5000 series laptop with 8 GB of RAM. The customer is not keen on the sampling/extrapolation route and does not want to upgrade to the Server version at this stage of the transition (SAS to RapidMiner). What are the alternatives?
a. Pre-processing: What loop/macro design would run stepwise logistic regression?
b. Radoop/Stream Database: Is this an option they can adopt to run logistic regression?
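
To make questions 2 and 3 concrete, here is a rough Python sketch of what I mean (plain Python, not RapidMiner code). The attribute names and coefficient values are made up for illustration, and the memory estimate assumes every attribute is held as a dense 8-byte number, which may not be how either tool actually stores the data.

```python
import math

# Question 2: the odds ratio for a logistic regression coefficient is exp(beta).
# Attribute names and coefficient values here are hypothetical.
coefficients = {
    "tenure_months": 0.0,
    "has_credit_card": 0.7,
    "num_complaints": -0.25,
}


def odds_ratios(betas):
    """Map each coefficient (change in log-odds per unit) to its odds ratio."""
    return {name: math.exp(beta) for name, beta in betas.items()}


# Question 3: rough in-memory size of the full table.
# Assumption: dense storage at 8 bytes per cell; real tools may differ.
def dense_size_gb(rows, columns, bytes_per_cell=8):
    """Return the estimated table size in decimal gigabytes (1 GB = 1e9 bytes)."""
    return rows * columns * bytes_per_cell / 1e9


if __name__ == "__main__":
    for name, ratio in odds_ratios(coefficients).items():
        print(f"{name}: odds ratio {ratio:.3f}")
    size = dense_size_gb(1_000_000, 3_000)
    print(f"Estimated table size: {size:.1f} GB vs. 8 GB of RAM")
```

Under that storage assumption the raw table alone comes to about 24 GB, roughly three times the laptop's RAM, which is why I am asking about alternatives to loading everything at once.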