RapidMiner vs. SPSS Modeler

ndf-ls
Hello Community,
I am currently working on a comparison of the two data mining tools mentioned above. The comparison should cover the features of both tools as well as their performance (for example, on very large datasets). It would also be interesting to know whether additional features can be implemented in RapidMiner.
As I am still a newbie who does not know RapidMiner very well and SPSS not at all, I was hoping to get some information from this forum.

Thank you very much in advance!


    imwaldverirrt
    I think you'd have to download a trial of SPSS (http://www-01.ibm.com/software/analytics/spss/downloads/), try to build some models in RM and SPSS yourself, and then compare ;-)
    Maybe this will help you as well: http://link.springer.com/chapter/10.1007%2F978-3-540-77018-3_2
    JEdward
    Ah, I remember those days: needing to compare SPSS, RapidMiner & SAS to build a business case for management, proving that even though RapidMiner didn't have a website with only 3 letters before the .com and was far less expensive, it would still be capable of doing what the business needed and was therefore a better investment.

    Assuming this is for your business too, the first step in the comparison is to work out what you need: build a list of the features & business goals you'd like to have, then go through them to see which of the two tools offers each one (either out of the box or with an add-in). I'm happy to help you out on this if you need any advice.
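A requirements checklist like the one described above could be sketched in Python roughly as follows. This is only an illustration: the requirement names, weights, tool labels (`tool_a`, `tool_b`) and the 1.0 / 0.5 / 0.0 scoring for native / add-in / none are all hypothetical placeholders, not real assessments of RapidMiner or SPSS Modeler.

```python
from dataclasses import dataclass


@dataclass
class Requirement:
    name: str      # what the business needs
    weight: int    # importance, e.g. 1 (nice to have) to 5 (must have)
    support: dict  # tool label -> "native", "add-in" or "none"


# Assumed scoring: an add-in counts half as much as out-of-the-box support.
SUPPORT_SCORE = {"native": 1.0, "add-in": 0.5, "none": 0.0}


def score_tools(requirements):
    """Sum weight * support score per tool over all requirements."""
    totals = {}
    for req in requirements:
        for tool, level in req.support.items():
            totals[tool] = totals.get(tool, 0.0) + req.weight * SUPPORT_SCORE[level]
    return totals


# Placeholder entries only; replace with your own list and findings.
requirements = [
    Requirement("hypothetical requirement 1", 5, {"tool_a": "native", "tool_b": "add-in"}),
    Requirement("hypothetical requirement 2", 2, {"tool_a": "none", "tool_b": "native"}),
]

totals = score_tools(requirements)
for tool, total in sorted(totals.items()):
    print(f"{tool}: {total}")
```

The weights matter more than the totals: a tool that misses a single must-have requirement may be ruled out regardless of its overall score.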
    ndf-ls
    Hey JEdward, thanks for your reply. I'm glad to find someone who can help me out.

    The comparison should be pretty general and show the overall functions of both tools. For example, RapidMiner offers operators for classification, clustering, regression, association, text mining... and a lot more for sure.
    For SPSS, I cannot find satisfying information about the areas for which operators are available or the total number of operators.

    Also, I know that you can extend RapidMiner in several ways. However, I do not know about SPSS. I suppose you can buy extensions for big money?!

    Thank you in advance,
    JEdward
    Apologies for the lengthy post, which mainly focuses on the benefits of SPSS Modeler. I want to be fair and balanced, although my own preference is RapidMiner because, for my use cases, it delivers the best value for money in terms of both model creation time & cost. (Particularly in the scenario where you might wish to leave RapidMiner Server running on a client system to act as an ETL tool, web service, scoring engine & report generator, and to periodically refresh model parameters to ensure the best accuracy.)

    However, SPSS does have good things. In terms of product comparison, you are looking at SPSS Modeler, as this is the closest thing IBM has to RapidMiner; it also combines with their server to provide similar features to RapidMiner Server.

    If you want to compare simply like-for-like in terms of the operators, algorithms, data sources & extensibility available in each tool, and if money is no object, then you'll find that SPSS Modeler wins on a number of points.

    You can download a datasheet from the SPSS Modeler site here: http://www-01.ibm.com/software/analytics/spss/products/modeler/features.html
    but I'd try to avoid putting in your real details, as IBM salesmen are (in my experience) very high pressure and irritating.*
    · # of algorithms & operators
    I believe RapidMiner wins on this point due to the sheer volume of operators available. However, this is an unfair comparison, as it depends on how algorithms are presented in each program: where RapidMiner may have 20+ different decision tree operators, SPSS might have 2-3 with an option to pick the algorithm as a parameter, so the counts aren't really comparable. You can do different kinds of loops with RapidMiner, though, which might not be possible with SPSS out of the box. SPSS does have some operators that RapidMiner doesn't, such as geographical operators like Haversine, which can be useful, but Generate Attributes or scripts (R or Groovy) can get the same results with a bit of effort.**
    · Scripting & Extending
    With SPSS you can extend with R & Python. See this link for the Python scripting guide: ftp://public.dhe.ibm.com/software/analytics/spss/documentation/modeler/16.0/en/modeler_jython_scripting_automation_book.pdf For examples of extending with R, plus some other Python examples for SPSS, this is a good site: http://datamininginsights.co.uk/ ***
    RapidMiner can be extended with Groovy script or R, or by using Java to modify the code, write your own extensions, etc.
    · Data sources
    SPSS can connect to any ODBC data source, RapidMiner to any JDBC source. SPSS is integrated with IBM databases in a number of cases (naturally, they want you stuck in their ecosystem), so if you already use their servers & products in a big way it might be a better choice, and you may be able to leverage your company's buying power on the servers to get a discount. If not, then RapidMiner connects just as easily in my opinion.
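For reference, the Haversine distance mentioned under operators is simple enough to script yourself. Below is a minimal Python sketch of the standard great-circle formula; the function name and the mean Earth radius of 6371 km are my own assumptions, and this is not the implementation used by SPSS Modeler or RapidMiner. The same arithmetic could be ported to a Groovy or R script.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in decimal degrees."""
    phi1 = math.radians(lat1)
    phi2 = math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)

    # Haversine of the central angle between the two points.
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)

    # Clamp to 1.0 to guard against floating-point overshoot before asin.
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(1.0, a)))
```

Calling `haversine_km(lat1, lon1, lat2, lon2)` once per row, with the two coordinate pairs as inputs, gives a new distance attribute.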

    Anyway, hope this helps to give you a start.  As I said previously the best place to start is with some use cases for how you'll be using the tool because only then can you properly assess if it's going to be value for money... speaking of which.

    Prices for RapidMiner are on the website: http://rapidminer.com/pricing/. For SPSS Modeler you can see their wonderfully uncomplicated and inexpensive licence options here: https://www-112.ibm.com/software/howtobuy/buyingtools/paexpress/Express?P0=E1&part_number=D0EMZLL,D0EC5LL,D0EN7LL,D0EC7LL,D0EP4LL,D0ECFLL,D0EPGLL,D0EC8LL,D0EPCLL,D0EC9LL,D0EP8LL,D0ECALL,D0PJRLL,D0PJPLL,D0PLBLL,D0PLALL&catalogLocale=en_US&Locale=en_US&country=USA&PT=jsp&CC=USA&VP=&TACTICS=&S_TACT=&S_CMP=&brand=none To fully understand what you are buying with each licence, you would need to speak with one of their snake oil merchants.
    For RapidMiner Server costs vs SPSS Analytical Server, I couldn't see the IBM pricing online. However, I can tell you that a number of years ago, when I compared the two directly, IBM quoted me an initial figure 10x more expensive than the RapidMiner Server equivalent, and that was for a product that didn't do many of the things that RapidMiner Server did!

    *Sorry if any IBM salesmen are reading this, but really, one of the factors in my product choice is you. I feel much the same about you guys as Bill Hicks did about marketing folk.
    **Note to any RM extension builders: a Geo Extension would be a nice addition to the Marketplace.
    ***RapidMiner Salesbods: I'm sure there must be a number of similar blogs for SPSS & SAS out there. A nice demonstration of RapidMiner would be to take posts from these blogs, work through the examples to show how the same can be accomplished in RapidMiner, and post that as a blog article.
