Why did you choose RapidMiner?
What made you choose RapidMiner over the other tools available on the market (Weka, SPSS and SAS)? Was it the cost, or, as I have found, the good reports from others? I am selecting a tool at the moment and am looking for criteria to decide which is best for bank credit datasets.
In this forum the probability of meeting someone working with the Enterprise Edition (which includes extensive reporting) is rather small, so...
I personally choose RapidMiner among free tools because it is ...
- powerful due to its learning operators and operator framework, which allow you to build nearly arbitrary processes
- easy to extend for Java programmers
As a developer of RapidMiner I am of course not really neutral when comparing RM to other data mining tools. But apart from the points that are mainly a matter of taste, in my opinion there are several aspects which would make me favor RM over the other tools you mentioned. Compared to commercial products such as SAS and SPSS, this is first of all the price and therefore the costs you have to budget: the community edition of RM is totally free, and the Enterprise Edition of RM gives you (depending on the package) full support at a very acceptable price and cost-performance ratio. If you know the prices of the aforementioned products, you know what I mean. But apart from these business aspects, RM has other advantages such as the enormous flexibility in process design (e.g. the ability to nest optimization, evaluation, etc.). For many of our customers, an additional argument for selecting RM is the dynamic development, which allows them to get customized features at a low cost if RM should lack a certain functionality.
If you would like more information about RM in general and especially about its application in the credit domain, or if you would like us to help you find out whether RM is the appropriate product for you, just send us a mail. Of course, you could also simply try out RM first if you like ...
I do not want to start a flame war over which of the tools is better for which purpose - which seems stupid to me, since both are open-source and freely available and everybody can test both and check which one better suits their needs.
But since you have asked, I will give you at least a short idea of why I prefer RapidMiner over Weka. I used to employ Clementine for all my data mining problems and I also used Weka quite a lot (almost every time for the same reason: I missed some functionality in Clementine and Weka which I had to implement myself). About one year ago I came across RapidMiner and this has actually changed my everyday working life - probably more than any other application I was introduced to. Today, our company no longer spends money on Clementine (sorry guys) and all of our analysts have fully switched to RapidMiner. Here are some of the main reasons (only applicable to Weka; for Clementine things are a bit different):
1. power and flexibility: Weka's Experimenter is easy to use but let's face it: it is not flexible enough to meet real-world process requirements. IMHO it is not even flexible enough for scientific work (I know both quite well, the scientific data mining world as well as real business). The same is basically true for the Weka Knowledge Flow. It is nice and in general quite similar to RapidMiner, but not nearly as powerful when it comes to the more complex processes we need. RapidMiner provides many more analysis steps (operators) than Weka and many more possibilities to combine them. I am often amazed how the small modules of RapidMiner can be combined in such a way that you can solve analysis problems which cannot be solved by any other solution. Two thumbs up for the RapidMiner developers for coming up with such a clear and modular concept for data analysis processes.
2. scalability: the first versions of RapidMiner I used were actually not faster than Weka - but they used much less memory. In the meantime (I use a pre-release of RapidMiner 4.3) the algorithms have also been optimized for speed. Our database contains 1.6 billion transactions and our data mining processes work quite well on that amount of data. With Weka we always had to use rather small samples and were never able to work directly on the database. By the way: we recently upgraded to the Enterprise Edition and got a great performance gain. On our analysis server with 8 cores we got a nice runtime boost - the parallel version of the decision tree learner really rocks and delivers the results in about 1/8 of the time of the non-parallelized version. That's pretty cool.
3. visualization: things look much better in RapidMiner than in Weka. I do not refer to the look and feel here but to the really great visualization tools within RapidMiner. Try it yourself, you will love them.
4. preprocessing: it is really amazing how many methods for preprocessing and data extraction / transformation are available directly within RapidMiner. There are many more methods for these really important aspects of data analysis than in Weka (and also more than in any other tool I am aware of). This integrates all phases of analysis into one process / tool, and my work has become really smooth.
5. rapid-i: this is not really an argument for the software, but anyway. As I said before, I used to work with Weka a lot and I have also developed some algorithms. I found several bugs in Weka and sent them to the Weka maintainers. Almost always the same reaction: none. The developers of RapidMiner are much faster (did you notice how often and how fast they implement feature requests coming from the community? You cannot imagine how well they work for their customers...). This is also something I never got from SPSS / Clementine, and I really like this about RM / rapid-i as well.
Since some might ask why we prefer RapidMiner over Clementine, the answer is quite simple: it offers many more data mining and analysis possibilities for no (or only a small) price.
The answer grew longer than I had intended. But since this was also an important question for me one year ago, I hope it helps some of the readers. But again: please do not be offended if you prefer Weka for one reason or another. That's of course fine, and this is only my own story of why I changed tools. And let's not start a big discussion about the pros and cons - there are of course also some drawbacks to RapidMiner, as every user might know (personally, I found the beginning quite hard since there were so many different possibilities). I just wanted to let others know why getting used to RapidMiner might be a good idea, even if this means slightly more work until you are familiar with its many different options.
For me the major points are the combining and extending abilities you mention, but I can see that it takes quite an effort to recognise those points. I ended up having to cut and paste the PDF manual (which is confusingly also called the Tutorial) into a searchable structure, like this: http://188.8.131.52/index.html .
I can quite understand why this forum gets more than its fair share of questions which reading the manual should prevent, and I admire the good manners with which they are handled. However, it is also clear that many slots in the operator reference section are merely stubs waiting to be filled in.
In short, more effort should be put into the documentation, so that RM's advantages become clear enough that this topic would not need to exist!
PS. Martin, what is your 8 core rig?
CPU: 2 x AMD Opteron 2350 (4 cores each)
Mainboard: Tyan Thunder 3600B
Memory: 8 x 2 GB ECC DDR2-800 (= 16 GB)
Harddisk: 2 x 1 TB (RAID-1)
OS: Linux 64 bit (openSUSE 11)
Together with all the other necessary parts, the total price was less than 1200 Euro net, which is quite fair taking the speed and the amount of memory into account. We have three of those servers here and work on them remotely.
It gives you a quick overview of your possible data analysis options.
And the workflow gives you a live view of what is going on.
You can view intermediate results and use the community plug-in to see work from others.
Furthermore, this forum is great for feedback.
So every minute you spend learning RapidMiner is well spent.
If you compare RapidMiner with Matlab, or the free variant Octave:
- RapidMiner is much more about data mining, understanding your data, etc.
- Octave and Matlab are much more about machine learning and experimenting with your algorithms.
- I would say the tool R is somewhere in between.
It is hard to compare with R since it mostly depends on personal preference.
Which one is best in terms of scalability, efficiency, ease of integration, and usability?