comparing data mining tools

jesslyn · June 2012

I am currently evaluating RapidMiner, R, SAS Enterprise and Orange. Can someone provide some useful information?

Which software provides better features in terms of 1) scalability, 2) power and flexibility, 3) how well the tools access and manage the data, 4) graphical user friendliness, and 5) visualization?

I've done some research and found that RapidMiner is better than the other 3 tools.

I need someone to give me more information on this topic, as I am currently evaluating these tools. Thanks.

Nils_Woehler · June 2012

Hi,

We are glad you are interested in RapidMiner. But please don't double post. Your questions have been answered here: http://rapid-i.com/rapidforum/index.php/topic,5187.0.html

Best,
Nils

IngoRM · June 2012

Hi Jesslyn,

I have already answered a couple of your questions here:

http://rapid-i.com/rapidforum/index.php/topic,5187.0.html

Let me add some information to the new ones:

1) scalability

The desktop version of RapidMiner is working, well, on your desktop. Hence, the amount of data you can handle is limited by the amount of memory your desktop system has. Things are of course much better for the server, RapidAnalytics, which is usually running on better hardware. And there are several specific extensions for improving the scalability of RapidMiner: a) an In-DB-Extension for executing processes directly in the database (for many processes, there is then literally no limit anymore), b) a Streaming Extension which offers operators so that data is no longer completely loaded into memory, and c) the Radoop Extension, which allows running data transformation and modeling processes in distributed Hadoop clusters.
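To illustrate the general idea behind option a), here is a minimal Python sketch of in-database execution. It is not RapidMiner code and does not use the In-DB-Extension; SQLite only stands in for a real database, and the `sales` table, its columns, and the function `totals_in_database` are hypothetical names made up for this example. The point is that the aggregation runs inside the database, so only the small result set reaches the client's memory.

```python
import sqlite3

# Hypothetical example data; an in-memory SQLite database stands in
# for a real database server.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("north", 10.0), ("south", 5.0), ("north", 2.5), ("south", 1.5)],
)


def totals_in_database(connection):
    """Let the database compute the per-region sums.

    Only one row per region is transferred to the client, no matter
    how many rows the table holds.
    """
    rows = connection.execute(
        "SELECT region, SUM(amount) FROM sales GROUP BY region ORDER BY region"
    ).fetchall()
    return dict(rows)


totals = totals_in_database(conn)
print(totals)
```

With a table of many millions of rows the client-side memory footprint of this approach stays the same, which is the property the In-DB option is described as providing.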

2) power and flexibility

This has been partly answered already. Right now, there is no other graphical data mining suite offering more operations and more options for combining them including all necessary control structures like loops, branches, macros (variables), etc. More can be found in the fact sheet.

3) how well the tools access and manage the data

Again, please have a look at the fact sheet. There are plenty of operators for connecting to data sources and transforming the data. In fact, many RapidMiner users do not perform data analysis at all but use it for ETL processes.

4) which is more graphical user friendly

Although this is a matter of taste, I would like to point out that the Rapid-I team has put a lot of effort into better supporting analysts, especially beginners. There are a lot of features like meta data propagation, quick fixes, error detection, online help, operator recommendations etc. to simplify the analyst's life. More, as you might guess already, can be found in the fact sheet.

5) visualization

And one last time: the most important visualization techniques are listed in the fact sheet. This is actually an area we are pretty proud of, since RapidMiner offers a really huge number of different visualization techniques. And there is the new "Advanced Plotter" section (the documentation for this can be found in our download section).

Fact Sheet

You will probably find the following fact sheet for RapidMiner and RapidAnalytics interesting:

http://rapid-i.com/downloads/rapidminer/facts/rapidminer_rapidanalytics_fact_sheet.pdf

Cheers,
Ingo

jesslyn · June 2012

Hi Ingo,
Thanks for replying. I understand that the RapidMiner Community edition is free software and that there is a size limit on how many rows or records it can handle. However, can I have an estimate of what that limit is? Millions of rows of data? 1 million, 2 million? Thanks.

awchisholm · June 2012

Hello jesslyn

I don't believe there is an explicit limitation in the community edition on the maximum number of rows that can be processed. There is, however, always a physical resource limit imposed by the machine you are running on. Whenever these limits are encountered, there are plenty of approaches that can be adopted to work round them. For example, the stream database and loop batch operators let you process things in batches, at the expense of increased running time of course. The other option is to use RapidAnalytics and run processes remotely.
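The batching idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not how the RapidMiner operators are implemented: SQLite stands in for the database, and the `measurements` table, its `value` column, and the function `batched_mean` are hypothetical names invented for the example. At most `batch_size` rows are held in memory at any time.

```python
import sqlite3


def batched_mean(connection, batch_size=1000):
    """Compute the mean of measurements.value one batch at a time.

    Only `batch_size` rows are fetched per round trip, so memory use
    stays bounded regardless of the table size.
    """
    cursor = connection.execute("SELECT value FROM measurements")
    total, count = 0.0, 0
    while True:
        rows = cursor.fetchmany(batch_size)
        if not rows:  # no more data
            break
        total += sum(row[0] for row in rows)
        count += len(rows)
    return total / count if count else None


# Hypothetical example data: the values 0.0 .. 9999.0
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE measurements (value REAL)")
conn.executemany(
    "INSERT INTO measurements VALUES (?)",
    ((float(i),) for i in range(10_000)),
)

result = batched_mean(conn, batch_size=500)
print(result)
```

A smaller `batch_size` lowers peak memory but increases the number of fetch calls, which is the running-time trade-off mentioned above.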

regards

Andrew
