Fuzzy c-Means for RapidMiner
Hello RM community,
I noticed that RapidMiner lacks some clustering algorithms, especially Fuzzy c-Means and its derivatives. While experimenting with the clustering models that are available, I also found that there are not many of them; I was expecting many more. I don't know whether there are plugins or extensions for clustering, but I could not find any, and the Weka extension does not offer many clustering algorithms either. If anyone can point me to more of them, I would be very glad. I searched on Google but unfortunately did not find any.
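For readers unfamiliar with the algorithm mentioned above, here is a minimal sketch of standard Fuzzy c-Means in plain Python. It assumes Euclidean distance, random initial memberships and a fuzzifier m = 2 by default. It is neither RapidMiner code nor Roland's implementation; the function name `fuzzy_c_means` and its parameters are hypothetical.

```python
import math
import random


def fuzzy_c_means(points, c, m=2.0, max_iter=100, tol=1e-6, seed=0):
    """Minimal Fuzzy c-Means sketch (hypothetical helper, Euclidean distance).

    points: list of equal-length numeric tuples; c: number of clusters;
    m > 1: fuzzifier. Returns (centers, u) where u[i][j] is the membership
    of point i in cluster j and every row of u sums to 1.
    """
    rng = random.Random(seed)
    n, dim = len(points), len(points[0])
    # Random initial memberships, normalized so each row sums to 1.
    u = []
    for _ in range(n):
        row = [rng.random() for _ in range(c)]
        total = sum(row)
        u.append([x / total for x in row])
    exponent = 2.0 / (m - 1.0)
    centers = []
    for _ in range(max_iter):
        # Step 1: each center is the mean of all points weighted by u_ij^m.
        centers = []
        for j in range(c):
            w = [u[i][j] ** m for i in range(n)]
            sw = sum(w)
            centers.append(tuple(
                sum(w[i] * points[i][k] for i in range(n)) / sw for k in range(dim)))
        # Step 2: u_ij = 1 / sum_l (d_ij / d_il)^(2 / (m - 1)).
        new_u = []
        for i in range(n):
            dist = [math.dist(points[i], ctr) for ctr in centers]
            if min(dist) == 0.0:
                # Point sits exactly on one or more centers: split membership among them.
                hits = [j for j in range(c) if dist[j] == 0.0]
                new_u.append([1.0 / len(hits) if j in hits else 0.0 for j in range(c)])
            else:
                new_u.append([1.0 / sum((dist[j] / dist[l]) ** exponent for l in range(c))
                              for j in range(c)])
        # Stop once no membership changes by more than tol.
        shift = max(abs(new_u[i][j] - u[i][j]) for i in range(n) for j in range(c))
        u = new_u
        if shift < tol:
            break
    return centers, u
```

Unlike k-Means, every point keeps a degree of membership in every cluster; if hard labels are needed, each point can be assigned to the cluster with its highest membership.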
I am a researcher in clustering and have developed quite a few algorithms. I want to use RM to publish my algorithms not only as papers but also as implementations. I have already ordered the documentation, but my company is rather slow with such things. Anyway, I noticed that the DBScan implementation is very weak: first, it is very slow, and second, the results are wrong. I have already filed a bug report for that, and I have an implementation that is very fast. It has an execution complexity of O(n log n) for building a tree data structure based on the specified distance function and O(k n log n) for executing DBScan itself. I don't know how efficient the existing implementation is, but it needs more than 2 hours for 35,000 data objects with 23 real values each. I will export the data set before clustering and apply my algorithm to it to see how fast it is, but from my experience with other (much larger) data sets, it should finish within a few seconds, maybe minutes. Is there any way to contribute to RapidMiner and improve existing algorithms?
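The idea described above (build an index tree over the data once, then answer every DBSCAN neighborhood query through it instead of scanning all points) can be sketched as follows. This is a minimal illustration under assumptions, not Roland's implementation and not RapidMiner's: it uses a simple k-d tree with Euclidean distance, its construction sorts at every level (roughly O(n log² n) rather than O(n log n)), and `min_pts` counts the point itself. In high dimensions such as 23 attributes, k-d tree queries can degrade well beyond log n. All function names are hypothetical.

```python
import math
from collections import deque


def build_kdtree(idx, points, depth=0):
    """Build a k-d tree over point indices; nodes are (index, axis, left, right)."""
    if not idx:
        return None
    axis = depth % len(points[0])
    idx = sorted(idx, key=lambda i: points[i][axis])
    mid = len(idx) // 2
    return (idx[mid], axis,
            build_kdtree(idx[:mid], points, depth + 1),
            build_kdtree(idx[mid + 1:], points, depth + 1))


def range_query(node, points, q, eps, out):
    """Append to out the indices of all points within Euclidean distance eps of q."""
    if node is None:
        return
    i, axis, left, right = node
    if math.dist(points[i], q) <= eps:
        out.append(i)
    diff = q[axis] - points[i][axis]
    # Left subtree holds coordinates <= the split value, right holds >=.
    # Skip a side only when the splitting plane is farther than eps from q.
    if diff <= eps:
        range_query(left, points, q, eps, out)
    if diff >= -eps:
        range_query(right, points, q, eps, out)


def dbscan(points, eps, min_pts):
    """Return one label per point: 0, 1, ... for clusters, -1 for noise.

    min_pts counts the point itself, as in the original DBSCAN definition.
    """
    tree = build_kdtree(list(range(len(points))), points)

    def neighbors(i):
        out = []
        range_query(tree, points, points[i], eps, out)
        return out

    labels = [None] * len(points)  # None means "not yet visited"
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = -1  # noise for now; may later become a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = deque(seeds)
        while queue:
            j = queue.popleft()
            if labels[j] == -1:
                labels[j] = cluster  # former noise is a border point of this cluster
            if labels[j] is not None:
                continue
            labels[j] = cluster
            nb = neighbors(j)
            if len(nb) >= min_pts:
                queue.extend(nb)  # j is a core point, so expand through it
    return labels
```

The tree is built once and each point issues exactly one range query, so the total cost is dominated by the tree construction plus n range queries, which is where the speedup over a linear scan per query comes from.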
Best regards,
Roland
Answers
Yes, we certainly appreciate any contribution you want to make to RapidMiner. You can find some basic information about contributing to RapidMiner at
http://rapid-i.com/content/view/51/81/
(joint copyright assignment, code style, basics...)
And on
http://rapid-i.com/content/view/25/48/
you can find information about how to configure Eclipse to get access to the latest version. The forum here is also a good resource, and probably the most comprehensive one is the white paper at
http://rapid-i.com/component/page,shop.product_details/flypage,flypage.tpl/product_id,52/category_id,5/option,com_virtuemart/Itemid,180/
which I assume you have already found. You might also find the proceedings of RCOMM 2010 interesting; many people presented their new extensions and algorithms there:
http://rapid-i.com/component/page,shop.product_details/product_id,68/flypage,flypage.tpl/pop,0/option,com_virtuemart/Itemid,180/
Hope that helps. Cheers,
Ingo
I want to cluster with fuzzy c-means and validate the result with the Davies-Bouldin criterion of the Cluster Distance Performance operator. I tried it, but I can't connect the fuzzy c-means output to Cluster Distance Performance. Does anyone know how to solve this?
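One possible workaround, sketched here under assumptions rather than as RapidMiner's own behavior: turn the fuzzy memberships into a hard partition (each object goes to the cluster with its highest membership) and compute the Davies-Bouldin index on that partition. The sketch uses plain Python, Euclidean distance and the textbook definition DB = (1/k) * sum over i of max over j != i of (S_i + S_j) / d(c_i, c_j), where S_i is the mean distance of cluster i's members to its centroid c_i; lower is better. RapidMiner's operator may use a different distance or sign convention, and the function names are hypothetical.

```python
import math


def defuzzify(memberships):
    """Assign each point to the cluster with its largest membership degree."""
    return [max(range(len(row)), key=row.__getitem__) for row in memberships]


def davies_bouldin(points, labels):
    """Davies-Bouldin index for a hard partition (lower is better).

    Uses Euclidean distance and each cluster's mean as its centroid.
    Requires at least two non-empty clusters with distinct centroids.
    """
    clusters = sorted(set(labels))
    if len(clusters) < 2:
        raise ValueError("need at least two clusters")
    dim = len(points[0])
    centroid, scatter = {}, {}
    for c in clusters:
        members = [p for p, lab in zip(points, labels) if lab == c]
        centroid[c] = tuple(sum(p[k] for p in members) / len(members) for k in range(dim))
        # S_c: average distance of the cluster's members to its centroid.
        scatter[c] = sum(math.dist(p, centroid[c]) for p in members) / len(members)
    total = 0.0
    for a in clusters:
        # Worst-case (largest) similarity ratio between cluster a and any other cluster.
        total += max((scatter[a] + scatter[b]) / math.dist(centroid[a], centroid[b])
                     for b in clusters if b != a)
    return total / len(clusters)
```

For example, given a membership matrix `u` from any fuzzy c-means run, `davies_bouldin(points, defuzzify(u))` scores the hard partition. Note that this discards the fuzziness; fuzzy-specific validity indices exist but are a different choice.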