DBSCAN taking a very long time
I am currently trying to do a cluster analysis with DBSCAN. Since this is my first time doing a cluster analysis and my first time using DBSCAN, all I know comes from papers and online documents. Maybe one of you can help me out:
I am analyzing a fairly large amount of data (I know that's relative): 10 columns and around 6 million rows. I select attributes, filter them, normalize, and then feed the result into the DBSCAN clustering. My parameters are epsilon=0.5 and minpts=4. I want to look at 2 attributes at a time, since I'll compare the results to k-means.
The problem is that it already takes over an hour to preprocess the data (the loading circle is shown on the clustering step) before the progress even starts counting from 1 to 100. Is there anything I can change in my process that would make it faster? It is quite likely that I have made some beginner mistakes.
Thanks for your answers and have a nice day.
EDIT: I have 64GB of RAM, and the process currently uses around 32GB. I set the maximum to 50GB. I should add that all of my attributes are numeric.
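To give a feeling for the scale, here is a rough back-of-the-envelope sketch in Python. It assumes a naive DBSCAN that compares every pair of rows to find epsilon-neighbours; I have not confirmed that RapidMiner's operator works this way, so treat it only as an upper-bound estimate. The function names are made up for illustration.

```python
# Rough cost estimate for a naive DBSCAN neighbour search.
# Assumption: every pair of rows is compared once (no spatial index).

def pairwise_comparisons(n_rows: int) -> int:
    """Number of distinct row pairs, i.e. n * (n - 1) / 2."""
    return n_rows * (n_rows - 1) // 2


def full_distance_matrix_bytes(n_rows: int, bytes_per_value: int = 8) -> int:
    """Memory needed to store a full n x n matrix of 8-byte doubles."""
    return n_rows * n_rows * bytes_per_value


if __name__ == "__main__":
    n = 6_000_000  # roughly the number of rows in my data set
    print(f"row pairs to compare: {pairwise_comparisons(n):,}")
    print(f"full distance matrix: {full_distance_matrix_bytes(n) / 1e12:.0f} TB")
```

Under that assumption, 6 million rows means about 18 trillion pairwise comparisons, and a full distance matrix would never fit into 64GB of RAM, so the number of rows rather than the number of attributes would be the main cost.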