"Is clustering and Decision Tree supposed to take hours to process?"

GViasuRaeisaene · June 2017

Hi,

I'm on a tight schedule and using RapidMiner for the first time. I have been running Agglomerative Clustering for over 5 hours, and I'm not sure whether I should let it keep running or whether something is wrong and I'm wasting my time. My example set has 241762 examples and 25 attributes, most of which are polynominal. I ran into the same problem when trying to create a Decision Tree, but I killed that process after 5 hours.

Thanks,

Geta

Thomas_Ott · June 2017

It's hard to tell without seeing your process and data. Are the polynominals transformed into numbers via dummy coding? Normally Decision Trees are fast, so there must be a problem somewhere.
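
For readers unfamiliar with the term, dummy coding turns one nominal attribute into one 0/1 column per distinct value. The following is only a minimal plain-Python sketch of the idea, not RapidMiner's own implementation; the function name `dummy_code` and the sample rows are made up for illustration.

```python
def dummy_code(rows, column):
    """Replace a nominal `column` with one 0/1 column per distinct value.

    `rows` is a list of dicts; this helper is hypothetical, for illustration only.
    """
    # Sort the distinct values so the generated columns come out in a stable order.
    categories = sorted({row[column] for row in rows})
    encoded = []
    for row in rows:
        # Keep every attribute except the one being encoded.
        new_row = {key: value for key, value in row.items() if key != column}
        for category in categories:
            new_row[f"{column}={category}"] = 1 if row[column] == category else 0
        encoded.append(new_row)
    return encoded


rows = [
    {"color": "red", "size": 3},
    {"color": "blue", "size": 5},
    {"color": "red", "size": 1},
]
encoded = dummy_code(rows, "color")
print(encoded[0])
```

Note that an attribute with thousands of distinct values produces thousands of new columns this way, which is one reason high-cardinality nominal attributes get expensive.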

IngoRM · June 2017

Agglomerative clustering on many examples (rows) is always very slow. The same is true for decision trees with nominal attributes that have a massive number of possible values. I would suggest using the following website to find out which algorithms are feasible:

http://mod.rapidminer.com/

For clustering, I would try "k-Means (fast)", and even that might easily take some time. For classification, I would start with Naive Bayes or k-NN, which are in general pretty fast algorithms.

Hope this helps,

Ingo
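
A rough back-of-the-envelope calculation helps show why agglomerative clustering struggles at this size. The sketch below assumes a naive implementation that compares every pair of examples and stores each distance as an 8-byte float; whether RapidMiner's operator works exactly this way is an assumption, not something stated in this thread.

```python
def pairwise_distance_count(n_examples):
    """Number of distinct unordered pairs among n_examples rows: n * (n - 1) / 2."""
    return n_examples * (n_examples - 1) // 2


n_examples = 241_762  # size of the example set from the question
pairs = pairwise_distance_count(n_examples)

# Assumption: one 8-byte float per stored distance.
bytes_needed = pairs * 8

print(f"{pairs:,} pairwise distances")
print(f"about {bytes_needed / 1e9:.0f} GB if every distance is kept in memory")
```

Because the pair count grows with the square of the number of rows, sampling the data down before clustering reduces the cost far more than proportionally.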

Telcontar120 · June 2017

In general I would be wary of using nominal attributes that have a high number of possible values in a predictive model. Usually these types of attributes do not generalize very well because the patterns that are in the training data are too specific and simply overfit to the training sample. You might want to consider some kind of feature engineering to reduce the number of possible values by aggregating or combining values in some sensible ways (e.g., 5-digit zip code to region, IP address to country, name to gender, etc.).
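
One simple way to reduce the number of possible values, offered here only as an illustrative sketch, is to merge rare values into a single catch-all category. This is a different technique from the domain-specific mappings mentioned above (zip code to region and so on), and the function name `collapse_rare_values`, the threshold, and the sample data are all hypothetical.

```python
from collections import Counter


def collapse_rare_values(values, min_count, other_label="other"):
    """Replace every value seen fewer than `min_count` times with `other_label`."""
    counts = Counter(values)
    return [value if counts[value] >= min_count else other_label for value in values]


cities = ["Berlin", "Berlin", "Paris", "Oslo", "Paris", "Lima"]
collapsed = collapse_rare_values(cities, min_count=2)
print(collapsed)
```

A mapping based on domain knowledge usually generalizes better than a pure frequency cutoff, but the cutoff is quick to try when no such mapping is available.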


Altair RapidMiner Community
