# newbie requires advice to select a clustering algorithm

Hello,

I discovered RapidMiner yesterday after several hours of research into data clustering (it looks very nice and friendly). I need a little bit of help in selecting an algorithm for what is most likely a simple case.

I have a series of events that happened at irregular intervals in time. I would like to determine which of those events form clusters, where in my case a cluster is made up of adjacent events that are closer together in time than a given threshold. The time span of the cluster does not matter (so 3 events 10 seconds apart or 20 events at 1-minute intervals are both valid clusters); I only care about the distance between two successive events.
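The rule described above can be sketched in a few lines of plain Python, outside RapidMiner. This is only an illustration under stated assumptions: timestamps are plain numbers (for example seconds), a gap exactly equal to the threshold still counts as "close", and a single isolated event is not a cluster. The function name `split_by_gap` and the `min_size` parameter are hypothetical, not part of any library.

```python
from typing import List


def split_by_gap(times: List[float], threshold: float, min_size: int = 2) -> List[List[float]]:
    """Group timestamps into runs whose successive gaps are <= threshold.

    Hypothetical helper for illustration; runs shorter than min_size are dropped.
    """
    runs: List[List[float]] = []
    current: List[float] = []
    for t in sorted(times):
        # Start a new run when the gap to the previous event exceeds the threshold.
        if current and t - current[-1] > threshold:
            runs.append(current)
            current = []
        current.append(t)
    if current:
        runs.append(current)
    # Assumption: an isolated event is not a cluster, so short runs are filtered out.
    return [run for run in runs if len(run) >= min_size]


# Made-up event times in seconds, threshold of 60 seconds.
events = [0, 10, 20, 500, 1000, 1060, 1120, 1180]
clusters = split_by_gap(events, threshold=60)
print(len(clusters), clusters)
```

The number of clusters is simply `len(clusters)`, so it comes out of the analysis rather than going into it.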

From what I've read so far, k-means and its variants are not appropriate since they require the user to specify how many clusters are desired. I don't know how many there are and, in this case, their number is in fact an output of the analysis, not an input.

Any guidance is appreciated.

Thanks,

-jl

## Answers

If each of your examples is marked with the point in time when the event occurs, you might use Agglomerative Clustering with single link. If you cluster only on the time (mark every other attribute as special or remove it), you will get a dendrogram showing which events are combined into one cluster and what the distance between them is.

Greetings,

Sebastian
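To make the single-link idea concrete, here is a minimal sketch in plain Python. It is not RapidMiner's Agglomerative Clustering operator, only a naive illustration of the same linkage rule on one numeric attribute. The function name `single_link` and the `cut` parameter are hypothetical; `cut` plays the role of the height at which the dendrogram is cut.

```python
from itertools import combinations
from typing import List, Tuple


def single_link(times: List[float], cut: float) -> Tuple[List[List[float]], List[float]]:
    """Naive single-link agglomerative clustering on one numeric attribute.

    Merging stops once the closest pair of clusters is farther apart than `cut`,
    which corresponds to cutting the dendrogram at that height.
    Returns the clusters and the distances at which merges happened.
    """
    clusters = [[t] for t in times]
    merge_heights: List[float] = []
    while len(clusters) > 1:
        # Single link: cluster distance is the smallest distance between any two members.
        (i, j), dist = min(
            (
                ((a, b), min(abs(x - y) for x in clusters[a] for y in clusters[b]))
                for a, b in combinations(range(len(clusters)), 2)
            ),
            key=lambda pair: pair[1],
        )
        if dist > cut:
            break
        merge_heights.append(dist)
        clusters[i] = sorted(clusters[i] + clusters[j])
        del clusters[j]  # j > i, so index i is still valid
    return sorted(clusters), merge_heights


# Made-up event times in seconds; cut the dendrogram at 60 seconds.
events = [0, 10, 20, 500, 1000, 1060, 1120, 1180]
clusters, heights = single_link(events, cut=60)
print(clusters)
print(heights)
```

With a single time attribute, cutting a single-link dendrogram at the threshold groups exactly those events whose successive gaps do not exceed it, so the number of clusters falls out of the cut. Events that end up alone, such as 500 here, can be treated as unclustered. The loop is far from efficient and is meant only to show the merge order.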
