Two basic questions on agglomerative clustering and CSV processing
I have two basic questions.
Question 1: I have a CSV file whose examples I want to feed into Agglomerative Clustering. How do I select which column is used for the distance metric? Also, if this column is a timestamp, does it need any extra processing (such as conversion into milliseconds)? I chose MeasureType=Numerical, Numerical Measure=Euclidean, as these appear to meet my needs (I need to cluster examples by how close they are in time).
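To illustrate the kind of conversion I have in mind, here is a minimal Python sketch done outside the tool. The function name `to_epoch_millis`, the timestamp format, and the UTC assumption are all made up for illustration; my real column may look different.

```python
from datetime import datetime, timezone

def to_epoch_millis(text, fmt="%Y-%m-%d %H:%M:%S"):
    """Parse a timestamp string and return milliseconds since the Unix epoch.

    Assumes the timestamps are in UTC and follow `fmt` (both are guesses).
    """
    dt = datetime.strptime(text, fmt).replace(tzinfo=timezone.utc)
    return int(dt.timestamp() * 1000)

# Two events 15 minutes apart end up 900000 ms apart, so a plain
# numerical (Euclidean) distance on this column is a time difference.
a = to_epoch_millis("2024-01-01 00:00:00")
b = to_epoch_millis("2024-01-01 00:15:00")
print(b - a)
```

With a single numeric column like this, the Euclidean distance between two examples is just the absolute difference of their timestamps.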
Question 2: With the same setup in mind, can I specify a stop condition for the algorithm so that it doesn't keep merging clusters until the very end (i.e. the single cluster containing everything)? I have hundreds of thousands of examples representing events in time, but the clusters are small (max 15 minutes apart), so it doesn't make sense to calculate clusters spanning hours, days or months (the total span of the records).
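To make the stop condition concrete, here is a rough Python sketch of the result I am after. It is not the tool's algorithm. It assumes that "max 15 minutes apart" means the gap between consecutive events in a cluster is at most 15 minutes, which for one-dimensional data should give the same groups as single-linkage clustering cut at that distance. The function name `cluster_by_gap` and the sample values are hypothetical.

```python
def cluster_by_gap(times_ms, max_gap_ms=15 * 60 * 1000):
    """Group timestamps (in milliseconds) into clusters.

    A new cluster starts whenever the gap to the previous event,
    in sorted order, exceeds max_gap_ms.
    """
    clusters = []
    for t in sorted(times_ms):
        if clusters and t - clusters[-1][-1] <= max_gap_ms:
            clusters[-1].append(t)   # close enough: extend the current cluster
        else:
            clusters.append([t])     # gap too large: start a new cluster
    return clusters

# Made-up sample: two pairs of nearby events and one isolated event.
events = [0, 60_000, 2_000_000, 2_060_000, 10_000_000]
for cluster in cluster_by_gap(events):
    print(cluster)
```

This needs only a sort and one pass over the data, which is why building the full hierarchy up to a single cluster seems wasteful for my case.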