Right values for k and max run and dbscan epsilon and min point issue

Elu · May 2019

Hi All,

Please, I would like to know how one can tell the right values for k and max runs when using the k-Means algorithm. How do I evaluate and interpret the results to know when the value of k is right? Is there a comprehensive video or other material showing this? Also, how would I know the right values for epsilon and min points in DBSCAN, and how do I evaluate and interpret those results? Is there a comprehensive video or material for that as well? Thanks.

rfuentealba · May 2019

Hello, @Elu

Well, this is a tough question: k-Means is popular, but choosing the value of k is a frequent topic of discussion and depends largely on your experience with the data. I can share two things with you today, though. One is that you may want to use X-Means, which works like k-Means but determines k with a heuristic instead of taking a manually entered value. The other is that you may want to use the elbow method to determine k, which is a reasonable approach. A good tutorial on this can be found here.

Calculating epsilon and min points for DBSCAN follows the same principle, but uses the k-NN distances between points. Calculate the average distance from every point to its k nearest neighbors, sort those values in ascending order, plot the result, and find the knee of the curve; the Y value at the knee is your epsilon setting. The knee is the threshold where the k-distance curve changes abruptly. Now, I don't know how to determine k for this, as I've mostly used the same k as in k-Means.
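The k-distance procedure described above can be sketched in plain Python, outside RapidMiner. This is a minimal illustration under stated assumptions, not a definitive recipe: the toy data, the choice `k = 4`, and the `knee_index` heuristic (the point of the normalised curve that lies farthest below the diagonal) are all assumptions made for the example. It averages the k nearest distances as described in the post; a common variant uses only the distance to the k-th neighbor, with k tied to the min points setting.

```python
import math
import random

def k_distances(points, k):
    """Average distance from each point to its k nearest neighbors, sorted ascending."""
    averages = []
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        averages.append(sum(dists[:k]) / k)
    return sorted(averages)

def knee_index(curve):
    """Hypothetical knee heuristic: scale both axes to [0, 1] and pick the
    index where the curve lies farthest below the diagonal."""
    n = len(curve)
    lo, hi = curve[0], curve[-1]

    def gap(i):
        return i / (n - 1) - (curve[i] - lo) / (hi - lo)

    return max(range(n), key=gap)

# Toy data (assumed): two dense blobs plus five isolated points.
rng = random.Random(1)
blobs = [(rng.gauss(cx, 0.3), rng.gauss(cy, 0.3))
         for cx, cy in [(0.0, 0.0), (5.0, 5.0)] for _ in range(40)]
noise = [(-3.0, 6.0), (8.0, -2.0), (2.5, 9.0), (9.0, 1.0), (-2.0, -4.0)]
points = blobs + noise

k = 4  # assumed value; often chosen together with min points
curve = k_distances(points, k)
knee = knee_index(curve)
eps = curve[knee]  # candidate epsilon: the Y value at the knee

print(f"candidate epsilon: {eps:.3f}")
print(f"points with k-distance above epsilon: {sum(d > eps for d in curve)}")
```

Points whose k-distance lies above the knee are the ones DBSCAN is likely to label as noise with that epsilon, so it is worth plotting `curve` and checking the knee by eye before trusting the automatic pick.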

Hope this helps,

Rodrigo.

Elu · May 2019

Where do you then input the k-Means value and the squared error values? I do not fully understand the elbow method.

rfuentealba · May 2019

Hi @Elu,

Sorry, I didn't see this earlier. The elbow method is just a graphical way to determine the value of k. Basically, you run the algorithm for a range of k values, record a quality measure for each run (typically the sum of squared distances from every example to its cluster centroid), and plot that measure against k. The curve drops sharply at first and then flattens out; the k at which the bend (the "elbow") happens is your best candidate. It is still trial and error, though, as there is no way to know in advance which k makes sense for your data.
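To make the elbow idea concrete, here is a minimal sketch in plain Python, outside RapidMiner. Everything in it is an assumption made for illustration: the toy data (three well-separated blobs), the helper names, and the use of the within-cluster sum of squared errors (SSE) as the measure. The `max_runs` argument plays the role that the max runs setting appears to have in k-Means: the number of random restarts, of which the best result is kept.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def centroid(cluster):
    """Coordinate-wise mean of a non-empty list of points."""
    return tuple(sum(axis) / len(cluster) for axis in zip(*cluster))

def kmeans_sse(points, k, max_runs=10, max_iter=100, seed=0):
    """Lowest within-cluster SSE found over max_runs random restarts."""
    rng = random.Random(seed)
    best = float("inf")
    for _ in range(max_runs):
        centroids = rng.sample(points, k)  # random initial centroids
        for _ in range(max_iter):
            # Assignment step: each point joins its nearest centroid.
            clusters = [[] for _ in range(k)]
            for p in points:
                nearest = min(range(k), key=lambda c: sq_dist(p, centroids[c]))
                clusters[nearest].append(p)
            # Update step: recompute centroids (an empty cluster keeps its old one).
            updated = [centroid(c) if c else centroids[i]
                       for i, c in enumerate(clusters)]
            if updated == centroids:
                break
            centroids = updated
        sse = sum(min(sq_dist(p, c) for c in centroids) for p in points)
        best = min(best, sse)
    return best

# Toy data (assumed): three blobs of 30 points each.
rng = random.Random(42)
centers = [(0.0, 0.0), (10.0, 0.0), (5.0, 9.0)]
points = [(rng.gauss(cx, 0.5), rng.gauss(cy, 0.5))
          for cx, cy in centers for _ in range(30)]

# One SSE value per k; plotting these against k gives the elbow curve.
sse_by_k = {k: kmeans_sse(points, k) for k in range(1, 7)}
for k, sse in sse_by_k.items():
    print(f"k={k}  SSE={sse:.1f}")
```

On data like this the SSE should fall steeply up to k = 3 and change little afterwards, which is the elbow. On real data the bend is often much less sharp, so treat the chosen k as a starting point rather than a final answer.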

All the best,

Rodrigo.


Altair RapidMiner Community
