Right values for k and max runs (k-Means) and for epsilon and min points (DBSCAN)
Hi All,
Please, I would like to know how one can tell the right values for k and max runs when using the k-Means algorithm. How do I evaluate and interpret the results to know when the k value is right? Is there a comprehensive video or material showing this? Also, how would I know the right values for epsilon and min points in DBSCAN? How do I evaluate and interpret those results? Is there also a comprehensive video or material showing this? Thanks
Best Answer
rfuentealba (RapidMiner Certified Analyst, Member, University Professor)
Hello, @Elu
Well, this is a tough question: although k-Means is popular, choosing its value of k is a frequent topic of discussion, and it depends on your experience. I can share two things with you today, though. One is that you may want to use X-Means, which works like k-Means but determines k with a heuristic instead of taking a manually set value. The other is that you may want to use the elbow method to determine k, which is a reasonable approach. A good tutorial on this can be found here.
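To make the elbow method concrete, here is a minimal Python/NumPy sketch, not RapidMiner's implementation. It runs plain Lloyd's k-Means several times from random starting centroids and keeps the run with the lowest within-cluster sum of squares (WCSS). The `max_runs` argument is my assumption of what RapidMiner's "max runs" parameter does (repeated random initialisations, keeping the best), so check the operator documentation. The function names `kmeans` and `wcss_curve` are hypothetical.

```python
import numpy as np

def kmeans(X, k, max_runs=10, max_iter=100, seed=0):
    """Plain Lloyd's k-Means, restarted max_runs times from random initial
    centroids; returns the labels and WCSS of the best (lowest-WCSS) run."""
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    best_labels, best_wcss = None, np.inf
    for _ in range(max_runs):
        # Start from k distinct data points chosen at random
        centroids = X[rng.choice(len(X), size=k, replace=False)]
        for _ in range(max_iter):
            # Assignment step: each point goes to its nearest centroid
            dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
            labels = dists.argmin(axis=1)
            # Update step: move each centroid to the mean of its points
            # (an empty cluster keeps its previous centroid)
            new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                            else centroids[j] for j in range(k)])
            if np.allclose(new, centroids):
                break
            centroids = new
        # Final assignment and within-cluster sum of squares for this run
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        wcss = float(((X - centroids[labels]) ** 2).sum())
        if wcss < best_wcss:
            best_labels, best_wcss = labels, wcss
    return best_labels, best_wcss

def wcss_curve(X, k_max, max_runs=10):
    """Best-run WCSS for k = 1..k_max; plot it against k to look for the elbow."""
    return [kmeans(X, k, max_runs=max_runs)[1] for k in range(1, k_max + 1)]
```

Plotting `wcss_curve(X, 10)` against k = 1..10 gives the elbow curve. More restarts make it less likely that an unlucky initialisation leaves a run stuck in a poor local optimum, at the cost of run time.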
Choosing epsilon and min points for DBSCAN follows the same principle, but uses the k-NN distances in a matrix of points. Calculate the average distance from every point to its k nearest neighbors, sort those distances in ascending order, and plot the result; the distance value (Y axis) at the knee of the curve is your epsilon setting. The knee is the threshold where the k-distance curve changes sharply. Now, I don't know how to determine k for this, as I've mostly used the same k as in k-Means.
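A small Python/NumPy sketch of that procedure, under a few assumptions. It follows the "average distance to the k nearest neighbors" variant described above. The classic k-distance plot from the original DBSCAN paper uses the distance to the k-th nearest neighbor instead, so you could replace `.mean(axis=1)` with `d[:, k]` to get that version. Locating the knee automatically as the point farthest below the straight line joining the curve's endpoints is my own heuristic; reading it off the plot by eye is equally valid. k is usually tied to the min points value you plan to use, though conventions differ on whether a point counts itself. The function names are hypothetical.

```python
import numpy as np

def sorted_k_distances(X, k):
    """Average distance from each point to its k nearest neighbours, sorted ascending."""
    X = np.asarray(X, dtype=float)
    # Full pairwise distance matrix (fine for small data sets; O(n^2) memory)
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    d.sort(axis=1)  # column 0 is each point's zero distance to itself
    avg_knn = d[:, 1:k + 1].mean(axis=1)
    return np.sort(avg_knn)

def knee_value(y):
    """Y value at the knee of an ascending curve: the point farthest below the
    straight line joining the first and last points (both axes scaled to [0, 1])."""
    y = np.asarray(y, dtype=float)
    x = np.linspace(0.0, 1.0, len(y))
    yn = (y - y[0]) / (y[-1] - y[0])  # assumes the curve is not flat
    # The chord runs from (0, 0) to (1, 1); the vertical gap below it is x - yn
    return float(y[int(np.argmax(x - yn))])
```

For example, `eps = knee_value(sorted_k_distances(X, 4))` gives a candidate epsilon for a min points value of about 4. Points to the right of the knee have large neighbor distances and are the ones DBSCAN would treat as noise.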
Hope this helps,
Rodrigo.
Answers
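I hadn't read this before. The elbow method is just a graphical way to determine the value of k. Basically, you set a value for k, run the algorithm, and measure how compact the resulting clusters are (for example, the total within-cluster distance). You repeat this for increasing values of k and plot that measure against k. The curve drops steeply at first and then flattens out; the k at the bend, the "elbow", where adding another cluster stops improving the result much, is your best k. But it's still trial and error, as there is no objective way for us to tell which k value makes sense.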
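If you'd rather not judge the bend by eye, one heuristic (my assumption here, not a RapidMiner feature) is to scale both axes to [0, 1] and pick the k whose point lies farthest from the straight line joining the first and last points of the curve. Here is a short Python/NumPy sketch; `elbow_k` is a hypothetical name, and the scores in the usage line are made-up numbers for illustration.

```python
import numpy as np

def elbow_k(ks, scores):
    """Return the k at the elbow of a decreasing score curve (e.g. WCSS vs. k):
    the point farthest from the straight line joining the first and last points."""
    ks = np.asarray(ks, dtype=float)
    s = np.asarray(scores, dtype=float)
    # Normalise both axes to [0, 1] so the choice does not depend on units
    x = (ks - ks[0]) / (ks[-1] - ks[0])
    y = (s - s[-1]) / (s[0] - s[-1])
    # The curve falls from (0, 1) to (1, 0), so the chord is y = 1 - x;
    # the distance to it is proportional to |1 - x - y|
    idx = int(np.argmax(np.abs(1.0 - x - y)))
    return int(ks[idx])

# Illustrative (made-up) scores for k = 1..8
print(elbow_k(range(1, 9), [1000, 520, 150, 130, 115, 105, 98, 92]))  # 3
```

This only automates reading the plot. It is still worth looking at the curve, because a smooth curve with no clear bend is a sign that k-Means may not find a natural number of clusters in your data.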
All the best,
Rodrigo.