# Optimize Parameters

Hi All

I have a data set I'm trying to find the natural clusters in. I am using KNN to group the clusters and a Bayesian model to classify the cluster labels.

The idea is that the higher the accuracy of the Bayesian model, the better the clusters are (there will obviously be some manual checking done afterwards).

I embedded all of this in an Optimize Parameters operator, along with a Log operator that writes the performance of each iteration to a log file, i.e. for every value of K it outputs the performance of the model.

I got the results below from the log file:

| K Value | Performance |
|---|---|
| 2 | 0.957987839 |
| **3** | **0.999203892** |
| 4 | 0.997700133 |
| 5 | 0.99876161 |
| 6 | 0.999380805 |
| 7 | 0.998142415 |
| 8 | 0.998407784 |
| 9 | 0.970278638 |
| 10 | 0.996196373 |

It can be seen from this that the optimal values are 3 or 6. It's possible that Optimize Parameters ignored these because of overfitting. It recommended k = 2, with the results below:

**accuracy: 99.92% +/- 0.02% (mikro: 99.92%)**

| | true cluster_0 | true cluster_1 | class precision |
|---|---|---|---|
| pred.cluster_0 | 34792 | 0 | 100.00% |
| pred.cluster_1 | 75 | 55576 | 99.87% |
| class recall | 99.78% | 100.00% | |
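As a sanity check, the figures in the confusion matrix above can be recomputed from its four counts. This is only a minimal Python sketch of the arithmetic, not anything RapidMiner runs; the variable names are made up for illustration.

```python
# Counts copied from the confusion matrix above.
# Rows are predictions, columns are the true cluster labels.
pred0_true0, pred0_true1 = 34792, 0
pred1_true0, pred1_true1 = 75, 55576

total = pred0_true0 + pred0_true1 + pred1_true0 + pred1_true1

# Accuracy: correctly labelled examples over all examples.
accuracy = (pred0_true0 + pred1_true1) / total

# Class precision: correct predictions of a class over all predictions of that class.
precision_0 = pred0_true0 / (pred0_true0 + pred0_true1)
precision_1 = pred1_true1 / (pred1_true0 + pred1_true1)

# Class recall: correct predictions of a class over all true members of that class.
recall_0 = pred0_true0 / (pred0_true0 + pred1_true0)
recall_1 = pred1_true1 / (pred0_true1 + pred1_true1)

print(f"examples:    {total}")
print(f"accuracy:    {accuracy:.2%}")
print(f"precision 0: {precision_0:.2%}   precision 1: {precision_1:.2%}")
print(f"recall 0:    {recall_0:.2%}   recall 1:    {recall_1:.2%}")
```

The recomputed values agree with the percentages shown in the matrix, so the table is at least internally consistent.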

The accuracy shown by Optimize Parameters is the same as the logged value for k = 3.

In the log file, the accuracy for k = 2 is just over 95%.
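To make the mismatch concrete, here is a small Python sketch that ranks the logged values and looks for the K whose logged performance rounds to the 99.92% reported by Optimize Parameters. It only does arithmetic on the numbers posted above; the names `logged` and `reported_pct` are illustrative and not part of any RapidMiner API.

```python
# Performance per K, copied from the log file output above.
logged = {
    2: 0.957987839,
    3: 0.999203892,
    4: 0.997700133,
    5: 0.99876161,
    6: 0.999380805,
    7: 0.998142415,
    8: 0.998407784,
    9: 0.970278638,
    10: 0.996196373,
}

# Accuracy reported alongside the recommended k = 2, in percent.
reported_pct = 99.92

# K values ordered from best to worst logged performance.
ranked = sorted(logged, key=logged.get, reverse=True)

# K values whose logged performance rounds to the reported accuracy.
matching = [k for k, perf in logged.items() if round(perf * 100, 2) == reported_pct]

print("best two K values by logged performance:", ranked[:2])
print("K values matching the reported accuracy:", matching)
print(f"logged performance for k = 2: {logged[2]:.2%}")
```

On these numbers, the two best logged values are K = 6 and K = 3, only K = 3 rounds to 99.92%, and the logged value for k = 2 is about 95.80%.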

I was wondering if anyone can help me understand why this may be the case?

Thanks

## Answers

Just as a side note, your accuracies are *very* high, and very high accuracies are always suspicious (unless you have really well-separated clusters). You may want to have another look at your process setup to check that everything is set up correctly.

Best regards,

Marius