[SOLVED] Speed / Evaluation time improvement of kNN Classifier
jaysonpryde
Good day,
I've developed a Java application that uses RapidMiner.jar (and the other jars) to classify my test data. The classifier I use is kNN (k=3, distance measure = cosine similarity). I've already optimized k and the choice of distance measure.
My model consists of 25k rows with 31 attributes.
When I run a test set, which is a CSV file with about 3k rows on average, the execution time is very long: more than 1 hour on average.
Do you have any suggestions on how I can improve the evaluation time of my kNN classifier application, based on the details above?
Hoping to receive feedback. Thank you.
Answers
As you probably know, kNN is a lazy learner: training a model is very fast (it basically just stores the training set), but applying it is slow, because for each new example the k nearest neighbours have to be found among all training examples. The only way to reduce the execution time of kNN is to reduce the size of the training set, either by removing attributes or by removing examples (the latter will probably have the greater impact).
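To make the cost concrete, here is a minimal Java sketch of brute-force kNN with cosine similarity. It is not RapidMiner's implementation; the class and method names (BruteForceKnn, classify, similarityCount) and the tiny data set are made up for illustration. Every query is compared with every training row, so 25k training rows and 3k test rows mean 75 million similarity computations, each over 31 attributes.

```java
import java.util.Arrays;

// Illustrative only: a plain brute-force kNN, not RapidMiner's code.
final class BruteForceKnn {

    /** Cosine similarity of two equal-length vectors; 0 if either has zero norm. */
    static double cosine(double[] a, double[] b) {
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        return (normA == 0 || normB == 0) ? 0 : dot / Math.sqrt(normA * normB);
    }

    /** Majority vote among the k training rows most similar to the query. */
    static int classify(double[][] train, int[] labels, int numClasses,
                        double[] query, int k) {
        double[] bestSim = new double[k];   // kept sorted, most similar first
        int[] bestLabel = new int[k];
        Arrays.fill(bestSim, Double.NEGATIVE_INFINITY);
        Arrays.fill(bestLabel, -1);

        // This loop is the expensive part: one pass over ALL training rows per query.
        for (int r = 0; r < train.length; r++) {
            double s = cosine(train[r], query);
            int pos = k;
            while (pos > 0 && s > bestSim[pos - 1]) {
                pos--;
            }
            if (pos < k) {
                for (int j = k - 1; j > pos; j--) {
                    bestSim[j] = bestSim[j - 1];
                    bestLabel[j] = bestLabel[j - 1];
                }
                bestSim[pos] = s;
                bestLabel[pos] = labels[r];
            }
        }

        int[] votes = new int[numClasses];
        for (int label : bestLabel) {
            if (label >= 0) {
                votes[label]++;
            }
        }
        int best = 0;
        for (int c = 1; c < numClasses; c++) {
            if (votes[c] > votes[best]) {
                best = c;
            }
        }
        return best;
    }

    /** Number of similarity computations needed to classify a whole test set. */
    static long similarityCount(long trainRows, long testRows) {
        return trainRows * testRows;
    }

    public static void main(String[] args) {
        double[][] train = {{1, 0}, {0.9, 0.1}, {0, 1}, {0.1, 0.9}};
        int[] labels = {0, 0, 1, 1};
        System.out.println(classify(train, labels, 2, new double[]{0.8, 0.2}, 3)); // prints 0
        System.out.println(similarityCount(25_000, 3_000)); // prints 75000000
    }
}
```

Since the count grows with the product of training rows and test rows, halving the training set halves the work, which is why removing examples helps.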
Otherwise I would suggest using a learner other than kNN. Basically any learner that actually builds a model will be much faster during application than kNN. Additionally, you may learn something about your data by looking at the model. A linear SVM, for example, outputs attribute weights, so you can see how much influence each attribute has on the classification. You may want to try: SVM (linear or RBF kernel), decision trees, Linear Regression if you have a regression problem, ...
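As a rough illustration of why a model-based learner is faster to apply, here is a small Java sketch of how a linear model (such as a linear SVM) scores an example. The weights, the bias, and the names LinearModelSketch, score, predict and mostInfluentialAttribute are hypothetical and not taken from RapidMiner. Applying the model is one dot product per example, so the cost no longer depends on the number of training rows: 3k test rows with 31 attributes need about 93,000 multiply-adds in total.

```java
// Illustrative only: applying a linear model with made-up weights.
final class LinearModelSketch {

    /** Decision value: bias plus the weighted sum of the attribute values. */
    static double score(double[] weights, double bias, double[] x) {
        double s = bias;
        for (int i = 0; i < weights.length; i++) {
            s += weights[i] * x[i];
        }
        return s;
    }

    /** Binary prediction: class 1 if the decision value is non-negative, else class 0. */
    static int predict(double[] weights, double bias, double[] x) {
        return score(weights, bias, x) >= 0 ? 1 : 0;
    }

    /** Index of the attribute whose weight has the largest absolute value. */
    static int mostInfluentialAttribute(double[] weights) {
        int best = 0;
        for (int i = 1; i < weights.length; i++) {
            if (Math.abs(weights[i]) > Math.abs(weights[best])) {
                best = i;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        double[] weights = {0.2, -1.5, 0.7};  // hypothetical learned weights
        double bias = 0.1;                    // hypothetical learned bias
        double[] example = {1.0, 0.5, 2.0};
        System.out.println(predict(weights, bias, example));    // prints 1
        System.out.println(mostInfluentialAttribute(weights));  // prints 1
    }
}
```

Comparing weights by absolute value only makes sense when the attributes are on similar scales, for example after normalization. With an RBF kernel the application cost depends on the number of support vectors instead of being a single dot product.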
Best regards,
Marius