Hi, I have a table with classification results.
I have 4 algorithms, and the classification was run on 16 different training sets:
- all => all 15 predictors were used
- 1-15 => each set contains 14 predictors; in each set a different type of predictor was removed
An example set is attached.
Excluded predictor (set number) | column name in the CSV
1 | characters_number
2 | sentences_number
3 | words_number
4 | average_sentence_length
5 | average_sentence_words_number
6 | ratio_unique_words
7 | average_word_length
8 | ratio_word_length_[1-16]
9 | ratio_special_characters
10 | ratio_numbers
11 | ratio_punctuation_characters
12 | most_used_word_[1-4]
13 | ratio_letter_[a-z]
14 | ratio_questions
15 | ratio_exclamations
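For reference, my setup is roughly like the sketch below (a minimal stand-in using synthetic random data and scikit-learn; the real data, models, and per-group column counts differ, and each group is collapsed to a single column here just for illustration):

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(0)

# Synthetic stand-in: 500 samples, one column per predictor group, 4 classes.
feature_names = [
    "characters_number", "sentences_number", "words_number",
    "average_sentence_length", "average_sentence_words_number",
    "ratio_unique_words", "average_word_length", "ratio_word_length",
    "ratio_special_characters", "ratio_numbers",
    "ratio_punctuation_characters", "most_used_word",
    "ratio_letter", "ratio_questions", "ratio_exclamations",
]
X = rng.normal(size=(500, len(feature_names)))
y = rng.integers(0, 4, size=500)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

models = {"k-NN": KNeighborsClassifier(), "Naive Bayes": GaussianNB()}

# "all" = every predictor; set i = every predictor except the i-th group.
subsets = {"all": list(range(len(feature_names)))}
for i in range(1, len(feature_names) + 1):
    subsets[str(i)] = [j for j in range(len(feature_names)) if j != i - 1]

results = {}
for set_name, cols in subsets.items():
    for model_name, model in models.items():
        model.fit(X_tr[:, cols], y_tr)
        acc = accuracy_score(y_te, model.predict(X_te[:, cols]))
        results[(model_name, set_name)] = acc
```

So `results` ends up with one accuracy per (algorithm, training set) cell, which is what my table shows.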
I have to somehow explain why the results for sets 1-15, for each algorithm, are better or worse than the results in column "ALL", but I have no idea why. I know that when the difference between column ALL and a column [1-15] is very small (< 1%), it is most likely just randomness. But when the difference is larger, it is probably caused by something.
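To separate "just luck" from a real effect, I could run a paired test on the same test items instead of eyeballing the 1% threshold. This is a sketch of an exact McNemar test (implemented via scipy's binomial test on the discordant pairs); `preds_all` and `preds_without_9` are hypothetical prediction arrays standing in for my real outputs:

```python
import numpy as np
from scipy.stats import binomtest

def mcnemar_pvalue(y_true, preds_a, preds_b):
    """Exact McNemar test: is the accuracy difference between two
    classifiers on the same test items more than chance?"""
    a_ok = preds_a == y_true
    b_ok = preds_b == y_true
    only_a = int(np.sum(a_ok & ~b_ok))   # A right, B wrong
    only_b = int(np.sum(~a_ok & b_ok))   # B right, A wrong
    n = only_a + only_b
    if n == 0:
        return 1.0  # identical error patterns -> no evidence of a difference
    return binomtest(only_a, n, p=0.5).pvalue

# Hypothetical example: two prediction vectors on 200 test items that
# differ only on a handful of items.
rng = np.random.default_rng(1)
y_true = rng.integers(0, 4, size=200)
preds_all = np.where(rng.random(200) < 0.54, y_true, (y_true + 1) % 4)
preds_without_9 = preds_all.copy()
flip = rng.choice(200, size=5, replace=False)
preds_without_9[flip] = (preds_without_9[flip] + 1) % 4

p = mcnemar_pvalue(y_true, preds_all, preds_without_9)
```

A large p-value would support my suspicion that a small ALL-vs-[1-15] gap is just randomness; a small one would suggest the removed predictor really matters.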
The most important thing: I don't know why the k-NN results are identical for columns 9-15...
It would also be good to know why Naive Bayes is the best algorithm here (54%) and why k-NN does so badly on this task (20%).
Can someone help me with that?