Inspecting the examples indicated by the confusion matrix -- possible?
Hi everyone,
I'm loving getting to know RapidMiner and have gotten deep into it. I am doing multi-class classification using several methods (random forest at the moment) and looking at the confusion matrix in the Performance output. The cells of the confusion matrix tell me where the model is in error and give hints about new attributes I might try. My question is:
Is there an easy way to click on or indicate a cell in the matrix and see a list of the examples that it refers to? (E.g., to answer a question like "what is it about all these examples in category-8 that get misclassified into category-7?")
[As a side question, I have 8 categories, 6000 examples, and about 50 attributes. Initial experimentation suggests random forest works best (with certain parameters), giving about 50% accuracy, which is a pretty good information gain over a random guess. Does anyone have sage advice about what types of models work well with so many attributes and categories?]
--many thanks, Tom
Answers
If you do an X-Prediction instead of an X-Validation, you get the scored example set. Afterwards you can use a Filter Examples operator, or the filter in the example set result view, to keep only the misclassified examples.
Of course, be careful not to overfit by tuning the model by hand based on what you see there.
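Outside RapidMiner, the same workflow (score every example with cross-validation, then filter a specific confusion-matrix cell) can be sketched in Python with scikit-learn. This is a minimal illustration on synthetic data standing in for the 8-class, 50-attribute problem described above, not the poster's actual dataset:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_predict

# Synthetic stand-in for the dataset described in the question.
X, y = make_classification(n_samples=6000, n_features=50,
                           n_informative=20, n_classes=8,
                           n_clusters_per_class=1, random_state=0)

# cross_val_predict plays the role of X-Prediction: each example is
# predicted by a model that was not trained on it.
pred = cross_val_predict(RandomForestClassifier(random_state=0), X, y, cv=5)

# All misclassified examples (the filter step in the answer above).
misclassified = np.where(y != pred)[0]

# A single confusion-matrix cell, e.g. true class 7 predicted as 6
# (labels are 0-based here); inspect these rows for common patterns.
cell_rows = np.where((y == 7) & (pred == 6))[0]
```

From there, `X[cell_rows]` gives exactly the examples behind one cell of the confusion matrix, ready for plotting or summary statistics.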
Cheers,
Martin
Dortmund, Germany