Text classification into topics
Apologies for the super-beginner question, but I am a super-beginner.
I have 4 documents in Spanish that contain about 600 words altogether. I would like RapidMiner to scan each document and classify the words it contains according to the topics they relate to. I know roughly which words to expect, so if it helps I could even supply a list of related words for each class to make the classification easier!
Ideally, I would like to compare the topic classes found in the different documents, along with the prominence of each class as a proportion of total words.
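Because you already know which words belong to which topic, a plain dictionary (lexicon) count might be enough here. Naive Bayes needs labeled training examples, and clustering or LDA tries to discover topics on its own. The sketch below is in Python rather than RapidMiner, and it is only an illustration under stated assumptions. The topic names and word lists in the hypothetical `TOPIC_WORDS` dictionary are placeholders for your own lists, and `tokenize` and `topic_profile` are illustrative helpers, not RapidMiner operators. There is no stemming, so "precios" will not match "precio" unless you list both forms. The share is each topic's matched tokens divided by all tokens in the document.

```python
import re
from collections import Counter

# Hypothetical topic lexicon: replace these placeholder lists with your own Spanish words.
TOPIC_WORDS = {
    "economía": {"mercado", "precio", "empresa", "dinero"},
    "salud": {"hospital", "médico", "enfermedad", "salud"},
}

def tokenize(text):
    # Lowercase, then keep runs of letters (including accented Spanish letters and ñ).
    return re.findall(r"[a-záéíóúüñ]+", text.lower())

def topic_profile(text, topics):
    """Return (matched-word counts per topic, share of all tokens per topic)."""
    tokens = tokenize(text)
    counts = Counter()
    for tok in tokens:
        # A word listed under several topics is counted once for each of them.
        for topic, words in topics.items():
            if tok in words:
                counts[topic] += 1
    total = len(tokens)
    shares = {t: (counts[t] / total if total else 0.0) for t in topics}
    return dict(counts), shares

if __name__ == "__main__":
    # Toy example documents; in practice, read the text of your 4 files here.
    docs = {
        "doc1": "El mercado subió y el precio del dinero bajó.",
        "doc2": "El médico trabaja en el hospital.",
    }
    for name, text in docs.items():
        counts, shares = topic_profile(text, TOPIC_WORDS)
        print(name, counts, {t: round(s, 2) for t, s in shares.items()})
```

Running `topic_profile` on each of the four documents produces one row of topic shares per document, and those rows can be compared side by side. If common function words ("el", "y", "del") should not count toward the total, one option is to remove a stopword list before dividing.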
I have tried Naive Bayes, K-Means, K-Medoids and Extract Topics from Document (LDA), but despite my good will and a lot of reading about these operators (including on this forum), I still cannot figure out which tool is best for my case or how to do this simple text classification.
Please help me. My thesis is at stake.
Thank you very much!
Attached are one of my attempts and the 4 data files.