
"convert document files to transaction dataset"

panida_s Member Posts: 1 Contributor I
edited June 2019 in Help
I am new to text mining and RapidMiner. I want to prepare a dataset to build a model with my algorithm. The dataset should contain one row for each text document, and each row should list the words contained in that document, separated by commas. Moreover, the words in the dataset should already have passed through the preprocessing steps: tokenization, stop-word removal, stemming, and n-grams.

Please help me

Thank you


    Skirzynski Member Posts: 164 Maven
    Typically, for text mining you do not use a data structure where the terms are stored as comma-separated strings. Instead, you create word vectors with one attribute for every word. Every document becomes a row (vector), and the value of each attribute (word) depends on the vector creation method (usually you want to use TF-IDF).
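
    The word-vector idea can be sketched in plain Python. This is a simplified TF-IDF, not RapidMiner's exact formula, and the two toy documents are assumed to be already tokenized and stopword-filtered:

    ```python
    import math
    from collections import Counter

    # Toy documents, already tokenized and stopword-filtered (illustrative only).
    docs = {
        "text1": "book data mining",
        "text2": "book describes data mining text mining rapidminer",
    }

    tokenized = {label: text.split() for label, text in docs.items()}
    vocab = sorted({w for toks in tokenized.values() for w in toks})
    n_docs = len(tokenized)

    def tf_idf(tokens, word):
        # Term frequency in this document times inverse document frequency.
        tf = Counter(tokens)[word] / len(tokens)
        df = sum(word in toks for toks in tokenized.values())
        return tf * math.log(n_docs / df)

    # One row (vector) per document, one attribute per vocabulary word.
    vectors = {label: [tf_idf(toks, w) for w in vocab]
               for label, toks in tokenized.items()}
    ```

    Note that a word occurring in every document (here "book") gets a TF-IDF weight of zero, which is exactly why TF-IDF highlights discriminating terms.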

    Here is an example process with two hard-coded documents (use "Process Documents from Files" to read from a set of files). Inside the "Process Documents" operator you will see a "Tokenize" and a "Filter Stopwords (English)" operator.

    <?xml version="1.0" encoding="UTF-8" standalone="no"?>
    <process version="5.3.009">
      <operator activated="true" class="process" compatibility="5.3.009" expanded="true" name="Process">
        <process expanded="true">
          <operator activated="true" class="text:create_document" compatibility="5.3.000" expanded="true" height="60" name="Create Document" width="90" x="45" y="30">
            <parameter key="text" value="This is a book on data mining"/>
            <parameter key="label_value" value="text1"/>
          </operator>
          <operator activated="true" class="text:create_document" compatibility="5.3.000" expanded="true" height="60" name="Create Document (2)" width="90" x="45" y="120">
            <parameter key="text" value="This book describes data mining and text mining using RapidMiner"/>
            <parameter key="label_value" value="text2"/>
          </operator>
          <operator activated="true" class="text:process_documents" compatibility="5.3.000" expanded="true" height="112" name="Process Documents" width="90" x="179" y="30">
            <parameter key="keep_text" value="true"/>
            <process expanded="true">
              <operator activated="true" class="text:tokenize" compatibility="5.3.000" expanded="true" height="60" name="Tokenize" width="90" x="45" y="30"/>
              <operator activated="true" class="text:filter_stopwords_english" compatibility="5.3.000" expanded="true" height="60" name="Filter Stopwords (English)" width="90" x="179" y="30"/>
              <connect from_port="document" to_op="Tokenize" to_port="document"/>
              <connect from_op="Tokenize" from_port="document" to_op="Filter Stopwords (English)" to_port="document"/>
              <connect from_op="Filter Stopwords (English)" from_port="document" to_port="document 1"/>
              <portSpacing port="source_document" spacing="0"/>
              <portSpacing port="sink_document 1" spacing="0"/>
              <portSpacing port="sink_document 2" spacing="0"/>
            </process>
          </operator>
          <connect from_op="Create Document" from_port="output" to_op="Process Documents" to_port="documents 1"/>
          <connect from_op="Create Document (2)" from_port="output" to_op="Process Documents" to_port="documents 2"/>
          <connect from_op="Process Documents" from_port="example set" to_port="result 1"/>
          <portSpacing port="source_input 1" spacing="0"/>
          <portSpacing port="sink_result 1" spacing="0"/>
          <portSpacing port="sink_result 2" spacing="0"/>
        </process>
      </operator>
    </process>
    The resulting example set can be used to learn models just like any other numerical data set; in text mining, an SVM is a common choice for classification, for example.
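
    To illustrate that the word vectors behave like ordinary numerical data, here is a minimal sketch that classifies a new document vector with a cosine-similarity nearest-neighbour rule, standing in for the SVM; the training vectors and labels are made up purely for illustration:

    ```python
    import math

    def cosine(a, b):
        # Cosine similarity between two word vectors.
        dot = sum(x * y for x, y in zip(a, b))
        na = math.sqrt(sum(x * x for x in a))
        nb = math.sqrt(sum(x * x for x in b))
        return dot / (na * nb) if na and nb else 0.0

    # Hypothetical labelled training vectors (e.g. TF-IDF rows from Process Documents).
    train = [([1.0, 0.0, 0.2], "data_mining"),
             ([0.0, 1.0, 0.1], "text_mining")]

    def classify(vec):
        # Assign the label of the most similar training vector.
        return max(train, key=lambda t: cosine(vec, t[0]))[1]
    ```

    A new document vector close to the first training row would be labelled `data_mining`; an SVM learned on the same rows would draw a separating hyperplane instead of comparing to individual neighbours.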