Can you share some more details about what you are trying to do? Your screenshot is too small to see any details. The best approach is to copy and paste the process XML into the forum.
Hi @mims98, thanks for sharing the process. The "Nominal to Text" operator was applied to all attributes. Do you want to apply text processing to all attributes? Usually, I only need to apply the nominal-to-text conversion to a single column (a nominal attribute). One more potential issue is that there are new keywords in the second input data. So I would suggest taking the "wordlist" from the first input data and reusing it on the second data. Hope it helps.
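To illustrate the wordlist suggestion outside of RapidMiner, here is a minimal Python sketch. It is not the RapidMiner implementation; the function names `build_wordlist` and `vectorize` and the sample texts are hypothetical. The point is only that the vocabulary is built once from the first data set and reused unchanged on the second, so both end up with the same attributes.

```python
from collections import Counter

def build_wordlist(texts):
    """Collect the sorted vocabulary of the first (training) data set."""
    vocab = set()
    for text in texts:
        vocab.update(text.lower().split())
    return sorted(vocab)

def vectorize(text, wordlist):
    """Count only words that are in the given wordlist; new words are ignored."""
    counts = Counter(text.lower().split())
    return [counts[word] for word in wordlist]

# Hypothetical sample data, for illustration only.
train_texts = ["good movie", "bad movie"]
new_texts = ["good good film"]  # "film" does not occur in the training data

# Build the wordlist once, from the first data set only.
wordlist = build_wordlist(train_texts)

# Reuse the same wordlist for both data sets so the columns line up.
train_vectors = [vectorize(t, wordlist) for t in train_texts]
new_vectors = [vectorize(t, wordlist) for t in new_texts]
```

Because the second data set is vectorized against the first wordlist, a model trained on `train_vectors` sees the same columns in `new_vectors`, which is what avoids a "missing attribute" style mismatch.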
YY
David_A (Administrator, Moderator, Employee, RMResearcher, Member; Posts: 297; RM Research)
Hi,
I can't run the process because I don't have your CSV. Could you please provide a clearer picture or description of what the error says?
What I also saw in your process is that, in the first Process Documents from Data operator, all operators are disabled. To apply the model on a test set, you need to apply the same transformations to both data sets.
@yyhuang thank you, I have tried changing it to a single column only, but now my "SVM" operator fails with a "missing attribute" error. Where should I put the "wordlist" operator?
@David_A error at the Performance operator: non-nominal label. The label attribute (id) must be nominal for the calculation of performance criteria for classification tasks. Certain learning schemes and algorithms require the label to be nominal.
David_A
Okay,
the technical reason your process fails is that you are trying to predict the ID, which is represented as a number, while the performance measures you apply are meant for categories. Using the Numerical to Polynominal operator would help you in that case. But in your data the IDs are unique and so can't be used as labels for a classification problem (how should an algorithm learn to detect the right label if each one appears only once?). So perhaps you need to rephrase your learning task first and then update your process accordingly.
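The two points above can be sketched in plain Python. This is only an illustration, not RapidMiner code: `to_nominal`, `label_is_learnable`, the `min_examples` threshold, and the sample values are all hypothetical. Converting numbers to nominal values removes the type error, but a label where every value occurs only once still gives a classifier nothing to learn from.

```python
from collections import Counter

def to_nominal(labels):
    """Turn numeric labels into nominal (string) categories."""
    return [str(label) for label in labels]

def label_is_learnable(labels, min_examples=2):
    """Check that every class occurs at least min_examples times.

    min_examples is an assumed threshold chosen for this sketch.
    """
    counts = Counter(labels)
    return all(n >= min_examples for n in counts.values())

# Hypothetical data: unique IDs versus a repeated category.
ids = to_nominal([101, 102, 103, 104])
sentiment = ["pos", "neg", "pos", "neg"]

ids_ok = label_is_learnable(ids)              # every ID occurs once
sentiment_ok = label_is_learnable(sentiment)  # each class occurs twice
```

Here `ids_ok` is false even after the nominal conversion, while `sentiment_ok` is true, which is why the learning task itself needs a different label rather than only a type change.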
Perhaps take a look at the community or the RapidMiner Academy for some inspiration on different text classification tasks.
Answers
David
Here is the process XML. Thanks for your response.
<?xml version="1.0" encoding="UTF-8"?><process version="9.5.001">
Here is the data in CSV: