Process Documents from Data: Apply to a new set of data
Perhaps I am missing something obvious, but the Process Documents from Data operator seems comparable to other pre-processing models that we can use with Apply Model. After processing an ExampleSet of text with this operator, is there a way to apply the same processing to a new ExampleSet?
A comparable flow would be using CountVectorizer in sklearn.
Best Answer
Telcontar120 RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 1,635 Unicorn
So you would need to do both, actually. If you take specific document processing steps inside Process Documents (e.g., tokenization, n-grams), you will need to apply those same steps to future datasets as well. You then use the wordlist input port to ensure that only the words present when the model was originally built are counted for subsequent scoring. Otherwise, the new documents may generate words the model has never seen, and the output may be missing words the model expects.
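To make the wordlist idea concrete, here is a minimal plain-Python sketch of the same pattern, roughly what fitting CountVectorizer on training text and then only transforming new text does. It is not RapidMiner or sklearn code. The function names (`tokenize`, `build_wordlist`, `vectorize`) and the sample documents are hypothetical, and the simple regex tokenizer is only a stand-in for whatever steps sit inside Process Documents.

```python
import re
from collections import Counter

def tokenize(text):
    # Stand-in for the processing steps inside Process Documents
    # (assumption: lowercase, then keep runs of letters as tokens).
    return re.findall(r"[a-z]+", text.lower())

def build_wordlist(documents):
    # Vocabulary seen in the training documents, in a fixed sorted order.
    return sorted({token for doc in documents for token in tokenize(doc)})

def vectorize(documents, wordlist):
    # Count only the words in the stored wordlist; words that appear
    # only in the new documents are ignored, so columns stay aligned.
    rows = []
    for doc in documents:
        counts = Counter(tokenize(doc))
        rows.append([counts[word] for word in wordlist])
    return rows

# Hypothetical sample data for illustration only.
train_docs = ["the cat sat", "the dog barked"]
new_docs = ["the cat and the bird"]

wordlist = build_wordlist(train_docs)            # built once, from training text
train_matrix = vectorize(train_docs, wordlist)   # features used to build the model
new_matrix = vectorize(new_docs, wordlist)       # same columns for scoring
```

Both matrices have one column per wordlist entry, so "and" and "bird" from the new document are dropped and the model sees the columns it was built on. The same tokenizer is reused for both calls, which corresponds to repeating the processing steps on the new data.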