Error when applying a trained model to a new unlabeled data set

Stann · May 2021

I want to apply a Naive Bayes model to a new (unlabeled) data set. The model has already been trained and tested via cross-validation. However, when I try to apply the model to a brand new data set, I get an error message.

Here is an overview of my process and the error I get:

The "Retrieve aggregate" operator loads the new (unlabeled) data set, whose labels I want to predict with my trained model.

"Process Documents from Data" contains a "Tokenize" operator.

The subprocesses within the Cross Validation operator are:

I am new to RapidMiner and have no idea why I get this error.

I would greatly appreciate your help, as I need to carry on with my research.

lionelderkrikor · May 2021 · Best Answer

@Stann,

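Yes, it is possible:

As I said, apply the same preprocessing steps in your test set "branch", and connect the word list output (wor) of the Process Documents from Data operator in your training "branch" to the word list input (wor) of the Process Documents from Data operator in your test set branch.

The test documents are then converted into vectors using the training vocabulary, so they get exactly the same attributes as the training set; tokens that appear only in the new documents are simply ignored.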

Regards,

Lionel
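The word-list reuse described in the answer above can be illustrated outside RapidMiner. The following is a minimal Python sketch, not RapidMiner's implementation: `tokenize`, `build_word_list` and `vectorize` are hypothetical helpers, the tokenizer is a crude stand-in for the Tokenize operator, and raw term counts are used for simplicity (RapidMiner offers several vector-creation schemes). The key point is that the test documents are vectorized against the vocabulary learned from the training documents, so both sets end up with identical attributes.

```python
import re


def tokenize(text: str) -> list[str]:
    # Hypothetical stand-in for the Tokenize operator: lowercase, keep runs of letters.
    return re.findall(r"[a-z]+", text.lower())


def build_word_list(docs: list[str]) -> list[str]:
    # Training branch only: the vocabulary plays the role of the "wor" output.
    return sorted({tok for doc in docs for tok in tokenize(doc)})


def vectorize(docs: list[str], word_list: list[str]) -> list[list[int]]:
    # One attribute per word in the training word list, in the same order for every set.
    index = {word: i for i, word in enumerate(word_list)}
    rows = []
    for doc in docs:
        row = [0] * len(word_list)
        for tok in tokenize(doc):
            if tok in index:  # tokens unseen during training are dropped
                row[index[tok]] += 1
        rows.append(row)
    return rows


train_docs = ["good product", "bad product"]
word_list = build_word_list(train_docs)        # ['bad', 'good', 'product']
print(vectorize(["good service"], word_list))  # [[0, 1, 0]]
```

Because the new token "service" is not in the training word list, it does not create a new attribute; the test row has the same three attributes the model was trained on.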

lionelderkrikor · May 2021

Hi @Stann,

The attributes must be exactly the same in your training set and in your unlabeled test set.
You therefore have to apply exactly the same preprocessing steps to your unlabeled test set (i.e., the Nominal to Text and Process Documents from Data operators). Currently you are feeding the raw test set to your model...

Hope this helps,

Regards,

Lionel

ceaperez · May 2021

Hi @Stann,

It seems that the names of the attributes (columns) in your training and test datasets aren't the same.

Please verify the names and types of the attributes in your test dataset.

Best
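The check suggested above (same attribute names and types in both sets) can be sketched in Python. This is a hypothetical helper, not a RapidMiner feature: it assumes each data set's regular attributes are described as a plain dict mapping attribute name to type name, with the label attribute left out since the new set is unlabeled.

```python
def compare_schemas(train: dict[str, str], test: dict[str, str]) -> dict[str, list[str]]:
    # Attributes the model expects but the test set lacks.
    missing = sorted(set(train) - set(test))
    # Attributes present in the test set that the model never saw.
    extra = sorted(set(test) - set(train))
    # Attributes present in both sets but with different types.
    type_mismatch = sorted(n for n in set(train) & set(test) if train[n] != test[n])
    return {"missing": missing, "extra": extra, "type_mismatch": type_mismatch}


# Word-vector training set vs. a raw, unprocessed text column (illustrative values).
train_schema = {"bad": "real", "good": "real", "product": "real"}
raw_test_schema = {"text": "nominal"}
print(compare_schemas(train_schema, raw_test_schema))
```

An empty result in all three lists means the two sets are compatible; the example reports every word attribute as missing and the raw text column as extra.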

Stann · May 2021

@lionelderkrikor, @ceaperez, thank you for your quick responses.

Having exactly the same attributes seems impossible, since each attribute is a token (word) that appeared in the original text documents. Because the new (unlabeled) data set contains different text documents from the training set, the attributes will always differ: the documents in the new data set contain "new" tokens.

Having said that, is there still a way to apply the model to a new (unlabeled) set?
