Regarding Model Building

Harshav · November 2021

While constructing my test data, I discovered that some of its attributes contain values the model was never trained on. How do I filter the test data so that I keep only the examples whose values also appear in the training data?

BalazsBarany · November 2021

Hi,

Can you post your process (with anonymized test data if possible) so we can better understand what you're attempting? And can you explain your problem in more detail?

Which modeling algorithm are you using? Many algorithms are supposed to throw out attributes and values from the test data that are not relevant for the model.

Regards,

Balázs

Harshav · November 2021

Exactly, Barany. The algorithm (Decision Tree) is not throwing out those attributes and values from the test data; instead, values that are not relevant to the model show up as null in the predicted results.

How can I check for them (irrelevant data that the model was not trained on) and filter those data points out before applying the model?

Can you suggest a process that checks that the test data contains only relevant information (that is, values present in the training data) before the model is applied?

P.S. I will post the process if required.

BalazsBarany · November 2021

Hi,

I really don't understand your question.

Is your problem with the model that was built?

Or with validation, testing on data that didn't go into the model?

If you do a Cross Validation (e.g. with the default 10 folds), your data will be split randomly 10 times into 90 % (training) and 10 % (testing) segments. This ensures that every example is tested exactly once, by a model that never saw that example during training. This is how it is supposed to be.
The model output of the Cross Validation comes from executing the training phase once more, on the entire data set. This is the best available model, and it should be about as good as the cross validation performance indicates.
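
The splitting described above can be sketched in plain Python. This is only an illustration of the idea, not how RapidMiner implements its Cross Validation operator; the function name `k_fold_indices` and the fixed seed are assumptions made for the example.

```python
import random

def k_fold_indices(n_examples, k=10, seed=42):
    """Yield (train_idx, test_idx) pairs for k-fold cross validation."""
    indices = list(range(n_examples))
    random.Random(seed).shuffle(indices)
    # Fold i takes every k-th shuffled index, starting at position i.
    folds = [indices[i::k] for i in range(k)]
    for i, test_idx in enumerate(folds):
        # Training data is everything that is not in the current test fold.
        train_idx = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train_idx, test_idx

splits = list(k_fold_indices(100, k=10))
tested = sorted(idx for _, test_idx in splits for idx in test_idx)

# Every example lands in exactly one test fold.
print(tested == list(range(100)))            # True
# Each round trains on 90 % and tests on 10 %.
print(len(splits[0][0]), len(splits[0][1]))  # 90 10
```

The final model would then be trained once more on all 100 examples, which this sketch leaves out.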

Regards,
Balázs

Harshav · November 2021

Suppose a nominal attribute has 6 distinct labels and the model was trained on those six. In my test data, the same attribute has 7 distinct labels. How can I filter out the extra label, which is not relevant to the model? Are there any operators or parameters to ignore such exceptions before applying the model?

BalazsBarany · November 2021

If you have a tree model, the additional attribute value doesn't matter.

E.g. you have splits like this: "attX=foo". If attX in the test set has a different value, it just won't match this rule.

However, you had better mix the training and test sets in a way that lets the model training see all possible values. That will give you a better model.

If you have an additional attribute in the test set, that's also not a problem. The model will work with the attributes it knows about.
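
If the rows with unseen values should still be set aside before the model is applied, the check can be sketched in plain Python as follows. This is a rough illustration outside RapidMiner, not a RapidMiner operator; the helper names `known_values` and `prepare_test_rows`, the attribute `extra`, and the sample labels are all hypothetical.

```python
def known_values(train_rows, attributes):
    """Collect the set of values seen in training for each attribute."""
    return {a: {row[a] for row in train_rows} for a in attributes}

def prepare_test_rows(test_rows, known):
    """Split test rows into (usable, rejected).

    Usable rows are reduced to the attributes the model knows about.
    A row is rejected if any of those attributes holds an unseen value
    or is missing from the row.
    """
    usable, rejected = [], []
    for row in test_rows:
        if all(row.get(a) in values for a, values in known.items()):
            usable.append({a: row[a] for a in known})
        else:
            rejected.append(row)
    return usable, rejected

# Six labels in training, a seventh ("g") only in the test data.
train = [{"attX": v} for v in ["a", "b", "c", "d", "e", "f"]]
test = [
    {"attX": "a", "extra": 1},
    {"attX": "g", "extra": 2},  # "g" was never seen in training
    {"attX": "f", "extra": 3},
]

known = known_values(train, ["attX"])
usable, rejected = prepare_test_rows(test, known)
print(usable)    # [{'attX': 'a'}, {'attX': 'f'}]
print(rejected)  # [{'attX': 'g', 'extra': 2}]
```

Keeping the rejected rows in a separate list, rather than dropping them silently, makes it easy to see how much of the test data the model has never encountered.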

There is also the Handle Exception operator, in which you can place operators that might raise exceptions.

Regards,
Balázs
