Regarding Model Building

Harshav Member Posts: 33 Contributor I
While preparing my test data, I discovered that some attributes contain values that the model never saw during training. How do I filter the test data so that it keeps only the rows whose values also occur in the training data?

Answers

  • BalazsBarany Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert Posts: 955 Unicorn
    Hi,

    can you post your process (with anonymized test data if possible) so we can better understand what you're attempting? And could you explain your problem in more detail?

    Which modeling algorithm are you using? Many algorithms are supposed to throw out attributes and values from the test data that are not relevant for the model.

    Regards,

    Balázs
  • Harshav Member Posts: 33 Contributor I
    edited November 2021
    Exactly, Barany. The algorithm (Decision Tree) is not throwing out the attributes and values from the test data; instead, it makes the values null in the predicted results when they are not relevant to the model.

    How can I check for them (irrelevant data that the model was not trained on) and filter those data points out before applying the model?

    Can you suggest a process that checks that the test data contains only relevant information (values that are in the training data) before the model is applied?

    P.S. I will post the process if required.
  • BalazsBarany Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert Posts: 955 Unicorn
    Hi,

    I really don't understand your question.

    Is your problem with the model that was built?

    Or with validation, testing on data that didn't go into the model?

    If you do a Cross Validation (e.g. with the default of 10 folds), your data will be randomly split into 10 segments, and in each of the 10 rounds 90 % of the data is used for training and the remaining 10 % for testing. This makes sure that every example is tested exactly once, with a model that didn't know about that example. This is how it is supposed to be.
    To produce its model output, the Cross Validation then executes the training phase once more, on the entire data set. This is the best available model, and it should be as good as the cross validation performance indicates.

    Regards,
    Balázs
  • Harshav Member Posts: 33 Contributor I
    Suppose a nominal attribute has 6 distinct labels and the model was trained on these six labels. In my test data, the same attribute has 7 distinct labels. How can I filter out the extra label, which is not relevant to the model? Are there any operators or parameters to ignore such exceptions before applying the model?
  • BalazsBarany Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert Posts: 955 Unicorn
    If you have a tree model, the additional attribute value doesn't matter.

    For example, suppose you have a split like this: "attX=foo". If attX in the test set has a different value, it just won't match this rule.

    However, it is better to mix the training and test sets in a way that the model training sees all possible values. That will give you a better model.

    If you have an additional attribute in the test set, that's also not a problem. The model will work with the attributes it knows about.

    And there's the Handle Exception operator, in which you can place operators that might raise exceptions.

    Regards,
    Balázs
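
The filtering asked about in this thread can be sketched outside RapidMiner in plain Python. This is only an illustration under stated assumptions, not a RapidMiner process: the attribute name attX and the value foo come from the example above, while the other values, the row layout (one dict per example), and the helper names known_values and split_by_known_values are hypothetical.

```python
def known_values(train_rows, attribute):
    """Collect the distinct values of one nominal attribute seen in training."""
    return {row[attribute] for row in train_rows}


def split_by_known_values(test_rows, attribute, allowed):
    """Separate test rows whose value was seen in training from those that were not."""
    seen, unseen = [], []
    for row in test_rows:
        # Route each row by whether its value occurred in the training data.
        (seen if row[attribute] in allowed else unseen).append(row)
    return seen, unseen


# Hypothetical data: "attX" has two values in training and a third one in the test set.
train = [{"attX": "foo"}, {"attX": "bar"}, {"attX": "foo"}]
test = [{"attX": "foo"}, {"attX": "baz"}, {"attX": "bar"}]

allowed = known_values(train, "attX")
scorable, held_back = split_by_known_values(test, "attX", allowed)
```

Here scorable holds the rows that could be passed to the model, and held_back holds the rows with values the model never saw, which could be inspected or handled separately instead of being scored.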
