What seems to be the problem in this case?

cjjc20001 Member Posts: 8 Contributor II
I am trying LightGBM on a dataset, and it gives the following error:

The sample data include gender, degree concentration, etc. These are mostly predefined options from a survey in which the participant simply selects the most appropriate answer.

Answers

  • MartinLiebig Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, University Professor Posts: 3,503 RM Data Scientist
    Hi,

    It looks like your text field contains categories in the application data that weren't present in training.

    BR,
    Martin
    - Sr. Director Data Solutions, Altair RapidMiner -
    Dortmund, Germany
  • cjjc20001 Member Posts: 8 Contributor II
    I think the problem is that some values occur only once in the data. When the data is split, such a value may not end up in the training set, so during validation it is flagged as unrecognized. When I removed the split, it worked. However, I need to both train and test the model. I tried cross-validation, but it has the same problem. What is the solution?
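One way to confirm the diagnosis that the application data contains categories never seen in training is to compare, for each nominal attribute, the values in the validation part against those in the training part. The sketch below does this in plain Python; the `unseen_categories` helper, the attribute names, and the rows are hypothetical and only illustrate the check. It is not a RapidMiner operator.

```python
from typing import Dict, List, Sequence, Set

Row = Dict[str, str]


def unseen_categories(
    train: Sequence[Row], test: Sequence[Row], columns: Sequence[str]
) -> Dict[str, List[str]]:
    """For each nominal column, list values that occur in `test` but never in `train`."""
    result: Dict[str, List[str]] = {}
    for col in columns:
        seen: Set[str] = {row[col] for row in train}
        missing = sorted({row[col] for row in test} - seen)
        if missing:  # only report columns that actually have unseen values
            result[col] = missing
    return result


# Hypothetical survey rows, for illustration only.
train = [
    {"gender": "F", "concentration": "Finance"},
    {"gender": "M", "concentration": "Marketing"},
]
test = [
    {"gender": "F", "concentration": "Marketing"},
    {"gender": "M", "concentration": "Astronomy"},  # occurs once, only in test
]
print(unseen_categories(train, test, ["gender", "concentration"]))
# → {'concentration': ['Astronomy']}
```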
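A common remedy for one-off survey answers is to build the list of known categories from the training part only, fold categories that occur fewer than `min_count` times into a catch-all `Other` value, and map any value not on that list to `Other` before scoring. With cross-validation, this step has to run inside each fold, using only that fold's training rows. The sketch below assumes this approach; the helpers `fit_vocabulary` and `encode`, the `Other` label, and `min_count=2` are illustrative choices, not part of RapidMiner or LightGBM.

```python
from collections import Counter
from typing import List, Sequence, Set

OTHER = "Other"  # hypothetical catch-all label


def fit_vocabulary(train_values: Sequence[str], min_count: int = 2) -> Set[str]:
    """Keep only categories seen at least `min_count` times in the training part."""
    counts = Counter(train_values)
    return {value for value, n in counts.items() if n >= min_count}


def encode(values: Sequence[str], vocab: Set[str]) -> List[str]:
    """Map rare or unseen categories to OTHER; apply to training AND validation rows."""
    return [v if v in vocab else OTHER for v in values]


# Hypothetical "degree concentration" column, already split into train/validation.
train_conc = ["Finance", "Finance", "Marketing", "Marketing", "History"]
valid_conc = ["Marketing", "Astronomy"]  # "Astronomy" never occurs in training

vocab = fit_vocabulary(train_conc)
print(encode(train_conc, vocab))
# → ['Finance', 'Finance', 'Marketing', 'Marketing', 'Other']
print(encode(valid_conc, vocab))
# → ['Marketing', 'Other']
```

Because the rare training value `History` is also mapped to `Other`, the model sees `Other` during training and can score it later. If no training value falls below `min_count`, `Other` would still be unknown to the model, so the threshold needs to suit the data.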