Attributes do not match

Tuvokbubka Member Posts: 5 Newbie
edited May 2021 in Help
I am supposed to create a predictive model using old data for training and then apply it to a new dataset. However, when I try to apply it to my new dataset, it says that there are missing attributes. I have tried a number of combinations and cannot resolve it.

If anyone knows how I can resolve this issue, help is more than welcome.

Thank you

Answers

  • kayman Member Posts: 662 Unicorn
    You remove the correlated attributes from both the old and the new set, and most likely one of the removed attributes (SC_MCV in this case) wasn't considered correlated in the training set. Or it was just above the threshold in one set and below it in the other...

    You therefore need to ensure that the attributes you keep for both CSV 1 and CSV (3) are the same as the ones used for your model. Though I think you can actually just remove the correlated-attributes filter from the test set, as the unused attributes will probably be ignored.
  • Tuvokbubka Member Posts: 5 Newbie
    I tried removing the "remove correlated" step for both datasets and it still gives me the same error 🧐

  • MartinLiebig Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, University Professor Posts: 3,507 RM Data Scientist
    The lower stream has an additional Nominal to Numerical operator (between Set Role and Normalize).
    This is likely the cause.

    Best,
    Martin
    - Sr. Director Data Solutions, Altair RapidMiner -
    Dortmund, Germany
  • Tuvokbubka Member Posts: 5 Newbie
    It still gives me an error...
  • MartinLiebig Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, University Professor Posts: 3,507 RM Data Scientist
    Hi,
    Did N-PRVR exist in the lower stream? Try using breakpoints to figure out why N_PRVR wasn't converted.
  • Tuvokbubka Member Posts: 5 Newbie
    N-PRVR does exist, but somehow it is not split into separate columns as it is for the example set.
  • MartinLiebig Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, University Professor Posts: 3,507 RM Data Scientist
    So, what type is it? It is likely numerical in the lower stream but nominal in the upper one.
  • Tuvokbubka Member Posts: 5 Newbie
    In both cases they are numerical, but in the original dataset they are nominal. I believe the problem is that, when the attribute is transformed into dummy variables, the value npvrv =1 does not exist in the new dataset. I tried removing the attribute from both datasets, and the same issue appeared with another attribute. So I conclude that the first dataset contains values that are not in the second, and that this causes a problem when the attribute is recoded into dummy variables. But I don't know how to solve this, or whether it can be solved at all.

    Thank you so much for your help
  • MartinLiebig Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, University Professor Posts: 3,507 RM Data Scientist
    Hi,
    I think you want to use the Parse Numbers operator on the upper stream before you use Nominal to Numerical.

    Best,
    Martin
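
For readers following the dummy-variable diagnosis in this thread, here is a minimal sketch in plain Python (not RapidMiner) of why the columns can differ and one way to keep them aligned. The toy rows and the helper names `fit_dummy_columns` and `apply_dummy_columns` are hypothetical and only illustrate the idea: the set of dummy columns is learned from the training data once and then reused on the new data. This is roughly what applying the training stream's preprocessing to the new data would achieve, which is an assumption and not something the thread confirms.

```python
def fit_dummy_columns(rows, attribute):
    """Collect the sorted distinct values of `attribute` seen in `rows`."""
    return sorted({row[attribute] for row in rows})


def apply_dummy_columns(rows, attribute, categories):
    """Dummy-code `attribute` using the given categories, not the rows' own values."""
    encoded = []
    for row in rows:
        # Keep every other attribute unchanged.
        out = {k: v for k, v in row.items() if k != attribute}
        # One 0/1 column per known category, always in the same order.
        for value in categories:
            out[f"{attribute} = {value}"] = 1 if row[attribute] == value else 0
        encoded.append(out)
    return encoded


# Hypothetical toy data: the value "1" occurs only in the training set.
train = [{"N_PRVR": "0"}, {"N_PRVR": "1"}, {"N_PRVR": "2"}]
new = [{"N_PRVR": "0"}, {"N_PRVR": "2"}]

categories = fit_dummy_columns(train, "N_PRVR")
train_encoded = apply_dummy_columns(train, "N_PRVR", categories)

# Encoding the new set from its own values reproduces the mismatch:
# the column for the value "1" is never created.
naive_new = apply_dummy_columns(new, "N_PRVR", fit_dummy_columns(new, "N_PRVR"))
missing = set(train_encoded[0]) - set(naive_new[0])
print("missing in new data:", sorted(missing))

# Reusing the training categories yields the same columns in both sets.
new_encoded = apply_dummy_columns(new, "N_PRVR", categories)
print("columns match:", list(new_encoded[0]) == list(train_encoded[0]))
```

The design choice is that the new data never decides which columns exist. A category that is absent from the new data simply gets a column of zeros, so the model sees the attributes it was trained on.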