
How to get performance on test data where the label has no values

User111113 Member Posts: 24 Maven
Hi All,

I wanted to find out how my model performs on the training data, and I was able to do that successfully.

Now I wanted to see how it performs on test data, so I added another Apply Model and Performance operator and, of course, my test data.

I got the errors below:

squared_error: unknown

root_mean_squared_error: unknown

That's probably because my label attribute is blank in the test data, since I wanted to see what values the model would predict. I am able to get the predictions, but I want to see how my model performs on a completely new set of data with no values in the label. Can we do that, and if yes, how?

If I put a Set Role operator between Apply Model and Performance, I can set the predicted variable as my label, but that is not right: the predicted column is not present in the original test data, so that doesn't work.
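Both symptoms in the question can be reproduced outside RapidMiner. The following is a minimal Python sketch, not the RapidMiner implementation: the `rmse` helper and the prediction values are hypothetical and made up for illustration. It shows that an error metric has nothing to compare against when every label is missing, and that reusing the prediction as the label trivially gives zero error.

```python
import math

def rmse(labels, predictions):
    """Root mean squared error over rows whose label is known.

    Returns None when no row has a label, which mirrors the
    "unknown" result reported in the question.
    """
    pairs = [(y, p) for y, p in zip(labels, predictions) if y is not None]
    if not pairs:
        return None
    return math.sqrt(sum((y - p) ** 2 for y, p in pairs) / len(pairs))

# Hypothetical predictions for three unlabeled test rows.
predictions = [0.12, 0.08, 0.15]
missing_labels = [None, None, None]

unknown = rmse(missing_labels, predictions)   # no labels, nothing to compare
self_scored = rmse(predictions, predictions)  # prediction reused as the label

print(unknown, self_scored)  # prints: None 0.0
```

Neither result says anything about model quality: the first has no ground truth at all, and the second compares the model with itself.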

        

Best Answer

  • User111113 Member Posts: 24 Maven
    Solution Accepted
    Yes, the Nov 2019 data was not in the training set, but I feel I have very limited options for how many models I can run. I see only 3 models, mostly 2, GBT and random forest, that work with my data, as it has only 1 real/integer attribute, which is the response rate, and all the others are polynominal.

Answers

  • varunm1 Member Posts: 1,207 Unicorn
    "to see how my model is performing on completely new set of data with no values in label.... can we do that if yes then how?"
    Nope, regular performance metrics cannot be calculated without the original known label.
    Regards,
    Varun
    https://www.varunmandalapu.com/

    Be Safe. Follow precautions and Maintain Social Distancing

  • User111113 Member Posts: 24 Maven
    @varunm1

    Thank you for your response.

    My next step was to run it without Performance and save the results in an Excel file. Then I fed that Excel file as input to the same model to see the error rate, and it came out as 0. Can you tell me why?
  • User111113 Member Posts: 24 Maven
    I did one more thing and I think I did it right this time.

    The result set generated above came from the model, so feeding the same data back into the model would obviously show 0 deviation.

    Now I used the original data. For example, I predicted the response rate for Nov 2019 and I already have the actual values, so I fed those in as input to see how much the predictions deviate from the actuals, and I got a root mean squared error of 0.016,

    which isn't bad. What do you think?
  • varunm1 Member Posts: 1,207 Unicorn
    If this Nov 2019 data was not in your training set, then the RMSE is low, which is fine. You can try different models and see if you can get better results.
    Regards,
    Varun

  • Telcontar120 RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 1,635 Unicorn
    Basically, to find the performance of newly scored data, you will need to wait until enough time passes for you to assign the label using the same logic that was embedded in your original model development sample. Then you can load that in, merge it with the dataset containing the predictions, and use the typical performance operators on the combined dataset to see how the model did.
    Brian T.
    Lindon Ventures 
    Data Science Consulting from Certified RapidMiner Experts
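
The merge-then-score workflow described in the last answer can be sketched in plain Python. This is an illustration under assumptions, not a RapidMiner process: the `score_predictions` helper, the ID keys, and all values are hypothetical. Predictions are stored at scoring time, actual outcomes arrive later, and only rows with a known outcome are scored.

```python
import math

def score_predictions(predictions, actuals):
    """Join predictions to later-known actuals by ID.

    Returns (rmse, number_of_matched_rows); rmse is None when
    no prediction has a known actual yet.
    """
    matched = [(actuals[key], pred)
               for key, pred in predictions.items()
               if key in actuals]
    if not matched:
        return None, 0
    mse = sum((y - p) ** 2 for y, p in matched) / len(matched)
    return math.sqrt(mse), len(matched)

# Hypothetical predicted response rates saved when the data was scored.
predictions = {"c1": 0.10, "c2": 0.20, "c3": 0.30}

# Hypothetical actual response rates that became known later.
# The outcome for "c3" has not arrived yet, so it is left out of the score.
actuals = {"c1": 0.13, "c2": 0.16}

error, n = score_predictions(predictions, actuals)
print(f"RMSE over {n} matched rows: {error:.4f}")
```

Joining on an ID rather than on row order avoids silently pairing a prediction with the wrong outcome when the two files are sorted differently.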