
performance of testing data

rafeena Member Posts: 14 Contributor II
Hi,

I have included images showing how I did my classification. I would like to know how to view the performance on my testing data. Hopefully what I am doing here is correct.

Thanks

Answers

  • lionelderkrikor RapidMiner Certified Analyst, Member Posts: 1,195 Unicorn
    Hi @rafeena,

    Your process is correct. Your performance vector is delivered by the ave output port of the Validation operator.
    Do you encounter any error with this process?

    Regards,

    Lionel
  • rafeena Member Posts: 14 Contributor II
    Hi lionelderkrikor, it didn't give me any problem. However, I would like to see the performance on my testing file (the file named Retrieve testing data), and I believe the performance I get now is for my training data.
  • IngoRM Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, Community Manager, RMResearcher, Member, University Professor Posts: 1,751 RM Founder
    Hi,
    Just add another Performance operator after the Apply Model (2) operator, which will then calculate the error rates for the provided test data.
    Side note: what you have now is actually not the training error but an estimate of the test error from a cross-validation. The true training error is what you would get if you applied the model to the complete training data again and calculated the performance for that.
    The cross-validated error and the test error should be similar (provided you have enough data and both sets follow the same distribution).
    Hope this helps,
    Ingo
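
The three numbers Ingo distinguishes (training error, cross-validated estimate, error on separate labelled test data) can be sketched in plain Python. This is only an illustration under stated assumptions: the one-dimensional toy data, the 1-nearest-neighbour "model", and every function name below are hypothetical and not part of the poster's RapidMiner process. A 1-nearest-neighbour learner is used because, scored on the very rows it was trained on, it finds each row as its own nearest neighbour and therefore looks perfect.

```python
# Toy illustration only: data, model and names are hypothetical.

def predict_1nn(train, x):
    """Return the label of the training example whose value is closest to x."""
    nearest = min(train, key=lambda example: abs(example[0] - x))
    return nearest[1]

def accuracy(train, test):
    """Fraction of test examples whose label the 1-NN model predicts correctly."""
    hits = sum(1 for x, label in test if predict_1nn(train, x) == label)
    return hits / len(test)

def cross_validated_accuracy(data, k):
    """Average accuracy over k folds; each fold is held out exactly once."""
    scores = []
    for fold in range(k):
        held_out = data[fold::k]  # every k-th example, starting at index `fold`
        rest = [ex for i, ex in enumerate(data) if i % k != fold]
        scores.append(accuracy(rest, held_out))
    return sum(scores) / k

# Labelled data used for building the model (value, label).
labelled = [(0.0, "no"), (1.1, "no"), (2.3, "yes"), (3.2, "no"),
            (4.6, "yes"), (5.5, "yes"), (6.9, "no"), (8.0, "yes")]
# Separate data that also carries true labels, so it can be scored.
new_labelled = [(0.4, "no"), (2.6, "no"), (5.0, "yes"), (7.3, "yes")]

training_accuracy = accuracy(labelled, labelled)         # model sees its own data
cv_accuracy = cross_validated_accuracy(labelled, k=4)    # estimate of test accuracy
test_accuracy = accuracy(labelled, new_labelled)         # measured on separate data

print(training_accuracy, cv_accuracy, test_accuracy)
```

Only the last two values say anything about unseen data. With this few rows none of them is statistically meaningful; the sketch is meant to show which data each number is computed on.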
  • rafeena Member Posts: 14 Contributor II
    @IngoRM Hi. I did it like you said, but the result is not good. The accuracy is 0.
  • IngoRM Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, Community Manager, RMResearcher, Member, University Professor Posts: 1,751 RM Founder
    Well, I see that you have changed your process a bit. You seem to select some columns in the training path. Are you sure that you apply the same data transformations to the test data as well?
  • rafeena Member Posts: 14 Contributor II
    @IngoRM I am actually running two processes: one selects features using TF-IDF and the other using entropy. Can you explain more about the data transformations? I probably didn't execute them all.
  • lionelderkrikor RapidMiner Certified Analyst, Member Posts: 1,195 Unicorn
    Hi @rafeena,

    What Ingo said means that you have to apply exactly the same preprocessing steps to both your training dataset and your test dataset.
    From the screenshot in your previous post, it seems that you are selecting only some features (via the Weight by Information Gain / Select by Weights operators) during your training step.
    You have to apply exactly the same selection to your test data.
    For a more specific answer, please share your process(es) and all your dataset(s).

    Regards,

    Lionel
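
Lionel's point about applying the same selection to both datasets can be sketched as follows. This is a minimal illustration, not the poster's process: the attribute names, the weights, and both helper functions are invented, and the weights merely stand in for what an operator such as Weight by Information Gain would compute on the training data. The key idea is that the list of kept attributes is derived once from the training data and then reused unchanged on the test data.

```python
# Hypothetical toy data: attribute names, weights and helper names are invented.

def top_k_attributes(weights, k):
    """Names of the k attributes with the highest weight."""
    ranked = sorted(weights, key=weights.get, reverse=True)
    return ranked[:k]

def select_columns(rows, columns):
    """Keep only `columns`, in that order; fail loudly if a row lacks one."""
    selected = []
    for row in rows:
        missing = [c for c in columns if c not in row]
        if missing:
            raise KeyError(f"attributes missing from data: {missing}")
        selected.append({c: row[c] for c in columns})
    return selected

# Weights computed on the TRAINING data only (values made up for the example).
training_weights = {"word_a": 0.42, "word_b": 0.05, "word_c": 0.31, "word_d": 0.01}
kept = top_k_attributes(training_weights, k=2)

train_rows = [{"word_a": 1, "word_b": 0, "word_c": 3, "word_d": 0, "label": "yes"}]
test_rows = [{"word_a": 0, "word_b": 2, "word_c": 1, "word_d": 5}]

# The same `kept` list is applied to both sets; it is never recomputed on the test data.
train_selected = select_columns(train_rows, kept + ["label"])
test_selected = select_columns(test_rows, kept)

print(kept, test_selected)
```

If the test data lacked one of the kept attributes, `select_columns` would raise instead of silently producing a table the model cannot use, which is the kind of attribute mismatch a model application step would otherwise run into.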


  • rafeena Member Posts: 14 Contributor II
    Hi @lionelderkrikor, I have applied the same steps, but it says that the attributes do not match. However, I do believe the attributes I used are all the same. Anyway, I have included my datasets. My processes are as shown in the pictures above.
  • lionelderkrikor RapidMiner Certified Analyst, Member Posts: 1,195 Unicorn
    Hi @rafeena,

    The working process is in the attached file.
    I'm able to obtain a test performance (accuracy) of around 70 % (calculated by the Cross Validation operator).

    Hope this helps,

    Regards,

    Lionel

    PS: You cannot calculate the "test error" from your dataset "testing data2.1" because it does not contain the true label.
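
Why a missing true label rules out a test error can be shown with a small sketch. The function name and the example predictions are hypothetical and not part of any RapidMiner API: accuracy is a comparison between predictions and true labels, so without the second list there is nothing to compare against.

```python
# Hypothetical helper for illustration; not a RapidMiner API.

def labelled_accuracy(predictions, true_labels):
    """Share of predictions that equal the true label.

    Raises ValueError when true labels are absent, because there is then
    nothing to compare the predictions against.
    """
    if true_labels is None or any(label is None for label in true_labels):
        raise ValueError("accuracy needs a true label for every example")
    if len(predictions) != len(true_labels):
        raise ValueError("predictions and true labels differ in length")
    if not predictions:
        raise ValueError("no examples to score")
    hits = sum(1 for p, t in zip(predictions, true_labels) if p == t)
    return hits / len(predictions)

predictions = ["yes", "no", "yes", "no"]

# Labelled data: predictions can be scored.
score = labelled_accuracy(predictions, ["yes", "no", "no", "no"])

# Unlabelled data: the model can still predict, but the result cannot be scored.
try:
    labelled_accuracy(predictions, None)
    unlabelled_scored = True
except ValueError:
    unlabelled_scored = False

print(score, unlabelled_scored)
```

The model can still be applied to an unlabelled file to obtain predictions; only the scoring step is impossible.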
  • rafeena Member Posts: 14 Contributor II
    Thank you very much @lionelderkrikor. When you say test error, does this mean I cannot see the accuracy for testing data 2.1?

  • rafeena Member Posts: 14 Contributor II
    @lionelderkrikor Noted, thanks for your help.
  • rafeena Member Posts: 14 Contributor II
    edited January 2020
    @lionelderkrikor I would like to be clear about training and testing data in RapidMiner. If I do it like the process in the photo, the file named testing data 2.1 is not actually used as my testing data, right? Both my training and testing data come from the file formspring training 2, and RapidMiner chooses randomly which rows are used for testing and which for training. Is this correct?