
Classifier Accuracy with Grid Search is not similar to accuracy without Grid Search

SafaSafa Member Posts: 4 Learner I
Hello guys, I'm running a Grid Search to tune Random Forest parameters. When the process ends, it gives me a set of best parameters along with the accuracy for those parameters. My question: when I run the process without Grid Search, setting the Random Forest parameters to the values I got from Grid Search, I get a lower accuracy. Can anyone explain the difference? Both approaches are the same; the only difference is that the first uses Grid Search and the second does not.
I have included screenshots of my process.

My dataset is Glass Type with 214 samples. It contains 1 duplicate row and 6 unbalanced classes. I run my process as follows:
Send the dataset into the Optimize Parameters (Grid) operator.
Inside the Optimize Parameters (Grid) operator:
1- Remove duplicates
2- Normalize
3- Split the data 80:20
4- Use SMOTE on the training data only
5- Train RF
6- Evaluate the model



Answers

  • ceaperezceaperez Member Posts: 541 Unicorn
    Hi @Safa

    This is abnormal behavior if you are using the same datasets, so make sure that is the case.
    For example, I saw that you are using the Split operator; depending on its parameters, the training and test datasets may vary.
    Try the process with fixed train and test datasets and check again.

    Best
  • SafaSafa Member Posts: 4 Learner I
    Hi @ceaperez, I have set the Split operator to stratified mode with an 80:20 split both times. Correct me if I'm wrong, but I think the Split operator gives the same 80% for training and the same 20% for testing in both cases?
  • ceaperezceaperez Member Posts: 541 Unicorn
    Hi @Safa
    Stratified sampling creates random subsets.
    I suggest you use the Split operator once, store the results, and then use the stored example sets in your comparison.

    Best

    Cesar
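
To illustrate the point about stratified sampling, here is a minimal Python sketch (not RapidMiner code). The function `stratified_split`, the labels, and the seed values are hypothetical and exist only for this illustration. It shows that a stratified split keeps the class proportions but still picks rows at random, so two runs agree only when the random seed is fixed.

```python
import random
from collections import defaultdict


def stratified_split(labels, train_ratio=0.8, seed=None):
    """Return (train_idx, test_idx), sampling each class separately."""
    # seed=None gives a fresh random state on every call.
    rng = random.Random(seed)

    # Group row indices by class label.
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    train_idx, test_idx = [], []
    for label in sorted(by_class):
        members = by_class[label][:]
        rng.shuffle(members)  # the random step inside each class
        cut = round(len(members) * train_ratio)
        train_idx.extend(members[:cut])
        test_idx.extend(members[cut:])
    return sorted(train_idx), sorted(test_idx)


# Hypothetical labels: 3 unbalanced classes, 100 rows in total.
labels = ["a"] * 60 + ["b"] * 30 + ["c"] * 10

fixed_1 = stratified_split(labels, seed=42)
fixed_2 = stratified_split(labels, seed=42)
other = stratified_split(labels, seed=7)

same_seed_matches = fixed_1 == fixed_2
different_seed_matches = fixed_1 == other
print(same_seed_matches, different_seed_matches)
```

Storing the split once, as suggested above, has the same effect as fixing the seed: both runs then see identical training and test rows.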
  • SafaSafa Member Posts: 4 Learner I
    Hi @ceaperez
    I did as you said: I split the data and stored the results in two separate files.
    After that, I ran Grid Search and got the best parameters and accuracy.
    Then I tested without Grid Search, but I still get a lower accuracy.
    Please check my screenshots and tell me if I'm doing something wrong.
  • ceaperezceaperez Member Posts: 541 Unicorn
    edited February 2022
    Hi @Safa

    One of the most beautiful things about RapidMiner is that you have a whole view of your pipeline and can explore your model step by step.
    I saw in your process that the two accuracies are now closer than before. That is because we eliminated one source of randomness.
    The SMOTE operator is another one: if you apply SMOTE to the same dataset twice, you will not obtain the same dataset.
    I invite you to explore your model using the pipeline, breakpoints, and the compare distributions operator from the Smile extension.

    Best, 

    Cesar
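
The randomness of SMOTE can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the RapidMiner operator: `smote_like`, the sample points, `k`, and the seeds are all hypothetical. Each synthetic row is placed at a random position between a randomly chosen minority row and one of its nearest minority neighbours, so repeated runs differ unless the seed is fixed.

```python
import math
import random


def smote_like(minority, n_new, k=3, seed=None):
    """Create n_new synthetic points by interpolating between a minority
    point and one of its k nearest minority neighbours (simplified SMOTE)."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest neighbours of `base`, excluding the point itself.
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(p, base),
        )[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # random position on the segment base -> neighbour
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic


# Hypothetical minority-class rows with two numeric attributes.
minority = [(1.0, 2.0), (1.5, 1.8), (2.0, 2.2), (1.2, 2.5), (1.8, 2.9)]

run_a = smote_like(minority, n_new=4, seed=1)
run_b = smote_like(minority, n_new=4, seed=1)
run_c = smote_like(minority, n_new=4, seed=2)

print(run_a == run_b)  # same seed, same synthetic rows
print(run_a == run_c)  # different seed
```

Because the model is trained on the oversampled data, a different set of synthetic rows can lead to a different model and therefore a different accuracy. Fixing the seed, or storing the oversampled training set once, removes this source of variation from the comparison.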
  • SafaSafa Member Posts: 4 Learner I
    Hi @ceaperez, thank you for answering my question. I really appreciate it, and I have learned a few things from you.
    I used SMOTE only once.
    I also removed SMOTE and tested again without the Split operator, and I still get a lower accuracy. I think using the Performance operator inside Grid Search versus without Grid Search gives slightly different results. Anyhow, thanks.
    Best regards