
Classification model and preparing data

keb1811 Member Posts: 11 Learner II
edited May 2020 in Help

Hello everyone,

I want to build different classification models. I have two questions.

1) First, I want to build a decision tree, so I have to convert the numeric values to nominal ones. I can do this with a discretizing operator, but my numeric attributes all have different distributions. Do you know of any literature that says which method is best in each case? I also read that I can do it with k-means clustering, but that doesn't work with missing values.

2) I often read that I have to split my dataset into a training part and a testing part. I can do this with the splitting operator. I don't understand why I should split into only two parts and not three. What about my non-classified observations? Are they included in both parts (training and testing)? In my opinion, I have to split the data into a training part, a testing part, and a real prediction part.
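
To make the discretization choice in question 1 concrete, here is a minimal plain-Python sketch (not RapidMiner code) that compares equal-width binning with equal-frequency binning on a skewed attribute. The function names and the sample values are hypothetical and only serve as illustration; missing values are not handled.

```python
from collections import Counter

def equal_width_bins(values, n_bins):
    """Assign each value to one of n_bins intervals of equal width."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    if width == 0:
        return [0 for _ in values]
    # The maximum would fall just past the last interval, so clamp it.
    return [min(int((v - lo) / width), n_bins - 1) for v in values]

def equal_frequency_bins(values, n_bins):
    """Assign each value to a bin by rank so bins hold roughly equal counts.

    Note: equal values can end up in different bins in this simple version.
    """
    order = sorted(range(len(values)), key=lambda i: values[i])
    bins = [0] * len(values)
    for rank, i in enumerate(order):
        bins[i] = rank * n_bins // len(values)
    return bins

# Hypothetical right-skewed attribute: most values are small, one is large.
skewed = [1, 2, 2, 3, 3, 4, 5, 6, 50]

width_counts = Counter(equal_width_bins(skewed, 3))
freq_counts = Counter(equal_frequency_bins(skewed, 3))
print(sorted(width_counts.items()))
print(sorted(freq_counts.items()))
```

On this sample, equal-width binning puts eight of the nine values into the first bin, while equal-frequency binning puts three values into each bin. That is one reason why the same method may not suit attributes with different distributions.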

Thank you very much.

Regards


Answers

  • keb1811 Member Posts: 11 Learner II
    Hello Martin,

    thank you very much for your answer.
    Now I tried it without discretizing.
    And regarding the second question:
    Do you mean it like this? First, I separated all classified examples from all non-classified ones (Filter 3 and 4). Then I tried to train the model in the upper part, but I don't know how to implement the cross validation when I also want a connection to the bottom "apply model". The bottom one should apply the model to the non-classified data.
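
The workflow described here (separate labeled from unlabeled rows, cross-validate on the labeled rows only, then apply a model to the unlabeled rows) could be sketched in plain Python roughly as follows. This is not RapidMiner code: the data, the function names, and the nearest-centroid learner are assumptions chosen to keep the example self-contained, and the learner merely stands in for the decision tree.

```python
import random

def fit_centroids(rows):
    """Train a nearest-centroid classifier: one mean vector per class."""
    sums, counts = {}, {}
    for features, label in rows:
        acc = sums.setdefault(label, [0.0] * len(features))
        for j, x in enumerate(features):
            acc[j] += x
        counts[label] = counts.get(label, 0) + 1
    return {lab: [s / counts[lab] for s in acc] for lab, acc in sums.items()}

def predict(model, features):
    """Return the class whose centroid is closest (squared Euclidean distance)."""
    def dist(centroid):
        return sum((x - y) ** 2 for x, y in zip(features, centroid))
    return min(model, key=lambda lab: dist(model[lab]))

def cross_validate(rows, k, seed=0):
    """Estimate accuracy with k-fold cross-validation on labeled rows only."""
    rows = rows[:]
    random.Random(seed).shuffle(rows)
    folds = [rows[i::k] for i in range(k)]
    correct = 0
    for i, test in enumerate(folds):
        train = [r for j, f in enumerate(folds) if j != i for r in f]
        model = fit_centroids(train)
        correct += sum(predict(model, x) == y for x, y in test)
    return correct / len(rows)

# Hypothetical data: (features, label); label None means "not yet classified".
data = [
    ([1.0, 1.2], "a"), ([0.8, 1.0], "a"), ([1.1, 0.9], "a"), ([0.9, 1.1], "a"),
    ([5.0, 5.2], "b"), ([4.8, 5.1], "b"), ([5.2, 4.9], "b"), ([5.1, 5.0], "b"),
    ([1.0, 0.8], None), ([4.9, 5.3], None),
]

# Step 1: separate classified from non-classified rows (the two filters).
labeled = [(x, y) for x, y in data if y is not None]
unlabeled = [x for x, y in data if y is None]

# Step 2: the performance estimate comes from the labeled rows only.
accuracy = cross_validate(labeled, k=4)

# Step 3: the model that scores the unlabeled rows is trained on all labeled rows.
final_model = fit_centroids(labeled)
predictions = [predict(final_model, x) for x in unlabeled]
print(accuracy, predictions)
```

The unlabeled rows never enter training or testing. They are only scored at the end, which is why a train/test split of the labeled data is enough.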

    Best regards


  • keb1811 Member Posts: 11 Learner II
    Hello Martin,
    thanks again for your answer. Is it implemented correctly, as you indicated? (see picture)
    One more question: can I also use this "layout" for other classification algorithms like KNN, Naive Bayes, neural nets, and SVM if I only change the operator (decision tree) in the Cross Validation process?
    I think the different algorithms will need different preparation. Does it work if I do the preparation after the "set role" operator, or do I have to insert it before the "multiply" operator (because of "filter examples 4")?

    Thank you very much!
    Regards

  • MartinLiebig Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, University Professor Posts: 3,529 RM Data Scientist
    Hi,

    this looks good! You can now simply swap the Decision Tree for other learners. As you pointed out, your preprocessing may change, especially the handling of nominal variables. You can put the preprocessing in front of Multiply and it will work. One sometimes needs to be a bit careful, though, because the preprocessing is then technically outside of your validation. I am not sure how precisely you want to run this. Many people do not validate their preprocessing.

    Cheers,
    Martin
    - Sr. Director Data Solutions, Altair RapidMiner -
    Dortmund, Germany
  • keb1811 Member Posts: 11 Learner II
    Hi Martin,

    thanks for your answer. I am not really sure what you mean. Why is it outside of my validation? The prepared data are used for the decision tree, and those will be validated. Or do you mean that I have to validate my preprocessing too? If so, what exactly can I validate there, and are there any explanations, information, or literature from RapidMiner?

    Regards
  • MartinLiebig Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, University Professor Posts: 3,529 RM Data Scientist
    It's a bit of a tricky thing. Maybe https://academy.rapidminer.com/courses/normalize-demo helps.

    The same idea that is demonstrated there for Normalize also applies to other operators like Nominal to Numerical. Basically, one needs to be careful with all operators that have a red "pre" port.
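
As a rough illustration of why preprocessing can fall outside the validation, the plain-Python sketch below contrasts learning normalization parameters from all rows with learning them from the training fold only. It is not RapidMiner code; the function names and numbers are hypothetical, and z-score normalization is just one example of a preprocessing step that learns parameters from data.

```python
from statistics import mean, pstdev

def fit_zscore(values):
    """Learn normalization parameters (mean, standard deviation) from values."""
    return mean(values), pstdev(values)

def apply_zscore(params, values):
    """Apply previously learned parameters to any values, seen or unseen."""
    mu, sigma = params
    return [(v - mu) / sigma for v in values]

# Hypothetical single attribute, already split into a training and a test fold.
train = [1.0, 2.0, 3.0, 4.0]
test = [10.0]

# Outside the validation: parameters are learned from training and test rows
# together, so information about the test row leaks into the preprocessing.
leaky_params = fit_zscore(train + test)

# Inside the validation: parameters come from the training fold only and are
# then applied unchanged to the test fold.
clean_params = fit_zscore(train)

print(apply_zscore(leaky_params, test))
print(apply_zscore(clean_params, test))
```

With the leaky parameters the test value looks much less extreme than it does from the point of view of the training fold, so a performance estimate built on the leaky version can be too optimistic.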

    Cheers,
    Martin
