Prediction model questions
Hello, I have a few newbie questions. I’m very new to this but am trying to learn!
I created a CSV with around 20 variables (columns) which should explain / predict one variable (column).
After using the Auto Model function, which btw is a blessing, I chose the "Predict" function and selected the variable which I want to predict / explain with the others. Doing this left me with 3 questions which, even after researching the internet, I still can't answer:
1. The variable which I want to have explained is a numerical one. When setting everything up, the tool asks me if I want to turn it into classification (after I already selected the prediction function). What does that mean, and what would be my benefits if doing so?
2. When calculating, do the tools use all variables to explain/predict, or do the tools filter out the most useful ones to do the calculation? I was hoping to also get insight into which variables are the most important for the prediction instead of forcing the tools to use all of them...
3. Some of the values have more decimal places than RapidMiner shows. The values therefore show as “0”... Is it possible to change the number of decimal places displayed?
Thank you so much for the help...
Greetings from Germany.
Best Answer

lionelderkrikor Moderator, RapidMiner Certified Analyst, Member Posts: 1,195 Unicorn
Hi again @MikeGer!
Here are my answers to your three questions:
1. The answer is: it depends on what you ultimately want to achieve.
Your label (the variable you want to predict) is numerical and therefore continuous, so you have a "regression problem".
Example:
Suppose that in your original problem, your label can take any real value between 0 and 10.
From this point you can "transform" your label by defining three classes called X, Y and Z, each covering one interval (a bracket facing outward, as in ]3, means that endpoint is excluded):
X = [0,3]
Y = ]3,6]
Z = ]6,10]
In this case, your label now has only 3 possible values: a finite number of values instead of the infinite number it could take initially. You have "transformed" your initial "regression problem" into a "classification problem".
"What would be my benefits if doing so?"
It is very difficult to answer this question in general. By doing so you lose some information, but there are pros and cons to doing so... once again, it depends on your specific use case.
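To make the binning in the example above concrete, here is a minimal sketch in plain Python (outside RapidMiner). The function name `to_class` and the sample values are only illustrative assumptions; the interval edges 3, 6 and 10 are the ones from the example, and this is not how Auto Model performs the conversion internally.

```python
from bisect import bisect_left

# Upper edges of the three intervals: X = [0, 3], Y = ]3, 6], Z = ]6, 10]
UPPER_EDGES = [3.0, 6.0, 10.0]
CLASS_NAMES = ["X", "Y", "Z"]


def to_class(value: float) -> str:
    """Map a numeric label in [0, 10] to one of the classes X, Y or Z."""
    if not 0.0 <= value <= 10.0:
        raise ValueError(f"label {value} is outside [0, 10]")
    # bisect_left returns the index of the first edge >= value, so a value
    # lying exactly on an edge (3 or 6) goes into the lower interval,
    # which is closed on the right.
    return CLASS_NAMES[bisect_left(UPPER_EDGES, value)]


# Hypothetical label values, chosen to include the interval boundaries.
labels = [0.0, 2.5, 3.0, 3.01, 6.0, 7.2, 10.0]
classes = [to_class(v) for v in labels]
print(classes)  # ['X', 'X', 'X', 'Y', 'Y', 'Z', 'Z']
```

Note how 7.2 and 10.0 both end up as Z: after the conversion the model can no longer tell them apart, which is the information loss mentioned above.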
2. I already answered this question in your other thread about feature selection in AutoModel. Please refer to that thread.
3. You have to go to :
Settings > Preferences > Number Format > set the "fraction digits of numbers"
Hope this helps,
Regards,
Lionel (Greetings from France)
Answers
Thank you so much!!! I literally spent the whole day trying to figure those things out, so your comment really helped me out...
Stay safe and thank you again.