
# Predicting a float label that depends on polynominal attributes

Member Posts: 3 Contributor I
edited November 2018 in Help
Hi there!

I am new to RapidMiner and to data mining. RapidMiner is my first data mining tool and I'm very pleased with it; it is very good for newcomers. However, I have some problems and need an expert to help me.

I have data in Excel that looks like this:

variable X - label, a float variable (student grade, for example 3.73)
variable Y1 - attribute, nominal value (can take 4 values that I've coded as numbers: 1, 2, 3, 4)
variable Z1 - attribute, nominal value (can take 6 values that I've coded as numbers from 1 to 6)
variable Y2 - attribute, nominal value (can take 4 values that I've coded as numbers: 1, 2, 3, 4)
variable Z2 - attribute, nominal value (can take 6 values that I've coded as numbers from 1 to 6)

I want to predict X from Y1, Z1, Y2 and Z2.

First I thought of using linear regression after converting nominal to binominal (dummy coding), but RapidMiner output a model in which X depends on only 2 variables instead of all 4 (and it makes no sense that the others have no influence on X).
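Because Y1, Z1, Y2 and Z2 are stored as numeric codes, a regression that reads them as numbers would treat them as quantities (as if code 4 were "twice" code 2). Dummy coding replaces each nominal attribute with 0/1 indicator columns. Below is a minimal Python sketch of that transformation, assuming the first code of each attribute serves as the reference category (one common convention; RapidMiner's own operators may encode differently). The `dummy_code` function and column names such as `Y1=2` are illustrative, not part of RapidMiner's API.

```python
def dummy_code(rows, levels):
    """Dummy-code nominal attributes that are stored as integer codes.

    rows   -- list of dicts, e.g. {"Y1": 2, "Z1": 5, "Y2": 1, "Z2": 6}
    levels -- attribute name -> all possible codes, e.g. {"Y1": [1, 2, 3, 4]}

    The first code of each attribute is the reference category and gets no
    column, which avoids perfect collinearity with the regression intercept.
    """
    columns = [(attr, code) for attr, codes in levels.items() for code in codes[1:]]
    names = [f"{attr}={code}" for attr, code in columns]
    encoded = [[1 if row[attr] == code else 0 for attr, code in columns]
               for row in rows]
    return names, encoded


levels = {"Y1": [1, 2, 3, 4], "Z1": [1, 2, 3, 4, 5, 6],
          "Y2": [1, 2, 3, 4], "Z2": [1, 2, 3, 4, 5, 6]}
rows = [{"Y1": 1, "Z1": 6, "Y2": 3, "Z2": 1}]
names, X = dummy_code(rows, levels)
print(len(names))  # 16 (3 + 5 + 3 + 5 indicator columns)
print(X[0])        # 1s only in the "Z1=6" and "Y2=3" columns
```

With 4 + 6 + 4 + 6 levels this yields 16 indicator columns, so a regression model over dummy-coded data lists coefficients per indicator column rather than per original attribute.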

I've also used the Weka library's linear regression, without dummy coding first, and got the same result.

Any help? Can you show me how to set this up? Are there other algorithms for this problem (the label is a number, but the attributes are polynominal)?

I hope I'm making my problem clear.

Thank you

Moderator, Employee, Member Posts: 114 RM Data Scientist
Hi denmla.

The reason for this problem is the built-in feature selection of the linear regression methods. By default, M5 prime is used in both cases (RM and Weka). Simply turn it off (RM: feature selection = none; Weka: S = 1.0) and you should get a model that refers to more than two attributes.
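To illustrate how a built-in selection step can leave only a few attributes in the model, here is a simplified Python sketch of backward elimination: repeatedly drop the attribute with the smallest standardized coefficient while the Akaike information criterion (AIC) keeps improving. It follows the general idea of the M5 selection method as Weka's documentation describes it, but it is not Weka's or RapidMiner's actual code. The function names, the AIC form n·ln(SSE/n) + 2k and the toy data are assumptions made for this example.

```python
import math
from itertools import product


def fit_ols(X, y):
    """Least squares with an intercept via the normal equations.

    Assumes the columns of X are linearly independent (full rank).
    Returns [intercept, b1, b2, ...].
    """
    A = [[1.0] + list(row) for row in X]  # prepend the intercept column
    p = len(A[0])
    # Augmented matrix [A^T A | A^T y]
    M = [[sum(a[i] * a[j] for a in A) for j in range(p)]
         + [sum(a[i] * t for a, t in zip(A, y))] for i in range(p)]
    for col in range(p):  # Gauss-Jordan elimination with partial pivoting
        piv = max(range(col, p), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(p):
            if r != col:
                f = M[r][col] / M[col][col]
                M[r] = [a - f * b for a, b in zip(M[r], M[col])]
    return [M[i][p] / M[i][i] for i in range(p)]


def sse(X, y, coef):
    """Sum of squared residuals of a fitted model."""
    return sum((t - coef[0] - sum(b * x for b, x in zip(coef[1:], row))) ** 2
               for row, t in zip(X, y))


def sd(v):
    """Sample standard deviation."""
    m = sum(v) / len(v)
    return math.sqrt(sum((x - m) ** 2 for x in v) / (len(v) - 1))


def backward_select(X, y, names):
    """Drop the smallest standardized coefficient while AIC improves."""
    n = len(y)

    def sub(cols):
        return [[row[c] for c in cols] for row in X]

    def aic(cols, coef):  # k counts the intercept too
        return n * math.log(sse(sub(cols), y, coef) / n) + 2 * (len(cols) + 1)

    keep = list(range(len(names)))
    coef = fit_ols(sub(keep), y)
    best = aic(keep, coef)
    while len(keep) > 1:
        std = [abs(coef[i + 1]) * sd([row[c] for row in X]) / sd(y)
               for i, c in enumerate(keep)]
        drop = keep[std.index(min(std))]
        trial = [c for c in keep if c != drop]
        trial_coef = fit_ols(sub(trial), y)
        trial_aic = aic(trial, trial_coef)
        if trial_aic >= best:  # removing it does not improve AIC: stop
            break
        keep, coef, best = trial, trial_coef, trial_aic
    return [names[c] for c in keep], coef


# Toy data: x1 and x2 influence y, x3 has no effect
X = [list(r) for r in product([-1, 1], repeat=3)]
y = [3 + 2 * a - 1.5 * b + 0.1 * a * b for a, b, _ in X]
print(fit_ols(X, y))                                 # no selection: x1, x2, x3 all kept
print(backward_select(X, y, ["x1", "x2", "x3"])[0])  # ['x1', 'x2']
```

Calling `fit_ols` directly corresponds to "feature selection = none": every attribute keeps a coefficient, even one (here `x3`) that is close to zero.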

Greetings,
Helge
Member Posts: 3 Contributor I
Hi Helge,

Sorry for the late response; I had some exams and wasn't near my computer. Your advice helped: RapidMiner managed to output all variables. I have some questions about the output, since I'm not sure whether RapidMiner uses binary dummy coding. I want to calculate residuals. I'll post the results and explain the question better later, when I get to my desktop PC.
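A residual is the observed label minus the model's prediction. With dummy coding, the prediction for a row is the intercept plus the coefficients of the indicator columns that are 1 for that row; reference levels contribute nothing. A minimal Python sketch, using hypothetical coefficient values and column names of the form `Y1=2` (not real output from the model discussed here):

```python
def predict(intercept, coefs, row):
    """Intercept plus the coefficient of every dummy column that is 1 for this row.

    coefs maps dummy column names such as "Y1=2" to coefficients (hypothetical).
    Reference levels have no column, so they add nothing.
    """
    total = intercept
    for column, weight in coefs.items():
        attr, code = column.split("=")
        if row[attr] == int(code):
            total += weight
    return total


def residuals(intercept, coefs, rows, labels):
    """Residual = observed label minus predicted label."""
    return [x - predict(intercept, coefs, row) for row, x in zip(rows, labels)]


# Hypothetical model and two students
intercept = 3.0
coefs = {"Y1=2": 0.25, "Z1=4": -0.5}
rows = [{"Y1": 1, "Z1": 1}, {"Y1": 2, "Z1": 4}]
grades = [3.1, 2.7]
res = residuals(intercept, coefs, rows, grades)
print([round(r, 2) for r in res])  # [0.1, -0.05]
```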