# (Poly-)nominal values in Linear Regression?

Member Posts: 45 Guru
edited November 2019 in Help
Good evening everyone,

Let's imagine I have a dataset consisting of these columns:

UnitPrice | Price_Last_week | UnitsSold (label) | Product_Code | Brand | Dummy_Promoted?| Store_Number

I used the Nominal to Numerical operator to transform the Brand values, because I want to apply a Linear Regression.

Now some questions:

• Is that really working?
• Is that the right approach to examine the influence of the Brand on UnitsSold?
• How does Linear Regression handle dummy values, for example 0 = no promotion, 1 = promotion?
• Are Product Codes a problem in this case? They are numerical in form, but not meant for calculations.

I wish you a nice evening

Best regards

Moderator, RapidMiner Certified Analyst, Member, University Professor Posts: 568 Unicorn
Solution Accepted
Hello, @MPB_

Let's see, I'll try something here.

`Unit Price | Last Week Price | Units Sold | Code | Brand | Dummy Promoted | Store No.`

I am a little OCD, so I'll reorder the columns and list their roles and data types underneath (reg. stands for regular and pnom. stands for polynominal):

| | Code | Brand | Dummy Promoted | Store No. | Unit Price | Last Week Price | Units Sold |
|---|---|---|---|---|---|---|---|
| Role | id | reg. | reg. | reg. | reg. | reg. | label |
| Type | pnom. | pnom. | binominal | pnom. | numerical | numerical | numerical |

Whether your process works properly depends on how you set up the Nominal to Numerical operator. If you used dummy coding or effect coding, then good. If you used unique integers, then it's not that good, because you will have a nominal value that acts as a numerical one: the regression would treat the arbitrary order and spacing of the brand integers as meaningful. Let's assume you are doing it right:

• Is that really working? Probably yes.
• Is that the right approach to examine the influence of the Brand on UnitsSold? Yes. But please consider setting the roles correctly, such as using the id role where it belongs.
• How does Linear Regression handle dummy values? It depends on how you configure your Linear Regression, for example whether you eliminate collinear features, among other factors. I hope this was your question, but I can't give you a more precise answer off the top of my head.
• Are Product Codes a problem in this case? Yes. You should mark them as nominal even if they are numbers, and give them the id role.
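
To make the coding question a bit more concrete, here is a minimal sketch in plain Python (not RapidMiner). It assumes made-up toy data, and the helper names `dummy_code` and `simple_ols` are hypothetical, introduced only for illustration. It shows dummy coding with a dropped reference level, contrasts it with unique-integer coding, and fits a one-regressor least-squares line on the 0/1 promotion flag. With a single 0/1 regressor, the intercept is the mean of the non-promoted rows and the slope is the difference between the two group means.

```python
from statistics import mean


def dummy_code(values, name, drop_first=True):
    """Turn a nominal column into 0/1 indicator columns, one per level.

    With drop_first=True the first level (in sorted order) becomes the
    reference level and gets no column, which avoids perfect collinearity
    with the intercept.
    """
    levels = sorted(set(values))
    kept = levels[1:] if drop_first else levels
    return {f"{name}={lvl}": [1 if v == lvl else 0 for v in values] for lvl in kept}


def simple_ols(x, y):
    """Least-squares fit of y = intercept + slope * x for one regressor."""
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


# Made-up toy data, purely for illustration.
brand = ["A", "B", "C", "A", "B", "C"]
promoted = [0, 0, 0, 1, 1, 1]
units_sold = [10, 20, 12, 16, 26, 18]

# Dummy coding: brand A is the reference level, B and C get indicator columns.
brand_dummies = dummy_code(brand, "Brand")

# Unique-integer coding, by contrast, imposes an order and equal spacing
# (C - B == B - A) that the brands do not actually have.
brand_as_int = [{"A": 0, "B": 1, "C": 2}[b] for b in brand]

# Regress units sold on the promotion dummy alone.
intercept, slope = simple_ols(promoted, units_sold)
print(brand_dummies)
print(intercept, slope)
```

In a full model with several regressors the same reading carries over in spirit: each dummy coefficient is the estimated shift relative to the reference level, holding the other attributes fixed.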
Hope this helps,

Rodrigo.