"Preprocessing Data for Decision Tree (Weights)"

mmaelzer · November 2009

Hi,

I have a special problem because of the characteristics of my data. The attributes are:

- ID (declared as ID)
- contact (nominal and declared as regular)
- product (nominal and declared as regular)
- execution (nominal and declared as label)
- quantity (numerical and declared as weight)

The data covers all possible combinations of contact, product and execution. If a combination doesn't exist, its quantity is zero; a quantity of 300 means the case occurred 300 times (in reality, but only once in the data sheet). Because of this, building a decision tree or some rules doesn't lead to the desired results. I tried declaring the quantity attribute as weight, but that doesn't seem to be the right way. Can someone tell me how to weight the data correctly?

Thanks a lot!

land · November 2009

Hi,
I would have suggested declaring the quantity as weight. This should work with learners that support weights. What went wrong?
By the way:
I would filter out all examples with quantity = 0 using the example filter operator. That would at least make things faster.
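
To illustrate the filtering idea outside RapidMiner, here is a minimal sketch in plain Python. It is not the example filter operator itself; the rows, the helper names `drop_zero_quantity` and `weighted_label_counts`, and the zero-quantity values are assumptions made up for illustration. It only shows that rows with quantity 0 add nothing to weighted counts, so dropping them should not change what a weight-aware learner sees.

```python
# Toy rows in the thread's layout: (contact, product, execution, quantity).
# All values are made up for illustration.
rows = [
    ("c1", "p1", "e1", 2),
    ("c1", "p1", "e2", 500),
    ("c1", "p2", "e1", 0),
    ("c1", "p2", "e2", 0),
]


def drop_zero_quantity(rows):
    """Keep only the combinations that were actually observed."""
    return [row for row in rows if row[3] > 0]


def weighted_label_counts(rows):
    """Sum quantity per execution label, treating quantity as the weight."""
    counts = {}
    for _contact, _product, execution, quantity in rows:
        counts[execution] = counts.get(execution, 0) + quantity
    return counts


filtered = drop_zero_quantity(rows)

# The weighted label totals are identical with and without the zero rows.
print(weighted_label_counts(rows))
print(weighted_label_counts(filtered))
```

An unweighted learner, by contrast, would count each zero-quantity row as one full example, which is one reason to remove them.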

Greetings,
Sebastian

mmaelzer · November 2009

Hi Sebastian,

Filtering out examples with quantity 0 reduces the classification error (to 75%). When I don't filter out these examples, the classification error is 99%. Because of this I thought that the weights are not being used or declared correctly.
At first I used X-Validation; as I understand it, this splits the dataset into two or more disjoint datasets (problematic because every case appears just once). Classification error: 89% with filter / 99% without filter.
Then I tried splitting the data manually into two datasets (month1, month2), each covering all cases. I used month1 as the training set for the learner and month2 as the test set, applying the model to it. Classification error: 75% with filter / 99% without filter.
The tree doesn't represent the data. For example:

contact - product - execution - quantity
c1 - p1 - e1 - 2
c1 - p1 - e2 - 500

leads to this path in the tree: c1 -> p1 -> e1
It seems like the learner takes the first combination and ignores the weights.
I tried it with decision tree and CHAID.
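
For comparison, here is a small plain-Python sketch of what I would expect for the two rows above. It is not RapidMiner code, and the function names are hypothetical. An unweighted vote sees one row per label, so the tie goes to whichever label comes first, while a vote weighted by quantity picks e2. That the learner behaves like the unweighted version is only my guess, not something I have confirmed.

```python
from collections import defaultdict

# The two rows from the example: (contact, product, execution, quantity).
rows = [
    ("c1", "p1", "e1", 2),
    ("c1", "p1", "e2", 500),
]


def unweighted_majority_label(rows):
    """Count each row once; on a tie, max() keeps the first label seen."""
    counts = defaultdict(int)
    for _contact, _product, execution, _quantity in rows:
        counts[execution] += 1
    return max(counts, key=counts.get)


def weighted_majority_label(rows):
    """Count each row 'quantity' times, i.e. use quantity as the weight."""
    totals = defaultdict(int)
    for _contact, _product, execution, quantity in rows:
        totals[execution] += quantity
    return max(totals, key=totals.get)


print(unweighted_majority_label(rows))  # e1
print(weighted_majority_label(rows))    # e2
```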

Regards,

M. Mälzer

land · November 2009

Hi,
Sorry, but I don't see any need for classification here. If you have every combination of the nominal attributes and each combination is assigned a label, where is the need for learning? It seems to me that the list of combinations with labels is already a perfect classifier.
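
A rough sketch of that idea in plain Python, under some assumptions: the label for a (contact, product) pair is taken to be the execution with the largest quantity, the month1/month2 values are invented toy data, and `build_lookup` and `weighted_error` are hypothetical helper names, not RapidMiner operators.

```python
def build_lookup(rows):
    """Map each (contact, product) to the execution with the largest quantity."""
    best = {}  # (contact, product) -> (execution, quantity)
    for contact, product, execution, quantity in rows:
        key = (contact, product)
        if quantity > 0 and (key not in best or quantity > best[key][1]):
            best[key] = (execution, quantity)
    return {key: execution for key, (execution, _quantity) in best.items()}


def weighted_error(lookup, rows):
    """Share of the total quantity whose execution differs from the lookup."""
    total = sum(quantity for _c, _p, _e, quantity in rows)
    wrong = sum(
        quantity
        for contact, product, execution, quantity in rows
        if lookup.get((contact, product)) != execution
    )
    return wrong / total if total else 0.0


# Invented toy data in the thread's layout, one list per month.
month1 = [
    ("c1", "p1", "e1", 2),
    ("c1", "p1", "e2", 500),
    ("c1", "p2", "e1", 40),
    ("c1", "p2", "e2", 0),
]
month2 = [
    ("c1", "p1", "e1", 10),
    ("c1", "p1", "e2", 490),
    ("c1", "p2", "e1", 30),
    ("c1", "p2", "e2", 20),
]

lookup = build_lookup(month1)
error = weighted_error(lookup, month2)
print(lookup)
print(error)
```

Here the error is weighted by quantity, so a row that occurred 490 times counts 490 times and a zero-quantity row does not count at all.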

Greetings,
Sebastian
