# Problem with too many parameters to put as columns into an example set

My task: I have customers, each with a unique ID and a set of binominal parameters, and I would like to predict the value of certain target variables (so far only one, but possibly several).

In my test case I used the following input dataset (see the meta data): each customer is represented by one row and the parameters are in the columns, i.e. the usual layout.

__meta data:__

| Role | Name | Type |
|---|---|---|
| id | Customer_Id | integer |
| label | Target | binominal |
| regular | Para1 | binominal |
| regular | Para2 | binominal |
| regular | Para3 | binominal |
| regular | Para4 | binominal |

__dataset:__

| Customer_Id | Target | Para1 | Para2 | Para3 | Para4 |
|---|---|---|---|---|---|
| 1 | M | 1 | 0 | 1 | 0 |
| 2 | V | 1 | 0 | 0 | 1 |
| 3 | M | 0 | 1 | 1 | 1 |

**=> With Naïve Bayes I get great prediction results in the test case with limited dimensions.**

**Problem with the actual dataset:** I have several hundred thousand parameters, and the number is growing fast. The number of active parameters per customer is very small, so the table would be extremely large and sparse. My idea was therefore to use the following dataset format as input:

__meta data:__

| Role | Name | Type |
|---|---|---|
| id | Customer_Id | integer |
| label | Target | binominal |
| regular | ActivePara | polynominal |

__data:__

| Customer_Id | Target | ActivePara |
|---|---|---|
| 1 | M | Para1 |
| 1 | M | Para3 |
| 2 | V | Para1 |
| 2 | V | Para4 |
| 3 | M | Para2 |
| 3 | M | Para3 |
| 3 | M | Para4 |
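To make the relation between the two layouts concrete, here is a minimal sketch in plain Python (not a RapidMiner process). The names `long_rows`, `group_by_customer` and `to_wide` are made up for illustration, and the sketch assumes the label is identical on every row of a customer. It collapses the long rows into one record per customer (label plus the set of active parameters) and, for the small test case only, expands that back into the 0/1 columns:

```python
from collections import defaultdict

# Long-format rows from the table above: (Customer_Id, Target, ActivePara)
long_rows = [
    (1, "M", "Para1"), (1, "M", "Para3"),
    (2, "V", "Para1"), (2, "V", "Para4"),
    (3, "M", "Para2"), (3, "M", "Para3"), (3, "M", "Para4"),
]

def group_by_customer(rows):
    """Collapse long rows into {customer_id: (target, set of active parameters)}."""
    targets = {}
    active = defaultdict(set)
    for customer_id, target, para in rows:
        # Assumption: all rows of one customer carry the same label.
        if targets.setdefault(customer_id, target) != target:
            raise ValueError(f"conflicting labels for customer {customer_id}")
        active[customer_id].add(para)
    return {cid: (targets[cid], active[cid]) for cid in targets}

def to_wide(records, columns):
    """Expand to dense 0/1 rows; only practical when there are few columns."""
    return {
        cid: [1 if col in paras else 0 for col in columns]
        for cid, (_target, paras) in records.items()
    }

records = group_by_customer(long_rows)
wide = to_wide(records, ["Para1", "Para2", "Para3", "Para4"])
print(wide)
```

The per-customer sets hold only the active parameters, so their size grows with the number of active entries rather than with the total number of parameters.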

BUT now I do not get consistent predictions per customer. What I get is something like this:

| Customer_Id | Target | ActivePara | Prediction of Target |
|---|---|---|---|
| 1 | M | Para1 | **V** |
| 1 | M | Para3 | M |
| 2 | V | Para1 | V |
| 2 | V | Para4 | V |
| 3 | M | Para2 | M |
| 3 | M | Para3 | M |
| 3 | M | Para4 | **V** |

**But I want/need the target prediction per customer_id to be consistent.** How do I need to set up the input data or the model to get this result?
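To pin down what I mean by "not consistent", here is a small plain-Python check (again only a sketch, not part of my RapidMiner process; `scored_rows` and `inconsistent_customers` are hypothetical names). It takes the scored rows from the table above and reports every customer that received more than one distinct prediction:

```python
from collections import defaultdict

# Scored rows from the table above: (Customer_Id, ActivePara, prediction)
scored_rows = [
    (1, "Para1", "V"), (1, "Para3", "M"),
    (2, "Para1", "V"), (2, "Para4", "V"),
    (3, "Para2", "M"), (3, "Para3", "M"), (3, "Para4", "V"),
]

def inconsistent_customers(rows):
    """Return {customer_id: sorted distinct predictions} for customers with more than one."""
    predictions = defaultdict(set)
    for customer_id, _para, prediction in rows:
        predictions[customer_id].add(prediction)
    return {cid: sorted(p) for cid, p in predictions.items() if len(p) > 1}

conflicts = inconsistent_customers(scored_rows)
print(conflicts)  # {1: ['M', 'V'], 3: ['M', 'V']}
```

Customers 1 and 3 are flagged because the model scores every row as if it were an independent example.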

**Thanks a lot in advance for any hints and help!!!**