Problem with too many parameters to put as columns into an example set
My task: I have customers with a unique ID, each customer has a number of parameters (binominal), and I would like to predict the value of certain target variables. So far there is only one target, but there may be several later. In my test case I used the following input dataset (see the meta data). Each customer is represented by one row and the parameters are in the columns, simply the usual way.

Meta data:

Role     Name         Type
id       Customer_Id  integer
label    Target       binominal
regular  Para1        binominal
regular  Para2        binominal
regular  Para3        binominal
regular  Para4        binominal

Dataset:

Customer_Id  Target  Para1  Para2  Para3  Para4
1            M       1      0      1      0
2            V       1      0      0      1
3            M       0      1      1      1
=> With Naïve Bayes I get very good prediction results in this test case with its limited number of dimensions.
Problem with the actual dataset: I have several hundred thousand parameters, and the number is growing quickly. The number of parameters that are actually active for any one customer is very small, so the table would be extremely large and sparse. My idea was therefore to use the following dataset format as input, with one row per customer and active parameter:

Meta data:

Role     Name         Type
id       Customer_Id  integer
label    Target       binominal
regular  ActivePara   polynominal

Data:

Customer_Id  Target  ActivePara
1            M       Para1
1            M       Para3
2            V       Para1
2            V       Para4
3            M       Para2
3            M       Para3
3            M       Para4
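To make the relation between the two formats concrete, here is a minimal sketch in plain Python, outside RapidMiner. It groups the long-format rows into one set of active parameters per customer (which stays small even with many parameters) and, for the small test case only, expands that back into the wide 0/1 table. The function names `group_by_customer` and `to_wide` are made up for illustration and are not part of any tool.

```python
from collections import defaultdict

# Long-format rows from the example: (Customer_Id, Target, ActivePara)
long_rows = [
    (1, "M", "Para1"), (1, "M", "Para3"),
    (2, "V", "Para1"), (2, "V", "Para4"),
    (3, "M", "Para2"), (3, "M", "Para3"), (3, "M", "Para4"),
]


def group_by_customer(rows):
    """Collect one target and one set of active parameters per customer."""
    targets = {}
    active = defaultdict(set)
    for customer_id, target, para in rows:
        targets[customer_id] = target
        active[customer_id].add(para)
    return targets, dict(active)


def to_wide(targets, active, columns):
    """Expand to one 0/1 row per customer (only sensible for few columns)."""
    return [
        (cid, targets[cid]) + tuple(1 if c in active[cid] else 0 for c in columns)
        for cid in sorted(targets)
    ]


targets, active = group_by_customer(long_rows)
wide = to_wide(targets, active, ["Para1", "Para2", "Para3", "Para4"])
for row in wide:
    print(row)
```

The per-customer sets carry the same information as the wide table, so each customer is still a single example; in the long table, by contrast, every row is treated as its own example.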
BUT now I do not get consistent predictions per customer. What I get is something like this:

Customer_Id  Target  ActivePara  Prediction of Target
1            M       Para1       V
1            M       Para3       M
2            V       Para1       V
2            V       Para4       V
3            M       Para2       M
3            M       Para3       M
3            M       Para4       V
But I need the target prediction to be consistent per Customer_Id, i.e. exactly one predicted value for each customer.
How do I need to set up the input data or the model to get this result?
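As a fallback, the row-level predictions could be collapsed into one value per customer after scoring. The following is only a sketch of that idea in plain Python, using a simple majority vote over the example output above. The function name `vote_per_customer` is hypothetical, and the alphabetical tie-break is an arbitrary assumption, not a recommendation.

```python
from collections import Counter, defaultdict

# Row-level predictions from the example output: (Customer_Id, predicted Target)
row_predictions = [
    (1, "V"), (1, "M"),
    (2, "V"), (2, "V"),
    (3, "M"), (3, "M"), (3, "V"),
]


def vote_per_customer(rows):
    """Return one majority-vote label per customer plus the tied customers."""
    votes = defaultdict(Counter)
    for customer_id, predicted in rows:
        votes[customer_id][predicted] += 1

    result, ties = {}, []
    for customer_id, counts in votes.items():
        best = max(counts.values())
        winners = sorted(label for label, n in counts.items() if n == best)
        if len(winners) > 1:
            ties.append(customer_id)  # vote is undecided for this customer
        result[customer_id] = winners[0]  # assumption: alphabetical tie-break
    return result, ties


per_customer, tied_customers = vote_per_customer(row_predictions)
print(per_customer)
print(tied_customers)
```

With the example data, customer 1 ends in a tie (one V, one M), which shows the weakness of voting when a customer has only a few active parameters. It also ignores how confident each row-level prediction was.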