Problem with too many parameter to put as columns ...

07-25-2013 01:07 PM


My task is the following: I have customers, each with a unique ID and a set of binominal parameters, and I would like to predict the value of certain target variables. So far there is only one target, but there may be several later.

In my test case I used the following input dataset (see the meta data). Each customer is represented by one row and the parameters are in the columns, simply the usual way.

__meta data:__

| Role | Name | Type |
| --- | --- | --- |
| id | Customer_Id | integer |
| label | Target | binominal |
| regular | Para1 | binominal |
| regular | Para2 | binominal |
| regular | Para3 | binominal |
| regular | Para4 | binominal |

__dataset:__

| Customer_Id | Target | Para1 | Para2 | Para3 | Para4 |
| --- | --- | --- | --- | --- | --- |
| 1 | M | 1 | 0 | 1 | 0 |
| 2 | V | 1 | 0 | 0 | 1 |
| 3 | M | 0 | 1 | 1 | 1 |

**=> With Naïve Bayes I get great prediction results in the test case with limited dimensions.**
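To make the test case concrete, here is a minimal sketch of a Bernoulli Naive Bayes classifier on the three rows above, written in plain Python rather than as a RapidMiner process. The function names `train` and `predict` are made up for this illustration, and the Laplace smoothing is an assumption. It is not necessarily how the Naive Bayes operator computes its estimates.

```python
from collections import Counter, defaultdict

# Toy data from the post: (Customer_Id, Target, [Para1, Para2, Para3, Para4])
ROWS = [
    (1, "M", [1, 0, 1, 0]),
    (2, "V", [1, 0, 0, 1]),
    (3, "M", [0, 1, 1, 1]),
]


def train(rows):
    """Estimate class priors and per-column P(value = 1 | class)."""
    n_cols = len(rows[0][2])
    class_counts = Counter(target for _, target, _ in rows)
    ones = defaultdict(lambda: [0] * n_cols)
    for _, target, values in rows:
        for j, v in enumerate(values):
            ones[target][j] += v
    priors = {c: n / len(rows) for c, n in class_counts.items()}
    # Laplace smoothing (+1 / +2) is an assumption, used to avoid zero probabilities.
    probs = {
        c: [(ones[c][j] + 1) / (class_counts[c] + 2) for j in range(n_cols)]
        for c in class_counts
    }
    return priors, probs


def predict(model, values):
    """Return the class with the highest unnormalized posterior score."""
    priors, probs = model
    scores = {}
    for c, prior in priors.items():
        score = prior
        for p, v in zip(probs[c], values):
            score *= p if v == 1 else 1 - p
        scores[c] = score
    return max(scores, key=scores.get)


model = train(ROWS)
predictions = {cid: predict(model, values) for cid, _, values in ROWS}
```

The predictions here are made on the same three rows that were used for training, so they only show that the wide layout gives one prediction per customer, not how well the model generalizes.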

**Problem with the actual dataset:**

I have several hundred thousand parameters and the number is growing quickly. The number of active parameters per customer is very small, so the table would be extremely large and sparse. My idea was therefore to use the following dataset format as input:

__meta data:__

| Role | Name | Type |
| --- | --- | --- |
| id | Customer_Id | integer |
| label | Target | binominal |
| regular | ActivePara | polynominal |

__data:__

| Customer_Id | Target | ActivePara |
| --- | --- | --- |
| 1 | M | Para1 |
| 1 | M | Para3 |
| 2 | V | Para1 |
| 2 | V | Para4 |
| 3 | M | Para2 |
| 3 | M | Para3 |
| 3 | M | Para4 |
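The relation between the two layouts can be sketched in a few lines of Python. This is only an illustration of the reshaping, not a RapidMiner process, and `to_long` is a hypothetical helper name: every cell with value 1 in the wide table becomes one row in the long table.

```python
PARAM_NAMES = ["Para1", "Para2", "Para3", "Para4"]

# Wide layout from the test case: (Customer_Id, Target, parameter values)
WIDE_ROWS = [
    (1, "M", [1, 0, 1, 0]),
    (2, "V", [1, 0, 0, 1]),
    (3, "M", [0, 1, 1, 1]),
]


def to_long(wide_rows, param_names):
    """Emit one (Customer_Id, Target, ActivePara) row per active parameter."""
    long_rows = []
    for customer_id, target, values in wide_rows:
        for name, value in zip(param_names, values):
            if value == 1:
                long_rows.append((customer_id, target, name))
    return long_rows


long_rows = to_long(WIDE_ROWS, PARAM_NAMES)
```

The long layout only stores the active parameters, so its size grows with the number of active parameters per customer and not with the total number of parameters.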

BUT now I do not get consistent predictions per customer. What I get is something like this:

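| Customer_Id | Target | ActivePara | Prediction of Target |
| --- | --- | --- | --- |
| 1 | M | Para1 | **V** |
| 1 | M | Para3 | M |
| 2 | V | Para1 | V |
| 2 | V | Para4 | V |
| 3 | M | Para2 | M |
| 3 | M | Para3 | M |
| 3 | M | Para4 | **V** |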
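The inconsistency can be made explicit with a small check, sketched here in Python with a hypothetical helper `inconsistent_customers`. Presumably it arises because each row is scored as a separate example that contains only a single active parameter, so two rows of the same customer can end up with different predictions.

```python
from collections import defaultdict

# (Customer_Id, ActivePara, predicted Target) as in the table above
PREDICTED_ROWS = [
    (1, "Para1", "V"),
    (1, "Para3", "M"),
    (2, "Para1", "V"),
    (2, "Para4", "V"),
    (3, "Para2", "M"),
    (3, "Para3", "M"),
    (3, "Para4", "V"),
]


def inconsistent_customers(rows):
    """Return the sorted ids of customers with more than one distinct prediction."""
    seen = defaultdict(set)
    for customer_id, _, prediction in rows:
        seen[customer_id].add(prediction)
    return sorted(cid for cid, preds in seen.items() if len(preds) > 1)


inconsistent = inconsistent_customers(PREDICTED_ROWS)
```

For the rows above, customers 1 and 3 are flagged, which matches the two highlighted predictions.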

**But I want/need the target prediction per customer_id to be consistent.**

How do I need to set up the input data or the model to get this result?
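One way to read this requirement is that the learner should again see exactly one example per customer, only stored sparsely. The following Python sketch shows that regrouping. It is an assumption about what a suitable input could look like, not a confirmed RapidMiner setup, and `group_by_customer` is a hypothetical name.

```python
from collections import defaultdict

# Long layout: (Customer_Id, Target, ActivePara)
LONG_ROWS = [
    (1, "M", "Para1"),
    (1, "M", "Para3"),
    (2, "V", "Para1"),
    (2, "V", "Para4"),
    (3, "M", "Para2"),
    (3, "M", "Para3"),
    (3, "M", "Para4"),
]


def group_by_customer(long_rows):
    """Collapse the long rows into one (Target, set of active parameters) per customer."""
    targets = {}
    active = defaultdict(set)
    for customer_id, target, para in long_rows:
        targets[customer_id] = target
        active[customer_id].add(para)
    return {cid: (targets[cid], active[cid]) for cid in targets}


examples = group_by_customer(LONG_ROWS)
```

With one record per customer there is only one prediction per `Customer_Id`, and the memory needed still depends on the number of active parameters rather than on the full number of columns.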

**Thanks a lot in advance for any hints and help!!!**
