# How do I split up scored data into 20 equally sized segments?

Member Posts: 3 Learner I
edited February 2020 in Help

Hi there, I'm still only a few days into using RapidMiner and wasn't sure if or how I could go about doing the following:

I created a logistic regression model for direct mail marketing and applied it to score new data. What I want to do now is split the scored data into 20 groups by descending confidence(responder) value, so that group A holds the 1/20th most likely responders, group B the next most likely 1/20th, and so on.

-Simon

• Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 123 RM Data Scientist
Hi @simon_philipose,

You can first use the Sort operator to sort the confidence values in descending order, followed by the Split Data operator.
In the Split Data operator's Parameters panel, add partitions with a ratio of 0.05 (1/20) each.

Hope this helps.

Cheers,
Pavithra
• Member Posts: 3 Learner I

Hi Pavithra,

Thank you for your response. So I ran into a few problems with using the Split Data operator.

1. It splits the dataset into multiple datasets. What I need is one data set but with a field called Model_Group with a value of A, B, C, D, etc. depending on the confidence values.

2. It appears the maximum number of data sets I can split into is 8, by entering 0.125 in the partitions ratio field 8 times. I can't do 10, much less 20 different splits.

• Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, University Professor Posts: 3,525 RM Data Scientist
Hi,
I would do the following:

Sort - by confidence
Generate ID - to get an index
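Generate Attributes - derive your Model_Group from the id, e.g. something like floor((id-1)*20/N), where N is the number of examples and ids are assumed to start at 1, so that each group is a contiguous 1/20 slice (id%10 would cycle through 10 groups row by row instead)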

Best,
Martin
- Sr. Director Data Solutions, Altair RapidMiner -
Dortmund, Germany
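
For readers who want to see the logic of the Sort / Generate ID / Generate Attributes recipe outside RapidMiner, here is a minimal plain-Python sketch. It is only an illustration under assumptions: the function name `assign_model_groups`, the `confidence` key and the sample data are hypothetical, and integer division of the row index is used so that each group is a contiguous slice of the ranking (a modulo on the index would cycle through the groups row by row).

```python
from string import ascii_uppercase

def assign_model_groups(rows, n_groups=20, score_key="confidence"):
    """Return rows sorted by descending score, each tagged with a Model_Group letter.

    Hypothetical helper; supports up to 26 groups because it labels them A..Z.
    """
    # Step 1: sort by confidence, highest first
    ranked = sorted(rows, key=lambda r: r[score_key], reverse=True)
    n = len(ranked)
    labelled = []
    # Step 2: the enumerate index plays the role of the generated id (0-based here)
    for idx, row in enumerate(ranked):
        # Step 3: integer division maps the index to contiguous buckets 0..n_groups-1
        bucket = idx * n_groups // n
        labelled.append({**row, "Model_Group": ascii_uppercase[bucket]})
    return labelled

# Made-up scored data: 40 rows, so each of the 20 groups receives 2 rows
scored = [{"customer": i, "confidence": i / 40} for i in range(40)]
grouped = assign_model_groups(scored)
```

The result stays a single data set with one extra field, which is what the question asked for; when the row count is not a multiple of 20, group sizes differ by at most one row.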
• Member Posts: 3 Learner I
Thank you so much @rfuentealba -- your solution worked perfectly! Very much appreciated!!
• RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 1,635 Unicorn
Wow, so many ways to do this in RapidMiner!
If you copy your score attribute first, Discretize by Frequency should be able to do this directly: select the score attribute and set the number of bins to 20. This will create exactly the bins you are looking for, although a large number of ties can sometimes cause problems for the Discretize operators. (The reason to copy the score first is that Discretize replaces the selected attribute with a new one, so if you still want the raw score you need two copies of it, one binned and one not.)
Brian T.
Lindon Ventures
Data Science Consulting from Certified RapidMiner Experts
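
To make the point about ties concrete, here is a rough plain-Python analogue of equal-frequency binning. It is not RapidMiner's implementation: the function name `frequency_bins` and the sample scores are hypothetical, and the cut points come from Python's `statistics.quantiles`. Bin 0 holds the lowest scores in this sketch, and the raw scores are left untouched because the bins are returned as a separate list.

```python
from bisect import bisect_left
from collections import Counter
from statistics import quantiles  # available in Python 3.8+

def frequency_bins(scores, n_bins=20):
    """Assign each score a bin number 0..n_bins-1 using quantile cut points."""
    cuts = quantiles(scores, n=n_bins)             # n_bins - 1 cut points
    # A score's bin is the number of cut points strictly below it
    return [bisect_left(cuts, s) for s in scores]

# Made-up scores: one set with all values distinct, one with heavy ties
distinct_scores = [i / 100 for i in range(100)]
tied_scores = [0.5] * 60 + [i / 100 for i in range(40)]

distinct_bins = frequency_bins(distinct_scores)
tied_bins = frequency_bins(tied_scores)

print(Counter(distinct_bins))  # bin sizes when every score is distinct
print(Counter(tied_bins))      # bin sizes when 60 scores share one value
```

Because identical scores cannot be separated by a cut point, the 60 tied values necessarily share one bin, so the bins in the second case cannot all be the same size.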