
# Comparing the profitability of different rules?

Member Posts: 7 Contributor II
edited November 2018 in Help
Hi Everybody,

First post but I've done numerous searches and have tried to sort this problem out before asking.
I hope one of you kind people can help me.

I have market data that's separated into various columns: age, ethnicity, postcode, etc.
I have a column to show whether a purchase was made (1 or 0).
I also have a column to show the profit on each transaction.

So far I have managed to find a good model that predicts who is most likely to purchase an item.
What I would like to do is take this one step further and find the best fit for profitability.

Currently I could have a model that finds 60% of the people who purchase, but they generally buy low-profit items.

I would rather have a model that finds 40% of the people who purchase, if those people buy items with twice the profit of the above example.

Is there an easy way to check all the combinations to produce the most profitable model?

Thanks in advance and sorry if this has been asked elsewhere before, I'm sure it's a popular request but I couldn't find anything in my searches.

RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 2,531 Unicorn
Hi,
if I understood you correctly, you have some threshold where a profit is high and where it is not. You are certainly using some parameter value for distinguishing high and low. So you could optimize this parameter value using one of the optimization operators and use a Performance (Costs) operator to calculate the real costs/benefits of each outcome. Was this helpful?

Another possibility might be to use the Find Threshold operator.
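
To illustrate the idea of tuning a threshold against costs and benefits rather than accuracy, here is a minimal sketch in plain Python. It is not RapidMiner code. The scores, profits and the `COST_PER_CALL` value are all made-up assumptions, chosen only to show the mechanics.

```python
# Hypothetical validation data (not from this thread): the model's confidence
# that a row is a sale, and the profit actually realised on that row
# (0 when no sale happened).
scores = [0.95, 0.80, 0.70, 0.55, 0.40, 0.30]
profits = [5, 0, 20, 0, 10, 0]

# Assumed cost of contacting one customer. Without some cost, calling
# everybody would always look best, because profit is never negative.
COST_PER_CALL = 4


def net_profit(threshold):
    """Profit minus contact costs when calling every row scoring >= threshold."""
    return sum(p - COST_PER_CALL for s, p in zip(scores, profits) if s >= threshold)


# Try each observed score as a candidate threshold and keep the most profitable.
best_threshold = max(sorted(set(scores)), key=net_profit)
print(best_threshold, net_profit(best_threshold))
```

With these invented numbers the most accurate cut-off and the most profitable cut-off are not the same, which is the point of optimizing on a cost-based performance measure.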

Greetings,
Sebastian
Member Posts: 7 Contributor II
Hi Sebastian,

Thanks for replying, I was concerned that I hadn't worded my question very well.

I've added an example for visualisation.

Imagine two columns of data in Excel. The first is a Sale column filled with just a 1 or a 0:
1 signifies a sale, 0 means no sale.

The next column called Profit shows the profit made on that sale. ;D

Row #   Sale   Profit
1       1      15
2       0      0
3       0      0
4       1      5
5       1      5
6       0      0
7       1      10
8       0      0
9       0      0
10      1      20
11      0      0
12      0      0

Using historical data as explained in my first post (which includes the Sale column in the example as a label), I currently use RapidMiner to fit a sale model that predicts whether an application is likely to be a 1 or a 0. The model performs well, but I now wish to take this one step further.

I would like to somehow add the profit data to the equation (it isn't included at the moment) so that the model predicts the best return of sales and profit. For example, the model at the moment (without a profit column) could pick rows 1, 3, 4, 5, 7, 8, which would return 4 sales from 6 calls and a profit of 35 points (which it doesn't know).

I would like the model to check all the sums and variables so that the best output could be rows 1, 2, 7, 8, 10, 12, which is 3 sales from 6 calls but with a profit of 45 points.
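
The arithmetic for the two selections can be checked with a few lines of Python. This is only a sketch that copies the table above into two dictionaries; the `summarize` helper is a made-up name. The last part brute-forces every possible set of six rows using the known profits, which is hindsight and only gives an upper bound on what any model could achieve.

```python
from itertools import combinations

# Sale flag and profit per row, copied from the example table.
sale = {1: 1, 2: 0, 3: 0, 4: 1, 5: 1, 6: 0, 7: 1, 8: 0, 9: 0, 10: 1, 11: 0, 12: 0}
profit = {1: 15, 2: 0, 3: 0, 4: 5, 5: 5, 6: 0, 7: 10, 8: 0, 9: 0, 10: 20, 11: 0, 12: 0}


def summarize(rows):
    """Return (number of sales, total profit) for the chosen row numbers."""
    return sum(sale[r] for r in rows), sum(profit[r] for r in rows)


current_pick = [1, 3, 4, 5, 7, 8]       # what the sale-only model might choose
preferred_pick = [1, 2, 7, 8, 10, 12]   # the more profitable selection

print(summarize(current_pick))    # (4, 35)
print(summarize(preferred_pick))  # (3, 45)

# Hindsight upper bound: the best total profit over all 924 ways of choosing 6 rows.
best_rows = max(combinations(sorted(profit), 6), key=lambda rows: summarize(rows)[1])
print(summarize(best_rows)[1])    # 55
```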

I imagine I will need some sort of model that uses two labels and maybe a "sum product" equation to check against all combinations?

I did try using a sum product feature but couldn't get it to work. Could somebody please post an XML example of any ideas they may have?

I really hope this makes sense. Thanks again in advance.

RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 2,531 Unicorn
Hi,
Might it be a solution to train two independent models? The first one predicts sale or no sale (0, 1), and the second one predicts the profit, possibly trained only on the examples that represent a sale. You can do this by switching roles using the Set Role operator.
After you have applied the two models, you might use the Generate Attribute operator to multiply the predicted sale column (0 or 1) by the predicted profit, so that you get a non-zero prediction only for the rows where a sale is predicted.
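
The multiplication step can be sketched in plain Python as follows. This is not RapidMiner XML, and every number here is an invented stand-in for the output of the two models. The second half, which multiplies by the sale probability instead of the 0/1 prediction, is a variation on the suggestion above and assumes the first model can output a confidence.

```python
# Hypothetical outputs of the two models for four new customers.
p_sale = [0.9, 0.6, 0.4, 0.2]              # model 1: confidence that a sale happens
profit_if_sale = [5.0, 10.0, 20.0, 15.0]   # model 2: predicted profit, given a sale

# As suggested above: predicted sale (0 or 1) times predicted profit.
predicted_sale = [1 if p >= 0.5 else 0 for p in p_sale]
hard_profit = [s * v for s, v in zip(predicted_sale, profit_if_sale)]

# Variation (an assumption, not from the thread): weight the profit by the
# confidence, so a likely-but-cheap sale can rank below a less likely,
# more profitable one.
expected_profit = [p * v for p, v in zip(p_sale, profit_if_sale)]

# Customer indices ordered from most to least expected profit.
ranking = sorted(range(len(expected_profit)), key=lambda i: expected_profit[i], reverse=True)
print(hard_profit)
print(ranking)  # [2, 1, 0, 3]
```

In this made-up example the customer with the lowest profit per sale is the most likely buyer, yet ranks third once profit is taken into account.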

Greetings,
Sebastian
Member Posts: 7 Contributor II
Thanks Sebastian,

Sorry for the delayed reply. I had hoped it would be possible to do this in one go.
I'll have a look at the Generate Attribute operator as suggested.
