Predict (assign) viewers to emissions

hopsey Member Posts: 1 Newbie
Hi all!

I have a data set of TV commercial emissions. Each emission has many properties, such as:
 - date & time of emission
 - GRP value
 - TV channel
 - TV show
 - commercial position (beginning, middle, or end of the block)
 - channel subject group (cooking, traveling, etc.)

Each property is important. Date and time determine whether the emission aired during prime time, at night, etc.; the GRP value indicates the reach of the emission; and so on.

On the other side, I have the count of new website visitors (based on Google Analytics), so I can clearly see how many people each emission has brought to the site and how effective it was.
The visitor data set is aggregated by minute, so I have information like
 - 2020-05-10 13:30:00  - 7 visitors
 - 2020-05-10 13:31:00  - 10 visitors
 - 2020-05-10 13:32:00  - 8 visitors
 - 2020-05-10 13:33:00  - 2 visitors

So I can estimate that this particular emission brought 27 new visitors to my website.

The problem arises when emissions interfere. When two (or more) emissions collide, all I know is that together they brought, e.g., 57 visitors.

Is it possible to estimate how many visitors came from a particular interfering emission, using information from the "clean" (non-colliding) emissions? Each emission is described by many properties. How can I achieve this with RapidMiner? I have been trying hard with the Impute Missing Values and k-NN operators, with no luck.

Any help will be appreciated!


    MartinLiebig Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, University Professor Posts: 3,524 RM Data Scientist
    Hi @hopsey ,
    That's an interesting question. One may look at this from a marketing standpoint. In marketing you invest in different kinds of advertising (e.g. TV, cinema, print, web, social) and you want to estimate how big the effect of each is on your sales.
    The common way to do this is to fit a linear model:
    Sales = a*Cinema + b*TV + ... and then look at the coefficients. An example of this is part of the GLM Contribution operator, which is part of the Operator Toolbox extension.
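
    Outside RapidMiner, the idea of fitting Sales = a*Cinema + b*TV + ... and reading off the coefficients can be sketched in plain Python. This is only an illustration under stated assumptions: the spend and sales numbers are made up, the helper names `solve` and `fit_linear` are hypothetical, and it is not how the GLM Contribution operator is implemented.

```python
def solve(A, b):
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(n):
            if r != col:
                f = M[r][col] / M[col][col]
                M[r] = [x - f * y for x, y in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]


def fit_linear(X, y):
    """Ordinary least squares via the normal equations (X^T X) w = X^T y."""
    k = len(X[0])
    XtX = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    Xty = [sum(row[i] * t for row, t in zip(X, y)) for i in range(k)]
    return solve(XtX, Xty)


# Hypothetical data: each row is [1 (intercept), cinema spend, TV spend].
X = [[1, 1, 2], [1, 2, 1], [1, 3, 4], [1, 4, 3], [1, 5, 5]]
# Made-up, noise-free sales: baseline 10, plus 2 per unit cinema and 3 per unit TV.
sales = [10 + 2 * cinema + 3 * tv for _, cinema, tv in X]

intercept, a, b = fit_linear(X, sales)
print(f"baseline={intercept:.2f}, cinema coefficient a={a:.2f}, TV coefficient b={b:.2f}")
```

    With real data the fit will not be exact, and a statistics library would normally replace the hand-written solver; the point is only that each coefficient estimates the contribution per unit of the corresponding channel.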

    There are certainly other ways to tackle this. @Telcontar120 - you have quite some experience here, don't you?

    - Sr. Director Data Solutions, Altair RapidMiner -
    Dortmund, Germany
    Telcontar120 Moderator, RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 1,635 Unicorn
    Yes, as @mschmitz says, there are several different approaches to these types of problems. The linear approach is often called Marketing Mix Modeling (MMM), in case you want to do some research. It is typically applied to understanding overall sales (units or dollars) based on the level of aggregate spend across different channels, not to the narrower question of marketing attribution, which is another entire area of research (although related).
    For marketing attribution with overlapping interactions, probably the most common approach is to give each type of interaction its own normalized canonical curve that describes the expected responses. These curves are then used to attribute responses. If there are any periods in which multiple interactions are operating simultaneously, the fit is estimated from the sum of the individual effects, and interaction effects are then added to explain any remaining discrepancies.
    So, in your example, you have some TV transmissions that occur when nothing else is happening.  From these transmissions, you would develop a set of data to describe the typical number of responses and the timing of those responses, for each combination of other characteristics that characterize those transmissions, based on something like "minutes since broadcast".
    Then, when you have overlapping transmissions, each of the underlying canonical curves for the components is used to attribute responses based on its characteristics and broadcast time. The difference between the sum of those curves and the actual results is reviewed, and if there are discrepancies, new interaction terms (which may be either additive or subtractive in nature) are added to account for the differences.
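
    To make the curve idea concrete, here is a minimal Python sketch, not a RapidMiner process. It assumes a single canonical curve shared by all transmissions, hand-supplied expected totals, and a simple proportional split of each minute's visitors; the interaction terms described above are not modeled, and all function names and numbers other than the 27 and 57 from the question are hypothetical.

```python
def canonical_curve(clean_responses):
    """Average share of visitors arriving at each minute since broadcast,
    computed from transmissions that did not overlap with any other."""
    n_minutes = len(clean_responses[0])
    shares = [0.0] * n_minutes
    for counts in clean_responses:
        total = sum(counts)
        for minute, count in enumerate(counts):
            shares[minute] += count / total
    return [s / len(clean_responses) for s in shares]


def attribute(observed, start_minutes, expected_totals, curve):
    """Split observed per-minute visitors between overlapping transmissions
    in proportion to each one's expected contribution in that minute."""
    attributed = [0.0] * len(start_minutes)
    for minute, visitors in enumerate(observed):
        expected = []
        for start, total in zip(start_minutes, expected_totals):
            offset = minute - start  # minutes since this broadcast
            in_range = 0 <= offset < len(curve)
            expected.append(total * curve[offset] if in_range else 0.0)
        denom = sum(expected)
        if denom == 0:
            continue  # no transmission explains this minute; leave it unattributed
        for i, e in enumerate(expected):
            attributed[i] += visitors * e / denom
    return attributed


# Clean transmissions: the 27-visitor example from the question plus a made-up one.
curve = canonical_curve([[7, 10, 8, 2], [14, 20, 16, 4]])

# Hypothetical overlap: A airs at minute 0, B at minute 2; 57 visitors in total.
observed = [7, 10, 16, 13, 9, 2]
per_emission = attribute(observed, start_minutes=[0, 2],
                         expected_totals=[27, 30], curve=curve)
print([round(v, 1) for v in per_emission])
```

    In a fuller version the expected totals and curve shapes would be estimated per combination of transmission characteristics (channel, position, GRP, and so on), and the residual between the summed curves and the observed counts would be inspected before adding interaction terms.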
    This is fairly complicated and takes a fair amount of manual work; it is not merely a matter of running the combined dataset through a machine learning algorithm, I'm afraid. It can all be done in RapidMiner, but it would be a fairly extensive project and would involve a fair amount of manual setup and attribute generation.
    There are other rule-based approaches that use simple heuristics for marketing attribution as well, such as last interaction, first interaction, time-decay allocation, equal allocation, etc., but those are not based on ML either, just on the application of simple assumptions about how interactions generate responses. 
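
    For comparison, the rule-based heuristics fit in a few lines of Python. This is a sketch only: the function name `allocate`, the rule labels, and the `half_life` parameter are assumptions made for illustration, and time decay is modeled here as exponential decay, which is one common choice rather than the only one.

```python
def allocate(visitors, minutes_since_emission, rule="equal", half_life=5.0):
    """Split a visitor count between candidate emissions using a simple rule.

    minutes_since_emission: for each emission, minutes elapsed since it aired.
    half_life: assumed decay half-life in minutes (used by "time_decay" only).
    """
    n = len(minutes_since_emission)
    if rule == "last":
        # All credit goes to the most recent emission (smallest elapsed time).
        winner = min(range(n), key=lambda i: minutes_since_emission[i])
        weights = [1.0 if i == winner else 0.0 for i in range(n)]
    elif rule == "first":
        # All credit goes to the earliest emission (largest elapsed time).
        winner = max(range(n), key=lambda i: minutes_since_emission[i])
        weights = [1.0 if i == winner else 0.0 for i in range(n)]
    elif rule == "equal":
        weights = [1.0] * n
    elif rule == "time_decay":
        # Credit halves for every half_life minutes that have passed.
        weights = [0.5 ** (t / half_life) for t in minutes_since_emission]
    else:
        raise ValueError(f"unknown rule: {rule}")
    total = sum(weights)
    return [visitors * w / total for w in weights]


# 57 visitors, one emission aired 8 minutes ago and another just now.
for rule in ("last", "first", "equal", "time_decay"):
    print(rule, allocate(57, [8, 0], rule=rule, half_life=8.0))
```

    None of these rules looks at the emission properties, which is why they are quick to apply but rest entirely on the assumption behind the chosen rule.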
    Brian T.
    Lindon Ventures 
    Data Science Consulting from Certified RapidMiner Experts