
Modelling events with loose association

Member Posts: 9 Contributor II
edited November 2018 in Help

I'm searching for the best way to do this and, being fairly new to data modelling, I would appreciate ideas or guidance!

I have two sets of events, A and B, both of which may be triggered by root cause events (set C, which I don't have). See the diagram below. The events in set A may (or may not) lead to events in set B. Set A contains around 10k possible distinct items (of which maybe 500 are particularly useful), and set B contains around 1,000 items. There is a time lag between A and B, and the closer A is to B in time, the more relevant the association. A and B are polynominal (multi-valued nominal) attributes.

At present I want to develop a prediction model for A->B (what is likely to occur in B, given events in A?). However, if there is any way to determine the elements of C from A and B, I'm all ears.

I'm thinking that FP-growth may be a good starting point. Anyone with experience of this?
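One way to make the time-lag idea concrete before choosing a model is to score each (A item, B item) pair by how closely the A event precedes the B event. The sketch below is only an illustration in plain Python: the event data, the 30-minute window, the exponential half-life weighting, and the function name `weighted_pairs` are all assumptions, not anything stated in the post.

```python
from collections import defaultdict

# Hypothetical event logs: (timestamp in minutes, item label). All values are made up.
events_a = [(0, "a1"), (5, "a2"), (50, "a1"), (58, "a3")]
events_b = [(10, "b1"), (60, "b2")]

def weighted_pairs(events_a, events_b, window=30.0, half_life=10.0):
    """Score each (A item, B item) pair by summed, time-decayed co-occurrence.

    An A event counts towards a B event only if it precedes it by at most
    `window` minutes; its weight halves for every `half_life` minutes of lag.
    """
    scores = defaultdict(float)
    for t_b, item_b in events_b:
        for t_a, item_a in events_a:
            lag = t_b - t_a
            if 0 < lag <= window:
                scores[(item_a, item_b)] += 0.5 ** (lag / half_life)
    return dict(scores)

scores = weighted_pairs(events_a, events_b)
# Print pairs from the strongest to the weakest association.
for pair, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(pair, round(score, 3))
```

The nested loop is quadratic in the number of events, so for real logs the events would need to be sorted and scanned with a sliding window instead.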

• RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 1,761 Unicorn

So this sounds like market basket analysis, provided your C set does have a connection with the A and B sets.

So what you'll need to do is load your C data set that contains the A and B instances and use a Numerical to Binominal operator (or some other conversion operator) to turn your data set into true/false values. Then feed it into the FP-Growth operator, followed by the Create Association Rules operator.
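Outside RapidMiner, the same three steps (binarise, find frequent itemsets, derive rules) can be sketched in plain Python. This is a rough illustration, not the RapidMiner implementation: the transactions are invented, and the frequent-itemset step is a brute-force pair count standing in for FP-Growth, which would build an FP-tree instead.

```python
from itertools import combinations

# Hypothetical transactions: each is the set of A and B items seen together.
transactions = [
    {"a1", "a2", "b1"},
    {"a1", "b1"},
    {"a2", "b2"},
    {"a1", "a2", "b1"},
]

# Step 1: binarise, giving one true/false column per distinct item.
items = sorted(set().union(*transactions))
rows = [{item: (item in t) for item in items} for t in transactions]

# Step 2: frequent pairs by brute-force counting (a stand-in for FP-Growth).
n = len(rows)

def support(itemset):
    """Fraction of rows in which every item of `itemset` is true."""
    return sum(all(row[i] for i in itemset) for row in rows) / n

min_support = 0.5
frequent_pairs = [p for p in combinations(items, 2) if support(p) >= min_support]

# Step 3: rules X -> Y with confidence = support(X and Y) / support(X).
rules = []
for x, y in frequent_pairs:
    for lhs, rhs in ((x, y), (y, x)):
        rules.append((lhs, rhs, support((lhs, rhs)) / support((lhs,))))

for lhs, rhs, confidence in rules:
    print(f"{lhs} -> {rhs}: confidence {confidence:.2f}")
```

With thousands of distinct items the true/false table gets very wide, so a sparse representation (sets of items per transaction, as in `transactions` above) is usually the more practical input.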

• Moderator, RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 1,635 Unicorn

But if you really don't have any access to events "C" (as your original post implies) and it is instead some kind of hypothetical root cause, then you will have to model directly on A and B, which you can also do using FP-Growth as @Thomas_Ott explains.

Brian T.
Lindon Ventures
Data Science Consulting from Certified RapidMiner Experts
• Member Posts: 9 Contributor II

Thanks. As mentioned, I don't have access to, or information about, set C. The FP-Growth algorithm might be appropriate, though it makes for a lot of binominal fields from my long list of polynominal data items. The other thing is that FP-Growth seems to look at one set of data and find associations within that set (potential combinations of items within a transaction), rather than associations between items in different sets. I guess what I'm really looking for is clusters of A relating to clusters of B, but I'm not sure if there is an appropriate model to use here.
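One possible way around the "single set" limitation is to keep the A and B items separate within each time window and only keep rules whose left-hand side comes from A and whose right-hand side comes from B. The following plain-Python sketch assumes the events have already been grouped into windows; the data, the function name `cross_rules`, and the thresholds are hypothetical.

```python
from itertools import combinations

# Hypothetical windows: (A items seen shortly before, B items seen in the window).
windows = [
    ({"a1", "a2"}, {"b1"}),
    ({"a1"}, {"b1"}),
    ({"a3"}, {"b2"}),
    ({"a1", "a2"}, {"b1", "b2"}),
    ({"a3"}, set()),
]

def cross_rules(windows, max_lhs=2, min_count=2):
    """Return rules (A itemset -> single B item) with confidence and lift."""
    n = len(windows)
    lhs_count, pair_count, b_count = {}, {}, {}
    for a_items, b_items in windows:
        for b in b_items:
            b_count[b] = b_count.get(b, 0) + 1
        # Enumerate small groups of A items as candidate left-hand sides.
        for k in range(1, max_lhs + 1):
            for lhs in combinations(sorted(a_items), k):
                lhs_count[lhs] = lhs_count.get(lhs, 0) + 1
                for b in b_items:
                    pair_count[(lhs, b)] = pair_count.get((lhs, b), 0) + 1
    rules = []
    for (lhs, b), count in pair_count.items():
        if count >= min_count:
            confidence = count / lhs_count[lhs]
            lift = confidence / (b_count[b] / n)  # > 1 means B is more likely after lhs
            rules.append((lhs, b, confidence, lift))
    return sorted(rules, key=lambda r: -r[3])

rules = cross_rules(windows)
for lhs, b, confidence, lift in rules:
    print(lhs, "->", b, f"confidence={confidence:.2f}", f"lift={lift:.2f}")
```

Because only A items appear on the left and only B items on the right, within-set combinations (A with A, B with B) never surface as rules, which is the behaviour the post asks for. Enumerating left-hand sides this way only scales if each window holds a handful of A items.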