Association Analysis for numerical real data?

Fred12 Member Posts: 344 Unicorn
edited February 2020 in Help

Hi,

Is association analysis (e.g. FP-Growth) also suited to discovering relationships in numerical (real-valued) data columns, or does it only work for categorical variables?

If it is, I'd like to know how to do so.

Answers

  • yyhuang Administrator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 364 RM Data Scientist

    Hi Fred,

    Take a look at the tutorial process for FP-Growth. It is not a perfect tutorial for explaining FP-Growth, but it happens to use the Iris data, which has only continuous numerical attributes. The tutorial applies Discretize by Frequency and then converts nominal to binominal before applying FP-Growth.

    Keep in mind that all attributes of the input example set for FP-Growth are required to be binominal.

    What is your use case or purpose for applying FP-Growth to real/continuous data columns?
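
The preprocessing described above (equal-frequency discretization, then one yes/no item per attribute/bin pair) can be sketched outside RapidMiner in plain Python. This is only an illustration under stated assumptions: the column names and values are made up, the helper names are hypothetical, and the itemsets are found by brute-force counting rather than by the FP-tree that FP-Growth actually builds.

```python
from itertools import combinations

def discretize_by_frequency(values, n_bins):
    """Assign each value a bin index so that bins hold roughly equal counts.
    Rank-based split: tied values may land in different bins."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    bins = [0] * len(values)
    for rank, i in enumerate(order):
        bins[i] = rank * n_bins // len(values)
    return bins

def to_transactions(columns, n_bins):
    """Turn numeric columns into one set of 'attribute=binK' items per row.
    Each item is binary: a row either contains it or it does not."""
    binned = {name: discretize_by_frequency(vals, n_bins)
              for name, vals in columns.items()}
    n_rows = len(next(iter(columns.values())))
    return [{f"{name}=bin{binned[name][r]}" for name in columns}
            for r in range(n_rows)]

def frequent_itemsets(transactions, min_support, max_size=2):
    """Brute-force support counting (not the FP-tree used by FP-Growth)."""
    items = sorted(set().union(*transactions))
    result = {}
    for size in range(1, max_size + 1):
        for combo in combinations(items, size):
            hits = sum(set(combo) <= t for t in transactions)
            support = hits / len(transactions)
            if support >= min_support:
                result[combo] = support
    return result

# Hypothetical toy data, loosely modelled on two Iris-like attributes.
columns = {
    "petal_length": [1.4, 1.3, 1.5, 1.6, 4.5, 4.9, 6.0, 5.1],
    "petal_width":  [0.2, 0.3, 0.1, 0.4, 1.5, 1.6, 2.5, 1.9],
}
transactions = to_transactions(columns, n_bins=2)
itemsets = frequent_itemsets(transactions, min_support=0.4)
for itemset, support in itemsets.items():
    print(itemset, support)
```

The number of bins is a modelling choice: more bins give finer items but lower support per item, so fewer itemsets pass the threshold.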

  • Fred12 Member Posts: 344 Unicorn

    OK, but I want to run FP-Growth with respect to my class values (1, 3 or 4): identify item sets that reach a certain support for a given label class.

    But I think it's probably not well suited to problems with 20+ numerical parameters. You would have to discretize them and then split them into binominal values, above or below half of the interval, which probably doesn't make much sense.
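
One way to read "item sets with a certain support for a given label class" is as rules of the form itemset -> class, scored by support and confidence. The sketch below assumes the rows are already discretized into items; the item names, labels, and the `class_rules` helper are hypothetical, and the counting is brute force rather than FP-Growth.

```python
from collections import Counter
from itertools import combinations

def class_rules(transactions, labels, min_support, max_size=2):
    """Return {(itemset, label): (support, confidence)} for itemset -> label.
    support    = rows containing itemset AND having label / all rows
    confidence = rows containing itemset AND having label / rows containing itemset
    """
    n = len(transactions)
    itemset_counts = Counter()
    joint_counts = Counter()
    for items, label in zip(transactions, labels):
        for size in range(1, max_size + 1):
            for combo in combinations(sorted(items), size):
                itemset_counts[combo] += 1
                joint_counts[(combo, label)] += 1
    rules = {}
    for (combo, label), count in joint_counts.items():
        support = count / n
        if support >= min_support:
            rules[(combo, label)] = (support, count / itemset_counts[combo])
    return rules

# Hypothetical, already-discretized rows with class labels 1, 3 or 4.
transactions = [
    {"a=low", "b=low"}, {"a=low", "b=low"}, {"a=low", "b=high"},
    {"a=high", "b=high"}, {"a=high", "b=high"}, {"a=high", "b=low"},
]
labels = [1, 1, 3, 4, 4, 3]
rules = class_rules(transactions, labels, min_support=0.3)
for (itemset, label), (support, confidence) in sorted(rules.items()):
    print(itemset, "->", label, round(support, 2), round(confidence, 2))
```

With 20+ attributes the number of candidate itemsets grows quickly, so `max_size` and `min_support` would need to be chosen with care.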

  • MartinLiebig Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, University Professor Posts: 3,509 RM Data Scientist

    Fred,

    FP-Growth is an algorithm originally designed for market basket analysis: either you buy a product or you don't. It is also used in other use cases, e.g. web page visits, but it is always an "either you did it or you didn't" situation. In fact, it also ignores whether you took an item once or twice. That is simply how the algorithm works and is nothing specific to RapidMiner. That's why it can only run on binary data, not on numerical data.

    ~Martin

    - Sr. Director Data Solutions, Altair RapidMiner -
    Dortmund, Germany
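
The "either you did it or not" point can be illustrated with a tiny example: once purchases are turned into baskets, quantities disappear and only presence remains. The purchase log and the `support` helper below are made up for illustration.

```python
# Hypothetical purchase log: (basket id, product, quantity).
purchases = [
    (1, "milk", 2),
    (1, "bread", 1),
    (2, "milk", 1),
    (2, "milk", 3),
]

baskets = {}
for basket_id, product, quantity in purchases:
    # Quantity is deliberately ignored: only presence in the basket matters.
    baskets.setdefault(basket_id, set()).add(product)

def support(itemset, baskets):
    """Fraction of baskets that contain every item in itemset."""
    return sum(itemset <= items for items in baskets.values()) / len(baskets)

print(baskets)
print(support({"milk"}, baskets))
print(support({"milk", "bread"}, baskets))
```

Basket 2 bought milk twice with a total quantity of four, yet it counts exactly as much as basket 1 does toward the support of milk.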