Typical Workflow for Association Analysis / Classification

SunnyLotusFlowe Member Posts: 37 Contributor II
Hi all,

this is my first post in this forum!


I have a general question: which operators are typically used for association analysis, and which are typically used for classification (for preprocessing and so on)? It would be nice to hear about your experiences with this.
:D

greetings

Lotus

Answers

  • IngoRM Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, Community Manager, RMResearcher, Member, University Professor Posts: 1,751 RM Founder
    Hi,

    welcome to RapidMiner and this forum.

    Well, it is a bit hard to answer this in general since the operators, especially those for preprocessing, will mainly depend on the format of your data. For the actual modeling step, you will find the operators used for association rule mining in "Modeling" - "Association and Itemset Mining" and those for classification learning in "Modeling" - "Classification and Regression".

    For preprocessing, things are harder to answer. For association rule mining, the operator "Pivot" often has to be used to transform transaction data into a basket data format. "Nominal to Binominal" is also a likely candidate. For classification learning, it mainly depends on your data format and the capabilities of the learning scheme. Sometimes you have to discretize your data or transform it into a numerical format before a specific learner can be applied. You can find many examples in the Sample Repository of RapidMiner 5 and also with our new Community Extension on myExperiment.org.

    Actually, most of the fun in data mining comes from defining the best preprocessing process for your current task. RapidMiner (and its extensions) now provide about 800 different operators for this - we would not provide them if they were not necessary from time to time  ;)

    In this sense: have fun. Cheers,
    Ingo
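
As a rough illustration of the "Nominal to Binominal" idea mentioned above: a nominal attribute with several values is replaced by one true/false attribute per value. The sketch below is plain Python, not RapidMiner code; the function name, the sample data, and the "attribute = value" column naming are assumptions made for this example.

```python
def nominal_to_binominal(rows, attribute):
    """Replace one nominal column with one true/false column per distinct value."""
    values = sorted({row[attribute] for row in rows})
    result = []
    for row in rows:
        # Keep every other column unchanged.
        new_row = {key: value for key, value in row.items() if key != attribute}
        # Add one boolean column per value of the nominal attribute.
        for value in values:
            new_row[f"{attribute} = {value}"] = (row[attribute] == value)
        result.append(new_row)
    return result


# Hypothetical sample data.
rows = [
    {"id": 1, "color": "red"},
    {"id": 2, "color": "blue"},
    {"id": 3, "color": "red"},
]
encoded = nominal_to_binominal(rows, "color")
for row in encoded:
    print(row)
```

Each output row then carries only true/false attributes for the converted column, which is the kind of input itemset mining usually expects.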
  • SunnyLotusFlowe Member Posts: 37 Contributor II
    Oh, this sounds useful to me. Thanks a lot for the information  :)



    greetings

    Lotus
  • SunnyLotusFlowe Member Posts: 37 Contributor II
    Hi there,
    I have looked a little further into Pivot:

    You mean that (De-)Pivoting does the following transformation (I just want to be sure that I have understood what you meant):
    Articles are 'A', 'B' and 'C'

    ID | Transaction   ->   ID | A | B | C
    1  | A,C                1  | 1 | 0 | 1
    2  | B                  2  | 0 | 1 | 0


    Is that correct?

    greetings

    SunnyLotusFlower
  • IngoRM Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, Community Manager, RMResearcher, Member, University Professor Posts: 1,751 RM Founder
    Hi,

    Is that correct?
    Almost  :D

    If you really have such a comma-separated format, you would not need a Pivot operator but could simply use the operator "Split".

    A real Pivoting would transform the data set:

    ID | Transaction
    1  | A
    1  | C
    2  | B

    to the data set

    ID | A | B | C
    1  | 1 | 0 | 1
    2  | 0 | 1 | 0

    As you can see, the number of examples has also changed, and there might be more than one example per ID before the transformation.

    Cheers,
    Ingo
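
The two cases discussed above (a comma-separated item list per ID versus one row per ID and item) can be sketched in plain Python as follows. This is only an illustration of the data transformation, not how the RapidMiner "Split" or "Pivot" operators are implemented; all function names are made up for this example.

```python
def split_to_baskets(rows, separator=","):
    """Comma-separated case: one row per ID, all items packed into one string."""
    return {row_id: {item.strip() for item in items.split(separator)}
            for row_id, items in rows}


def pivot_to_baskets(rows):
    """Long case: one row per (ID, item) pair; several rows may share an ID."""
    baskets = {}
    for row_id, item in rows:
        baskets.setdefault(row_id, set()).add(item)
    return baskets


def to_basket_table(baskets):
    """Turn {ID: set of items} into one 0/1 row per ID, one column per item."""
    items = sorted(set().union(*baskets.values()))
    table = [[row_id] + [1 if item in basket else 0 for item in items]
             for row_id, basket in sorted(baskets.items())]
    return ["ID"] + items, table


comma_rows = [(1, "A,C"), (2, "B")]          # format from the question
long_rows = [(1, "A"), (1, "C"), (2, "B")]   # format a real pivot starts from

header, table = to_basket_table(pivot_to_baskets(long_rows))
print(header)
for row in table:
    print(row)
```

Both inputs end up as the same basket table; the difference is only whether the items of one transaction arrive in a single row or spread over several rows.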
  • SunnyLotusFlowe Member Posts: 37 Contributor II
    Aha, OK, I understand the underlying idea of pivoting.

    Furthermore, in the literature I have read about mining quantitative association rules. I have seen that RapidMiner supports a lot of discretization techniques, but I don't understand whether all three techniques are supported.

    I mean static discretization, dynamic discretization, and distance-based discretization.

    Discretize by Binning and Discretize by Size should be the static approaches, I think at least  :D


    greetings

    SunnyLotusFlower
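
To make the two static approaches named above concrete, here is a minimal Python sketch of equal-width binning and of fixed-size binning (a fixed number of examples per bin). Mapping these to "Discretize by Binning" and "Discretize by Size" is an assumption based on the operator names; the code is not taken from RapidMiner, and the function names and sample values are hypothetical.

```python
def discretize_by_width(values, number_of_bins):
    """Equal-width binning: split [min, max] into ranges of the same width."""
    low, high = min(values), max(values)
    width = (high - low) / number_of_bins
    if width == 0:
        return [0] * len(values)  # all values identical: a single bin
    # The maximum value would fall just outside the last range, so clamp it.
    return [min(int((value - low) / width), number_of_bins - 1)
            for value in values]


def discretize_by_size(values, bin_size):
    """Fixed-size binning: sort the values and put bin_size of them in each bin.

    Simplification: equal values can end up in different bins.
    """
    order = sorted(range(len(values)), key=lambda index: values[index])
    bins = [0] * len(values)
    for rank, index in enumerate(order):
        bins[index] = rank // bin_size
    return bins


ages = [18, 20, 21, 22, 40, 78]
print(discretize_by_width(ages, 3))  # [0, 0, 0, 0, 1, 2]
print(discretize_by_size(ages, 2))   # [0, 0, 1, 1, 2, 2]
```

Both variants are static in the sense that the bins are fixed before any rule mining starts and do not depend on the rules found later.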
  • SunnyLotusFlowe Member Posts: 37 Contributor II
    hello there,

    I found the operator Discretize by Entropy. I suppose that it is of no use for association rule mining. What would I need minimized-entropy intervals for when mining association rules?


    greetings Lotus
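
For context on the question above: entropy-based discretization is generally a supervised technique, meaning the cut points are chosen so that the values of a label attribute are as pure as possible inside each interval. The sketch below finds a single entropy-minimizing cut point in plain Python. It only illustrates the general idea and is not RapidMiner's implementation; the function names, the sample data, and the "buys" label are assumptions for this example.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((count / total) * log2(count / total)
                for count in Counter(labels).values())


def best_entropy_split(values, labels):
    """Return (cut point, weighted entropy) of the best single split."""
    pairs = sorted(zip(values, labels))
    best_cut, best_score = None, float("inf")
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no cut point between two equal values
        left = [label for _, label in pairs[:i]]
        right = [label for _, label in pairs[i:]]
        score = (len(left) * entropy(left)
                 + len(right) * entropy(right)) / len(pairs)
        if score < best_score:
            best_cut = (pairs[i - 1][0] + pairs[i][0]) / 2
            best_score = score
    return best_cut, best_score


ages = [18, 20, 21, 22, 40, 78]
buys = ["no", "no", "no", "yes", "yes", "yes"]  # hypothetical label
cut, score = best_entropy_split(ages, buys)
print(cut)
```

Because the split is scored against a label, this kind of discretization needs a designated target attribute, which a plain association rule setting does not necessarily have.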