GA driven attribute selection according to Positive predictive value

radone RapidMiner Certified Expert, Member Posts: 74 Guru
In my case I am interested only in the positive predictive value (PPV).

The problem is that when I select attributes, the GA picks an attribute set for which only a single example is classified as positive, and correctly so, which gives PPV = 100%. Such a result is of course of very little reliability.

Could anyone tell me which performance evaluator would fit my needs?
Thank you in advance for any help.

Answers

  • land RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 2,531 Unicorn
    Hi,
    sorry, but I'm a bit confused. What exactly are you going to do? Which operator do you use?

    Greetings,
      Sebastian
  • radone RapidMiner Certified Expert, Member Posts: 74 Guru
    I am sorry. I will try to be clearer now.

    I have a binominal classification problem and what I am interested in is maximizing the positive predictive value (PPV). Let's say I got these confusion matrices:
    1068   328
      57    77
    accuracy: 74.84%
    PPV: 57.46 %
    This is quite good, as the PPV is 57.46%.

    Let's have a look at this second example:
    1135   394
       0     1
    accuracy: 74.25%
    PPV: 100.0 %

    Here the PPV is 100% (i.e. the perfect solution from the PPV point of view, and the first is considered to be better).
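
The figures quoted for both matrices can be checked with a few lines of Python. This is only a sketch: it assumes that rows are predictions and columns are true classes, with the positive class second, which is the layout that reproduces the accuracy and PPV values reported above. The function name `accuracy_and_ppv` is made up for illustration and is not a RapidMiner operator.

```python
def accuracy_and_ppv(matrix):
    """Return (accuracy, PPV) for a 2x2 confusion matrix.

    Assumed layout (rows = predictions, columns = true classes):
        [[pred_neg/true_neg, pred_neg/true_pos],
         [pred_pos/true_neg, pred_pos/true_pos]]
    """
    (tn, fn), (fp, tp) = matrix
    total = tn + fn + fp + tp
    accuracy = (tn + tp) / total
    # PPV is undefined when nothing is predicted positive.
    ppv = tp / (tp + fp) if (tp + fp) else float("nan")
    return accuracy, ppv


first = [[1068, 328], [57, 77]]
second = [[1135, 394], [0, 1]]

for name, matrix in (("first", first), ("second", second)):
    acc, ppv = accuracy_and_ppv(matrix)
    predicted_pos = matrix[1][0] + matrix[1][1]
    print(f"{name}: accuracy={acc:.2%}  PPV={ppv:.2%}  "
          f"predicted positives={predicted_pos}")
```

Printing the number of predicted positives next to the PPV makes the issue visible: the first PPV rests on 134 positive predictions, the second on a single one.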

    Unfortunately, a single positively classified sample is of very little significance. There is a high probability that the results will be very bad when the model is applied to validation data.
    The results on a validation example set are: 1) PPV: 55.9%, 2) PPV: 0.0% (two misclassified samples).

    And here is my question: is there any way to objectively compare these two results?

    Thanks in advance.
  • IngoRM Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, Community Manager, RMResearcher, Member, University Professor Posts: 1,751 RM Founder
    Hello,

    hmm, there is actually always a risk in concentrating on precision alone. Besides taking other measures into account, be it through a combination like the f-measure, through weighting, or through multi-objective optimization schemes (all of which are possible within RapidMiner), I am afraid there is no general solution for an objective comparison.

    Cheers,
    Ingo
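
To illustrate the f-measure suggestion, here is a minimal Python sketch applied to the two confusion matrices from the question. It is not RapidMiner code. It assumes the same matrix layout that reproduces the reported PPV values (rows = predictions, columns = true classes, positive class second), and `f_measure` with its `beta` parameter is a hypothetical helper; `beta = 1` (the usual F1) is an arbitrary choice here.

```python
def f_measure(matrix, beta=1.0):
    """F-beta score for a 2x2 confusion matrix.

    Assumed layout (rows = predictions, columns = true classes):
        [[pred_neg/true_neg, pred_neg/true_pos],
         [pred_pos/true_neg, pred_pos/true_pos]]
    """
    (_tn, fn), (fp, tp) = matrix
    precision = tp / (tp + fp) if (tp + fp) else 0.0  # this is the PPV
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision == 0.0 and recall == 0.0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


first = [[1068, 328], [57, 77]]
second = [[1135, 394], [0, 1]]

for name, matrix in (("first", first), ("second", second)):
    print(f"{name}: F1={f_measure(matrix):.3f}")
```

With these numbers the first matrix scores an F1 of about 0.286 and the second about 0.005, so the single-hit solution no longer wins. A `beta` below 1 would put more weight on precision while still penalizing near-zero recall.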