Can bad become good accuracy?

haddock (Member, Posts: 849, Guru)
edited June 19 in Help
Greetings all!

Perhaps you can help me solve a dilemma.

I have two processes, one consistently gets 65% accuracy on an A or B classification, whereas the other only gets 25% on the same data.

Should I pick the first, because of its higher accuracy, or the second, because doing the opposite to its prediction would produce 75% accuracy?

Answers

  • steffen (Member, Posts: 347, Guru)
    Hello Haddock

    Hm ... accuracy measures the ability to predict the correct class taking into account the correct predictions of both classes:
    (see e.g. here: http://en.wikipedia.org/wiki/Accuracy#Accuracy_in_binary_classification)

    So... switching A and B should not increase the accuracy of your second process.
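A minimal Python sketch of this point, assuming that "switching A and B" means renaming the two classes consistently in both the actual and the predicted labels (the toy label lists are made up for illustration). The set of matching positions does not change under such a renaming, so neither does the accuracy.

```python
def accuracy(actual, predicted):
    """Fraction of positions where the predicted label equals the actual label."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("need two non-empty sequences of equal length")
    hits = sum(a == p for a, p in zip(actual, predicted))
    return hits / len(actual)

def swap_names(labels):
    """Rename class A to B and B to A (a pure relabelling of the classes)."""
    return ["B" if x == "A" else "A" for x in labels]

# Made-up example data.
actual    = ["A", "A", "B", "B", "B", "A"]
predicted = ["A", "B", "B", "A", "B", "A"]

print(accuracy(actual, predicted))                          # 4 of 6 correct
print(accuracy(swap_names(actual), swap_names(predicted)))  # the same value
```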

    Additional note: What are the prior probabilities of the classes, i.e. |A| / all and |B| / all, where |A| + |B| = all? Accuracy can be fooled by a model that simply predicts the majority class (if there is a significant class skew).
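A sketch of the check suggested above, in Python with a made-up 90/10 label column. The class priors are |A| / all and |B| / all, and always predicting the majority class already scores the larger prior, so a reported accuracy is best compared against that baseline rather than against 50%.

```python
from collections import Counter

def class_priors(labels):
    """Prior probability of each class: |class| / all."""
    counts = Counter(labels)
    total = len(labels)
    return {cls: n / total for cls, n in counts.items()}

def majority_baseline(labels):
    """Accuracy obtained by always predicting the most frequent class."""
    return max(class_priors(labels).values())

# Hypothetical skewed label column: 90 cases of A, 10 of B.
labels = ["A"] * 90 + ["B"] * 10
print(class_priors(labels))       # {'A': 0.9, 'B': 0.1}
print(majority_baseline(labels))  # 0.9
```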

    awaiting your reply

    kind regards,

    Steffen
  • haddock (Member, Posts: 849, Guru)
    Hola Steffen!

    Not sure I made myself clear enough...

    System 2
                  Actual A   Actual B   Total
    Predict A        12         40        52
    Predict B        35         13        48
    Total            47         53       100
    Accuracy = (12 + 13) / 100 = 25%

    System 2 reversed
                  Actual A   Actual B   Total
    Predict A        35         13        48
    Predict B        12         40        52
    Total            47         53       100
    Accuracy = (35 + 40) / 100 = 75%
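The two tables can be checked with a few lines of Python. This is only a sketch: the counts are taken from the System 2 table, and `reverse_predictions` is an illustrative helper, not a RapidMiner operator. Swapping only the predicted labels moves the off-diagonal counts onto the diagonal, so for a two-class problem the reversed accuracy is always 1 minus the original one.

```python
# Counts from the System 2 table, keyed by (predicted class, actual class).
system2 = {
    ("A", "A"): 12, ("A", "B"): 40,   # row "Predict A"
    ("B", "A"): 35, ("B", "B"): 13,   # row "Predict B"
}

def accuracy(cm):
    """Correct predictions (diagonal of the confusion matrix) over all cases."""
    total = sum(cm.values())
    correct = sum(n for (pred, act), n in cm.items() if pred == act)
    return correct / total

def reverse_predictions(cm):
    """Swap every predicted label (A <-> B) but leave the actual labels alone."""
    flip = {"A": "B", "B": "A"}
    return {(flip[pred], act): n for (pred, act), n in cm.items()}

print(accuracy(system2))                       # 0.25
print(accuracy(reverse_predictions(system2)))  # 0.75
```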




  • steffen (Member, Posts: 347, Guru)
    Ah ok

    Here is my opinion:
    I would never do such a reversal.

    Why: Speaking theoretically, a classification algorithm tries to model the properties of two classes in order to differentiate them from each other. System 2 has failed completely. Performing such a reversal assumes that the system learned the properties of each class as if they belonged to the other one. But how do you prove this assumption? If the system is not able to learn the properties of the given classes, what does it learn (internally) after all?

    Mathematical logic: let's assume a rule A => B.
    That means:
    if A is true, then B is true, BUT
    if A is false, then we cannot say anything about B. B could be either true or false.

    Roughly speaking: from false assumptions you can conclude anything.

    Example:
    -1 = 1   (false)
    squaring both sides gives
    1 = 1    (true)
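The rule above is material implication. A small Python truth table (an illustration added here, not part of the original argument) shows that whenever A is false, A => B holds whatever B is, which is why a false premise says nothing about the conclusion. The squaring example is checked at the end.

```python
from itertools import product

def implies(a: bool, b: bool) -> bool:
    """Material implication: A => B is false only when A is true and B is false."""
    return (not a) or b

# Print the full truth table for A => B.
for a, b in product([True, False], repeat=2):
    print(f"A={a!s:<5}  B={b!s:<5}  A=>B={implies(a, b)}")

# The squaring example: a false premise (-1 == 1) implies a true conclusion.
premise = (-1 == 1)
conclusion = ((-1) ** 2 == 1 ** 2)
print(premise, conclusion, implies(premise, conclusion))  # False True True
```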

    I hope that I did not tell you things you already knew

    In conclusion: check the properties of the first system and compare them with the properties of the second system (e.g. the tree structure if the system is a decision tree). Maybe there is an error and the algorithm mixes up the classes internally. Or maybe the classes are so similar that the algorithm failed to find the differences. If you cannot find anything like that, take the first model. (my humble opinion ;))

    kind regards,

    Steffen
  • haddock (Member, Posts: 849, Guru)
    Hi again,

    Thanks for your reply. If you take a look here http://en.wikipedia.org/wiki/Table_of_logic_symbols you might want to reconsider your wording  ;)

    Actually I'm not convinced that the deductive and inductive reasoning mesh so easily. RM is most definitely an induction tool, and as it says in the good book ( http://en.wikipedia.org/wiki/Inductive_reasoning ) "Inductive reasoning is deductively invalid", hence my dilemma!
  • steffen (Member, Posts: 347, Guru)
    Hello Haddock

    First of all: Thanks for the remark, I changed the expression about conclusion / implication.

    Second:
    haddock wrote:

    Actually I'm not convinced that the deductive and inductive reasoning mesh so easily. RM is most definitely an induction tool, and as it says in the good book ( http://en.wikipedia.org/wiki/Inductive_reasoning ) "Inductive reasoning is deductively invalid", hence my dilemma!
    This is the thrill of data mining, isn't it? Or as stated in The Secret Laws of Analytic Projects:

    The First Certainty Principle: C~ 1/K; Certainty is inversely proportional to knowledge.
    A person who really understands data and analysis will understand all the pitfalls and limitations, and hence be constantly caveating what they say. Somebody who is simple, straightforward, and 100% certain usually has no idea what they are talking about.
    Back to the topic:
    As stated before, I would analyze the properties of the system to find out what has happened (as I still believe, the classes have been mixed up internally). However, if we assume that the systems are black boxes, the problem is more complicated (and very fascinating).

    I will think about the black box model and come back later. I have to respond to this, because you made a good point, and ignoring it means that I will run into such a situation sooner or later (*wink* Murphy).

    ... a refreshing and fascinating problem

    Steffen
  • haddock (Member, Posts: 849, Guru)
    I'm back in France now, so Bonjour tout le monde (hello, everyone)!

    Nice one Steffen, things are starting to get interesting, so it is time to put some meat on the table, as they say, but not to vegetarians...
    steffen wrote:
    Back to the topic:
    As stated before, I would analyze the properties of the system to find out what has happened (as I still believe, the classes have been mixed up internally). However, if we assume that the systems are black boxes, the problem is more complicated (and very fascinating).
    When the issue of class mangling, first identified by Steffen, was resurrected recently, I checked that the database input operator avoids the issue by having a specific class parameter. So at that level I'm reasonably comfortable that the labels are not mishandled.

    My data structure is a table of attributes, to which successive label columns are joined; in short, my own multilabel generator formed by SQL queries to the database. I then iterate over the labels, using previously optimised parameter sets to produce binominal SVM predictions and performance metrics, which in turn go back into the database.

    Matching up those predicted classifications, I noticed that certain related labels performed consistently better than the rest and, conversely, that some were consistently at the bottom of the performance table.

    I can't help feeling that it is interesting that the same learning process, on the same attribute premises, produces between 30% and 65% accuracy, when 50% is the default expectation in binominal classification. Perhaps an optimiser should maximise the accuracy distance from 50%, and a model applier should have a negation parameter?
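One way to read this suggestion, as a Python sketch: rank candidate processes by the distance of their validated accuracy from chance, and remember whether the winner sits below chance, in which case a (hypothetical) negation parameter inverts its predictions when the model is applied. The function names are made up for illustration, and chance = 0.5 assumes balanced classes; with skewed classes the majority-class baseline would be the fairer reference.

```python
def pick_furthest_from_chance(results, chance=0.5):
    """results: candidate name -> validated binary accuracy.
    Returns (name, negate): the candidate furthest from chance, and whether
    it is below chance (so its predictions should be inverted)."""
    name = max(results, key=lambda k: abs(results[k] - chance))
    return name, results[name] < chance

def apply_with_negation(predictions, negate, classes=("A", "B")):
    """Hypothetical 'negation parameter' for a binary model applier."""
    if not negate:
        return list(predictions)
    a, b = classes
    return [b if p == a else a for p in predictions]

# Accuracies quoted in this thread; the candidate names are hypothetical.
results = {"process_1": 0.65, "process_2": 0.25}
name, negate = pick_furthest_from_chance(results)
print(name, negate)                                  # process_2 True
print(apply_with_negation(["A", "B", "A"], negate))  # ['B', 'A', 'B']
```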

    Enough for now, my brain aches and my belly tells me it is time for lunch.

  • steffen (Member, Posts: 347, Guru)

    I'm back in France now so Bonjour tout le monde!
    welcome back  ;D

    I can't help feeling that it is interesting that the same learning process, on the same attribute premises, produces between 30% and 65% accuracy, when 50% is the default expectation in binominal classification. Perhaps an optimiser should maximise the accuracy distance from 50%, and a model applier should have a negation parameter?
    Wait a minute: are we talking about the problem of two different learning systems on the same label, or the same learning system on different labels? ???

    Regarding the latter: I do not find this very surprising. Every label represents its own concept, which may be hard or easy to learn.
    Similar concepts ("related labels") often, though not necessarily, result in comparable performance values (extreme case: the labels are correlated).

    idea for an experiment: generate a label randomly for a fixed set of premise attributes and see what happens...
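A sketch of that experiment in plain Python, under stated assumptions: the premise attributes are 100 random rows with 5 numeric attributes, a 1-nearest-neighbour classifier stands in for the black-box learner (haddock's setup uses SVMs), and each trial draws a fresh random A/B label. Because the label carries no information, the 5-fold cross-validated accuracies should scatter around 50%, and some trials can land below 50% by chance alone.

```python
import random
import statistics

def nn_predict(train_x, train_y, x):
    """1-nearest-neighbour: label of the closest training row (squared Euclidean)."""
    best = min(range(len(train_x)),
               key=lambda i: sum((u - v) ** 2 for u, v in zip(train_x[i], x)))
    return train_y[best]

def cv_accuracy(xs, ys, folds=5):
    """k-fold cross-validated accuracy of the 1-NN classifier."""
    n = len(xs)
    correct = 0
    for f in range(folds):
        test_idx = [i for i in range(n) if i % folds == f]
        train_idx = [i for i in range(n) if i % folds != f]
        train_x = [xs[i] for i in train_idx]
        train_y = [ys[i] for i in train_idx]
        for i in test_idx:
            correct += nn_predict(train_x, train_y, xs[i]) == ys[i]
    return correct / n

rng = random.Random(42)
# Fixed premise attributes: 100 examples with 5 random numeric attributes.
xs = [[rng.gauss(0, 1) for _ in range(5)] for _ in range(100)]

# Each trial draws a new random label, so there is no real pattern to learn.
accs = []
for trial in range(30):
    ys = [rng.choice("AB") for _ in xs]
    accs.append(cv_accuracy(xs, ys))

print(f"mean {statistics.mean(accs):.2f}, min {min(accs):.2f}, max {max(accs):.2f}")
```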

    Perhaps an optimiser should maximise the accuracy distance from 50%, and a model applier should have a negation parameter?
    I still do not think that a negation parameter is an option... but I will return to this point later...

    kind regards,

    Steffen
  • steffen (Member, Posts: 347, Guru)
    Hello

    ... I thought about the problem.

    Theoretical point of view:
    I now think it is OK to include a negation parameter for binary predictions, because in short it is just another optimization parameter for finding the optimal hypothesis (i.e. reducing the classification error).

    Practical point of view:
    I assume we have a black box learning algorithm and one (!) label to find the optimal classifier for. The optimization of a binary parameter (the negation parameter) is very tricky, because...
    • it is a crisp decision (true or false), as opposed to "fuzzy" parameters
    • the result depends strongly on the value of the parameter
    So: if your black box learning algorithm is "stable", i.e. has low variance across all results of, e.g., a cross-validation, the value of the parameter should always be the same, and so I would use this negation switch on my application set. If this is not true, i.e. the ratio of true/false values for the parameter does not clearly point in one direction (95%, 99%, how much safety do you want?), I would not use such a switch.
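A sketch of the stability rule just described, in Python: each cross-validation fold votes for negation when its accuracy is below chance, and the switch is only set when at least the required share of folds (95% here) agree; otherwise it is not used. The fold accuracies are invented for illustration, and `decide_negation` is a hypothetical helper, not an existing operator.

```python
def decide_negation(fold_accuracies, required=0.95, chance=0.5):
    """Return True only if negating the predictions would have helped in at
    least `required` of the folds; otherwise the switch is not used."""
    if not fold_accuracies:
        raise ValueError("need at least one fold")
    votes = [acc < chance for acc in fold_accuracies]  # True = negation helps
    share = sum(votes) / len(votes)
    return share >= required

# Hypothetical 10-fold cross-validation results.
stable = [0.30, 0.28, 0.35, 0.31, 0.26, 0.33, 0.29, 0.32, 0.27, 0.34]
unstable = [0.45, 0.55, 0.48, 0.52, 0.40, 0.60, 0.47, 0.51, 0.49, 0.53]

print(decide_negation(stable))    # True: every fold is below chance
print(decide_negation(unstable))  # False: only half of the folds agree
```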

    OK... I now know for myself what I would do in such a situation... thanks for spotting the problem.

    kind regards,

    Steffen

  • haddock (Member, Posts: 849, Guru)
    Hola Steffen!

    Why Hola? Because I know you'll find this link interesting:
    http://digital.csic.es/bitstream/10261/1810/1/55703.pdf
    Anyway, many thanks for your perseverance with this one. I respect your opinion and am therefore more than happy that we seem to be in fuzzy agreement; by the same token, I'm a little surprised that nobody else has jumped in, as this issue addresses a pretty central question in data mining, namely "when is a pattern a pattern?". That being said, there have been quite a few viewings, so on we plod.
    steffen wrote:
    So: if your black box learning algorithm is "stable", i.e. has low variance across all results of, e.g., a cross-validation, the value of the parameter should always be the same, and so I would use this negation switch on my application set. If this is not true, i.e. the ratio of true/false values for the parameter does not clearly point in one direction (95%, 99%, how much safety do you want?), I would not use such a switch.
    Exactly! As mentioned previously, what struck me was that certain labels were consistently well below 50%, and it was that gap from 50% that seemed too interesting to throw away. I'm looking at two approaches to using it.

    1. An optimiser/pessimiser combo, where the pessimiser optimises against reversed labels, trying to beat the conventional optimiser (gosh, RM seems to have built-in possibilities for label swapping :D).

    2. Adding the predictions as an attribute, to capture the meaning "this class has the property of predictability by model N to extent X". Drifting towards the land of Meta here.
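A sketch of approach 2 in Python, under assumptions: the learner is passed in as a `fit` function that returns a predictor, and a deliberately trivial majority-class learner stands in for "model N". The predictions are produced out-of-fold, so each row's new attribute comes from a model that never saw that row's label; otherwise the extra attribute would leak the label into the next learner. All names here are hypothetical.

```python
from collections import Counter

def fit_majority(train_rows, train_labels):
    """Trivial stand-in learner: always predicts the most frequent training label."""
    top = Counter(train_labels).most_common(1)[0][0]
    return lambda row: top

def out_of_fold_predictions(rows, labels, fit, folds=5):
    """Predict each row with a model trained only on the other folds."""
    n = len(rows)
    preds = [None] * n
    for f in range(folds):
        train = [i for i in range(n) if i % folds != f]
        model = fit([rows[i] for i in train], [labels[i] for i in train])
        for i in range(f, n, folds):
            preds[i] = model(rows[i])
    return preds

def add_prediction_attribute(rows, preds, name):
    """Return copies of the rows with the model's prediction as an extra attribute."""
    return [dict(row, **{name: p}) for row, p in zip(rows, preds)]

# Made-up example data.
rows = [{"x": i} for i in range(6)]
labels = ["A", "B", "A", "A", "B", "A"]
preds = out_of_fold_predictions(rows, labels, fit_majority, folds=2)
augmented = add_prediction_attribute(rows, preds, "pred_model_N")
print(augmented[0])  # {'x': 0, 'pred_model_N': 'A'}
```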

    I'll let you know how I get on.

    Onward through the fog!