# "Cross Entropy error in calculation?"

Member Posts: 344 Unicorn
edited June 2019 in Help

Hi,

In the Performance (Classification) operator, cross entropy is defined as the sum of the logarithms of the confidences of the true label classes, divided by the number of examples. However, I only get the operator's result if I do this but divide by the number of examples + 1.

I know it's not a big thing, but I spent a lot of time wondering why I got wrong results according to that definition. Then I divided by the number of examples + 1 and got the right results:

cross entropy:

-(log2(1)+log2(0.385)+log2(0.615))/3 = 0.692803777838436

but I only get the same value as the operator if I divide by 4 instead of 3.
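The arithmetic in this post can be checked with a short Python sketch. It is only an illustration of the formula as described here, not RapidMiner's actual implementation; the function name `cross_entropy` and the `denominator_offset` parameter are made up for this example.

```python
import math

def cross_entropy(true_class_confidences, denominator_offset=0):
    """Negative sum of log2 confidences of the true classes, divided by
    (number of examples + denominator_offset).

    denominator_offset is a hypothetical knob used only to compare
    dividing by n with dividing by n + 1.
    """
    n = len(true_class_confidences)
    total = -sum(math.log2(c) for c in true_class_confidences)
    return total / (n + denominator_offset)

# Confidences of the true label for the three examples from the post.
confidences = [1.0, 0.385, 0.615]

by_n = cross_entropy(confidences)                                # divide by 3
by_n_plus_1 = cross_entropy(confidences, denominator_offset=1)   # divide by 4

print(f"divided by n     : {by_n:.6f}")         # about 0.6928
print(f"divided by n + 1 : {by_n_plus_1:.6f}")  # about 0.5196
```

The two values differ by exactly the factor 3/4, so with only three examples the choice of denominator is easy to spot.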

Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, Community Manager, RMResearcher, Member, University Professor Posts: 1,751 RM Founder

Hi,

I am not 100% sure, but isn't this just one of those cases where you add 1 to the denominator to avoid the special case of the empty set, which would otherwise end up with an infinite performance?

Cheers,

Ingo
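To make the empty-set case from this reply concrete, here is a minimal Python sketch of what a "+1" guard in the denominator would do. Whether RapidMiner actually works this way is not confirmed in this thread; `mean_neg_log2` and `denominator_offset` are hypothetical names.

```python
import math

def mean_neg_log2(confidences, denominator_offset=0):
    # Negative sum of log2 confidences over (n + denominator_offset).
    total = -sum(math.log2(c) for c in confidences)
    return total / (len(confidences) + denominator_offset)

empty = []  # an example set with no rows

# Without the guard, the denominator is 0 and Python raises an error.
try:
    mean_neg_log2(empty)
    unguarded_failed = False
except ZeroDivisionError:
    unguarded_failed = True
    print("n = 0: division by zero")

# With the +1 guard, the empty set yields a finite value instead.
guarded = mean_neg_log2(empty, denominator_offset=1)
print("n = 0 with +1 guard:", guarded)
```

The guard avoids the degenerate case, but it also changes the result for every non-empty set, which is the discrepancy reported in the first post.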

Member Posts: 344 Unicorn

That does not sound logical to me. Why would there be an empty set? The test set needs to have at least one example; otherwise it would not be possible to calculate any performance. So there will always be at least one example.

E.g., if you have 1 example with confidence 0, it would just be log2(0)/1.

How log2(0) is defined, that's the question (it should be possible to handle, since there will always be some confidences of zero), but I don't think it depends on an empty set.
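The log2(0) question can be illustrated in Python. The sketch below shows that the plain logarithm is undefined at 0 and that a common workaround is to clamp confidences to a small positive floor. This is an assumption for illustration only; nothing in this thread says that RapidMiner handles zero confidences this way, and `safe_log2` and `EPS` are hypothetical names.

```python
import math

EPS = 1e-12  # hypothetical floor for confidences, chosen arbitrarily

def safe_log2(confidence, eps=EPS):
    # Clamp to eps so that a confidence of 0 gives a finite logarithm.
    return math.log2(max(confidence, eps))

# Python's math module refuses log2(0) outright.
try:
    math.log2(0.0)
except ValueError as err:
    print("log2(0) raises ValueError:", err)

# One example with confidence 0 for the true class, divided by n = 1.
single_example_loss = -safe_log2(0.0) / 1
print("clamped cross entropy for one example:", single_example_loss)
```

With the clamp, a single zero-confidence example gives a large but finite penalty, independent of how many examples the set contains.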