NaN problems with MinMaxNormalization and precision measure

Username Member Posts: 39 Maven
edited November 2018 in Help
Hi,

I noticed two bugs (?) in the MinMaxNormalization and WeightedMultiClassPerformance classes.

MinMaxNormalization:
If an attribute always has the same value, its values are normalized to NaN, because the formula divides by maxA - minA, which is zero in that case. Is this normalization behaviour really intended? It can lead to strange results from learning operators, since some of them (e.g. LibSVM) don't handle unknown values well. Here's my proposed fix:
### Eclipse Workspace Patch 1.0
#P yale
Index: src/com/rapidminer/operator/preprocessing/normalization/MinMaxNormalizationModel.java
===================================================================
RCS file: /cvsroot/yale/yale/src/com/rapidminer/operator/preprocessing/normalization/MinMaxNormalizationModel.java,v
retrieving revision 1.11
diff -u -r1.11 MinMaxNormalizationModel.java
--- src/com/rapidminer/operator/preprocessing/normalization/MinMaxNormalizationModel.java 14 Jan 2009 13:45:34 -0000 1.11
+++ src/com/rapidminer/operator/preprocessing/normalization/MinMaxNormalizationModel.java 12 Mar 2009 10:56:13 -0000

double value = example.getValue(attribute);
double minA = range.getFirst().doubleValue();
double maxA = range.getSecond().doubleValue();
- example.setValue(attribute, (value - minA) / (maxA - minA) * (max - min) + min);
+ if (maxA == minA || min == max) {
+ example.setValue(attribute, Math.min(Math.max(minA, min), max));
+ } else {
+ example.setValue(attribute, (value - minA) / (maxA - minA) * (max - min) + min);
+ }
}
}
}
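
To see the effect of the guard outside RapidMiner, here is a minimal standalone sketch. The class name MinMaxSketch and the method normalize are hypothetical and not part of the RapidMiner API; only the formula and the fallback expression are taken from the patch above.

```java
// Sketch only: MinMaxSketch and normalize are hypothetical names, not RapidMiner API.
final class MinMaxSketch {

    /**
     * Rescales value from the observed range [minA, maxA] to the target range [min, max].
     * Uses the same fallback as the patch when either range has zero width.
     */
    static double normalize(double value, double minA, double maxA, double min, double max) {
        if (maxA == minA || min == max) {
            // A constant attribute makes (maxA - minA) zero, so the unguarded
            // formula would evaluate 0.0 / 0.0, which is NaN.
            // Instead, clamp the constant value minA into [min, max].
            return Math.min(Math.max(minA, min), max);
        }
        return (value - minA) / (maxA - minA) * (max - min) + min;
    }

    public static void main(String[] args) {
        // Unguarded formula on a constant attribute (all values 5.0).
        double unguarded = (5.0 - 5.0) / (5.0 - 5.0) * (1.0 - 0.0) + 0.0;
        System.out.println("unguarded is NaN: " + Double.isNaN(unguarded));
        // Guarded version on the same input, then on a regular attribute.
        System.out.println("guarded constant: " + normalize(5.0, 5.0, 5.0, 0.0, 1.0));
        System.out.println("regular value:    " + normalize(7.5, 5.0, 10.0, 0.0, 1.0));
    }
}
```

Note that with this fallback a constant attribute does not always map to the lower bound: a constant 5.0 normalized to [0, 1] is clamped to 1.0, while a constant -3.0 is clamped to 0.0.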

WeightedMultiClassPerformance:
The average precision is NaN if there is a class that is never predicted by a model. The reason is that the precision for this class is 0/0, i.e. NaN, and adding NaN to the sum makes the whole average NaN. Here's another possible fix:
### Eclipse Workspace Patch 1.0
#P yale
Index: src/com/rapidminer/operator/performance/WeightedMultiClassPerformance.java
===================================================================
RCS file: /cvsroot/yale/yale/src/com/rapidminer/operator/performance/WeightedMultiClassPerformance.java,v
retrieving revision 1.6
diff -u -r1.6 WeightedMultiClassPerformance.java
--- src/com/rapidminer/operator/performance/WeightedMultiClassPerformance.java 9 May 2008 19:22:43 -0000 1.6
+++ src/com/rapidminer/operator/performance/WeightedMultiClassPerformance.java 12 Mar 2009 11:02:28 -0000

                 }
                 result = 0.0d;
                 for (int r = 0; r < rowSums.length; r++) {
-                    result += classWeights[r] * (counter[r][r] / rowSums[r]);
+                    double p = counter[r][r] / rowSums[r];
+                    result += classWeights[r] * (Double.isNaN(p) ? 0 : p);
                 }
                 result /= weightSum;
                 return result;
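
The same guard can be tried in isolation with the sketch below. It is not the RapidMiner class: the names WeightedPrecisionSketch and weightedPrecision are hypothetical, and it assumes that counter is a square matrix whose row r holds the counts of examples predicted as class r, and that weightSum is simply the sum of all class weights.

```java
// Sketch only: names and the counter[predicted][actual] layout are assumptions.
final class WeightedPrecisionSketch {

    /** Weighted average of the per-class precisions, counting an undefined precision as 0. */
    static double weightedPrecision(double[][] counter, double[] classWeights) {
        double result = 0.0d;
        double weightSum = 0.0d;
        for (int r = 0; r < counter.length; r++) {
            // Number of examples predicted as class r.
            double rowSum = 0.0d;
            for (double cell : counter[r]) {
                rowSum += cell;
            }
            // If class r is never predicted, this is 0.0 / 0.0, which is NaN.
            double p = counter[r][r] / rowSum;
            result += classWeights[r] * (Double.isNaN(p) ? 0 : p);
            weightSum += classWeights[r];
        }
        return result / weightSum;
    }

    public static void main(String[] args) {
        // Class 1 is never predicted: its row contains only zeros.
        double[][] counter = { { 3, 1 }, { 0, 0 } };
        double[] classWeights = { 1, 1 };
        System.out.println(weightedPrecision(counter, classWeights));
    }
}
```

Treating the undefined precision as 0 is a design choice: a class that is never predicted lowers the average instead of invalidating it.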
Answers

  • IngoRM Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, Community Manager, RMResearcher, Member, University Professor Posts: 1,751 RM Founder
    Hi,

    thanks for sending in those fixes. Both seemed very reasonable to me, and we have just incorporated them into the latest CVS developer branch. They will of course also be part of the upcoming release.

    Thanks again and cheers,
    Ingo