ROC true positive rate remains at 0 for some time before going up. Unusual.
Hi everyone, I'm pretty new to data mining and RapidMiner, so take it easy on me. I'm dealing with a binary classification problem where I'm trying to identify people at high risk for a certain condition (1 = yes, 2 = no). I'm using data sets of various sizes, averaging around 160,000 observations. Each data set contains 22 attributes (nominal/polynominal/numerical) plus the binominal class label described above. I'm comparing the classification algorithms listed in the table below. All experiments used 5-fold cross-validation with a binominal classification performance operator to get the results.
THE PROBLEM
The J48 decision tree from the WEKA extension gives promising results, as seen in the results table below, but its AUC does not seem correct. In the plot of the ROC curve, starting at the bottom left corner of the chart, the true positive rate stays at 0 for a while as the false positive rate increases along the x-axis. At about 0.5 on the x-axis the true positive rate finally increases and eventually goes above the y = x line. This is clearly why the AUC suffers, but I do not know why it is happening, and it does not occur with any other algorithm. (All data has been preprocessed to remove missing values, and under-sampling has been applied along with some additional steps.)
If anyone knows why this could be occurring, your help would be greatly appreciated. Thank you.
Re: ROC true positive rate remains at 0 for some time before going up. Unusual.
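Normally this happens when a model assigns its highest confidences for the positive class to some examples that actually belong to the negative class. The ROC curve is traced by lowering the decision threshold from the highest confidence downwards, so those examples are counted first as false positives: the curve moves right along the x-axis while the true positive rate stays at 0, and it only starts to rise once the threshold reaches the first real positive.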
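To make that concrete, here is a minimal sketch in plain Python. It is not how RapidMiner or WEKA compute the curve; the function names roc_points and auc and the toy labels and confidences are invented for illustration. It only shows how a few negatives ranked at the top keep the true positive rate at 0 while the false positive rate grows. Labels follow the coding from the question (1 = yes, 2 = no).

```python
def roc_points(labels, scores, positive=1):
    """Return ROC points (FPR, TPR) found by sweeping the threshold from high to low."""
    n_pos = sum(1 for y in labels if y == positive)
    n_neg = len(labels) - n_pos
    # Rank examples by confidence for the positive class, highest first.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Examples with equal confidence cross the threshold together.
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1] == positive:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / n_neg, tp / n_pos))
    return points


def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


# Toy data, invented for illustration: 1 = yes, 2 = no.
# The two most confident "yes" predictions are actually "no".
labels = [2, 2, 1, 1, 1, 1, 2, 2]
scores = [0.99, 0.95, 0.90, 0.80, 0.70, 0.60, 0.50, 0.40]

points = roc_points(labels, scores)
for fpr, tpr in points:
    print(f"FPR={fpr:.2f}  TPR={tpr:.2f}")
print("AUC =", auc(points))
```

In this toy case the curve runs flat along the x-axis until FPR = 0.5, then climbs above the diagonal, and the AUC comes out at 0.5 even though most of the ranking is sensible. Exporting the confidences and labels from the cross-validation and running them through a check like this is one way to see whether the flat start comes from the model's ranking or from the plotting.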
I say "normally" because this is what would happen in a correct implementation of ROC curves and ROC analysis. In my experience, however, ROC analysis is unreliable in RapidMiner, including in the latest non-free professional version 6. So I would not use anything related to ROC curves from RapidMiner in my analyses, even if I were paying $2999+ per year for this software.