
# "[SOLVED] Reconstruction of the model in excel"

Member Posts: 4 Contributor I
edited June 2019 in Help
Imagine you have a model learnt using W-BayesNet or W-BFTree (both from the WEKA package).
Now you want to reproduce all the maths behind the model on a sheet of paper or in Excel, so that you can still use the model you learnt even if you don't have RapidMiner at hand.

I managed to do it for k-means clustering, for W-LADTree and for binominal BF-Tree algorithms. Let's take a look at a simple BF-Tree model output:

=========================================
W-BFTree
Best-First Decision Tree

indicator1 < -0.01016
|  indicator2 < 0.01842: Class2(7.0/2.0)
|  indicator2 >= 0.01842: Class1(103.0/29.0)
indicator1 >= -0.01016
|  indicator3 < 0.00926
|  |  indicator4 < -7.7E-4: Class2(24.0/1.0)
|  |  indicator4 >= -7.7E-4: Class2(20.0/12.0)
|  indicator3 >= 0.00926: Class1(14.0/5.0)

Size of the Tree: 9

Number of Leaf Nodes: 5
=========================================

I can describe this model in Excel with the formula:
class1 probability = IF(indicator1 < -0.01016;IF(indicator2 < 0.01842;2/9;103/132);IF(indicator3 < 0.00926;IF(indicator4 < -7.7E-4;1/25;12/32);14/19))
class2 probability = 1 - class1 probability
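
For anyone who wants the same logic outside Excel, here is a minimal Python sketch of the nested IF above. The function names are my own, and the leaf fractions assume the same reading of the leaf counts as the formula does (first number correct, second number incorrect), which may not match how WEKA defines them.

```python
def class1_probability(indicator1, indicator2, indicator3, indicator4):
    """Mirror the nested Excel IF formula for the binominal BF-Tree above.

    Assumption: Class(a/b) is read as a correct and b incorrect examples,
    so a leaf holds a + b examples in total.
    """
    if indicator1 < -0.01016:
        if indicator2 < 0.01842:
            return 2 / 9          # leaf Class2(7.0/2.0)
        return 103 / 132          # leaf Class1(103.0/29.0)
    if indicator3 < 0.00926:
        if indicator4 < -7.7e-4:
            return 1 / 25         # leaf Class2(24.0/1.0)
        return 12 / 32            # leaf Class2(20.0/12.0)
    return 14 / 19                # leaf Class1(14.0/5.0)


def class2_probability(indicator1, indicator2, indicator3, indicator4):
    """With only two classes, Class2 takes the remaining probability mass."""
    return 1.0 - class1_probability(indicator1, indicator2, indicator3, indicator4)
```

Calling class1_probability(-0.02, 0.0, 0.0, 0.0) follows the first branch and returns 2/9, the same value the Excel formula gives for that leaf.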

But if I have a polynominal label (class1 to class8), I can't reconstruct the RapidMiner calculations because the information in the output is insufficient. Look at this output:

=========================================
W-BFTree
Best-First Decision Tree

indicator1 < -0.01016
|  indicator2 < 0.01842: Class2(7.0/2.0)
|  indicator2 >= 0.01842: Class1(103.0/29.0)
indicator1 >= -0.01016
|  indicator3 < 0.00926
|  |  indicator4 < -7.7E-4: Class3(24.0/1.0)
|  |  indicator4 >= -7.7E-4: Class4(20.0/12.0)
|  indicator3 >= 0.00926: Class5(14.0/5.0)

Size of the Tree: 9

Number of Leaf Nodes: 5
=========================================

Let's say we have an example where indicator1 < -0.01016 and indicator2 < 0.01842. What you can say is that it belongs to Class2 with probability 7/9. But how is the remaining 2/9 distributed among the other classes? You can't tell from this output, even though RapidMiner reports a confidence level for every single class. I do some postprocessing, and it is really important for me to be able to reproduce these hidden calculations. Does anyone know how to do this?
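
To make the gap concrete, here is a rough Python sketch of everything a single leaf line actually gives you. The helper name leaf_summary and the regular expression are my own, and the correct/incorrect reading of the two counts is an assumption carried over from the binominal formula.

```python
import re

# Matches a leaf label such as "Class2(7.0/2.0)".
LEAF_PATTERN = re.compile(r"(\w+)\((\d+(?:\.\d+)?)/(\d+(?:\.\d+)?)\)")


def leaf_summary(leaf_line):
    """Return (predicted class, its confidence, unassigned probability mass).

    Assumption: the two counts are read as correct / incorrect examples.
    """
    match = LEAF_PATTERN.search(leaf_line)
    if match is None:
        raise ValueError("no leaf label found in: " + leaf_line)
    label = match.group(1)
    correct = float(match.group(2))
    incorrect = float(match.group(3))
    total = correct + incorrect
    # Only the predicted class gets a confidence; the printed tree does not
    # say how the remaining mass is split across the other classes.
    return label, correct / total, incorrect / total
```

For the line "|  indicator2 < 0.01842: Class2(7.0/2.0)" this returns Class2 with confidence 7/9 and 2/9 left over. With eight classes, the printed tree gives no way to split that 2/9 among the other seven.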

The same goes for some other learning methods, for instance W-BayesNet. I was unable to determine how the output confidence levels are calculated from the model output.