10-05-2016 05:44 AM
I'm currently building my own credit risk model in RM and I have an issue after one of the steps. Without going too much into detail about the model itself, here are the steps leading to my issue:
1) Bin numeric attributes
2) Obtain number of defaults and non-defaults in each bin
3) Make certain calculations
My current outcome is a table looking like this:
|range1 [-∞ - 0.149]||19,0||29,0||,2||,1||,7||-,7||,1|
|range2 [0.149 - 0.304]||19,0||30,0||,2||,1||,6||-,6||,1|
|range3 [0.304 - 0.453]||13,0||36,0||,1||,1||,4||-,1||,0|
|range4 [0.453 - 0.680]||14,0||35,0||,1||,1||,4||-,2||,0|
But this is only for 1 attribute, while I need to do this for at least 10-15 attributes. What I specifically need is the above output, but with an extra column on the left where the attribute is named next to the bin, and with all the attributes below each other. Thus, for the above example, it would result in:
|Attribute 1||range1 [-∞ - 0.149]||19,0||29,0||,2||,1||,7||-,7||,1|
|Attribute 1||range2 [0.149 - 0.304]||19,0||30,0||,2||,1||,6||-,6||,1|
|Attribute 1||range3 [0.304 - 0.453]||13,0||36,0||,1||,1||,4||-,1||,0|
|Attribute 1||range4 [0.453 - 0.680]||14,0||35,0||,1||,1||,4||-,2||,0|
|Attribute 2||range1 [-∞ - 0.011]||9,0||39,0||,1||,1||,2||,4||,0|
|Attribute 2||range2 [0.011 - 0.024]||6,0||43,0||,1||,1||,1||,9||,1|
|Attribute 2||range3 [0.024 - 0.037]||5,0||44,0||,1||,2||,1||1,1||,1|
|Attribute 2||range4 [0.037 - ∞]||8,0||41,0||,1||,1||,2||,5||,0|
And so on for all the attributes. Right now I have to run and hard-code the steps for each attribute separately, which is not very efficient.
I already tried the Loop Attributes operator, but I can't get it working.
I used the standard credit risk model data set available in RapidMiner. If I need to add more detail regarding the process itself, just ask!
10-05-2016 11:16 AM
This was a bit trickier than I anticipated, but the attached process shows you how to do this using loops. The first loop subprocess bins the variables and then creates an attribute corresponding to the relevant attribute name. The second loop goes through these attributes and aggregates the data with the performance information you want, once for each attribute, and then appends them all together into one large table.
I built this with randomly generated data, so you will need to modify the process to refer to your own attributes and to calculate the performance variables you are interested in through the Aggregate operator, but the general structure should show you how this works. You may also need to rename your attributes to take advantage of the macro capabilities here; that is easily done in the second loop, after the attribute name has been created in the first loop. I hope this is helpful.
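For anyone who wants to see the same loop-then-append structure outside RapidMiner, here is a minimal Python sketch using only the standard library. It is not the attached process: the column names (`att1`, `att2`, `default`), the helper functions, and the equal-frequency binning rule are all assumptions made for illustration, and the data is randomly generated. It only produces the attribute name, the bin, and the default / non-default counts; any further calculated columns would be added per row in the same place.

```python
import random
from collections import defaultdict

def bin_edges(values, n_bins):
    """Cut points for equal-frequency bins (assumed binning rule)."""
    ordered = sorted(values)
    # One cut point between consecutive bins, taken at evenly spaced ranks.
    return [ordered[len(ordered) * i // n_bins] for i in range(1, n_bins)]

def bin_index(value, edges):
    """Bin a value falls into: the number of cut points it reaches or exceeds."""
    return sum(value >= e for e in edges)

def summarize(rows, attributes, label, n_bins=4):
    """Return one row per (attribute, bin) with default / non-default counts."""
    table = []
    for attr in attributes:  # one pass per attribute, results appended below
        edges = bin_edges([r[attr] for r in rows], n_bins)
        counts = defaultdict(lambda: [0, 0])  # bin -> [defaults, non_defaults]
        for r in rows:
            b = bin_index(r[attr], edges)
            counts[b][0 if r[label] == 1 else 1] += 1
        for b in sorted(counts):
            defaults, non_defaults = counts[b]
            table.append({
                "attribute": attr,          # the extra left-hand column
                "bin": f"range{b + 1}",
                "defaults": defaults,
                "non_defaults": non_defaults,
            })
    return table

# Hypothetical random data standing in for the credit risk data set.
random.seed(0)
rows = [
    {"att1": random.random(), "att2": random.random(),
     "default": int(random.random() < 0.3)}
    for _ in range(200)
]

result = summarize(rows, ["att1", "att2"], "default")
for line in result:
    print(line)
```

The point of the design is that nothing is hard-coded per attribute: the attribute name travels with each row, so adding a 15th attribute means adding one entry to the list passed to `summarize`.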