Are There Known Issues With Auto Model's Handling of GBTs?

Noel_D · March 20

Hi All-

I ran Auto Model on a data set (I selected Decision Tree, Random Forest, and Gradient Boosted Trees). Afterward, I went to look at the results (see below):

Image: https://us.v-cdn.net/6030995/uploads/editor/0x/c54atow6hsoj.jpg

For some reason, the only subheading under Gradient Boosted Trees that displays anything in the main/center pane is "Performance". All the rest of them only say "No results yet..." (see below):

Image: https://us.v-cdn.net/6030995/uploads/editor/hz/e6a4g3s8exhc.jpg

I'm particularly interested in looking at the Graphical Model "viewer" (below is the Random Forest's version):

Image: https://us.v-cdn.net/6030995/uploads/editor/gk/18li5fjo9rla.jpg

Is there any reason that Auto Model's only output for Gradient Boosted Trees appears to be a confusion matrix?

Thanks,
Noel

Noel_D · March 20

Hi again-

With respect to my question regarding Auto Model's handling of Gradient Boosted Trees...

In case it is helpful in answering/troubleshooting the question, I've attached the Results folder.

Best,
Noel

rjones13 · March 20

Hi @Noel_D,

Yes, this is a known issue, and I think it may be fixed in a recent release. Out of interest, what version are you running?

The GBT results are still there, but you will need to save them and then access them individually from the Auto Model results folder.

Let me know if you need any more information.

Best,

Roland

Noel_D · March 20

Hi @rjones13-

Thanks for your response -- much appreciated.

I'm running version 10.3.001 of RM Studio.

Note: When I open the saved results for the Auto Model run (via "Select Results Folder" on the initial Auto Model screen), I see the same "No results yet..." message for all the Gradient Boosted Trees artifacts except the confusion matrix.

Did I misunderstand your post?

Best,
Noel

rjones13 · March 20

Hi @Noel_D,

Okay, I think a new version that addresses this issue isn't too far away.

Unfortunately, right now you won't be able to see the results via Auto Model at all. In the location where you saved the results, you should see the following structure in the repository:

Image: https://us.v-cdn.net/6030995/uploads/editor/mg/epj6d9j5f48j.png

If you expand out Gradient Boosted Trees, you will find all the content which appears in Auto Model:

Image: https://us.v-cdn.net/6030995/uploads/editor/l8/nd2zrp2n3cu7.png

For example, if I want to see the Model Simulator, I can double-click it here and use the Simulator:

Image: https://us.v-cdn.net/6030995/uploads/editor/1z/mfbiod6t23wu.png

Hope this makes sense.

Best,
Roland

Noel_D · March 20

Hi @rjones13-

Super helpful and makes sense, thanks.

Can you tell me which of the items in that directory structure will reveal the graphical model viewer?

Also, I'd like to "run" the viewer on a model I've already trained, but I don't know what operator/other process accomplishes that... Can you shed some light here, please?

rjones13 · March 21

Hi @Noel_D,

The item you'll be looking for is "Production Model".

Unfortunately, I'm not sure the model is viewable after initial training. I've tested this a couple of times and I'm unable to view the trees further. May I ask what the use case is for viewing the individual trees, just in case there's an alternative method?

Best,
Roland

Noel_D · March 21

Hi @rjones13-

I really appreciate your help. Please bear with me as my RapidMiner/machine learning expertise and experience are limited.

I'm working with time series data and have opted to frame the forecasting task as a classification exercise (as opposed to predicting numerical values). Toward that end, I've included a trinary label with the other input data and calculated features. The values of the label occur with roughly even frequency and correspond to "future" directional changes in the time series of interest.

I've used the Sliding Window Validation operator to train a GBT and it seems as though I'm getting roughly 70% success in predicting the label's values (per the resulting confusion matrix). Perhaps cause for cautious optimism...
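To put the roughly 70% figure in context, here is a minimal sketch of how overall accuracy is read off a three-class confusion matrix and compared with a naive baseline. The counts, the class order (down, flat, up), and the row/column orientation are all hypothetical and chosen only for illustration; they are not taken from the actual run, and RapidMiner's own display may orient the matrix differently.

```python
def accuracy(cm):
    """Overall accuracy: correct predictions (the diagonal) over all predictions."""
    total = sum(sum(row) for row in cm)
    correct = sum(cm[i][i] for i in range(len(cm)))
    return correct / total


def majority_baseline(cm):
    """Accuracy obtained by always predicting the most frequent true class.

    Assumes rows hold the true classes, so each row sum is a class count.
    """
    total = sum(sum(row) for row in cm)
    return max(sum(row) for row in cm) / total


# Hypothetical counts: rows = true class, columns = predicted class,
# in the assumed order (down, flat, up).
cm = [
    [70, 15, 15],
    [15, 70, 15],
    [15, 15, 70],
]

print(f"accuracy: {accuracy(cm):.3f}")
print(f"majority-class baseline: {majority_baseline(cm):.3f}")
```

With three classes of roughly even frequency, the baseline sits near one third, so the gap between the two numbers says more than the accuracy alone.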

As much as possible, I'd like to understand and be able to describe how the model works, but the iterative nature of the training process and the potential complexity/scale of GBTs seem to make this challenging. (For example, due to the span of the data and the parameters I'm using, 200k+ example rows are examined during training and the model is composed of 50 trees.)

Double-clicking/opening the model itself in the Repository browser lets you view the trees, but I'm not sure how to summarize things from that perspective. The text description embedded in the model contains a confusion matrix and some info about the importance of the top and bottom 10 attributes, but I'm not sure what that data corresponds to (e.g., the confusion matrix only contains one training window's worth of predictions, and the attribute metrics are a little different from the weights output by the validation operator during training).
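One way to see why a single window's confusion matrix can differ from the validation result is that an overall matrix is just the element-wise sum of the per-window matrices. The sketch below assumes the per-window matrices have been collected somehow (for instance, exported by hand); the counts and the helper name `sum_confusion_matrices` are hypothetical, and this is not a description of what the validation operator does internally.

```python
def sum_confusion_matrices(matrices):
    """Element-wise sum of equally sized square confusion matrices."""
    n = len(matrices[0])
    total = [[0] * n for _ in range(n)]
    for m in matrices:
        for i in range(n):
            for j in range(n):
                total[i][j] += m[i][j]
    return total


# Hypothetical per-window counts for a three-class label.
window_matrices = [
    [[5, 1, 0], [1, 4, 1], [0, 2, 6]],
    [[6, 0, 1], [2, 5, 0], [1, 1, 4]],
]

overall = sum_confusion_matrices(window_matrices)
for row in overall:
    print(row)
```

A matrix embedded in one stored model would correspond to just one entry of `window_matrices`, which may be why it does not match the aggregated figure.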

It would be super helpful to know how to interpret the contents of the text description as well as learn what other insights can be gleaned from the model after it's created (and how to do so).

For example, the Weights, Predictions, and Simulator outputs in Auto Model seem useful, but, again, I'm not sure what the data represents or how to use or summarize the simulator functionality. I know one can export the AM process and dig in, but implementing that functionality in the process I built isn't a trivial task.

If you've made it this far, RJ, thank you very much -- you deserve a badge for sticking it out!

Best,
Noel
