How can I evaluate my Auto Model for predicting future failures of vehicles/components?

Flachmann (Member, Posts: 1, Newbie)
Hello RapidMiner Community,

I am very new to data mining and to RapidMiner in general. What I have learned is that a model can be evaluated by splitting the data set into a training set and an evaluation set, e.g. 90/10. How do I implement such an evaluation in Auto Model? I have already searched the community for entries on this topic but found nothing suitable.
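To make the 90/10 idea concrete, here is a minimal Python sketch of a hold-out split, done outside RapidMiner. It assumes the data is simply a list of rows; the function name `holdout_split` and the fixed seed are illustrative choices, not anything Auto Model exposes.

```python
import random

def holdout_split(rows, train_fraction=0.9, seed=42):
    """Shuffle rows and split them into a training part and a held-out evaluation part."""
    if not 0.0 < train_fraction < 1.0:
        raise ValueError("train_fraction must be between 0 and 1")
    shuffled = list(rows)                  # copy so the caller's list is not reordered
    random.Random(seed).shuffle(shuffled)  # fixed seed -> reproducible split
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]

# Usage with 35,000 placeholder vehicle IDs (not real data)
vehicle_ids = list(range(35_000))
train, test = holdout_split(vehicle_ids, train_fraction=0.9)
print(len(train), len(test))  # 31500 3500
```

The model is fitted on `train` only, and its performance is measured on `test`, which it has never seen.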
About my dataset:
I have a data set of 35k vehicles, of which 1.5k have already failed due to a defect in a component. In my analysis I look at only one component at a time. My data consists of the vehicle IDs, mileage, production date, first registration date, repair date and various vehicle components such as engine, transmission, brake, weight variant, etc. I also analyze the vehicle components because I have a mileage value for only 14k of these 35k vehicles. In total I have 22 columns.
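With only about 4% failures (1.5k of 35k), a purely random split can leave the evaluation set with noticeably more or fewer failures than the data as a whole. A common remedy is a stratified split, which splits each class separately. The sketch below is a hypothetical Python illustration with placeholder labels (1 = failed, 0 = not failed), not the actual vehicle data.

```python
import random
from collections import defaultdict

def stratified_split(labels, train_fraction=0.9, seed=42):
    """Return (train_idx, test_idx) so each class keeps roughly the same share in both parts."""
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)           # group row indices by class
    rng = random.Random(seed)
    train_idx, test_idx = [], []
    for idxs in by_class.values():
        rng.shuffle(idxs)                     # shuffle within each class
        cut = round(len(idxs) * train_fraction)
        train_idx.extend(idxs[:cut])
        test_idx.extend(idxs[cut:])
    return sorted(train_idx), sorted(test_idx)

# Placeholder labels mirroring the proportions above: 1,500 failures among 35,000 vehicles
labels = [1] * 1_500 + [0] * 33_500
train_idx, test_idx = stratified_split(labels, train_fraction=0.9)
test_failures = sum(labels[i] for i in test_idx)
print(len(test_idx), test_failures)  # 3500 150
```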
My goal:
I would like to create a time series analysis with which I can predict which vehicles will fail within, for example, the next 3 or 6 months or before reaching 200k km.
My next step would be to create a time series analysis with ARIMA. But first I want to understand and evaluate the results of Auto Model.
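One common way to turn a goal like "which vehicles will fail in the next 6 months" into something a classifier (such as the models Auto Model trains) can learn is to pick a cutoff date and label each vehicle by whether it fails in the window after that cutoff. This is a hypothetical sketch: the `failure_date` value, the cutoff and the 182-day approximation of 6 months are all assumptions.

```python
from datetime import date, timedelta
from typing import Optional

def fails_within(failure_date: Optional[date], cutoff: date, horizon_days: int) -> int:
    """1 if the (hypothetical) failure falls in (cutoff, cutoff + horizon], else 0."""
    if failure_date is None:  # vehicle has not failed (yet)
        return 0
    # Failures on or before the cutoff also return 0 here; such vehicles are usually dropped.
    return int(cutoff < failure_date <= cutoff + timedelta(days=horizon_days))

cutoff = date(2019, 12, 31)
print(fails_within(date(2020, 3, 15), cutoff, 182))  # 1
print(fails_within(date(2020, 9, 1), cutoff, 182))   # 0
print(fails_within(None, cutoff, 182))               # 0
```

The same idea carries over to a mileage threshold such as 200k km. Any input attribute should only use information known at the cutoff date, otherwise the evaluation will look better than the model really is.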


    varunm1 (Moderator, Member, Posts: 1,207, Unicorn)
    Hello @Flachmann

    Auto Model cannot recognize time-series data yet. If you feed time-series data to Auto Model, it will treat it as regular (non-temporal) data. So we recommend that you build a process for time-series data manually in RapidMiner; if you need help with this, there are tutorials by @tftemme. Once you install the Time Series extension, you can also find time-series example processes in RapidMiner.

    Auto Model validation: to answer your general question, Auto Model does split the data into training and test sets. It uses a 60:40 split: 60 percent of the data is used to train the model and the remaining 40 percent to test it. The final performance reported is computed on that 40 percent test set.
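To illustrate what "performance on the 40 percent" means, here is a small Python sketch that does a 60:40 split and scores a stand-in predictor on the test part only. The always-"no failure" baseline and the placeholder labels are assumptions for illustration; this is not how Auto Model computes its figures internally.

```python
import random

def split_60_40(rows, seed=0):
    """Shuffle and split rows 60:40 into train and test parts."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)
    cut = int(len(rows) * 0.6)
    return rows[:cut], rows[cut:]

def accuracy_and_recall(y_true, y_pred, positive=1):
    """Accuracy over all rows and recall on the positive (failure) class."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    positives = sum(t == positive for t in y_true)
    hits = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    recall = hits / positives if positives else float("nan")
    return correct / len(y_true), recall

labels = [1] * 1_500 + [0] * 33_500  # placeholder labels, not real data
train, test = split_60_40(labels)
# Stand-in "model": always predicts "no failure" (the majority class)
predictions = [0] * len(test)
acc, rec = accuracy_and_recall(test, predictions)
print(f"accuracy={acc:.3f} recall={rec:.3f}")
```

With failures this rare, such a baseline already reaches roughly 96% accuracy while catching none of the failures, so recall (or precision and recall per class) on the failure class is usually the more telling number.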


    tftemme (Administrator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, RMResearcher, Member, Posts: 164, RM Research)
    Hi @Flachmann

    As @varunm1 correctly pointed out, Auto Model is not (yet) able to handle time series data. To add to the information on the Time Series extension: you no longer need to install it, because it has been bundled with RapidMiner since version 9.0.0. The linked blog posts are still valid but describe an older version of the extension. If you are looking for more information about time series analysis, you can also have a look at the time-series videos on the RapidMiner Academy and at the tutorial processes of the time series operators in the product.

    To still be able to use Auto Model on time series data, you could apply the Windowing operator first, store the resulting ExampleSet and use it as the input for Auto Model.
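As a rough illustration of what windowing does, the Python sketch below turns a single series into rows where the previous `window_size` values become attributes and a later value becomes the label. This is a simplified stand-in, not RapidMiner's implementation; the function name, its parameters and the monthly failure counts are made up for the example.

```python
from typing import List, Sequence, Tuple

def windowing(series: Sequence[float], window_size: int, horizon: int = 1,
              step: int = 1) -> List[Tuple[List[float], float]]:
    """Turn one series into (features, label) rows: window_size past values
    as features and the value `horizon` steps after the window as the label."""
    if window_size < 1 or horizon < 1 or step < 1:
        raise ValueError("window_size, horizon and step must be >= 1")
    rows = []
    last_start = len(series) - window_size - horizon
    for start in range(0, last_start + 1, step):
        features = list(series[start:start + window_size])
        label = series[start + window_size + horizon - 1]
        rows.append((features, label))
    return rows

# Hypothetical monthly failure counts for one component (made-up numbers)
monthly_failures = [3, 5, 4, 6, 8, 7, 9]
for features, label in windowing(monthly_failures, window_size=3):
    print(features, "->", label)
# [3, 5, 4] -> 6
# [5, 4, 6] -> 8
# [4, 6, 8] -> 7
# [6, 8, 7] -> 9
```

Each row is an ordinary example with a fixed set of attributes, which is why a windowed ExampleSet can be passed to Auto Model like any other table.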

    Best regards,