Using decision tree operator to predict patterns of multiple data sets

lizi_luo Member Posts: 2 Contributor I
edited December 2018 in Help

Dear Sirs/Madams,


I have thousands of datasets, each containing the occupancy pattern of an individual household at 5-minute intervals over two consecutive years. I would like to use the decision tree operator to predict occupancy patterns from these datasets. However, it seems that the decision tree operator can only be connected to one dataset. May I know how to connect the operator to multiple datasets, please?


Thank you very much.

Best regards,




  • JEdward RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 578 Unicorn

    First, I'd like to know a bit more about your use case, because there are two options, and they differ considerably depending on how you want to use your model(s).


    Option 1: Using RapidMiner, join your datasets together into a single large dataset and generate a single decision tree model.

    Option 2: Using RapidMiner, loop over your datasets and generate thousands of decision tree models, each one relevant to a different dataset.


    How do you want to use the results? If you want a general model that fits all potential datasets, then Option 1 is the best choice. If, however, you are trying to build a model that is relevant to each individual property, then Option 2 is best.
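
To make the difference between the two options concrete, here is a minimal sketch in plain Python rather than a RapidMiner process. The household names, the toy rows and the `train_placeholder` function are all made up for illustration; the placeholder simply predicts the majority label per hour and only stands in for whatever the decision tree operator would learn.

```python
from collections import Counter, defaultdict

# Hypothetical toy data: one list of (hour_of_day, occupied) rows per household.
households = {
    "house_a": [(8, 1), (8, 1), (12, 0), (20, 1)],
    "house_b": [(8, 0), (8, 0), (8, 0), (12, 0), (20, 1)],
}


def train_placeholder(rows):
    """Stand-in for a decision tree learner: majority label per hour."""
    votes = defaultdict(Counter)
    for hour, occupied in rows:
        votes[hour][occupied] += 1
    return {hour: counts.most_common(1)[0][0] for hour, counts in votes.items()}


# Option 1: append every household into one table and train one general model.
combined = [row for rows in households.values() for row in rows]
general_model = train_placeholder(combined)

# Option 2: loop over the households and train one model per household.
per_household_models = {
    name: train_placeholder(rows) for name, rows in households.items()
}

print(general_model)
print(per_household_models)
```

In this toy example the general model and the model for `house_a` disagree about 8 o'clock, which is the kind of household-specific behaviour that Option 1 averages away and Option 2 keeps.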



  • lizi_luo Member Posts: 2 Contributor I

    Thank you very much, JEdward. I would like to combine the datasets into one large dataset, but I don't know how to arrange each household's data in the combined set. The current layout of an individual household's data (header and time series) is shown in the figure.



    Could you please suggest how to format the data so that the combined dataset reflects the general situation over the time series, and which operator is suitable for doing this?
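
One possible layout for such a combined dataset is a "long" table: one row per household per timestamp, with a household identifier and a few time-derived attributes. The sketch below is plain Python, not a RapidMiner process, and the column names (`household_id`, `weekday`, `minute_of_day`, `occupied`) and sample values are assumptions, since the figure with the real header is not reproduced here.

```python
from datetime import datetime

# Hypothetical per-household series, already read into (timestamp, occupied) rows.
raw = {
    "house_a": [("2017-01-02 08:00", 1), ("2017-01-02 08:05", 1)],
    "house_b": [("2017-01-02 08:00", 0), ("2017-01-07 23:55", 1)],
}


def to_long_rows(raw_by_household):
    """Stack all households into one table with an ID and time-derived columns."""
    rows = []
    for household_id, series in raw_by_household.items():
        for stamp, occupied in series:
            t = datetime.strptime(stamp, "%Y-%m-%d %H:%M")
            rows.append({
                "household_id": household_id,
                "timestamp": stamp,
                "weekday": t.weekday(),                   # Monday = 0
                "minute_of_day": t.hour * 60 + t.minute,  # 0..1435 in 5-min steps
                "occupied": occupied,
            })
    return rows


long_rows = to_long_rows(raw)
for row in long_rows:
    print(row)
```

With this layout, overlapping timestamps from different households stay distinguishable through the ID column, and a single model can learn from time-of-day and day-of-week attributes instead of raw timestamps.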

  • kypexin Moderator, RapidMiner Certified Analyst, Member Posts: 291 Unicorn

    Hi @lizi_luo


    I am trying to interpret your statement 'combined data set can reflect the general situation in time series'.

    Do I understand correctly that those datasets hold data for multiple households over the same period, which means that timestamps may overlap across datasets? In that case joining or appending might not work as expected, so could you please describe the desired output in more detail? Depending on the nature of the data, it may also be necessary to aggregate values for each timestamp.
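
As an illustration of what aggregating per timestamp could mean here, the following plain-Python sketch computes the share of households that are occupied at each timestamp. The row layout and the sample values are assumptions made for the example, not the poster's actual data.

```python
from collections import defaultdict

# Hypothetical appended rows: (household_id, timestamp, occupied).
rows = [
    ("house_a", "2017-01-02 08:00", 1),
    ("house_b", "2017-01-02 08:00", 0),
    ("house_a", "2017-01-02 08:05", 1),
    ("house_b", "2017-01-02 08:05", 1),
]


def occupancy_share_by_timestamp(rows):
    """Return, per timestamp, the fraction of households that are occupied."""
    totals = defaultdict(lambda: [0, 0])  # timestamp -> [occupied count, row count]
    for _household, stamp, occupied in rows:
        totals[stamp][0] += occupied
        totals[stamp][1] += 1
    return {stamp: occ / n for stamp, (occ, n) in sorted(totals.items())}


share = occupancy_share_by_timestamp(rows)
print(share)
```

Whether such an aggregate is the right target, or whether each household should keep its own rows, depends on the desired output asked about above.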
