
Best Practices for Folder Structures in Repositories

MartinLiebig, RM Data Scientist
edited November 2018 in Knowledge Base

RapidMiner Repositories give you the option to store anything in folders. Here is a ‘best practice’ on how to organize the folders to make them easier to use.


There should be one folder per project. It can sit either at the top level of your Local Repository or in a projects folder at the top level of a Server repository. Our proposed folder structure is:


  • app
    • View 1
    • View 2
  • data
  • debug
  • models
  • processes
    • subprocesses
  • results
  • webservices

Note: Italic folders are not mandatory
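
The layout above can be sketched as a small Python script that creates the same folder skeleton as plain directories on disk. This is only an illustration under stated assumptions: `create_project_skeleton` and `PROJECT_FOLDERS` are hypothetical names, and the script does not use any RapidMiner API; it simply mirrors the proposed tree so a new project starts with a consistent structure.

```python
from pathlib import Path

# Folder layout proposed in this post, relative to the project folder.
# (Illustrative list; the names are taken from the structure above.)
PROJECT_FOLDERS = [
    "app/View 1",
    "app/View 2",
    "data",
    "debug",
    "models",
    "processes/subprocesses",
    "results",
    "webservices",
]


def create_project_skeleton(root, project_name):
    """Create the proposed folder tree under root/project_name and return its path."""
    project_dir = Path(root) / project_name
    for relative in PROJECT_FOLDERS:
        # parents=True also creates "app" and "processes" for the nested entries;
        # exist_ok=True makes the call safe to repeat on an existing project.
        (project_dir / relative).mkdir(parents=True, exist_ok=True)
    return project_dir
```

Calling `create_project_skeleton("projects", "churn")` would create `projects/churn/` with the folders listed above; folders you do not need can simply be removed from the list.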



app

The app folder contains all processes related to an app. For larger apps it makes sense to use a subfolder for each View of the app (View 1 and View 2 above). Only the global processes (like !Initialize) stay at the top level of the app folder.



data

The data folder simply contains all data used in the analysis.



debug

From time to time you need debug data, mostly to test things while designing a process. A common example is a database sample that is used instead of the real, full database.



processes

This is the main folder, holding all processes of your analysis. It often makes sense to create a subprocesses folder containing function-like processes that the main processes call via Execute Process.


results / models

The results folder contains all results of the modelling process. Usually these are performance results and models. If there are multiple models, either because you want to try many different model types or because you want to predict many labels, it makes sense to have a dedicated folder for each model.
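
One way to picture the per-model layout is a helper that derives a dedicated results subfolder for each combination of label and model type. This is a minimal sketch, not a prescribed convention: `model_result_folders`, the nesting order (label first, then model type), and the example model and label names are all assumptions made for illustration.

```python
from itertools import product
from pathlib import PurePosixPath


def model_result_folders(model_types, labels, base="results"):
    """Return one folder path per (label, model type) pair, e.g. results/churn/gbt."""
    return [
        str(PurePosixPath(base) / label / model_type)
        for label, model_type in product(labels, model_types)
    ]


# Hypothetical example: two model types tried for each of two labels.
folders = model_result_folders(["decision_tree", "gbt"], ["churn", "upsell"])
```

With a single label and only a few models, a flat layout with one folder per model directly under results would serve the same purpose.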



webservices

The webservices folder contains all processes used to offer a web service. In rare cases a subprocesses folder might be useful here as well.

- Sr. Director Data Solutions, Altair RapidMiner -
Dortmund, Germany