RapidMiner

‎10-06-2016 08:36 AM

Often you will want to run a workflow repeatedly and save the results each time, without overwriting earlier outputs, so that you can go back and review them. The obvious solution is to change the store path before every run.

But that makes the paths hard to manage and is error-prone: if you run without changing the path, you may end up overwriting previous results.

 

To solve this, we recommend building the process so that it creates new timestamp-based folders or paths automatically.

We also recommend using the macro as a folder name rather than as part of the final entry name, since that automatically groups items under one timestamp-named folder.

e.g. /path/to/%{t}/modelname is better than /path/to/%{t}_modelname

 

1) Using the repository-manipulation operators

RapidMiner provides several operators for working with repository entries, such as Rename, Move, and Copy.

Using an operator like "Copy Repository Entry", where the source path points to a top-level folder such as /ProjectName/outputs and the destination points to something like /Projectresults/%{t}/, will copy the outputs folder and all its subfolders and items to a new path; the %{t} is replaced by the system timestamp.
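The copy-to-timestamped-destination pattern can be sketched outside RapidMiner as well; here is a minimal Python analogy (the folder names and timestamp format are made up for illustration, not RapidMiner's internals):

```python
import shutil
import tempfile
from datetime import datetime
from pathlib import Path

# Hypothetical source folder standing in for /ProjectName/outputs
src = Path(tempfile.mkdtemp()) / "outputs"
(src / "model").mkdir(parents=True)
(src / "model" / "weights.txt").write_text("demo")

# Destination named after the current timestamp, like /Projectresults/%{t}/
stamp = datetime.now().strftime("%Y-%m-%d_%H-%M-%S")
dest = src.parent / "Projectresults" / stamp
shutil.copytree(src, dest)  # copies the folder and all its subfolders/items

print(dest / "model" / "weights.txt")
```

Each run produces a new destination folder because the timestamp differs, so earlier copies are never overwritten.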

 

 

2) Use the in-built macro directly

This option is generally good when there is only one store operation in the entire workflow. RapidMiner provides an in-built macro for the system time: the current system time is available as %{t}. Dropping %{t} into the store path automatically adds a timestamp to the path, ensuring you create a new folder each run. The timestamp resolves down to the second, so any two runs more than a second apart will get distinct folders.
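As an analogy, here is how a seconds-resolution timestamp builds a fresh folder path per run (the strftime format is an assumption for illustration, not RapidMiner's exact %{t} format):

```python
from datetime import datetime

# Seconds-resolution timestamp, playing the role of %{t}
t = datetime.now().strftime("%Y-%m-%d_%H-%M-%S")

# Folder-per-run layout, as recommended above: /path/to/%{t}/modelname
store_path = f"/path/to/{t}/modelname"
print(store_path)
```

Because the timestamp is evaluated at store time, two runs started more than a second apart can never collide on the same path.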

 

 

 

[Screenshot: "my workflow" in RapidMiner Studio 7.2.0 (Local Repository / do not overwrite)]

 

3) Use the in-built macro to set another macro

Since %{t} always gives the current system time down to the second, multiple store operators that execute even a second apart will each create a new folder.

Also, for a long-running process you may not know exactly when each store happened, leaving you to guess the right path. In most cases, however, you do know when you triggered the process. So the solution is to capture the start time of the process into another macro.

This can be done with the "Set Macro" operator.

Use Set Macro as one of the first operators and capture the start time into another macro; e.g. below, t1 holds the start time.

Then use t1 wherever you want the path to contain the process start time. t1 stays constant through the whole run, ensuring your outputs are saved under the same folder for the same run.
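A small Python sketch of the same idea: capture the start time once and reuse it for every store, even when the stores execute seconds apart (folder names here are illustrative):

```python
import time
from datetime import datetime

# Captured once at the top of the process, like Set Macro writing t1
run_ts = datetime.now().strftime("%Y-%m-%d_%H-%M-%S")

paths = []
for name in ["model", "performance"]:
    time.sleep(1.1)  # simulate steps running more than a second apart
    paths.append(f"/results/{run_ts}/{name}")

# Both stores land in the same run folder despite the time gap
print(paths)
```

Had each path used the current time instead of run_ts, the two stores would have landed in two different folders.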

[Screenshot: "my workflow" with the Set Macro operator in RapidMiner Studio 7.2.0]


Comments
Guru

Thanks, nice tutorial. But what if I have 7 result outputs set in my context window, and my path for the new folder does not exist yet, but I want it created as soon as the results come out — how would I do that?

Should I use a Set Macro for the first result that comes out, use that macro to create my folder, and set all further results in the context to be stored under that same macro folder path?

e.g like: 

../%{t1}/Testperformance

../%{t1}/Trainperformance..

...

@Fred12 Folders are created automatically, you don't need to worry about missing folders.

Guru

ok,

but wouldn't it be possible to just set a macro at the beginning, say to 0, and immediately increment it by one when you run the process...

Then this macro with number 1 is set in the context window for the result entries, like "blablabla..%{macro}"

Then save the process after it has finished. On the next run it should start with number 2, then 3, etc., and the context is also updated.
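For what it's worth, the incrementing-counter idea can be sketched in Python with a counter persisted to a file (the file name is hypothetical; in RapidMiner the incremented macro would still need to be saved back with the process context, as noted above):

```python
import tempfile
from pathlib import Path

def next_run_number(counter_file: Path) -> int:
    """Read the last run number, increment it, and persist it back."""
    n = int(counter_file.read_text()) if counter_file.exists() else 0
    n += 1
    counter_file.write_text(str(n))
    return n

# Hypothetical counter file standing in for the saved macro
counter = Path(tempfile.mkdtemp()) / "run_counter.txt"
first = next_run_number(counter)   # 1 on the first run
second = next_run_number(counter)  # 2 on the next
print(f"blablabla..{second}")      # analogous to blablabla..%{macro}
```

Unlike the timestamp approach, this requires the counter state to survive between runs, which is why the process (or a file) has to be saved after each run.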

Guru

What if I want to use the process context to save my results — is it possible to save them into the newly created folder?

Can I use macros in the context window, e.g. for file paths?

Contributor II 781194025

Yeah, I have a hard time saving results too, since I like to use append to add the new results onto the old ones,

but the first run doesn't have any old results to append to, so I always have to generate a set first.

Irritating!