Repository Paths in RapidMiner Studio

Posted by sgenzer, Community Manager
Repository Paths are very powerful tools in RapidMiner but can be tricky to understand for beginners. Let's go over the basics...

What's a Repository?

A repository is simply a folder that holds all of your RapidMiner data sets (we call them "ExampleSets"), processes, and other file objects that you create using RapidMiner Studio. This folder can be stored locally on your computer or on a RapidMiner Server.

Let's assume you're just working on your local computer for now. When you start RapidMiner Studio for the first time, it will automatically create a "Local Repository" for you. If you want to see where it is, just go to your .RapidMiner folder on your computer, go into "repositories", and then into "Local Repository". You should see two folders here: 'data' and 'processes'. These correspond EXACTLY to what you see in the Repository panel in RapidMiner Studio.

                        

[You will notice that there are some extra 'properties' files on your computer that you do not see in RapidMiner. They store metadata, and you should just ignore them.]
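If you would rather check this from a script than click through folders, here is a minimal Python sketch. It assumes the .RapidMiner folder sits in your user home directory, which may not be true on every installation, and the function names are made up for this example. It only lists sub-folders, so the 'properties' files are skipped.

```python
from pathlib import Path


def local_repository_dir(home=None):
    """Return the folder where the default Local Repository is expected.

    Assumption: .RapidMiner lives in the user's home directory.
    Pass `home` to point the sketch at a different base folder.
    """
    base = Path(home) if home is not None else Path.home()
    return base / ".RapidMiner" / "repositories" / "Local Repository"


def list_top_level_folders(repo_dir):
    """Return the sorted sub-folder names of repo_dir, or [] if it is missing."""
    repo_dir = Path(repo_dir)
    if not repo_dir.is_dir():
        return []
    # Only directories are reported; metadata files are ignored.
    return sorted(p.name for p in repo_dir.iterdir() if p.is_dir())


if __name__ == "__main__":
    repo = local_repository_dir()
    print(repo)
    print(list_top_level_folders(repo))
```

On a fresh installation that matches the description above, the second line printed should contain 'data' and 'processes'.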

How does RapidMiner find things in your Local Repository?

Let's use the standard Retrieve operator to see how this works. Let's say you drag the Titanic data set from the Samples folder and put it in a process:



RapidMiner automatically inserted the repository path for this ExampleSet in the "repository entry" parameter. What does this mean? The most important item here is the double forward slash '//' in front. It means that the path starts from your 'root' folder (we call this an absolute path), so the first name after '//' is the repository itself. The Samples folder is a special, built-in one that you cannot see on your computer, but its path works the same way. So what if you change "Samples" to "Local Repository" in the path? You get an error!



Why? Because RapidMiner Studio is looking for an ExampleSet in your Local Repository -> data folder, and it's not there. Now copy-and-paste the Titanic data set from Samples to Local Repository -> data and run the process again. Great! It found the ExampleSet right where you told it to look.
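To make the pieces of an absolute path concrete, here is a small Python sketch that splits one into repository, folders, and entry name. This is only an illustration of how to read the path, not RapidMiner's own code. The function name is hypothetical, and the example path '//Samples/data/Titanic' is assumed from the description above.

```python
def split_absolute_path(entry):
    """Split '//Repository/folder/.../name' into (repository, folders, name)."""
    if not entry.startswith("//"):
        raise ValueError("not an absolute repository path: %r" % entry)
    # Drop the leading '//' and break the rest into its components.
    parts = [p for p in entry[2:].split("/") if p]
    if len(parts) < 2:
        raise ValueError("path needs a repository and an entry name: %r" % entry)
    return parts[0], parts[1:-1], parts[-1]


if __name__ == "__main__":
    print(split_absolute_path("//Samples/data/Titanic"))
    # ('Samples', ['data'], 'Titanic')
    print(split_absolute_path("//Local Repository/data/Titanic"))
    # ('Local Repository', ['data'], 'Titanic')
```

Changing only the first component changes which repository is searched; the folder ('data') and the name ('Titanic') stay the same, which is why the copy has to sit in Local Repository -> data.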



Why won't processes with 'Retrieve' work when I share them with someone else?

Most likely this is due to repository path errors. For example, say you shared that process we just did with a friend. It would not work because it would be looking for 'Titanic' on her computer under Local Repository -> data, and most likely it's not there!


How can I define repository paths in RapidMiner so that I can share them with someone else?

The best way to do this is with relative repository paths, rather than absolute paths (the double slash //). You can do this as follows:

- Save your data set (ExampleSet) in the same folder as your process. Then just put the name of the ExampleSet in 'repository entry' with no other path information. RapidMiner will automatically look inside the same folder as the process if no path is specified.



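- Save your data set (ExampleSet) in the data folder in your Local Repository and change the repository path to '../data/[name]'. When RapidMiner sees '..' at the start of a path, it goes up one level from the folder that holds the current process and follows the rest of the path from there. So '../data/[name]' points into a 'data' folder that sits next to the folder with the current process.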
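Both options can be sketched in a few lines of Python. This is a rough model of the two rules described above under stated assumptions, not RapidMiner's actual resolver: the function name and the example process location are hypothetical, and path forms other than the ones discussed in this post are rejected rather than guessed at.

```python
import posixpath


def resolve_entry(process_location, entry):
    """Resolve a 'repository entry' value against the location of the process.

    process_location: absolute path of the saved process,
    for example '//Local Repository/processes/my_process' (made-up name).
    """
    if entry.startswith("//"):
        return entry  # already absolute, nothing to resolve
    if entry.startswith("/"):
        raise ValueError("single-slash paths are not covered by this sketch")
    # Folder that contains the process, without the leading '//'.
    folder = posixpath.dirname(process_location[2:])
    resolved = posixpath.normpath(posixpath.join(folder, entry))
    if resolved.startswith(".."):
        raise ValueError("path climbs above the repository: %r" % entry)
    return "//" + resolved


if __name__ == "__main__":
    here = "//Local Repository/processes/my_process"
    print(resolve_entry(here, "Titanic"))
    # //Local Repository/processes/Titanic
    print(resolve_entry(here, "../data/Titanic"))
    # //Local Repository/data/Titanic
```

Because the result depends only on where the process is saved, the same process keeps working on a friend's computer as long as she keeps the data in the same place relative to the process.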

