
Repository Paths in RapidMiner Studio

sgenzer, Community Manager (Posts: 2,773)
Repository Paths are very powerful tools in RapidMiner but can be tricky to understand for beginners. Let's go over the basics...

What's a Repository?

A repository is simply a folder that holds all of your RapidMiner data sets (we call them "ExampleSets"), processes, and other file objects that you create using RapidMiner Studio. This folder can be stored locally on your computer or on a RapidMiner Server.

Let's assume you're just working on your local computer for now. When you start RapidMiner Studio for the first time, it will automatically create a "Local Repository" for you. If you want to see where it is, just go to your .RapidMiner folder on your computer, go into "repositories", and then into "Local Repository". You should see two folders here: 'data' and 'processes'. These correspond EXACTLY to what you see in the Repository panel in RapidMiner Studio.

                        

[You will notice that there are some extra 'properties' files on your computer that you do not see in RapidMiner. They store metadata and you should just ignore them.]
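If you would rather check this from a script than from a file browser, here is a minimal Python sketch. It assumes the .RapidMiner folder sits directly in your user home directory, which may not hold for every installation. The function names local_repository_dir and visible_entries are made up for this example and are not part of RapidMiner.

```python
from pathlib import Path
from typing import List, Optional


def local_repository_dir(home: Optional[Path] = None) -> Path:
    """Build the expected on-disk location of the Local Repository.

    Assumption: the .RapidMiner folder lives in the user's home directory.
    """
    base = home if home is not None else Path.home()
    return base / ".RapidMiner" / "repositories" / "Local Repository"


def visible_entries(repo_dir: Path) -> List[str]:
    """List the entries in a repository folder, skipping '.properties' files.

    Returns an empty list if the folder does not exist.
    """
    if not repo_dir.is_dir():
        return []
    return sorted(p.name for p in repo_dir.iterdir() if p.suffix != ".properties")
```

Calling visible_entries(local_repository_dir()) on a fresh installation should list the same 'data' and 'processes' folders that the Repository panel shows, provided the location assumption above is right for your machine.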

How does RapidMiner find things in your Local Repository?

Let's use the standard Retrieve operator to see how this works. Let's say you drag the Titanic data set from the Samples folder and put it in a process:



RapidMiner automatically inserted the repository path for this ExampleSet in the "repository entry" parameter. What does this mean? The most important item here is the double forward slash '//' in front. This means that the path starts from your 'root' folder (we call this an absolute path), and the first name after the '//' is the repository to look in. Samples is a special, built-in repository that you cannot see as a folder on your computer, but its path works the same way. So what if you change "Samples" to "Local Repository" in the path? You get an error!
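To make the parts of an absolute path concrete, here is a small Python sketch that splits one into the repository name and the entries below it. This only illustrates the naming scheme and is not how RapidMiner itself parses paths. The function name split_absolute_path is hypothetical, and the example path '//Samples/data/Titanic' is an assumption about what the Retrieve operator shows for this data set.

```python
from typing import List, Tuple


def split_absolute_path(entry: str) -> Tuple[str, List[str]]:
    """Split '//Repository/folder/name' into (repository, [folder, ..., name])."""
    if not entry.startswith("//"):
        raise ValueError(f"not an absolute repository path: {entry!r}")
    # Drop the leading '//' and ignore empty segments from stray slashes.
    parts = [p for p in entry[2:].split("/") if p]
    if not parts:
        raise ValueError("absolute path does not name a repository")
    return parts[0], parts[1:]


# Assumed example path for the Titanic sample data set.
repository, rest = split_absolute_path("//Samples/data/Titanic")
```

Here repository is "Samples" and rest is ["data", "Titanic"]. Swapping "Samples" for "Local Repository" changes only the first part, which is why the lookup then happens in a different repository.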



Why? Because RapidMiner Studio is looking for an ExampleSet in your Local Repository -> data folder, and it's not there. Now copy and paste the Titanic data set from Samples into Local Repository -> data and run the process again. Great! It found the ExampleSet right where you told it to look.



Why won't processes with 'Retrieve' work when I share them with someone else?

Most likely this is due to repository path errors. For example, say you shared that process we just did with a friend. It would not work because it would be looking for 'Titanic' on her computer under Local Repository -> data, and most likely it's not there!


How can I define repository paths in RapidMiner so that I can share them with someone else?

The best way to do this is with relative repository paths, rather than absolute paths (the double slash //). You can do this as follows:

- Save your data set (ExampleSet) in the same folder as your process. Then just put the name of the ExampleSet in 'repository entry' with no other path information. RapidMiner will automatically look inside the same folder as the process if no path is specified.



- Save your data set (ExampleSet) in the data folder in your Local Repository and change the repository path to '../data/[name]'. When RapidMiner sees '..' at the start of a path, it goes up one level from the folder that contains the current process, so '../data/[name]' points into the 'data' folder that sits next to that folder.
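The two options above can be sketched in Python as a small path resolver. This is an illustration of the rules described in this post, not RapidMiner's actual implementation. The function name resolve_entry and the process location '//Local Repository/processes/my_process' are hypothetical. Paths that start with a single '/' are not covered in this post, so the sketch rejects them.

```python
def resolve_entry(process_location: str, entry: str) -> str:
    """Resolve a 'repository entry' value against the location of the process."""
    if entry.startswith("//"):
        return entry  # absolute path: used exactly as written
    if entry.startswith("/"):
        raise ValueError("single-slash paths are not covered by this sketch")
    if not process_location.startswith("//"):
        raise ValueError("process location must be an absolute path")

    # Start in the folder that contains the process (drop the process name).
    segments = [s for s in process_location[2:].split("/") if s][:-1]
    for part in entry.split("/"):
        if part in ("", "."):
            continue
        if part == "..":
            # Never climb above the repository itself.
            if len(segments) <= 1:
                raise ValueError("path climbs above the repository root")
            segments.pop()
        else:
            segments.append(part)
    return "//" + "/".join(segments)


# Hypothetical process location on your own machine.
mine = "//Local Repository/processes/my_process"
same_folder = resolve_entry(mine, "Titanic")
data_folder = resolve_entry(mine, "../data/Titanic")
```

With these inputs, same_folder is '//Local Repository/processes/Titanic' and data_folder is '//Local Repository/data/Titanic'. Because the result depends only on where the process is stored, the same relative entry keeps working when a friend stores the process and the data in her own repository with the same folder layout.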



Comments

  • sonny_plankton, Member (Posts: 1)
    Hello,

    thank you for your remarks on this topic.

    Is it possible to retrieve Python scripts (.py) with the Retrieve operator as well? Although the file is
    stored in the same GitHub repo as the main process, it returns a "can not retrieve repository data" error.

    It does, however, work well with the Open File operator. But then we are facing the problem of an absolute
    path pointing to where the .py file was once stored.

    Thanks in advance, highly appreciated.
    sonny.


  • sgenzer, Community Manager (Posts: 2,773)
    Hi @sonny_plankton, I'm going to cc my colleague @Marco_Boeck to see if he has any insight here.

    Scott
  • Marco_Boeck, Team Lead Software Engineering, RM Engineering (Posts: 1,885)
    Hi,

    Not right now. But without spoiling too much, we have something in the pipeline that will make the cross-functional team experience that much better and is very much related to your question :)

    Regards,
    Marco