Repository Paths in RapidMiner Studio

sgenzer (Community Manager)
Repository Paths are very powerful tools in RapidMiner but can be tricky to understand for beginners. Let's go over the basics...

What's a Repository?

A repository is simply a folder that holds all of your RapidMiner data sets (we call them "ExampleSets"), processes, and other file objects that you create using RapidMiner Studio. This folder can be stored locally on your computer or on a RapidMiner Server.

Let's assume you're just working on your local computer for now. When you start RapidMiner Studio for the first time, it will automatically create a "Local Repository" for you. If you want to see where it is, just go to your .RapidMiner folder on your computer, go into "repositories", and then into "Local Repository". You should see two folders here: 'data' and 'processes'. These correspond EXACTLY to what you see in the Repository panel in RapidMiner Studio.

                        

[You will notice that there are some extra 'properties' files on your computer that you do not see in RapidMiner. They store metadata, and you should just ignore them.]
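If you would rather check this from a script, here is a minimal Python sketch that builds the folder location described above and lists what is inside it. It assumes the .RapidMiner folder sits directly in your user home folder, which may not hold for every installation. The function names are made up for this illustration.

```python
from pathlib import Path


def local_repository_dir(home: Path) -> Path:
    """Build the on-disk location of the Local Repository.

    Assumption: the .RapidMiner folder lives directly in the user's home folder.
    """
    return home / ".RapidMiner" / "repositories" / "Local Repository"


def list_entries(repo_dir: Path) -> list[str]:
    """Return the names of the items directly inside the repository folder.

    Returns an empty list if the folder does not exist.
    """
    if not repo_dir.is_dir():
        return []
    return sorted(p.name for p in repo_dir.iterdir())


if __name__ == "__main__":
    repo = local_repository_dir(Path.home())
    print(repo)
    # On a fresh install you would hope to see 'data' and 'processes' here,
    # possibly alongside the extra properties files mentioned above.
    print(list_entries(repo))
```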

How does RapidMiner find things in your Local Repository?

Let's use the standard Retrieve operator to see how this works. Let's say you drag the Titanic data set from the Samples folder and put it in a process:



RapidMiner automatically inserted the repository path for this ExampleSet in the "repository entry" parameter. What does this mean? The most important item here is the double forward slash '//' in front. It means that the path starts from your 'root' folder (we call this an absolute path), and the first name after the '//' is the repository. Samples is a special, built-in repository that you cannot see as a folder on your computer, but its paths work the same way. So what if you change "Samples" to "Local Repository" in the path? You get an error!
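To make the anatomy of an absolute path concrete, here is a small Python sketch that splits one into its repository name and the folders that follow. It is only a model of the naming convention described above, not RapidMiner's own code; the function name is hypothetical, and '//Samples/data/Titanic' is assumed to be the path Studio inserts for the Titanic sample.

```python
def split_absolute_path(entry: str) -> tuple[str, list[str]]:
    """Split '//Repository/folder/.../name' into (repository, [folder, ..., name])."""
    if not entry.startswith("//"):
        raise ValueError(f"not an absolute repository path: {entry!r}")
    parts = [p for p in entry[2:].split("/") if p]
    if not parts:
        raise ValueError("missing repository name")
    return parts[0], parts[1:]


repository, rest = split_absolute_path("//Samples/data/Titanic")
print(repository)  # Samples
print(rest)        # ['data', 'Titanic']
```

Swapping "Samples" for "Local Repository" changes only the first element, which is why the lookup then goes to a different repository altogether.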



Why? Because RapidMiner Studio is looking for an ExampleSet in your Local Repository -> data folder, and it's not there. Now copy and paste the Titanic data set from Samples into Local Repository -> data and run the process again. Great! It found the ExampleSet right where you told it to look.



Why won't processes with 'Retrieve' work when I share them with someone else?

Most likely this is due to repository path errors. For example, say you shared that process we just did with a friend. It would not work because it would be looking for 'Titanic' on her computer under Local Repository -> data, and most likely it's not there!
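One way to catch this before sharing is to scan the process file for absolute paths. The Python sketch below does that on a simplified, made-up process fragment. It assumes that an exported process is XML in which each Retrieve operator stores its path in a parameter with the key repository_entry. Treat that layout, the entry names, and the function name as illustrative assumptions and compare them with your own exported file.

```python
import xml.etree.ElementTree as ET

# Illustrative fragment only; a real exported process carries more
# attributes and operators than shown here.
PROCESS_XML = """
<process>
  <operator class="process" name="Process">
    <process>
      <operator class="retrieve" name="Retrieve Titanic">
        <parameter key="repository_entry" value="//Local Repository/data/Titanic"/>
      </operator>
      <operator class="retrieve" name="Retrieve Iris">
        <parameter key="repository_entry" value="../data/Iris"/>
      </operator>
    </process>
  </operator>
</process>
"""


def absolute_entries(process_xml: str) -> list[str]:
    """Return every repository_entry value that starts with '//'."""
    root = ET.fromstring(process_xml)
    found = []
    for param in root.iter("parameter"):
        value = param.get("value", "")
        if param.get("key") == "repository_entry" and value.startswith("//"):
            found.append(value)
    return found


print(absolute_entries(PROCESS_XML))  # ['//Local Repository/data/Titanic']
```

Any path this reports is tied to a repository name and folder layout that may not exist on the other person's computer.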


How can I define repository paths in RapidMiner so that I can share them with someone else?

The best way to do this is with relative repository paths, rather than absolute paths (the double slash //). You can do this as follows:

- Save your data set (ExampleSet) in the same folder as your process. Then just put the name of the ExampleSet in 'repository entry' with no other path information. RapidMiner will automatically look inside the same folder as the process if no path is specified.



- Save your data set (ExampleSet) in the data folder in your Local Repository and change the repository path to '../data/[name]'. The '..' tells RapidMiner to go up one level from the folder that holds the current process, so '../data/[name]' points to the 'data' folder sitting next to that folder.
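
The two options above can be sketched in Python as a tiny path resolver. This is a simplified model of the behaviour described in this post, not RapidMiner's actual implementation. It covers only bare names, '..' paths, and '//' absolute paths, and the function name and the "Friends Repository" name are hypothetical.

```python
def resolve_entry(entry: str, process_folder: str) -> str:
    """Resolve a 'repository entry' value against the folder holding the process.

    process_folder is an absolute repository path such as
    '//Local Repository/processes'.
    """
    if entry.startswith("//"):
        return entry  # already absolute, nothing to resolve
    if entry.startswith("/"):
        raise ValueError("single-slash paths are not covered by this sketch")
    if not process_folder.startswith("//"):
        raise ValueError("process_folder must be an absolute repository path")

    base = [p for p in process_folder[2:].split("/") if p]
    if not base:
        raise ValueError("process_folder is missing a repository name")
    repository, folders = base[0], base[1:]

    for part in entry.split("/"):
        if part in ("", "."):
            continue
        if part == "..":
            if not folders:
                raise ValueError("path goes above the repository root")
            folders.pop()  # go up one folder
        else:
            folders.append(part)
    return "//" + "/".join([repository] + folders)


# Option 1: bare name, looked up next to the process.
print(resolve_entry("Titanic", "//Local Repository/processes"))
# //Local Repository/processes/Titanic

# Option 2: '..' steps up, then into the sibling 'data' folder.
print(resolve_entry("../data/Titanic", "//Local Repository/processes"))
# //Local Repository/data/Titanic

# The same relative path still resolves in a differently named repository.
print(resolve_entry("../data/Titanic", "//Friends Repository/processes"))
# //Friends Repository/data/Titanic
```

Because the repository name comes from wherever the process is stored, the relative path keeps working as long as the folder layout around the process is the same.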


Comments

  • sonny_plankton (Member)
    Hello,

    thank you for your remarks on this topic.

    Is it possible to retrieve Python scripts (.py) with the Retrieve operator as well? Although the file is
    stored in the same GitHub repo as the main process, it returns a "can not retrieve repository data" error.

    It does, however, work well with the Open File operator. But then we face the problem of an absolute
    path pointing to where the .py file was once stored.

    Thanks in advance, highly appreciated.
    sonny.


  • sgenzer (Community Manager)
    hi @sonny_plankton I'm going to cc my colleague @Marco_Boeck to see if he has any insight here.

    Scott
  • Marco_Boeck (RM Engineering)
    Hi,

    Not right now. But without spoiling too much, we have something in the pipeline that will make the cross-functional team experience that much better and is very much related to your question :)

    Regards,
    Marco
  • sonny_plankton (Member)
    Thank you very much, @sgenzer and @Marco_Boeck, for taking the time to answer.

    Looking forward to the enhancements connecting our Python and RapidMiner specialists. If anyone is facing
    the same issue: we have simply stored the .py script as a text file in the operator.

    Philipp
  • sgenzer (Community Manager)
    hi @sonny_plankton yes good progress is being made here. Keep an eye out for our next beta release; you may find what you're looking for.  ;)

    Scott
