[SOLVED] RapidMiner Working Directory on Windows?

srt19170 · January 2013

I am trying to use relative path names for files (e.g., in Read CSV) on Windows, so that I can easily move processes between machines with different file structures. The working directory for my RapidMiner gets set to %USERPROFILE%. Is there an easy way to set the working directory to something more useful, e.g., at startup? Any thoughts?

BTW, I tried to use %{process_path}, but it gets set to the (useless) value "\\Repository-Name\Process-Name". Since process_path is documented as the absolute path name of the current process, this seems to be broken. Comments?

Thanks in advance (as always!) for any help.

srt19170 · January 2013

No one has any good ideas?

MariusHelf · January 2013

Hi,

it would greatly help us if you described your use case in a bit more detail.

In general, however, the concept of RapidMiner is based on repositories, not on files. So the process_path macro is working correctly, since it specifies the absolute path of the process in the repository.

If you want to share processes and data, I strongly recommend installing the RapidAnalytics server. The feature that will be useful for you is the ability to define a so-called Remote RapidAnalytics Repository, which you can access from any RapidMiner instance in the network to share both processes and data. That means you have to import your CSV files to the server only once and can then use that data seamlessly from any RapidMiner instance. It is even possible to execute long-running processes on the server, so that your workstation is not blocked, or to execute recurring tasks automatically on a daily, weekly or user-defined schedule.

Another possibility would be to store the data in a central SQL database: again, you would import your data only once and then access it directly from the database server without using any files.

Last, but not least, it is possible to run RapidMiner processes from the command line and pass macros to that process with the -M parameter. The syntax would be:

rapidminer.exe //repository-name/path-to/process -Mmacro1=value1 -Mmacro2=value2

Best regards,
Marius

srt19170 · January 2013

Marius, thanks for your reply. Let me give you a few more details.

I'm working on Windows with 5.2. I have a repository named "Test Repository" that is stored at (say) C:\My Repositories\Test Repository. I create a new process in that repository, name it "pathtest" and add the "Read CSV" operator. I set the file name for the operator to be "test.csv" and then I run the process. I get the following error:

The file 'java.io.FileNotFoundException: C:\Documents and Settings\srt19170\text.csv (The system cannot find the file specified)' does not exist.
... For a given filename, RapidMiner resolves the filename against the directory the experiment file is stored in...

This seems to be a clear error. RapidMiner is not resolving the filename against the directory the experiment file is stored in (C:\My Repositories\Test Repository) but rather against the value of %USERPROFILE% (at least on Windows). (Side note: You should correct the error message to remove the outdated 'experiment' reference.) I haven't checked all the operators, but "Read Excel" exhibits the same behavior, so I'm assuming this is broken in all the I/O operators.
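To make the expected behavior concrete, here is a minimal Python sketch of the rule the error message describes: a relative filename is joined to a base directory, while an absolute one is used unchanged. The helper name `resolve_data_file` is hypothetical and this is not RapidMiner's code; `ntpath` is used so that Windows path rules apply on any OS.

```python
import ntpath  # Windows path semantics, available on every platform


def resolve_data_file(filename: str, base_dir: str) -> str:
    """Hypothetical helper illustrating the documented resolution rule.

    Absolute paths are returned (normalized) unchanged; relative paths are
    joined to base_dir, e.g. the directory the process is stored in.
    """
    if ntpath.isabs(filename):
        return ntpath.normpath(filename)
    return ntpath.normpath(ntpath.join(base_dir, filename))


process_dir = r"C:\My Repositories\Test Repository"
user_profile = r"C:\Documents and Settings\srt19170"

# What the poster expected vs. what the observed behavior amounts to
expected = resolve_data_file("test.csv", process_dir)
observed = resolve_data_file("test.csv", user_profile)
```

The whole difference between the documented and the observed behavior is which directory is passed as `base_dir`.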

Let's move on to %{process_path}. According to the documentation, this should be set to the "absolute path name for the current process". In this case I would expect that to be "C:\My Repositories\Test Repository\pathtest". If I change Read CSV to use the filename "%{process_path}\test.csv", I get the error:

The file 'java.io.FileNotFoundException: \\Test Repository\pathtest\text.csv (The system cannot find the file specified)' does not exist.

The value of %{process_path} is clearly not the "absolute path name for the current process." It's not even a legitimate UNC pathname in Windows -- it's trying to use the name of the repository as a server name.
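The odd path in the second error can be reproduced with Python's `ntpath` module. The sketch assumes the macro expands to `//Test Repository/pathtest` (a repository path, as Marius explains later in the thread) and that the path is then normalized roughly the way Windows does it. Once the forward slashes become backslashes, the leading `\\` makes Windows read `Test Repository` as a server name and `pathtest` as a share.

```python
import ntpath

# Assumed expansion of "%{process_path}\test.csv" for the process in this thread
process_path = "//Test Repository/pathtest"
expanded = process_path + "\\" + "test.csv"

# Windows-style normalization turns forward slashes into backslashes ...
normalized = ntpath.normpath(expanded)

# ... and the leading double backslash is parsed as a UNC prefix:
# drive = \\server\share, where "server" is the repository name.
drive, rest = ntpath.splitdrive(normalized)
```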

So regardless of my particular problem, it seems to me that both of these are "broken" at least in the sense that they don't operate as described in the documentation.

As to my problem, I appreciate the suggestion to use a central server, but that's not an option in my case. The two machines I'm moving the process between don't share any common infrastructure. Defining a macro in the RapidMiner start up may be my best option, but I think RapidMiner should have some capability to provide the path name of the current process.

srt19170 · January 2013

Marius wrote:

rapidminer.exe //repository-name/path-to/process -Mmacro1=value1 -Mmacro2=value2

Marius, can you verify that this actually works? If I try this with my RapidMiner 5.3 on Windows, RapidMiner hangs at the "Frame" step.

MariusHelf · January 2013

Sorry, that was wrong information. You have to use a script from the scripts folder and add some quotes. From the RapidMiner installation directory, run the following:

scripts\rapidminer "//LocalRepository/test/my process" -M"macro=value"

That should get the job done.
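If the same process has to be launched on several machines with different macro values, a small wrapper can assemble the command line in the exact form posted above. This is a hypothetical Python helper: the name `build_rapidminer_command` and the example macros `data_dir` and `run_id` are made up, and it only builds the string. It assumes the batch script passes each quoted `-M` argument through unchanged, as the posted example suggests.

```python
def build_rapidminer_command(process: str, macros: dict[str, str]) -> str:
    """Build a command line of the form posted in this thread:

        scripts\\rapidminer "//LocalRepository/test/my process" -M"macro=value"

    Hypothetical helper, meant to be run from the RapidMiner installation
    directory. The process path is quoted so it may contain spaces; each
    macro becomes its own -M"name=value" argument.
    """
    parts = ["scripts\\rapidminer", f'"{process}"']
    for name, value in macros.items():
        parts.append(f'-M"{name}={value}"')
    return " ".join(parts)


cmd = build_rapidminer_command(
    "//LocalRepository/test/my process",
    {"data_dir": r"C:\data", "run_id": "42"},
)
```

On a Windows machine, `cmd` could then be passed to `subprocess.run(cmd, shell=True, cwd=install_dir)`, with `install_dir` being the RapidMiner installation directory. Values that contain quotes or end in a backslash would need extra escaping, which this sketch does not handle.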

You'll probably see a NullPointerException coming from the RapidMiner.quit() method. That is obviously a bug, but it does not affect the process execution. I have already filed an internal bug report for that issue.

MariusHelf · January 2013

srt19170 wrote:

I'm working on Windows with 5.2. I have a repository named "Test Repository" that is stored at (say) C:\My Repositories\Test Repository. I create a new process in that repository, name it "pathtest" and add the "Read CSV" operator. I set the file name for the operator to be "test.csv" and then I run the process. I get the following error:

The file 'java.io.FileNotFoundException: C:\Documents and Settings\srt19170\text.csv (The system cannot find the file specified)' does not exist.
... For a given filename, RapidMiner resolves the filename against the directory the experiment file is stored in...

This seems to be a clear error. RapidMiner is not resolving the filename against the directory the experiment file is stored in (C:\My Repositories\Test Repository) but rather against the value of %USERPROFILE% (at least on Windows). (Side note: You should correct the error message to remove the outdated 'experiment' reference.) I haven't checked all the operators, but "Read Excel" exhibits the same behavior, so I'm assuming this is broken in all the I/O operators.

Yes, that's obviously an error, both in the behaviour and in the produced message. I'll report it to the developers.

srt19170 wrote:

Let's move on to %{process_path}. According to the documentation, this should be set to the "absolute path name for the current process". In this case I would expect that to be "C:\My Repositories\Test Repository\pathtest". If I change Read CSV to use the filename "%{process_path}\test.csv", I get the error:

The file 'java.io.FileNotFoundException: \\Test Repository\pathtest\text.csv (The system cannot find the file specified)' does not exist.

The value of %{process_path} is clearly not the "absolute path name for the current process." It's not even a legitimate UNC pathname in Windows -- it's trying to use the name of the repository as a server name.

As I explained in my previous post, this actually is the absolute path of the current process, not as a file system path but as a RapidMiner repository path. That will probably not be changed.

srt19170 wrote:

Defining a macro in the RapidMiner start up may be my best option, but I think RapidMiner should have some capability to provide the path name of the current process.

I'll propose that to the developers, but I don't think that we'll introduce such functionality. After all, the process is located in a repository, and the repository should contain nothing but RapidMiner processes and RapidMiner data; providing the process path would encourage users to misuse the repository folder as a place for their data files. If you are dealing with file system paths, please use external scripts to determine them and start RapidMiner via the command line.

However, we will probably introduce new ways of storing non-RapidMiner files in the repository. That will give you the following option:
- Copy your CSV file into the repository folder
- In RapidMiner, use the Open File operator to create a file object from your file*
- Pass the file object directly into the Read CSV operator
- Continue as if you were reading the CSV directly from disk (as you did before)
This will probably not be part of the next release, but it is planned for the future.

* By using the Open File operator to read files from the repository folder, you no longer need to know the file system path of your processes.

Best regards,
Marius

srt19170 · January 2013

Marius wrote:

Yes, that's obviously an error, both in the behaviour and in the produced message. I'll report it to the developers.

Thanks. Fixing that bug would solve my problems, at least.

Marius wrote:

As I explained in my previous post, this actually is the absolute path of the current process, though not in terms of file system path, but in terms of RapidMiner repository paths. That will probably not be changed.

Let me suggest then that you clarify 2.1 in the Operators Reference, because that at least implies that it is a file path.

Marius wrote:

I'll propose that to the developers, but I don't think that we'll introduce such functionality.

Thanks. I think there's a lot of value in being able to make relative file references. For example, suppose I put together a tool using RapidMiner and want to distribute it to other users. Because I won't know their file system or where they install the tool, relative references are very useful.

Marius wrote:

scripts\rapidminer "//LocalRepository/test/my process" -M"macro=value"

As I pointed out in a separate topic, the scripts seem to be broken in 5.3. Does this actually work for you on Windows? If I fix the script and use the above command line, RapidMiner hangs after printing "INFO: rapidminer.home is 'C:\Program Files\Rapid-I\RapidMiner5\scripts\..'." If I remove the "-M" portion, I can get it to run. But the macro capability seems to be broken. rapidminerGUI.bat seems to have some other issues as well; it hangs in a different place.

(BTW, it's a little disheartening to install a new release and discover that the basic run scripts don't work. It seems like there was no testing at all of the scripts before the 5.3 release.)

The other problem with this solution is that (at least in 5.3), you cannot set a macro value on the command line without specifying a process, and if you specify a process, RapidMiner runs the process and exits. This makes it impossible to open RapidMiner to do development and set the macro as well.

One possibility for me is to use Set Macro and force it to be executed first. On every new machine I can modify the Set Macro to hold the appropriate file path. Unfortunately, if I have a lot of machines and a lot of processes, this ends up being a lot of work and error-prone. Of course, if the command line option to set a macro can be fixed, this won't be necessary.

MariusHelf · January 2013

srt19170 wrote:

Let me suggest then that you clarify 2.1 in the Operators Reference, because that at least implies that it is a file path.

You are right; I created an internal ticket suggesting that the documentation be improved.

srt19170 wrote:

Thanks. I think there's a lot of value in being able to make relative file references. For example, suppose I put together a tool using RapidMiner and want to distribute it to other users. Because I won't know their file system or where they install the tool, relative references are very useful.

I understand your point, but usually if you supply such a tool, it is finished and the user only wants to run processes. Thus the mechanism of passing macros to processes from the command line should be sufficient. Anyway, as stated above, I'll raise that topic in the next planning meeting.

srt19170 wrote:

As I pointed out in a separate topic, the scripts seem to be broken in 5.3. Does this actually work for you on Windows? If I fix the script and use the above command line, RapidMiner hangs after printing "INFO: rapidminer.home is 'C:\Program Files\Rapid-I\RapidMiner5\scripts\..'." If I remove the "-M" portion, I can get it to run. But the macro capability seems to be broken. rapidminerGUI.bat seems to have some other issues as well; it hangs in a different place.

(BTW, it's a little disheartening to install a new release and discover that the basic run scripts don't work. It seems like there was no testing at all of the scripts before the 5.3 release.)

Yes, it works for me and all of our testers. From the output you posted, it seems that you started the script from inside the scripts directory, so RapidMiner thinks that its home directory is the scripts directory. You have to either define the RAPIDMINER_HOME environment variable to point to the RapidMiner installation directory (not the scripts directory or any other subdirectory), or start the script directly from the installation directory, i.e. with the exact command line I posted above, including the prepended "scripts\". Please let me know if that solves your issues with the scripts. Otherwise, please post the complete output of the script.
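The RAPIDMINER_HOME option can be prepared by a small launcher. Below is a minimal Python sketch with a hypothetical helper name and an example installation path. It refuses a path ending in the scripts subdirectory, since that is the mistake suspected here, and returns a copy of the environment with `RAPIDMINER_HOME` set, which could be passed to `subprocess.run(..., env=...)`.

```python
import ntpath


def rapidminer_env(install_dir: str, base_env: dict[str, str]) -> dict[str, str]:
    """Return a copy of base_env with RAPIDMINER_HOME pointing at install_dir.

    Hypothetical helper. Rejects the scripts subdirectory, because
    RAPIDMINER_HOME must name the installation directory itself.
    """
    home = ntpath.normpath(install_dir)  # also strips a trailing backslash
    if ntpath.basename(home).lower() == "scripts":
        raise ValueError(f"{home} is the scripts folder, not the installation directory")
    env = dict(base_env)  # leave the caller's mapping untouched
    env["RAPIDMINER_HOME"] = home
    return env


env = rapidminer_env(r"C:\Program Files\Rapid-I\RapidMiner5", {"PATH": r"C:\Windows"})
```

In practice `base_env` would usually be `dict(os.environ)`, so that the rest of the environment is inherited.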

srt19170 wrote:

The other problem with this solution is that (at least in 5.3), you cannot set a macro value on the command line without specifying a process, and if you specify a process, RapidMiner runs the process and exits. This makes it impossible to open RapidMiner to do development and set the macro as well.

That's how it is supposed to work: macros are set on a per-process basis, and the command is only meant to be used to run already prepared processes. For development you'll have to find a workaround, e.g. as you described below.

srt19170 wrote:

One possibility for me is to use Set Macro and force it to be executed first. On every new machine I can modify the Set Macro to hold the appropriate file path. Unfortunately, if I have a lot of machines and a lot of processes, this ends up being a lot of work and error-prone. Of course, if the command line option to set a macro can be fixed, this won't be necessary.

Using macros is the way to go here, though I would suggest using the Process Context instead of Set Macro operators. This has two advantages:
1. You do not have to find the Set Macro operator in each process, but can edit all relevant macros in one single view.
2. Using the Process Context allows you to override the macros defined there from the command line, which is not as easily possible with the Set Macro operator.
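The precedence in point 2 can be modeled as a dictionary merge: values defined in the Process Context act as defaults, and any `-Mname=value` argument given on the command line replaces the default of the same name. The Python below only illustrates that rule as described in this post; it is not RapidMiner's implementation, and the macro names are hypothetical.

```python
def parse_macro_args(args: list[str]) -> dict[str, str]:
    """Collect -Mname=value arguments (quotes already removed by the shell)."""
    macros = {}
    for arg in args:
        if arg.startswith("-M") and "=" in arg:
            name, value = arg[2:].split("=", 1)  # split only on the first '='
            macros[name] = value
    return macros


def effective_macros(context_defaults: dict[str, str], args: list[str]) -> dict[str, str]:
    """Process Context values first, command-line overrides on top."""
    merged = dict(context_defaults)
    merged.update(parse_macro_args(args))
    return merged


defaults = {"data_dir": r"C:\Users\dev\data", "sample_size": "1000"}
result = effective_macros(
    defaults,
    ["//LocalRepository/test/my process", r"-Mdata_dir=D:\shared\data"],
)
```

Under this model, a development machine uses the Process Context values as-is, while a machine that runs the finished process from the command line only overrides what differs (here `data_dir`).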

I would like to ask you to describe your setup with these multiple machines, what you are doing, and your usual workflow for developing and running processes. With the big picture, we may be able to give better advice.

Best regards,
Marius

srt19170 · January 2013

First of all, thanks very much for your replies and patience! I'll mark this [SOLVED] since I think it's been beaten to death at this point.

Marius wrote:

I would suggest using the Process Context instead of Set Macro operators

Ah! I thought there must be something like this, but for the life of me I couldn't find it. Thanks for the pointer; I looked it up and found it in the manual. Very useful.

Marius wrote:

Yes, it works for me and all of our testers. From the output you posted, it seems that you started the script from inside the scripts directory, so RapidMiner thinks that its home directory is the scripts directory. You have to either define the RAPIDMINER_HOME environment variable to point to the RapidMiner installation directory (not the scripts directory or any other subdirectory), or start the script directly from the installation directory, i.e. with the exact command line I posted above, including the prepended "scripts\". Please let me know if that solves your issues with the scripts. Otherwise, please post the complete output of the script.

I've got a separate thread for the script problem, so I'll take this there.

MariusHelf · January 2013

Thanks to you, too, for the detailed error descriptions and objective discussion!

Happy Mining


Altair RapidMiner Community
