Loop Files & Process Documents from Files remote directories
We have installed RapidAnalytics (RA) and would like to use RapidMiner (RM) to execute remote processes on remote data residing in the same remote server where RA is running.
We used Loop Files and Process Documents from Files and all seemed to be working.
What we did was use //remoteRepositoryName/path-name to indicate that the files are on the remote server. Mind you, the processes themselves reside on the remote server, i.e. we store them and run them there.
Everything appears to work, but no files are processed! I looked at the log file and the Java stack reports: NO ERRORS! Yet it seems no files were opened or processed. The processes execute and return immediately without reporting any error, and do nothing, seemingly running on no data.
Our entire project hinges upon processing files remotely on RA; we use RM only to design the processes with its operators and to schedule them. So this is an important factor in our decision on whether to select your platform for data mining.
Re: Loop Files & Process Documents from Files remote directories
Your requirements can certainly be fulfilled, but let us clarify some things:
- You are processing files, right? That means they are located somewhere on the RA server, let's say in C:\path\to\files. In this case, you have to use this path in Loop Files and Process Documents from Files.
- As you write, you are currently using //remoteRepositoryName/path-name. However, Loop Files and Process Documents from Files work completely independently of the repository (see the previous point). The repository is only used to store the processes and has nothing to do with any files on the server's disk.
- If you want to reference items in the remote repository, you *always* have to use relative paths. So if your process is stored in //remoteRepositoryName/path/to/process, and you want to reference another object in the repository located in //remoteRepositoryName/another/path/to/data, you have to reference it via "../../another/path/to/data" from within the process.
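To make the relative-path rule in the last point concrete, here is a minimal Python sketch of how such a reference would resolve. It is not RapidMiner code: the function name resolve_repository_reference is made up for illustration, and it assumes that relative references are resolved against the folder containing the process, which is what the "../../another/path/to/data" example above implies.

```python
import posixpath


def resolve_repository_reference(process_location, relative_ref):
    """Resolve a relative repository reference (hypothetical helper).

    Assumption: the reference is interpreted relative to the folder that
    contains the process, as the example in the post suggests.
    """
    if not process_location.startswith("//"):
        raise ValueError("expected a location of the form //repositoryName/...")
    # Split "//repositoryName/path/to/process" into name and entry path.
    repo_name, _, entry_path = process_location[2:].partition("/")
    # Folder that holds the process, e.g. "/path/to".
    folder = posixpath.dirname("/" + entry_path)
    # Join and collapse the ".." segments.
    resolved = posixpath.normpath(posixpath.join(folder, relative_ref))
    return "//" + repo_name + resolved


if __name__ == "__main__":
    print(resolve_repository_reference(
        "//remoteRepositoryName/path/to/process",
        "../../another/path/to/data",
    ))
    # prints //remoteRepositoryName/another/path/to/data
```

The first ".." steps from the folder path/to up to path, the second up to the repository root, and from there the reference descends into another/path/to/data.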
As a best practice, I suggest copying part of your data files to your local machine and running the processes locally to test whether everything is working. It is always easier to debug a process locally than remotely.
Finally, as you most certainly know, to run the process remotely on the server you can either use the button to the left of the "normal" Run button, or use one of the menu entries "Run on RapidAnalytics now" or "Schedule on RapidAnalytics Server" in the process menu.
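Before running the process at all, it can help to confirm that the directory you pass to Loop Files really exists on the machine where the process executes and contains the files you expect. The following is a rough Python sketch of such a check, independent of RapidMiner. The function name list_matching_files and the filter argument are illustrative assumptions, and C:\path\to\files is just the placeholder path from above.

```python
import re
from pathlib import Path


def list_matching_files(directory, filter_regex=".*"):
    """Return the sorted names of files in `directory` whose full name
    matches `filter_regex` (hypothetical helper, not a RapidMiner API)."""
    root = Path(directory)
    if not root.is_dir():
        # A repository-style location such as //repositoryName/... is not
        # a directory on disk, so a check like this fails loudly.
        raise FileNotFoundError("Not a directory on this machine: %s" % directory)
    pattern = re.compile(filter_regex)
    return sorted(
        entry.name
        for entry in root.iterdir()
        if entry.is_file() and pattern.fullmatch(entry.name)
    )


if __name__ == "__main__":
    try:
        names = list_matching_files(r"C:\path\to\files", r".*\.txt")
        print(len(names), "matching files")
    except FileNotFoundError as err:
        print(err)
```

If this reports zero matching files or a missing directory when run on the server, a process pointed at the same path would have nothing to loop over either, which would be consistent with a run that finishes immediately without errors.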