Command Line Launch "Cannot resolve relative repository location"

dragoljub Member Posts: 241 Contributor II
Hi RM Gurus,

I have a problem running the command-line version of RM (Linux 64-bit) when I try to access repositories; I get the error below. How can I associate a repository with the process I am launching via the command line?

Here is the command I launch (I have already set up the command-line script with the proper paths for Java and RM home, and an empty process completes successfully :] ):

./rapidminer/scripts/rapidminer -f ./Data/Repository/TS.rmp

Here is my simple process:

<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<process version="5.0">
 <context>
   <input/>
   <output/>
   <macros/>
 </context>
 <operator activated="true" class="process" compatibility="5.0.8" expanded="true" name="Process">
   <process expanded="true" height="539" width="1820">
     <operator activated="true" class="retrieve" compatibility="5.0.8" expanded="true" height="60" name="Retrieve" width="90" x="45" y="30">
       <parameter key="repository_entry" value="Data"/>
     </operator>
     <operator activated="true" class="normalize" compatibility="5.0.8" expanded="true" height="94" name="Normalize" width="90" x="179" y="30">
       <parameter key="create_view" value="true"/>
       <parameter key="method" value="range transformation"/>
     </operator>
     <operator activated="true" class="sample_stratified" compatibility="5.0.8" expanded="true" height="76" name="Sample (Stratified)" width="90" x="313" y="30">
       <parameter key="sample_size" value="1000"/>
     </operator>
     <connect from_op="Retrieve" from_port="output" to_op="Normalize" to_port="example set input"/>
     <connect from_op="Normalize" from_port="example set output" to_op="Sample (Stratified)" to_port="example set input"/>
     <connect from_op="Sample (Stratified)" from_port="example set output" to_port="result 1"/>
     <portSpacing port="source_input 1" spacing="0"/>
     <portSpacing port="sink_result 1" spacing="0"/>
     <portSpacing port="sink_result 2" spacing="0"/>
   </process>
 </operator>
</process>
   

Here is the error:

INFO: Process starts (Process.run())
2010-07-22 16:43:28 SEVERE: Process failed: Cannot resolve relative repository location 'Data'. Process is not associated with a repository. (RapidMinerCommandLine.run())
 com.rapidminer.operator.UserError: Cannot resolve relative repository location 'Data'. Process is not associated with a repository.
     com.rapidminer.Process.resolveRepositoryLocation(Process.java:1139)
     com.rapidminer.operator.Operator.getParameterAsRepositoryLocation(Operator.java:1286)
     com.rapidminer.operator.io.RepositorySource.getRepositoryEntry(RepositorySource.java:90)
     com.rapidminer.operator.io.RepositorySource.read(RepositorySource.java:104)
     com.rapidminer.operator.io.AbstractReader.doWork(AbstractReader.java:123)
     com.rapidminer.operator.Operator.execute(Operator.java:768)
     com.rapidminer.operator.execution.SimpleUnitExecutor.execute(SimpleUnitExecutor.java:51)
     com.rapidminer.operator.ExecutionUnit.execute(ExecutionUnit.java:709)
     com.rapidminer.operator.OperatorChain.doWork(OperatorChain.java:368)
     com.rapidminer.operator.Operator.execute(Operator.java:768)
     com.rapidminer.Process.run(Process.java:863)
     com.rapidminer.Process.run(Process.java:770)
     com.rapidminer.Process.run(Process.java:765)
     com.rapidminer.Process.run(Process.java:755)
     com.rapidminer.RapidMinerCommandLine.run(RapidMinerCommandLine.java:132)
     com.rapidminer.RapidMinerCommandLine.main(RapidMinerCommandLine.java:168)
2010-07-22 16:43:28 SEVERE: Here:           Process[1] (Process)
          subprocess 'Main Process'
      ==>   +- Retrieve[1] (Retrieve)
            +- Normalize[0] (Normalize)
            +- Sample (Stratified)[0] (Sample (Stratified))
            +- Retrieve (2)[0] (Retrieve) (RapidMinerCommandLine.run())
2010-07-22 16:43:28 SEVERE: Process not successful (RapidMinerCommandLine.run())
Thanks in advance,
-Gagi

Answers

  • land RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 2,531 Unicorn
    Hi,
    It seems to me that RM does execute your process, but since the process is started from a file, it isn't attached to any repository at all. Hence you cannot use relative paths inside the process to access a repository.
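A quick way to spot which locations will trip over this before launching from the command line is to list the relative repository_entry values in the process file. The shell function below is a minimal sketch; find_relative_entries is a hypothetical name, and it assumes each parameter is written as key="repository_entry" value="..." on a single line, as in the process XML posted above. Any value it prints would need to become an absolute location of the //repository_name/... form that is reported to work later in this thread.

```shell
# Sketch: print repository_entry values in a .rmp file that are relative,
# i.e. that do not start with "//". Such values cannot be resolved when the
# process is started from a file and is not associated with a repository.
# Assumption: each parameter sits on one line as key="..." value="...".
# Usage: find_relative_entries PROCESS.rmp
find_relative_entries() {
  grep -o 'key="repository_entry" value="[^"]*"' "$1" |
    sed 's/.*value="\([^"]*\)"/\1/' |  # keep only the value text
    grep -v '^//'                      # drop absolute //Repo/... locations
}
```

For the process posted above, find_relative_entries ./Data/Repository/TS.rmp would print Data.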

    Greetings,
      Sebastian
  • dragoljub Member Posts: 241 Contributor II
    I gathered that much, but I cannot figure out how to set a direct path in the Retrieve operator without getting the same error: every time I set the full path, the error is the same.

    In the XML for Retrieve, the parameter has two parts: the key, repository_entry, and the value, which is the path to the entry. Right now I am running on Linux without any repository set up, but I would like to read the .ioo file that contains my data. How should I set up my process to access this file? Perhaps Retrieve is the wrong operator for this.

    Is there a way to set up a repository path that will always be checked on Linux?

    Here is an example with a full path set:
      
    <operator activated="true" class="retrieve" compatibility="5.0.8" expanded="true" height="60" name="P6B746 IDD Data" width="90" x="45" y="30">
           <parameter key="repository_entry" value="./full/path/to/file.ioo"/>
         </operator>

    I did some further testing:

    I can successfully read a CSV file from the default repository; however, when I try to access an example set from the repository using Retrieve, the process says it cannot find the file because the process is not associated with any repository.

    I cannot run this system in GUI mode, so I never set up a repository. Is there any way to set up a repository via the command line?

    Further testing:

    Under OS X (a Unix system I have GUI access to), I was able to easily create a repository in GUI mode and then access files from it by referencing them with the //repository_name/file_name notation. This kind of reference also worked when launching RM from the command line, once the correct RM home path was set.

    Now that I know RM simply parses //repository_name/file_name (without the file extension, by the way), how can I set up a repository via the command line and use it on my command-line mining server?
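The mapping from a file on disk to that notation can be sketched as a small shell function. It assumes (this is not confirmed in the thread) that the repository root is the folder configured for a local repository and that entries are stored on disk as NAME.ioo for data or NAME.rmp for processes; to_repo_location is a hypothetical helper name.

```shell
# Sketch: turn a file stored inside a local repository folder into the
# //repository_name/path notation (no file extension).
# Assumptions: REPO_ROOT is the local repository folder, and entries are
# stored on disk as NAME.ioo (data) or NAME.rmp (processes).
# Usage: to_repo_location REPO_ROOT ALIAS FILE
to_repo_location() {
  root=${1%/}              # tolerate a trailing slash on the root
  alias_name=$2
  file=$3
  case $file in
    "$root"/*) ;;          # the file must live under the repository root
    *) echo "not inside $root: $file" >&2; return 1 ;;
  esac
  rel=${file#"$root"/}     # path relative to the repository root
  rel=${rel%.ioo}          # drop the on-disk extension
  rel=${rel%.rmp}
  printf '//%s/%s\n' "$alias_name" "$rel"
}
```

For example, to_repo_location /data/repo MyRepo /data/repo/dir/Data.ioo prints //MyRepo/dir/Data. Quote the result when the alias contains spaces.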

    Thanks all!
    -Gagi
  • dragoljub Member Posts: 241 Contributor II
    To motivate this: I am trying to launch RM on a distributed LSF (Load Sharing Facility) system where I have access to many 3.4 GHz processor cores and hundreds of gigabytes of RAM. It would be interesting to see whether RM's parallel processing can work in this setting.

    ;D

    -Gagi
  • Cleo Member Posts: 44 Maven
    I managed to get RapidMiner to work without a GUI on YD6.2 and Ubuntu 10.04, with the repository/MySQL database located on a different computer.

    These are the steps I followed.

    1) Extracted RapidMiner to /opt/rapidminer
    2) cd /opt/rapidminer/lib
    3) java -jar rapidminer.jar
    4) Downloaded the latest version and restarted
    5) cd /opt/rapidminer/scripts
    6) sudo chmod a+x rapidminer
    7) sudo nano rapidminer
    8) On the 6th line I added the text
    RAPIDMINER_HOME="/opt/rapidminer"
    9) sh rapidminer //RepositoryName/dir/filename
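Steps 6 to 8 can also be done non-interactively. The function below is a sketch: set_rapidminer_home is a hypothetical name, and it assumes, as in Cleo's copy of the script, that line 6 comes before the first use of RAPIDMINER_HOME, so check your own script before relying on it. It uses a+x so that the script can also be run directly.

```shell
# Sketch: insert RAPIDMINER_HOME="HOME_DIR" as line 6 of the launcher
# script (or append it if the script is shorter), then make it executable.
# Assumption: line 6 precedes the script's first use of RAPIDMINER_HOME.
# Usage: set_rapidminer_home SCRIPT HOME_DIR
set_rapidminer_home() {
  script=$1
  home_line="RAPIDMINER_HOME=\"$2\""
  tmp="$script.tmp.$$"
  awk -v line="$home_line" '
    NR == 6 { print line; done = 1 }   # new text becomes line 6
    { print }                          # copy every original line
    END { if (!done) print line }      # short file: append instead
  ' "$script" > "$tmp" && mv "$tmp" "$script" && chmod a+x "$script"
}
```

Run it with enough privileges to write to /opt/rapidminer/scripts (Cleo used sudo), then launch as in step 9.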


    Also, take a look at the file repositories.xml in the hidden folder .RapidMiner5, located at /home/username/.RapidMiner5 on my Linux setup.

    Cheers,
    Cleo
  • dragoljub Member Posts: 241 Contributor II
    Thanks Cleo,

    I followed your procedure from a previous post. I noticed that you launched the GUI version and upgraded RM before proceeding. Unfortunately, I cannot run the GUI version at all (no X Windows allowed on the server), so I need a way to set up a repository and access it without ever running the GUI version.

    Where can I find a copy of repositories.xml? Can I create my own copy of this file and force RapidMiner to recognize specific folders as repositories? ???

    -Gagi
  • Cleo Member Posts: 44 Maven
    Hello Gagi,

    I am just guessing, since in my setup I could run a GUI. When I ran RapidMiner from the command line, I first turned off the GUI to free up RAM; a PlayStation 3 has only around 256 MB.

    On my Windows XP box, repositories.xml is located in:
    C:\Documents and Settings\USERNAME\.RapidMiner5

    On the server, try running the command
    locate .RapidMiner5

    On my Ubuntu 10.04 box it is located at
    /home/USERNAME/.RapidMiner5

    I took a quick look at the RapidMiner code and could not find a way to set this up without the GUI, but I'm not very familiar with the code.

    What I'd personally do is install RapidMiner 5 on an Ubuntu box (if I didn't have one, I'd download Oracle VirtualBox and install Ubuntu on my Windows or Mac machine) and then follow the steps I previously posted. Next, extract a second copy of the downloaded RapidMiner 5 without updating it at all, so that it matches the one on the server, and run the command:
    diff -ru /opt/RapidMiner5 /WhereEver/Fresh/RapidMiner5/Is

    See what files have changed and update the server accordingly. 

    You will also have to compare the hidden .RapidMiner5 directory.
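Both comparisons can be wrapped in one shell function. This is a sketch: compare_installs is a hypothetical name and the directory arguments are placeholders. It uses diff -rq, which only names the files that differ, to keep the output short; drop -q to get the full differences that diff -ru shows.

```shell
# Sketch: list files that differ between an updated and a fresh copy of the
# install folder and, optionally, between two hidden .RapidMiner5 folders.
# Usage: compare_installs UPDATED_DIR FRESH_DIR [UPDATED_HIDDEN FRESH_HIDDEN]
compare_installs() {
  diff -rq "$1" "$2"       # -r recurses, -q prints only differing names
  status=$?
  if [ $# -ge 4 ]; then
    diff -rq "$3" "$4"
    s=$?
    # keep the "worst" diff status: 0 same, 1 different, 2 trouble
    [ "$s" -gt "$status" ] && status=$s
  fi
  return "$status"
}
```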


    I'm sure there is a better way to do this, but this is what I'd try.

    Cheers,
    Cleo
  • dragoljub Member Posts: 241 Contributor II
    Thanks Cleo,

    I am trying to put a repositories.xml file into my home directory so that I can reference the proper repository paths.

    Since the repositories are just defined in XML, perhaps there is a way to include this XML in a process file so that the specific repositories are always set when running the file? Does anyone from the RM development team know whether this would work? ;D

    -Gagi
  • dragoljub Member Posts: 241 Contributor II
    Update:

    Just thought I would let everyone know how to get around the problem of having no access to a GUI and wanting to run RapidMiner in batch (command line) mode.

    The four problems I encountered and the solutions I found (after extracting RapidMiner to a folder named 'rapidminer'):

    1. Setting Up the Command-Line Script to Run RM in Command-Line Mode ==> Edit the script 'rapidminer/scripts/rapidminer' and set JAVA_HOME and RAPIDMINER_HOME to the full paths of your system's Java installation directory and of the directory you extracted RapidMiner to; also set MAX_JAVA_MEMORY to the maximum amount of RAM you can afford to use.

    2. Setting Repository Locations ==> In your home folder, the file ~/.RapidMiner5/repositories.xml should be edited (or created) to contain at least one local repository, whose path points to the location on your system where the repository files will be stored. See below:

    <?xml version="1.0" encoding="UTF-8" standalone="no"?>
    <repositories>
      <localRepository>
        <file>/full/path/to/your/repository/Repository</file>
        <alias>Repository Name</alias>
      </localRepository>
    </repositories>

    3. Accessing Extensions such as Parallel Processing ==> Install RM on a system where you have access to a GUI, update RM, and install all necessary extensions using the Help > Update RapidMiner menu. Once that is complete, check the hidden folder in your home directory again under ~/.RapidMiner5/. This time a folder called managed appears, containing all the installed extensions. Copy this folder into the same hidden folder on the system you have no GUI access to.

    4. Setting the Number of Parallel Threads ==> Create a file called rapidminerrc in your rapidminer directory and enter the following on the first line, replacing the number with the number of cores you want to experiment with:
    rapidminer.parallel.number_of_threads = 8
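The four steps above can be scripted on the GUI-less machine roughly as follows. This is a sketch under the thread's own assumptions, not documented RapidMiner behaviour: repositories.xml lives in ~/.RapidMiner5, the managed folder is copied there from a GUI machine, rapidminerrc sits in the RapidMiner folder, and lib/rapidminer.jar exists as in Cleo's steps. All function names are hypothetical, and defaulting to one thread per online core is only an illustrative choice.

```shell
# Step 1 (check only): do the paths set in the launcher script look right?
# Usage: check_paths JAVA_HOME RAPIDMINER_HOME
check_paths() {
  [ -x "$1/bin/java" ] || { echo "no bin/java under $1" >&2; return 1; }
  [ -f "$2/lib/rapidminer.jar" ] || { echo "no lib/rapidminer.jar under $2" >&2; return 1; }
}

# Step 2: write a repositories.xml with a single local repository.
# Overwrites any existing file; REPO_DIR and ALIAS must not contain XML
# special characters such as & or <.
# Usage: write_repositories_xml REPO_DIR ALIAS
write_repositories_xml() {
  mkdir -p "$HOME/.RapidMiner5" "$1" || return 1
  cat > "$HOME/.RapidMiner5/repositories.xml" <<EOF
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<repositories>
  <localRepository>
    <file>$1</file>
    <alias>$2</alias>
  </localRepository>
</repositories>
EOF
}

# Step 3: unpack a "managed" folder archived on the GUI machine with
#   tar -czf managed.tar.gz -C ~/.RapidMiner5 managed
# Usage: unpack_managed ARCHIVE.tar.gz
unpack_managed() {
  mkdir -p "$HOME/.RapidMiner5" && tar -xzf "$1" -C "$HOME/.RapidMiner5"
}

# Step 4: write rapidminerrc; default to the number of online cores.
# Usage: write_rapidminerrc RAPIDMINER_HOME [THREADS]
write_rapidminerrc() {
  threads=${2:-$(getconf _NPROCESSORS_ONLN)}
  echo "rapidminer.parallel.number_of_threads = $threads" > "$1/rapidminerrc"
}
```

For example, write_repositories_xml /full/path/to/your/repository/Repository 'Repository Name' produces the same structure as the repositories.xml shown in step 2.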

    I hope this helps someone! I am currently using this setup to run batch jobs on a distributed system. Performance seems strangely slow for now, so I will report back if I get a massive performance boost.

    :P