SOLVED: Passing parameters to operators from the command line

krishnaku Member Posts: 3 Contributor I
edited November 2018 in Help
Hi,

I'm a newbie, so apologies if this question is trivial. I have a very simple import process that reads a set of CSV files and imports them into the repository. However, I have nearly 100 files to import, all of which are in the same format. I would like to run this import process from the command line, passing in the csvFile argument to the Read CSV operator and the repositoryEntry argument to the Store operator as command-line parameters. Is there a way of doing this?

Krishna

Answers

  • [Deleted User] Posts: 0 Learner II
    Hi Krishna,

    With the following process you do not have to run the process separately for each file.
    First, save the CSV files you want to import in a folder on your computer.

    Then set that folder's path as the directory parameter of the Loop Files operator.
    There are two ways to store the files.
    The first Store operator ("Each File Separately") stores each data set separately in the repository under the name of its CSV file.
    The second operator ("One Repository entry") stores all the data sets in a single repository entry.


    <?xml version="1.0" encoding="UTF-8" standalone="no"?>
    <process version="5.1.017">
      <context>
        <input/>
        <output/>
        <macros/>
      </context>
      <operator activated="true" class="process" compatibility="5.1.017" expanded="true" name="Process">
        <process expanded="true" height="666" width="962">
          <operator activated="true" class="loop_files" compatibility="5.1.017" expanded="true" height="60" name="Loop Files" width="90" x="45" y="30">
            <parameter key="directory" value="INSERT DIRECTORY"/>
            <process expanded="true" height="666" width="962">
              <operator activated="true" class="read_csv" compatibility="5.1.017" expanded="true" height="60" name="Read CSV" width="90" x="45" y="30">
                <list key="annotations"/>
                <list key="data_set_meta_data_information"/>
              </operator>
              <operator activated="true" class="store" compatibility="5.1.017" expanded="true" height="60" name="Each File Separately" width="90" x="246" y="30">
                <parameter key="repository_entry" value="%{file_name}"/>
              </operator>
              <operator activated="true" class="handle_exception" compatibility="5.1.017" expanded="true" height="76" name="One Repository entry" width="90" x="246" y="120">
                <process expanded="true" height="666" width="456">
                  <operator activated="true" class="retrieve" compatibility="5.1.017" expanded="true" height="60" name="Retrieve" width="90" x="45" y="75">
                    <parameter key="repository_entry" value="one_entry"/>
                  </operator>
                  <operator activated="true" class="append" compatibility="5.1.017" expanded="true" height="94" name="Append" width="90" x="179" y="30"/>
                  <connect from_port="in 1" to_op="Append" to_port="example set 1"/>
                  <connect from_op="Retrieve" from_port="output" to_op="Append" to_port="example set 2"/>
                  <connect from_op="Append" from_port="merged set" to_port="out 1"/>
                  <portSpacing port="source_in 1" spacing="0"/>
                  <portSpacing port="source_in 2" spacing="0"/>
                  <portSpacing port="sink_out 1" spacing="0"/>
                  <portSpacing port="sink_out 2" spacing="0"/>
                </process>
                <process expanded="true" height="666" width="456">
                  <operator activated="true" class="store" compatibility="5.1.017" expanded="true" height="60" name="Store (2)" width="90" x="45" y="30">
                    <parameter key="repository_entry" value="one_entry"/>
                  </operator>
                  <connect from_port="in 1" to_op="Store (2)" to_port="input"/>
                  <connect from_op="Store (2)" from_port="through" to_port="out 1"/>
                  <portSpacing port="source_in 1" spacing="0"/>
                  <portSpacing port="source_in 2" spacing="0"/>
                  <portSpacing port="sink_out 1" spacing="0"/>
                  <portSpacing port="sink_out 2" spacing="0"/>
                </process>
              </operator>
              <connect from_port="file object" to_op="Read CSV" to_port="file"/>
              <connect from_op="Read CSV" from_port="output" to_op="Each File Separately" to_port="input"/>
              <portSpacing port="source_file object" spacing="0"/>
              <portSpacing port="source_in 1" spacing="0"/>
              <portSpacing port="sink_out 1" spacing="0"/>
            </process>
          </operator>
          <portSpacing port="source_input 1" spacing="0"/>
          <portSpacing port="sink_result 1" spacing="0"/>
        </process>
      </operator>
    </process>


    Best,

    Edin
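
For readers who want to see the logic of the two storage variants outside RapidMiner, here is a minimal Python sketch. It is an analogy, not a translation of the process above: the function name `import_csv_folder` and the in-memory `separate` and `combined` containers are assumptions made for illustration, standing in for repository entries.

```python
import csv
from pathlib import Path


def import_csv_folder(directory):
    """Read every CSV file in `directory` and return both storage variants.

    Returns (separate, combined):
      separate: dict mapping each file name to that file's list of rows
      combined: one list holding the rows of all files, in file-name order
    """
    separate = {}
    combined = []
    for path in sorted(Path(directory).glob("*.csv")):
        with path.open(newline="") as handle:
            rows = list(csv.DictReader(handle))
        separate[path.name] = rows   # variant 1: one entry per file
        combined.extend(rows)        # variant 2: all rows appended into one entry
    return separate, combined
```

Calling `import_csv_folder("some/folder")` once covers all files in that folder, which mirrors the point of the Loop Files approach: the loop lives inside the process, so nothing has to be passed in per file.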
  • krishnaku Member Posts: 3 Contributor I
    Thank you. This was very helpful.

    I had to make two modifications to the second process to make it work correctly. First, there was a missing Store operator in the Try block of the Handle Exception operator. Second, I had to create an initial Handle Exception block to delete any existing value for the "one_entry" entry, so that the process does the right thing if it is run more than once on the same file.

    It was interesting to see Try-Catch blocks being used to handle the initialization of a loop invariant. Is this the idiomatic way of handling this scenario in RapidMiner? I struggled for quite a while to figure out how to create an empty data set with a given metadata signature for initializing the Append operation, and finally found a somewhat ugly way of doing it using Loop Collection: selecting the first item and subtracting it from itself.

    This method is simpler, but the idea of using Handle Exception here is a bit scary, since it will swallow unexpected errors, if I understand the behavior correctly. Is there any way to test for the class of the exception being handled, so that unexpected exceptions can be re-propagated?

    Krishna
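
To make the concern concrete, here is what "handle only the expected exception and let the rest propagate" looks like in a general-purpose language. This is a Python sketch of the pattern only; the thread does not say whether Handle Exception can filter by exception class. `EntryNotFound`, `retrieve`, and `append_to_entry` are hypothetical names, and a plain dict stands in for the repository.

```python
class EntryNotFound(Exception):
    """Hypothetical error raised when a repository entry does not exist."""


def retrieve(repository, name):
    # Stand-in for a Retrieve step: fail loudly when the entry is missing.
    if name not in repository:
        raise EntryNotFound(name)
    return repository[name]


def append_to_entry(repository, name, rows):
    try:
        existing = retrieve(repository, name)
    except EntryNotFound:
        # Expected on the first iteration only: start from an empty entry.
        existing = []
    # Any other exception type is not caught above and reaches the caller.
    repository[name] = existing + list(rows)
    return repository[name]
```

A catch-all handler would treat a genuinely broken repository the same way as a missing entry, which is the behavior that feels risky here.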
  • IngoRM Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, Community Manager, RMResearcher, Member, University Professor Posts: 1,751 RM Founder
    Hi,

    just two comments:
    • instead of "Retrieve" and "Store" you could also use "Remember" and "Recall"; a scenario like this is exactly what those two operators were created for;
    • instead of "Handle Exception", which would indeed hide all exceptions (which might not be desired), you could use a Branch operator that checks for the input, or a macro that records whether this is the first iteration.
    Cheers,
    Ingo
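
The second suggestion, an explicit first-iteration check instead of a catch-all handler, can be sketched as follows. This is a Python analogy rather than RapidMiner XML: the `store` dict loosely plays the role of Remember/Recall, the `first_iteration` flag plays the role of the macro, and the function name `append_all` is an assumption for illustration.

```python
def append_all(batches):
    """Combine batches using an explicit first-iteration check."""
    store = {}              # in-memory store, loosely like Remember/Recall
    first_iteration = True  # loosely like a macro set before the loop starts
    for rows in batches:
        if first_iteration:
            # Nothing has been remembered yet, so there is nothing to recall.
            store["one_entry"] = list(rows)
            first_iteration = False
        else:
            # Recall the accumulated rows and append the new batch.
            store["one_entry"] = store["one_entry"] + list(rows)
    return store.get("one_entry", [])
```

Because the first iteration is detected by a flag, no error handling is involved, so an unexpected failure inside the loop is not silently absorbed.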
  • krishnaku Member Posts: 3 Contributor I
    Thanks! I like this version much better.

    An interesting thing about RM is that it makes the really complicated algorithms simple, but one has to re-learn how to do simple loops etc. in the new idioms of operators and macros. I am getting the hang of it... slowly :)


    krishna