Problem with Loop Files

ripkars Member Posts: 4 Contributor I
edited June 27 in Help
Hello everybody

I want to write a process that reads all CSV files from a directory and performs the same operation on each of them.

I have this problem with the Loop Files operator and its subprocess.

The Loop Files operator is configured like this:
Loop Files [Filter: '*.csv', Directory: /home/riccardo/Workspace/unrealtournament3-dmtm2010/Training Data, File Name Macro: file_name, File Path Macro: file_path, etc.] (The directory name contains a space. Could that be a problem? I also tried renaming it to Training_Data, without success.)

Inside it I have put a Read CSV operator whose File Name is set to %{file_path} (plus another operator, just so the output is connected to something).

The error I get is:
Cannot create example set meta data: Could not read file 'null': /home/riccardo/file_path (No such file or directory)..

Shouldn't RapidMiner set the value at runtime for each of the CSV files in that directory?

Why is this process broken??

(Please answer me asap as I need to finish this work by today, 23:59 UTC +01:00)

Answers

  • ripkars Member Posts: 4 Contributor I
    I also tried writing the same file out in XRFF format, but that does not work either:
    <?xml version="1.0" encoding="UTF-8" standalone="no"?>
    <process version="5.0">
      <context>
        <input/>
        <output/>
        <macros/>
      </context>
      <operator activated="true" class="process" expanded="true" name="Process">
        <parameter key="logfile" value="/home/riccardo/Workspace/unrealtournament3-dmtm2010/Processes/Log/Loop.log"/>
        <parameter key="resultfile" value="/home/riccardo/resulloop.res"/>
        <process expanded="true" height="632" width="1044">
          <operator activated="true" class="loop_files" expanded="true" height="60" name="Loop Files" width="90" x="108" y="92">
            <parameter key="directory" value="/home/riccardo/Workspace/unrealtournament3-dmtm2010/Training Data"/>
            <parameter key="filter" value="'*.csv'"/>
            <process expanded="true" height="650" width="1062">
              <operator activated="true" class="read_csv" expanded="true" height="60" name="Read CSV" width="90" x="246" y="75">
                <parameter key="file_name" value="%{file_path}.csv"/>
              </operator>
              <operator activated="true" class="write_xrff" expanded="true" height="60" name="Write XRFF" width="90" x="380" y="75">
                <parameter key="example_set_file" value="/home/riccardo/Workspace/unrealtournament3-dmtm2010/Training Data/pippo.xrff"/>
              </operator>
              <connect from_op="Read CSV" from_port="output" to_op="Write XRFF" to_port="input"/>
              <portSpacing port="source_in 1" spacing="0"/>
            </process>
          </operator>
          <portSpacing port="source_input 1" spacing="0"/>
          <portSpacing port="sink_result 1" spacing="0"/>
        </process>
      </operator>
    </process>
  • haddock Member Posts: 849 Guru
    Hi there,

    The devil is always in the detail; in this case it was the regex '*.csv'. The filter is a regular expression, not a shell wildcard. The following process logs my files:
    <?xml version="1.0" encoding="UTF-8" standalone="no"?>
    <process version="5.0">
      <context>
        <input/>
        <output/>
        <macros/>
      </context>
      <operator activated="true" class="process" compatibility="5.0.0" expanded="true" name="Process">
        <process expanded="true" height="632" width="1044">
          <operator activated="true" class="loop_files" compatibility="5.0.0" expanded="true" height="76" name="Loop Files" width="90" x="108" y="92">
            <parameter key="directory" value="C:\Documents and Settings\Alien\My Documents\rm_workspace"/>
            <parameter key="filter" value=".*csv"/>
            <parameter key="iterate_over_subdirs" value="true"/>
            <process expanded="true" height="296" width="705">
              <operator activated="true" class="provide_macro_as_log_value" compatibility="5.0.8" expanded="true" height="76" name="Provide Macro as Log Value" width="90" x="179" y="30">
                <parameter key="macro_name" value="file_path"/>
              </operator>
              <operator activated="true" class="provide_macro_as_log_value" compatibility="5.0.8" expanded="true" height="76" name="Provide Macro as Log Value (2)" width="90" x="380" y="30">
                <parameter key="macro_name" value="file_name"/>
              </operator>
              <operator activated="true" class="log" compatibility="5.0.8" expanded="true" height="76" name="Log" width="90" x="585" y="30">
                <list key="log">
                  <parameter key="path" value="operator.Provide Macro as Log Value.value.macro_value"/>
                  <parameter key="name" value="operator.Provide Macro as Log Value (2).value.macro_value"/>
                </list>
              </operator>
              <connect from_port="in 1" to_op="Provide Macro as Log Value" to_port="through 1"/>
              <connect from_op="Provide Macro as Log Value" from_port="through 1" to_op="Provide Macro as Log Value (2)" to_port="through 1"/>
              <connect from_op="Provide Macro as Log Value (2)" from_port="through 1" to_op="Log" to_port="through 1"/>
              <portSpacing port="source_in 1" spacing="0"/>
              <portSpacing port="source_in 2" spacing="0"/>
            </process>
          </operator>
          <portSpacing port="source_input 1" spacing="0"/>
          <portSpacing port="sink_result 1" spacing="0"/>
        </process>
      </operator>
    </process>
    So now you have time for a splendid dinner as well!

    Ciao.
  • ripkars Member Posts: 4 Contributor I
    Thank you very much for your interest! Now it works!
  • haddock Member Posts: 849 Guru
    Nice one! Have fun..
  • cherokee Member Posts: 82 Guru
    Hi!

    haddock, your solution is, as always, correct (and fast). Nevertheless, I'm a bit confused. Of course the regexp "*.csv" does not express what is intended, but isn't it also malformed? The star at the beginning is the problem: what is supposed to occur zero or more times? Shouldn't there be some kind of MalformedRegExpException (I'm not sure of the correct name right now)?

    Best regards,
    chero
  • haddock Member Posts: 849 Guru
    Greets Chero,

    As I see it, *.csv would choke the parrot because, as you say, * has to follow whatever it repeats, but '*.csv' (notice the single quotes) would not: there the star repeats the opening quote. I use RegexBuddy for all this regex stuff (brill), about which I understand zippo!

    Ciao!
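
The point about the quotes can be checked outside RapidMiner with plain java.util.regex. This is only a minimal sketch: the class name FilterRegexDemo and the file name data.csv are made up for illustration, and it assumes the Loop Files filter is matched against the whole file name, which this thread does not confirm.

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

// Hypothetical demo class; not part of RapidMiner.
public class FilterRegexDemo {

    /** Returns true if java.util.regex accepts the expression. */
    static boolean compiles(String regex) {
        try {
            Pattern.compile(regex);
            return true;
        } catch (PatternSyntaxException e) {
            return false;
        }
    }

    /** Full-string match (assumed to be how the filter is applied). */
    static boolean matches(String regex, String fileName) {
        return Pattern.matches(regex, fileName);
    }

    public static void main(String[] args) {
        // A bare leading star has nothing to repeat, so compilation fails.
        System.out.println(compiles("*.csv"));               // false
        // With the quotes, the star repeats the opening quote: valid regex.
        System.out.println(compiles("'*.csv'"));             // true
        // ...but it demands a trailing quote, so a normal file name fails.
        System.out.println(matches("'*.csv'", "data.csv"));  // false
        // haddock's filter: anything ending in "csv".
        System.out.println(matches(".*csv", "data.csv"));    // true
        // A stricter variant that also requires the literal dot.
        System.out.println(matches(".*\\.csv", "data.csv")); // true
    }
}
```

So the quoted filter is accepted without any exception but matches no ordinary file name, which fits the behaviour described above.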

  • cherokee Member Posts: 82 Guru
    Hi haddock,

    of course you are right. I missed the single quotes  :-[

    Best regards,
    chero
  • cthiel Member Posts: 16 Maven
    ripkars wrote:

    The error I get is:
    Cannot create example set meta data: Could not read file 'null': /home/riccardo/file_path (No such file or directory)..

    Shouldn't RapidMiner set the value at runtime for each of the CSV files in that directory?

    Coming back to the original post: why does RM not replace the name of the macro with its content?

    I'm running into this issue in plenty of places; see the thread at
    http://rapid-i.com/rapidforum/index.php/topic,2304.0.html

    Oddly, my processes all run, but I still get plenty of "Cannot create example set meta data" errors.

    I have been debugging this class of error for 5+ hours... Any help would be appreciated!

    Christian
  • land RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 2,525 Unicorn
    Hi,
    as I already wrote in another thread: macros are only evaluated at run time, because they are assigned only when the respective operators are executed. Unfortunately, their values simply can't be known at design time! Hence they can't be replaced with their values during meta data transformation, and this might result in errors.
    You can't solve anything without taking a look at the actual data...

    Greetings,
      Sebastian
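
The difference between design time and run time can be illustrated with a small placeholder resolver. This is a sketch only and not RapidMiner's implementation: the class MacroSketch, its resolve method, and the example path are all hypothetical. It just shows that a %{...} reference cannot be turned into a real path until something has assigned the macro.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration; not RapidMiner code.
public class MacroSketch {

    // Matches %{name} and captures the macro name.
    private static final Pattern MACRO = Pattern.compile("%\\{([^}]+)\\}");

    /** Replaces every %{name} whose value is known; leaves the rest untouched. */
    static String resolve(String value, Map<String, String> macros) {
        Matcher m = MACRO.matcher(value);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            // Fall back to the original placeholder text if the macro is unset.
            String replacement = macros.getOrDefault(m.group(1), m.group(0));
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        Map<String, String> macros = new HashMap<>();

        // "Design time": the loop has not run, so file_path has no value yet.
        System.out.println(resolve("%{file_path}", macros));

        // "Run time": one loop iteration has assigned the macro (made-up path).
        macros.put("file_path", "/tmp/example/a.csv");
        System.out.println(resolve("%{file_path}", macros));
    }
}
```

Any check that runs before the loop, such as a meta data check, only ever sees the first situation, so it has no real file to inspect.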