
loop files recursively isn't working as expected

kayman Member Posts: 416   Unicorn
edited August 10 in Product Feedback
I've just noticed that the recursive setting of Loop Files isn't making any real difference. I created a test setup as follows:

[FileFolder]
    File1.txt
    File2.txt
    File3.txt
    File4.txt
    [NestedFileFolder]
       File5.txt
       File6.txt

Only the content of File1 to File4 is loaded. The files in the nested folder (File5 and File6) are ignored whether I select or deselect the recursive setting.

Using RM 9.3 on Windows 10; the test files were on a shared network drive.

Sent to Engineering · Last Updated

RM-4180

Comments

  • varunm1 Moderator, Member Posts: 965   Unicorn
    edited August 10
    Hello @kayman

    I tried with 5 CSV files with the recursive option set, on RM 9.3 and Windows 10, and it worked fine for me. My test directory contains two subdirectories.

    Maybe it's specific to .txt files; I will check and see.

    UPDATE: I tried .txt files as well, and it did read the files in the subdirectories. I will try with yours if you can share the XML and files. My test folder was on a BOX drive.
  • kayman Member Posts: 416   Unicorn
    Hi @varunm1, it seems to be a problem only when using shared network folders (on Windows 10).

    When my folder is on a network drive and contains a subfolder, Loop Files only returns the content of the main folder and ignores the subfolders.

    If, however, I copy the exact same folder structure to my local disk, I get the nested data as expected.

    Whether I use the full path to the shared folder or select it as a mounted drive makes no difference; only the main folder files are loaded. So I suspect the path logic might be a bit different for a shared network folder versus a local folder.

    I've attached my test process, but as you cannot simulate my server environment it is probably not very useful.

    <?xml version="1.0" encoding="UTF-8"?><process version="9.3.001">
      <context>
        <input/>
        <output/>
        <macros/>
      </context>
      <operator activated="true" class="process" compatibility="9.3.001" expanded="true" name="Process">
        <parameter key="logverbosity" value="init"/>
        <parameter key="random_seed" value="2001"/>
        <parameter key="send_mail" value="never"/>
        <parameter key="notification_email" value=""/>
        <parameter key="process_duration_for_mail" value="30"/>
        <parameter key="encoding" value="UTF-8"/>
        <process expanded="true">
          <operator activated="true" class="concurrency:loop_files" compatibility="9.3.001" expanded="true" height="82" name="Loop Files" width="90" x="179" y="34">
            <parameter key="directory" value="\\servername\SharedServerFolder\files"/>
            <parameter key="filter_type" value="glob"/>
            <parameter key="recursive" value="true"/>
            <parameter key="enable_macros" value="false"/>
            <parameter key="macro_for_file_name" value="file_name"/>
            <parameter key="macro_for_file_type" value="file_type"/>
            <parameter key="macro_for_folder_name" value="folder_name"/>
            <parameter key="reuse_results" value="false"/>
            <parameter key="enable_parallel_execution" value="false"/>
            <process expanded="true">
              <operator activated="true" class="text:read_document" compatibility="8.2.000" expanded="true" height="68" name="Read Document" width="90" x="447" y="34">
                <parameter key="extract_text_only" value="true"/>
                <parameter key="use_file_extension_as_type" value="true"/>
                <parameter key="content_type" value="txt"/>
                <parameter key="encoding" value="UTF-8"/>
                <description align="center" color="transparent" colored="false" width="126"/>
              </operator>
              <connect from_port="file object" to_op="Read Document" to_port="file"/>
              <connect from_op="Read Document" from_port="output" to_port="output 1"/>
              <portSpacing port="source_file object" spacing="0"/>
              <portSpacing port="source_input 1" spacing="0"/>
              <portSpacing port="sink_output 1" spacing="0"/>
              <portSpacing port="sink_output 2" spacing="0"/>
            </process>
            <description align="center" color="transparent" colored="false" width="126">using network folder full path</description>
          </operator>
          <operator activated="true" class="text:documents_to_data" compatibility="8.2.000" expanded="true" height="82" name="Documents to Data" width="90" x="313" y="34">
            <parameter key="text_attribute" value="doc_content"/>
            <parameter key="add_meta_information" value="false"/>
            <parameter key="datamanagement" value="double_sparse_array"/>
            <parameter key="data_management" value="auto"/>
            <parameter key="use_processed_text" value="false"/>
            <description align="center" color="transparent" colored="false" width="126">outcome shows 4 files (main folder only)</description>
          </operator>
          <operator activated="true" class="concurrency:loop_files" compatibility="9.3.001" expanded="true" height="82" name="Loop Files (2)" width="90" x="179" y="340">
            <parameter key="directory" value="C:\Users\me\files"/>
            <parameter key="filter_type" value="glob"/>
            <parameter key="recursive" value="true"/>
            <parameter key="enable_macros" value="false"/>
            <parameter key="macro_for_file_name" value="file_name"/>
            <parameter key="macro_for_file_type" value="file_type"/>
            <parameter key="macro_for_folder_name" value="folder_name"/>
            <parameter key="reuse_results" value="false"/>
            <parameter key="enable_parallel_execution" value="false"/>
            <process expanded="true">
              <operator activated="true" class="text:read_document" compatibility="8.2.000" expanded="true" height="68" name="Read Document (2)" width="90" x="447" y="34">
                <parameter key="extract_text_only" value="true"/>
                <parameter key="use_file_extension_as_type" value="true"/>
                <parameter key="content_type" value="txt"/>
                <parameter key="encoding" value="UTF-8"/>
                <description align="center" color="transparent" colored="false" width="126"/>
              </operator>
              <connect from_port="file object" to_op="Read Document (2)" to_port="file"/>
              <connect from_op="Read Document (2)" from_port="output" to_port="output 1"/>
              <portSpacing port="source_file object" spacing="0"/>
              <portSpacing port="source_input 1" spacing="0"/>
              <portSpacing port="sink_output 1" spacing="0"/>
              <portSpacing port="sink_output 2" spacing="0"/>
            </process>
            <description align="center" color="transparent" colored="false" width="126">Using local folder, same structure</description>
          </operator>
          <operator activated="true" class="text:documents_to_data" compatibility="8.2.000" expanded="true" height="82" name="Documents to Data (2)" width="90" x="313" y="340">
            <parameter key="text_attribute" value="doc_content"/>
            <parameter key="add_meta_information" value="false"/>
            <parameter key="datamanagement" value="double_sparse_array"/>
            <parameter key="data_management" value="auto"/>
            <parameter key="use_processed_text" value="false"/>
            <description align="center" color="transparent" colored="false" width="126">outcome shows 6 files, as expected</description>
          </operator>
          <operator activated="true" class="concurrency:loop_files" compatibility="9.3.001" expanded="true" height="82" name="Loop Files (3)" width="90" x="179" y="187">
            <parameter key="directory" value="V:\SharedServerFolder\files"/>
            <parameter key="filter_type" value="glob"/>
            <parameter key="recursive" value="true"/>
            <parameter key="enable_macros" value="false"/>
            <parameter key="macro_for_file_name" value="file_name"/>
            <parameter key="macro_for_file_type" value="file_type"/>
            <parameter key="macro_for_folder_name" value="folder_name"/>
            <parameter key="reuse_results" value="false"/>
            <parameter key="enable_parallel_execution" value="false"/>
            <process expanded="true">
              <operator activated="true" class="text:read_document" compatibility="8.2.000" expanded="true" height="68" name="Read Document (3)" width="90" x="447" y="34">
                <parameter key="extract_text_only" value="true"/>
                <parameter key="use_file_extension_as_type" value="true"/>
                <parameter key="content_type" value="txt"/>
                <parameter key="encoding" value="UTF-8"/>
                <description align="center" color="transparent" colored="false" width="126"/>
              </operator>
              <connect from_port="file object" to_op="Read Document (3)" to_port="file"/>
              <connect from_op="Read Document (3)" from_port="output" to_port="output 1"/>
              <portSpacing port="source_file object" spacing="0"/>
              <portSpacing port="source_input 1" spacing="0"/>
              <portSpacing port="sink_output 1" spacing="0"/>
              <portSpacing port="sink_output 2" spacing="0"/>
            </process>
            <description align="center" color="transparent" colored="false" width="126">using network drive as local share</description>
          </operator>
          <operator activated="true" class="text:documents_to_data" compatibility="8.2.000" expanded="true" height="82" name="Documents to Data (3)" width="90" x="313" y="187">
            <parameter key="text_attribute" value="doc_content"/>
            <parameter key="add_meta_information" value="false"/>
            <parameter key="datamanagement" value="double_sparse_array"/>
            <parameter key="data_management" value="auto"/>
            <parameter key="use_processed_text" value="false"/>
            <description align="center" color="transparent" colored="false" width="126">outcome shows 4 files (main folder only)</description>
          </operator>
          <connect from_op="Loop Files" from_port="output 1" to_op="Documents to Data" to_port="documents 1"/>
          <connect from_op="Documents to Data" from_port="example set" to_port="result 1"/>
          <connect from_op="Loop Files (2)" from_port="output 1" to_op="Documents to Data (2)" to_port="documents 1"/>
          <connect from_op="Documents to Data (2)" from_port="example set" to_port="result 3"/>
          <connect from_op="Loop Files (3)" from_port="output 1" to_op="Documents to Data (3)" to_port="documents 1"/>
          <connect from_op="Documents to Data (3)" from_port="example set" to_port="result 2"/>
          <portSpacing port="source_input 1" spacing="0"/>
          <portSpacing port="sink_result 1" spacing="0"/>
          <portSpacing port="sink_result 2" spacing="0"/>
          <portSpacing port="sink_result 3" spacing="0"/>
          <portSpacing port="sink_result 4" spacing="0"/>
        </process>
      </operator>
    </process>
    


  • varunm1 Moderator, Member Posts: 965   Unicorn
    Thanks @kayman for your response. Let's see if @Marco_Boeck has any suggestions.
  • Marco_Boeck Team Lead Software Engineering Administrator, Moderator, Employee, Member, University Professor Posts: 1,851   RM Engineering
    Hi,

    hmpf, that makes no sense code-wise. It is obviously the same logic in either case, so I have to suspect that, for some reason, the subfolders are not reported to Java when it queries the contents of the folder:
    Files.walkFileTree(fileSystem.getPath(path), EnumSet.of(FileVisitOption.FOLLOW_LINKS), Integer.MAX_VALUE, visitor);
    And as we rely on whatever Java gets told by the OS, I'm afraid that I cannot do much :(
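
    To see what Java is actually being told for a given folder, a small standalone check along these lines may help. It is only a sketch: the class name `WalkCheck`, the `walk` helper, and the `DIR`/`FILE`/`OTHER`/`FAIL` labels are made up for illustration and are not part of RapidMiner. It issues the same kind of `Files.walkFileTree` call as above and records every entry the visitor receives.

```java
import java.io.IOException;
import java.nio.file.FileVisitOption;
import java.nio.file.FileVisitResult;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.SimpleFileVisitor;
import java.nio.file.attribute.BasicFileAttributes;
import java.util.ArrayList;
import java.util.EnumSet;
import java.util.List;

// Hypothetical diagnostic class, not RapidMiner code.
public class WalkCheck {

    /** Walks root with FOLLOW_LINKS and unlimited depth, recording each entry Java is shown. */
    static List<String> walk(Path root) throws IOException {
        List<String> seen = new ArrayList<>();
        Files.walkFileTree(root, EnumSet.of(FileVisitOption.FOLLOW_LINKS), Integer.MAX_VALUE,
            new SimpleFileVisitor<Path>() {
                @Override
                public FileVisitResult preVisitDirectory(Path dir, BasicFileAttributes attrs) {
                    // Called once for every directory that is about to be descended into
                    seen.add("DIR  " + root.relativize(dir));
                    return FileVisitResult.CONTINUE;
                }

                @Override
                public FileVisitResult visitFile(Path file, BasicFileAttributes attrs) {
                    // Entries that are not regular files show up as OTHER
                    seen.add((attrs.isRegularFile() ? "FILE " : "OTHER ") + root.relativize(file));
                    return FileVisitResult.CONTINUE;
                }

                @Override
                public FileVisitResult visitFileFailed(Path file, IOException exc) {
                    // Record entries that could not be read instead of aborting the walk
                    seen.add("FAIL " + root.relativize(file) + " (" + exc + ")");
                    return FileVisitResult.CONTINUE;
                }
            });
        return seen;
    }

    public static void main(String[] args) throws IOException {
        // Pass the folder to inspect as the first argument; defaults to the working directory
        Path root = Paths.get(args.length > 0 ? args[0] : ".");
        for (String line : walk(root)) {
            System.out.println(line);
        }
    }
}
```

    Running it once against the network path and once against the local copy, then comparing the two listings, should show whether the nested folder is reported as a directory, as something else, or as a failure on the share.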


    Regards,
    Marco
  • Marco_Boeck Team Lead Software Engineering Administrator, Moderator, Employee, Member, University Professor Posts: 1,851   RM Engineering
    Internal note for when this is moved to investigations: RM-4180