Loop Files recursive option isn't working as expected

kayman Member Posts: 662 Unicorn
edited August 2019 in Product Feedback
I've just noticed that the recursive setting of Loop Files isn't making any real difference. I created a test setup as follows:

[FileFolder]
    File1.txt
    File2.txt
    File3.txt
    File4.txt
    [NestedFileFolder]
       File5.txt
       File6.txt

Only the contents of File1 to File4 are loaded. The files in the nested folder (File5 and File6) are ignored whether I select or deselect the recursive setting.

I'm using RM 9.3 on Windows 10, and the test files were on a shared network drive.
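To make the expected behaviour concrete, here is a minimal Java sketch (not RapidMiner's code; the class and method names are hypothetical) that recreates the test layout above in a temporary folder and counts the files with and without recursion. It assumes a walk limited to depth 1 is a fair stand-in for the non-recursive setting.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.stream.Stream;

// Hypothetical helper, not part of RapidMiner: reproduces the folder layout from the post.
public class LoopFilesLayout {

    // Creates FileFolder with File1..File4 and NestedFileFolder with File5..File6
    // inside a fresh temporary directory and returns the top folder.
    public static Path createTestLayout() {
        try {
            Path root = Files.createTempDirectory("FileFolder");
            for (int i = 1; i <= 4; i++) {
                Files.write(root.resolve("File" + i + ".txt"),
                        ("content " + i).getBytes(StandardCharsets.UTF_8));
            }
            Path nested = Files.createDirectory(root.resolve("NestedFileFolder"));
            for (int i = 5; i <= 6; i++) {
                Files.write(nested.resolve("File" + i + ".txt"),
                        ("content " + i).getBytes(StandardCharsets.UTF_8));
            }
            return root;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Counts regular files below root. Non-recursive scans only the direct
    // children (depth 1); recursive scans every level.
    public static long countFiles(Path root, boolean recursive) {
        int maxDepth = recursive ? Integer.MAX_VALUE : 1;
        try (Stream<Path> paths = Files.walk(root, maxDepth)) {
            return paths.filter(Files::isRegularFile).count();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Path root = createTestLayout();
        System.out.println("recursive:     " + countFiles(root, true));  // expected 6
        System.out.println("non-recursive: " + countFiles(root, false)); // expected 4
    }
}
```

On a local disk the two counts are 6 and 4; the problem reported here is that the network share yields only the 4 top-level files even with recursion switched on.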

Sent to Engineering

RM-4180

Comments

  • varunm1 Member Posts: 1,207 Unicorn
    edited August 2019
    Hello @kayman

    I tried 5 CSV files with the recursive option enabled on RM 9.3 and Windows 10, and it worked fine for me. My test directory contains two subdirectories.

    Maybe it's specific to .txt files; I will check.

    UPDATE: I tried .txt files as well, and it read the files in the subdirectories too. My test folder was on a Box drive. I can try with yours if you share the process XML and the files.
    Regards,
    Varun
    https://www.varunmandalapu.com/

    Be Safe. Follow precautions and Maintain Social Distancing

  • kayman Member Posts: 662 Unicorn
    Hi @varunm1, it seems to be a problem only when using shared network folders (on Windows 10).

    When my folder is on a network drive and contains a subfolder, only the content of the main folder is loaded and the subfolders are ignored.

    If, however, I copy the exact same folder structure to my local disk, I get the subfolder data as expected.

    Whether I use the full UNC path to the shared folder or access it through a mapped drive makes no difference: only the files in the main folder are loaded. So I suspect the path logic differs somewhat between a shared network folder and a local folder.
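    One way to test the path-logic theory is to ask Java directly what the OS reports for each entry in the shared folder. The sketch below is a hypothetical diagnostic (not part of RapidMiner): it lists the direct children of a directory, classifies each one via `BasicFileAttributes` without following links, and also shows `Files.isDirectory`, which does follow links. Running it once with the UNC path, once with the mapped drive letter and once with the local copy lets you compare the three; if `NestedFileFolder` shows up as anything other than a directory on the share, that would be a plausible explanation for the walk not descending into it.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.LinkOption;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.attribute.BasicFileAttributes;
import java.util.ArrayList;
import java.util.List;

// Hypothetical diagnostic: shows how the JDK classifies each direct child of a folder.
public class ShareEntryCheck {

    // Returns one sorted line per direct child: its name, its kind as reported
    // without following links, and Files.isDirectory (which follows links).
    public static List<String> describeEntries(Path dir) {
        List<String> lines = new ArrayList<>();
        try (DirectoryStream<Path> entries = Files.newDirectoryStream(dir)) {
            for (Path entry : entries) {
                BasicFileAttributes attrs = Files.readAttributes(
                        entry, BasicFileAttributes.class, LinkOption.NOFOLLOW_LINKS);
                String kind = attrs.isDirectory() ? "directory"
                        : attrs.isRegularFile() ? "file"
                        : attrs.isSymbolicLink() ? "symlink"
                        : "other";
                lines.add(entry.getFileName() + " -> " + kind
                        + " (isDirectory following links: " + Files.isDirectory(entry) + ")");
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        lines.sort(null); // natural (alphabetical) order
        return lines;
    }

    public static void main(String[] args) {
        // Pass the folder to inspect, e.g. the UNC path or the mapped drive path.
        Path dir = Paths.get(args.length > 0 ? args[0] : ".");
        describeEntries(dir).forEach(System.out::println);
    }
}
```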

    I've attached my test process below, but since you cannot reproduce my server environment, it is probably of limited use.

    <?xml version="1.0" encoding="UTF-8"?><process version="9.3.001">
      <context>
        <input/>
        <output/>
        <macros/>
      </context>
      <operator activated="true" class="process" compatibility="9.3.001" expanded="true" name="Process">
        <parameter key="logverbosity" value="init"/>
        <parameter key="random_seed" value="2001"/>
        <parameter key="send_mail" value="never"/>
        <parameter key="notification_email" value=""/>
        <parameter key="process_duration_for_mail" value="30"/>
        <parameter key="encoding" value="UTF-8"/>
        <process expanded="true">
          <operator activated="true" class="concurrency:loop_files" compatibility="9.3.001" expanded="true" height="82" name="Loop Files" width="90" x="179" y="34">
            <parameter key="directory" value="\\servername\SharedServerFolder\files"/>
            <parameter key="filter_type" value="glob"/>
            <parameter key="recursive" value="true"/>
            <parameter key="enable_macros" value="false"/>
            <parameter key="macro_for_file_name" value="file_name"/>
            <parameter key="macro_for_file_type" value="file_type"/>
            <parameter key="macro_for_folder_name" value="folder_name"/>
            <parameter key="reuse_results" value="false"/>
            <parameter key="enable_parallel_execution" value="false"/>
            <process expanded="true">
              <operator activated="true" class="text:read_document" compatibility="8.2.000" expanded="true" height="68" name="Read Document" width="90" x="447" y="34">
                <parameter key="extract_text_only" value="true"/>
                <parameter key="use_file_extension_as_type" value="true"/>
                <parameter key="content_type" value="txt"/>
                <parameter key="encoding" value="UTF-8"/>
                <description align="center" color="transparent" colored="false" width="126"/>
              </operator>
              <connect from_port="file object" to_op="Read Document" to_port="file"/>
              <connect from_op="Read Document" from_port="output" to_port="output 1"/>
              <portSpacing port="source_file object" spacing="0"/>
              <portSpacing port="source_input 1" spacing="0"/>
              <portSpacing port="sink_output 1" spacing="0"/>
              <portSpacing port="sink_output 2" spacing="0"/>
            </process>
            <description align="center" color="transparent" colored="false" width="126">using network folder full path</description>
          </operator>
          <operator activated="true" class="text:documents_to_data" compatibility="8.2.000" expanded="true" height="82" name="Documents to Data" width="90" x="313" y="34">
            <parameter key="text_attribute" value="doc_content"/>
            <parameter key="add_meta_information" value="false"/>
            <parameter key="datamanagement" value="double_sparse_array"/>
            <parameter key="data_management" value="auto"/>
            <parameter key="use_processed_text" value="false"/>
            <description align="center" color="transparent" colored="false" width="126">outcome shows 4 files (main folder only)</description>
          </operator>
          <operator activated="true" class="concurrency:loop_files" compatibility="9.3.001" expanded="true" height="82" name="Loop Files (2)" width="90" x="179" y="340">
            <parameter key="directory" value="C:\Users\me\files"/>
            <parameter key="filter_type" value="glob"/>
            <parameter key="recursive" value="true"/>
            <parameter key="enable_macros" value="false"/>
            <parameter key="macro_for_file_name" value="file_name"/>
            <parameter key="macro_for_file_type" value="file_type"/>
            <parameter key="macro_for_folder_name" value="folder_name"/>
            <parameter key="reuse_results" value="false"/>
            <parameter key="enable_parallel_execution" value="false"/>
            <process expanded="true">
              <operator activated="true" class="text:read_document" compatibility="8.2.000" expanded="true" height="68" name="Read Document (2)" width="90" x="447" y="34">
                <parameter key="extract_text_only" value="true"/>
                <parameter key="use_file_extension_as_type" value="true"/>
                <parameter key="content_type" value="txt"/>
                <parameter key="encoding" value="UTF-8"/>
                <description align="center" color="transparent" colored="false" width="126"/>
              </operator>
              <connect from_port="file object" to_op="Read Document (2)" to_port="file"/>
              <connect from_op="Read Document (2)" from_port="output" to_port="output 1"/>
              <portSpacing port="source_file object" spacing="0"/>
              <portSpacing port="source_input 1" spacing="0"/>
              <portSpacing port="sink_output 1" spacing="0"/>
              <portSpacing port="sink_output 2" spacing="0"/>
            </process>
            <description align="center" color="transparent" colored="false" width="126">Using local folder, same structure</description>
          </operator>
          <operator activated="true" class="text:documents_to_data" compatibility="8.2.000" expanded="true" height="82" name="Documents to Data (2)" width="90" x="313" y="340">
            <parameter key="text_attribute" value="doc_content"/>
            <parameter key="add_meta_information" value="false"/>
            <parameter key="datamanagement" value="double_sparse_array"/>
            <parameter key="data_management" value="auto"/>
            <parameter key="use_processed_text" value="false"/>
            <description align="center" color="transparent" colored="false" width="126">outcome shows 6 files, as expected</description>
          </operator>
          <operator activated="true" class="concurrency:loop_files" compatibility="9.3.001" expanded="true" height="82" name="Loop Files (3)" width="90" x="179" y="187">
            <parameter key="directory" value="V:\SharedServerFolder\files"/>
            <parameter key="filter_type" value="glob"/>
            <parameter key="recursive" value="true"/>
            <parameter key="enable_macros" value="false"/>
            <parameter key="macro_for_file_name" value="file_name"/>
            <parameter key="macro_for_file_type" value="file_type"/>
            <parameter key="macro_for_folder_name" value="folder_name"/>
            <parameter key="reuse_results" value="false"/>
            <parameter key="enable_parallel_execution" value="false"/>
            <process expanded="true">
              <operator activated="true" class="text:read_document" compatibility="8.2.000" expanded="true" height="68" name="Read Document (3)" width="90" x="447" y="34">
                <parameter key="extract_text_only" value="true"/>
                <parameter key="use_file_extension_as_type" value="true"/>
                <parameter key="content_type" value="txt"/>
                <parameter key="encoding" value="UTF-8"/>
                <description align="center" color="transparent" colored="false" width="126"/>
              </operator>
              <connect from_port="file object" to_op="Read Document (3)" to_port="file"/>
              <connect from_op="Read Document (3)" from_port="output" to_port="output 1"/>
              <portSpacing port="source_file object" spacing="0"/>
              <portSpacing port="source_input 1" spacing="0"/>
              <portSpacing port="sink_output 1" spacing="0"/>
              <portSpacing port="sink_output 2" spacing="0"/>
            </process>
            <description align="center" color="transparent" colored="false" width="126">using network drive as local share</description>
          </operator>
          <operator activated="true" class="text:documents_to_data" compatibility="8.2.000" expanded="true" height="82" name="Documents to Data (3)" width="90" x="313" y="187">
            <parameter key="text_attribute" value="doc_content"/>
            <parameter key="add_meta_information" value="false"/>
            <parameter key="datamanagement" value="double_sparse_array"/>
            <parameter key="data_management" value="auto"/>
            <parameter key="use_processed_text" value="false"/>
            <description align="center" color="transparent" colored="false" width="126">outcome shows 4 files (main folder only)</description>
          </operator>
          <connect from_op="Loop Files" from_port="output 1" to_op="Documents to Data" to_port="documents 1"/>
          <connect from_op="Documents to Data" from_port="example set" to_port="result 1"/>
          <connect from_op="Loop Files (2)" from_port="output 1" to_op="Documents to Data (2)" to_port="documents 1"/>
          <connect from_op="Documents to Data (2)" from_port="example set" to_port="result 3"/>
          <connect from_op="Loop Files (3)" from_port="output 1" to_op="Documents to Data (3)" to_port="documents 1"/>
          <connect from_op="Documents to Data (3)" from_port="example set" to_port="result 2"/>
          <portSpacing port="source_input 1" spacing="0"/>
          <portSpacing port="sink_result 1" spacing="0"/>
          <portSpacing port="sink_result 2" spacing="0"/>
          <portSpacing port="sink_result 3" spacing="0"/>
          <portSpacing port="sink_result 4" spacing="0"/>
        </process>
      </operator>
    </process>
    


  • varunm1 Member Posts: 1,207 Unicorn
    Thanks @kayman for your response. Let's see if @Marco_Boeck has any suggestions.
    Regards,
    Varun

  • Marco_Boeck Administrator, Moderator, Employee, Member, University Professor Posts: 1,996 RM Engineering
    Hi,

    hmpf, that makes no sense code-wise. It is obviously the same logic in either case, so I have to suspect that, for some reason, the subfolders are not reported to Java when it queries the contents of the folder.
    Files.walkFileTree(fileSystem.getPath(path), EnumSet.of(FileVisitOption.FOLLOW_LINKS), Integer.MAX_VALUE, visitor);
    Since we rely on whatever the OS tells Java, I'm afraid there is not much I can do :(


    Regards,
    Marco
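    For anyone who wants to reproduce this outside RapidMiner, here is a minimal, self-contained sketch built around the same `Files.walkFileTree` call quoted in the reply. RapidMiner's own visitor isn't shown in this thread, so `RecordingVisitor` is hypothetical: it records the directories entered, the files found, and any entries the walk failed on, instead of silently skipping them. Running it against the share path and against the local copy should show whether Java itself sees the nested folder.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.FileSystem;
import java.nio.file.FileSystems;
import java.nio.file.FileVisitOption;
import java.nio.file.FileVisitResult;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.SimpleFileVisitor;
import java.nio.file.attribute.BasicFileAttributes;
import java.util.ArrayList;
import java.util.EnumSet;
import java.util.List;

// Hypothetical diagnostic, not RapidMiner's implementation.
public class WalkDiagnostics {

    // Records directories entered, regular files visited and any failures,
    // so a subfolder that is never listed can be told apart from one that errors.
    public static final class RecordingVisitor extends SimpleFileVisitor<Path> {
        public final List<Path> files = new ArrayList<>();
        public final List<Path> directories = new ArrayList<>();
        public final List<String> failures = new ArrayList<>();

        @Override
        public FileVisitResult preVisitDirectory(Path dir, BasicFileAttributes attrs) {
            directories.add(dir);
            return FileVisitResult.CONTINUE;
        }

        @Override
        public FileVisitResult visitFile(Path file, BasicFileAttributes attrs) {
            if (attrs.isRegularFile()) {
                files.add(file);
            }
            return FileVisitResult.CONTINUE;
        }

        @Override
        public FileVisitResult visitFileFailed(Path file, IOException exc) {
            // SimpleFileVisitor rethrows by default; record the error and keep walking.
            failures.add(file + ": " + exc);
            return FileVisitResult.CONTINUE;
        }

        @Override
        public FileVisitResult postVisitDirectory(Path dir, IOException exc) {
            if (exc != null) {
                failures.add(dir + " (while listing): " + exc);
            }
            return FileVisitResult.CONTINUE;
        }
    }

    public static RecordingVisitor walk(String path) {
        FileSystem fileSystem = FileSystems.getDefault();
        RecordingVisitor visitor = new RecordingVisitor();
        try {
            // Same call shape as the snippet in the reply: follow links, unlimited depth.
            Files.walkFileTree(fileSystem.getPath(path),
                    EnumSet.of(FileVisitOption.FOLLOW_LINKS), Integer.MAX_VALUE, visitor);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return visitor;
    }

    public static void main(String[] args) {
        RecordingVisitor v = walk(args.length > 0 ? args[0] : ".");
        System.out.println("directories entered: " + v.directories);
        System.out.println("files found: " + v.files.size());
        v.failures.forEach(f -> System.out.println("FAILED " + f));
    }
}
```

    If the share run lists only the top folder under "directories entered" and reports no failures, the subfolder was never presented to Java as a directory, which would match the explanation above.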
  • Marco_Boeck Administrator, Moderator, Employee, Member, University Professor Posts: 1,996 RM Engineering
    Internal note for when this is moved to investigations: RM-4180
  • Shan_b_Nimbl Member Posts: 1 Learner I
    Good day, could someone help with a solution?