
Process Documents from Files - file pattern

CharlieFirpo Member Posts: 48 Contributor II
edited October 2019 in Help
Dear all!

I have a folder that contains several different files (with different names and extensions). I want to use only the files whose filename contains the string <text>. How can I set this in the "file pattern" parameter of the Process Documents from Files operator? If I use *text*, RM does not read any files from the folder.
Let's say I have these filenames: asd_text-fgh.txt, qwe.textrty.xls, ... I think the *text* pattern should match these two filenames. Or not?

Thanks for any replies!

Answers

  • venkatesh Member Posts: 15 Contributor II
    CharlieFirpo wrote:

    Let's say I have these filenames: asd_text-fgh.txt, qwe.textrty.xls, ... I think the *text* pattern should match these two filenames. Or not?
    Thanks for any replies!
    The pattern is a regular expression, so *text* will not work. Try ".*text.*" instead.

    Here is a sample process for your reference:

    <?xml version="1.0" encoding="UTF-8" standalone="no"?>
    <process version="5.3.007">
      <context>
        <input/>
        <output/>
        <macros/>
      </context>
      <operator activated="true" class="process" compatibility="5.3.007" expanded="true" name="Process">
        <process expanded="true">
          <operator activated="true" class="loop_files" compatibility="5.3.007" expanded="true" height="76" name="Loop Files" width="90" x="179" y="75">
            <parameter key="directory" value="/tmp"/>
            <parameter key="filter" value=".*text.*"/>
            <process expanded="true">
              <operator activated="true" class="provide_macro_as_log_value" compatibility="5.3.007" expanded="true" height="76" name="Provide Macro as Log Value" width="90" x="45" y="30">
                <parameter key="macro_name" value="file_name"/>
              </operator>
              <operator activated="true" class="log" compatibility="5.3.007" expanded="true" height="76" name="Log" width="90" x="447" y="120">
                <list key="log">
                  <parameter key="file_name" value="operator.Provide Macro as Log Value.parameter.macro_name"/>
                </list>
              </operator>
              <connect from_port="file object" to_op="Provide Macro as Log Value" to_port="through 1"/>
              <connect from_op="Provide Macro as Log Value" from_port="through 1" to_op="Log" to_port="through 1"/>
              <connect from_op="Log" from_port="through 1" to_port="out 1"/>
              <portSpacing port="source_file object" spacing="0"/>
              <portSpacing port="source_in 1" spacing="0"/>
              <portSpacing port="sink_out 1" spacing="0"/>
              <portSpacing port="sink_out 2" spacing="0"/>
            </process>
          </operator>
          <connect from_op="Loop Files" from_port="out 1" to_port="result 1"/>
          <portSpacing port="source_input 1" spacing="0"/>
          <portSpacing port="sink_result 1" spacing="0"/>
          <portSpacing port="sink_result 2" spacing="0"/>
        </process>
      </operator>
    </process>
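To illustrate the difference between the two pattern styles discussed above, here is a minimal sketch in Python. It is only a stand-in: RapidMiner runs on Java, so Python's `re` and `fnmatch` modules are assumptions used for illustration, and the helper names and the extra file `notes.doc` are hypothetical. For patterns this simple, the behavior should be comparable.

```python
import fnmatch
import re

# Filenames from the question, plus one hypothetical non-matching file.
filenames = ["asd_text-fgh.txt", "qwe.textrty.xls", "notes.doc"]


def regex_filter(pattern, names):
    """Keep the names that match the whole regular expression."""
    compiled = re.compile(pattern)
    return [name for name in names if compiled.fullmatch(name)]


def glob_filter(pattern, names):
    """Keep the names that match a shell-style wildcard (glob) pattern."""
    return [name for name in names if fnmatch.fnmatchcase(name, pattern)]


def is_valid_regex(pattern):
    """Return True if the pattern compiles as a regular expression."""
    try:
        re.compile(pattern)
        return True
    except re.error:
        return False


# ".*text.*" as a regex: any characters, then "text", then any characters.
regex_matches = regex_filter(".*text.*", filenames)

# "*text*" only makes sense as a glob, where "*" alone means "any characters".
glob_matches = glob_filter("*text*", filenames)

# As a regex, a leading "*" has nothing to repeat, so "*text*" is rejected.
star_is_valid_regex = is_valid_regex("*text*")

print(regex_matches)
print(glob_matches)
print(star_is_valid_regex)
```

Both filters select the same two files from the question, but only when each pattern is interpreted in the syntax it was written for. Whether a given operator parameter expects a glob or a regular expression has to be checked in that operator's documentation.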
  • CharlieFirpo Member Posts: 48 Contributor II
    Thank you for your reply!

    Your process works for me (it returns the expected filenames/paths), but when I use the same regular expression in the "Process Documents from Files" operator, it does not work :(
    If I leave "file pattern" empty, my process works properly. But I don't want it to read all files, only those whose names contain a specific string...