
Process Documents from Files - file pattern

CharlieFirpo Member Posts: 48 Contributor II
edited October 2019 in Help
Dear all!

I have a folder that contains several files with different names and extensions. I want to use only the files whose names contain the string <text>. How can I set this in the "file pattern" parameter of the Process Documents from Files operator? If I use *text*, RM does not read any files from the folder.
Let's say I have these filenames: asd_text-fgh.txt, qwe.textrty.xls, ... I think the pattern *text* should match both of them. Or am I wrong?

Thanks for your reply!


    venkatesh Member Posts: 15 Contributor II
    CharlieFirpo wrote:

    Let's say I have these filenames: asd_text-fgh.txt, qwe.textrty.xls, ... I think the pattern *text* should match both of them. Or am I wrong?
    Thanks for your reply!

    The pattern is a regular expression, so *text* will not work. Try ".*text.*" instead.

    Here is a sample process for your reference

    <?xml version="1.0" encoding="UTF-8" standalone="no"?>
    <process version="5.3.007">
      <operator activated="true" class="process" compatibility="5.3.007" expanded="true" name="Process">
        <process expanded="true">
          <operator activated="true" class="loop_files" compatibility="5.3.007" expanded="true" height="76" name="Loop Files" width="90" x="179" y="75">
            <parameter key="directory" value="/tmp"/>
            <!-- The filter is a regular expression, so .*text.* keeps file names containing "text" -->
            <parameter key="filter" value=".*text.*"/>
            <process expanded="true">
              <operator activated="true" class="provide_macro_as_log_value" compatibility="5.3.007" expanded="true" height="76" name="Provide Macro as Log Value" width="90" x="45" y="30">
                <parameter key="macro_name" value="file_name"/>
              </operator>
              <operator activated="true" class="log" compatibility="5.3.007" expanded="true" height="76" name="Log" width="90" x="447" y="120">
                <list key="log">
                  <parameter key="file_name" value="operator.Provide Macro as Log Value.parameter.macro_name"/>
                </list>
              </operator>
              <connect from_port="file object" to_op="Provide Macro as Log Value" to_port="through 1"/>
              <connect from_op="Provide Macro as Log Value" from_port="through 1" to_op="Log" to_port="through 1"/>
              <connect from_op="Log" from_port="through 1" to_port="out 1"/>
              <portSpacing port="source_file object" spacing="0"/>
              <portSpacing port="source_in 1" spacing="0"/>
              <portSpacing port="sink_out 1" spacing="0"/>
              <portSpacing port="sink_out 2" spacing="0"/>
            </process>
          </operator>
          <connect from_op="Loop Files" from_port="out 1" to_port="result 1"/>
          <portSpacing port="source_input 1" spacing="0"/>
          <portSpacing port="sink_result 1" spacing="0"/>
          <portSpacing port="sink_result 2" spacing="0"/>
        </process>
      </operator>
    </process>
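
To illustrate the glob-versus-regex distinction raised in this reply, here is a minimal Python sketch. It is only an approximation under stated assumptions: Python's `fnmatch` and `re` modules stand in for whatever matcher an operator uses (RapidMiner is Java-based, and its regex dialect may differ in details), and `notes.csv` is a hypothetical third filename added for contrast. It does not show which syntax the "file pattern" parameter of Process Documents from Files expects.

```python
import fnmatch
import re

# The two filenames from the question, plus a hypothetical non-matching one.
filenames = ["asd_text-fgh.txt", "qwe.textrty.xls", "notes.csv"]


def match_glob(pattern, names):
    """Return the names matching a shell-style wildcard (glob) pattern."""
    return [n for n in names if fnmatch.fnmatchcase(n, pattern)]


def match_regex(pattern, names):
    """Return the names fully matching a regular expression.

    Returns None when the pattern is not a valid regular expression.
    """
    try:
        compiled = re.compile(pattern)
    except re.error:
        return None
    return [n for n in names if compiled.fullmatch(n)]


glob_hits = match_glob("*text*", filenames)       # "*" means "any characters"
regex_hits = match_regex(".*text.*", filenames)   # ".*" means "any characters"
bad_regex = match_regex("*text*", filenames)      # leading "*" has nothing to repeat

print("glob  *text*   ->", glob_hits)
print("regex .*text.* ->", regex_hits)
print("regex *text*   ->", bad_regex)
```

Read as a glob, `*text*` selects both example files; read as a regular expression, the same string is not even a valid pattern, which is why the reply suggests `.*text.*` for a regex-based filter.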
    CharlieFirpo Member Posts: 48 Contributor II
    Thank you for your reply!

    Your process works for me (it returns the expected filenames/paths), but when I use the same regular expression in the "Process Documents from Files" operator, it does not work :(
    If I leave "file pattern" empty, my process works properly. But I don't want it to read all files, only those whose names contain a specific string...