[SOLVED] Extract filename

rowan_g Member Posts: 47 Contributor II
edited November 2018 in Help
Hi All,

I am using the "Read" operators on several .csv and .xls files and merging the results. I want to generate a new attribute, "file name", that records the source filename for each example (row).
Any ideas on how to achieve that?

Many thanks!

Cheers,

Answers

  • Nils_Woehler Member Posts: 463 Maven
    Hi,

    you could start with something like "Annotations to Data" followed by "Cartesian Product".

    <?xml version="1.0" encoding="UTF-8" standalone="no"?>
    <process version="5.3.009">
      <context>
        <input/>
        <output/>
        <macros/>
      </context>
      <operator activated="true" class="process" compatibility="5.3.009" expanded="true" name="Process">
        <process expanded="true">
          <operator activated="true" class="read_csv" compatibility="5.3.009" expanded="true" height="60" name="Read CSV" width="90" x="112" y="75">
            <parameter key="csv_file" value="CSV FILE"/>
            <parameter key="first_row_as_names" value="false"/>
            <list key="annotations">
              <parameter key="0" value="Name"/>
            </list>
            <parameter key="encoding" value="windows-1252"/>
            <list key="data_set_meta_data_information"/>
          </operator>
          <operator activated="true" class="annotations_to_data" compatibility="5.3.009" expanded="true" height="76" name="Annotations to Data" width="90" x="246" y="75"/>
          <operator activated="true" class="cartesian_product" compatibility="5.3.009" expanded="true" height="76" name="Cartesian" width="90" x="380" y="75"/>
          <connect from_op="Read CSV" from_port="output" to_op="Annotations to Data" to_port="object"/>
          <connect from_op="Annotations to Data" from_port="annotations" to_op="Cartesian" to_port="left"/>
          <connect from_op="Annotations to Data" from_port="object through" to_op="Cartesian" to_port="right"/>
          <connect from_op="Cartesian" from_port="join" to_port="result 1"/>
          <portSpacing port="source_input 1" spacing="0"/>
          <portSpacing port="sink_result 1" spacing="90"/>
          <portSpacing port="sink_result 2" spacing="0"/>
        </process>
      </operator>
    </process>
    Best,
    Nils
  • rowan_g Member Posts: 47 Contributor II
    Excellent! Thanks Nils
  • rowan_g Member Posts: 47 Contributor II
    The solution only works for one spreadsheet.

    How would I implement it in a "Loop Files" operator?

    Many thanks.

    Cheers,
  • Skirzynski Member Posts: 164 Maven
    The "Loop Files" operator provides a macro containing the file path of the current file. You can use this macro to generate a new attribute. See the process below:

    <?xml version="1.0" encoding="UTF-8" standalone="no"?>
    <process version="5.3.009">
      <context>
        <input/>
        <output/>
        <macros/>
      </context>
      <operator activated="true" class="process" compatibility="5.3.009" expanded="true" name="Process">
        <process expanded="true">
          <operator activated="true" class="loop_files" compatibility="5.3.009" expanded="true" height="76" name="Loop Files" width="90" x="45" y="30">
            <parameter key="directory" value="/home/marcin/temp/loop-file"/>
            <process expanded="true">
              <operator activated="true" class="read_csv" compatibility="5.3.009" expanded="true" height="60" name="Read CSV" width="90" x="112" y="30">
                <parameter key="csv_file" value="/home/marcin/forum.csv"/>
                <parameter key="first_row_as_names" value="false"/>
                <list key="annotations">
                  <parameter key="0" value="Name"/>
                </list>
                <parameter key="encoding" value="windows-1252"/>
                <list key="data_set_meta_data_information">
                  <parameter key="0" value="name.true.polynominal.attribute"/>
                  <parameter key="1" value="freq.true.integer.attribute"/>
                </list>
              </operator>
              <operator activated="true" class="generate_attributes" compatibility="5.3.009" expanded="true" height="76" name="Generate Attributes" width="90" x="246" y="30">
                <list key="function_descriptions">
                  <parameter key="source" value="macro(&quot;file_path&quot;)"/>
                </list>
              </operator>
              <connect from_port="file object" to_op="Read CSV" to_port="file"/>
              <connect from_op="Read CSV" from_port="output" to_op="Generate Attributes" to_port="example set input"/>
              <connect from_op="Generate Attributes" from_port="example set output" to_port="out 1"/>
              <portSpacing port="source_file object" spacing="0"/>
              <portSpacing port="source_in 1" spacing="0"/>
              <portSpacing port="sink_out 1" spacing="0"/>
              <portSpacing port="sink_out 2" spacing="0"/>
            </process>
          </operator>
          <operator activated="true" class="append" compatibility="5.3.009" expanded="true" height="76" name="Append" width="90" x="246" y="30"/>
          <connect from_op="Loop Files" from_port="out 1" to_op="Append" to_port="example set 1"/>
          <connect from_op="Append" from_port="merged set" to_port="result 1"/>
          <portSpacing port="source_input 1" spacing="0"/>
          <portSpacing port="sink_result 1" spacing="90"/>
          <portSpacing port="sink_result 2" spacing="0"/>
        </process>
      </operator>
    </process>
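
    If only the bare file name is wanted rather than the full path, a minimal variation on the Generate Attributes operator above might look like the sketch below. It assumes Loop Files is left at its default macro names, with the file name exposed as file_name alongside file_path. The attribute name "file name" is taken from the original question and is otherwise arbitrary. Check the Loop Files parameters in your version before relying on it.

    ```xml
    <!-- Sketch: drop-in replacement for the Generate Attributes operator inside Loop Files. -->
    <!-- Assumption: Loop Files still uses its default macro name "file_name" for the bare file name. -->
    <operator activated="true" class="generate_attributes" compatibility="5.3.009" expanded="true" height="76" name="Generate Attributes" width="90" x="246" y="30">
      <list key="function_descriptions">
        <!-- "file name" is the attribute name asked for in the question; any name works here. -->
        <parameter key="file name" value="macro(&quot;file_name&quot;)"/>
      </list>
    </operator>
    ```

    The connections in the process stay the same, since only the attribute name and the macro being read change.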
  • rowan_g Member Posts: 47 Contributor II
    Thanks for that. Works absolutely perfectly!