How to loop over Excel files in a folder and append them into one Example Set

ella_ Member Posts: 3 Contributor I
edited February 1 in Help
Hi,

I was trying to loop over 3 Excel files in a folder and append them into one Example Set that contains the data of all 3 files.
Plotting against a continuous ID shows that, unfortunately, my final Example Set contains the first file's data 3 times.

I was looping over the index parameter of the Select operator, which picks one file from the output of the Loop Files subprocess that contains the Read Excel operator.

Can someone please help me to solve the problem?

Best

Ella

Attached Process:

<?xml version="1.0" encoding="UTF-8"?><process version="9.5.001">
  <context>
    <input/>
    <output/>
    <macros/>
  </context>
  <operator activated="true" class="process" compatibility="9.5.001" expanded="true" name="Process">
    <parameter key="logverbosity" value="init"/>
    <parameter key="random_seed" value="2001"/>
    <parameter key="send_mail" value="never"/>
    <parameter key="notification_email" value=""/>
    <parameter key="process_duration_for_mail" value="30"/>
    <parameter key="encoding" value="SYSTEM"/>
    <process expanded="true">
      <operator activated="true" class="concurrency:loop_parameters" compatibility="9.5.001" expanded="true" height="82" name="Loop Parameters" width="90" x="45" y="85">
        <list key="parameters">
          <parameter key="Select.index" value="[1.0;3;2;linear]"/>
        </list>
        <parameter key="error_handling" value="fail on error"/>
        <parameter key="log_performance" value="true"/>
        <parameter key="log_all_criteria" value="false"/>
        <parameter key="synchronize" value="false"/>
        <parameter key="enable_parallel_execution" value="true"/>
        <process expanded="true">
          <operator activated="true" class="concurrency:loop_files" compatibility="9.5.001" expanded="true" height="82" name="Loop Files" width="90" x="45" y="85">
            <parameter key="filter_type" value="glob"/>
            <parameter key="recursive" value="false"/>
            <parameter key="enable_macros" value="false"/>
            <parameter key="macro_for_file_name" value="file_name"/>
            <parameter key="macro_for_file_type" value="file_type"/>
            <parameter key="macro_for_folder_name" value="folder_name"/>
            <parameter key="reuse_results" value="false"/>
            <parameter key="enable_parallel_execution" value="true"/>
            <process expanded="true">
              <operator activated="true" class="read_excel" compatibility="9.5.001" expanded="true" height="68" name="Read Excel" width="90" x="45" y="85">
                <parameter key="sheet_selection" value="sheet number"/>
                <parameter key="sheet_number" value="1"/>
                <parameter key="imported_cell_range" value="A1"/>
                <parameter key="encoding" value="SYSTEM"/>
                <parameter key="first_row_as_names" value="true"/>
                <list key="annotations"/>
                <parameter key="date_format" value=""/>
                <parameter key="time_zone" value="SYSTEM"/>
                <parameter key="locale" value="English (United States)"/>
                <parameter key="read_all_values_as_polynominal" value="false"/>
                <list key="data_set_meta_data_information"/>
                <parameter key="read_not_matching_values_as_missings" value="true"/>
                <parameter key="datamanagement" value="double_array"/>
                <parameter key="data_management" value="auto"/>
              </operator>
              <connect from_op="Read Excel" from_port="output" to_port="output 1"/>
              <portSpacing port="source_file object" spacing="0"/>
              <portSpacing port="source_input 1" spacing="0"/>
              <portSpacing port="sink_output 1" spacing="0"/>
              <portSpacing port="sink_output 2" spacing="0"/>
            </process>
          </operator>
          <operator activated="true" class="select" compatibility="9.5.001" expanded="true" height="68" name="Select" width="90" x="246" y="85">
            <parameter key="index" value="1"/>
            <parameter key="unfold" value="false"/>
          </operator>
          <connect from_op="Loop Files" from_port="output 1" to_op="Select" to_port="collection"/>
          <connect from_op="Select" from_port="selected" to_port="output 1"/>
          <portSpacing port="source_input 1" spacing="0"/>
          <portSpacing port="sink_performance" spacing="0"/>
          <portSpacing port="sink_output 1" spacing="0"/>
          <portSpacing port="sink_output 2" spacing="0"/>
        </process>
      </operator>
      <operator activated="true" class="append" compatibility="9.5.001" expanded="true" height="82" name="Append" width="90" x="179" y="85">
        <parameter key="datamanagement" value="double_array"/>
        <parameter key="data_management" value="auto"/>
        <parameter key="merge_type" value="all"/>
      </operator>
      <operator activated="true" class="generate_id" compatibility="9.5.001" expanded="true" height="82" name="Generate ID" width="90" x="313" y="85">
        <parameter key="create_nominal_ids" value="false"/>
        <parameter key="offset" value="0"/>
      </operator>
      <connect from_op="Loop Parameters" from_port="output 1" to_op="Append" to_port="example set 1"/>
      <connect from_op="Append" from_port="merged set" to_op="Generate ID" to_port="example set input"/>
      <connect from_op="Generate ID" from_port="example set output" to_port="result 1"/>
      <portSpacing port="source_input 1" spacing="0"/>
      <portSpacing port="sink_result 1" spacing="0"/>
      <portSpacing port="sink_result 2" spacing="0"/>
    </process>
  </operator>
</process>




Answers

  • varunm1 Moderator, Member Posts: 1,203 Unicorn
    edited February 1
    Hello @ella_

    Why don't you loop over the files directly with Loop Files and append the results, instead of using Loop Parameters?

    One issue I found is in the Select operator: you hardcoded the index "1" there. Change it to the macro %{execution_count} and check whether you get all three.
    Regards,
    Varun
    https://www.varunmandalapu.com/

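A minimal sketch of the direct approach (Loop Files straight into Append, without Loop Parameters and Select), assuming all workbooks sit in one folder, share the same columns and keep their data on the first sheet. The folder path and the `*.xlsx` glob are hypothetical placeholders, the `directory` and `filter_by_glob` parameter keys are assumed from the Loop Files operator's "directory" and "filter by glob" settings, and any parameter not listed is left at RapidMiner's default. The main change compared with the attached process is that Loop Files' `file object` port is wired into Read Excel's `file` input, so each iteration reads the file currently being visited.

```xml
<?xml version="1.0" encoding="UTF-8"?><process version="9.5.001">
  <context>
    <input/>
    <output/>
    <macros/>
  </context>
  <operator activated="true" class="process" compatibility="9.5.001" expanded="true" name="Process">
    <process expanded="true">
      <!-- Iterates over every file in the folder that matches the glob -->
      <operator activated="true" class="concurrency:loop_files" compatibility="9.5.001" expanded="true" height="82" name="Loop Files" width="90" x="45" y="85">
        <!-- HYPOTHETICAL placeholder path: replace with your own folder -->
        <parameter key="directory" value="/path/to/excel_folder"/>
        <parameter key="filter_type" value="glob"/>
        <!-- ASSUMED pattern: only .xlsx workbooks are picked up -->
        <parameter key="filter_by_glob" value="*.xlsx"/>
        <parameter key="recursive" value="false"/>
        <process expanded="true">
          <operator activated="true" class="read_excel" compatibility="9.5.001" expanded="true" height="68" name="Read Excel" width="90" x="179" y="85">
            <parameter key="sheet_selection" value="sheet number"/>
            <parameter key="sheet_number" value="1"/>
            <parameter key="imported_cell_range" value="A1"/>
            <parameter key="first_row_as_names" value="true"/>
          </operator>
          <!-- Pass the file of the current iteration into Read Excel -->
          <connect from_port="file object" to_op="Read Excel" to_port="file"/>
          <!-- One Example Set per file is collected at output 1 -->
          <connect from_op="Read Excel" from_port="output" to_port="output 1"/>
        </process>
      </operator>
      <!-- Append stacks the collected Example Sets (same columns assumed) -->
      <operator activated="true" class="append" compatibility="9.5.001" expanded="true" height="82" name="Append" width="90" x="179" y="85">
        <parameter key="merge_type" value="all"/>
      </operator>
      <!-- Continuous ID over the appended rows, as in the original process -->
      <operator activated="true" class="generate_id" compatibility="9.5.001" expanded="true" height="82" name="Generate ID" width="90" x="313" y="85">
        <parameter key="create_nominal_ids" value="false"/>
        <parameter key="offset" value="0"/>
      </operator>
      <connect from_op="Loop Files" from_port="output 1" to_op="Append" to_port="example set 1"/>
      <connect from_op="Append" from_port="merged set" to_op="Generate ID" to_port="example set input"/>
      <connect from_op="Generate ID" from_port="example set output" to_port="result 1"/>
    </process>
  </operator>
</process>
```

To try it, paste the XML into RapidMiner Studio's XML panel and set the folder path. Because every file is read and appended in a single pass, the index handling that Loop Parameters and Select were doing in the attached process is no longer needed.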
