The Altair Community is migrating to a new platform to provide a better experience for you. The RapidMiner Community will merge with the Altair Community at the same time. In preparation for the migration, both communities are on read-only mode from July 15th - July 24th, 2024. Technical support via cases will continue to work as is. For any urgent requests from Students/Faculty members, please submit the form linked here.
Options

"[SOLVED] Reading multiple worksheets from a single Excel file"

dominiciuidominiciui Member Posts: 2 Contributor I
edited June 2019 in Help
Hello All -

How do you load multiple worksheets from a single Excel spreadsheet file whereby the worksheets' attributes differ from each other.

What I found was:

'Loop Files' - ['Read Excel'] (as the nested operator) --> 'Append'

The 'Loop Files' provides me with a directory - do I just move the Excel spreadsheet to it's own folder even though it's a single file?

The 'Read Excel' provides with the import wizard - which only allows me to import one sheet.

I appreciate your help!
Dominic
Tagged:

Answers

  • Options
    awchisholmawchisholm RapidMiner Certified Expert, Member Posts: 458 Unicorn
    Hello

    You could use the Loop operator containing the Read Excel operator. Define an iteration macro in the Loop operator and use it as the sheet number for the Read Excel operator. One gotcha will be the imported cell range which may need to be different for different sheets.

    Here's a simple example
    <?xml version="1.0" encoding="UTF-8" standalone="no"?>
    <process version="6.0.002">
      <context>
        <input/>
        <output/>
        <macros/>
      </context>
      <operator activated="true" class="process" compatibility="6.0.002" expanded="true" name="Process">
        <process expanded="true">
          <operator activated="true" class="loop" compatibility="6.0.002" expanded="true" height="76" name="Loop" width="90" x="246" y="210">
            <parameter key="set_iteration_macro" value="true"/>
            <parameter key="iterations" value="2"/>
            <process expanded="true">
              <operator activated="true" class="read_excel" compatibility="6.0.002" expanded="true" height="60" name="Read Excel" width="90" x="179" y="120">
                <parameter key="excel_file" value="d:\tmp\loopBook.xlsx"/>
                <parameter key="sheet_number" value="%{iteration}"/>
                <parameter key="imported_cell_range" value="A1:C2"/>
                <parameter key="first_row_as_names" value="false"/>
                <list key="annotations">
                  <parameter key="0" value="Name"/>
                </list>
                <list key="data_set_meta_data_information">
                  <parameter key="0" value="att1.true.integer.attribute"/>
                  <parameter key="1" value="att2.true.integer.attribute"/>
                  <parameter key="2" value="att3.true.integer.attribute"/>
                </list>
              </operator>
              <connect from_op="Read Excel" from_port="output" to_port="output 1"/>
              <portSpacing port="source_input 1" spacing="0"/>
              <portSpacing port="sink_output 1" spacing="0"/>
              <portSpacing port="sink_output 2" spacing="0"/>
            </process>
          </operator>
          <connect from_op="Loop" from_port="output 1" to_port="result 1"/>
          <portSpacing port="source_input 1" spacing="0"/>
          <portSpacing port="sink_result 1" spacing="0"/>
          <portSpacing port="sink_result 2" spacing="0"/>
        </process>
      </operator>
    </process>
    regards

    Andrew
  • Options
    dominiciuidominiciui Member Posts: 2 Contributor I
    Andrew -

    Thank you very much for your help and code! Problem solved.

    Best,
    Dominic 
Sign In or Register to comment.