RapidMiner 9.7 is Now Available

Lots of amazing new improvements including true version control! Learn more about what's new here.

CLICK HERE TO DOWNLOAD

"[SOLVED] Reading multiple worksheets from a single Excel file"

dominiciuidominiciui Member Posts: 2 Contributor I
edited June 2019 in Help
Hello All -

How do you load multiple worksheets from a single Excel spreadsheet file whereby the worksheets' attributes differ from each other.

What I found was:

'Loop Files' - ['Read Excel'] (as the nested operator) --> 'Append'

The 'Loop Files' provides me with a directory - do I just move the Excel spreadsheet to it's own folder even though it's a single file?

The 'Read Excel' provides with the import wizard - which only allows me to import one sheet.

I appreciate your help!
Dominic
Tagged:

Answers

  • awchisholmawchisholm RapidMiner Certified Expert, Member Posts: 458   Unicorn
    Hello

    You could use the Loop operator containing the Read Excel operator. Define an iteration macro in the Loop operator and use it as the sheet number for the Read Excel operator. One gotcha will be the imported cell range which may need to be different for different sheets.

    Here's a simple example
    <?xml version="1.0" encoding="UTF-8" standalone="no"?>
    <process version="6.0.002">
      <context>
        <input/>
        <output/>
        <macros/>
      </context>
      <operator activated="true" class="process" compatibility="6.0.002" expanded="true" name="Process">
        <process expanded="true">
          <operator activated="true" class="loop" compatibility="6.0.002" expanded="true" height="76" name="Loop" width="90" x="246" y="210">
            <parameter key="set_iteration_macro" value="true"/>
            <parameter key="iterations" value="2"/>
            <process expanded="true">
              <operator activated="true" class="read_excel" compatibility="6.0.002" expanded="true" height="60" name="Read Excel" width="90" x="179" y="120">
                <parameter key="excel_file" value="d:\tmp\loopBook.xlsx"/>
                <parameter key="sheet_number" value="%{iteration}"/>
                <parameter key="imported_cell_range" value="A1:C2"/>
                <parameter key="first_row_as_names" value="false"/>
                <list key="annotations">
                  <parameter key="0" value="Name"/>
                </list>
                <list key="data_set_meta_data_information">
                  <parameter key="0" value="att1.true.integer.attribute"/>
                  <parameter key="1" value="att2.true.integer.attribute"/>
                  <parameter key="2" value="att3.true.integer.attribute"/>
                </list>
              </operator>
              <connect from_op="Read Excel" from_port="output" to_port="output 1"/>
              <portSpacing port="source_input 1" spacing="0"/>
              <portSpacing port="sink_output 1" spacing="0"/>
              <portSpacing port="sink_output 2" spacing="0"/>
            </process>
          </operator>
          <connect from_op="Loop" from_port="output 1" to_port="result 1"/>
          <portSpacing port="source_input 1" spacing="0"/>
          <portSpacing port="sink_result 1" spacing="0"/>
          <portSpacing port="sink_result 2" spacing="0"/>
        </process>
      </operator>
    </process>
    regards

    Andrew
  • dominiciuidominiciui Member Posts: 2 Contributor I
    Andrew -

    Thank you very much for your help and code! Problem solved.

    Best,
    Dominic 
Sign In or Register to comment.