🥳 RAPIDMINER 9.9 IS OUT!!! 🥳

The updates in 9.9 power advanced use cases and offer productivity enhancements for users who prefer to code.

CLICK HERE TO DOWNLOAD

Removing rows

rowan_growan_g Member Posts: 47 Contributor II
edited November 2018 in Help
Hi All,

I'm a complete newbie so any help is appreciated. I'm using the GUI as my coding skills are pretty poor.

I have several xls invoices I'm trying to append for analysis. Each of them has a name at row 25 and the invoice information from row 26 onwards. Each invoice will vary in the amount of total rows. Unfortunately there's a whole bunch of useless information in the bottom few rows of each invoice that I need to get rid of (footer).

I've got several invoice files in xls format (10+ files) that I am reading through "Loop files" with a nested "Read xls"
My cell range is A25:BO1000. All rows above A25 have data I don't want (invoice headings ect ect).

I'm reading the data successfully but am not able to append the files due to the data contained in the footer.

Any ideas how to get rid of these rows?

Thanks in advance!

Answers

  • Marco_BoeckMarco_Boeck Team Lead Software Engineering Administrator, Moderator, Employee, Member, University Professor Posts: 1,958   RM Engineering
    Hi,

    is the useless information in the bottom always the same? In other words, do you know how many rows you need to cut at the end each time? If so, you can read your Excel files, add a "Filter Examples" operator with an inverted filter afterwards and be done with it. You just need to set 2 Macros before doing it. Have a look at this example process:

    <?xml version="1.0" encoding="UTF-8" standalone="no"?>
    <process version="5.3.009">
      <context>
        <input/>
        <output/>
        <macros/>
      </context>
      <operator activated="true" class="process" compatibility="5.3.009" expanded="true" name="Process">
        <process expanded="true">
          <operator activated="true" class="retrieve" compatibility="5.3.009" expanded="true" height="60" name="Retrieve Iris" width="90" x="45" y="30">
            <parameter key="repository_entry" value="//Samples/data/Iris"/>
          </operator>
          <operator activated="true" class="extract_macro" compatibility="5.3.009" expanded="true" height="60" name="Extract Macro" width="90" x="179" y="30">
            <parameter key="macro" value="end"/>
            <list key="additional_macros"/>
          </operator>
          <operator activated="true" class="generate_macro" compatibility="5.3.009" expanded="true" height="76" name="Generate Macro" width="90" x="313" y="30">
            <list key="function_descriptions">
              <parameter key="start" value="%{end}-25"/>
            </list>
          </operator>
          <operator activated="true" class="filter_example_range" compatibility="5.3.009" expanded="true" height="76" name="Filter Example Range" width="90" x="447" y="30">
            <parameter key="first_example" value="%{start}"/>
            <parameter key="last_example" value="%{end}"/>
            <parameter key="invert_filter" value="true"/>
          </operator>
          <connect from_op="Retrieve Iris" from_port="output" to_op="Extract Macro" to_port="example set"/>
          <connect from_op="Extract Macro" from_port="example set" to_op="Generate Macro" to_port="through 1"/>
          <connect from_op="Generate Macro" from_port="through 1" to_op="Filter Example Range" to_port="example set input"/>
          <connect from_op="Filter Example Range" from_port="example set output" to_port="result 1"/>
          <portSpacing port="source_input 1" spacing="0"/>
          <portSpacing port="sink_result 1" spacing="0"/>
          <portSpacing port="sink_result 2" spacing="0"/>
        </process>
      </operator>
    </process>
    Regards,
    Marco
  • rowan_growan_g Member Posts: 47 Contributor II
    Thanks for that! It works. Very much appreciated.
    For the moment the amount of useless information still occupies the same amount of rows.

    How would you tackle a changing amount of rows?

    Cheers,

  • rowan_growan_g Member Posts: 47 Contributor II
    Any ideas on how you would calculate the end point of an exampleset with varying numbers of examples?
Sign In or Register to comment.