Read Excel changes integer to integer

CharlieFirpoCharlieFirpo Member Posts: 48 Contributor II
edited September 2019 in Help
Dear All!

A have an excel table with a column that has text and number values as well. The first row of this column is the name (it is a text) of the RapidMiner attribute. The second row in excel is also a text. The third, fourth... are numbers, integers, like 1 2 3. These numbers are not 1.0 2.0 3.0. But when I use the Read Excel operator, RapidMiner changes these 1 2 3 numbers into 1.0 2.0 3.0. Why? How can I change them back? Using Real to Integer operator does not work, because this attribute has text value as well.

If I use Read CSV (reading a .csv with the same content), then during the read of the csv (using the Import Configuration Wizard), these 1 2 3 numbers are not change into 1.0 2.0 3.0. So reading csv is correct. I could use this operator, but it has a disadvantage. I can not set the Imported cell range. In Read Excel operator, there is a parameter to set the imported cell range.

So what is the solution? Read CSV or Read Excel operators do not read the tables automaticully. If I create a new (next) column in the csv or xls using not the RapidMiner, and after that I run my RapidMiner process, it will not see my new column. Why? Why the Read operators do not read the entire tables at every run of the process? They read only the actual values of the cell range set at first use of the Read operator.

Using Read Excel, I can set a large imported cell range, so if I create a new column or row in the table, the RapidMiner process will read the new column or row. But I'm not able to set the imported cell range at Read CSV.

But using Read Excel, my 1 2 3 numbers change into 1.0 2.0 3.0.

Is there any solution for my problem? I want a huge (not known) imported cell range, and I want my number in the format 1 2 3, not 1.0 2.0 3.0.

Thank you reading this and trying to help me!
Tagged:

Answers

  • CharlieFirpoCharlieFirpo Member Posts: 48 Contributor II
    If I create a new row in my .xls or .csv table, then the Read CSV operator will automatically read the new row. Of course at Read Excel, it is needed to set a huge imported cell range to read the new row.
    But the read operators do not read the new columns. Neither Read Excel if one set a huge imported cell range. It does not work. There are the 'data set meta data information' at both read operators, and here needed to add the new columns. But to add a huge number of columns does not work, because if an added column does not exist in the table, then the read operators will throw a failure message.

    So if one does not know the exact size of a table (the exact number of the columns), and the number of the columns can change by the time, it is necessary to use the Import Configuration Wizard if the number of the columns change or set the correct 'data set meta data information' manually.

    :(

    Nice day!
  • JEdwardJEdward RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 578 Unicorn
    Hi Charlie,

    It is possible, but unfortunately I can't recall exactly how. 

    I would have suggested using the Read Excel With Format operator & the using a regular expression to remove the formatting information, but it seems this is not working on my test. 
    <?xml version="1.0" encoding="UTF-8" standalone="no"?>
    <process version="5.3.012">
      <context>
        <input/>
        <output/>
        <macros/>
      </context>
      <operator activated="true" class="process" compatibility="5.3.012" expanded="true" name="Process">
        <process expanded="true">
          <operator activated="true" class="read_excel_format" compatibility="5.3.012" expanded="true" height="60" name="Read Excel with Format" width="90" x="45" y="30">
            <parameter key="excel_file" value="C:\Users\john.heath\Desktop\Test.xls"/>
          </operator>
          <operator activated="true" class="select_attributes" compatibility="5.3.012" expanded="true" height="76" name="Select Attributes" width="90" x="45" y="120">
            <parameter key="attribute_filter_type" value="regular_expression"/>
            <parameter key="regular_expression" value="(format)"/>
            <parameter key="invert_selection" value="true"/>
          </operator>
          <connect from_op="Read Excel with Format" from_port="output" to_op="Select Attributes" to_port="example set input"/>
          <connect from_op="Select Attributes" from_port="example set output" to_port="result 1"/>
          <portSpacing port="source_input 1" spacing="0"/>
          <portSpacing port="sink_result 1" spacing="0"/>
          <portSpacing port="sink_result 2" spacing="0"/>
        </process>
      </operator>
    </process>
    I think it is something like this that should work though. 
    Good luck!
Sign In or Register to comment.