Why is the new RapidMiner 9.4 removing some content from my Excel document after importing?

AmosGH Member Posts: 7 Learner I
edited September 2019 in Product Feedback - Resolved
I was importing an Excel file into the new RapidMiner 9.4 repository, but after loading it, most of my content before the hashtag had been removed.
My old RapidMiner did not give me this issue. Please help.
0 votes

Declined · Last Updated

March 2020 - no votes since Sept 2019 - please comment if you want to revive this issue. PROD-916

Comments

  • lionelderkrikor Moderator, RapidMiner Certified Analyst, Member Posts: 1,195 Unicorn
    Hi @AmosGH,

    There is indeed a bug in the Read Excel operator when the original Excel file contains carriage returns (\r) and line feeds (\n).
    I already described this bug in a previous thread.
    After some experimentation and "a little luck", I found a workaround: use the Replace operator right after Read Excel to replace the carriage returns and line feeds with nothing.
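
    For readers who want to see what this replacement does outside RapidMiner, here is a minimal Python sketch. It assumes the Replace operator treats the pattern `\r|\n` (the `replace_what` value in the process posted later in this thread) as a regular expression and replaces every match with an empty string. The helper name `strip_line_breaks` and the sample cell value are made up for illustration.

```python
import re

# Same pattern as the Replace operator setting: a carriage return or a line feed.
LINE_BREAKS = re.compile(r"\r|\n")


def strip_line_breaks(text: str) -> str:
    """Delete every carriage return and line feed from text (hypothetical helper)."""
    return LINE_BREAKS.sub("", text)


# Invented cell value with an embedded Windows line break before the hashtag.
cell = "Great service today\r\n#MoMoAt10"
cleaned = strip_line_breaks(cell)
print(repr(cleaned))
```

    Note that replacing with nothing joins the text on both sides of the line break; replacing with a single space instead would keep the words apart.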



    The process is in the attached file.
    PS: @sgenzer, I suggest moving this thread to "Bug Report" if the RM developers are not already aware of this "bug".
    Thank you,

    Regards,

    Lionel


  • sgenzer Administrator, Moderator, Employee, RapidMiner Certified Analyst, Community Manager, Member, University Professor, PM Moderator Posts: 2,959 Community Manager
    Done. Thank you @lionelderkrikor
  • AmosGH Member Posts: 7 Learner I
    Thank you, sir, your reply was helpful :) But apart from writing the expression in the Replace operator, is there a permanent solution going forward?
  • lionelderkrikor Moderator, RapidMiner Certified Analyst, Member Posts: 1,195 Unicorn
    Hi @AmosGH,

    As far as I know, this workaround (via the Replace operator) is currently the only option.
    There is no permanent solution at the moment, which is why Scott submitted this to RapidMiner's development team: so that they can fix the problem permanently.

    Regards,

    Lionel
  • AmosGH Member Posts: 7 Learner I
    Thank you
  • alebo Employee, Member Posts: 15 RM Product Management
    Hi @AmosGH,
    The content is still there; it is only visible when you hover over the field. What is the expected behavior? We don't change the content when reading a file. I would consider this to be a feature request.
    Regards,
    Andras

  • lionelderkrikor Moderator, RapidMiner Certified Analyst, Member Posts: 1,195 Unicorn
    Hi Andras,

    I think there is still a problem when the Read Excel operator is combined with the Execute Python operator and the example set contains text:
    When such an example set enters the Execute Python operator, it seems that some examples are split into several examples.

    At the input of Execute Python, that is, directly after the Read Excel operator, I have the following example set:

    After the Execute Python operator, the example set becomes an example set with 122 examples (the Execute Python script does nothing):

    I think this behaviour is linked to the carriage returns and line feeds, because if we put a Replace operator with the setting described in my previous post just before the Execute Python operator, everything works correctly.
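
    One way to check this suspicion on any list of text values is to look for the two characters directly. The sketch below is a plain Python illustration with invented sample values; the helper name `rows_with_line_breaks` is hypothetical and not part of RapidMiner.

```python
def rows_with_line_breaks(values):
    """Return the indices of values that contain a carriage return or a line feed."""
    return [i for i, value in enumerate(values) if "\r" in value or "\n" in value]


# Invented sample values: only the second and third contain a line break.
texts = ["single line", "first line\nsecond line", "windows\r\nbreak"]
suspicious = rows_with_line_breaks(texts)
print(suspicious)
```

    If the reported indices match the examples that end up split, that supports the link to line breaks.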

    You can reproduce this behaviour by running the process in the attached file (enable/disable the Replace operator to see the difference).
    The Excel file (data) was provided by @AmosGH in a previous post.

    Regards,

    Lionel

    <?xml version="1.0" encoding="UTF-8"?><process version="9.4.001">
      <context>
        <input/>
        <output/>
        <macros/>
      </context>
      <operator activated="true" class="process" compatibility="9.4.001" expanded="true" name="Process">
        <parameter key="logverbosity" value="init"/>
        <parameter key="random_seed" value="2001"/>
        <parameter key="send_mail" value="never"/>
        <parameter key="notification_email" value=""/>
        <parameter key="process_duration_for_mail" value="30"/>
        <parameter key="encoding" value="SYSTEM"/>
        <process expanded="true">
          <operator activated="true" class="read_excel" compatibility="9.4.001" expanded="true" height="68" name="Read Excel" width="90" x="45" y="187">
            <parameter key="excel_file" value="C:\Users\Lionel\Downloads\MoMoAt10hashtagonTwi2.xlsx"/>
            <parameter key="sheet_selection" value="sheet number"/>
            <parameter key="sheet_number" value="1"/>
            <parameter key="imported_cell_range" value="A1"/>
            <parameter key="encoding" value="SYSTEM"/>
            <parameter key="first_row_as_names" value="true"/>
            <list key="annotations"/>
            <parameter key="date_format" value=""/>
            <parameter key="time_zone" value="SYSTEM"/>
            <parameter key="locale" value="English (United States)"/>
            <parameter key="read_all_values_as_polynominal" value="false"/>
            <list key="data_set_meta_data_information">
              <parameter key="0" value="Name.true.polynominal.attribute"/>
              <parameter key="1" value="Text.true.polynominal.attribute"/>
              <parameter key="2" value="Time.true.polynominal.attribute"/>
            </list>
            <parameter key="read_not_matching_values_as_missings" value="false"/>
            <parameter key="datamanagement" value="double_array"/>
            <parameter key="data_management" value="auto"/>
          </operator>
          <operator activated="true" breakpoints="before" class="python_scripting:execute_python" compatibility="9.3.001" expanded="true" height="103" name="Execute Python" width="90" x="514" y="187">
            <parameter key="script" value="import pandas&#10;&#10;# rm_main is a mandatory function, &#10;# the number of arguments has to be the number of input ports (can be none)&#10;def rm_main(data):&#10;    &#10;&#10;    # connect 2 output ports to see the results&#10;    return data"/>
            <parameter key="notebook_cell_tag_filter" value=""/>
            <parameter key="use_default_python" value="true"/>
            <parameter key="package_manager" value="conda (anaconda)"/>
          </operator>
          <operator activated="false" breakpoints="before" class="replace" compatibility="9.4.001" expanded="true" height="82" name="Replace" width="90" x="179" y="289">
            <parameter key="attribute_filter_type" value="single"/>
            <parameter key="attribute" value="Text"/>
            <parameter key="attributes" value=""/>
            <parameter key="use_except_expression" value="false"/>
            <parameter key="value_type" value="nominal"/>
            <parameter key="use_value_type_exception" value="false"/>
            <parameter key="except_value_type" value="file_path"/>
            <parameter key="block_type" value="single_value"/>
            <parameter key="use_block_type_exception" value="false"/>
            <parameter key="except_block_type" value="single_value"/>
            <parameter key="invert_selection" value="false"/>
            <parameter key="include_special_attributes" value="false"/>
            <parameter key="replace_what" value="\r|\n"/>
          </operator>
          <connect from_op="Read Excel" from_port="output" to_op="Execute Python" to_port="input 1"/>
          <connect from_op="Execute Python" from_port="output 1" to_port="result 1"/>
          <portSpacing port="source_input 1" spacing="0"/>
          <portSpacing port="sink_result 1" spacing="0"/>
          <portSpacing port="sink_result 2" spacing="0"/>
        </process>
      </operator>
    </process>
    
  • gmeier Employee, Member Posts: 25 RM Engineering

    You are right, the Execute Python operator has a problem with multi-line strings. We are already working on fixing this.
    However, this is not connected to the Read Excel operator. The Excel file has entries with more than one line.
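
    As a general illustration of how multi-line strings can inflate a row count, the sketch below writes two records to CSV text, one of which contains a line break, and then counts the rows in two ways. That a line-oriented, CSV-like exchange is involved here is an assumption made purely for illustration; this thread does not say how Execute Python transfers data, and the sample records are invented.

```python
import csv
import io

# Header plus two invented records; the second Text value contains a line feed.
records = [["Name", "Text"], ["a", "one line"], ["b", "first line\nsecond line"]]

# Serialize to CSV text; the writer quotes the field that contains the line break.
buffer = io.StringIO()
csv.writer(buffer).writerows(records)
raw = buffer.getvalue()

# Naive approach: treat every physical line as one row.
naive_rows = raw.splitlines()

# Quote-aware approach: the csv module keeps the quoted line break inside one field.
parsed_rows = list(csv.reader(io.StringIO(raw, newline="")))

print(len(naive_rows))   # number of physical lines
print(len(parsed_rows))  # number of logical rows
```

    The naive count is larger than the quote-aware count, which is the same kind of mismatch as an example set that grows after passing through an operator that does nothing.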

    Best,
    G
  • lionelderkrikor Moderator, RapidMiner Certified Analyst, Member Posts: 1,195 Unicorn
    Hi @gmeier,

    OK, thank you for your reply.

    Regards,

    Lionel