Bug in Execute R operator

Rapidminerpartner Member Posts: 35 Contributor II
edited July 2019 in Help

Hello everyone,

I am using the "Execute R" operator.

If a column name in the input table contains Korean characters (that is, if the column name is in Korean), the operator crashes. An error message appears that mentions a Java exception.

Please fix this problem for Korean users.

Thank you in advance.


KMC


Answers

  • sgenzer Administrator, Moderator, Employee, RapidMiner Certified Analyst, Community Manager, Member, University Professor, PM Moderator Posts: 2,959 Community Manager
    @Rapidminerpartner, can you please provide the XML and a sample data set for this process so we can reproduce the error?
  • Rapidminerpartner Member Posts: 35 Contributor II

    Below is the .rmp file:

    <?xml version="1.0" encoding="UTF-8"?><process version="9.3.001">
      <context>
        <input/>
        <output/>
        <macros/>
      </context>
      <operator activated="true" class="process" compatibility="9.3.001" expanded="true" name="Process">
        <parameter key="logverbosity" value="init"/>
        <parameter key="random_seed" value="2001"/>
        <parameter key="send_mail" value="never"/>
        <parameter key="notification_email" value=""/>
        <parameter key="process_duration_for_mail" value="30"/>
        <parameter key="encoding" value="SYSTEM"/>
        <process expanded="true">
          <operator activated="true" class="retrieve" compatibility="9.3.001" expanded="true" height="68" name="Retrieve 101_DT_1B04005N_Y_2016---" width="90" x="45" y="34">
            <parameter key="repository_entry" value="//Local Repository/processes/101_DT_1B04005N_Y_2016---"/>
          </operator>
          <operator activated="true" class="r_scripting:execute_r" compatibility="9.1.000" expanded="true" height="103" name="Execute R" width="90" x="246" y="34">
            <parameter key="script" value="# rm_main is a mandatory function, &#10;# the number of arguments has to be the number of input ports (can be none)&#10;rm_main = function(data)&#10;{&#10;    print('Hello, world!')&#10;    # output can be found in Log View&#10;    print(str(data))&#10;    &#10;    # your code goes here&#10;&#10;    # for example:&#10;    data2 &lt;- as.data.table(matrix(1:16,4,4))&#10;&#10;    # connect 2 output ports to see the results&#10;    return(list(data,data2))&#10;}&#10;"/>
          </operator>
          <connect from_op="Retrieve 101_DT_1B04005N_Y_2016---" from_port="output" to_op="Execute R" to_port="input 1"/>
          <connect from_op="Execute R" from_port="output 1" to_port="result 1"/>
          <portSpacing port="source_input 1" spacing="0"/>
          <portSpacing port="sink_result 1" spacing="0"/>
          <portSpacing port="sink_result 2" spacing="0"/>
        </process>
      </operator>
    </process>

  • sgenzer Administrator, Moderator, Employee, RapidMiner Certified Analyst, Community Manager, Member, University Professor, PM Moderator Posts: 2,959 Community Manager
    Hi @Rapidminerpartner, can you please also post the ExampleSet 101_DT_1B04005N_Y_2016---?
  • Rapidminerpartner Member Posts: 35 Contributor II

    Hello, I didn't know I could attach files.

    Here you are, and thank you.


  • varunm1 Moderator, Member Posts: 1,207 Unicorn
    edited July 2019
    Hello @Rapidminerpartner

    I ran your process on your data set and it did not produce an error for me, but the attribute names are changed inside the R script. I added a breakpoint before the Execute R operator, and it showed the attribute names exactly as they appear in the CSV file attached to your post. Once the data is processed by the R script, however, some characters are replaced with boxes, as shown in the center figure below. I also see that you did not write a script of your own and are just using the default script in the Execute R operator. Separately, I loaded the CSV data with read.csv in RStudio and observed that R changes your attribute names there too. This is shown in the last image of the screenshot below.

    I also used the data imported from the CSV file to train a decision tree in RapidMiner, without the R script, to see whether RapidMiner itself changes the attribute names. It does not.

    So my understanding is that R is changing your attribute names because it cannot handle some special characters. I am not sure what kind of error you are getting; if you have any images of the error, please attach them.
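
    One possible contributor to the renaming seen with read.csv (an assumption, not something confirmed in this thread) is its default of check.names = TRUE, which passes the header through make.names(). What make.names() treats as a letter depends on the session locale, so the minimal sketch below uses hypothetical ASCII headers whose result is the same on any machine. It only illustrates the renaming mechanism, not the crash in Execute R.

    ```r
    # Hypothetical CSV text with ASCII-only headers, so the result does not
    # depend on the locale of the machine running it.
    csv_text <- "item name,2016 total\nA,1\nB,2\n"

    # Default behaviour: check.names = TRUE passes the header through make.names().
    checked <- read.csv(text = csv_text)

    # check.names = FALSE keeps the header exactly as it was read.
    unchecked <- read.csv(text = csv_text, check.names = FALSE)

    print(names(checked))    # "item.name" "X2016.total"
    print(names(unchecked))  # "item name" "2016 total"

    # The locale decides which characters R treats as letters in names.
    print(Sys.getlocale("LC_CTYPE"))
    ```

    If the names differ only when check.names is left at its default, the change is coming from make.names() rather than from the file itself.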

    @sgenzer might have something for this.
    Regards,
    Varun
    https://www.varunmandalapu.com/

    Be Safe. Follow precautions and Maintain Social Distancing

  • Rapidminerpartner Member Posts: 35 Contributor II
    edited July 2019

    Hello, varunm1

    Thank you for your help.

    I will read your detailed message this evening when I return from my office.

    I will also attach the error message window.

    Have a nice day, varunm1!

  • sgenzer Administrator, Moderator, Employee, RapidMiner Certified Analyst, Community Manager, Member, University Professor, PM Moderator Posts: 2,959 Community Manager
    @varunm1 @Rapidminerpartner I have pinged our resident R expert, Dr. @yyhuang and I hope she has a moment to chime in here.

    Scott
  • Rapidminerpartner Member Posts: 35 Contributor II
    edited July 2019

    Hello, varunm1 and everybody

    I have uploaded the repository data file (file extension .ioo).

    Please test my process with the attached data file.

    I believe all of you will see the error message.

    Thank you

  • Rapidminerpartner Member Posts: 35 Contributor II

    I have also uploaded screenshots showing the error messages.

    Thanks.

  • varunm1 Moderator, Member Posts: 1,207 Unicorn
    Hello @Rapidminerpartner

    I am able to reproduce the error with the repository file you provided. I am a little confused by the data in the repository file, because it is displayed entirely as boxes. The .csv file you provided earlier, which I uploaded, is fine and does not throw any errors. I am not sure why this exception occurs; maybe Dr. YY can help you with this. Thanks.



    Error:

    Regards,
    Varun

  • Rapidminerpartner Member Posts: 35 Contributor II

    Thank you, yyhuang and varunm1.

    I will read your comments again when I return from the office this evening.

    Have a nice day!

  • varunm1 Moderator, Member Posts: 1,207 Unicorn
    Thanks YY, got it. 
    Regards,
    Varun

  • Rapidminerpartner Member Posts: 35 Contributor II

    Hello, YY

    Thank you for your help.

    I will get back to you after checking with my source file.

    I thought RapidMiner didn't support Korean column names.

    I will check it as you said.

    Have a nice day!

  • Rapidminerpartner Member Posts: 35 Contributor II

    Hello, YY and varunm1

    I have to report that there is still a problem.

    YY said it would be OK if there are no special characters in the column (attribute) name, but I just checked and it crashes even in that case.

    I added a "Select Attributes" operator to the process so that it selects just one attribute, the fifth attribute ("시점"), which doesn't contain special characters. It still crashes.

    I attached the screenshots, so please solve the problem for me.

    Thank you and see you.

  • Rapidminerpartner Member Posts: 35 Contributor II

    Hello, YY and varunm1

    Here are the XML and .rmp files.

    Please check them for me.

  • varunm1 Moderator, Member Posts: 1,207 Unicorn
    @Rapidminerpartner

    I am unable to reproduce this error; it's working fine for me. @yyhuang, I have a question: why am I seeing boxes instead of Korean characters? Am I missing some setting?


    Regards,
    Varun

  • yyhuang Administrator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 364 RM Data Scientist
    Hi @varunm1,

    Good question. I guess you and @Rapidminerpartner are using Windows.
    @Michael also helped test the same data, and encoding handling under macOS is smoother.
    https://answers.microsoft.com/en-us/windows/forum/all/korean-characters-shown-as-blocks/471ca66a-c09c-4d18-85ed-7aed8afde075
    If you have never installed a language pack other than English, you may have problems displaying Korean characters on Windows.

    So I did the following on my Windows 10 machine:

    I installed the language pack for Korean. I already had the Chinese pack installed from testing Chinese text mining a long time ago.

    The system settings on Windows are tricky. Hope it helps.

    YY 

  • varunm1 Moderator, Member Posts: 1,207 Unicorn
    Thanks @yyhuang, it worked well and now displays the characters. But once the data is processed by the Execute R operator, it shows boxes again. Is this because of the conversion between R and RapidMiner on Windows? I used the exact XML provided in your earlier post and selected just one attribute, attribute 5. I also tried setting the locale.
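
    To see what the R side actually receives, a diagnostic variant of the default Execute R script can print the session locale and the encoding of each column name. This is only a sketch: it assumes the operator reaches rm_main at all (it will not help when the crash happens earlier), and it assumes nothing about the cause of the boxes. The printed output should appear in the Log View, as the default script notes.

    ```r
    # Diagnostic variant of the default Execute R script:
    # one input port, one output port.
    rm_main = function(data)
    {
        # Locale of the R session started by the operator.
        print(Sys.getlocale("LC_CTYPE"))

        # Declared encoding of each column name:
        # "unknown", "UTF-8", "latin1" or "bytes".
        print(Encoding(names(data)))

        # Raw bytes of each column name, for comparison with the bytes
        # expected for the original names.
        print(lapply(names(data), charToRaw))

        # Pass the data through unchanged.
        return(data)
    }
    ```

    Comparing the printed bytes with the original attribute names shows whether the names were already altered before the script ran.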


    Regards,
    Varun

  • yyhuang Administrator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 364 RM Data Scientist
    edited July 2019
    Yes, @varunm1, this is indeed a bug in the scripting integration under the hood of RapidMiner. We are investigating it and will keep you posted!
  • varunm1 Moderator, Member Posts: 1,207 Unicorn
    Great, thanks. Sorry for bugging you multiple times. Have a great day.
    Regards,
    Varun

  • yyhuang Administrator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 364 RM Data Scientist
    No problem at all.
    Thank you @varunm1 for all your help testing and troubleshooting!!