Python modules not working on Linux server when they need to connect to the internet?

kaymankayman Member Posts: 662 Unicorn
edited December 2018 in Help

I've encountered a very strange and very annoying problem when trying to run some Python packages. All of them work on my local desktop, or when running the server process in local mode. But whenever I run the same process entirely on the server (Ubuntu 16.04), it fails with 'the script can not be parsed'.

 

On a Windows server setup they work fine, so my first guess was security settings. But on another Ubuntu test server, where I granted every permission, the same process still failed, so I can probably rule that out.

 

Some packages work fine on the server. Basically any standard Python command works, but as soon as an internet connection seems to be required, the script fails. Two totally different scripts give the same problem: one that calls the Microsoft translation APIs and another that I use to detect a language. As mentioned, they work fine on the desktop framework, under Windows server, and on the Linux servers outside of RapidMiner. So I'm really stuck, and this is a key aspect of our planned process.

 

I've added a simplified workflow with one sentence. The first part uses a Beautiful Soup Python script, which works fine. The second part uses langid.py to get the language. This fails, but only when executed on the server (Ubuntu).

 

I would strongly appreciate it if someone could take a look at this, as it is extremely important for us. We are going to make a big investment in RM, and translation to allow text mining is a huge part of the process flow. It all worked fine on a smaller Windows test server, but the final production server will be Linux.

 

<?xml version="1.0" encoding="UTF-8"?><process version="7.5.001">
<context>
<input/>
<output/>
<macros/>
</context>
<operator activated="true" class="process" compatibility="7.5.001" expanded="true" name="Process">
<process expanded="true">
<operator activated="true" class="generate_data_user_specification" compatibility="7.5.001" expanded="true" height="68" name="Generate Data by User Specification" width="90" x="179" y="187">
<list key="attribute_values">
<parameter key="data" value="&quot;Dit is een zin in het Nederlands&quot;"/>
</list>
<list key="set_additional_roles"/>
</operator>
<operator activated="true" class="nominal_to_text" compatibility="7.5.001" expanded="true" height="82" name="Nominal to Text" width="90" x="313" y="187"/>
<operator activated="true" class="python_scripting:execute_python" compatibility="7.4.000" expanded="true" height="82" name="simple py" width="90" x="447" y="187">
<parameter key="script" value="import pandas as pd&#10;from bs4 import BeautifulSoup&#10;&#10;def rm_main(data):&#10;&#9;langs=[]&#10;&#10;&#9;for index,row in data.iterrows():&#10;&#9;&#9;# we select the first interaction field to be translated, and strip eventual tags&#10;&#9;&#9;s=BeautifulSoup(row[&quot;data&quot;],&quot;lxml&quot;).get_text(&quot; \[-\] &quot;)&#10;&#9;&#9;langs.append(s)&#10;&#9;# and finally we add all the new data to the dataframe&#10;&#9;data['data']=langs&#10;&#10;&#9;return data&#10;"/>
<description align="center" color="transparent" colored="false" width="126">This works so python is installed correctly on server</description>
</operator>
<operator activated="true" class="python_scripting:execute_python" compatibility="7.4.000" expanded="true" height="82" name="get language" width="90" x="581" y="187">
<parameter key="script" value="import pandas as pd&#10;import langid&#10;&#10;def rm_main(data):&#10;&#9;langs=[]&#10;&#10;&#9;for index,row in data.iterrows():&#10;&#9;&#9;# we select the first interaction field to be translated, and strip eventual tags&#10;&#9;&#9;s=row[&quot;data&quot;]&#10;&#9;&#9;try:&#10;&#9;&#9;&#9;rl = langid.classify(s)[0]&#10;&#9;&#9;except:&#10;&#9;&#9;&#9;pass&#10;&#9;&#9;&#9;rl = &quot;undefined&quot;&#10;&#10;&#9;&#9;langs.append(rl)&#10;&#9;# and finally we add all the new data to the dataframe&#10;&#9;data['lang']=langs&#10;&#10;&#9;return data&#10;"/>
<description align="center" color="transparent" colored="false" width="126">This one fails. Using the same script in other programs, or from cmd line works fine, so the package is installed correctly. Also works fine on local machine</description>
</operator>
<operator activated="true" class="store" compatibility="7.5.001" expanded="true" height="68" name="Store" width="90" x="715" y="187">
<parameter key="repository_entry" value="result"/>
</operator>
<connect from_op="Generate Data by User Specification" from_port="output" to_op="Nominal to Text" to_port="example set input"/>
<connect from_op="Nominal to Text" from_port="example set output" to_op="simple py" to_port="input 1"/>
<connect from_op="simple py" from_port="output 1" to_op="get language" to_port="input 1"/>
<connect from_op="get language" from_port="output 1" to_op="Store" to_port="input"/>
<portSpacing port="source_input 1" spacing="0"/>
<portSpacing port="sink_result 1" spacing="0"/>
</process>
</operator>
</process>

Best Answer

  • homburghomburg Moderator, Employee, Member Posts: 114 RM Data Scientist
    Solution Accepted

    Hmm, looks like the information in the referenced community post is wrong. Please always use the key value "rapidminer.python_scripting.path" - it is the same for all possible operating systems.

Answers

  • pschlunderpschlunder Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, RMResearcher, Member Posts: 96 RM Research

    Hi Kayman,

    a few first ideas that might help identify the problem:

    - Have you checked that the user running the RapidMiner Server process on your Ubuntu server has access to the same Python environment as the user you used to test the Python script via the command line? Please run the process attached to this post and provide its log output. It retrieves the Python include directory from within the Python operator in the RapidMiner process and writes it to the log. The printed directory should match the one you get when entering which python on the command line on the server (except that it points to an include subfolder).

    - Can you provide the server log of your failing process? You can find it at <server_location>/standalone/log/server.log

    - How did you install the langid package on your Ubuntu server? Did you use pip install langid or pip install --user langid?

     

    <?xml version="1.0" encoding="UTF-8"?><process version="7.5.001">
    <context>
    <input/>
    <output/>
    <macros/>
    </context>
    <operator activated="true" class="process" compatibility="7.5.001" expanded="true" name="Process">
    <process expanded="true">
    <operator activated="true" class="python_scripting:execute_python" compatibility="7.4.000" expanded="true" height="82" name="Execute Python" width="90" x="179" y="85">
    <parameter key="script" value="import pandas&#10;from distutils.sysconfig import get_python_inc&#10;&#10;# rm_main is a mandatory function, &#10;# the number of arguments has to be the number of input ports (can be none)&#10;def rm_main():&#10; print(get_python_inc())"/>
    </operator>
    <connect from_op="Execute Python" from_port="output 1" to_port="result 1"/>
    <portSpacing port="source_input 1" spacing="0"/>
    <portSpacing port="sink_result 1" spacing="0"/>
    <portSpacing port="sink_result 2" spacing="0"/>
    </process>
    </operator>
    </process>
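
A slightly broader variant of the check above could look like the following. This is only a sketch: the helper names `can_import` and `describe_interpreter` are made up for illustration, and it assumes, as the process above does, that printed output ends up in the log. It is deliberately written to run on both Python 2.7 and 3.x, since the point is to find out which of the two the server actually starts.

```python
import sys
import sysconfig


def can_import(name):
    """Return True if this interpreter can import the module `name`."""
    try:
        __import__(name)
        return True
    except ImportError:
        return False


def describe_interpreter(modules):
    """Collect the interpreter path, version, include dir and module availability."""
    return {
        "executable": sys.executable,
        "version": "%d.%d.%d" % sys.version_info[:3],
        "include_dir": sysconfig.get_paths()["include"],
        "modules": dict((name, can_import(name)) for name in modules),
    }


def rm_main():
    # Hypothetical entry point mirroring the process above: print everything
    # so it can be compared with what the command line reports.
    info = describe_interpreter(["pandas", "langid"])
    for key in sorted(info):
        print("%s = %s" % (key, info[key]))
```

If `executable` points to a different Python than the one used on the command line, or `langid` shows up as `False`, the package was most likely installed for another interpreter.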

     

    Best regards,

    Philipp

  • pschlunderpschlunder Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, RMResearcher, Member Posts: 96 RM Research

    If you need to know where to set the Python path variable so that "Execute Python" uses the Python version you want, have a look at this thread: Using python scripting extension with RM Server

  • kaymankayman Member Posts: 662 Unicorn

    Thanks guys,

    The server is indeed using the wrong flavour. The log is showing that I'm using /usr/include/python2.7 even if my settings are pointing to 3.5

     

    I've tried with both /usr/include/python3.5 and /usr/bin/python3.5 but the server keeps pointing to 2.7 instead.

    I did restart the server each time, so the setting should have been picked up. I can actually remove the property completely and it still works (pointing to 2.7).

     

    I've entered the following under system settings:

    rapidminer.python.path - /usr/include/python3.5

     

    Any clue? I used a headless setup if that might have some impact.

     

     (PS: I used sudo pip3 install langid)

  • pschlunderpschlunder Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, RMResearcher, Member Posts: 96 RM Research

    Good to hear that we now at least know what the problem is =)

     

    Okay, using pip install without the user option at least installs the package for every user if you invoked it as an administrator. But here it's not available, since I guess you installed it for Python 3 (for which you need it?) while, as you pointed out, your server is only accessing Python 2. Hence the package is not available. One option could be to install the package system-wide for the given Python 2.7 as well.
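
One way to avoid installing a package for the wrong Python is to call pip through the exact interpreter it should belong to (`<interpreter> -m pip install ...`). The sketch below only illustrates that idea; `pip_install_command` and `install` are hypothetical helper names, and it assumes pip is available for the chosen interpreter.

```python
import subprocess
import sys


def pip_install_command(package, interpreter=sys.executable, user=False):
    """Build the command that installs `package` for one specific interpreter."""
    command = [interpreter, "-m", "pip", "install"]
    if user:
        # Mirrors "pip install --user": install for the current user only.
        command.append("--user")
    command.append(package)
    return command


def install(package, interpreter=sys.executable, user=False):
    """Run the install; raises subprocess.CalledProcessError if pip fails."""
    subprocess.check_call(pip_install_command(package, interpreter, user))
```

For example, `install("langid", interpreter="/usr/bin/python3.5")` would target the 3.5 interpreter mentioned earlier in this thread, regardless of what the plain pip command points to. A system-wide install would still need administrator rights.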

     

    A drastic way would be to change the system's main Python. For that you could add the Python 3.5 bin folder to the system PATH variable in e.g. your .profile. But keep in mind that, especially on Ubuntu, this might cause problems when updating stuff, because some apps won't properly ask for the system Python but just use the python command, not caring where it links to.

     

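    For a temporary change you could create a start-up routine/script that first changes the Python in the "PATH" variable only for the session in which you run the server. E.g. by invoking "export PATH=/path/to/python3.5/bin:$PATH" (no spaces around the "=") before starting RapidMiner Server. Note that the folder must be the one containing the python executable, not the include folder such as /usr/include/python3.5.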
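
To see what such a PATH change would actually do before touching the server start-up, the lookup can be simulated. This is a rough sketch for Python 3 (it relies on `shutil.which`); `prepend_to_path` and `resolve_command` are illustrative names, not part of RapidMiner.

```python
import os
import shutil


def prepend_to_path(directory, current_path=None):
    """Return a PATH string with `directory` placed in front of the existing entries."""
    if current_path is None:
        current_path = os.environ.get("PATH", "")
    if not current_path:
        return directory
    return directory + os.pathsep + current_path


def resolve_command(name, search_path):
    """Return the full path that a PATH lookup would pick for `name`, or None."""
    return shutil.which(name, path=search_path)
```

Calling `resolve_command("python", prepend_to_path(some_bin_folder))` shows which executable the plain python command would resolve to in that session. If the folder contains no file named python (for example only python3.5), the result will still be the system default.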

     

    Hope this helps.

     

    Best regards,

    Philipp

  • kaymankayman Member Posts: 662 Unicorn

    Thanks Philipp,

     

    I guess that for some of the packages we use it could indeed be an option to use the 2.7 variant, but some of our in-house code is 3.x only. I've also done some investigating myself in the meantime, and changing the system's main Python indeed doesn't seem like a good option.

     

    I've tried adding the export command. Looking at the log, it is loaded correctly, but when calling the operator it just uses 2.7 again. It therefore seems RMS loses the setting anyway, and whatever is added in the preferences is overwritten or just ignored, as I can just leave it empty as well.

     

     

     

     

     

  • pschlunderpschlunder Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, RMResearcher, Member Posts: 96 RM Research

    Did you restart the RapidMiner Server after changing the setting? If not, please try that. A restart is required in order for the changed settings to be taken into account.

     

    Best regards,

    Philipp

  • kaymankayman Member Posts: 662 Unicorn

    Yeah, made no difference. The actual key just seems to be completely ignored. I can literally add whatever I want or even remove it and it keeps working, but only using 2.7

     

    I'm going to try installing Anaconda or similar and point to that entry. Maybe that will make some difference.

     

    In the meantime still open for any good suggestion.

  • kaymankayman Member Posts: 662 Unicorn

    Yup, this did the trick. Thanks!
