Python Scripting Extension - Installation and Getting Started

pschlunder (Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, RMResearcher, Member; Posts: 96; RM Research)
edited January 2019 in Knowledge Base

 For those wishing to use the versatile Python Scripting extension, the following Installation and Getting Started notes should help...

Installation

The "Execute Python" operator uses the Python installation whose path you specify under "Settings" -> "Preferences..." -> "Python Scripting" -> "Path to Python executable":

 

[Screenshot: Define Python Version to use]

For example, the screenshot shows a typical path specification using Python from Anaconda installed on Windows. You can now access the libraries installed for this Python version in the operator by importing them as you normally would in Python.
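If you are unsure which Python installation the operator actually picked up, a small script like the following may help. It is only a sketch: "check_libraries" is a helper name I made up, the library names are just examples, and it assumes the operator accepts an "rm_main" that returns nothing when no output port is connected.

```python
import importlib.util
import sys


def check_libraries(names):
    """Map each top-level module name to True if it can be found."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


def rm_main():
    # Printed text should show up in RapidMiner's Log panel.
    print("Python executable:", sys.executable)
    for name, available in check_libraries(["pandas", "numpy"]).items():
        print(name, "is available" if available else "is NOT installed")
```

If a library is reported as not installed, install it into the Python installation shown in the first printed line, not into some other Python on your machine.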

Usage

The Python code in the operator uses 4 spaces per indentation level. If you receive indentation errors, make sure each line is indented by 4 spaces times the desired indentation level. For example, when I copied the code of your "rm_main", it contained a mixture of tabs and spaces, as well as indentations of only 2 spaces. Some editors (Sublime Text, for example) can display whether tabs or spaces are used.
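If your editor does not show whitespace, you can also check a script for suspicious indentation yourself. The following is a rough sketch, not a full Python parser: "find_indentation_problems" is a hypothetical helper, and it may wrongly flag continuation lines inside brackets that are aligned on purpose.

```python
def find_indentation_problems(source, width=4):
    """Return (line number, message) pairs for lines with suspicious indentation."""
    problems = []
    for number, line in enumerate(source.splitlines(), start=1):
        if not line.strip():
            continue  # blank lines cannot cause indentation errors
        indent = line[: len(line) - len(line.lstrip())]
        if "\t" in indent:
            problems.append((number, "tab in indentation"))
        elif len(indent) % width != 0:
            problems.append((number, "indent is not a multiple of %d" % width))
    return problems


script = "def rm_main():\n  x = 1\n\treturn x\n"
for number, message in find_indentation_problems(script):
    print("line", number, ":", message)
```

Paste the content of your operator script into "script" and fix every reported line until nothing is printed.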

 

After fixing the indentation errors, make sure you build a proper Pandas DataFrame object. I looked into the "nmrglue" library, and the "fileio.bruker.read_pdata" method already seems to return a dictionary for the given data. Fortunately, the Pandas DataFrame constructor accepts a dictionary as input, so you may be able to create a DataFrame directly from the returned object. This has the added advantage that the columns are properly named from the start.
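To illustrate how a dictionary becomes a DataFrame, here is a minimal sketch. The dictionary is a stand-in I invented for illustration and is not actual "nmrglue" output; it assumes every key maps to a list of the same length.

```python
import pandas as pd

# Hypothetical stand-in for a dictionary returned by a reader function;
# the keys and values are invented for illustration.
dic = {
    "ppm": [0.5, 1.0, 1.5],
    "intensity": [10.2, 30.7, 12.1],
}

# Each key becomes a column name, each list becomes that column's values.
df = pd.DataFrame(dic)
print(list(df.columns))
print(df.shape)
```

The dictionary keys end up as column names, so you do not have to name the columns by hand afterwards.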

 

Once you have your Pandas DataFrame instance, you can return it from your "rm_main" function. The "Execute Python" operator then converts the DataFrame to an Example Set (which RapidMiner uses to manage matrix-like data). You can access this Example Set at the operator's output port. The first DataFrame returned is delivered at the topmost output port, the second at the next port, and so on.
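Returning more than one DataFrame could look roughly like this. It is a sketch with invented data; the names "measurements" and "summary" are only placeholders.

```python
import pandas as pd


def rm_main():
    # Hypothetical data, invented for illustration.
    measurements = pd.DataFrame({"sample": ["a", "b"], "value": [1.5, 2.5]})
    summary = pd.DataFrame({"mean_value": [measurements["value"].mean()]})

    # First DataFrame -> topmost output port, second -> the port below it.
    return measurements, summary
```

With this function, "measurements" should arrive at the first output port and "summary" at the second one.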

 

Here is some example code, where you only need to adjust the path to the file you want to read:

import nmrglue as ng
import pandas as pd

def rm_main():
    path = "C:\\my_great_data_file.ending"
    # read data using nmrglue from file located at path
    dic, _ = ng.fileio.bruker.read_pdata(path)

    # create pandas data frame from the given dictionary
    df = pd.DataFrame(dic)

    # check if data frame creation worked
    if not isinstance(df, pd.DataFrame):
        print("Conversion to data frame failed.")

    # deliver the data frame to the operator's output port
    return df

Notes

  • You always need a function called "rm_main" in the "Execute Python" operator. If you connect Example Sets to its input port(s), the function needs the same number of parameters. In your case you do not need to provide anything at the input port, so "rm_main()" does not need any parameters.
  • Everything printed using Python's "print" function is displayed in RapidMiner's Log. You can enable it via the menu option "View" -> "Show Panel" -> "Log".
  • RapidMiner offers operators to change the type of attributes after loading them. If you still need to define these types within the "Execute Python" operator, convert the read data to numpy arrays first. For these, you can specify the attribute types via the so-called "dtype" parameter. Find an example here.
  • If you are using Windows, make sure to escape backslashes when providing the path. This means that you need to write 2 backslashes for each one; I did this in my code example above.

 

For further reading, check out Thomas Ott's excellent blog article on using R and Python scripts in RapidMiner.

 

Thanks and enjoy coding!

 

Philipp Schlunder

RapidMiner Research, Dortmund

May 2017
