
How to implement Python code for the text mining process?

ksnugroho Member Posts: 1 Newbie
edited December 2018 in Help
Hello, 

Answers

  • sgenzer Administrator, Moderator, Employee, RapidMiner Certified Analyst, Community Manager, Member, University Professor, PM Moderator Posts: 2,959 Community Manager
    hi @ksnugroho - you can use the Execute Python operator (in the Python extension) anywhere you want.

    Scott
  • kayman Member Posts: 662 Unicorn
    Some background on using the Python operator:

    - You can use it as a standalone 'script container' wherever you want, so it does not need any input or output data.
    - If you do pass data in or out, remember that the operator handles it as a pandas DataFrame by default. Anything connected to the input ports arrives in your script as a DataFrame, and if you manipulate data in other functions or load external data, you just need to return it from rm_main as a DataFrame again.
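As a rough illustration of the DataFrame-in, DataFrame-out pattern described above, here is a minimal sketch for a text mining step. The column name "text", the helper tokenize and the two added columns are assumptions made for this example; they are not part of the operator or of the original question.

```python
import re

import pandas as pd


def tokenize(text):
    """Lowercase a string and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", str(text).lower())


def rm_main(data):
    # 'data' arrives as a pandas DataFrame. The column name "text" is an
    # assumption for this sketch; use whatever your example set contains.
    result = data.copy()
    result["tokens"] = result["text"].apply(lambda t: " ".join(tokenize(t)))
    result["token_count"] = result["text"].apply(lambda t: len(tokenize(t)))
    # Returning a DataFrame hands it back to the process on the output port.
    return result
```

Working on a copy leaves the incoming DataFrame untouched, which makes the script easier to test outside RapidMiner with a hand-built DataFrame.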

    Below is a simple example that uses two inputs and xlsxwriter: the Python script generates a multi-tab Excel file with each input on its own tab, and that's it.

    <?xml version="1.0" encoding="UTF-8"?><process version="9.0.003">
      <context>
        <input/>
        <output/>
        <macros/>
      </context>
      <operator activated="true" class="process" compatibility="9.0.003" expanded="true" name="Process">
        <process expanded="true">
          <operator activated="true" class="python_scripting:execute_python" compatibility="8.2.000" expanded="true" height="124" name="Execute Python (2)" width="90" x="246" y="85">
            <parameter key="script" value="import pandas as pd&#10;import xlsxwriter&#10;&#10;def rm_main(data1, data2):&#10;&#10;    writer = pd.ExcelWriter('my_file.xlsx', engine='xlsxwriter')&#10;&#10;    # Write each input DataFrame to its own sheet&#10;    data1.to_excel(writer, sheet_name='Page 1')&#10;    data2.to_excel(writer, sheet_name='Page 2')&#10;&#10;    # Save and close the workbook (writer.save() was removed in pandas 2.0)&#10;    writer.close()&#10;&#10;    return"/>
          </operator>
          <connect from_port="input 1" to_op="Execute Python (2)" to_port="input 1"/>
          <connect from_port="input 2" to_op="Execute Python (2)" to_port="input 2"/>
          <portSpacing port="source_input 1" spacing="0"/>
          <portSpacing port="source_input 2" spacing="0"/>
          <portSpacing port="source_input 3" spacing="0"/>
          <portSpacing port="sink_result 1" spacing="0"/>
        </process>
      </operator>
    </process>
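The script inside the process XML is escaped (line breaks appear as &#10;), which makes it hard to read. Below is a readable sketch of the same idea, not a copy of the embedded script: the helper write_sheets and its engine parameter are names introduced here for illustration, and the context manager is used so the workbook is saved on exit on both older and newer pandas versions.

```python
import pandas as pd


def write_sheets(frames, path, engine="xlsxwriter"):
    """Write each DataFrame in 'frames' (sheet name -> DataFrame) to its own tab."""
    # Leaving the 'with' block saves and closes the workbook, so no explicit
    # save or close call is needed.
    with pd.ExcelWriter(path, engine=engine) as writer:
        for sheet_name, frame in frames.items():
            frame.to_excel(writer, sheet_name=sheet_name)


def rm_main(data1, data2):
    # A relative path such as 'my_file.xlsx' is resolved against the working
    # directory of the Python process, so an absolute path may be safer.
    write_sheets({"Page 1": data1, "Page 2": data2}, "my_file.xlsx")
    # Nothing is returned, so the operator delivers no output data.
    return None
```

Passing the sheets as a dictionary keeps the script the same length if you later connect a third or fourth input.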
    

