"Automate a RapidMiner process in a Java program"

Hi all,

I used RapidMiner 5.1.011 to design a text classification process, and it works fine in the GUI. Now I'm trying to automate this process in a Java program. I wrote a simple Java program, and when I run it I get the following error:

com.rapidminer.operator.UserError: Could not read file '/home/some_user/wordlist': java.io.IOException: Cannot read from XML stream, wrong format: WordList : WordList.
at com.rapidminer.operator.io.IOObjectReader.read(IOObjectReader.java:100)
at com.rapidminer.operator.io.AbstractReader.doWork(AbstractReader.java:123)
at com.rapidminer.operator.Operator.execute(Operator.java:833)
at com.rapidminer.operator.execution.SimpleUnitExecutor.execute(SimpleUnitExecutor.java:51)
at com.rapidminer.operator.ExecutionUnit.execute(ExecutionUnit.java:709)
at com.rapidminer.operator.OperatorChain.doWork(OperatorChain.java:369)
at com.rapidminer.operator.Operator.execute(Operator.java:833)
at com.rapidminer.Process.run(Process.java:920)
at com.rapidminer.Process.run(Process.java:843)
at com.rapidminer.Process.run(Process.java:802)
at com.rapidminer.Process.run(Process.java:797)
at com.rapidminer.Process.run(Process.java:787)
at ProcessCreator.main(ProcessCreator.java:18)
Caused by: java.io.IOException: Cannot read from XML stream, wrong format: WordList : WordList
at com.rapidminer.tools.XMLSerialization.fromXML(XMLSerialization.java:141)
Process config XML:

<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<process version="5.1.011">
  <operator activated="true" class="process" compatibility="5.1.011" expanded="true" name="Process">
    <process expanded="true" height="695" width="989">
      <operator activated="true" class="read" compatibility="5.1.011" expanded="true" height="60" name="Read" width="90" x="45" y="210">
        <parameter key="object_file" value="/home/some_user/wordlist"/>
        <parameter key="io_object" value="WordList"/>
      </operator>
      <operator activated="true" class="text:process_document_from_file" compatibility="5.1.003" expanded="true" height="76" name="Process Documents from Files" width="90" x="179" y="210">
        <list key="text_directories">
          <parameter key="porn" value="/home/some_user/test_data/porn"/>
        </list>
        <process expanded="true" height="695" width="989">
          <operator activated="true" class="web:extract_html_text_content" compatibility="5.1.004" expanded="true" height="60" name="Extract Content" width="90" x="45" y="30"/>
          <operator activated="true" class="text:transform_cases" compatibility="5.1.003" expanded="true" height="60" name="Transform Cases" width="90" x="180" y="30"/>
          <operator activated="true" class="text:tokenize" compatibility="5.1.003" expanded="true" height="60" name="Tokenize" width="90" x="315" y="30"/>
          <operator activated="true" class="text:filter_stopwords_english" compatibility="5.1.003" expanded="true" height="60" name="Filter Stopwords (English)" width="90" x="450" y="30"/>
          <operator activated="true" class="text:stem_snowball" compatibility="5.1.003" expanded="true" height="60" name="Stem (Snowball)" width="90" x="585" y="30"/>
          <operator activated="true" class="text:filter_by_length" compatibility="5.1.003" expanded="true" height="60" name="Filter Tokens (by Length)" width="90" x="787" y="30">
            <parameter key="min_chars" value="2"/>
            <parameter key="max_chars" value="99"/>
          </operator>
          <connect from_port="document" to_op="Extract Content" to_port="document"/>
          <connect from_op="Extract Content" from_port="document" to_op="Transform Cases" to_port="document"/>
          <connect from_op="Transform Cases" from_port="document" to_op="Tokenize" to_port="document"/>
          <connect from_op="Tokenize" from_port="document" to_op="Filter Stopwords (English)" to_port="document"/>
          <connect from_op="Filter Stopwords (English)" from_port="document" to_op="Stem (Snowball)" to_port="document"/>
          <connect from_op="Stem (Snowball)" from_port="document" to_op="Filter Tokens (by Length)" to_port="document"/>
          <connect from_op="Filter Tokens (by Length)" from_port="document" to_port="document 1"/>
          <portSpacing port="source_document" spacing="0"/>
          <portSpacing port="sink_document 1" spacing="0"/>
          <portSpacing port="sink_document 2" spacing="0"/>
        </process>
      </operator>
      <operator activated="true" class="select_attributes" compatibility="5.1.011" expanded="true" height="76" name="Select Attributes" width="90" x="313" y="210">
        <parameter key="attribute_filter_type" value="no_missing_values"/>
        <parameter key="attributes" value="|label|text"/>
      </operator>
      <operator activated="true" class="read_model" compatibility="5.1.011" expanded="true" height="60" name="Read Model" width="90" x="380" y="30">
        <parameter key="model_file" value="/home/some_user/svm_model"/>
      </operator>
      <operator activated="true" class="set_role" compatibility="5.1.011" expanded="true" height="76" name="Set Role" width="90" x="447" y="210">
        <parameter key="name" value="label"/>
        <parameter key="target_role" value="label"/>
        <list key="set_additional_roles"/>
      </operator>
      <operator activated="true" class="apply_model" compatibility="5.1.011" expanded="true" height="76" name="Apply Model" width="90" x="581" y="120">
        <list key="application_parameters"/>
      </operator>
      <operator activated="true" class="performance_classification" compatibility="5.1.011" expanded="true" height="76" name="Performance" width="90" x="715" y="120">
        <list key="class_weights"/>
      </operator>
      <connect from_op="Read" from_port="output" to_op="Process Documents from Files" to_port="word list"/>
      <connect from_op="Process Documents from Files" from_port="example set" to_op="Select Attributes" to_port="example set input"/>
      <connect from_op="Select Attributes" from_port="example set output" to_op="Set Role" to_port="example set input"/>
      <connect from_op="Read Model" from_port="output" to_op="Apply Model" to_port="model"/>
      <connect from_op="Set Role" from_port="example set output" to_op="Apply Model" to_port="unlabelled data"/>
      <connect from_op="Apply Model" from_port="labelled data" to_op="Performance" to_port="labelled data"/>
      <connect from_op="Performance" from_port="performance" to_port="result 1"/>
      <portSpacing port="source_input 1" spacing="0"/>
      <portSpacing port="sink_result 1" spacing="0"/>
      <portSpacing port="sink_result 2" spacing="0"/>
    </process>
  </operator>
</process>
Java code:

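import com.rapidminer.Process;
import com.rapidminer.RapidMiner;
import java.io.File;

public class ProcessCreator {

    public static void main(String[] argv) {

        try {
            // initialize RapidMiner in embedded mode before loading the process
            RapidMiner.setExecutionMode(RapidMiner.ExecutionMode.EMBEDDED_WITHOUT_UI);
            RapidMiner.init();

            // load the process XML whose path is given as the first argument
            Process process = new Process(new File(argv[0]));

            // perform process
            process.run();
        } catch (Exception e) { e.printStackTrace(); }
    }
}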
I'd appreciate your help!


    It looks like I have to specify the mode as ExecutionMode.COMMAND_LINE instead of ExecutionMode.EMBEDDED_WITHOUT_UI, though I still don't know the difference between the two.

    From API doc:
    public static final RapidMiner.ExecutionMode COMMAND_LINE
        RM is executed using RapidMinerCommandLine.main(String[]).

    public static final RapidMiner.ExecutionMode EMBEDDED_WITHOUT_UI
        RM is embedded into another program.

    Marco_Boeck (RM Engineering)

    Have a look at the constructor of the ExecutionMode enum:

    private ExecutionMode(boolean isHeadless, boolean canAccessFilesystem, boolean hasMainFrame, boolean loadManagedExtensions) {
    COMMAND_LINE sets loadManagedExtensions to true, whereas EMBEDDED_WITHOUT_UI sets it to false.
    WordList is an IOObject from the Text Processing Extension, so managed extensions must be loaded; otherwise any process that uses something from these extensions will fail (as yours did).
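
To make the difference concrete, here is a minimal stand-alone sketch. It does not use the RapidMiner classes: the `Mode` enum is a toy stand-in for `RapidMiner.ExecutionMode` that keeps only the `loadManagedExtensions` flag described above, and the rule that a class name containing ":" (such as `text:tokenize`) comes from an extension is an assumption made for illustration. In the real program the corresponding change would be to pass `RapidMiner.ExecutionMode.COMMAND_LINE` to `RapidMiner.setExecutionMode(...)` before calling `RapidMiner.init()`.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ExecutionModeSketch {

    /** Toy stand-in for RapidMiner.ExecutionMode; only the one flag discussed in the thread is modelled. */
    enum Mode {
        COMMAND_LINE(true),
        EMBEDDED_WITHOUT_UI(false);

        private final boolean loadManagedExtensions;

        Mode(boolean loadManagedExtensions) {
            this.loadManagedExtensions = loadManagedExtensions;
        }

        boolean loadsManagedExtensions() {
            return loadManagedExtensions;
        }
    }

    /**
     * Returns the operator classes that would be unavailable in the given mode.
     * Assumption for this sketch: a class name with a "prefix:" belongs to an extension.
     */
    static List<String> unresolvedOperators(Mode mode, List<String> operatorClasses) {
        List<String> missing = new ArrayList<>();
        for (String cls : operatorClasses) {
            boolean fromExtension = cls.contains(":");
            if (fromExtension && !mode.loadsManagedExtensions()) {
                missing.add(cls);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        // A few operator classes taken from the process XML in the question.
        List<String> ops = Arrays.asList(
                "read", "text:tokenize", "web:extract_html_text_content", "apply_model");
        for (Mode mode : Mode.values()) {
            System.out.println(mode + " -> unresolved: " + unresolvedOperators(mode, ops));
        }
    }
}
```

In this toy model, only EMBEDDED_WITHOUT_UI reports the `text:` and `web:` operators as unresolved, which mirrors why the WordList object could not be read in that mode.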

    nawafpower
    If I have a model built with RapidMiner and want to look at the exact Java code behind it, for example the file that implements the Porter stemmer, can I do that?
    haddock
    nawafpower
    Thanks haddock. I went to the com.rapidminer.operator.text package and it was empty. Maybe I missed something when I did the run configuration steps to make RapidMiner run from Eclipse; other packages show up as empty too. Can anybody tell me how to get the contents of these packages? Thanks.
    RapidQues
    How did you pass the process on the command line? Please let me know.
    RapidQues
    Also, what do I pass as the command-line argument to the program?
    Marco_Boeck (RM Engineering)

    For RapidMiner Studio 6, call the rapidminer-batch.bat or rapidminer-batch.sh script and pass the repository location of the process, e.g. "//Local Repository/My folder/my_process". Alternatively, pass "-f" followed by a space and the path to the .rmp file on the hard drive, e.g. -f C:\Users\xyz\.RapidMiner\repositories\Local Repository\Process.rmp
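
If you want to start the batch script from Java, the two call styles above can be sketched roughly as follows. This is only an illustration: the class and method names are made up for this post, and the script name is assumed to be on the PATH (otherwise give its full path). The sketch only builds the argument lists and does not launch anything.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class BatchCommandSketch {

    /** Arguments for running a process stored in a repository, e.g. "//Local Repository/My folder/my_process". */
    static List<String> forRepositoryEntry(String script, String repositoryLocation) {
        return new ArrayList<>(Arrays.asList(script, repositoryLocation));
    }

    /** Arguments for running a .rmp file from disk via the "-f" switch. */
    static List<String> forProcessFile(String script, String rmpPath) {
        return new ArrayList<>(Arrays.asList(script, "-f", rmpPath));
    }

    public static void main(String[] args) {
        // Assumed script name; use rapidminer-batch.bat on Windows.
        String script = "rapidminer-batch.sh";

        List<String> byRepository =
                forRepositoryEntry(script, "//Local Repository/My folder/my_process");
        List<String> byFile = forProcessFile(script, "/path/to/Process.rmp");

        System.out.println(String.join(" ", byRepository));
        System.out.println(String.join(" ", byFile));

        // To actually launch one of them:
        // new ProcessBuilder(byRepository).inheritIO().start().waitFor();
    }
}
```

Keeping each argument as its own list element means ProcessBuilder passes paths that contain spaces (such as "Local Repository") as a single argument, so no manual quoting is needed.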

    RapidQues
    Thank you :)
