Hi all,

I used RapidMiner 5.1.011 to design a text classification process, and it works fine in the GUI. Now I'm trying to automate this process from a Java program. I wrote a simple Java program, and when I run it I get the following error:

com.rapidminer.operator.UserError: Could not read file '/home/some_user/wordlist': java.io.IOException: Cannot read from XML stream, wrong format: WordList : WordList.
at com.rapidminer.operator.io.IOObjectReader.read(IOObjectReader.java:100)
at com.rapidminer.operator.io.AbstractReader.doWork(AbstractReader.java:123)
at com.rapidminer.operator.Operator.execute(Operator.java:833)
at com.rapidminer.operator.execution.SimpleUnitExecutor.execute(SimpleUnitExecutor.java:51)
at com.rapidminer.operator.ExecutionUnit.execute(ExecutionUnit.java:709)
at com.rapidminer.operator.OperatorChain.doWork(OperatorChain.java:369)
at com.rapidminer.operator.Operator.execute(Operator.java:833)
at com.rapidminer.Process.run(Process.java:920)
at com.rapidminer.Process.run(Process.java:843)
at com.rapidminer.Process.run(Process.java:802)
at com.rapidminer.Process.run(Process.java:797)
at com.rapidminer.Process.run(Process.java:787)
at ProcessCreator.main(ProcessCreator.java:18)
Caused by: java.io.IOException: Cannot read from XML stream, wrong format: WordList : WordList
at com.rapidminer.tools.XMLSerialization.fromXML(XMLSerialization.java:141)
Process config XML:

<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<process version="5.1.011">
  <operator activated="true" class="process" compatibility="5.1.011" expanded="true" name="Process">
    <process expanded="true" height="695" width="989">
      <operator activated="true" class="read" compatibility="5.1.011" expanded="true" height="60" name="Read" width="90" x="45" y="210">
        <parameter key="object_file" value="/home/some_user/wordlist"/>
        <parameter key="io_object" value="WordList"/>
      </operator>
      <operator activated="true" class="text:process_document_from_file" compatibility="5.1.003" expanded="true" height="76" name="Process Documents from Files" width="90" x="179" y="210">
        <list key="text_directories">
          <parameter key="porn" value="/home/some_user/test_data/porn"/>
        </list>
        <process expanded="true" height="695" width="989">
          <operator activated="true" class="web:extract_html_text_content" compatibility="5.1.004" expanded="true" height="60" name="Extract Content" width="90" x="45" y="30"/>
          <operator activated="true" class="text:transform_cases" compatibility="5.1.003" expanded="true" height="60" name="Transform Cases" width="90" x="180" y="30"/>
          <operator activated="true" class="text:tokenize" compatibility="5.1.003" expanded="true" height="60" name="Tokenize" width="90" x="315" y="30"/>
          <operator activated="true" class="text:filter_stopwords_english" compatibility="5.1.003" expanded="true" height="60" name="Filter Stopwords (English)" width="90" x="450" y="30"/>
          <operator activated="true" class="text:stem_snowball" compatibility="5.1.003" expanded="true" height="60" name="Stem (Snowball)" width="90" x="585" y="30"/>
          <operator activated="true" class="text:filter_by_length" compatibility="5.1.003" expanded="true" height="60" name="Filter Tokens (by Length)" width="90" x="787" y="30">
            <parameter key="min_chars" value="2"/>
            <parameter key="max_chars" value="99"/>
          </operator>
          <connect from_port="document" to_op="Extract Content" to_port="document"/>
          <connect from_op="Extract Content" from_port="document" to_op="Transform Cases" to_port="document"/>
          <connect from_op="Transform Cases" from_port="document" to_op="Tokenize" to_port="document"/>
          <connect from_op="Tokenize" from_port="document" to_op="Filter Stopwords (English)" to_port="document"/>
          <connect from_op="Filter Stopwords (English)" from_port="document" to_op="Stem (Snowball)" to_port="document"/>
          <connect from_op="Stem (Snowball)" from_port="document" to_op="Filter Tokens (by Length)" to_port="document"/>
          <connect from_op="Filter Tokens (by Length)" from_port="document" to_port="document 1"/>
          <portSpacing port="source_document" spacing="0"/>
          <portSpacing port="sink_document 1" spacing="0"/>
          <portSpacing port="sink_document 2" spacing="0"/>
        </process>
      </operator>
      <operator activated="true" class="select_attributes" compatibility="5.1.011" expanded="true" height="76" name="Select Attributes" width="90" x="313" y="210">
        <parameter key="attribute_filter_type" value="no_missing_values"/>
        <parameter key="attributes" value="|label|text"/>
      </operator>
      <operator activated="true" class="read_model" compatibility="5.1.011" expanded="true" height="60" name="Read Model" width="90" x="380" y="30">
        <parameter key="model_file" value="/home/some_user/svm_model"/>
      </operator>
      <operator activated="true" class="set_role" compatibility="5.1.011" expanded="true" height="76" name="Set Role" width="90" x="447" y="210">
        <parameter key="name" value="label"/>
        <parameter key="target_role" value="label"/>
        <list key="set_additional_roles"/>
      </operator>
      <operator activated="true" class="apply_model" compatibility="5.1.011" expanded="true" height="76" name="Apply Model" width="90" x="581" y="120">
        <list key="application_parameters"/>
      </operator>
      <operator activated="true" class="performance_classification" compatibility="5.1.011" expanded="true" height="76" name="Performance" width="90" x="715" y="120">
        <list key="class_weights"/>
      </operator>
      <connect from_op="Read" from_port="output" to_op="Process Documents from Files" to_port="word list"/>
      <connect from_op="Process Documents from Files" from_port="example set" to_op="Select Attributes" to_port="example set input"/>
      <connect from_op="Select Attributes" from_port="example set output" to_op="Set Role" to_port="example set input"/>
      <connect from_op="Read Model" from_port="output" to_op="Apply Model" to_port="model"/>
      <connect from_op="Set Role" from_port="example set output" to_op="Apply Model" to_port="unlabelled data"/>
      <connect from_op="Apply Model" from_port="labelled data" to_op="Performance" to_port="labelled data"/>
      <connect from_op="Performance" from_port="performance" to_port="result 1"/>
      <portSpacing port="source_input 1" spacing="0"/>
      <portSpacing port="sink_result 1" spacing="0"/>
      <portSpacing port="sink_result 2" spacing="0"/>
    </process>
  </operator>
</process>
Java code:

import com.rapidminer.Process;
import com.rapidminer.RapidMiner;

import java.io.File;

public class ProcessCreator {

    public static void main(String[] argv) {
        try {
            // RapidMiner embedded in this program (the mode used here)
            RapidMiner.setExecutionMode(RapidMiner.ExecutionMode.EMBEDDED_WITHOUT_UI);
            RapidMiner.init();

            // argv[0] is the path to the process XML above
            Process process = new Process(new File(argv[0]));

            // perform process
            process.run();
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}
I'd appreciate your help!


    It looks like I have to set the mode to "ExecutionMode.COMMAND_LINE" instead of "ExecutionMode.EMBEDDED_WITHOUT_UI", though I still don't know the difference between the two.

    From API doc:
    public static final RapidMiner.ExecutionMode COMMAND_LINE
        RM is executed using RapidMinerCommandLine.main(String[]).

    public static final RapidMiner.ExecutionMode EMBEDDED_WITHOUT_UI
        RM is embedded into another program.

    Have a look at the constructor of the ExecutionMode enum:

    private ExecutionMode(boolean isHeadless, boolean canAccessFilesystem, boolean hasMainFrame, boolean loadManagedExtensions) {
    COMMAND_LINE sets loadManagedExtensions to true, whereas EMBEDDED_WITHOUT_UI sets it to false.
    WordList is an IOObject from the Text Processing Extension, so the managed plugins need to be loaded; otherwise any process that uses something from these plugins will fail (as yours did).
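    Applied to the program from the question, a minimal sketch of the fix could look like this. It assumes RapidMiner 5.1 with the Text Processing Extension installed and the RapidMiner jars on the classpath. The only substantive change is the execution mode; checkArgs is a hypothetical helper added for a clearer error message, and printing the returned IOContainer is just one simple way to inspect the result.

```java
import com.rapidminer.Process;
import com.rapidminer.RapidMiner;
import com.rapidminer.operator.IOContainer;

import java.io.File;

public class ProcessCreator {

    // Hypothetical helper: returns a usage message if argv is unusable, otherwise null.
    static String checkArgs(String[] argv) {
        if (argv.length != 1) {
            return "usage: java ProcessCreator <path-to-process-xml>";
        }
        return null;
    }

    public static void main(String[] argv) throws Exception {
        String problem = checkArgs(argv);
        if (problem != null) {
            System.err.println(problem);
            System.exit(1);
        }

        // COMMAND_LINE loads managed extensions (e.g. Text Processing),
        // which the Read operator needs to deserialize the WordList.
        RapidMiner.setExecutionMode(RapidMiner.ExecutionMode.COMMAND_LINE);
        RapidMiner.init();

        Process process = new Process(new File(argv[0]));
        IOContainer result = process.run();
        System.out.println(result);
    }
}
```

    For example: java -cp "/path/to/rapidminer/lib/*:." ProcessCreator /home/some_user/process.xml (both paths are placeholders). Depending on your setup, you may also need to point the rapidminer.home system property at your installation (-Drapidminer.home=...) so the extensions can be found.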

    If I have a model built in RapidMiner and need to check the exact Java code behind it, for example the class that does the Porter stemming, can I do that?
    Thanks haddock. I went to the com.rapidminer.operator.text package and it was empty. Maybe I missed something in the run configuration steps for running RM from Eclipse; other packages also show up as empty. Can anybody tell me how to get the contents of these packages? Thanks
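    Since WordList and the other text operators come from the Text Processing Extension rather than the RapidMiner core, one way to check whether the classes exist at all is to list what a given jar actually contains. A minimal sketch using only java.util.jar; the class name JarPackageLister is hypothetical, and which jar to inspect (for example the Text Processing extension jar, whose file name depends on your installation) is up to you.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;

public class JarPackageLister {

    // Returns the fully qualified names of all .class entries under the given
    // package (including subpackages), given the entry names of a jar.
    static List<String> classesUnder(Iterable<String> entryNames, String packageName) {
        String prefix = packageName.replace('.', '/') + "/";
        List<String> classes = new ArrayList<>();
        for (String name : entryNames) {
            if (name.startsWith(prefix) && name.endsWith(".class")) {
                String path = name.substring(0, name.length() - ".class".length());
                classes.add(path.replace('/', '.'));
            }
        }
        return classes;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: java JarPackageLister <jar-file> <package>");
            System.exit(1);
        }
        try (JarFile jar = new JarFile(args[0])) {
            // Collect all entry names (files and directories) in the jar
            List<String> names = new ArrayList<>();
            for (JarEntry entry : Collections.list(jar.entries())) {
                names.add(entry.getName());
            }
            for (String cls : classesUnder(names, args[1])) {
                System.out.println(cls);
            }
        }
    }
}
```

    Running it as java JarPackageLister <extension-jar> com.rapidminer.operator.text prints the matching class names; an empty result means that jar does not contain the package.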
    How did you pass the process in through the command line? Please let me know.
    Also, what do I pass as the command-line argument to the program?
    For RapidMiner Studio 6, call the rapidminer-batch.bat/rapidminer-batch.sh script and pass the repository location, e.g. "//Local Repository/My folder/my_process". Alternatively, you can pass "-f" followed by a space and then the path to the .rmp file on the hard drive, e.g. -f "C:\Users\xyz\.RapidMiner\repositories\Local Repository\Process.rmp" (quoted because the path contains a space).
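    If you want to start the batch script from a Java program instead of typing the command, a minimal sketch with ProcessBuilder could look like this. It only assembles the two invocations described above (repository location, or -f plus an .rmp path); the script location is whatever you pass in, and BatchRunner and its helper methods are hypothetical names.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

public class BatchRunner {

    // Command for running a process stored in a repository,
    // e.g. "//Local Repository/My folder/my_process".
    static List<String> forRepositoryLocation(String script, String location) {
        List<String> cmd = new ArrayList<>();
        cmd.add(script);
        cmd.add(location);
        return cmd;
    }

    // Command for running an .rmp file from disk via the -f option.
    static List<String> forProcessFile(String script, String rmpPath) {
        List<String> cmd = new ArrayList<>();
        cmd.add(script);
        cmd.add("-f");
        cmd.add(rmpPath);
        return cmd;
    }

    public static void main(String[] args) throws IOException, InterruptedException {
        if (args.length != 2) {
            System.err.println("usage: java BatchRunner <path-to-rapidminer-batch.sh> <path-to-.rmp>");
            System.exit(1);
        }
        // Each list element becomes one argument, so a path containing
        // spaces needs no extra shell quoting here.
        ProcessBuilder pb = new ProcessBuilder(forProcessFile(args[0], args[1]));
        pb.inheritIO(); // show RapidMiner's output in this console
        int exitCode = pb.start().waitFor();
        System.out.println("rapidminer-batch exited with code " + exitCode);
    }
}
```

    Passing arguments as a list rather than one command string avoids the quoting issue with paths like "Local Repository".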

    Thank you :)
