"[SOLVED] Setting the parameters of TextMining Extension operator via java!"

ReemReem Member Posts: 20 Contributor I
edited June 9 in Help
I am trying to set a parameter of an operator via Java, so that I can give my trained model a text file and have it classify the file.
The operator is "Process Documents from Files"; the parameter is "text directories", which contains "class name" and "directory".

I tried to follow the post http://rapid-i.com/rapidforum/index.php/topic,5807.0.html, but I couldn't find the class name in OperatorCore.xml.
So, where can I find the classes related to the parameters of the Text Mining extension?

My second question: the parameter has two sub-parameters, "class name" and "directory". How do I set them? With a dot, an underscore, or something else?

Note: the process already runs correctly in RapidMiner; I have now deleted the parameter values and am trying to set them via Java.

Any help or tips are appreciated,

Answers

  • Marco_BoeckMarco_Boeck Team Lead Software Engineering Moderator, Employee, Member, University Professor Posts: 1,806   RM Engineering
    Hi,

    The operator class name comes from the Text extension, i.e. you will find it in that extension's OperatorsTextProcessing.xml file. The class is com.rapidminer.operator.text.io.FileDocumentInputOperator.

    You can use the setListParameter(String, List) method in your case. Just pass it a list of string arrays (each array of size = number of columns in the parameter).
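
    A minimal sketch of what such a list could look like for the "text directories" parameter, which has two columns (class name, directory). The helper name textDirectoryRows is hypothetical, and the key "text_directories" is an assumption taken from the list element in the process XML posted in this thread. The RapidMiner call itself is shown only as a comment, because it needs the RapidMiner jars on the classpath.

    ```java
    import java.util.ArrayList;
    import java.util.List;

    class TextDirectoriesSketch {

        // Hypothetical helper: builds one row per (class name, directory) pair.
        // Each String[] has exactly two entries because the list parameter has two columns.
        static List<String[]> textDirectoryRows(String[][] pairs) {
            List<String[]> rows = new ArrayList<>();
            for (String[] pair : pairs) {
                if (pair.length != 2) {
                    throw new IllegalArgumentException("expected {className, directory}");
                }
                rows.add(new String[] {pair[0], pair[1]});
            }
            return rows;
        }

        public static void main(String[] args) {
            List<String[]> rows = textDirectoryRows(new String[][] {
                {"Unknown", "D:\\CORPUS\\Testing"}
            });
            for (String[] row : rows) {
                System.out.println(row[0] + " -> " + row[1]);
            }
            // With RapidMiner on the classpath, the list would then be handed over like this
            // (assumption: "text_directories" is the parameter key, as in the process XML):
            // operator.setListParameter("text_directories", rows);
        }
    }
    ```

    Each row is one line of the parameter table in the GUI, so several class/directory pairs simply become several arrays in the same list.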

    Regards,
    Marco
  • ReemReem Member Posts: 20 Contributor I
    Thanks for replying!

    I ran the following Java code:
    package RapidMiner;

    import com.rapidminer.Process;
    import com.rapidminer.RapidMiner;
    import com.rapidminer.example.Attribute;
    import com.rapidminer.example.Example;
    import com.rapidminer.example.ExampleSet;
    import com.rapidminer.operator.IOContainer;
    import com.rapidminer.operator.Operator;
    import com.rapidminer.operator.OperatorException;
    import com.rapidminer.repository.ProcessEntry;
    import com.rapidminer.repository.RepositoryException;
    import com.rapidminer.repository.RepositoryLocation;
    import com.rapidminer.tools.XMLException;
    import java.io.IOException;
    import java.util.ArrayList;
    import java.util.Iterator;
    import java.util.List;

    public class RapidMinerClassifier {

        static Example exampleSet = null;
        String category;

        public RapidMinerClassifier() {
            RapidMiner.setExecutionMode(RapidMiner.ExecutionMode.COMMAND_LINE);
            RapidMiner.init();
        }//end of constructor

        public String classifyDocumentByTopic(String rapidMinerProcess, String DocumentsToBeClassfied) throws RepositoryException {
            ExampleSet resultSet = null;

            try {
                // loads the process from the repository
                RepositoryLocation pLoc = new RepositoryLocation(rapidMinerProcess);
                ProcessEntry pEntry = (ProcessEntry) pLoc.locateEntry();
                String processXML = pEntry.retrieveXML();
                Process myProcess = new Process(processXML);

                Operator ProcessDocumentsOperator = myProcess.getOperator("Process Documents from Files");
                ProcessDocumentsOperator.setParameter("FileDocumentInputOperator.TEXT_DIRECTORIES.DIRECTORY", DocumentsToBeClassfied);
                ProcessDocumentsOperator.setParameter("FileDocumentInputOperator.TEXT_DIRECTORIES.CLASS_NAME", "Unknown");
                List<String[]> parametersList = new ArrayList<String[]> ();
                String[] directoryParameterValues = {"DIRECTORY", DocumentsToBeClassfied};
                String[] classNameParameterValues = {"CLASS_NAME", "Unknown"};
                parametersList.add(directoryParameterValues);
                parametersList.add(classNameParameterValues);
                ProcessDocumentsOperator.setListParameter("FileDocumentInputOperator.TEXT_DIRECTORIES", parametersList) ;
               
                IOContainer ioResult = myProcess.run();

                if (ioResult.getElementAt(0) instanceof ExampleSet) {
                    resultSet = (ExampleSet) ioResult.getElementAt(0);
                }

               
                for (Example example : resultSet) {
                    Iterator<Attribute> allAtts = resultSet.getAttributes().allAttributes();
                    while (allAtts.hasNext()) {
                        Attribute attribute = allAtts.next();
                        category = example.getValueAsString(attribute);
                        System.out.println(category);
                    }
                }
            } catch (IOException | XMLException | OperatorException ex) {
                ex.printStackTrace();
            }
           
            return category;
        }//end of classifyDocumentByTopic

        public static void main(String[] args) throws RepositoryException {
            RapidMinerClassifier rapidMinerClassifier = new RapidMinerClassifier();
           
            rapidMinerClassifier.classifyDocumentByTopic("//Local Repository/SVMtesting.rmp", "D:\\Dropbox\\SeniorProject\\_Spring2014\\_3 Gurus\\Reem_Classification of documents based on Topic\\Corpus\\Processed\\singleFileForRMTesting");
        }//end of main
    }
    but got the following error:

    INFO: JDBC driver ca.ingres.jdbc.IngresDriver not found. Probably the driver is not installed.
    [Fatal Error] :1:1: Premature end of file.
    Exception in thread "main" java.lang.NullPointerException
    at RapidMiner.RapidMinerClassifier.classifyDocumentByTopic(RapidMinerClassifier.java:37)
    at RapidMiner.RapidMinerClassifier.main(RapidMinerClassifier.java:75)
    Java Result: 1

    Here is the XML content (I created the model using the disabled operators in the following process):

    <?xml version="1.0" encoding="UTF-8" standalone="no"?>
    <process version="5.3.015">
      <context>
        <input/>
        <output/>
        <macros/>
      </context>
      <operator activated="true" class="process" compatibility="5.3.015" expanded="true" name="Process">
        <parameter key="encoding" value="UTF-8"/>
        <process expanded="true">
          <operator activated="false" class="text:process_document_from_file" compatibility="5.3.002" expanded="true" height="76" name="Process Documents from Files (2)" width="90" x="42" y="75">
            <list key="text_directories">
              <parameter key="Geography" value="D:\Dropbox\Senior Project\_Spring2014\_3 Gurus\Reem_Classification of documents based on Topic\Corpus\Processed\Geography"/>
              <parameter key="Religion" value="D:\Dropbox\Senior Project\_Spring2014\_3 Gurus\Reem_Classification of documents based on Topic\Corpus\Processed\Religion"/>
              <parameter key="Science" value="D:\Dropbox\Senior Project\_Spring2014\_3 Gurus\Reem_Classification of documents based on Topic\Corpus\Processed\Science"/>
            </list>
            <parameter key="encoding" value="UTF-8"/>
            <parameter key="prune_below_rank" value="5.0"/>
            <parameter key="prune_above_rank" value="5.0"/>
            <process expanded="true">
              <operator activated="true" class="text:tokenize" compatibility="5.3.002" expanded="true" name="Tokenize (3)"/>
              <connect from_port="document" to_op="Tokenize (3)" to_port="document"/>
              <connect from_op="Tokenize (3)" from_port="document" to_port="document 1"/>
              <portSpacing port="source_document" spacing="0"/>
              <portSpacing port="sink_document 1" spacing="0"/>
              <portSpacing port="sink_document 2" spacing="0"/>
            </process>
          </operator>
          <operator activated="false" class="store" compatibility="5.3.015" expanded="true" height="60" name="Store Wordlist" width="90" x="179" y="30">
            <parameter key="repository_entry" value="svmWordlist"/>
          </operator>
          <operator activated="true" class="retrieve" compatibility="5.3.015" expanded="true" height="60" name="Retrieve Model" width="90" x="179" y="165">
            <parameter key="repository_entry" value="svmModel"/>
          </operator>
          <operator activated="false" class="x_validation" compatibility="5.3.015" expanded="true" height="112" name="Validation" width="90" x="447" y="75">
            <process expanded="true">
              <operator activated="true" class="support_vector_machine_libsvm" compatibility="5.3.015" expanded="true" name="SVM">
                <parameter key="gamma" value="0.9"/>
                <parameter key="C" value="8.0"/>
                <parameter key="epsilon" value="0.0010"/>
                <list key="class_weights"/>
              </operator>
              <connect from_port="training" to_op="SVM" to_port="training set"/>
              <connect from_op="SVM" from_port="model" to_port="model"/>
              <portSpacing port="source_training" spacing="0"/>
              <portSpacing port="sink_model" spacing="0"/>
              <portSpacing port="sink_through 1" spacing="0"/>
            </process>
            <process expanded="true">
              <operator activated="true" class="apply_model" compatibility="5.3.015" expanded="true" name="Apply Model">
                <list key="application_parameters"/>
              </operator>
              <operator activated="true" class="performance" compatibility="5.3.015" expanded="true" name="Performance"/>
              <connect from_port="model" to_op="Apply Model" to_port="model"/>
              <connect from_port="test set" to_op="Apply Model" to_port="unlabelled data"/>
              <connect from_op="Apply Model" from_port="labelled data" to_op="Performance" to_port="labelled data"/>
              <connect from_op="Performance" from_port="performance" to_port="averagable 1"/>
              <portSpacing port="source_model" spacing="0"/>
              <portSpacing port="source_test set" spacing="0"/>
              <portSpacing port="source_through 1" spacing="0"/>
              <portSpacing port="sink_averagable 1" spacing="0"/>
              <portSpacing port="sink_averagable 2" spacing="0"/>
            </process>
          </operator>
          <operator activated="false" class="store" compatibility="5.3.015" expanded="true" height="60" name="Store Model" width="90" x="582" y="75">
            <parameter key="repository_entry" value="svmModel"/>
          </operator>
          <operator activated="true" class="retrieve" compatibility="5.3.015" expanded="true" height="60" name="Retrieve Wordlist" width="90" x="45" y="300">
            <parameter key="repository_entry" value="svmWordlist"/>
          </operator>
          <operator activated="true" class="text:process_document_from_file" compatibility="5.3.002" expanded="true" height="76" name="Process Documents from Files" width="90" x="246" y="300">
            <list key="text_directories"/>
            <process expanded="true">
              <connect from_port="document" to_port="document 1"/>
              <portSpacing port="source_document" spacing="0"/>
              <portSpacing port="sink_document 1" spacing="0"/>
              <portSpacing port="sink_document 2" spacing="0"/>
            </process>
          </operator>
          <operator activated="true" class="apply_model" compatibility="5.3.015" expanded="true" height="76" name="Apply Model (2)" width="90" x="447" y="255">
            <list key="application_parameters"/>
          </operator>
          <connect from_op="Process Documents from Files (2)" from_port="example set" to_op="Validation" to_port="training"/>
          <connect from_op="Process Documents from Files (2)" from_port="word list" to_op="Store Wordlist" to_port="input"/>
          <connect from_op="Retrieve Model" from_port="output" to_op="Apply Model (2)" to_port="model"/>
          <connect from_op="Validation" from_port="model" to_op="Store Model" to_port="input"/>
          <connect from_op="Retrieve Wordlist" from_port="output" to_op="Process Documents from Files" to_port="word list"/>
          <connect from_op="Process Documents from Files" from_port="example set" to_op="Apply Model (2)" to_port="unlabelled data"/>
          <portSpacing port="source_input 1" spacing="0"/>
          <portSpacing port="sink_result 1" spacing="0"/>
        </process>
      </operator>
    </process>

    svmModel and svmWordlist are located in the local repository.
    I always get the same error when I use the local repository as recommended in http://rapid-i.com/rapidforum/index.php/topic,5807.0.html

    Another problem occurs when I run the process from Java without changing the parameter, giving the whole path, e.g. D:\\Local Repository\\SVMTesting.rmp:

    the returned output is the training set, not the predicted class of the new, unseen documents. What should I change in the code to get the predicted labels?

    And if I change the code to set the operators while using the repository path D:\\Local Repository\\SVMTesting.rmp, the error states that it can't reach the svmModel and svmWordlist files!

    Any hints?
    Your help is appreciated,
  • Marco_BoeckMarco_Boeck Team Lead Software Engineering Moderator, Employee, Member, University Professor Posts: 1,806   RM Engineering
    Hi,

    1) Change your rapidMinerProcess parameter to "//Local Repository/SVMtesting"; file endings do not exist in the repository. That's why you are getting the NPE.
    2) Your posted process returns nothing: no operator is connected to the "res" ports on the right side of the process, so I can't tell you why you are receiving the training data instead of the classified results. Note that the classified data also includes the input data, just with additional attribute columns.
    3) I don't understand your last sentence. Please include the actual error message.
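
    For point 1, a small sketch of normalizing a repository location before it is handed to new RepositoryLocation(...). The helper stripProcessExtension is hypothetical and only removes a trailing ".rmp"; it assumes, as stated above, that the repository entry itself carries no file ending.

    ```java
    import java.util.Locale;

    class RepositoryPathSketch {

        // Hypothetical helper: repository entries have no file ending,
        // so a trailing ".rmp" (the on-disk process file extension) is dropped.
        static String stripProcessExtension(String location) {
            String suffix = ".rmp";
            if (location.toLowerCase(Locale.ROOT).endsWith(suffix)) {
                return location.substring(0, location.length() - suffix.length());
            }
            return location;
        }

        public static void main(String[] args) {
            // Both calls print: //Local Repository/SVMtesting
            System.out.println(stripProcessExtension("//Local Repository/SVMtesting.rmp"));
            System.out.println(stripProcessExtension("//Local Repository/SVMtesting"));
        }
    }
    ```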

    Regards,
    Marco
  • ReemReem Member Posts: 20 Contributor I
    I've run the code based on the information given for the repository
    package RapidMiner;

    import com.rapidminer.Process;
    import com.rapidminer.RapidMiner;
    import com.rapidminer.example.Attribute;
    import com.rapidminer.example.Example;
    import com.rapidminer.example.ExampleSet;
    import com.rapidminer.operator.IOContainer;
    import com.rapidminer.operator.OperatorException;
    import com.rapidminer.repository.ProcessEntry;
    import com.rapidminer.repository.RepositoryException;
    import com.rapidminer.repository.RepositoryLocation;
    import com.rapidminer.tools.XMLException;
    import java.io.IOException;

    public class RapidMinerClassifier1 {

        static Example exampleSet = null;
        String category;

        public RapidMinerClassifier1() {
            RapidMiner.setExecutionMode(RapidMiner.ExecutionMode.COMMAND_LINE);
            RapidMiner.init();
        }//end of constructor

        public String classifyDocumentByTopic(String rapidMinerProcess, String DocumentsToBeClassfied) throws RepositoryException {
            ExampleSet resultSet = null;

            try {
                // loads the process from the repository
                RepositoryLocation pLoc = new RepositoryLocation(rapidMinerProcess);
                ProcessEntry pEntry = (ProcessEntry) pLoc.locateEntry();
                String processXML = pEntry.retrieveXML();
                Process myProcess = new Process(processXML);

                IOContainer ioResult = myProcess.run();
                if (ioResult.getElementAt(0) instanceof ExampleSet) {
                    resultSet = (ExampleSet) ioResult.getElementAt(0);
                }
                           
                Attribute att = resultSet.getAttributes().get("prediction");
                for (Example example : resultSet) {
                    // keep the predicted value; it was previously read and discarded
                    category = example.getNominalValue(att);
                }

            } catch (IOException | XMLException | OperatorException ex) {
                ex.printStackTrace();
            }
           
            return category;
        }//end of classifyDocumentByTopic

        public static void main(String[] args) throws RepositoryException {
            RapidMinerClassifier1 rapidMinerClassifier = new RapidMinerClassifier1();
            rapidMinerClassifier.classifyDocumentByTopic("//RapidMinerRepository/Applyingk-NNModel", "D:\\CORPUS\\Testing");
        }//end of main
    }
    The XML:
    <?xml version="1.0" encoding="UTF-8" standalone="no"?>
    <process version="5.2.008">
      <context>
        <input/>
        <output/>
        <macros/>
      </context>
      <operator activated="true" class="process" compatibility="5.2.008" expanded="true" name="Process">
        <parameter key="encoding" value="UTF-8"/>
        <process expanded="true" height="371" width="671">
          <operator activated="true" class="retrieve" compatibility="5.2.008" expanded="true" height="60" name="Retrieve k-NN Model" width="90" x="112" y="30">
            <parameter key="repository_entry" value="k-NNModel"/>
          </operator>
          <operator activated="true" class="retrieve" compatibility="5.2.008" expanded="true" height="60" name="Retrieve Wordlist" width="90" x="112" y="120">
            <parameter key="repository_entry" value="Wordlist"/>
          </operator>
          <operator activated="true" class="text:process_document_from_file" compatibility="5.3.002" expanded="true" height="76" name="Process Documents from Files" width="90" x="313" y="120">
            <list key="text_directories">
              <parameter key="Unknown" value="D:\CORPUS\Testing"/>
            </list>
            <parameter key="encoding" value="UTF-8"/>
            <process expanded="true" height="371" width="671">
              <connect from_port="document" to_port="document 1"/>
              <portSpacing port="source_document" spacing="0"/>
              <portSpacing port="sink_document 1" spacing="0"/>
              <portSpacing port="sink_document 2" spacing="0"/>
            </process>
          </operator>
          <operator activated="true" class="apply_model" compatibility="5.2.008" expanded="true" height="76" name="Apply k-NN Model" width="90" x="450" y="30">
            <list key="application_parameters"/>
          </operator>
          <connect from_op="Retrieve k-NN Model" from_port="output" to_op="Apply k-NN Model" to_port="model"/>
          <connect from_op="Retrieve Wordlist" from_port="output" to_op="Process Documents from Files" to_port="word list"/>
          <connect from_op="Process Documents from Files" from_port="example set" to_op="Apply k-NN Model" to_port="unlabelled data"/>
          <connect from_op="Apply k-NN Model" from_port="labelled data" to_port="result 1"/>
          <connect from_op="Apply k-NN Model" from_port="model" to_port="result 2"/>
          <portSpacing port="source_input 1" spacing="0"/>
          <portSpacing port="sink_result 1" spacing="0"/>
          <portSpacing port="sink_result 2" spacing="234"/>
          <portSpacing port="sink_result 3" spacing="0"/>
        </process>
      </operator>
    </process>
    I don't know why it appeared to you as not connected to the "res" port; I am connecting the "label" port of the Apply Model operator to the "res" port. In RapidMiner, I get the predicted labels in the "Data View". My problem is how to get them (the predicted labels) via Java if I use Java to set the parameters of the "Process Documents from Files" operator.


    When I run the above process from a Java class, I get the following error:

    May 01, 2014 10:46:20 PM com.rapidminer.tools.ParameterService init
    INFO: Reading configuration resource com/rapidminer/resources/rapidminerrc.
    May 01, 2014 10:46:20 PM com.rapidminer.tools.I18N <clinit>
    INFO: Set locale to en.
    May 01, 2014 10:46:20 PM com.rapid_i.Launcher ensureRapidMinerHomeSet
    INFO: Property rapidminer.home is not set. Guessing.
    May 01, 2014 10:46:20 PM com.rapid_i.Launcher ensureRapidMinerHomeSet
    INFO: Trying parent directory of 'C:\Program Files\Rapid-I\RapidMiner5\lib\rapidminer.jar'...gotcha!
    May 01, 2014 10:46:20 PM com.rapid_i.Launcher ensureRapidMinerHomeSet
    INFO: Trying parent directory of 'C:\Program Files\Rapid-I\RapidMiner5\lib\launcher.jar'...gotcha!
    May 01, 2014 10:46:22 PM com.rapidminer.parameter.ParameterTypePassword decryptPassword
    WARNING: Password in XML file looks like unencrypted plain text.
    May 01, 2014 10:46:25 PM com.rapidminer.tools.jdbc.JDBCProperties <init>
    WARNING: Missing database driver class name for ODBC Bridge (e.g. Access)
    May 01, 2014 10:46:25 PM com.rapidminer.tools.jdbc.JDBCProperties registerDrivers
    INFO: JDBC driver ca.ingres.jdbc.IngresDriver not found. Probably the driver is not installed.
    May 01, 2014 10:46:25 PM com.rapidminer.tools.jdbc.JDBCProperties registerDrivers
    INFO: JDBC driver oracle.jdbc.driver.OracleDriver not found. Probably the driver is not installed.
    May 01, 2014 10:46:25 PM com.rapidminer.tools.WrapperLoggingHandler log
    INFO: No filename given for result file, using stdout for logging results!
    May 01, 2014 10:46:25 PM com.rapidminer.Process run
    INFO: Process starts
    com.rapidminer.operator.UserError: Cannot resolve relative repository location 'k-NNModel'. Process is not associated with a repository.
    at com.rapidminer.Process.resolveRepositoryLocation(Process.java:1210)
    at com.rapidminer.operator.Operator.getParameterAsRepositoryLocation(Operator.java:1383)
    at com.rapidminer.operator.io.RepositorySource.getRepositoryEntry(RepositorySource.java:91)
    at com.rapidminer.operator.io.RepositorySource.read(RepositorySource.java:105)
    at com.rapidminer.operator.io.AbstractReader.doWork(AbstractReader.java:123)
    at com.rapidminer.operator.Operator.execute(Operator.java:834)
    at com.rapidminer.operator.execution.SimpleUnitExecutor.execute(SimpleUnitExecutor.java:51)
    at com.rapidminer.operator.ExecutionUnit.execute(ExecutionUnit.java:711)
    at com.rapidminer.operator.OperatorChain.doWork(OperatorChain.java:379)
    at com.rapidminer.operator.Operator.execute(Operator.java:834)
    at com.rapidminer.Process.run(Process.java:925)
    at com.rapidminer.Process.run(Process.java:848)
    at com.rapidminer.Process.run(Process.java:807)
    at com.rapidminer.Process.run(Process.java:802)
    at com.rapidminer.Process.run(Process.java:792)
    at RapidMinerTopicClassifier.classifyDocumentByTopic(RapidMinerTopicClassifier.java:49)
    at RapidMinerTopicClassifier.main(RapidMinerTopicClassifier.java:68)
    The jars in my Java project are:
    - RapidMiner
    - Launcher
    - Vldocking
    - All jars inside the JDBC folder (4 jar files)

  • Marco_BoeckMarco_Boeck Team Lead Software Engineering Moderator, Employee, Member, University Professor Posts: 1,806   RM Engineering
    Hi,

    sorry, the FAQ was missing a bit:

    myProcess.setProcessLocation(pLoc);
    This links the process to the repository so it can resolve relative paths. Otherwise RapidMiner Studio does not know where to look when you reference other data in the repository without a fully qualified location.
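
    To illustrate the idea of resolving a relative entry against the process location, here is a simplified model in plain Java. It is not RapidMiner's implementation: the method resolve is hypothetical, and it assumes only that fully qualified locations start with "//" and that a relative entry lives in the same folder as the process.

    ```java
    class RelativeLocationSketch {

        // Simplified model, not RapidMiner's code: absolute repository locations
        // start with "//"; anything else is resolved against the folder that
        // contains the process.
        static String resolve(String processLocation, String entry) {
            if (entry.startsWith("//")) {
                return entry; // already fully qualified
            }
            if (processLocation == null) {
                throw new IllegalStateException(
                    "Cannot resolve relative location '" + entry + "': process has no location");
            }
            int lastSlash = processLocation.lastIndexOf('/');
            String folder = processLocation.substring(0, lastSlash);
            return folder + "/" + entry;
        }

        public static void main(String[] args) {
            String processLocation = "//RapidMinerRepository/Applyingk-NNModel";
            // prints //RapidMinerRepository/k-NNModel
            System.out.println(resolve(processLocation, "k-NNModel"));
            // a fully qualified entry needs no process location at all
            System.out.println(resolve(null, "//RapidMinerRepository/k-NNModel"));
        }
    }
    ```

    In this model a process without a location can still use fully qualified entries, but any relative entry fails, which mirrors the "Process is not associated with a repository" error.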

    Regards,
    Marco
  • ReemReem Member Posts: 20 Contributor I
    I added that line and it didn't work.
    I changed the way the process is loaded into the Java application as follows:
    Process process = new Process(Tools.readTextFile(new File(rapidMinerProcess)));
    Also, I changed the value to the repository location as follows, and it did work for retrieving the model:
    <parameter key="repository_entry" value="//RapidMinerRepository/k-NNModel"/>
    However, I don't want to rely on the repository, so I set the parameters via Java as follows:
    process.getOperator("Retrieve k-NN Model").setParameter("RepositorySource.PARAMETER_REPOSITORY_ENTRY", "D:\\k-NNModel");
    NOW, it works fine!

    My new task is to run "my course project" on a machine that doesn't have RapidMiner already installed.

    I tried to follow what is mentioned in the "RapidMiner as a library" section here: http://rapid-i.com/wiki/index.php?title=Integrating_RapidMiner_into_your_application
    However, I didn't understand where to put the rapidminerrc in my project, or how to specify the files my project requires from rapidminer.home/lib.
    Any further explanation or resources to read to achieve this will be appreciated.

    In addition,
    I still have a problem setting the parameters via Java. I used the approach mentioned in the FAQ post, as follows:

    List<String[]> list = new LinkedList<>();
    String[] directoryParameterValues = {"Unknown", "D:\\Testing"};
    list.add(directoryParameterValues);
    Operator ProcessDocumentsOperator = process.getOperator("Process Documents from Files");
    ProcessDocumentsOperator.setListParameter("FileDocumentInputOperator.PARAMETER_TEXT_DIRECTORIES", list);

    Note: the process works in RapidMiner, but when passing the parameters via Java it doesn't work!
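
    One thing worth checking here, offered as an assumption rather than a confirmed diagnosis: the string "FileDocumentInputOperator.PARAMETER_TEXT_DIRECTORIES" is the name of a Java constant, whereas the process XML stores the list under the key text_directories. The sketch below uses a plain map as a stand-in for an operator's parameters (no RapidMiner classes involved) to show why quoting the constant's name does not address the same entry.

    ```java
    import java.util.HashMap;
    import java.util.Map;

    class ParameterKeySketch {

        // Stand-in for the extension's constant; the value is an assumption taken
        // from the <list key="text_directories"> element in the process XML.
        static final String PARAMETER_TEXT_DIRECTORIES = "text_directories";

        public static void main(String[] args) {
            // A plain map standing in for an operator's parameter store.
            Map<String, String> parameters = new HashMap<>();
            parameters.put(PARAMETER_TEXT_DIRECTORIES, "Unknown=D:\\Testing");

            // Quoting the constant's name creates a different, unrelated key.
            String quotedName = "FileDocumentInputOperator.PARAMETER_TEXT_DIRECTORIES";
            System.out.println(parameters.containsKey(quotedName));                 // false
            System.out.println(parameters.containsKey(PARAMETER_TEXT_DIRECTORIES)); // true
        }
    }
    ```

    If this is the cause, passing either the constant itself (without quotes) or the literal key from the XML would be the thing to try.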

    regards,
  • Marco_BoeckMarco_Boeck Team Lead Software Engineering Moderator, Employee, Member, University Professor Posts: 1,806   RM Engineering
    Hi,

    your post is a bit confusing now, so I'll try to point some things out.

    1) When you have a process which makes use of operators that load further data/models/etc. from the repository, your process needs a repository location. Otherwise it does not know how to resolve the relative locations for the "Retrieve" operators, because the process does not know where it is stored itself. That's why the process location has to be set, and why RapidMiner Studio sets it when you start the process from the GUI.

    2) process.getOperator("Retrieve k-NN Model").setParameter("RepositorySource.PARAMETER_REPOSITORY_ENTRY", "D:\\k-NNModel"); It surprises me that this works, as that parameter is intended for repository access, which is an abstraction of a file system.

    3) The wiki section - good grief, I didn't even know that existed. Looks quite outdated to be honest. If you want to run RapidMiner from your own application, make sure that all jars from RapidMiner/lib are on the classpath of your application, just like any other library jars.

    4) Where exactly do we stand now - what is the latest error message you get?
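
    For point 3 above, a sketch of collecting every jar in a lib folder into one classpath string. The helper classpathOf is hypothetical, and the folder is whatever copy of RapidMiner/lib ships with your application; subfolders such as the JDBC one would need the same call.

    ```java
    import java.io.File;
    import java.io.IOException;
    import java.nio.file.DirectoryStream;
    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.nio.file.Paths;
    import java.util.ArrayList;
    import java.util.Collections;
    import java.util.List;

    class LibClasspathSketch {

        // Hypothetical helper: joins all *.jar files directly inside libDir
        // with the platform's path separator (";" on Windows, ":" elsewhere).
        static String classpathOf(Path libDir) throws IOException {
            List<String> jars = new ArrayList<>();
            try (DirectoryStream<Path> stream = Files.newDirectoryStream(libDir, "*.jar")) {
                for (Path jar : stream) {
                    jars.add(jar.toAbsolutePath().toString());
                }
            }
            Collections.sort(jars); // stable order makes the result reproducible
            return String.join(File.pathSeparator, jars);
        }

        public static void main(String[] args) throws IOException {
            Path libDir = Paths.get(args.length > 0 ? args[0] : "lib");
            if (Files.isDirectory(libDir)) {
                System.out.println(classpathOf(libDir));
            } else {
                System.out.println("No such directory: " + libDir);
            }
        }
    }
    ```

    The printed string can be passed to java -cp together with your own classes. Listing the same jars from two different lib folders is worth avoiding, since duplicates can lead to warnings about multiple bindings.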

    Regards,
    Marco
  • ReemReem Member Posts: 20 Contributor I
    Hi,

    I just realized yesterday that I was wrong about setting the repository_entry with a path of my file system.
    This is my mistake. Sorry for confusing you!

    1+2) Thanks for your clear explanation about the repository.
    So, what I understood is that having a repository is a "must" when using RapidMiner,
    but I still don't understand how to rely on a repository if we want to run this code on a computer that doesn't have RapidMiner.

    3) Is there any resource/documentation that describes each jar in rapidminer/lib?
    Sorry for asking this question again and again: can I run my Java application on a computer that does not have RapidMiner installed? Actually, I tried to run the last working code on a laptop that does not have RapidMiner and it gave me errors (I'll report them later so as not to confuse you with all the problems at once).

    4) Sorry again for confusing you.
    The process was working correctly yesterday, but when I run it now it gives me a new error:
    May 06, 2014 12:09:16 AM com.rapidminer.tools.ParameterService init
    INFO: Reading configuration resource com/rapidminer/resources/rapidminerrc.
    May 06, 2014 12:09:16 AM com.rapidminer.tools.I18N <clinit>
    INFO: Set locale to en.
    May 06, 2014 12:09:16 AM com.rapid_i.Launcher ensureRapidMinerHomeSet
    INFO: Property rapidminer.home is not set. Guessing.
    May 06, 2014 12:09:16 AM com.rapid_i.Launcher ensureRapidMinerHomeSet
    INFO: Trying parent directory of 'C:\Program Files\Rapid-I\RapidMiner5\lib\rapidminer.jar'...gotcha!
    May 06, 2014 12:09:16 AM com.rapid_i.Launcher ensureRapidMinerHomeSet
    INFO: Trying parent directory of 'C:\Program Files\Rapid-I\RapidMiner5\lib\launcher.jar'...gotcha!
    May 06, 2014 12:09:16 AM com.rapid_i.Launcher ensureRapidMinerHomeSet
    INFO: Trying parent directory of 'D:\Senior\RapidMinerClassifier\lib\launcher.jar'...gotcha!
    May 06, 2014 12:09:16 AM com.rapid_i.Launcher ensureRapidMinerHomeSet
    INFO: Trying parent directory of 'D:\Senior\RapidMinerClassifier\lib\rapidminer.jar'...gotcha!
    May 06, 2014 12:09:18 AM com.rapidminer.parameter.ParameterTypePassword decryptPassword
    WARNING: Password in XML file looks like unencrypted plain text.
    SLF4J: Class path contains multiple SLF4J bindings.
    SLF4J: Found binding in [jar:file:/C:/Program%20Files/Rapid-I/RapidMiner5/lib/slf4j-simple-1.6.4.jar!/org/slf4j/impl/StaticLoggerBinder.class]
    SLF4J: Found binding in [jar:file:/D:/Senior/RapidMinerClassifier/lib/slf4j-simple-1.6.4.jar!/org/slf4j/impl/StaticLoggerBinder.class]
    SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
    May 06, 2014 12:09:21 AM com.rapidminer.tools.jdbc.JDBCProperties <init>
    WARNING: Missing database driver class name for ODBC Bridge (e.g. Access)
    May 06, 2014 12:09:21 AM com.rapidminer.tools.jdbc.JDBCProperties registerDrivers
    INFO: JDBC driver ca.ingres.jdbc.IngresDriver not found. Probably the driver is not installed.
    May 06, 2014 12:09:21 AM com.rapidminer.tools.jdbc.JDBCProperties registerDrivers
    INFO: JDBC driver oracle.jdbc.driver.OracleDriver not found. Probably the driver is not installed.
    java.io.FileNotFoundException: \\RapidMinerRepository\Applyingk-NNModel (The network path was not found)
    null
    at java.io.FileInputStream.open(Native Method)
    at java.io.FileInputStream.<init>(FileInputStream.java:138)
    at java.io.FileReader.<init>(FileReader.java:72)
    at com.rapidminer.tools.Tools.readTextFile(Tools.java:714)
    at RapidMinerTopicClassifier.classifyDocumentByTopic(RapidMinerTopicClassifier.java:57)
    at RapidMinerTopicClassifier.main(RapidMinerTopicClassifier.java:93)
    The java code:

    import com.rapidminer.Process;
    import com.rapidminer.RapidMiner;
    import com.rapidminer.example.Attribute;
    import com.rapidminer.example.Example;
    import com.rapidminer.example.ExampleSet;
    import com.rapidminer.operator.IOContainer;
    import com.rapidminer.operator.Operator;
    import com.rapidminer.operator.OperatorException;
    import com.rapidminer.repository.RepositoryException;
    import com.rapidminer.tools.Tools;
    import com.rapidminer.tools.XMLException;
    import java.io.File;
    import java.io.IOException;
    import java.util.ArrayList;
    import java.util.LinkedList;
    import java.util.List;

    public class RapidMinerTopicClassifier implements RapidMinerClassifier {

        private static Example exampleSet = null;
        private String topic;
        private String classifierModel;

        public RapidMinerTopicClassifier() {
            RapidMiner.setExecutionMode(RapidMiner.ExecutionMode.COMMAND_LINE);
            RapidMiner.init();
        }//end of constructor

        /**
        *
        * @param rapidMinerProcess
        * @param DocumentsToBeClassfied
        * @return topic
        * @throws RepositoryException
        */
        @Override
        public String classifyDocumentByTopic(String rapidMinerProcess, String DocumentsToBeClassfied) throws RepositoryException {
            ExampleSet resultSet = null;
            try {
                // loads the process from the repository
                Process process = new Process(Tools.readTextFile(new File(rapidMinerProcess)));
                Operator ProcessDocumentsOperator = process.getOperator("Process Documents from Files");
                List<String[]> list = new LinkedList<>();
                String[] directoryParameterValues = {"Unknown", DocumentsToBeClassfied};
                list.add(directoryParameterValues);
                ProcessDocumentsOperator.setListParameter("FileDocumentInputOperator.PARAMETER_TEXT_DIRECTORIES", list);

                Attribute attribute;
                IOContainer ioResult = process.run();

                for (int i = 0; i < ioResult.size(); i++) {
                    // only example sets carry predictions; skip models and other results
                    if (ioResult.getElementAt(i) instanceof ExampleSet) {
                        resultSet = (ExampleSet) ioResult.getElementAt(i);
                        attribute = resultSet.getAttributes().get("prediction");
                        for (Example example : resultSet) {
                            topic = example.getNominalValue(attribute);
                        }
                    }
                }


            } catch (IOException | XMLException | OperatorException ex) {
                ex.printStackTrace();
            }

            return topic;
        }//end of classifyDocumentByTopic

        public static void main(String[] args) throws RepositoryException {
            RapidMinerClassifier rapidMinerClassifier = new RapidMinerTopicClassifier();
            /*
            * 1. SVM
            * 2. k-NN
            * 3. Naive Bayes
            */
            String topic = rapidMinerClassifier.classifyDocumentByTopic("//RapidMinerRepository/Applyingk-NNModel" , "/Testing");
            System.out.println(topic);
        }//end of main
    }
    .rmp file:
    <?xml version="1.0" encoding="UTF-8" standalone="no"?>
    <process version="5.2.008">
      <context>
        <input/>
        <output/>
        <macros/>
      </context>
      <operator activated="true" class="process" compatibility="5.2.008" expanded="true" name="Process">
        <parameter key="encoding" value="UTF-8"/>
        <process expanded="true" height="359" width="570">
          <operator activated="true" class="retrieve" compatibility="5.2.008" expanded="true" height="60" name="Retrieve k-NN Model" width="90" x="45" y="30">
            <parameter key="repository_entry" value="//RapidMinerRepository/k-NNModel"/>
          </operator>
          <operator activated="true" class="retrieve" compatibility="5.2.008" expanded="true" height="60" name="Retrieve Wordlist" width="90" x="45" y="120">
            <parameter key="repository_entry" value="//RapidMinerRepository/Wordlist"/>
          </operator>
          <operator activated="true" class="text:process_document_from_file" compatibility="5.3.002" expanded="true" height="76" name="Process Documents from Files" width="90" x="313" y="120">
            <list key="text_directories"/>
            <parameter key="encoding" value="UTF-8"/>
            <process expanded="true" height="371" width="671">
              <connect from_port="document" to_port="document 1"/>
              <portSpacing port="source_document" spacing="0"/>
              <portSpacing port="sink_document 1" spacing="0"/>
              <portSpacing port="sink_document 2" spacing="0"/>
            </process>
          </operator>
          <operator activated="true" class="apply_model" compatibility="5.2.008" expanded="true" height="76" name="Apply k-NN Model" width="90" x="450" y="30">
            <list key="application_parameters"/>
          </operator>
          <connect from_op="Retrieve k-NN Model" from_port="output" to_op="Apply k-NN Model" to_port="model"/>
          <connect from_op="Retrieve Wordlist" from_port="output" to_op="Process Documents from Files" to_port="word list"/>
          <connect from_op="Process Documents from Files" from_port="example set" to_op="Apply k-NN Model" to_port="unlabelled data"/>
          <connect from_op="Apply k-NN Model" from_port="labelled data" to_port="result 1"/>
          <connect from_op="Apply k-NN Model" from_port="model" to_port="result 2"/>
          <portSpacing port="source_input 1" spacing="0"/>
          <portSpacing port="sink_result 1" spacing="0"/>
          <portSpacing port="sink_result 2" spacing="234"/>
          <portSpacing port="sink_result 3" spacing="0"/>
        </process>
      </operator>
    </process>
    Your help and effort are appreciated,
    Regards,
  • Marco_BoeckMarco_Boeck Team Lead Software Engineering Moderator, Employee, Member, University Professor Posts: 1,806   RM Engineering
    Hi,

    1) It's very much recommended, yes. To use a repository in a shipped product, you can create a new local repository (pointing to a folder of your choice) and add it to the RepositoryManager.

    2) Not really, no. Most of them are libraries that RapidMiner Studio itself uses for certain tasks (e.g. a library to read Excel files etc). If in doubt, they are all needed.

    3) The error looks like the path on the filesystem where the repository is stored cannot be accessed. Sounds like the repository points to a network drive which is no longer available when executing the process.

    Regards,
    Marco
  • ReemReem Member Posts: 20 Contributor I
    Hi,

    Thank you for the reply.
    After going through the FAQ several times, I found the following:
    // loads the process from the repository (if you do not have one, see alternative below)
    RepositoryLocation pLoc = new RepositoryLocation("//LocalRepository/folder/as/needed/yourProcessName");
    ...
    myProcess.setProcessLocation(pLoc);
    The second line gave me an error, because the setProcessLocation() method expects a ProcessLocation object, not a RepositoryLocation object!
    So I tried it this way, which doesn't make sense:
    myProcess.setProcessLocation(myProcess.getProcessLocation());
    So, did I miss something?
  • Marco_BoeckMarco_Boeck Team Lead Software Engineering Moderator, Employee, Member, University Professor Posts: 1,806   RM Engineering
    Hi,

    sorry about that! You spotted an error in the FAQ.
    It should actually be:

    myProcess.setProcessLocation(new RepositoryProcessLocation(pLoc));
    Regards,
    Marco
  • ReemReem Member Posts: 20 Contributor I
    Hi,
    Many thanks for your continued help!

    Returning to my main problem of setting the parameters, I tried to follow the approach mentioned above and in the FAQ:

    Operator retrieveOperator = process.getOperator("Retrieve");
    retrieveOperator.setParameter("RepositorySource.PARAMETER_REPOSITORY_ENTRY", "//Repository/modelname");

    Operator processOperator = process.getOperator("Process Documents from Files");
    List<String[]> list = new LinkedList<>();
    String[] values = {"Unknown", "D:/Files"};//size of array must be the same as number of columns in the parameter GUI
    list.add(values);
    processOperator.setListParameter("FileDocumentInputOperator.PARAMETER_TEXT", list);
    I also tried the following for "Process Documents from Files" operator:

    processDocumentsOperator.setListParameter("FileDocumentInputOperator.PARAMETER_TEXT_DIRECTORIES.Unknown", DocumentsToBeClassfied);
    This doesn't work, the output is the following line:
    WARNING: Kernel Model: The given example set does not contain a regular attribute with name 'day'. This might cause problems for some models depending on this particular attribute.
    This warning line appears for every attribute in the training set!

    When I open the process.rmp file, I don't see any changes, not even for the Retrieve operator.
    Also, if I set the parameter values in the RapidMiner GUI, everything works fine!

    Any tips?
    Thanks !
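The list-parameter layout used in the post above (one String[] per table row, one array element per column) can be sketched in plain Java without RapidMiner. This is only an illustration: the class `ListParameterRows`, the column count of 2 (class name, directory), and the second row ("Sports", "D:/Sports") are assumptions made up for the example.

```java
import java.util.ArrayList;
import java.util.List;

class ListParameterRows {

    // Assumed width of the "text directories" table: class name + directory.
    static final int COLUMNS = 2;

    // Builds the row list and rejects rows whose width does not match the table.
    static List<String[]> rows(String[]... entries) {
        List<String[]> list = new ArrayList<>();
        for (String[] entry : entries) {
            if (entry.length != COLUMNS) {
                throw new IllegalArgumentException(
                        "expected " + COLUMNS + " values per row, got " + entry.length);
            }
            list.add(entry.clone()); // defensive copy, callers may reuse their arrays
        }
        return list;
    }

    public static void main(String[] args) {
        List<String[]> list = rows(
                new String[] {"Unknown", "D:/Files"},
                new String[] {"Sports", "D:/Sports"}); // hypothetical second class
        for (String[] row : list) {
            System.out.println(row[0] + " -> " + row[1]);
        }
    }
}
```

A list built this way has the shape that setListParameter(String, List) is described as expecting earlier in this thread.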
  • Marco_BoeckMarco_Boeck Team Lead Software Engineering Moderator, Employee, Member, University Professor Posts: 1,806   RM Engineering
    Hi,

    1)

    retrieveOperator.setParameter("RepositorySource.PARAMETER_REPOSITORY_ENTRY", "//Repository/modelname");
    processOperator.setListParameter("FileDocumentInputOperator.PARAMETER_TEXT", list);
    This will not work: you are not using the String constant, you are creating your own string literal that merely contains the constant's name ;)
    Correct would be:

    processOperator.setListParameter(FileDocumentInputOperator.PARAMETER_TEXT, list);
    2) When you change a process in your Java application, you are NOT working on the stored .rmp file. You are working on an in-memory copy. To persist your changes, you need to store the process again explicitly. To do so, have a look at the StoreProcessAction.
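Point 1 can be made concrete with a small, self-contained sketch that does not use RapidMiner at all. `ParameterKeyDemo` and its `FakeOperator` are hypothetical stand-ins, and the constant's value is an assumption: it mirrors the `text_directories` key visible in the .rmp file above, not the extension's source code. The sketch shows why quoting the constant's name stores the value under a key that the operator never reads.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class ParameterKeyDemo {

    // Stand-in for the extension's constant; the value is assumed from the
    // <list key="text_directories"/> entry in the .rmp file.
    static final String PARAMETER_TEXT_DIRECTORIES = "text_directories";

    // Hypothetical, stripped-down operator: parameters are looked up by key.
    static class FakeOperator {
        private final Map<String, List<String[]>> listParameters = new HashMap<>();

        void setListParameter(String key, List<String[]> value) {
            listParameters.put(key, value);
        }

        // What the operator does when it runs: it reads its own key.
        List<String[]> readTextDirectories() {
            return listParameters.get(PARAMETER_TEXT_DIRECTORIES);
        }
    }

    static List<String[]> oneRow(String className, String directory) {
        List<String[]> rows = new ArrayList<>();
        rows.add(new String[] {className, directory});
        return rows;
    }

    public static void main(String[] args) {
        // Quoted name: the key is the text of the constant's name, so the lookup misses.
        FakeOperator wrong = new FakeOperator();
        wrong.setListParameter("FileDocumentInputOperator.PARAMETER_TEXT_DIRECTORIES",
                oneRow("Unknown", "D:/Files"));
        System.out.println("quoted name found: " + (wrong.readTextDirectories() != null));

        // Real constant: the key is the constant's value, so the lookup hits.
        FakeOperator right = new FakeOperator();
        right.setListParameter(PARAMETER_TEXT_DIRECTORIES, oneRow("Unknown", "D:/Files"));
        System.out.println("constant found: " + (right.readTextDirectories() != null));
    }
}
```

In this sketch the first lookup misses and the second one hits, which matches the symptom described above: the process runs, but the operator behaves as if no directory had been set.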

    Regards,
    Marco
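Point 2 of the reply above (edits happen on an in-memory copy until the process is stored again) can be illustrated with plain Java file I/O. This is a sketch under stated assumptions: `InMemoryCopyDemo` is a made-up class, a temporary file stands in for the stored .rmp file, and the string edit stands in for a parameter change. It does not show how StoreProcessAction works internally.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

class InMemoryCopyDemo {

    // Creates a temporary file that stands in for a stored .rmp file.
    static Path newTempFile() {
        try {
            Path file = Files.createTempFile("process", ".rmp");
            file.toFile().deleteOnExit();
            return file;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Loads the stored content into memory.
    static String read(Path file) {
        try {
            return new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Explicit write-back; nothing reaches the file without this call.
    static void store(Path file, String content) {
        try {
            Files.write(file, content.getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Path file = newTempFile();
        String original = "<list key=\"text_directories\"/>";
        store(file, original);

        // Load once, then change only the in-memory copy.
        String inMemory = read(file).replace("/>", "><!-- edited in memory --></list>");
        System.out.println("file changed before store: " + !read(file).equals(original));

        // Only an explicit store makes the change visible in the file.
        store(file, inMemory);
        System.out.println("file changed after store: " + !read(file).equals(original));
    }
}
```

The file on disk stays identical to the original until store is called, which is why reopening the .rmp file after a Java run shows no changes.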
  • ReemReem Member Posts: 20 Contributor I
    Oops, silly mistake!

    Thanks for your patience and help!