"Issue in getting the predicted label from a Java application"

spg Member Posts: 3 Contributor I
edited June 9 in Help
Hi,


I created a classification process in RapidMiner, and now I'm trying to read the predicted label from this process in my Java application. I am using the following code:
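            import java.io.File;

            import com.rapidminer.RapidMiner;
            import com.rapidminer.example.Attribute;
            import com.rapidminer.example.Example;
            import com.rapidminer.example.ExampleSet;
            import com.rapidminer.operator.IOContainer;

            RapidMiner.setExecutionMode(RapidMiner.ExecutionMode.COMMAND_LINE);
            RapidMiner.init();

            // Load the process definition from a file and execute it.
            com.rapidminer.Process process = new com.rapidminer.Process(new File("readModel.rmp"));
            IOContainer results = process.run();

            // Fetch the ExampleSet result and look up its predicted-label attribute once.
            ExampleSet resultSet2 = results.get(ExampleSet.class);
            Attribute predictedLabel = resultSet2.getAttributes().getPredictedLabel();

            for (Example example : resultSet2) {
                // getNominalValue assumes the predicted label is nominal; its text is parsed as a double.
                double predictionLabel = Double.parseDouble(example.getNominalValue(predictedLabel));
                System.out.println(predictionLabel);
            }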
The problem is that the predicted label I get in my application differs from the one RapidMiner itself produces.

I am using the same files and data on both sides.
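
One way to check the "same files" assumption is to compare checksums of the files each run actually reads. The sketch below is plain JDK code, not a RapidMiner feature; the class name `FileFingerprint` and its methods are hypothetical, and the paths are whatever you pass on the command line.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

/** Hypothetical helper: compares two files by their SHA-256 digest. */
final class FileFingerprint {

    /** Returns the SHA-256 digest of the given bytes as lowercase hex. */
    static String sha256(byte[] content) {
        try {
            MessageDigest digest = MessageDigest.getInstance("SHA-256");
            StringBuilder hex = new StringBuilder();
            for (byte b : digest.digest(content)) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-256 is not available", e);
        }
    }

    /** Reads the whole file into memory and hashes it (fine for small data and model files). */
    static String sha256(Path file) {
        try {
            return sha256(Files.readAllBytes(file));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        if (args.length != 2) {
            System.err.println("usage: java FileFingerprint <fileA> <fileB>");
            return;
        }
        String a = sha256(Paths.get(args[0]));
        String b = sha256(Paths.get(args[1]));
        System.out.println(a + "  " + args[0]);
        System.out.println(b + "  " + args[1]);
        System.out.println(a.equals(b) ? "identical content" : "content differs");
    }
}
```

Run it once for the data file and once for the model file, passing the copy the GUI reads and the copy the Java application reads. Identical digests mean the bytes match, so any difference in the predictions has to come from somewhere else.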

Thanks for any help.

Answers

  • Marco_Boeck Team Lead Software Engineering Moderator, Employee, Member, University Professor Posts: 1,806   RM Engineering
    Hi,

    Please be more specific. What do you mean by "different"? 0.349745 instead of 0.35, or 0.1 instead of 0.7? Also, please post the process XML here.
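
The distinction asked about here (a rounding-level difference versus a genuinely different prediction) can be made explicit with a small comparison. This is only a sketch in plain Java: the class `PredictionDiff`, the enum names, and the tolerance of 0.001 are assumptions chosen for illustration and are not part of RapidMiner.

```java
/** Hypothetical helper: grades how far apart two predictions are. */
final class PredictionDiff {

    enum Kind { IDENTICAL, ROUNDING, SUBSTANTIVE }

    /**
     * Compares the value seen in the GUI with the value seen in the Java program.
     * The tolerance is an assumed threshold; pick one that matches the displayed precision.
     */
    static Kind classify(double fromGui, double fromJava, double tolerance) {
        if (Double.compare(fromGui, fromJava) == 0) {
            return Kind.IDENTICAL;
        }
        return Math.abs(fromGui - fromJava) <= tolerance ? Kind.ROUNDING : Kind.SUBSTANTIVE;
    }

    public static void main(String[] args) {
        double tolerance = 0.001; // assumed threshold, not a RapidMiner setting
        System.out.println(classify(0.35, 0.349745, tolerance)); // display rounding only
        System.out.println(classify(0.7, 0.1, tolerance));       // a different prediction
    }
}
```

With class labels such as 1.0 and 3.0, any mismatch falls into the substantive case, which points at the inputs or the model rather than at number formatting.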

    Regards,
    Marco
  • spg Member Posts: 3 Contributor I
    Thanks for the reply,

    My classification process assigns new data to one of four possible classes (1.0, 2.0, 3.0 or 4.0).
    I'm trying to get the predicted label in my Java application.
    The problem is that the two sometimes disagree: for example, I get the predicted label 1.0 in Java, but when I run the classification process on the same new data (test data) in RapidMiner I get 3.0.
    I don't know why this happens; I think the results should be the same. Here is the process XML:
    <?xml version="1.0" encoding="UTF-8" standalone="no"?>
    <process version="6.0.002">
      <context>
        <input/>
        <output/>
        <macros/>
      </context>
      <operator activated="true" class="process" compatibility="6.0.002" expanded="true" name="Process">
        <parameter key="logverbosity" value="init"/>
        <parameter key="random_seed" value="2001"/>
        <parameter key="send_mail" value="never"/>
        <parameter key="notification_email" value=""/>
        <parameter key="process_duration_for_mail" value="30"/>
        <parameter key="encoding" value="SYSTEM"/>
        <process expanded="true">
          <operator activated="true" class="read_excel" compatibility="6.0.002" expanded="true" height="60" name="New data test" width="90" x="246" y="300">
            <parameter key="excel_file" value="C:\Test Data.xls"/>
            <parameter key="sheet_number" value="1"/>
            <parameter key="imported_cell_range" value="A1:L3"/>
            <parameter key="encoding" value="SYSTEM"/>
            <parameter key="first_row_as_names" value="false"/>
            <list key="annotations">
              <parameter key="0" value="Name"/>
            </list>
            <parameter key="date_format" value=""/>
            <parameter key="time_zone" value="SYSTEM"/>
            <parameter key="locale" value="English (United States)"/>
            <list key="data_set_meta_data_information">
              <parameter key="0" value="ID.true.integer.id"/>
              <parameter key="1" value="Projecto.true.binominal.attribute"/>
              <parameter key="2" value="NUM_T.true.integer.attribute"/>
              <parameter key="3" value="NUM_M.true.integer.attribute"/>
              <parameter key="4" value="META.true.binominal.attribute"/>
              <parameter key="5" value="INSTANT.true.integer.attribute"/>
              <parameter key="11" value="LABEL.true.integer.label"/>
            </list>
            <parameter key="read_not_matching_values_as_missings" value="true"/>
            <parameter key="datamanagement" value="double_array"/>
          </operator>
          <operator activated="true" class="read_model" compatibility="6.0.002" expanded="true" height="60" name="Read Model" width="90" x="246" y="75">
            <parameter key="model_file" value="C:\model.mod"/>
          </operator>
          <operator activated="true" class="apply_model" compatibility="6.0.002" expanded="true" height="76" name="Apply Model" width="90" x="648" y="165">
            <list key="application_parameters"/>
            <parameter key="create_view" value="false"/>
          </operator>
          <connect from_op="New data test" from_port="output" to_op="Apply Model" to_port="unlabelled data"/>
          <connect from_op="Read Model" from_port="output" to_op="Apply Model" to_port="model"/>
          <connect from_op="Apply Model" from_port="labelled data" to_port="result 1"/>
          <portSpacing port="source_input 1" spacing="0"/>
          <portSpacing port="sink_result 1" spacing="0"/>
          <portSpacing port="sink_result 2" spacing="0"/>
        </process>
      </operator>
    </process>
    Thanks for any help
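
Because the process above refers to its inputs by absolute path, it can help to list exactly which files a given .rmp file points at before comparing runs. The following is a rough sketch using only the JDK's XML parser; the class `ProcessFileParams` is hypothetical, and the rule "parameter keys ending in `_file`" is an assumption based on the two keys visible in this process (`excel_file`, `model_file`).

```java
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

/** Hypothetical helper: lists file-valued parameters found in a process XML. */
final class ProcessFileParams {

    /** Returns "operator name: key = value" for each parameter whose key ends in "_file". */
    static List<String> fileParameters(String processXml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(processXml)));
            List<String> found = new ArrayList<>();
            NodeList params = doc.getElementsByTagName("parameter");
            for (int i = 0; i < params.getLength(); i++) {
                Element param = (Element) params.item(i);
                String key = param.getAttribute("key");
                if (key.endsWith("_file") && param.getParentNode() instanceof Element) {
                    Element owner = (Element) param.getParentNode();
                    found.add(owner.getAttribute("name") + ": " + key
                            + " = " + param.getAttribute("value"));
                }
            }
            return found;
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse process XML", e);
        }
    }

    public static void main(String[] args) throws Exception {
        // "readModel.rmp" is the file name used in the question; pass another path to override it.
        String path = args.length > 0 ? args[0] : "readModel.rmp";
        String xml = new String(Files.readAllBytes(Paths.get(path)), StandardCharsets.UTF_8);
        for (String line : fileParameters(xml)) {
            System.out.println(line);
        }
    }
}
```

For the process posted above this should list the `excel_file` of "New data test" and the `model_file` of "Read Model", which makes it easy to confirm that the GUI and the Java program read the same two files.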

  • Marco_Boeck Team Lead Software Engineering Moderator, Employee, Member, University Professor Posts: 1,806   RM Engineering
    Hi,

    what is the actual model you are using? Your process hides that quite well ;)
    Also it would be quite helpful if you provided a sample of the data with which I could replicate this behavior.

    Regards,
    Marco
  • spg Member Posts: 3 Contributor I
    I'm reading the model from another file. Here is the XML:

    http://pastebin.com/q4Fq021j

    Data sample:

    ID    Project  NUM_T  NUM_M  INSTANT  C_EST  LABEL
    2196  A        13     3      81       132    1
    2197  B        14     3      51       136    2
    2198  C        15     3      94       145    2
    2199  D        16     3      48       152    2
    2200  E        17     3      12       136    3
    2201  F        18     3      100      151    2
    2202  G        19     3      22       162    3
    2203  H        20     3      66       177    4
    2204  I        21     3      130      184    2
    2205  J        22     3      29       199    3
    2206  M        23     3      90       213    2
    2207  N        6      4      9        21     4
    2208  O        8      5      55       129    2
    2209  P        9      3      2        95     1
    2210  Q        6      4      2        17     2


    The model uses only the variables C_EST, NUM_M and INSTANT.
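
To see exactly which values reach the model for a given row, the three inputs can be pulled out of the sample by column name. This is a minimal sketch in plain Java, assuming the column order of the header row above and that no cell (including the Project value) contains spaces; the class `SampleRows` is hypothetical.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper: extracts the model's input columns from one sample row. */
final class SampleRows {

    // Column order taken from the header row of the posted sample.
    static final List<String> HEADER =
            Arrays.asList("ID", "Project", "NUM_T", "NUM_M", "INSTANT", "C_EST", "LABEL");

    /** Splits a row on runs of whitespace and returns {C_EST, NUM_M, INSTANT}. */
    static int[] modelInputs(String row) {
        String[] cells = row.trim().split("\\s+");
        if (cells.length != HEADER.size()) {
            throw new IllegalArgumentException(
                    "Expected " + HEADER.size() + " cells but got " + cells.length + ": " + row);
        }
        return new int[] {
            Integer.parseInt(cells[HEADER.indexOf("C_EST")]),
            Integer.parseInt(cells[HEADER.indexOf("NUM_M")]),
            Integer.parseInt(cells[HEADER.indexOf("INSTANT")])
        };
    }

    public static void main(String[] args) {
        String[] rows = {
            "2196 A       13         3         81           132       1",
            "2204 I         21         3         130             184         2"
        };
        for (String row : rows) {
            System.out.println(Arrays.toString(modelInputs(row)));
        }
    }
}
```

Printing these triples next to the predicted label from each run shows whether both sides feed the model the same three numbers for the same ID.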

    Thanks for any help
  • Marco_Boeck Team Lead Software Engineering Moderator, Employee, Member, University Professor Posts: 1,806   RM Engineering
    Hi,

    The following code produces exactly the same result with your model and your sample data, regardless of whether I use the GUI or a custom Java program to execute it.

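    import com.rapidminer.RapidMiner;
    import com.rapidminer.RapidMiner.ExecutionMode;
    import com.rapidminer.RepositoryProcessLocation;
    import com.rapidminer.example.ExampleSet;
    import com.rapidminer.operator.IOContainer;
    import com.rapidminer.repository.ProcessEntry;
    import com.rapidminer.repository.RepositoryLocation;
    import com.rapidminer.repository.RepositoryManager;

    RapidMiner.setExecutionMode(ExecutionMode.COMMAND_LINE);
    RapidMiner.init();

    // loads the process from the repository (without a repository, construct the Process from a File instead)
    RepositoryLocation pLoc = new RepositoryLocation("//Local Repository/testProcess");
    ProcessEntry pEntry = (ProcessEntry) pLoc.locateEntry();
    String processXML = pEntry.retrieveXML();
    com.rapidminer.Process myProcess = new com.rapidminer.Process(processXML);
    myProcess.setProcessLocation(new RepositoryProcessLocation(pLoc));
    IOContainer ioResult = myProcess.run();

    // use the result(s) as needed, for example if your process just returns one ExampleSet, use
    // this:
    if (ioResult.getElementAt(0) instanceof ExampleSet) {
        ExampleSet resultSet = (ExampleSet) ioResult.getElementAt(0);
        // store the result in the repository so it can be inspected in the GUI
        RepositoryManager.getInstance(null).getRepository("Local Repository")
                .createIOObjectEntry("testResult", resultSet, null, null);
        System.out.println("Stored");
    }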
    Regards,
    Marco