RapidMiner

Frequently Asked Questions (Development)

Moderator

Hello and welcome to the Developer subforum! This forum is a place to ask Java code-related questions, for example if you want to integrate RapidMiner Studio into your own application and need to know specific details.
However, some questions come up again and again, so this thread aims to answer the most frequently asked ones. If your specific question is not covered here, please use the search function of this community portal. Also, please note that a certain knowledge of the Java programming language is required; this forum is not the place to take your first steps in Java.
If all of the above does not help, do not hesitate to open your own thread asking for advice!



1) Question: I want to integrate RapidMiner into my application! What do I have to do?
Answer: You need to add either the RapidMiner project or the RapidMiner library .jar files as a library to your Java project. The RapidMiner source code can be accessed via Git here.



2) Question: What else do I need to keep in mind when using RapidMiner Studio in my own application? Licensing?
Answer: The core of RapidMiner Studio is licensed under the AGPL 3. This means that your application must be licensed under the AGPL 3 as well!
If that is not possible/desirable or you need stuff from outside the free core, you need to purchase an OEM license.
Please contact us here for details.



3) Question: Is there any documentation available which may help me?
Answer: Yes, have a look here and here. A lot of information can be found there, especially in the "How to extend RapidMiner" whitepaper. Of course, searching the community portal may also help you solve your problem as you may not be the first person with a specific problem.



4) Question: Every time I try to use RM Studio, I get an exception like XMLException: Unknown operator class: 'process'! What is wrong?
Answer: You probably forgot to initialize RapidMiner via


RapidMiner.setExecutionMode(ExecutionMode.COMMAND_LINE);
RapidMiner.init();

Without initializing RapidMiner, it won't work.



5) Question: I want to create my own process and execute it via Java. What is the best way to do this?
Answer: The recommended way is to create the process(es) you need via the RapidMiner GUI, store them in your repository (or on the file system) and then open the processes via Java and execute them. That is much less error-prone and easier than trying to create a process programmatically.
Loading a process from a repository, executing it and using the results (IOObjects):


// this initializes RapidMiner with your repositories available
// you have to call this for all your programs, otherwise your RapidMiner integration will not work correctly
RapidMiner.setExecutionMode(ExecutionMode.COMMAND_LINE);
RapidMiner.init();

// loads the process from the repository (if you do not have one, see alternative below)
RepositoryLocation pLoc = new RepositoryLocation("//Local Repository/folder/as/needed/yourProcessName");
ProcessEntry pEntry = (ProcessEntry) pLoc.locateEntry();
String processXML = pEntry.retrieveXML();
Process myProcess = new Process(processXML);
myProcess.setProcessLocation(new RepositoryProcessLocation(pLoc));

// if need be, you can give the process IOObjects as parameter (this would be the case if you used the process input ports)
RepositoryLocation loc = new RepositoryLocation("//Local Repository/folder/as/needed/yourData");
IOObjectEntry entry = (IOObjectEntry) loc.locateEntry();
IOObject myIOObject = entry.retrieveData(null);

// execute the process and get the resulting objects
IOContainer ioInput = new IOContainer(new IOObject[] {myIOObject});
// just use myProcess.run() if you don't use the input ports for your process
IOContainer ioResult = myProcess.run(ioInput);

// use the result(s) as needed, for example if your process just returns one ExampleSet, use this:
if (ioResult.getElementAt(0) instanceof ExampleSet) {
   ExampleSet resultSet = (ExampleSet)ioResult.getElementAt(0);
}





6) Question: I want to change some parameters of the process before executing it. How do I do that?
Answer: Let's take the "Execute SQL" operator as an example (but of course it works for any operator):


// previous code that loads the process and sets the operator variable to the operator in question
operator.setParameter(SQLExecution.PARAMETER_QUERY, "SELECT * FROM mytable");


The first argument always specifies the parameter you want to edit, which is a String constant in the operator implementation class. To find the correct class you can use the OperatorsCore.xml file, which contains the mapping between RapidMiner operator names and the implementation classes. Note that the 'key' element is the name of each operator (whitespaces have been replaced with underscores). So if you are looking for the class which implements the "Execute SQL" operator, you'd search for "execute_sql" in said file. You will then find "<class>com.rapidminer.operator.SQLExecution</class>" which is the implementation class and contains constants for each parameter key.
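
The lookup described above can also be sketched in code. This is only a rough illustration, not part of RapidMiner: the class OperatorClassLookup, its methods and the sample XML are hypothetical, the enclosing <operator> element is an assumption about the file layout (only 'key' and 'class' are mentioned above), and lower-casing the name is inferred from the "execute_sql" example.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.Locale;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper, not part of the RapidMiner API.
final class OperatorClassLookup {

    // Heavily simplified, made-up excerpt; the real OperatorsCore.xml contains more elements.
    static final String SAMPLE_XML =
            "<operators>"
            + "<operator><key>execute_sql</key>"
            + "<class>com.rapidminer.operator.SQLExecution</class></operator>"
            + "</operators>";

    // Turns an operator name into its key: "Execute SQL" -> "execute_sql".
    // Lower-casing is an assumption based on the example in the text.
    static String toKey(String operatorName) {
        return operatorName.trim().toLowerCase(Locale.ROOT).replaceAll("\\s+", "_");
    }

    // Returns the text of <class> for the <operator> whose <key> matches, or null if none does.
    static String findClass(String xml, String key) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            NodeList operators = doc.getElementsByTagName("operator");
            for (int i = 0; i < operators.getLength(); i++) {
                Element op = (Element) operators.item(i);
                NodeList keys = op.getElementsByTagName("key");
                NodeList classes = op.getElementsByTagName("class");
                if (keys.getLength() > 0 && classes.getLength() > 0
                        && key.equals(keys.item(0).getTextContent().trim())) {
                    return classes.item(0).getTextContent().trim();
                }
            }
            return null;
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse operator XML", e);
        }
    }

    public static void main(String[] args) {
        String key = toKey("Execute SQL");
        System.out.println(key + " -> " + findClass(SAMPLE_XML, key));
    }
}
```

Searching the file in a text editor works just as well; the sketch only makes the name-to-key-to-class chain explicit.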

If you want to set a more complex parameter like a list (where in the GUI you have a table-like structure with multiple columns), you can use the following method:


List<String[]> list = new LinkedList<>();
// each array must have as many entries as the parameter has columns in the GUI
// (backslashes in Java string literals must be escaped)
list.add(new String[] { "Positive", "C:\\users\\username\\documents\\positive" });
operator.setListParameter(FileDocumentInputOperator.PARAMETER_TEXTS, list);





7) Question: I want to develop my own operators. Where do I start?
Answer: First, have a look at the documentation here, especially the "How to extend RapidMiner" whitepaper. You can also check the 'OperatorsCore.xml' file, which tells you the Java class for each operator in RapidMiner. You can then see how it is done for RM operators and create your own operators the same way.
Also check out the example extension which you can build upon, which can be found here. Or you can use the template extension which you can convert to a fully fledged extension via a simple Gradle command here!



8) Question: I want to create a new ExampleSet from scratch. How do I do that?
Answer: It's actually quite easy, have a look at the following code:

List<Attribute> listOfAtts = new LinkedList<>();
// you can create any attribute type here, see Ontology class for more information
Attribute newNumericAtt = AttributeFactory.createAttribute("Numerical Att", Ontology.REAL);
listOfAtts.add(newNumericAtt);
Attribute newNominalAtt = AttributeFactory.createAttribute("Nominal Att", Ontology.POLYNOMINAL);
listOfAtts.add(newNominalAtt);

// ExampleSets provides access to a builder which can be used to create an ExampleSet
ExampleSetBuilder builder = ExampleSets.from(listOfAtts);

// every row is a double array internally; create and fill in data
double[] doubleArray = new double[listOfAtts.size()];
// numerical values are easy, just set them directly
doubleArray[0] = 42;
// nominal values need to be mapped and the mapped index set as the value
doubleArray[1] = newNominalAtt.getMapping().mapString("hello");

// just add our double array as data to the builder
builder.addRow(doubleArray);

// finally create the ExampleSet from the builder
ExampleSet exSet = builder.build();
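
If you wonder why a nominal value ends up as a double in the row: the idea can be illustrated with a small stand-alone sketch. NominalMappingSketch is a hypothetical stand-in written for this FAQ, not the RapidMiner mapping class; it only assumes that each new string gets the next free index and that this index is what is stored in the double array.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in that mimics the idea of a nominal mapping; not RapidMiner code.
final class NominalMappingSketch {

    private final Map<String, Integer> indexByValue = new HashMap<>();
    private final List<String> valueByIndex = new ArrayList<>();

    // Returns the index of the given string, assigning the next free index on first use.
    int mapString(String value) {
        Integer existing = indexByValue.get(value);
        if (existing != null) {
            return existing;
        }
        int index = valueByIndex.size();
        indexByValue.put(value, index);
        valueByIndex.add(value);
        return index;
    }

    // Returns the string that belongs to the given index.
    String mapIndex(int index) {
        return valueByIndex.get(index);
    }

    public static void main(String[] args) {
        NominalMappingSketch mapping = new NominalMappingSketch();
        double[] row = new double[2];
        row[0] = 42;                         // numerical value, stored directly
        row[1] = mapping.mapString("hello"); // nominal value, stored as its index
        System.out.println(row[1] + " stands for " + mapping.mapIndex((int) row[1]));
    }
}
```

The same string always maps to the same index, so rows stay compact and the strings themselves are stored only once.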





9) Question: What else is there to know about the results of a RapidMiner Studio process?
Answer: The result of a RapidMiner Studio process is an IOContainer. It can contain any number of results (including none at all), depending on how many connections lead to the process output ports. Essentially, you get as many results in the IOContainer as there are connected process result ports (the ones at the top right corner of a process in the RapidMiner Studio GUI). Caveat: If one or more of these connections delivers a null result, it is omitted from the result IOContainer by default. This behavior can be changed by calling "process.setOmitNullResults(false);".
Iterate over all results of a process like so:


IOContainer container = process.run();
for (int i = 0; i < container.size(); i++) {
    IOObject ioObject = container.getElementAt(i);
    // do something
}



All results contained in an IOContainer implement the interface "IOObject". You can find out what they really are by checking for the types you expect (instanceof) or by calling ioObject.getClass(). The most common result is an ExampleSet (which is the main data class of RapidMiner Studio). However, you can also get a Model, a FileObject, or many others. To see what is available in Eclipse, select the IOObject interface and press Ctrl+T.
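
If you only care about results of one particular type, a small generic helper can do the checking for you. The following is a plain Java sketch, not RapidMiner API: ResultsByType and ofType are hypothetical names, and it assumes you have already copied the results from the container into a List (for example with the loop above).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper; works on any list of objects, not only on IOObjects.
final class ResultsByType {

    // Returns all elements that are instances of the given type, in their original order.
    static <T> List<T> ofType(List<?> results, Class<T> type) {
        List<T> matches = new ArrayList<>();
        for (Object result : results) {
            if (type.isInstance(result)) { // false for null entries
                matches.add(type.cast(result));
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        List<Object> mixed = Arrays.asList("a", 1, 2.5, "b");
        System.out.println(ofType(mixed, String.class));
    }
}
```

With RapidMiner on the classpath you would pass something like ExampleSet.class as the second argument instead of String.class.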

Extensions can define their own IOObject implementations. For example, the Text Extension adds things like com.rapidminer.operator.text.Document or com.rapidminer.operator.text.WordList to the mix. To use them programmatically, you need to have the text extension sources available in your IDE. See my previous reply for a link. Once you have these sources available, you can work with the results and use them like you would for RapidMiner core IOObjects.

You can specify the repository location where process results for each port should be stored automatically. To access these programmatically, you can do this:


ProcessContext context = process.getContext();
for (String loc : context.getOutputRepositoryLocations()) {
    // do something
}


To set them, call


ProcessContext context = process.getContext();
context.setOutputRepositoryLocation(index, location);
// or
List<String> outputRepositoryLocations = new LinkedList<>(); // add one repository location per output port
context.setOutputRepositoryLocations(outputRepositoryLocations);





10) Question: How do I access specific parts of an ExampleSet?
Answer: You can iterate over each example (think row) in an example set (think table). You can then get the value (think cell) for specific attributes (think column) of this example.
There are three different cases to take care of:

The attribute is a numeric attribute:


Attribute attr = exampleSet.getAttributes().get("myAttr");
for (Example ex : exampleSet) {
    if (attr.isNumeric()) {
        double value = ex.getValue(attr);
    }
}



The attribute is a nominal attribute:


Attribute attr = exampleSet.getAttributes().get("myAttr");
for (Example ex : exampleSet) {
    if (attr.isNominal()) {
        String value = ex.getNominalValue(attr);
    }
}



The attribute is a datetime attribute:


Attribute attr = exampleSet.getAttributes().get("myAttr");
for (Example ex : exampleSet) {
    if (attr.isDateTime()) {
        double value = ex.getValue(attr);
        String result;
        if (Ontology.ATTRIBUTE_VALUE_TYPE.isA(attr.getValueType(), Ontology.DATE)) {
            result = com.rapidminer.tools.Tools.formatDate(new Date((long) value));
        } else if (Ontology.ATTRIBUTE_VALUE_TYPE.isA(attr.getValueType(), Ontology.TIME)) {
            result = com.rapidminer.tools.Tools.formatTime(new Date((long) value));
        } else {
            result = com.rapidminer.tools.Tools.formatDateTime(new Date((long) value));
        }
    }
}
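
As the cast to long and the Date constructor above suggest, a datetime value is a number of milliseconds since 1970-01-01 00:00:00 UTC. If you would rather control the output format and time zone yourself, a sketch with java.time could look like this. DateValueSketch is a hypothetical helper, the pattern and the UTC zone are arbitrary choices, and treating NaN as a missing value is an assumption you should verify for your data.

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

// Hypothetical helper; formats a raw datetime value without using RapidMiner classes.
final class DateValueSketch {

    // Fixed pattern and UTC zone so the output does not depend on the machine settings.
    private static final DateTimeFormatter FORMAT =
            DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss").withZone(ZoneOffset.UTC);

    // Interprets the value as milliseconds since the epoch; returns "?" for NaN.
    static String format(double value) {
        if (Double.isNaN(value)) {
            return "?"; // assumption: NaN marks a missing value
        }
        return FORMAT.format(Instant.ofEpochMilli((long) value));
    }

    public static void main(String[] args) {
        System.out.println(format(0.0)); // prints 1970-01-01 00:00:00
    }
}
```

In the loop above you would call DateValueSketch.format(value) in place of the Tools.format... calls.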





Latest change: Adapted question 8 "ExampleSet creation" to use new builder introduced in RapidMiner Studio 7.3

_________________________________________________________
Team Lead Software Engineering | RapidMiner GmbH