How to change the number of fraction digits shown in the RapidMiner client

by RMStaff Thursday

In the RapidMiner client, navigate to Settings > Preferences > General > "Number Format" and change the number of fraction digits there, as shown below:

 

Preferences.png 

 

Inside a process, you would have to use a "Format Numbers" operator with a pattern of 0.00 (to go from 3 to 2 decimals), but this turns the attribute into a nominal.

Using a "Parse Numbers" operator afterwards turns the attribute back into a number, which resets the formatting.
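The effect of formatting and parsing can be sketched outside RapidMiner. The snippet below is a minimal Python illustration, not RapidMiner code: formatting a number with two fixed decimals yields text (the analogue of a nominal), and parsing that text gives a number again. The function names are hypothetical.

```python
def format_number(value: float, decimals: int = 2) -> str:
    """Format a number with a fixed count of fraction digits; the result is text."""
    return f"{value:.{decimals}f}"


def parse_number(text: str) -> float:
    """Turn the formatted text back into a number."""
    return float(text)


formatted = format_number(3.142)  # text, comparable to a nominal value
parsed = parse_number(formatted)  # numeric again, but the third decimal is gone
print(type(formatted).__name__, formatted)  # prints: str 3.14
print(type(parsed).__name__, parsed)        # prints: float 3.14
```

Note that the round trip does not restore the dropped digits; it only restores the numeric type.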

 

Format Numbers.png

How to Combine Models with Stacking

by RMStaff a week ago

At some point in your analysis you will want to boost your model performance. The first step is to go back to the feature generation phase and search for better attribute combinations for your learners.

As a next step, you might want to boost the performance of your machine learning method itself. A common approach for this is ensemble learning. In ensemble learning you build several different (base) learners, whose results are combined or chained in different ways. In this article, we focus on a technique called Stacking. Other approaches are Voting, Bagging and Boosting. These methods are also available in RapidMiner.

In Stacking you have at least two algorithms, called base learners. Each of them is trained just as it would be if you trained it on the data set on its own. Applying the base learners to the data set results in a data set containing your usual attributes plus the predictions of the base learners.

Afterwards you use another algorithm on this enriched data set, which uses the original attributes and the results of the previous learning step.

In essence, you use two algorithms to build an enriched data set so that a third algorithm can deliver better results.
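The enrichment step can be sketched in a few lines of Python. This is only an illustration of the idea, not how RapidMiner implements Stacking: the two "learners" are hypothetical hand-written rules standing in for trained models, and the column names follow the base_prediction0 / base_prediction1 naming used later in this article.

```python
from typing import Callable, Dict, List

Row = Dict[str, float]
Learner = Callable[[Row], float]


def enrich(rows: List[Row], base_learners: List[Learner]) -> List[Row]:
    """Append each base learner's prediction to a copy of every row."""
    enriched = []
    for row in rows:
        new_row = dict(row)  # keep the original attributes
        for i, learner in enumerate(base_learners):
            new_row[f"base_prediction{i}"] = learner(row)
        enriched.append(new_row)
    return enriched


# Two toy stand-ins for trained base learners (hypothetical rules).
def learner_a(row: Row) -> float:
    return 1.0 if row["att1"] > 0 else 0.0


def learner_b(row: Row) -> float:
    return 1.0 if row["att2"] > 0 else 0.0


data = [{"att1": -1.0, "att2": 2.0}, {"att1": 3.0, "att2": -4.0}]
stacked_input = enrich(data, [learner_a, learner_b])
print(stacked_input[0])
```

The third algorithm would then be trained on `stacked_input` instead of `data`.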

Problem and Learners

To illustrate this technique, we look at a problem called Checkerboard. The data has two attributes, att1 and att2, and is structured in square patches, each belonging to one group.
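For readers who want to experiment, a checkerboard-like data set can be generated as sketched below. The patch size (unit squares), the value range and the 0/1 label encoding are assumptions made for this sketch; the original data set used in the article is not reproduced here.

```python
import math
import random
from typing import List, Tuple


def checkerboard_label(att1: float, att2: float) -> int:
    """Return 0 or 1 depending on which unit square the point falls into."""
    return (math.floor(att1) + math.floor(att2)) % 2


def make_checkerboard(n: int, size: int = 4, seed: int = 0) -> List[Tuple[float, float, int]]:
    """Draw n uniform points from a size x size square centred on the origin."""
    rng = random.Random(seed)
    half = size / 2
    points = []
    for _ in range(n):
        a1 = rng.uniform(-half, half)
        a2 = rng.uniform(-half, half)
        points.append((a1, a2, checkerboard_label(a1, a2)))
    return points


sample = make_checkerboard(1000)
print(sample[:3])
```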

 

Data.png

 

 

Let’s try to solve this problem with a few learners and see what they can achieve. To see what each algorithm found, we apply the model to random data. Afterwards we draw a scatter plot with the prediction on the colour axis to investigate the decision boundaries. The results for each algorithm are depicted below.

Naïve Bayes: By design, Naïve Bayes can only model an n-dimensional ellipsoid. Because we are working in two dimensions, Naïve Bayes tries to find the most discriminating ellipse. As seen below, it puts one on the origin. This is the only pattern it can recognize.

k-NN: We also try k-NN with cosine similarity as the distance measure (with Euclidean distance, k-NN would solve the problem well on its own). With cosine similarity, k-NN can find angular regions as a pattern. The result is that k-NN finds a star-like pattern and recognizes that the corners are blue. It fails to recognize the central region as a blue area.

Decision Tree: A Decision Tree model fails to discriminate. The reason is that the decision tree looks at each dimension separately, but in each dimension the data is uniformly distributed. A Decision Tree finds no cut to solve this.

pics.png

 

 

 

Stacking

Now, let’s “stack” these algorithms together. We use k-NN and Naïve Bayes as base learners and a Decision Tree to combine the results.

 

 The decision tree will get the results of both base learners as well as the original attributes as input:

 dataset.png

Here base_prediction0 is the result of Naïve Bayes and base_prediction1 is the result of k-NN. The tree can thus pick regions where it trusts different algorithms. In those areas, the tree can even split into smaller regions. The resulting tree looks like this:

tree.png

Applied on random test data we get a result which is depicted below.

 

 

result.png

 

 

This is an impressive result. We take two learners which do not produce good results on their own, combine them with a learner which was not able to do anything on the data set, and get a good result.

Example Process for Reporting Extension

by RMStaff 2 weeks ago

Usually, you can use RapidMiner Server for creating web-based reports and even complete web applications. Other users prefer to deliver results to other data visualization products like Qlik or Tableau, which is also supported by RapidMiner.


But there is another simple way to generate visual outputs as a result of your processes.  This is done by using the Reporting extension for RapidMiner which is available here: https://marketplace.rapidminer.com/UpdateServer/faces/product_details.xhtml?productId=rmx_reporting

 

The Reporting extension does not require any part of the RapidMiner platform other than RapidMiner Studio. It also does not require any third-party products. It simply generates the elements of HTML pages or PDF reports while RapidMiner executes the different steps of a process. The output is very similar to the notebooks used by many data scientists.

 

Please download and install the extension first.

 

How to generate PDF / HTML / ... reports with a RapidMiner process?

 

The extension is simple to use. The first thing you need to do is "open" a new report and give it a name. This is done with the operator "Generate Report". In the settings of this operator you specify the name of the report (important: you will need to use the same report name for all other operators later on!). You can also select the type of the report (HTML, PDF etc.) and configure the look of the report.

Pro-Tip: You can even generate multiple reports in the same process by using different report names.

Then you can use different operators to add elements to your report while RapidMiner executes the process. Those operators are:

 

  • Add Section: adds a new section to the report.  This section gets a name as well as a level which basically defines the hierarchy of your reporting document.
  • Add Text: adds an arbitrary text to the report.
  • Add Pagebreak: adds a page break (for example in PDF reports).
  • Report: this is the key operator for reporting all dynamic content like data or models and will be explained below.

The operator "Report" takes an arbitrary input and turns it into a graphical representation. Just like for all other reporting operators, you need to specify the name of the report to which the visualization should be added. Then you can configure the object by clicking on the button "Configure Report..." in the settings of the operator.

Here you can specify the output type and what the output should look like, e.g. whether you want to export the data or a chart.

 

The process below is a working example which generates a PDF report based on the Iris data set.  Read here about how to import the XML description below.  Also make sure that you edit the filename in the "Generate Report" operator.

 

 

<?xml version="1.0" encoding="UTF-8"?><process version="7.4.000">
  <context>
    <input/>
    <output/>
    <macros/>
  </context>
  <operator activated="true" class="process" compatibility="7.4.000" expanded="true" name="Process">
    <process expanded="true">
      <operator activated="true" class="reporting:generate_report" compatibility="5.3.000" expanded="true" height="82" name="Generate Report" width="90" x="45" y="34">
        <parameter key="report_name" value="Report1"/>
        <parameter key="pdf_output_file" value="C:\Users\IngoMierswa\Desktop\Report1_Test.pdf"/>
      </operator>
      <operator activated="true" class="reporting:add_section" compatibility="5.3.000" expanded="true" height="82" name="Add Section" width="90" x="179" y="34">
        <parameter key="report_name" value="Report1"/>
        <parameter key="report_section_name" value="Data"/>
      </operator>
      <operator activated="true" class="retrieve" compatibility="7.4.000" expanded="true" height="68" name="Retrieve Iris" width="90" x="51" y="136">
        <parameter key="repository_entry" value="//Samples/data/Iris"/>
      </operator>
      <operator activated="true" class="reporting:report" compatibility="5.3.000" expanded="true" height="68" name="Report" width="90" x="185" y="136">
        <parameter key="report_name" value="Report1"/>
        <parameter key="report_item_header" value="Data"/>
        <parameter key="specified" value="true"/>
        <parameter key="reportable_type" value="Data Table"/>
        <parameter key="renderer_name" value="Data View"/>
        <list key="parameters">
          <parameter key="attribute_filter_type" value="all"/>
          <parameter key="use_except_expression" value="false"/>
          <parameter key="value_type" value="attribute_value"/>
          <parameter key="use_value_type_exception" value="false"/>
          <parameter key="except_value_type" value="time"/>
          <parameter key="block_type" value="attribute_block"/>
          <parameter key="use_block_type_exception" value="false"/>
          <parameter key="except_block_type" value="value_matrix_row_start"/>
          <parameter key="invert_selection" value="false"/>
          <parameter key="include_special_attributes" value="false"/>
          <parameter key="min_row" value="1"/>
          <parameter key="max_row" value="150"/>
        </list>
      </operator>
      <operator activated="true" class="reporting:add_section" compatibility="5.3.000" expanded="true" height="82" name="Add Section (3)" width="90" x="313" y="136">
        <parameter key="report_name" value="Report1"/>
        <parameter key="report_section_name" value="Class Distribution"/>
      </operator>
      <operator activated="true" class="reporting:report" compatibility="5.3.000" expanded="true" height="68" name="Report (3)" width="90" x="447" y="136">
        <parameter key="report_name" value="Report1"/>
        <parameter key="report_item_header" value="Class Distribution"/>
        <parameter key="specified" value="true"/>
        <parameter key="reportable_type" value="Data Table"/>
        <parameter key="renderer_name" value="Plot View"/>
        <list key="parameters">
          <parameter key="plotter" value="Pie"/>
          <parameter key="scatter_axis_x_axis_log_scale" value="false"/>
          <parameter key="scatter_axis_y_axis_log_scale" value="false"/>
          <parameter key="scatter_jitter_amount" value="0"/>
          <parameter key="scatter_rotate_labels" value="false"/>
          <parameter key="scatter_multiple_axis_x_axis_log_scale" value="false"/>
          <parameter key="scatter_multiple_jitter_amount" value="0"/>
          <parameter key="scatter_multiple_rotate_labels" value="false"/>
          <parameter key="scatter_matrix_jitter_amount" value="0"/>
          <parameter key="bubble_axis_x_axis_log_scale" value="false"/>
          <parameter key="bubble_rotate_labels" value="false"/>
          <parameter key="parallel_rotate_labels" value="false"/>
          <parameter key="parallel_local_normalization" value="false"/>
          <parameter key="series_rotate_labels" value="false"/>
          <parameter key="series_multiple_rotate_labels" value="false"/>
          <parameter key="som_jitter_amount" value="0"/>
          <parameter key="block_axis_x_axis_log_scale" value="false"/>
          <parameter key="block_axis_y_axis_log_scale" value="false"/>
          <parameter key="block_jitter_amount" value="0"/>
          <parameter key="block_rotate_labels" value="false"/>
          <parameter key="deviation_rotate_labels" value="false"/>
          <parameter key="deviation_local_normalization" value="false"/>
          <parameter key="histogram_absolute_values" value="false"/>
          <parameter key="histogram_rotate_labels" value="false"/>
          <parameter key="histogram_log_scale" value="false"/>
          <parameter key="histogram_number_of_bins" value="40"/>
          <parameter key="histogram_opaqueness" value="100"/>
          <parameter key="histogram_color_absolute_values" value="false"/>
          <parameter key="histogram_color_rotate_labels" value="false"/>
          <parameter key="histogram_color_log_scale" value="false"/>
          <parameter key="histogram_color_number_of_bins" value="40"/>
          <parameter key="histogram_color_opaqueness" value="100"/>
          <parameter key="bars_absolute_values" value="false"/>
          <parameter key="bars_rotate_labels" value="false"/>
          <parameter key="bars_aggregation" value="none"/>
          <parameter key="bars_use_distinct" value="false"/>
          <parameter key="bars_orientation" value="vertical"/>
          <parameter key="bars_stacked_absolute_values" value="false"/>
          <parameter key="bars_stacked_rotate_labels" value="false"/>
          <parameter key="bars_stacked_aggregation" value="none"/>
          <parameter key="bars_stacked_use_distinct" value="false"/>
          <parameter key="bars_stacked_orientation" value="vertical"/>
          <parameter key="pareto_rotate_labels" value="false"/>
          <parameter key="pareto_sorting_direction" value="Descending Keys"/>
          <parameter key="pareto_show_bar_labels" value="true"/>
          <parameter key="pareto_show_cumulative_labels" value="false"/>
          <parameter key="distribution_rotate_labels" value="false"/>
          <parameter key="web_absolute_values" value="false"/>
          <parameter key="web_rotate_labels" value="false"/>
          <parameter key="web_aggregation" value="none"/>
          <parameter key="web_use_distinct" value="false"/>
          <parameter key="pie_axis_group_by_column" value="label"/>
          <parameter key="pie_plot_column" value="label"/>
          <parameter key="pie_absolute_values" value="false"/>
          <parameter key="pie_aggregation" value="count"/>
          <parameter key="pie_use_distinct" value="false"/>
          <parameter key="pie_explosion_amount" value="0"/>
          <parameter key="pie_3d_absolute_values" value="false"/>
          <parameter key="pie_3d_aggregation" value="none"/>
          <parameter key="pie_3d_use_distinct" value="false"/>
          <parameter key="ring_absolute_values" value="false"/>
          <parameter key="ring_aggregation" value="none"/>
          <parameter key="ring_use_distinct" value="false"/>
          <parameter key="ring_explosion_amount" value="0"/>
        </list>
      </operator>
      <operator activated="true" class="reporting:add_text" compatibility="5.3.000" expanded="true" height="68" name="Add Text" width="90" x="45" y="238">
        <parameter key="report_name" value="Report1"/>
        <parameter key="report_text_header" value="End of Report"/>
        <parameter key="report_text" value="This is the end of this report."/>
      </operator>
      <connect from_op="Generate Report" from_port="through 1" to_op="Add Section" to_port="through 1"/>
      <connect from_op="Retrieve Iris" from_port="output" to_op="Report" to_port="reportable in"/>
      <connect from_op="Report" from_port="reportable out" to_op="Add Section (3)" to_port="through 1"/>
      <connect from_op="Add Section (3)" from_port="through 1" to_op="Report (3)" to_port="reportable in"/>
      <portSpacing port="source_input 1" spacing="0"/>
      <portSpacing port="sink_result 1" spacing="0"/>
    </process>
  </operator>
</process>

 

How can I share processes without RapidMiner Server?

by RMStaff 2 weeks ago

Sample formats for Date Parsing

by RMStaff 2 weeks ago - edited 2 weeks ago

RapidMiner's Nominal to Date operator provides a very powerful parser to deal with various date-time formats and automatic standardization to common time zones.

This document provides additional examples to work with.

 

Wed Mar 25 2015 09:56:24 UTC -0500

Wed Mar 25 2015 09:56:24 UTC +0000

can be parsed using

EEE MMM dd yyyy HH:mm:ss zzz X

 

 

Wed Mar 25 2015 09:56:24 EST

Wed Mar 25 2015 09:56:24 GMT

can be parsed using

EEE MMM dd yyyy HH:mm:ss zzzz

 

2001.07.04 AD at 12:08:56 PDT

can be parsed using

yyyy.MM.dd G 'at' HH:mm:ss z

 

010704120856-0700

can be parsed using

yyMMddHHmmssZ
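For comparison, the same compact timestamp can be parsed outside RapidMiner. The sketch below uses Python's `datetime.strptime`, whose directives differ from the pattern letters accepted by Nominal to Date, so the format string is a translation of the pattern and not the pattern itself.

```python
from datetime import datetime, timezone

# Translation of the pattern letters into Python directives:
# yy -> %y, MM -> %m, dd -> %d, HH -> %H, mm -> %M, ss -> %S, Z -> %z
parsed = datetime.strptime("010704120856-0700", "%y%m%d%H%M%S%z")
print(parsed.isoformat())  # prints: 2001-07-04T12:08:56-07:00

# Standardizing to a common time zone, here UTC.
as_utc = parsed.astimezone(timezone.utc)
print(as_utc.isoformat())  # prints: 2001-07-04T19:08:56+00:00
```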

 

Date and time formats are specified by date and time pattern strings. Within date and time pattern strings, unquoted letters from 'A' to 'Z' and from 'a' to 'z' are interpreted as pattern letters representing the components of a date or time string. Text can be quoted using single quotes (') to avoid interpretation. "''" represents a single quote. All other characters are not interpreted; they're simply copied into the output string during formatting or matched against the input string during parsing.

The following pattern letters are defined (all other characters from 'A' to 'Z' and from 'a' to 'z' are reserved):

 

Letter | Date or Time Component | Presentation | Examples
G | Era designator | Text | AD
y | Year | Year | 1996; 96
Y | Week year | Year | 2009; 09
M | Month in year | Month | July; Jul; 07
w | Week in year | Number | 27
W | Week in month | Number | 2
D | Day in year | Number | 189
d | Day in month | Number | 10
F | Day of week in month | Number | 2
E | Day name in week | Text | Tuesday; Tue
u | Day number of week (1 = Monday, ..., 7 = Sunday) | Number | 1
a | Am/pm marker | Text | PM
H | Hour in day (0-23) | Number | 0
k | Hour in day (1-24) | Number | 24
K | Hour in am/pm (0-11) | Number | 0
h | Hour in am/pm (1-12) | Number | 12
m | Minute in hour | Number | 30
s | Second in minute | Number | 55
S | Millisecond | Number | 978
z | Time zone | General time zone | Pacific Standard Time; PST; GMT-08:00
Z | Time zone | RFC 822 time zone | -0800
X | Time zone | ISO 8601 time zone | -08; -0800; -08:00

From: https://docs.oracle.com/javase/7/docs/api/java/text/SimpleDateFormat.html

 

A Practical Guide to Gradient Boosted Trees - Part I, Regression

by RMStaff ‎02-09-2017 10:43 AM - edited ‎02-13-2017 07:36 AM

Gradient Boosted Trees are one of the most powerful algorithms for classification and regression. They have all the advantages of trees, including the ability to handle nominal data well. The downside of any powerful, multivariate method is its complexity. This is the first of a series of articles drilling down into the algorithm and explaining how it works.

 

The Setting

In this tutorial we investigate how a Gradient Boosted Tree performs a regression. As training data we use a sqrt-function between 0 and 1.

 

 

To make the algorithm easier to grasp, we use a few simplifications. We do a regression for a label variable (y) with only one independent variable (x).

We use a GBT with a depth of 1 and a learning rate of 1. We will discuss these options in more depth in later articles.

 

Initialization

 

Before starting with the real algorithm we set a baseline prediction. The whole algorithm consists of many steps. The resulting function f(x), which approximates the sqrt-function, is the sum of the results of the individual steps. The first step is slightly different: we set the baseline prediction f0 to the average label. In our data this is 0.667.
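The baseline value can be checked with a few lines of Python. The sampling of the training data is an assumption here (1000 evenly spaced points between 0 and 1), since the article does not state how its data was generated; the average of sqrt(x) on that range is close to 2/3 either way.

```python
import math

# Assumed training data: y = sqrt(x) at the midpoints of 1000 equal intervals on [0, 1].
n = 1000
xs = [(i + 0.5) / n for i in range(n)]
ys = [math.sqrt(x) for x in xs]

# Baseline prediction f0: the average label.
f0 = sum(ys) / len(ys)
print(round(f0, 3))  # prints: 0.667
```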

 

Now we can move to the first iteration.

 

Step 1 - Calculate Errors

 

As a first step we calculate the errors. In our case these are the residuals between our previous prediction f(x) and the truth y. In our first iteration we have f(x) = f0 = 0.667.

We define r = y - f(x) as the error for each individual example. It is crucial to note that the tree in each iteration is not predicting the label, but r! Since this only shifts the values by a constant, the residuals still follow the shape of our sqrt-function:

Step 2 - Tree Building

We now build a regression tree predicting r. Since we limit our tree to a depth of one we get:

 

 

This tree results in two leaves. We call these leaves R11 and R12, representing the first and the second leaf in the first iteration. For each of the two leaves we need to find the best prediction.

 

Step 3 - Prediction Definition

In our regression problem the calculation of what we predict in the leaves is fairly straightforward. We want to find the prediction which minimizes the difference between our function and the truth. In our case this is the average of r in each leaf. We define the prediction in the i-th iteration and the j-th leaf as gij and get for our data:

 

g11 = -0.254

g12 = +0.157
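Steps 1 to 3 of the first iteration can be reproduced with a small sketch. It assumes the same kind of training data as above (sqrt(x) sampled on an even grid, which the article does not specify) and uses a hand-written depth-1 regression tree, `fit_stump`, which is a hypothetical helper and not the RapidMiner implementation. On this grid the split and leaf values come out close to the numbers quoted in the article.

```python
import math
from typing import List, Tuple


def fit_stump(xs: List[float], rs: List[float]) -> Tuple[float, float, float]:
    """Fit a depth-1 regression tree: return (threshold, left_mean, right_mean).

    xs must be sorted ascending. Minimizing the squared error is equivalent to
    maximizing left_sum^2 / n_left + right_sum^2 / n_right over all splits.
    """
    n = len(xs)
    total = sum(rs)
    best_score, best = -1.0, (xs[0], 0.0, 0.0)
    left_sum = 0.0
    for i in range(1, n):  # the first i examples go to the left leaf
        left_sum += rs[i - 1]
        right_sum = total - left_sum
        score = left_sum ** 2 / i + right_sum ** 2 / (n - i)
        if score > best_score:
            threshold = (xs[i - 1] + xs[i]) / 2
            best_score = score
            best = (threshold, left_sum / i, right_sum / (n - i))
    return best


n = 1000
xs = [(i + 0.5) / n for i in range(n)]   # assumed sampling of x
ys = [math.sqrt(x) for x in xs]          # label
f0 = sum(ys) / n                         # baseline prediction
residuals = [y - f0 for y in ys]         # Step 1: r = y - f(x)
split, g11, g12 = fit_stump(xs, residuals)  # Steps 2 and 3
print(f"split={split:.3f} g11={g11:.3f} g12={g12:.3f}")
```

The leaf predictions are simply the mean residual in each leaf, which is the value that minimizes the squared error there.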

 

 

Step 4 - Update of our Function

 

 We can now update our approximated function to estimate y, called f(x). Let's define the result of our first iteration f1:

f1 = -0.254 if x < 0.382, else +0.157

And f(x) = f0 + f1. This results in a total function which approximates our initial unscaled sqrt like this:

Step 1 - Part 2, Build residuals again

We can now do Step 1 again. But this time we build the residuals between our new function approximation f(x) = f0 + f1 and y. So we get a picture like this:

In this iteration of the algorithm we try to fit this function of residuals to get a better overall function approximation.

Step 2 and 3 - Part 2, Build the tree and calculate a prediction

Again we build a tree which splits our data into two parts. If you look at the picture above, we search for the point on the x-axis where we can draw two lines parallel to the x-axis, one on each side, which have a minimum distance to the residuals.

The tree tells us that the split is at 0.107 and that the two parallel lines g21 and g22 are at -0.193 and +0.023.


Step 4 - Updating the function

The last step is to update our function. It is now:

f(x) = f0 + f1 + f2

or written out in full:

f(x) = 0.667
  + (-0.254 if x < 0.382, else +0.157)
  + (-0.193 if x < 0.107, else +0.023)

 

 Or in a drawn form it looks like this:

 

 Note that the second tree also raises the right-hand side of the function slightly.

 

With this knowledge we can now move on and start with Step 1 again. That's it.
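The whole loop (baseline, then repeated residuals, tree building, leaf prediction and function update) can be sketched as follows. As before, the sampling of the training data is an assumption and `fit_stump` is a hypothetical hand-written depth-1 tree, so this is an illustration of the procedure described in this article rather than the actual GBT implementation.

```python
import math
from typing import List, Tuple

Stump = Tuple[float, float, float]  # (threshold, left value, right value)


def fit_stump(xs: List[float], rs: List[float]) -> Stump:
    """Depth-1 regression tree on sorted xs, minimizing the squared error."""
    n, total = len(xs), sum(rs)
    best_score, best = -1.0, (xs[0], 0.0, 0.0)
    left_sum = 0.0
    for i in range(1, n):
        left_sum += rs[i - 1]
        right_sum = total - left_sum
        score = left_sum ** 2 / i + right_sum ** 2 / (n - i)
        if score > best_score:
            best_score = score
            best = ((xs[i - 1] + xs[i]) / 2, left_sum / i, right_sum / (n - i))
    return best


def boost(xs: List[float], ys: List[float], iterations: int,
          learning_rate: float = 1.0) -> Tuple[float, List[Stump]]:
    """Return the baseline f0 and one stump per iteration."""
    f0 = sum(ys) / len(ys)                                    # initialization
    preds = [f0] * len(xs)
    stumps = []
    for _ in range(iterations):
        residuals = [y - p for y, p in zip(ys, preds)]        # Step 1
        t, g_left, g_right = fit_stump(xs, residuals)         # Steps 2 and 3
        stumps.append((t, g_left, g_right))
        preds = [p + learning_rate * (g_left if x < t else g_right)
                 for x, p in zip(xs, preds)]                  # Step 4
    return f0, stumps


def predict(x: float, f0: float, stumps: List[Stump], learning_rate: float = 1.0) -> float:
    """Evaluate f(x) = f0 + f1 + f2 + ... for a single x."""
    return f0 + sum(learning_rate * (gl if x < t else gr) for t, gl, gr in stumps)


n = 1000
xs = [(i + 0.5) / n for i in range(n)]  # assumed sampling of x
ys = [math.sqrt(x) for x in xs]
f0, stumps = boost(xs, ys, iterations=2)
for t, gl, gr in stumps:
    print(f"split={t:.3f} left={gl:+.3f} right={gr:+.3f}")
print(f"f(0.05)={predict(0.05, f0, stumps):.3f}  sqrt(0.05)={math.sqrt(0.05):.3f}")
```

Raising `iterations` adds more steps to the function and moves it closer to the sqrt curve on the training data.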

 

 

 

 

 

Naming

  • y is the value of the label, i.e. the true value you want to predict.
  • x is the independent variable, i.e. one (or many) regular attributes.
  • f(x) is the derived formula to predict y, i.e. the model.
  • Rij is the j-th leaf in the i-th iteration of the algorithm.
  • gij is the constant added to our function for all examples that fall into the j-th leaf in the i-th iteration. Note: often called gamma in the literature.

How to run RapidMiner Studio without installing it locally or without admin rights

by RMStaff on ‎02-02-2017 03:03 PM

1. Go to https://my.rapidminer.com/nexus/account/index.html#downloads and download RapidMiner Studio for Linux. It will be a rapidminer-studio-7.X.X.zip file.

2. Unzip the downloaded file (for example rapidminer-studio-7.3.1.zip) to a convenient location on your system. It will create a folder named rapidminer-studio.

3. Open the rapidminer-studio folder.

4. Double-click RapidMiner-Studio.bat. This is a batch file (.bat) that will perform the necessary operations and start RapidMiner Studio.

 

P.S. You can also download rapidminer-studio-7.3.1-win64-install.exe, right-click it, extract it with 7-Zip or a similar tool as above, and follow the same steps. However, some security policies may not allow you to download .exe files.

 

Caveat: The Java environment variables must be set correctly on your system or this will not work. If you don't have Java installed or those variables set, you will need admin privileges to install it and set the variables. You can find instructions for this on the internet; here is one example:

 

Start cmd as admin.

Type: setx JAVA_HOME "C:\Program Files\Java\jre1.8.0_121" (use your own version instead of 1.8.0_121).

You should get: SUCCESS: Specified value was saved.
Then open a new cmd window, because setx only affects windows opened afterwards, and type: setx PATH "%PATH%;%JAVA_HOME%\bin"
Again you should get: SUCCESS: Specified value was saved.
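To check whether the Java settings are visible to newly started programs, a short script can help. The sketch below is only a convenience check written in Python (an assumption of this example; any language would do) and is not part of RapidMiner: it reports the JAVA_HOME variable and whether a java executable can be found on the PATH.

```python
import os
import shutil
from typing import Dict, Optional


def java_environment() -> Dict[str, Optional[str]]:
    """Report JAVA_HOME and the java executable found on PATH, if any."""
    return {
        "JAVA_HOME": os.environ.get("JAVA_HOME"),
        "java_on_path": shutil.which("java"),
    }


for name, value in java_environment().items():
    print(f"{name}: {value if value else 'not set / not found'}")
```

Run it from a freshly opened command window so that it sees the values saved by setx.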

Indico Text and Image Analysis

by on ‎01-09-2017 06:18 AM - edited on ‎01-25-2017 07:14 AM by Community Manager

 

This is the second of several articles to help people use external APIs from within RapidMiner.  Here I will show how to access the Indico APIs (indico.io), a large collection of tools for text and image analysis:

 

Text Analysis: Text Input Format, Sentiment, Sentiment HQ, Text Tags, Language Predictor, Political Analysis, Keywords, People, Places, Organizations, Twitter Engagement, Personality, Personas, Text Features, Relevance, Emotion, Intersections, Analyze Text, and Sentence Splitting

 

Image Analysis:  Image Input Format, Facial Emotion Recognition, Image Features, Facial Features, Facial Localization, Content Filtering, Image Recognition, and Analyze Image

 

I will show below an example of one text analysis API, Text Tags, and one image analysis, Facial Emotion Recognition, and you should be able to adapt easily to any of the others.  The full API documentation is here: https://indico.io/docs

 

INDICO TEXT TAGS

 

Here I am going to show how to use the Indico.io API “Text Tags” to take text and extract the likelihood that the text contains one or more of 111 possible topics (tags). You can of course change this to whatever you want.  I then add a short RapidMiner process to reduce this down to the top three tags.

 

1. You will need to create a free Indico account to get an API key.  You do this on https://indico.io/  The key should look like a long string of alphanumeric characters.  Keep this key secure as it is the way Indico authenticates and allocates the billing.  As of Dec 2016, Indico’s “Pay-as-you-Go” account allows up to 10,000 free API calls per month.  After that it is $0.006 per call up to 250,000 calls, and so forth (see https://indico.io/dashboard/plans for more info on pricing).

 

2. If you have not already done so, download the Web Mining extension in RapidMiner Studio.

 

3. Build a process that sends a text attribute (called “text”) to the Enrich Data by Webservice operator (found in the Web Mining extension) and then connect it to the results.  I have included a sample process below if you want to use mine as a starting point (you will need to insert your own API key).

4.  The only hard part here (and the only thing that changes from API to API) is how you set up the “Enrich Data by Webservice” operator.  This is very similar to the Google Cloud API set-up (see previous post) but with the following changes:

 

query type: JSON path

attribute type: Numerical

JSONpath queries:

Anime $..Anime

Anthropology $..Anthropology

etc… 

 [There are 111 of these tags if you want all of them.  If you grab the XML from the sample process, you can save yourself a lot of work typing them all in manually.]

 

Request method: POST

Body: {"data":"<%text%>"}

URL: https://apiv2.indico.io/texttags
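To make the settings above more concrete, here is a sketch of the equivalent HTTP request built with Python's standard library. The URL, request method and body layout come from the settings above, and the X-ApiKey header name from the sample process; the Content-Type header and the function name are assumptions of this sketch. The request is only constructed, not sent.

```python
import json
import urllib.request


def build_text_tags_request(text: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) a POST request for the Text Tags endpoint."""
    body = json.dumps({"data": text}).encode("utf-8")
    return urllib.request.Request(
        "https://apiv2.indico.io/texttags",
        data=body,
        headers={
            "X-ApiKey": api_key,                 # header name taken from the sample process
            "Content-Type": "application/json",  # assumption
        },
        method="POST",
    )


request = build_text_tags_request("Some text to tag.", "YOUR_API_KEY")
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))
# Sending it would be: urllib.request.urlopen(request)  (not executed here)
```

Building the body with `json.dumps` also escapes quotes inside the text, which a plain string template does not.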

 

That’s it.  Results should look like this:

 

 

INDICO IMAGE FACIAL EMOTION RECOGNITION

 

Here I am going to show how to use the Indico.io API “Facial Emotion Recognition” to take an image [containing a human face] and extract the likelihood that the image contains one or more of the six possible emotions: happy, sad, angry, fear, surprise, neutral. You can of course change this to whatever you want.

 

1. You will need to create a free Indico account to get an API key and get the Web Mining Extension (see above).

  

2. Build a process that sends an image URL text attribute (called “URL”) to the Enrich Data by Webservice operator.

 

3.  Parameters for Enrich Data by Webservice:

 

query type: JSON path

attribute type: Numerical

JSONpath queries:

Happy $..happy

Sad $..sad

Angry $..angry

Fear $..fear

Surprise $..surprise

Neutral $..neutral

 

Request method: POST

Body: {"data":"<%URL%>"}

URL: https://apiv2.indico.io/fer
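A JSONpath query such as $..happy means "find the key happy at any depth of the response". The sketch below mimics that lookup in plain Python and then picks the most likely emotion. The response shown is hypothetical and only serves to demonstrate the lookup; the real payload layout and values may differ.

```python
from typing import Any, List


def find_all(node: Any, key: str) -> List[Any]:
    """Collect every value stored under `key` at any depth, like `$..key`."""
    found = []
    if isinstance(node, dict):
        for k, v in node.items():
            if k == key:
                found.append(v)
            found.extend(find_all(v, key))
    elif isinstance(node, list):
        for item in node:
            found.extend(find_all(item, key))
    return found


# Hypothetical response, for illustration only.
response = {"results": {"happy": 0.7, "sad": 0.05, "angry": 0.02,
                        "fear": 0.03, "surprise": 0.1, "neutral": 0.1}}

emotions = ["happy", "sad", "angry", "fear", "surprise", "neutral"]
scores = {name: find_all(response, name)[0] for name in emotions}
top = max(scores, key=scores.get)
print(top, scores[top])  # prints: happy 0.7
```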

 

 

That’s it.  If you use this image (https://pbs.twimg.com/profile_images/796243884636512260/zHVoWqKV.jpg), you should see these results:

 

 

<?xml version="1.0" encoding="UTF-8"?><process version="7.4.000-BETA">
  <context>
    <input/>
    <output/>
    <macros/>
  </context>
  <operator activated="true" class="process" compatibility="7.4.000-BETA" expanded="true" name="Process">
    <process expanded="true">
      <operator activated="false" class="subprocess" compatibility="7.4.000-BETA" expanded="true" height="82" name="Subprocess" width="90" x="112" y="34">
        <process expanded="true">
          <operator activated="true" class="text:create_document" compatibility="7.3.000" expanded="true" height="68" name="Create Document" width="90" x="45" y="34">
            <parameter key="text" value="Democratic candidate Hillary Clinton is excited for the upcoming election."/>
          </operator>
          <operator activated="true" class="text:documents_to_data" compatibility="7.3.000" expanded="true" height="82" name="Documents to Data" width="90" x="179" y="34">
            <parameter key="text_attribute" value="text"/>
            <parameter key="add_meta_information" value="false"/>
          </operator>
          <operator activated="true" class="web:enrich_data_by_webservice" compatibility="7.3.000" expanded="true" height="68" name="Enrich Data by Webservice" width="90" x="313" y="34">
            <parameter key="query_type" value="JsonPath"/>
            <list key="string_machting_queries"/>
            <list key="regular_expression_queries"/>
            <list key="regular_region_queries"/>
            <list key="xpath_queries"/>
            <list key="namespaces"/>
            <list key="index_queries"/>
            <list key="jsonpath_queries">
              <parameter key="Libertarian" value="$..Libertarian"/>
              <parameter key="Green" value="$..Green"/>
              <parameter key="Liberal" value="$..Liberal"/>
              <parameter key="Conservative" value="$..Conservative"/>
            </list>
            <parameter key="request_method" value="POST"/>
            <parameter key="service_method" value="foo"/>
            <parameter key="body" value="{&quot;data&quot;:&quot;&lt;%text%&gt;&quot;}"/>
            <parameter key="url" value="https://apiv2.indico.io/political"/>
            <list key="request_properties">
              <parameter key="X-ApiKey" value="foo"/>
            </list>
          </operator>
          <operator activated="true" class="parse_numbers" compatibility="7.4.000-BETA" expanded="true" height="82" name="Parse Numbers (2)" width="90" x="447" y="34">
            <parameter key="attribute_filter_type" value="regular_expression"/>
            <parameter key="regular_expression" value="[A-Z].*"/>
          </operator>
          <operator activated="true" class="subprocess" compatibility="7.4.000-BETA" expanded="true" height="82" name="Subprocess (4)" width="90" x="581" y="34">
            <process expanded="true">
              <operator activated="true" class="de_pivot" compatibility="7.4.000-BETA" expanded="true" height="82" name="De-Pivot (2)" width="90" x="45" y="34">
                <list key="attribute_name">
                  <parameter key="Probability" value="[A-Z].*"/>
                </list>
                <parameter key="index_attribute" value="Politics"/>
                <parameter key="create_nominal_index" value="true"/>
              </operator>
              <operator activated="true" class="sort" compatibility="7.4.000-BETA" expanded="true" height="82" name="Sort (2)" width="90" x="179" y="34">
                <parameter key="attribute_name" value="Probability"/>
                <parameter key="sorting_direction" value="decreasing"/>
              </operator>
              <operator activated="true" class="filter_example_range" compatibility="7.4.000-BETA" expanded="true" height="82" name="Filter Example Range (2)" width="90" x="313" y="34">
                <parameter key="first_example" value="1"/>
                <parameter key="last_example" value="1"/>
              </operator>
              <connect from_port="in 1" to_op="De-Pivot (2)" to_port="example set input"/>
              <connect from_op="De-Pivot (2)" from_port="example set output" to_op="Sort (2)" to_port="example set input"/>
              <connect from_op="Sort (2)" from_port="example set output" to_op="Filter Example Range (2)" to_port="example set input"/>
              <connect from_op="Filter Example Range (2)" from_port="example set output" to_port="out 1"/>
              <portSpacing port="source_in 1" spacing="0"/>
              <portSpacing port="source_in 2" spacing="0"/>
              <portSpacing port="sink_out 1" spacing="0"/>
              <portSpacing port="sink_out 2" spacing="0"/>
            </process>
            <description align="center" color="transparent" colored="false" width="126">choose highest probability</description>
          </operator>
          <connect from_op="Create Document" from_port="output" to_op="Documents to Data" to_port="documents 1"/>
          <connect from_op="Documents to Data" from_port="example set" to_op="Enrich Data by Webservice" to_port="Example Set"/>
          <connect from_op="Enrich Data by Webservice" from_port="ExampleSet" to_op="Parse Numbers (2)" to_port="example set input"/>
          <connect from_op="Parse Numbers (2)" from_port="example set output" to_op="Subprocess (4)" to_port="in 1"/>
          <connect from_op="Subprocess (4)" from_port="out 1" to_port="out 1"/>
          <portSpacing port="source_in 1" spacing="0"/>
          <portSpacing port="sink_out 1" spacing="0"/>
          <portSpacing port="sink_out 2" spacing="0"/>
        </process>
        <description align="center" color="transparent" colored="false" width="126">Politics</description>
      </operator>
      <operator activated="false" class="subprocess" compatibility="7.4.000-BETA" expanded="true" height="82" name="Subprocess (2)" width="90" x="112" y="187">
        <process expanded="true">
          <operator activated="false" class="text:create_document" compatibility="7.3.000" expanded="true" height="68" name="Create Document (3)" width="90" x="45" y="34">
            <parameter key="text" value="Je m'appelle Scott."/>
          </operator>
          <operator activated="false" class="text:documents_to_data" compatibility="7.3.000" expanded="true" height="103" name="Documents to Data (2)" width="90" x="179" y="34">
            <parameter key="text_attribute" value="text"/>
            <parameter key="add_meta_information" value="false"/>
          </operator>
          <operator activated="false" class="web:enrich_data_by_webservice" compatibility="7.3.000" expanded="true" height="68" name="Enrich Data by Webservice (2)" width="90" x="313" y="34">
            <parameter key="query_type" value="JsonPath"/>
            <list key="string_machting_queries"/>
            <list key="regular_expression_queries">
              <parameter key="foo" value=".*"/>
            </list>
            <list key="regular_region_queries"/>
            <list key="xpath_queries"/>
            <list key="namespaces"/>
            <list key="index_queries"/>
            <list key="jsonpath_queries">
              <parameter key="Spanish" value="$..Spanish"/>
              <parameter key="French" value="$..French"/>
              <parameter key="English" value="$..English"/>
              <parameter key="Portuguese" value="$..Portuguese"/>
            </list>
            <parameter key="request_method" value="POST"/>
            <parameter key="service_method" value="foo"/>
            <parameter key="body" value="{&quot;data&quot;:&quot;&lt;%text%&gt;&quot;}"/>
            <parameter key="url" value="https://apiv2.indico.io/language"/>
            <list key="request_properties">
              <parameter key="X-ApiKey" value="foo"/>
            </list>
          </operator>
          <operator activated="false" class="parse_numbers" compatibility="7.4.000-BETA" expanded="true" height="82" name="Parse Numbers" width="90" x="447" y="34">
            <parameter key="attribute_filter_type" value="regular_expression"/>
            <parameter key="regular_expression" value="[A-Z].*"/>
          </operator>
          <operator activated="false" class="subprocess" compatibility="7.4.000-BETA" expanded="true" height="82" name="Subprocess (3)" width="90" x="581" y="34">
            <process expanded="true">
              <operator activated="true" class="de_pivot" compatibility="7.4.000-BETA" expanded="true" height="82" name="De-Pivot" width="90" x="45" y="34">
                <list key="attribute_name">
                  <parameter key="Probability" value="[A-Z].*"/>
                </list>
                <parameter key="index_attribute" value="Language"/>
                <parameter key="create_nominal_index" value="true"/>
              </operator>
              <operator activated="true" class="sort" compatibility="7.4.000-BETA" expanded="true" height="82" name="Sort" width="90" x="179" y="34">
                <parameter key="attribute_name" value="Probability"/>
                <parameter key="sorting_direction" value="decreasing"/>
              </operator>
              <operator activated="true" class="filter_example_range" compatibility="7.4.000-BETA" expanded="true" height="82" name="Filter Example Range" width="90" x="313" y="34">
                <parameter key="first_example" value="1"/>
                <parameter key="last_example" value="1"/>
              </operator>
              <connect from_port="in 1" to_op="De-Pivot" to_port="example set input"/>
              <connect from_op="De-Pivot" from_port="example set output" to_op="Sort" to_port="example set input"/>
              <connect from_op="Sort" from_port="example set output" to_op="Filter Example Range" to_port="example set input"/>
              <connect from_op="Filter Example Range" from_port="example set output" to_port="out 1"/>
              <portSpacing port="source_in 1" spacing="0"/>
              <portSpacing port="source_in 2" spacing="0"/>
              <portSpacing port="sink_out 1" spacing="0"/>
              <portSpacing port="sink_out 2" spacing="0"/>
            </process>
            <description align="center" color="transparent" colored="false" width="126">choose highest probability</description>
          </operator>
          <connect from_op="Create Document (3)" from_port="output" to_op="Documents to Data (2)" to_port="documents 1"/>
          <connect from_op="Documents to Data (2)" from_port="example set" to_op="Enrich Data by Webservice (2)" to_port="Example Set"/>
          <connect from_op="Enrich Data by Webservice (2)" from_port="ExampleSet" to_op="Parse Numbers" to_port="example set input"/>
          <connect from_op="Parse Numbers" from_port="example set output" to_op="Subprocess (3)" to_port="in 1"/>
          <connect from_op="Subprocess (3)" from_port="out 1" to_port="out 1"/>
          <portSpacing port="source_in 1" spacing="0"/>
          <portSpacing port="sink_out 1" spacing="0"/>
          <portSpacing port="sink_out 2" spacing="0"/>
        </process>
        <description align="center" color="transparent" colored="false" width="126">Language Detection</description>
      </operator>
      <operator activated="false" class="subprocess" compatibility="7.4.000-BETA" expanded="true" height="82" name="Subprocess (5)" width="90" x="246" y="34">
        <process expanded="true">
          <operator activated="true" class="text:create_document" compatibility="7.3.000" expanded="true" height="68" name="Create Document (2)" width="90" x="45" y="34">
            <parameter key="text" value="Democratic candidate Hillary Clinton is excited for the upcoming election."/>
          </operator>
          <operator activated="true" class="text:documents_to_data" compatibility="7.3.000" expanded="true" height="82" name="Documents to Data (3)" width="90" x="179" y="34">
            <parameter key="text_attribute" value="text"/>
            <parameter key="add_meta_information" value="false"/>
          </operator>
          <operator activated="true" class="web:enrich_data_by_webservice" compatibility="7.3.000" expanded="true" height="68" name="Enrich Data by Webservice (3)" width="90" x="313" y="34">
            <parameter key="query_type" value="JsonPath"/>
            <list key="string_machting_queries"/>
            <list key="regular_expression_queries">
              <parameter key="foo" value=".*"/>
            </list>
            <list key="regular_region_queries"/>
            <list key="xpath_queries"/>
            <list key="namespaces"/>
            <list key="index_queries"/>
            <list key="jsonpath_queries">
              <parameter key="SentimentScore" value="$..results"/>
            </list>
            <parameter key="request_method" value="POST"/>
            <parameter key="service_method" value="foo"/>
            <parameter key="body" value="{&quot;data&quot;:&quot;&lt;%text%&gt;&quot;}"/>
            <parameter key="url" value="https://apiv2.indico.io/sentiment"/>
            <list key="request_properties">
              <parameter key="X-ApiKey" value="foo"/>
            </list>
          </operator>
          <operator activated="true" class="parse_numbers" compatibility="7.4.000-BETA" expanded="true" height="82" name="Parse Numbers (3)" width="90" x="447" y="34">
            <parameter key="attribute_filter_type" value="single"/>
            <parameter key="attribute" value="SentimentScore"/>
            <parameter key="regular_expression" value="[A-Z].*"/>
          </operator>
          <operator activated="true" class="generate_attributes" compatibility="7.4.000-BETA" expanded="true" height="82" name="Generate Attributes" width="90" x="581" y="34">
            <list key="function_descriptions">
              <parameter key="Sentiment" value="if(SentimentScore&gt;0.67,&quot;Positive&quot;,&#10;if(SentimentScore&lt;0.33,&quot;Negative&quot;,&quot;Neutral&quot;))"/>
            </list>
            <description align="center" color="transparent" colored="false" width="126">Sentiment</description>
          </operator>
          <connect from_op="Create Document (2)" from_port="output" to_op="Documents to Data (3)" to_port="documents 1"/>
          <connect from_op="Documents to Data (3)" from_port="example set" to_op="Enrich Data by Webservice (3)" to_port="Example Set"/>
          <connect from_op="Enrich Data by Webservice (3)" from_port="ExampleSet" to_op="Parse Numbers (3)" to_port="example set input"/>
          <connect from_op="Parse Numbers (3)" from_port="example set output" to_op="Generate Attributes" to_port="example set input"/>
          <connect from_op="Generate Attributes" from_port="example set output" to_port="out 1"/>
          <portSpacing port="source_in 1" spacing="0"/>
          <portSpacing port="sink_out 1" spacing="0"/>
          <portSpacing port="sink_out 2" spacing="0"/>
        </process>
        <description align="center" color="transparent" colored="false" width="126">Sentiment</description>
      </operator>
      <operator activated="false" class="subprocess" compatibility="7.4.000-BETA" expanded="true" height="82" name="Subprocess (6)" width="90" x="380" y="34">
        <process expanded="true">
          <operator activated="true" class="text:create_document" compatibility="7.3.000" expanded="true" height="68" name="Create Document (4)" width="90" x="45" y="34">
            <parameter key="text" value="Democratic candidate Hillary Clinton is excited for the upcoming election."/>
          </operator>
          <operator activated="true" class="text:documents_to_data" compatibility="7.3.000" expanded="true" height="82" name="Documents to Data (4)" width="90" x="179" y="34">
            <parameter key="text_attribute" value="text"/>
            <parameter key="add_meta_information" value="false"/>
          </operator>
          <operator activated="true" class="web:enrich_data_by_webservice" compatibility="7.3.000" expanded="true" height="68" name="Enrich Data by Webservice (4)" width="90" x="313" y="34">
            <parameter key="query_type" value="JsonPath"/>
            <list key="string_machting_queries"/>
            <list key="regular_expression_queries">
              <parameter key="foo" value=".*"/>
            </list>
            <list key="regular_region_queries"/>
            <list key="xpath_queries"/>
            <list key="namespaces"/>
            <list key="index_queries"/>
            <list key="jsonpath_queries">
              <parameter key="SentimentScore" value="$..results"/>
            </list>
            <parameter key="request_method" value="POST"/>
            <parameter key="service_method" value="foo"/>
            <parameter key="body" value="{&quot;data&quot;:&quot;&lt;%text%&gt;&quot;}"/>
            <parameter key="url" value="https://apiv2.indico.io/sentimenthq"/>
            <list key="request_properties">
              <parameter key="X-ApiKey" value="foo"/>
            </list>
          </operator>
          <operator activated="true" class="parse_numbers" compatibility="7.4.000-BETA" expanded="true" height="82" name="Parse Numbers (4)" width="90" x="447" y="34">
            <parameter key="attribute_filter_type" value="single"/>
            <parameter key="attribute" value="SentimentScore"/>
            <parameter key="regular_expression" value="[A-Z].*"/>
          </operator>
          <operator activated="true" class="generate_attributes" compatibility="7.4.000-BETA" expanded="true" height="82" name="Generate Attributes (2)" width="90" x="581" y="34">
            <list key="function_descriptions">
              <parameter key="Sentiment" value="if(SentimentScore&gt;0.67,&quot;Positive&quot;,&#10;if(SentimentScore&lt;0.33,&quot;Negative&quot;,&quot;Neutral&quot;))"/>
            </list>
            <description align="center" color="transparent" colored="false" width="126">Sentiment</description>
          </operator>
          <connect from_op="Create Document (4)" from_port="output" to_op="Documents to Data (4)" to_port="documents 1"/>
          <connect from_op="Documents to Data (4)" from_port="example set" to_op="Enrich Data by Webservice (4)" to_port="Example Set"/>
          <connect from_op="Enrich Data by Webservice (4)" from_port="ExampleSet" to_op="Parse Numbers (4)" to_port="example set input"/>
          <connect from_op="Parse Numbers (4)" from_port="example set output" to_op="Generate Attributes (2)" to_port="example set input"/>
          <connect from_op="Generate Attributes (2)" from_port="example set output" to_port="out 1"/>
          <portSpacing port="source_in 1" spacing="0"/>
          <portSpacing port="sink_out 1" spacing="0"/>
          <portSpacing port="sink_out 2" spacing="0"/>
        </process>
        <description align="center" color="transparent" colored="false" width="126">Sentiment High Quality</description>
      </operator>
      <operator activated="false" class="subprocess" compatibility="7.4.000-BETA" expanded="true" height="82" name="Subprocess (7)" width="90" x="246" y="187">
        <process expanded="true">
          <operator activated="true" class="text:create_document" compatibility="7.3.000" expanded="true" height="68" name="Create Document (5)" width="90" x="45" y="34">
            <parameter key="text" value="Democratic candidate Hillary Clinton is excited for the upcoming election."/>
          </operator>
          <operator activated="true" class="text:documents_to_data" compatibility="7.3.000" expanded="true" height="82" name="Documents to Data (5)" width="90" x="179" y="34">
            <parameter key="text_attribute" value="text"/>
            <parameter key="add_meta_information" value="false"/>
          </operator>
          <operator activated="true" class="web:enrich_data_by_webservice" compatibility="7.3.000" expanded="true" height="68" name="Indico API Text Tags" width="90" x="313" y="34">
            <parameter key="query_type" value="JsonPath"/>
            <list key="string_machting_queries"/>
            <parameter key="attribute_type" value="Numerical"/>
            <list key="regular_expression_queries"/>
            <list key="regular_region_queries"/>
            <list key="xpath_queries"/>
            <list key="namespaces"/>
            <list key="index_queries"/>
            <list key="jsonpath_queries">
              <parameter key="Anime" value="$..anime"/>
              <parameter key="Anthropology" value="$..anthropology"/>
              <parameter key="Archery" value="$..archery"/>
              <parameter key="Architecture" value="$..architecture"/>
              <parameter key="Art" value="$..art"/>
              <parameter key="Astronomy" value="$..astronomy"/>
              <parameter key="Atheism" value="$..atheism"/>
              <parameter key="Aviation" value="$..aviation"/>
              <parameter key="Baseball" value="$..baseball"/>
              <parameter key="Beer" value="$..beer"/>
              <parameter key="Bicycling" value="$..bicycling"/>
              <parameter key="Biology" value="$..biology"/>
              <parameter key="Books" value="$..books"/>
              <parameter key="Boxing" value="$..boxing"/>
              <parameter key="Buddhism" value="$..buddhism"/>
              <parameter key="Business" value="$..business"/>
              <parameter key="Cars" value="$..cars"/>
              <parameter key="Christianity" value="$..christianity"/>
              <parameter key="Climbing" value="$..climbing"/>
              <parameter key="Comedy" value="$..comedy"/>
              <parameter key="Comics" value="$..comics"/>
              <parameter key="Conspiracy" value="$..conspiracy"/>
              <parameter key="Cooking" value="$..cooking"/>
              <parameter key="Crafts" value="$..crafts"/>
              <parameter key="Cricket" value="$..cricket"/>
              <parameter key="Design" value="$..design"/>
              <parameter key="Dieting" value="$..dieting"/>
              <parameter key="Diy" value="$..diy"/>
              <parameter key="Drugs" value="$..drugs"/>
              <parameter key="Economic_Discussion" value="$..economic_discussion"/>
              <parameter key="Education" value="$..education"/>
              <parameter key="Electronics" value="$..electronics"/>
              <parameter key="Energy" value="$..energy"/>
              <parameter key="Entertainment_News" value="$..entertainment_news"/>
              <parameter key="Environmental" value="$..environmental"/>
              <parameter key="Fashion" value="$..fashion"/>
              <parameter key="Fiction" value="$..fiction"/>
              <parameter key="Film" value="$..film"/>
              <parameter key="Fishing" value="$..fishing"/>
              <parameter key="Fitness" value="$..fitness"/>
              <parameter key="Gaming" value="$..gaming"/>
              <parameter key="Gardening" value="$..gardening"/>
              <parameter key="Gender_Issues" value="$..gender_issues"/>
              <parameter key="General_Food" value="$..general_food"/>
              <parameter key="Golf" value="$..golf"/>
              <parameter key="Guns" value="$..guns"/>
              <parameter key="Health" value="$..health"/>
              <parameter key="History" value="$..history"/>
              <parameter key="Hockey" value="$..hockey"/>
              <parameter key="Hunting" value="$..hunting"/>
              <parameter key="Individualist_Politics" value="$..individualist_politics"/>
              <parameter key="Investment" value="$..investment"/>
              <parameter key="Islam" value="$..islam"/>
              <parameter key="Jobs" value="$..jobs"/>
              <parameter key="Judaism" value="$..judaism"/>
              <parameter key="Left_Politics" value="$..left_politics"/>
              <parameter key="Lgbt" value="$..lgbt"/>
              <parameter key="Math" value="$..math"/>
              <parameter key="Medicine" value="$..medicine"/>
              <parameter key="Military" value="$..military"/>
              <parameter key="Music" value="$..music"/>
              <parameter key="Nba" value="$..nba"/>
              <parameter key="News" value="$..news"/>
              <parameter key="Nfl" value="$..nfl"/>
              <parameter key="Nostalgia" value="$..nostalgia"/>
              <parameter key="Nutrition" value="$..nutrition"/>
              <parameter key="Parenting" value="$..parenting"/>
              <parameter key="Personal" value="$..personal"/>
              <parameter key="Personal_Care_And_Beauty" value="$..personal_care_and_beauty"/>
              <parameter key="Personalfinance" value="$..personalfinance"/>
              <parameter key="Pets" value="$..pets"/>
              <parameter key="Philosophy" value="$..philosophy"/>
              <parameter key="Photography" value="$..photography"/>
              <parameter key="Poetry" value="$..poetry"/>
              <parameter key="Poker" value="$..poker"/>
              <parameter key="Political_Discussion" value="$..political_discussion"/>
              <parameter key="Programming" value="$..programming"/>
              <parameter key="Psychology" value="$..psychology"/>
              <parameter key="Realestate" value="$..realestate"/>
              <parameter key="Relationships" value="$..relationships"/>
              <parameter key="Religion" value="$..religion"/>
              <parameter key="Right_Politics" value="$..right_politics"/>
              <parameter key="Romance" value="$..romance"/>
              <parameter key="Rugby" value="$..rugby"/>
              <parameter key="Running" value="$..running"/>
              <parameter key="Sailing" value="$..sailing"/>
              <parameter key="School" value="$..school"/>
              <parameter key="Science" value="$..science"/>
              <parameter key="Scuba" value="$..scuba"/>
              <parameter key="Singing" value="$..singing"/>
              <parameter key="Skateboarding" value="$..skateboarding"/>
              <parameter key="Soccer" value="$..soccer"/>
              <parameter key="Sports" value="$..sports"/>
              <parameter key="Startups_And_Entrepreneurship" value="$..startups_and_entrepreneurship"/>
              <parameter key="Surfing" value="$..surfing"/>
              <parameter key="Swimming" value="$..swimming"/>
              <parameter key="Tattoo" value="$..tattoo"/>
              <parameter key="Technology" value="$..technology"/>
              <parameter key="Television" value="$..television"/>
              <parameter key="Tennis" value="$..tennis"/>
              <parameter key="Travel" value="$..travel"/>
              <parameter key="Ultimate" value="$..ultimate"/>
              <parameter key="Vegan" value="$..vegan"/>
              <parameter key="Vegetarian" value="$..vegetarian"/>
              <parameter key="Weather" value="$..weather"/>
              <parameter key="Wedding" value="$..wedding"/>
              <parameter key="Weight_Training" value="$..weight_training"/>
              <parameter key="Wine" value="$..wine"/>
              <parameter key="Wrestling" value="$..wrestling"/>
              <parameter key="Writing" value="$..writing"/>
              <parameter key="Yoga" value="$..yoga"/>
            </list>
            <parameter key="request_method" value="POST"/>
            <parameter key="service_method" value="foo"/>
            <parameter key="body" value="{&quot;data&quot;:&quot;&lt;%text%&gt;&quot;}"/>
            <parameter key="url" value="https://apiv2.indico.io/texttags"/>
            <list key="request_properties">
              <parameter key="X-ApiKey" value="foo"/>
            </list>
          </operator>
          <operator activated="true" class="subprocess" compatibility="7.4.000-BETA" expanded="true" height="82" name="Subprocess (8)" width="90" x="447" y="34">
            <process expanded="true">
              <operator activated="true" class="de_pivot" compatibility="7.4.000-BETA" expanded="true" height="82" name="De-Pivot (3)" width="90" x="45" y="34">
                <list key="attribute_name">
                  <parameter key="Probability" value="[A-Z].*"/>
                </list>
                <parameter key="index_attribute" value="TextTag"/>
                <parameter key="create_nominal_index" value="true"/>
              </operator>
              <operator activated="true" class="sort" compatibility="7.4.000-BETA" expanded="true" height="82" name="Sort (3)" width="90" x="179" y="34">
                <parameter key="attribute_name" value="Probability"/>
                <parameter key="sorting_direction" value="decreasing"/>
              </operator>
              <operator activated="true" class="filter_example_range" compatibility="7.4.000-BETA" expanded="true" height="82" name="Filter Example Range (3)" width="90" x="313" y="34">
                <parameter key="first_example" value="1"/>
                <parameter key="last_example" value="3"/>
              </operator>
              <operator activated="true" class="pivot" compatibility="7.4.000-BETA" expanded="true" height="82" name="Pivot" width="90" x="447" y="34">
                <parameter key="group_attribute" value="text"/>
                <parameter key="index_attribute" value="TextTag"/>
                <parameter key="consider_weights" value="false"/>
                <parameter key="skip_constant_attributes" value="false"/>
              </operator>
              <operator activated="true" class="rename_by_replacing" compatibility="7.4.000-BETA" expanded="true" height="82" name="Rename by Replacing" width="90" x="581" y="34">
                <parameter key="attribute_filter_type" value="regular_expression"/>
                <parameter key="regular_expression" value="Prob.*"/>
                <parameter key="replace_what" value="Probability_"/>
              </operator>
              <connect from_port="in 1" to_op="De-Pivot (3)" to_port="example set input"/>
              <connect from_op="De-Pivot (3)" from_port="example set output" to_op="Sort (3)" to_port="example set input"/>
              <connect from_op="Sort (3)" from_port="example set output" to_op="Filter Example Range (3)" to_port="example set input"/>
              <connect from_op="Filter Example Range (3)" from_port="example set output" to_op="Pivot" to_port="example set input"/>
              <connect from_op="Pivot" from_port="example set output" to_op="Rename by Replacing" to_port="example set input"/>
              <connect from_op="Rename by Replacing" from_port="example set output" to_port="out 1"/>
              <portSpacing port="source_in 1" spacing="0"/>
              <portSpacing port="source_in 2" spacing="0"/>
              <portSpacing port="sink_out 1" spacing="0"/>
              <portSpacing port="sink_out 2" spacing="0"/>
            </process>
            <description align="center" color="transparent" colored="false" width="126">choose the 3 highest probabilities</description>
          </operator>
          <connect from_op="Create Document (5)" from_port="output" to_op="Documents to Data (5)" to_port="documents 1"/>
          <connect from_op="Documents to Data (5)" from_port="example set" to_op="Indico API Text Tags" to_port="Example Set"/>
          <connect from_op="Indico API Text Tags" from_port="ExampleSet" to_op="Subprocess (8)" to_port="in 1"/>
          <connect from_op="Subprocess (8)" from_port="out 1" to_port="out 1"/>
          <portSpacing port="source_in 1" spacing="0"/>
          <portSpacing port="sink_out 1" spacing="0"/>
          <portSpacing port="sink_out 2" spacing="0"/>
        </process>
        <description align="center" color="transparent" colored="false" width="126">Text Tags</description>
      </operator>
      <operator activated="false" class="subprocess" compatibility="7.4.000-BETA" expanded="true" height="82" name="Subprocess (9)" width="90" x="380" y="187">
        <process expanded="true">
          <operator activated="true" class="text:create_document" compatibility="7.3.000" expanded="true" height="68" name="Create Document (6)" width="90" x="45" y="34">
            <parameter key="text" value="Democratic candidate Hillary Clinton is excited for the upcoming election."/>
          </operator>
          <operator activated="true" class="text:documents_to_data" compatibility="7.3.000" expanded="true" height="82" name="Documents to Data (6)" width="90" x="179" y="34">
            <parameter key="text_attribute" value="text"/>
            <parameter key="add_meta_information" value="false"/>
          </operator>
          <operator activated="true" class="web:enrich_data_by_webservice" compatibility="7.3.000" expanded="true" height="68" name="FOOFOOFOO (2)" width="90" x="313" y="34">
            <parameter key="query_type" value="Regular Expression"/>
            <list key="string_machting_queries"/>
            <list key="regular_expression_queries">
              <parameter key="foo" value=".*"/>
            </list>
            <list key="regular_region_queries"/>
            <list key="xpath_queries"/>
            <list key="namespaces"/>
            <list key="index_queries"/>
            <list key="jsonpath_queries">
              <parameter key="Result" value="$..results"/>
            </list>
            <parameter key="request_method" value="POST"/>
            <parameter key="service_method" value="foo"/>
            <parameter key="body" value="{&quot;data&quot;:&quot;&lt;%text%&gt;&quot;,&quot;threshold&quot;:0.1,&quot;top_n&quot;:2}"/>
            <parameter key="url" value="https://apiv2.indico.io/keywords"/>
            <list key="request_properties">
              <parameter key="X-ApiKey" value="foo"/>
            </list>
          </operator>
          <operator activated="true" class="multiply" compatibility="7.4.000-BETA" expanded="true" height="103" name="Multiply" width="90" x="447" y="34"/>
          <operator activated="true" class="generate_id" compatibility="7.4.000-BETA" expanded="true" height="82" name="Generate ID (2)" width="90" x="581" y="136"/>
          <operator activated="true" class="text:data_to_documents" compatibility="7.3.000" expanded="true" height="68" name="Data to Documents" width="90" x="581" y="34">
            <parameter key="select_attributes_and_weights" value="true"/>
            <list key="specify_weights">
              <parameter key="foo" value="1.0"/>
            </list>
          </operator>
          <operator activated="true" class="text:json_to_data" compatibility="7.3.000" expanded="true" height="82" name="JSON To Data" width="90" x="715" y="34"/>
          <operator activated="true" class="rename_by_replacing" compatibility="7.4.000-BETA" expanded="true" height="82" name="Rename by Replacing (2)" width="90" x="849" y="34">
            <parameter key="replace_what" value="results."/>
          </operator>
          <operator activated="true" class="generate_id" compatibility="7.4.000-BETA" expanded="true" height="82" name="Generate ID" width="90" x="983" y="34"/>
          <operator activated="true" class="join" compatibility="7.4.000-BETA" expanded="true" height="82" name="Join" width="90" x="1117" y="34">
            <list key="key_attributes"/>
          </operator>
          <operator activated="true" class="select_attributes" compatibility="7.4.000-BETA" expanded="true" height="82" name="Select Attributes" width="90" x="1251" y="34">
            <parameter key="attribute_filter_type" value="subset"/>
            <parameter key="attributes" value="foo|id"/>
            <parameter key="invert_selection" value="true"/>
            <parameter key="include_special_attributes" value="true"/>
          </operator>
          <connect from_op="Create Document (6)" from_port="output" to_op="Documents to Data (6)" to_port="documents 1"/>
          <connect from_op="Documents to Data (6)" from_port="example set" to_op="FOOFOOFOO (2)" to_port="Example Set"/>
          <connect from_op="FOOFOOFOO (2)" from_port="ExampleSet" to_op="Multiply" to_port="input"/>
          <connect from_op="Multiply" from_port="output 1" to_op="Data to Documents" to_port="example set"/>
          <connect from_op="Multiply" from_port="output 2" to_op="Generate ID (2)" to_port="example set input"/>
          <connect from_op="Generate ID (2)" from_port="example set output" to_op="Join" to_port="right"/>
          <connect from_op="Data to Documents" from_port="documents" to_op="JSON To Data" to_port="documents 1"/>
          <connect from_op="JSON To Data" from_port="example set" to_op="Rename by Replacing (2)" to_port="example set input"/>
          <connect from_op="Rename by Replacing (2)" from_port="example set output" to_op="Generate ID" to_port="example set input"/>
          <connect from_op="Generate ID" from_port="example set output" to_op="Join" to_port="left"/>
          <connect from_op="Join" from_port="join" to_op="Select Attributes" to_port="example set input"/>
          <connect from_op="Select Attributes" from_port="example set output" to_port="out 1"/>
          <portSpacing port="source_in 1" spacing="0"/>
          <portSpacing port="sink_out 1" spacing="0"/>
          <portSpacing port="sink_out 2" spacing="0"/>
        </process>
        <description align="center" color="transparent" colored="false" width="126">Keywords</description>
      </operator>
      <operator activated="true" class="subprocess" compatibility="7.4.000-BETA" expanded="true" height="82" name="Subprocess (10)" width="90" x="112" y="340">
        <process expanded="true">
          <operator activated="true" class="text:create_document" compatibility="7.3.000" expanded="true" height="68" name="Create Document (7)" width="90" x="45" y="34">
            <parameter key="text" value="https://pbs.twimg.com/profile_images/796243884636512260/zHVoWqKV.jpg"/>
          </operator>
          <operator activated="true" class="text:documents_to_data" compatibility="7.3.000" expanded="true" height="82" name="Documents to Data (7)" width="90" x="179" y="34">
            <parameter key="text_attribute" value="URL"/>
            <parameter key="add_meta_information" value="false"/>
          </operator>
          <operator activated="true" class="web:enrich_data_by_webservice" compatibility="7.3.000" expanded="true" height="68" name="Enrich Data by Webservice (5)" width="90" x="313" y="34">
            <parameter key="query_type" value="JsonPath"/>
            <list key="string_machting_queries"/>
            <list key="regular_expression_queries">
              <parameter key="foo" value=".*"/>
            </list>
            <list key="regular_region_queries"/>
            <list key="xpath_queries"/>
            <list key="namespaces"/>
            <list key="index_queries"/>
            <list key="jsonpath_queries">
              <parameter key="Happy" value="$..Happy"/>
              <parameter key="Sad" value="$..Sad"/>
              <parameter key="Angry" value="$..Angry"/>
              <parameter key="Fear" value="$..Fear"/>
              <parameter key="Surprise" value="$..Surprise"/>
              <parameter key="Neutral" value="$..Neutral"/>
            </list>
            <parameter key="request_method" value="POST"/>
            <parameter key="service_method" value="foo"/>
            <parameter key="body" value="{&quot;data&quot;:&quot;&lt;%URL%&gt;&quot;}"/>
            <parameter key="url" value="https://apiv2.indico.io/fer"/>
            <list key="request_properties">
              <parameter key="X-ApiKey" value="foo"/>
            </list>
          </operator>
          <connect from_op="Create Document (7)" from_port="output" to_op="Documents to Data (7)" to_port="documents 1"/>
          <connect from_op="Documents to Data (7)" from_port="example set" to_op="Enrich Data by Webservice (5)" to_port="Example Set"/>
          <connect from_op="Enrich Data by Webservice (5)" from_port="ExampleSet" to_port="out 1"/>
          <portSpacing port="source_in 1" spacing="0"/>
          <portSpacing port="sink_out 1" spacing="0"/>
          <portSpacing port="sink_out 2" spacing="0"/>
        </process>
        <description align="center" color="transparent" colored="false" width="126">Image Facial Emotion Recognition</description>
      </operator>
      <connect from_op="Subprocess (10)" from_port="out 1" to_port="result 1"/>
      <portSpacing port="source_input 1" spacing="0"/>
      <portSpacing port="sink_result 1" spacing="0"/>
      <portSpacing port="sink_result 2" spacing="0"/>
    </process>
  </operator>
</process>

 

How to Build a Dictionary Based Sentiment Model in RapidMiner

by RMStaff on ‎01-23-2017 07:02 AM - edited on ‎01-23-2017 10:13 AM by Community Manager

When you want to extract sentiment from a text, you usually have three options:

  1. Use a prelearned model like the sentiment tools from Aylien and Rosette
  2. Use a supervised learning method on annotated texts to build your own sentiment scorer
  3. Use a predefined dictionary where each word has a weight

Every method has its pros and cons. In this post we will focus on #3 and build a model. For English you can use WordNet, which has its own extension. For German you can use SentiWS.

This post describes a generic way to implement a custom dictionary based scoring.

 

In this example we assume that you have a dictionary with two columns:

 

Word    Weight
good    1.0
bad     -1.5

 

 

Here a negative weight means a negative sentiment. From this table we would like to build a scoring function like this:

 

score = 1.0 * good - 1.5 * bad

As we can see, this is a simple linear equation. We can use a simple linear regression to achieve our result. To do so we need to prepare the table. First of all we need to invert all weights; this can be done using a Generate Attributes operator.

 

The next step is to bring the table into a form like this

good    bad
1.0        0
0            -1.5

This is in principle a task for the Pivot operator. We combine this with a Generate ID operator to get a unique group key and with Rename by Replacing to get the correct naming conventions. A Replace Missing Values operator allows us to replace all missing values with zeros.

 

 

 

 The next step is to generate a label attribute. For this task we use a Generate Attributes and a Set Role operator. The resulting example set looks like this.

On this example set we learn a Vector Linear Regression to get a model with our desired equation.

 

 

This model can be used on texts. These texts can be transformed into the right shape using Process Documents (from Data), Tokenize and Transform Cases. An example is shown in the attached process.
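The scoring idea can also be sketched outside of RapidMiner. The following Python snippet is only an illustration, assuming term occurrences as word vectors; the names DICTIONARY, tokenize and score are made up for this example and are not part of the attached process.

```python
import re

# Hypothetical dictionary mirroring the Word/Weight table of this article.
DICTIONARY = {"good": 1.0, "bad": -1.5}


def tokenize(text):
    """Lower-case the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def score(text, dictionary=DICTIONARY):
    """Sum the weight of every token that appears in the dictionary."""
    return sum(dictionary.get(token, 0.0) for token in tokenize(text))


# "good" occurs twice and "bad" once: 1.0 + 1.0 - 1.5
print(score("Good food, bad service, good price"))
```

Words that are not in the dictionary contribute nothing, which corresponds to the zeros in the pivoted table.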

R Output Logs Knowledge Base

by RMStaff on ‎01-19-2017 10:06 AM

R Output Logs

 

The Execute R operator allows the user to publish results from the implemented R code to the RapidMiner Studio Log. Here are the few easy steps required to do so.

 

Step 1: Open the Log view.

 

Navigate to the menu bar and go to View. Next select Show Panel and then Log. This will open a new panel for the log in Studio. You can now move the log around just like all of the default views.

 

Step 2: Execute R

 

Place an Execute R operator onto the process window and connect some data to its input port. We will use the default code as an example in this case. Note the lines:

 

print('Hello, world!')
# output can be found in Log View
print(str(data))

 

The print function publishes its output to the Studio Log. The two objects we printed from our R code now show up in the Log.

 

 

How to Use Annotations to Attach Performances and Timestamps to Models

by RMStaff on ‎01-17-2017 08:32 AM

While storing an object like a Model or an ExampleSet you might also want to store additional information about it. This might include the source of the file, a timestamp or the predictive performance of a model. Doing this in a separate file might be unwieldy. RapidMiner offers a functionality called Annotation to add this information directly to the object.

 

Adding Timestamps to Models

A common use case for annotations is to timestamp models. This can in principle also be done using a timestamp in the filename, but having a comment with the timestamp is often easier.

The operator to create an Annotation is called Annotate. The timestamp can be added by using the predefined macro %{t}.

 

To view this you can go into the Results view. Every object in the Results view has a tab called Annotation, where you can see your new comment.

 

 

If you would like to have a different time stamp format you can create one with the Generate Macro operator. To get the annotation as an ExampleSet you can use the Annotation To Data operator.

 

Adding Performances To Models

 

Another connected use case is to add the performance of a model to the model itself. That can also be done using the Annotate Operator. The major difference is that we need to prepare our performance value first. In fact we need to store it in a macro. This can be done using a Performance to Data in combination with Extract Macro.

 

 

 

To add more than one performance measure you can use an Aggregate operator to get a |-separated list. You can replace the | with a Replace operator.
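As a rough illustration of the idea, the following Python sketch builds a timestamp annotation and a combined performance annotation. It does not use any RapidMiner API; the function name build_annotations, the annotation keys and the performance values are assumptions made up for this example.

```python
from datetime import datetime


def build_annotations(performances, created=None):
    """Return annotation key/value pairs for a stored model.

    performances: mapping of criterion name to value (hypothetical input).
    """
    created = created or datetime.now()
    # Concatenate all measures into one |-separated string, as an aggregation would.
    joined = "|".join(f"{name}={value:.3f}" for name, value in performances.items())
    return {
        "Timestamp": created.strftime("%Y-%m-%d %H:%M:%S"),
        # Swap the separator for something more readable.
        "Performance": joined.replace("|", ", "),
    }


annotations = build_annotations(
    {"accuracy": 0.912, "AUC": 0.957}, datetime(2017, 1, 17, 8, 32, 0)
)
print(annotations)
```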

How to interact with Google Cloud APIs with the Web Mining extension

by on ‎12-20-2016 12:02 PM - edited on ‎12-20-2016 12:03 PM by Community Manager

This is the first of several articles to help people use external APIs from within RapidMiner.  There are some APIs that actually do not need any effort at all because there are extensions that make your life very easy.  A good example is the NamSor extension by Elian Carsenat that will predict gender and nationalities based on first and last names.  I strongly encourage people to take advantage of this resource if you work with names.

 

Unfortunately there are very few APIs that are as easy to use in RapidMiner as NamSor.  As all APIs work differently, I'd like to show how to use some common ones and if you need to use a different one, you can likely figure it out from here.

 

GOOGLE CLOUD API

 

Google has an amazing suite of APIs that are available to developers at relatively low cost:

  • Google Cloud Storage
  • Google Cloud Translation
  • Google Drive
  • Google Maps Directions
  • Google Maps JavaScript
  • Google Picker

Here I am going to show how to use Google Cloud Translation to take text and use the Google Translate API to detect the language.  You can of course change this to whatever you want.

 

1. You will need to create a Google Cloud developer account to get an API key.  You do this on https://cloud.google.com.  The key should look like a long string of alphanumeric characters.  Keep this key secure as it is the way Google authenticates and allocates the billing.

 

2. If you have not already done so, download the Web Mining extension in RapidMiner Studio.

 

3. Build a process that sends a text attribute to the Enrich Data by Webservice operator (found in the Web Mining extension) and then connect to the results.

 

4.  The only hard part here (and the only thing that changes from API to API) is how you set up this operator.  For Google Cloud APIs, you will set it up like this:

 

 

The URL is cut off here but is https://translation.googleapis.com/language/translate/v2/detect?key=<your Google Cloud key>.

 

Note I am using POST rather than GET requests.  This is due to the character limit on GET requests and, most likely, your text will exceed this character limit.  Also note that I am putting the API key in the URL, rather than in the header of the request ("request properties").  Usually you can do this either way, but sometimes it does not work in the header.  Go figure.

 

 

 

 

 

In the body, you will create this small JSON file (assuming your attribute is named "text"):

 

{
"q": "<%text%>"
}

 

In the jsonpath queries, you select which part you want.  For Language Detection, you would enter $..language as the query expression.  You can name the attribute anything you like.
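To make the moving parts visible, here is a small Python sketch of what the operator setup amounts to: a URL carrying the key, a JSON body, and a query that picks out every "language" field. The helper names build_request and find_all are made up for this example, the sample response is hand-written for illustration rather than captured from the live API, and no request is actually sent.

```python
import json
from urllib.parse import urlencode

DETECT_URL = "https://translation.googleapis.com/language/translate/v2/detect"


def build_request(text, api_key):
    """Return (url, body) for a POST request; the key travels in the URL."""
    url = f"{DETECT_URL}?{urlencode({'key': api_key})}"
    body = json.dumps({"q": text})  # json.dumps escapes quotes inside the text
    return url, body


def find_all(node, key):
    """Collect every value stored under `key` at any depth, like $..key."""
    found = []
    if isinstance(node, dict):
        for name, value in node.items():
            if name == key:
                found.append(value)
            found.extend(find_all(value, key))
    elif isinstance(node, list):
        for item in node:
            found.extend(find_all(item, key))
    return found


# Hand-written sample response, for illustration only.
sample = json.loads(
    '{"data": {"detections": [[{"language": "en", "confidence": 0.98}]]}}'
)
url, body = build_request('She said "hi"', "YOUR_KEY")
print(url)
print(body)
print(find_all(sample, "language"))
```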

 

5. Run your process.  It should work nicely UNLESS your data, like mine, has strange values in it and hence at some point will cause an error.  You will want to skip over this example and keep going.  Otherwise if you have 10,000 examples and your API works until the 9996th example and then finds an error, you will lose all of those API results (but still pay for them).  So it is more prudent to do something like this:
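The general pattern, independent of RapidMiner, is to handle each example on its own and to record failures instead of aborting. A minimal Python sketch, in which fake_api is a made-up stand-in for the real web service call:

```python
def enrich_all(examples, call_api):
    """Call `call_api` once per example, keeping results even if some calls fail."""
    results, failures = [], []
    for index, example in enumerate(examples):
        try:
            results.append((index, call_api(example)))
        except Exception as error:  # keep going, but remember what went wrong
            failures.append((index, str(error)))
    return results, failures


def fake_api(text):
    """Hypothetical stand-in for the web service; rejects empty text."""
    if not text.strip():
        raise ValueError("empty text")
    return "en"


results, failures = enrich_all(["hello", "   ", "good morning"], fake_api)
print(results)
print(failures)
```

The failed example is skipped and reported, while the results of all other calls are kept.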

 

 

That's about it.  For reference, here are two other useful applications, and the changes you would need to make:

 

To translate text from one language to another

 

url: https://translation.googleapis.com/language/translate/v2?key=<your Google Cloud key>

body:

{
"q": "<%text%>",
"target": "en",
"format": "text"
}

 

jsonpath query: $..translatedText

 

To calculate the driving distance between two addresses (I do this via a GET request because the text is short)

 

url: https://maps.googleapis.com/maps/api/distancematrix/xml?units=imperial&origins=<%LocationAddress%>&d...>

request properties:        key           <your Google Cloud API key>

query type: XPath

attribute type: nominal

xpath query: //distance/text/text()
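For readers who want to see what the XPath query selects, here is a Python sketch using the standard library. The XML below is a hand-written sample with made-up values that only mimics the shape of such a reply; it is not real API output.

```python
import xml.etree.ElementTree as ET

# Hand-written sample shaped like a distance matrix XML reply; values are made up.
SAMPLE = """
<DistanceMatrixResponse>
  <status>OK</status>
  <row>
    <element>
      <status>OK</status>
      <duration><value>1200</value><text>20 mins</text></duration>
      <distance><value>16093</value><text>10.0 mi</text></distance>
    </element>
  </row>
</DistanceMatrixResponse>
"""


def distance_texts(xml_string):
    """Return the content of every <text> element below a <distance> element."""
    root = ET.fromstring(xml_string)
    return [node.text for node in root.findall(".//distance/text")]


print(distance_texts(SAMPLE))
```

Because the extracted value is a string with a unit, the attribute type is nominal, as noted above.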

 

Happy Googling.

 

 

 

Feature Weighting Tutorial

by RMStaff on ‎12-14-2016 10:57 AM - edited on ‎12-19-2016 06:55 AM by Community Manager

In some use cases you may want to find out which attributes are important for predicting a given label. This attribute importance can be a result in itself, because it can tell you why someone or something behaves in a certain way. In this article we will discuss common techniques to find these feature weights.

 

Filter Methods

One of the most common ways to determine attribute importance is to use a statistical measure. Frequently used measures are correlation, Gini index and information gain. In RapidMiner you can calculate these values using the Weight by ... operators (for example Weight by Correlation).

 

 

The resulting object is a weight vector. The weight vector is the central object for all feature weighting operations. If we have a look at it, it looks like this:

 

There are two operators which are important for the use of weight objects. Weights to Data converts this table into an example set. This can then be exported into Excel or a database.

Select by Weights allows you to select attributes using these weights. You can for example select only attributes having weights higher than 0.1 or take the top k.
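To illustrate the two steps (computing a weight vector, then selecting by weight), here is a plain Python sketch using the absolute Pearson correlation as the weight. The function names and the toy data are made up for this example, and RapidMiner's own operators may normalize weights differently.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equally long numeric lists (0.0 if undefined)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def weight_by_correlation(columns, label):
    """Weight vector: absolute correlation of each attribute with the label."""
    return {name: abs(pearson(values, label)) for name, values in columns.items()}


def select_by_weights(weights, threshold=None, top_k=None):
    """Keep attributes with weight above `threshold` and/or the `top_k` best."""
    ranked = sorted(weights.items(), key=lambda item: item[1], reverse=True)
    if threshold is not None:
        ranked = [(name, w) for name, w in ranked if w > threshold]
    if top_k is not None:
        ranked = ranked[:top_k]
    return [name for name, _ in ranked]


# Made-up toy data: "age" follows the label, the other two do not.
columns = {
    "age": [20, 30, 40, 50, 60],
    "noise": [3, 1, 3, 1, 3],
    "constant": [1, 1, 1, 1, 1],
}
label = [1.0, 2.0, 3.0, 4.0, 5.0]

weights = weight_by_correlation(columns, label)
print(weights)
print(select_by_weights(weights, threshold=0.1))
```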

 

Including Non Linear Attributes and Combinations

 

The filter methods above do not capture non-linearities. A technique to overcome this is to generate non-linear transformations of the attributes. The operator Generate Function Set can be used to generate things like pow(Age,2) or sqrt(Age) and combinations of these. This operator is usually combined with Rename by Construction to get readable names.
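The effect of such a step can be sketched in a few lines of Python. This is only an illustration of the idea with made-up names and data; it covers just the two functions mentioned above, not the full set the operator offers.

```python
from math import sqrt


def generate_function_set(columns):
    """Add pow(x,2) and sqrt(x) versions of every numeric column."""
    generated = dict(columns)
    for name, values in columns.items():
        generated[f"pow({name},2)"] = [v ** 2 for v in values]
        # sqrt is only defined for non-negative values
        if all(v >= 0 for v in values):
            generated[f"sqrt({name})"] = [sqrt(v) for v in values]
    return generated


features = generate_function_set({"Age": [16, 25, 36]})
for name, values in features.items():
    print(name, values)
```

The enlarged attribute set can then be passed to any of the filter methods to see whether a transformed attribute gets a higher weight than the original one.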

 

Handling Dependencies

Another known issue with the filter methods is dependencies between the variables. If your data set contains Age, 2xAge, 3xAge and 4xAge, all of them might get a high feature weight. A technique which overcomes this issue is MRMR. MRMR is included in RapidMiner's Feature Selection Extension.

 

Model Based Feature Weights

 

Another way to get feature weights is to use a model. Some models are able to provide a weight vector by themselves. These values tell you how important an attribute was for the learner itself. The concrete calculation scheme differs between learners. Weight vectors are provided by these operators:

  • Linear Regression
  • Generalized Linear Model
  • Gradient Boosted Tree
  • Support Vector Machine (only with linear kernel)
  • Logistic Regression
  • Logistic Regression (SVM)

It is generally advisable to tune the parameters (and choice) of these operators for a maximal prediction accuracy before taking the weights.

 

A special case is the Random Forest operator. A Random Forest model can be fed into a Weight By Tree Importance operator to get a feature weight.

 

 

Feature Selection Methods

 

Besides feature weighting you can also use feature selection techniques. The difference is that you only get a weight vector with a 1 if the attribute is in the set of chosen attributes and a 0 if it is not. The most common techniques for this are wrapper methods, namely Forward Selection and Backward Elimination.

 

 

Polynominal Classification Problems and Clustering

In polynominal classification problems it is often useful to do this in a one-vs-all fashion. This answers the question "what makes group A different from all the others?" A variation of this method is to apply it to cluster labels to get cluster descriptions.
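The relabeling behind the one-vs-all idea is simple. A Python sketch with made-up class names:

```python
def one_vs_all_labels(labels):
    """Return one binary (1.0/0.0) label vector per class."""
    classes = sorted(set(labels))
    return {
        cls: [1.0 if value == cls else 0.0 for value in labels]
        for cls in classes
    }


labels = ["A", "B", "C", "A", "B"]
binary = one_vs_all_labels(labels)
for cls, vector in binary.items():
    print(cls, vector)
```

Each binary vector can then serve as the label for a separate feature weighting run, giving one weight vector per class or cluster.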

 

Evolutionary Feature Generation

 

Another sophisticated approach to incorporate non-linearities and also interaction terms (e.g. to find a dependency like sqrt(age-weight)) is to use an evolutionary feature generation approach. The operators in RapidMiner are Yagga and Yagga2. Please have a look at the operator info for more details.

 

Prescriptive Analytics

 

Instead of generating a general feature weight you can also find individual dependencies for a single example. In this case you would vary the variables in an example and check the influence predicted by your model. A common use case is to check whether it is worth calling a customer by comparing his individual scoring result with and without a call.
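The with/without comparison can be sketched as follows. The function toy_model is a made-up scoring function standing in for a trained model, and its coefficients are arbitrary.

```python
def uplift(model, example, attribute, value_a, value_b):
    """Score one example twice, changing only `attribute`, and return the difference."""
    with_a = model({**example, attribute: value_a})
    with_b = model({**example, attribute: value_b})
    return with_a - with_b


def toy_model(example):
    """Hypothetical scoring function standing in for a trained model."""
    return 0.2 + 0.3 * example["called"] + 0.001 * example["age"]


customer = {"age": 40, "called": 0}
gain = uplift(toy_model, customer, "called", 1, 0)
print(round(gain, 3))
```

A positive difference suggests the call raises the predicted score for this particular customer.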

 

 

Resources

Daitch-Mokotoff Soundex for Word / Name Matching

by on ‎11-18-2016 05:00 AM

I find it quite common when doing text processing to match names or words that may not be spelled exactly the same.  For example, "SCHWARZ" can be spelled "SCHWARTZ", and "JENNIE" can be spelled "JENNY" or "JENI" and so forth.  A common technique to match words that sound the same is to use a "soundex" system.  It converts words to a code by the way the word sounds.  In the second example, "JENNIE", "JENNY", and "JENI" would all have the exact same Soundex code.
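To illustrate the general idea of a sound-based code, here is a Python sketch of the classic American Soundex. Note that this is the simpler, older scheme, not the Daitch-Mokotoff variant, and the function name soundex is made up for this example.

```python
# Letter groups of the classic American Soundex (not Daitch-Mokotoff).
CODES = {}
for letters, digit in (
    ("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
    ("L", "4"), ("MN", "5"), ("R", "6"),
):
    for ch in letters:
        CODES[ch] = digit


def soundex(word):
    """Return the four-character classic Soundex code of a single word."""
    letters = [c for c in word.upper() if c.isalpha()]
    if not letters:
        return ""
    first = letters[0]
    digits = []
    prev = CODES.get(first, "")
    for ch in letters[1:]:
        if ch in "HW":
            continue  # H and W do not separate letters with equal codes
        code = CODES.get(ch, "")
        if code and code != prev:
            digits.append(code)
        prev = code  # vowels reset prev, so repeated codes after a vowel count
    return (first + "".join(digits) + "000")[:4]


for name in ("JENNIE", "JENNY", "JENI"):
    print(name, soundex(name))
```

All three spellings receive the same code, so they can be matched by comparing codes instead of raw strings.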

 

There are many Soundex systems that have been used over the years; my preference is the Daitch-Mokotoff Soundex (created by Randy Daitch and Gary Mokotoff in 1985 as a revised version of the Russell / NARA Soundex system developed in 1918).  It is the same soundex system used to search for names in the famous Ellis Island Immigrant Database.

 

I have written the RapidMiner code needed to take a nominal attribute (it must only have one word in it) and output its D-M Soundex code.  You can find it attached to this KB article as a .buildingblock file (if you use it as-is, you will need to name your attribute "att1").  I hope others find it as useful as I do.  There are probably ways to make this code more efficient; I welcome contributions any time.

 

Scott

 

Parallelized Cross Validation & Revamped Data Core

by Community Manager ‎11-16-2016 12:08 PM - edited ‎11-16-2016 12:09 PM

 

 

There have been some great backend updates to RapidMiner version 7.3 that are aimed at making your predictive analytics faster.

 

With version 7.3 a new parallelized Cross Validation operator was introduced that completely uses your available cores on your machine (per your license) and improves the memory management. Check out Tobias's post on the subject here. This new Cross Validation is the first of many planned improvements in memory management.

 

More parallelized operators are planned to be released in upcoming releases and are all a part of RapidMiner's Data Core initiative. We're busy reworking the memory management guts of RapidMiner to make huge improvements for our users.  Tobias writes about how we're doing just that in his latest blog post.

 

We are asking our community to check out these new enhancements and give us feedback. Let us know if you see a speed up in your analytics!

 

 

Extracting OpenStreetMap Data in RapidMiner

by Community Manager on ‎11-14-2016 09:40 AM

Visit the Neural Market Trends blog for more on this and a free example process to download. Thanks @thomas_ott

 

http://www.neuralmarkettrends.com/Extracting-OpenStreetMap-Data-In-RapidMiner/

D3.js Visualization with RapidMiner Studio and Server

by on ‎11-03-2016 10:12 AM - edited on ‎11-03-2016 10:16 AM by Community Manager

Hello all RapidMiner users.  I would like to share with you how to use the amazing capabilities of the D3.js data visualization libraries, in combination with RapidMiner Studio and RapidMiner Server, to view your data in a dynamic and visually appealing interface.  If you want to try it out for yourself before reading this KB article, please go to http://www.genzerconsulting.com/atanga-public-beta and/or watch this enthralling demonstration video.

 

For those who have not heard of D3, my guess is that you have seen D3 visualizations without even realizing it.  Many of the data visualizations seen on media websites use this JavaScript library.  You can visit the D3.js website or the D3.js GitHub page to learn more.  This application is based on the work done by Thomas Ott at RapidMiner, as shown in a video published in January 2016.  I would like to thank Tom for both his initial work on this technique and for answering endless questions afterwards.

 

This implementation starts with both RapidMiner Studio and Server running, either locally or virtually.  In my example I am using RapidMiner Studio 7.1 installed locally on a Mac Pro, and RapidMiner Server 7.1 installed virtually, with a standard SQL database, on an Amazon Web Services (AWS) EC2 Linux (Ubuntu) t2.medium instance.  In my experience, the t2.medium is the minimum computing power/memory needed to run RapidMiner Server on an EC2 Linux instance.

 

Once you have both RapidMiner Studio and RapidMiner Server running, you will need to open the server repository in RM Studio to create a simple process.  You can learn how to do this here.

 

Next, create a process in RM Studio that will generate the D3.js code needed to run the visualization.  This is virtually identical to Tom Ott's work and basically looks like this (note you will need the "Text Processing" extension installed on RM Studio in order to see some of these operators):

 

Screen Shot 2016-11-02 at 1.34.37 PM.png

 

Next go to RM Server and create a new "app" to show off your amazing visualization.  The basics can be found here; I am just going to show you the few tweaks that Tom Ott created to port the code into RM Server:

 

- create a new visualization component by clicking on the gear icon in the menu bar.

- at the BOTTOM of the page, you should see three tabs: General settings, Data and Format, and Interaction.

- with the "Data and Format" tab, use these settings:

 

Screen Shot 2016-11-02 at 1.39.39 PM.png 

 

That's about it!  If you have played your cards right, you should be able to see your amazing visualization.  If not, let me or Tom know and we can try to help.  Good luck and have fun!

 

Scott

 

 

 

 

 

 

 

 

 

 

 

 

 

How to Get Rules from a Decision Tree?

by RMStaff on ‎10-21-2016 09:53 AM - edited on ‎11-03-2016 07:35 AM by Community Manager

Question

Decision Trees are well known for their understandability. How can I get the rules a Decision Tree provides in a more handy format?

Answer

To convert Decision Trees into Rules you can use the Tree to Rule operator. The Tree to Rule operator is a nested operator, which means you can put the Decision Tree inside.

The Decision Tree is parsed into a rule format which is easier to understand.

 

RuleModel.png

 

 

This Rule Model can be parsed into an example set with the following Groovy script. The Groovy script needs to be used in an Execute Script operator. An example is attached as a process.

 

import com.rapidminer.example.Attribute;
import com.rapidminer.example.table.AttributeFactory;
import com.rapidminer.example.table.DataRow;
import com.rapidminer.example.table.DataRowFactory;
import com.rapidminer.example.table.MemoryExampleTable;
import com.rapidminer.operator.learner.rules.Rule;
import com.rapidminer.operator.learner.rules.RuleModel;
import com.rapidminer.tools.LogService;
import com.rapidminer.tools.Ontology;

import java.util.logging.Level;

// The Rule Model arrives at the first input port of Execute Script
RuleModel ruleModel = input[0];
int numberOfAttributes = 4;

// One column per piece of information extracted from a rule
Attribute[] attributes = new Attribute[numberOfAttributes];
attributes[0] = AttributeFactory.createAttribute("Full Rule", Ontology.STRING);
attributes[1] = AttributeFactory.createAttribute("Label", Ontology.STRING);
attributes[2] = AttributeFactory.createAttribute("Correct Examples covered by this rule", Ontology.STRING);
attributes[3] = AttributeFactory.createAttribute("Wrong Examples covered by this rule", Ontology.STRING);
MemoryExampleTable table = new MemoryExampleTable(attributes);
DataRowFactory ROW_FACTORY = new DataRowFactory(0);

String[] myvalues = new String[numberOfAttributes];

for (Rule currentRule : ruleModel.getRules()) {
    int correct = 0;
    int wrong = 0;
    // Index of the class predicted by this rule
    int label = ruleModel.getLabel().getMapping().getIndex(currentRule.getLabel());
    LogService.root.log(Level.INFO, currentRule.toString());
    int[] frequencies = currentRule.getFrequencies();
    // Rules without frequency information are skipped
    if (frequencies != null) {
        // Examples of the predicted class count as correct, all others as wrong
        for (int i = 0; i < frequencies.length; i++) {
            if (i == label) {
                correct += frequencies[i];
            } else {
                wrong += frequencies[i];
            }
        }
        myvalues[0] = currentRule.toString();
        myvalues[1] = currentRule.getLabel();
        myvalues[2] = String.valueOf(correct);
        myvalues[3] = String.valueOf(wrong);

        DataRow row = ROW_FACTORY.create(myvalues, attributes);
        table.addDataRow(row);
    }
}

return table.createExampleSet();

The result looks like the screen shot below:

 

Example Set from Tree.png

 

 

 

Best Practices for Folder Structures in Repositories

by RMStaff on ‎10-26-2016 08:53 AM - edited on ‎10-26-2016 09:56 AM by Community Manager

RapidMiner Repositories give you the option to store anything in folders. Here is a ‘best practice’ on how to organize the folders to make them easier to use.

 

There should be one folder per project.  This can be either at the top-level of your Local Repository or in a projects folder on the top level of a Server repository. Our proposed folder structure would be:

 

  • app
    • View 1
    • View 2
  • data
  • debug
  • models
  • processes
    • subprocesses
  • results
  • webservices

Note: Italic folders are not mandatory

 

app

The app folder contains all processes related to an app. In larger processes it makes sense to use subfolders for each View on the app – View 1, View 2, above. Only the global processes (like !Initialize) would be on the top level.

 

data

Simply contains all data used in the analysis.

 

debug

From time to time you need debug data, mostly to test things during the design of the process. A common example would be a database sample which might be used instead of the real, full database.

 

processes

This is the main folder holding all processes of your analysis. It often makes sense to create a subprocess folder which contains function-like processes which are used throughout the main processes via Execute Process.

 

results / models

The results folder contains all results of the modelling process. Usually these are performances and models. In the case of multiple models - either because there are many different types of models you want to try, or because you want to predict many labels - it makes sense to have a dedicated folder for each model.

 

webservices

Contains all processes which are used to offer a webservice. In rare cases a subprocess folder might be of use.

I got a new license, why won't it update inside of Studio?

by Moderator ‎08-16-2016 11:51 AM - edited ‎10-12-2016 05:25 PM

Symptoms

 

You've gotten a new license for RapidMiner, but when you go to Settings -> Manage Licenses you do not see the new license.

Diagnosis

It's likely that RapidMiner Studio is reading your old license instead of your new one. This can happen if there are multiple valid licenses inside your .RapidMiner\licenses folder, which is usually inside your user folder, i.e. C:\Users\USERNAME.

Solution

Deleting all of the licenses inside \.RapidMiner\licenses\rapidminer-studio\licenses and restarting RapidMiner Studio will prompt you to enter your correct license.  You can also do a targeted deletion of your older license based on the license term; the term dates are included in the license file name.

Save Outputs without Overwriting Previous Outputs

by RMStaff ‎10-03-2016 11:21 AM - edited ‎10-06-2016 08:36 AM

Often you will want to run a workflow repeatedly and save the results of every run without overwriting earlier outputs, so that you can go back and review them. The obvious solution is to change the store path before every run.

But this makes the paths difficult to manage and is error prone: if you run the process without changing the path, you may end up overwriting previous results.

 

To solve this issue we recommend using a process that creates new timestamp-based folders or paths.

We recommend using the macro as a folder name rather than as part of the final entry name, since that will automatically group items under one timestamp-named folder,

e.g. /path/to/%{t}/modelname is better than /path/to/%{t}_modelname

 

1) Use Repository Manipulation operators

RapidMiner provides several operators to work with repository entries, such as Rename, Move and Copy.

Using an operator like "Copy Repository Entry", where the source path points to the top-level folder /ProjectName/outputs and the destination points to something like /Projectresults/%{t}/, will copy the outputs folder with all its subfolders and items to a new path. The %{t} will be replaced by the system timestamp.
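The same idea, copying a whole outputs folder into a new timestamp-named folder, can be sketched on a plain file system with Python. This is only an analogy to the repository operator; the function name archive_outputs and the timestamp format used here are assumptions and not necessarily what %{t} produces.

```python
import shutil
import tempfile
from datetime import datetime
from pathlib import Path


def archive_outputs(source, results_root, now=None):
    """Copy `source` and everything below it into results_root/<timestamp>."""
    stamp = (now or datetime.now()).strftime("%Y_%m_%d-%H_%M_%S")
    destination = Path(results_root) / stamp
    shutil.copytree(source, destination)  # creates missing parent folders
    return destination


# Small demonstration inside a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    outputs = Path(tmp) / "ProjectName" / "outputs"
    outputs.mkdir(parents=True)
    (outputs / "model.txt").write_text("dummy model")

    archived = archive_outputs(
        outputs, Path(tmp) / "Projectresults", datetime(2016, 10, 3, 16, 8, 58)
    )
    archived_name = archived.name
    copied = sorted(p.name for p in archived.iterdir())

print(archived_name, copied)
```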

 

 

2) Use the built-in macro directly

This option is generally good when there is only one store operation in the entire workflow. RapidMiner provides a built-in macro for the system time: the current system time is available as %{t}. Dropping %{t} into the store path automatically adds a timestamp to the path, ensuring that you create new folders. The timestamp has a resolution of one second, so any process running longer than a second should be fine.

 

 

 

2016-10-03 16_08_58-__Local Repository_do not overwrite_my workflow_ – RapidMiner Studio Large 7.2.0.png

 

3) Use the built-in macro to set another macro

Since %{t} always gives the current system time down to the second, multiple store operators that execute even a second apart will each create a new folder.

Also, for a long-running process you may not know exactly when the store happened, leaving you to guess the right path. In most cases, however, you will know when you triggered the process. So the solution here is to capture the start time of the process in another macro.

This can be done using the "Set Macro" operator.

Use Set Macro as one of the first operators and capture the start time in another macro, e.g. t1 in the screenshot below.

Then use %{t1} wherever you want the path to contain the process start time. This way t1 stays the same throughout the whole run, ensuring that all outputs of one run are saved under the same folder.
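The principle of capturing the start time once and reusing it for every store path can be sketched in Python as follows. The class name Run, the attribute t1 and the timestamp format are assumptions made for this illustration.

```python
from datetime import datetime


class Run:
    """Captures the start time once so every store path shares one folder."""

    def __init__(self, root, started=None):
        self.root = root.rstrip("/")
        # Evaluated a single time, like setting a macro at the start of the process.
        self.t1 = (started or datetime.now()).strftime("%Y_%m_%d-%H_%M_%S")

    def path(self, entry_name):
        """Build the store path for one entry of this run."""
        return f"{self.root}/{self.t1}/{entry_name}"


run = Run("/path/to", datetime(2016, 10, 3, 16, 17, 51))
model_path = run.path("modelname")
performance_path = run.path("performance")
print(model_path)
print(performance_path)
```

However late the second store happens, both entries end up in the same folder because the timestamp is not re-evaluated.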

2016-10-03 16_17_51-__Local Repository_do not overwrite_my workflow_ – RapidMiner Studio Large 7.2.0.png

 

 

 

 

 

 

Generating PMML models using Rapidminer

by RMStaff ‎09-28-2016 11:57 AM - edited ‎09-28-2016 11:58 AM

PMML is an XML based format allowing interchange between various modeling tools and platforms. You can find details about PMML here http://dmg.org/pmml/v4-1/GeneralStructure.html or on the  Wikipedia article here https://en.wikipedia.org/wiki/Predictive_Model_Markup_Language

 

RapidMiner provides the ability to export certain models in PMML format. To do so you will need to download the PMML extension from the marketplace.

Here are the instructions for installing extensions

http://docs.rapidminer.com/studio/installation/adding-extensions.html

 

The marketplace link for the extension is here https://marketplace.rapidminer.com/UpdateServer/faces/product_details.xhtml;jsessionid=A2F633CDF273B... in case you are not able to get it directly from the marketplace.

 

Once installed the extension should add a new operator under the Data Access>>Files>>Write>>Write PMML path.

2016-09-28 16_53_20-Start.png

 

 

This operator will write the given model to an XML file of PMML 4.0 format. This format is a standard for data mining models and is understood by many databases. It can be used for applying data mining models directly in the database, so that a model can be applied on a regular basis to huge amounts of data. This operator supports the following models:

  • Decision Tree Models
  • Rule Models
  • Naive Bayes models for nominal attributes
  • Linear Regression Models
  • Logistic Regression Models
  • Centroid based Cluster models like models of k-means and k-medoids

 

The operator needs two parameters: the location where the PMML file should be stored and the version of PMML to use.

The operator takes one of the above-mentioned models as input.

Read Access Tutorial with Example Access File

by RMStaff ‎09-22-2016 04:59 PM - edited ‎09-22-2016 05:14 PM

Solution

1. Download the sample accdb file from here  https://www.dur.ac.uk/cis/docs/guides/files/access/

and pick the first link to download 'A Sample Database.accdb'

accesssample.png

 

2. Open the downloaded sample database in Access. Here is what I got from Microsoft Access 2016 on my local computer:

The Asset Items table and some Queries/Forms/Reports are available in my Access view

access_screenshot1.png

 

3. As long as you can view the table in Access, you can load it into RapidMiner Studio using the 'Read Access' operator

access_screenshot2.PNG

access_screenshot3.PNG

 

4. You can also run a query to load Access views, for example, let's copy and paste the query below to RapidMiner Studio

 

SELECT `Asset Items`.`Asset No`, `Asset Items`.Make, `Asset Items`.Model, `Asset Items`.Acquired, `Asset Items`.Cost, `Cost`/1.22 AS `Ex Tax`, `Cost`-(`Cost`/1.22) AS GST

FROM `Asset Items`;

access_screenshot4.png

Then the results view should show the queried data table:

access_screenshot5.png
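
As a quick check of the arithmetic in the query, here is a small Python sketch. It assumes, based on the divisor 1.22, that Cost includes a 22% tax; the function name split_tax is made up for this illustration.

```python
def split_tax(cost, rate=0.22):
    """Split a tax-inclusive cost into its ex-tax part and its tax part."""
    ex_tax = cost / (1 + rate)  # corresponds to `Cost`/1.22 in the query
    gst = cost - ex_tax         # corresponds to `Cost`-(`Cost`/1.22)
    return ex_tax, gst


ex_tax, gst = split_tax(122.0)
print(round(ex_tax, 2), round(gst, 2))
```

The two parts always add up to the original cost, which is a handy sanity check on the returned columns.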

 

5. For more details, please refer to the documentation 

http://docs.rapidminer.com/studio/operators/data_access/files/read/read_access.html

and the attached RapidMiner process.

Read SAS data

by RMStaff on ‎08-17-2016 11:54 AM - edited on ‎09-21-2016 10:40 AM by Community Manager

Question

I have a SAS data set. Can I load it into RapidMiner? I encountered an error when I tried to read a SAS data set in RapidMiner.

Answer

'Read SAS' is an advanced data connector in RapidMiner: http://docs.rapidminer.com/studio/operators/data_access/files/read/read_sas.html

If your process fails because of known parsing issues with the 'Read SAS' operator, you can try this workaround, which uses R to read the SAS file and RapidMiner's R connector to get a usable data set out of R.

 

In order to apply it you need to:

- download the R Scripting extension from our marketplace via RapidMiner Studio or directly via this site: https://marketplace.rapidminer.com/UpdateServer/faces/product_details.xhtml?productId=rmx_r_scriptin...
- install a local copy of R, then open the RapidMiner Studio preferences, which after a restart will contain an additional "R scripting" tab; there, enter the local path to RScript.exe of your R installation
- open the R command line and run install.packages("sas7bdat") and install.packages("data.table") to add these libraries to your setup
- now you can open the attached process (SAS Alternate Reader URL) and configure the operator "Generate Macro" to open any SAS file from a given URL

- for testing purposes, you can have access to a list of example sas7bdat files from here:

https://github.com/ppham27/sas_to_csv/tree/master/test_files

- to make the attached process work for your own LOCAL SAS data, simply modify the function expression of the macro 'source' to update the value of source, for instance, set macro 'source' as "C:/Users/HOMEFOLDER/Downloads/YOUROWNDATA.sas7bdat". Refer to the attached process (SAS Alternate Reader Local) to make some modifications accordingly.

 

 

 

 

How to Add Timestamps

by RMStaff on ‎08-25-2016 07:47 AM - edited on ‎09-21-2016 10:39 AM by Community Manager

Question

I would like to use a Write or Store operator but tag the result with a timestamp, how do I do this?

Answer

In order to add a timestamp we will use macros. If you do not yet know about macros, please have a look at this article.

 

Using Generate Macro

You can use the Generate Macro operator to generate a new macro named e.g. timestamp. To do this we use the equation:

 

date_str_custom(date_now(),"dd-MM-yy")

 

The format of the date can easily be changed to include, for example, the minute or the second. The macro can then be used in the path of Store or Write Excel/CSV/... with the standard macro notation. For a Store operator it looks like this:

 

//Local Repository/my data %{timestamp}

 

The corresponding process is attached to this article.
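
For readers more familiar with general-purpose languages, the same idea looks roughly like this in Python. This is a sketch, not RapidMiner code: the helper name timestamped_path is hypothetical, and "%d-%m-%y" is the strftime counterpart of the "dd-MM-yy" pattern used above.

```python
from datetime import datetime


def timestamped_path(base, now, pattern="%d-%m-%y"):
    """Append a formatted date to a base path, like '<base> %{timestamp}'."""
    return f"{base} {now.strftime(pattern)}"


# A fixed date keeps the result reproducible; a real run would use datetime.now().
fixed_now = datetime(2016, 8, 25, 7, 47, 0)
path = timestamped_path("//Local Repository/my data", fixed_now)
print(path)  # //Local Repository/my data 25-08-16

# Changing the pattern adds hour, minute and second.
with_time = timestamped_path("//Local Repository/my data", fixed_now, "%d-%m-%y %H-%M-%S")
```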

Using Built-In Macros

Instead of creating your own macro, you can also use the built-in macro t. This macro is always available. For the Store example it would be used like this

 

 //Local Repository/my data %{t}

 

This method is faster than the previously mentioned but you cannot change the date format of %{t} without using Generate Macro. The attached process is also showing the usage of t.

How to Connect a Teradata Database to Studio?

by RMStaff on ‎07-13-2016 09:30 AM - edited on ‎09-12-2016 06:09 AM by Moderator

Question

How to connect a Teradata database to RapidMiner Studio?

 

Note: This article does not cover Teradata Aster

Answer

The Teradata JDBC driver package includes two jar files, one containing the driver and another one with a config file in it. The driver only works when both files are linked via the classpath.

Step-by-step guide


  1. Please download the appropriate JDBC drivers from Teradata and copy all its jar files to the folder "lib/jdbc" inside your RapidMiner Studio installation directory. This ensures that the driver classes are part of the java classpath.
  2. Next go to the database driver management dialog.
  3. For the jar file dialog manually enter both path statements comma separated to address both files (e.g. /path-to-rapidminer-folder/lib/jdbc/terajdbc4.jar,/path-to-rapidminer-folder/lib/jdbc/tdgssconfig.jar).
  4. Teradata JDBC drivers seem to offer no support for port settings. Please leave the port field blank, otherwise a connection cannot be established. 
  5. The driver setup should look similar to the setup shown in this image:

Teradata_connection.png

 

How to Limit the Resources Taken by RapidMiner Studio

by RMStaff on ‎09-08-2016 05:55 AM

RapidMiner Studio usually takes as many resources (threads and RAM) as it can, given the license and hardware. In some cases you may want to restrict the number of threads or the amount of memory used by Studio.

 

Limit the Number of Threads

To limit the number of CPU threads used by RapidMiner, go to the Preferences via Settings -> Preferences. In the General tab you can find an entry Number of threads in the Miscellaneous category. Change this setting to restrict the number of threads.

 

threads.png

 

 

 

 

 

Limit the Amount of RAM used

 

You can also limit the amount of RAM used. This is also done in the Preferences, but in the System tab. Scroll down to Miscellaneous again to find the option Maximum amount of memory. You can enter a number in MB there.

 

memory.png

 

 

 

How to Export the Table of Linear Regression

by RMStaff on ‎09-06-2016 09:49 AM

Question

Linear Regression creates a very nice table with t-Statistics, p-Value etc. Is there any way to export this?

Answer

There is currently (RM 7.2) no operator to convert this, but you can use an Execute Script operator to do it by hand. The attached process demonstrates how to use this on the golf sonar data.

We recommend adding this to your building blocks if you use it regularly.

My Decision Tree Shows only one Node

by RMStaff on ‎09-05-2016 05:39 AM - edited on ‎09-06-2016 04:42 AM by Community Manager

Question

I've trained a decision tree, but I do not get a real tree, only a single node/leaf. Why?

Answer

Decision trees are known to be prone to overtraining/overfitting. To prevent your tree from becoming overtrained, they have the option to (pre)prune themselves. Pruning cuts away leaves that are not statistically meaningful after the tree has been built. Prepruning prevents such leaves from being built at all.

If you have only one node in your tree, it is very likely that the standard pruning options are preventing the tree from growing. A drastic way to change this is to deactivate pruning and prepruning. Another way is to loosen the requirements for the cuts.

 

The two most important settings here are:

 

minimal_gain: specifies how good a cut needs to be for it to actually be executed. A 0 means all cuts are made, while a 1 means only cuts that purify the two leaves are executed. The standard setting of 0.1 is a strict requirement. Common values for minimal gain are between 0.001 and 0.1.

confidence: a statistical measure, based on the binomial distribution, that determines which branches are pruned away after building the tree. 0.25 is a reasonable number, but you might reduce it further to have a bigger tree.

 

As usual you should use proper validation (e.g X-Validation with a hold-out sample) to measure performance.
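
To make the role of minimal_gain more concrete, here is a minimal Python sketch of a gain threshold. It assumes plain information gain on a two-class label; RapidMiner's actual criterion and any normalization it applies may differ, and the function names are made up for this example.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def information_gain(parent, left, right):
    """Entropy of the parent minus the size-weighted entropy of the children."""
    n = len(parent)
    weighted = (len(left) / n) * entropy(left) + (len(right) / n) * entropy(right)
    return entropy(parent) - weighted


def split_allowed(parent, left, right, minimal_gain):
    """A cut is only executed if its gain reaches the threshold."""
    return information_gain(parent, left, right) >= minimal_gain


parent = ["yes"] * 5 + ["no"] * 5
weak_left = ["yes"] * 3 + ["no"] * 2   # a cut that barely separates the classes
weak_right = ["yes"] * 2 + ["no"] * 3
pure_left, pure_right = ["yes"] * 5, ["no"] * 5

weak_gain = information_gain(parent, weak_left, weak_right)
print(round(weak_gain, 3))
print(split_allowed(parent, weak_left, weak_right, 0.1))    # rejected at 0.1
print(split_allowed(parent, weak_left, weak_right, 0.01))   # accepted at 0.01
print(split_allowed(parent, pure_left, pure_right, 1.0))    # a purifying cut passes even at 1
```

In this toy data the weak cut is rejected with a threshold of 0.1, so if no better cut exists the tree stays a single node, which matches the symptom described above.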

How to Use Processes Like Functions

by RMStaff on ‎08-26-2016 08:22 AM

Question

I have a process and want to run it with various settings. How can I do this without copying the process?

Answer

The answer is to use Execute Process in combination with macros. If you are not familiar with macros, please read this article on macros.

 

As a first step, take your process and define a macro in the context panel. Use this macro for the setting you would like to iterate over. Afterwards you can create a (meta) process in which you embed the process you want to run more than once. You can simply drag and drop the process into the new process or use the Execute Process operator.

After doing so, you can hand over the macros to the process using the macros setting in the parameters of Execute Process.

To run the process more than once, you can either use several Execute Process operators or combine this with a loop.
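
The analogy to a programming function can be sketched in Python. Everything here is hypothetical: run_process stands in for the embedded process, and sampleSize is an invented macro name. The point is only that the caller passes macro values as text and the callee reads them as its arguments.

```python
def run_process(macros):
    """Stand-in for the embedded process; reads its setting from a macro."""
    sample_size = int(macros["sampleSize"])  # macro values arrive as text
    # Pretend the data set has 150 rows, so larger samples are capped.
    return {"sampleSize": sample_size, "rows_used": min(sample_size, 150)}


# The loop plays the role of a Loop operator around Execute Process.
results = [run_process({"sampleSize": str(n)}) for n in (50, 100, 200)]
print(results)
```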

If Statements - Getting Started with the Rapidminer Expression Editor

by Moderator ‎07-24-2016 05:45 PM - edited ‎08-24-2016 02:52 PM

Where can I input a function??

 

Being able to generate attributes based on your own specifications and logic is one of the more important things to get started with in Rapidminer. The expression editor allows for a lot of logical coding, arithmetic, and even text searching.  

 

ExpressionEditorFunction.PNG

 

 

Let's start with a simple if-then statement:

Using the Titanic sample data included in RapidMiner Studio, we want to add an attribute called Age_Status. We will create two categories, Minor and Adult.

Inside the Generate Attributes operator you will see "Edit List"; this is the list of attributes that you would like to add. Under "function expressions" is our expression editor.

 

 

 

 

 

 

 

 

PropedMeta.PNG 

 

 

 

 

 


When you pass data into the Generate Attributes operator, its meta data propagates into the expression editor, allowing easy input of your attribute names.

 

 

 

 

 

 

 

 

SearchForIf.PNG

 

Don't be afraid to search!

Whether it be for inputs or function types, search is your friend.

 

We want an if statement, so let's look for if

Searching for if gives us the logical if statement as well as other pertinent functions.

 

 

 

 

IfStatement.PNG

Next to each entry in our function list there is an (i) icon. Selecting it expands the information about the function you are interested in and gives you an example of how to apply it in your own expressions.

 

 

 

IfStatement1.PNG

 

 

For our example we want Age_Status to be Minor if Age is less than 18. Otherwise, Age_Status should return Adult.

 

As you can see the "<" will deliver true if the first term is less than the second. This can be a static comparison or it can be a comparison to an attribute.  For this example we are comparing it to "18"

 

 

 

 

 

 

When you get your function correct syntactically, it will tell you in the info message.  Below is our completed expression.

 

 

IfStatement3.PNG
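
The logic of such an if expression can be sketched in Python as follows. This assumes the intended rule is Minor below 18 and Adult otherwise; returning None for a missing age is an assumption of this sketch, not something the expression editor is documented here to do.

```python
def age_status(age):
    """Return 'Minor' for ages below 18 and 'Adult' otherwise."""
    if age is None:  # assumption: missing ages stay missing
        return None
    return "Minor" if age < 18 else "Adult"


statuses = [age_status(a) for a in (4, 17, 18, 35, None)]
print(statuses)
```

Note that an age of exactly 18 falls into the Adult category, because the comparison is a strict "less than".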

 

 Learn more about our Generate Attributes operator here: Generate Attributes

Feature Selection mandatory columns

by RMStaff on ‎08-23-2016 11:38 AM

 

 

RapidMiner provides various feature selection techniques like forward selection, backward elimination, weight guided, evolutionary etc.

In rare cases there is a need to always include a certain set of features (columns/attributes) while trying various combinations. This article demonstrates one way to always keep a certain set of columns as part of feature selection.

Suppose you have columns like these and want to ensure that columns a1 and a2 are always considered during your optimization steps.

 

2016-08-23 16_26_24-Settings.png

To force the RapidMiner workflow to do so, we can use the Set Role operator so that the optimization step ignores these columns at first, and then reincorporate them during model building.

We will introduce a Set Role operator just outside the optimization step, as seen below

2016-08-23 16_29_23-Settings.png

 

Then in the parameter section we will select attribute name a1 and type in target role with any arbitrary string (Ignoreme in the screen shot).

If you have additional columns that you want to always use, then you can specify them using the set additional roles dialog.

2016-08-23 16_31_37-Settings.png

Please note that the target role used for each column is a different string, so you will need to come up with a unique string for each column. A simple solution is to use ignoreme1, ignoreme2, ignoreme3 and so on.

With this meta data in place, the optimization step always ignores these columns; however, the model operators etc. will also ignore them.

 

Hence to counter this effect we need to add an additional step inside the "Optimize" operator.

We will add an additional Set Role inside the optimize step

2016-08-23 16_35_00-Settings.png

And then change the role back to regular for the two attributes that we had given special role earlier.

As the data moves to the validation step, it will be included in the model building as well as validation step.
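
The overall effect can be sketched as a greedy forward selection in Python. This is not how RapidMiner implements its operators; the function names, the toy scoring function and the usefulness values are all invented for illustration. The search only iterates over the optional columns, but every candidate set that gets scored also contains the mandatory ones.

```python
def forward_selection(optional, mandatory, score):
    """Greedy forward selection over `optional`; `mandatory` is in every candidate set."""
    selected = []
    best_score = score(mandatory)
    while True:
        best_candidate = None
        for column in optional:
            if column in selected:
                continue
            # The model always sees the mandatory columns plus the current choice.
            candidate_score = score(mandatory + selected + [column])
            if candidate_score > best_score:
                best_score, best_candidate = candidate_score, column
        if best_candidate is None:  # no remaining column improves the score
            break
        selected.append(best_candidate)
    return mandatory + selected


# Toy stand-in for a validated model performance.
usefulness = {"a1": 1, "a2": 1, "a3": 5, "a4": -2, "a5": 3}


def toy_score(columns):
    return sum(usefulness[c] for c in columns)


chosen = forward_selection(["a3", "a4", "a5"], ["a1", "a2"], toy_score)
print(chosen)
```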

 

Please find attached example process too.

Hopefully you find this article helpful, Feel free to post comments or questions on community regarding these or other topics.

 

 

 

 

 

 

 

 

How to Use Macros

by RMStaff ‎08-19-2016 05:04 AM - edited ‎08-23-2016 10:53 AM

In complex processes, or projects with several processes, you often need to parametrize them using variables. Process variables in RapidMiner are called macros. Macros are a powerful asset that can be used to fully operationalize your analytics processes. Macros store what can be called primitive types. You can also store objects running through a connection; this is typically done using the Remember and Recall operators.

 

How to Set Macros

In general, there are two ways to set macros. The first way is using the context panel, the other is using operators.

The Context Panel

You can activate the panel by going to View -> Show Panel and selecting it. A common place for this panel is next to the Parameters panel. In the context panel you can set new macros by clicking on the small "+" button.

Macros1.png

If you think about a single process like a programming function, this panel gives you the options to define the arguments of the function.

 

Best Practice: We recommend starting macro names with a lowercase letter and then using camel case, to make macros easier to identify.

 

Generating Macros using Operators

 

Besides setting macros in the context panel, you can also set and modify macros with operators. If you search for Macro in the operator tree, you can see a few operators handling macros. We will discuss the three most important ones.

Macros2.png

Set Macro sets a macro to a constant value, very similar to the context panel.

 

Generate Macro gives you the option to generate a macro with the interface you know from Generate Attributes. Using this operator you are for example able to generate a macro based on the current date (using the date_now() function).

 

Extract Macro extracts a macro from an example set. Often used options are to extract the number of examples of an example set, statistics like an average or a maximum or even single cell values of your example set.

How to Use Macros in General Operators

To use a macro anywhere in your process you can type %{myMacro}, which will be replaced by the current value of the macro. This is a real direct replacement and works in any value field in your process.

Macros3.png

 

 

How to Use Macros in Generate Attributes

In Generate Attributes and Generate Macros you have more options than just the %-Notation. Namely:

 

%-Notation

 

%{myMacro} inserts the current macro value as a string. If you have a string like foo stored in your macro you can do operations like

 

concat(%{myMacro},"bar")

prefix(%{myMacro},1)

and so on. Keep in mind that the value is always interpreted as a string. If you store a 1 in your macro

 

concat(%{myMacro},"bar")

 

returns you 1bar.  Operations like

 

%{myMacro} + 1

 

do not work.

 

Eval

The eval() function evaluates the string of myMacro. If you have a 1 stored in your macro and calculate

eval(%{myMacro})+1

you get 2.

You can also put whole equations into the macro. If you store sqrt(2) in your macro and calculate

 

eval(%{myMacro})

 

you get back a 1.41....
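
The difference between the string view and the evaluated view of a macro can be sketched in Python. This is an analogy only: expand_macros is a hypothetical helper that imitates the textual %{...} replacement, and a plain float conversion stands in for eval().

```python
import re


def expand_macros(text, macros):
    """Replace every %{name} in text with the macro's stored string."""
    return re.sub(r"%\{(\w+)\}", lambda m: macros[m.group(1)], text)


macros = {"myMacro": "1"}  # macros hold text, even when it looks like a number

expanded = expand_macros("value_%{myMacro}", macros)  # plain text replacement
as_text = macros["myMacro"] + "bar"                   # string semantics, like concat
as_number = float(macros["myMacro"]) + 1              # numeric semantics, like eval(...)+1

print(expanded, as_text, as_number)
```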

 

#-Notation

 The #{attribute_macro} notation is in principle a shortcut for writing eval(%{attribute_macro}), which allows you to access the values of a given attribute.
But there are two important differences between the two:
* #{} will fail when the macro does not contain a valid attribute name
* eval(%{attribute_macro}), on the other hand, will evaluate whatever is contained in the macro, which might fail, e.g., if the attribute name contains a "-"

 

The difference between the notations are shown in this process:

 

<?xml version="1.0" encoding="UTF-8"?><process version="7.2.002-SNAPSHOT">
<context>
<input/>
<output/>
<macros/>
</context>
<operator activated="true" class="process" compatibility="7.2.002-SNAPSHOT" expanded="true" name="Process">
<process expanded="true">
<operator activated="true" class="retrieve" compatibility="7.2.002-SNAPSHOT" expanded="true" height="68" name="Labor-Negotiations" width="90" x="45" y="34">
<parameter key="repository_entry" value="//Samples/data/Labor-Negotiations"/>
</operator>
<operator activated="true" class="set_macros" compatibility="7.2.002-SNAPSHOT" expanded="true" height="82" name="Set Macros" width="90" x="179" y="34">
<list key="macros">
<parameter key="attribute_macro" value="working-hours"/>
</list>
</operator>
<operator activated="true" breakpoints="after" class="select_attributes" compatibility="7.2.002-SNAPSHOT" expanded="true" height="82" name="Select Attributes" width="90" x="380" y="34">
<parameter key="attribute_filter_type" value="single"/>
<parameter key="attribute" value="working-hours"/>
</operator>
<operator activated="true" class="multiply" compatibility="7.2.002-SNAPSHOT" expanded="true" height="103" name="Multiply" width="90" x="514" y="34"/>
<operator activated="true" class="handle_exception" compatibility="7.2.002-SNAPSHOT" expanded="true" height="82" name="Handle Exception" width="90" x="715" y="136">
<parameter key="exception_macro" value="execpt"/>
<process expanded="true">
<operator activated="true" class="generate_attributes" compatibility="7.2.002-SNAPSHOT" expanded="true" height="82" name="Generate Attributes (4)" width="90" x="246" y="34">
<list key="function_descriptions">
<parameter key="res1" value="#{attribute_macro} +4"/>
<parameter key="res2" value="eval(%{attribute_macro}) +4"/>
</list>
</operator>
<connect from_port="in 1" to_op="Generate Attributes (4)" to_port="example set input"/>
<connect from_op="Generate Attributes (4)" from_port="example set output" to_port="out 1"/>
<portSpacing port="source_in 1" spacing="0"/>
<portSpacing port="source_in 2" spacing="0"/>
<portSpacing port="sink_out 1" spacing="0"/>
<portSpacing port="sink_out 2" spacing="0"/>
</process>
<process expanded="true">
<operator activated="true" class="generate_attributes" compatibility="7.2.002-SNAPSHOT" expanded="true" height="82" name="Generate Attributes (5)" width="90" x="246" y="34">
<list key="function_descriptions">
<parameter key="res2" value="%{execpt}"/>
</list>
</operator>
<connect from_port="in 1" to_op="Generate Attributes (5)" to_port="example set input"/>
<connect from_op="Generate Attributes (5)" from_port="example set output" to_port="out 1"/>
<portSpacing port="source_in 1" spacing="0"/>
<portSpacing port="source_in 2" spacing="0"/>
<portSpacing port="sink_out 1" spacing="0"/>
<portSpacing port="sink_out 2" spacing="0"/>
</process>
</operator>
<operator activated="true" class="generate_attributes" compatibility="7.2.002-SNAPSHOT" expanded="true" height="82" name="Generate Attributes (3)" width="90" x="715" y="34">
<list key="function_descriptions">
<parameter key="res1" value="#{attribute_macro} +4"/>
</list>
</operator>
<connect from_op="Labor-Negotiations" from_port="output" to_op="Set Macros" to_port="through 1"/>
<connect from_op="Set Macros" from_port="through 1" to_op="Select Attributes" to_port="example set input"/>
<connect from_op="Select Attributes" from_port="example set output" to_op="Multiply" to_port="input"/>
<connect from_op="Multiply" from_port="output 1" to_op="Generate Attributes (3)" to_port="example set input"/>
<connect from_op="Multiply" from_port="output 2" to_op="Handle Exception" to_port="in 1"/>
<connect from_op="Handle Exception" from_port="out 1" to_port="result 2"/>
<connect from_op="Generate Attributes (3)" from_port="example set output" to_port="result 1"/>
<portSpacing port="source_input 1" spacing="0"/>
<portSpacing port="sink_result 1" spacing="0"/>
<portSpacing port="sink_result 2" spacing="0"/>
<portSpacing port="sink_result 3" spacing="0"/>
</process>
</operator>
</process>

 

Macros and Execute Process

You can use Macros to parametrize ex

Provided Macros

There are some macros already present which can be used throughout the process:

  • %{process_name}: will be replaced by the name of the process (without path and extension)
  • %{process_file}: will be replaced by the file name of the process (with extension)
  • %{process_path}: will be replaced by the complete absolute path of the process file
  • %{execution_count}: will be replaced by the number of times the current operator was applied.
  • %{operator_name}: will be replaced by the name of the current operator.
  • %{t}: will be replaced by the current time

Advanced Use Cases for Macros

 

 

 

How to Share Repositories

by RMStaff on ‎07-29-2016 04:03 AM - edited on ‎08-09-2016 11:46 AM by RMStaff

Question

RapidMiner repositories are a versatile and useful place to store things. With a RapidMiner Server you can easily share your processes, data, models and much more. From time to time you may also want to send around a full repository as a zip file. This article will show you how to do this.

Answer

Sending it

Local repositories are always flat files. Those files are stored on your hard disk. The default location of the repositories is in your user folder at .RapidMiner/repositories. If you are using Windows, the full default path is

 

C:\Users\$USERNAME\.RapidMiner\repositories\

 

Using Linux it is as follows (on macOS, the user folder is under /Users instead of /home):

 

/home/$USERNAME/.RapidMiner/repositories/

 

Please be aware that this folder is hidden on Linux and macOS.

 

To share the repository of choice, you can simply zip it and send it to your colleague.

If you are not using the default location, simply right-click on the repository name and select "open in file browser" as shown below. You can then zip the contents of this folder and share it.

 

2016-08-09 16_43_28-C__!!PreSales_customers_firstam_clean process.rmp – RapidMiner Studio Developer .png
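
If you prefer to script the zipping step, a minimal Python sketch could look like this. The helper name zip_repository and the folder names are made up, and the demo works on a throwaway temporary folder rather than a real repository.

```python
import shutil
import tempfile
import zipfile
from pathlib import Path


def zip_repository(repo_dir, archive_stem):
    """Zip the repository folder (including its top-level name) and return the archive path."""
    repo_dir = Path(repo_dir)
    return shutil.make_archive(
        str(archive_stem), "zip",
        root_dir=str(repo_dir.parent), base_dir=repo_dir.name,
    )


# Demo on a temporary folder that imitates a small repository.
work = Path(tempfile.mkdtemp())
repo = work / "my_repository"
(repo / "processes").mkdir(parents=True)
(repo / "processes" / "demo.rmp").write_text("<process/>")

archive = zip_repository(repo, work / "my_repository_share")
with zipfile.ZipFile(archive) as zf:
    names = zf.namelist()
print(archive)
```

Keeping the top-level folder name inside the archive means the recipient gets a single directory after unzipping, which can then be selected when adding the repository.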

 

 

 

Adding it to your local RapidMiner Studio

 

To add the repository you need to unzip it again. Afterwards you can go back to RapidMiner Studio and click on Create Repositories :

 

Create Repo.png

 

You need to choose "New Local Repository" in the opening dialogue:

 

 Dialogue1.png

 

and choose the path to the unzipped directory:

 Dialogue2.png

Note that the Alias can be anything you like.

 

 That's it. Happy Mining!

Capture intermediate results during optimization

by RMStaff ‎07-16-2016 11:06 AM - edited ‎07-27-2016 06:55 AM

RapidMiner provides various optimization techniques, including Parameter Optimization, Feature Selection and Feature Generation. In most cases RapidMiner automatically delivers the best models and best parameter sets as output of these operators; however, sometimes the user may want to capture the intermediate results obtained after every iteration. This is easily possible with the Log operator.

In case you are looking to actually save intermediate models and results, look at the discussion here http://community.rapidminer.com/t5/RapidMiner-Studio/Backup-and-retrieve-modeling-Training-status/m-... 

 

 

The following example shows how to capture several key parameters during "Parameter Optimization" 

The overall process is attached and the top level Process looks like below 

optimization logs.png

The optimization operator tries various combinations of C and gamma values for the SVM learner, as you will see in the settings by clicking on "Edit Parameter Settings"

optimization logs inside.png

As you will notice, we are splitting the data set into two parts (50% each), then building the model on one part and testing on the other.

The key operator, where we capture the results at the end of every combination, is the Log operator. By clicking on "Edit List" the user can specify which metrics to capture.

log settings.png

Once the process is completed, you should see an output tab called Log with results like below. These are the intermediate results after every iteration.

log output.png

 

As you will notice, the Log operator can also write these outputs to a file if you specify the "filename" parameter.

Additionally, if you need to store intermediate models or performance vectors, you can save them as needed using the Store operator and dynamically changing paths using macros.
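
Conceptually, logging inside an optimization loop works like the following Python sketch. The evaluate function is a made-up stand-in for training and testing the SVM, and the parameter grids are arbitrary; only the pattern of appending one log row per combination mirrors what the Log operator does.

```python
from itertools import product


def evaluate(c, gamma):
    """Hypothetical stand-in for training and testing a model with these settings."""
    return 1.0 / (1.0 + abs(c - 10) + abs(gamma - 0.1))


log = []  # one row per parameter combination, like the Log output table
for c, gamma in product([1, 10, 100], [0.01, 0.1, 1.0]):
    log.append({"C": c, "gamma": gamma, "performance": evaluate(c, gamma)})

# The optimizer would only hand back this row; the log keeps all of them.
best = max(log, key=lambda row: row["performance"])
print(len(log), best)
```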

 

Parameterizing SQL Query

by RMStaff ‎07-21-2016 06:59 AM - edited ‎07-21-2016 07:01 AM

The RapidMiner "Read Database" operator provides the ability to read data from most data sources that offer JDBC/ODBC connectivity. It is also possible to parametrize the queries using RapidMiner macros. The operator also supports the use of "prepared queries".


The following video demonstrates the use of macros to parametrize queries.
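
Since the video is not reproduced here, the idea of a prepared query fed from a macro can be sketched in Python with the standard sqlite3 module. The table, its contents and the macro name minCost are invented for this example; in RapidMiner the value would come from a macro and be bound by the Read Database operator instead.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE assets (make TEXT, cost REAL)")
conn.executemany(
    "INSERT INTO assets VALUES (?, ?)",
    [("Dell", 1200.0), ("HP", 800.0), ("Dell", 400.0)],
)

macros = {"minCost": "500"}  # hypothetical macro, stored as text

# Prepared query: the ? placeholder is bound to the value instead of pasting it into the SQL text.
rows = conn.execute(
    "SELECT make, cost FROM assets WHERE cost >= ? ORDER BY cost",
    (float(macros["minCost"]),),
).fetchall()
conn.close()
print(rows)
```

Binding the value through a placeholder avoids quoting problems that can occur when a macro is substituted directly into the query string.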

 

How To Interpret the Results of Create Association Rules

by RMStaff on ‎07-19-2016 05:39 AM

Question

The Create Association Rules operator computes various statistical measures on the rules. What do they tell me?

Answer

The most important criteria are already documented in the operator's help:

  • confidence: The confidence of a rule is defined as conf(X implies Y) = supp(X ∪ Y)/supp(X). Be careful when reading the expression: here supp(X ∪ Y) means "support for occurrences of transactions where X and Y both appear", not "support for occurrences of transactions where either X or Y appears". Confidence ranges from 0 to 1. Confidence is an estimate of Pr(Y | X), the probability of observing Y given X. The support supp(X) of an itemset X is defined as the proportion of transactions in the data set which contain the itemset.
  • lift: The lift of a rule is defined as lift(X implies Y) = supp(X ∪ Y)/(supp(X) × supp(Y)), i.e. the ratio of the observed support to that expected if X and Y were independent. Lift can also be defined as lift(X implies Y) = conf(X implies Y)/supp(Y). Lift measures how far X and Y are from independence. It ranges from 0 to positive infinity. Values close to 1 imply that X and Y are independent and the rule is not interesting.
  • conviction: Conviction is sensitive to rule direction, i.e. conv(X implies Y) is not the same as conv(Y implies X). Conviction is loosely inspired by the logical definition of implication and attempts to measure the degree of implication of a rule. Conviction is defined as conv(X implies Y) = (1 - supp(Y))/(1 - conf(X implies Y)).

There is a great paper available on http://www4.di.uminho.pt which explains all parameters in depth. The metric called PS (for Piatetsky-Shapiro) is called leverage in the document.
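
The definitions of confidence, lift and conviction can be checked by hand on a tiny example. The Python sketch below uses five invented transactions and exact fractions; the function names are made up, and the sketch assumes supp(X) is greater than zero. Conviction is reported as None when the confidence is 1, because the formula would divide by zero.

```python
from fractions import Fraction


def support(itemset, transactions):
    """Proportion of transactions that contain every item of the itemset."""
    hits = sum(1 for t in transactions if itemset <= t)
    return Fraction(hits, len(transactions))


def rule_measures(x, y, transactions):
    """Confidence, lift and conviction of the rule X implies Y."""
    supp_x = support(x, transactions)
    supp_y = support(y, transactions)
    supp_xy = support(x | y, transactions)  # X and Y both appear
    confidence = supp_xy / supp_x
    lift = confidence / supp_y
    conviction = None if confidence == 1 else (1 - supp_y) / (1 - confidence)
    return {"confidence": confidence, "lift": lift, "conviction": conviction}


transactions = [
    frozenset({"bread", "milk"}),
    frozenset({"bread", "butter"}),
    frozenset({"bread", "milk", "butter"}),
    frozenset({"milk"}),
    frozenset({"bread", "milk"}),
]

measures = rule_measures(frozenset({"bread"}), frozenset({"milk"}), transactions)
print(measures)
```

Here bread and milk each appear in 4 of 5 transactions and together in 3 of 5, so the confidence is 3/4, the lift is 15/16 (slightly below 1, i.e. close to independence) and the conviction is 4/5.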

Writing Association Rules to Exampleset or file

by RMStaff on ‎07-12-2016 05:21 AM - edited on ‎07-15-2016 08:53 AM by Community Manager

RapidMiner provides the ability to mine frequent itemsets as well as to "Generate Association rules" from frequent item sets. Sometimes there is a need to export the association rules to data sets, files or other formats. Since association rules are not an example set, you need a workaround to convert them to an example set before writing them to other formats.

 

The attached example shows how to do it.

The key thing here is that we use the RapidMiner "Write" operator. Most objects in RapidMiner are XML under the hood, and hence we can use the "Read XML" operator to parse the object and convert it to a data set. We also used the "Import Configuration Wizard" of Read XML to find the right nodes in the XML to parse.

association.png

 

Once we read from XML, we then rename the columns to make it easier for follow on processing.

 

Getting the table via Execute Script

Another option to convert the rules object is to use an Execute Script operator. This operator takes a Groovy script that converts the rules to a table.

The script looks like this.

 

 

import com.rapidminer.tools.Ontology;
import com.rapidminer.operator.learner.associations.*;
// Execute Script usually provides the example set classes already;
// they are imported explicitly here to be safe.
import com.rapidminer.example.*;
import com.rapidminer.example.table.*;

// the association rules object arrives at the first input port
AssociationRules rules = input[0];

// construct attribute set
Attribute[] attributes = new Attribute[11];
attributes[0] = AttributeFactory.createAttribute("Premise", Ontology.STRING);
attributes[1] = AttributeFactory.createAttribute("Premise Items", Ontology.INTEGER);
attributes[2] = AttributeFactory.createAttribute("Conclusion", Ontology.STRING);
attributes[3] = AttributeFactory.createAttribute("Conclusion Items", Ontology.INTEGER);
attributes[4] = AttributeFactory.createAttribute("Confidence", Ontology.REAL);
attributes[5] = AttributeFactory.createAttribute("Conviction", Ontology.REAL);
attributes[6] = AttributeFactory.createAttribute("Gain", Ontology.REAL);
attributes[7] = AttributeFactory.createAttribute("Laplace", Ontology.REAL);
attributes[8] = AttributeFactory.createAttribute("Lift", Ontology.REAL);
attributes[9] = AttributeFactory.createAttribute("Ps", Ontology.REAL);
attributes[10] = AttributeFactory.createAttribute("Total Support", Ontology.REAL);

MemoryExampleTable table = new MemoryExampleTable(attributes);
DataRowFactory ROW_FACTORY = new DataRowFactory(0);

// one string per attribute; the row factory parses them by attribute type
String[] strings = new String[11];

for (AssociationRule rule : rules) {
    // construct example data
    strings[0] = rule.toPremiseString();
    strings[1] = rule.premise.size().toString();
    strings[2] = rule.toConclusionString();
    strings[3] = rule.conclusion.size().toString();
    strings[4] = rule.getConfidence().toString();
    strings[5] = rule.getConviction().toString();
    strings[6] = rule.getGain().toString();
    strings[7] = rule.getLaplace().toString();
    strings[8] = rule.getLift().toString();
    strings[9] = rule.getPs().toString();
    strings[10] = rule.getTotalSupport().toString();

    // make and add row
    DataRow row = ROW_FACTORY.create(strings, attributes);
    table.addDataRow(row);
}

ExampleSet exampleSet = table.createExampleSet();
return exampleSet;

The resulting process looks like this:

ToData.png

 

 

Install Extensions Manually for RapidMiner Studio

by RMStaff ‎07-12-2016 04:59 AM - edited ‎07-12-2016 05:01 AM

RapidMiner Studio provides a very easy way to connect to the Marketplace and download extensions. This is the recommended way, and the steps are detailed here: http://docs.rapidminer.com/studio/installation/adding-extensions.html

 

In certain cases, for example when extensions are delivered by third parties that do not use the Marketplace, or when access to the Marketplace is restricted by proxy settings, it may be necessary to install extensions in an alternative fashion.

 

The following steps describe how to install an extension without going through the Marketplace wizard.

    • In your browser, go to http://marketplace.rapidminer.com/
    • Register and log in on the website (the login link is at the top right corner)
    • Once logged in, search for the extension using the search dialog at the top or browse through the links provided

marketplace.png

    • Once you find the extension of choice, you should see a download link.
    • When prompted, read through the license agreement and, if you agree, check the "I have read...." checkbox.
    • A download link will appear; click on it.
    • In most cases the download will be a jar file.
    • Once the download of the jar file is complete, copy it to the <rapidminer home>/extensions folder on your machine.
    • The extensions folder is typically C:\Users\<username>\.RapidMiner\extensions\ on a Windows machine, with a similar structure below the user's home folder on a Linux machine.
    • Once the files are copied, restart RapidMiner Studio.
    • Verify that the extension is available in the right place. For example, if you expect new operators, they should be available under the Extensions group in the operator panel.

extension dialog.png

Cross-Validation of R and Python Models

by RMStaff on ‎07-11-2016 06:14 AM

 

RapidMiner provides the ability to work with many learners in a code-free manner. If needed, however, you can also extend the capabilities of RapidMiner using R and Python scripts. Even when using R or Python, many of RapidMiner's core operators can still be used to get the best of both worlds.

 

The following example shows how to use the Cross-Validation operator and performance operators together with an R script. Please use the attached example process file to try it yourself.

 

Step 1) Prepare your data and pass it to the "tra" port of the Cross-Validation operator, as seen below.

process.png

 

As you may be aware, the X-Validation operator has two subprocesses. We will write our model training script in the training part using the Execute R operator. In the testing part we will write the "Apply Model" R script and use a performance operator from RapidMiner to capture performance.

 

 

inside cross validation.png

 

 

Please note that the training script returns a model, which is passed along to the "mod" port. From there it is passed to the first "inp" port on the testing side. When using the R Script operator, the parameters are passed in the order in which the ports are connected.

On the testing side, the model is connected first and "tes", your test data, second, so the script needs to receive them in the same order. Hence the first parameter of rm_main on the testing side is the model and the second is the data.

 

You then use the regular RapidMiner performance operator to capture the performance.

 

 

The training and testing scripts look like this:

Training

training script.png

 

Testing

testing scripts.png

Similar techniques can be applied to combine scripting operators with other operators.

 

 

RapidMiner Date Time Formatting with the Nominal to Date Operator

by Moderator ‎07-08-2016 11:52 AM - edited ‎07-08-2016 11:54 AM

What on earth is “MM dd, yyyy”?

 

Parsing your date and/or times from the default nominal text string can be a daunting task.  It doesn’t have to be.

 

Using our Nominal to Date operator can be very simple if you know where to start.

The Nominal to Date operator converts the selected nominal attribute of the input ExampleSet into the selected date and/or time type. The attribute is selected by the attribute name parameter. The type of the resultant date and/or time attribute is specified by the date type parameter. The nominal values are transformed into date and/or time values. This conversion is done with respect to the specified date format string that is specified by the date format parameter.

It is important to note that the old nominal attribute will be removed and replaced by a new date and/or time attribute if the keep old attribute parameter is not set to true.

 

Assume the date and time 2001-07-04 12:08:56 local time in the U.S. Pacific Time zone. Here are some example patterns, each followed by the way that date would appear in your data set:

  • "yyyy.MM.dd G 'at' HH:mm:ss z": 2001.07.04 AD at 12:08:56 PDT
  • "EEE, MMM d, ''yy": Wed, Jul 4, '01
  • "h:mm a": 12:08 PM
  • "hh 'oclock' a, zzzz": 12 oclock PM, Pacific Daylight Time
  • "K:mm a, z": 0:08 PM, PDT
  • "yyyy.MMMMM.dd GGG hh:mm aaa": 2001.July.04 AD 12:08 PM
  • "EEE, d MMM yyyy HH:mm:ss Z": Wed, 4 Jul 2001 12:08:56 -0700
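
The patterns listed above use the pattern syntax of Java's SimpleDateFormat class, so a pattern can be tried out in plain Java before it is entered as the date format parameter. The following is a minimal sketch under that assumption; the class name DatePatternDemo and its helper methods are made up for illustration and are not part of RapidMiner. Whether the operator behaves identically in every detail (locale, time zone handling) should be checked in the operator documentation.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

class DatePatternDemo {
    // Fix locale and time zone so month names and offsets do not depend on the machine.
    static final TimeZone PACIFIC = TimeZone.getTimeZone("America/Los_Angeles");

    static SimpleDateFormat formatter(String pattern) {
        SimpleDateFormat f = new SimpleDateFormat(pattern, Locale.US);
        f.setTimeZone(PACIFIC);
        return f;
    }

    // Parse a text value with one pattern and write it back with another.
    static String reformat(String value, String inPattern, String outPattern) {
        try {
            Date parsed = formatter(inPattern).parse(value);
            return formatter(outPattern).format(parsed);
        } catch (ParseException e) {
            throw new IllegalArgumentException(
                "'" + value + "' does not match pattern '" + inPattern + "'", e);
        }
    }

    public static void main(String[] args) {
        String value = "Wed, 4 Jul 2001 12:08:56 -0700";
        String pattern = "EEE, d MMM yyyy HH:mm:ss Z";
        System.out.println(reformat(value, pattern, "yyyy-MM-dd HH:mm:ss")); // 2001-07-04 12:08:56
        System.out.println(reformat(value, pattern, "h:mm a"));              // 12:08 PM
        // Two single quotes inside a pattern produce one literal apostrophe.
        System.out.println(reformat(value, pattern, "EEE, MMM d, ''yy"));    // Wed, Jul 4, '01
    }
}
```

If a value does not match the pattern, the sketch fails with a message naming both, which is a quick way to find out which part of a pattern is wrong.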

 

To learn more about the Nominal to Date Operator check out our documentation site - Nominal To Date Operator Documentation

Manage RapidMiner Wisdom of Crowds

by RMStaff on ‎06-15-2016 07:10 AM - edited on ‎07-07-2016 06:50 AM by Community Manager

There are more than 250,000 RapidMiner users worldwide, brilliant minds who have had tasks similar to yours – building some analytical process to come up with a predictive model to support business, just as an example. You don’t know any of them? No worries, that’s where technology innovation comes in: just like Amazon shows recommendations for products based on what other people have bought, RapidMiner delivers recommendations to you on what to do next based on what other RapidMiner users have done in similar situations.

 

Since version 6.1, RapidMiner Studio has shown recommendations on what operator to add to your process as a next step. This was only the beginning as we have committed to adding more and more features that leverage the Wisdom of Crowds, i.e. the knowledge, experiences and best practices of RapidMiner users. We continue on that path in the next release of RapidMiner Studio which will feature an improvement of the operator recommender introduced in 6.1. By focusing on the particular part of the process a user is working on, we have drastically improved the recommendations. The recommendations are not only extremely helpful for beginners but also for advanced users allowing them to build their processes much faster. In addition, we have also compiled a completely new feature that guides users in setting the parameters for an operator by recommending what others have specified.

 

However, if for whatever reason you do not wish to leverage the community, you can opt out of this program.

The steps below describe how you can disable Wisdom of Crowds:

 

  1. In your Studio client, open the Settings menu.
  2. Click on Preferences.
  3. In the Preferences dialog, switch to the "Recommender" tab (highlighted with a square below).
  4. Uncheck "Enable Operator Recommendations" (1 below) to disable operator recommendations.
  5. Uncheck "Operator Recommender initialized" (2 below), which controls whether "Wisdom of Crowds" has been initialized.
      1. If you leave it checked, you will not be prompted to activate it again.

    disablewisdowofcrowds.png

How To Use Building Blocks

by RMStaff ‎07-01-2016 06:53 AM - edited ‎07-01-2016 07:07 AM

In your everyday life as a data scientist and RapidMiner user, you will encounter several tasks that you need to do frequently. To make your life easier, you can use building blocks for those actions.

 

Building blocks allow you to store one preconfigured operator (e.g. a Subprocess operator with some contents) as a kind of template, that you can insert into another process. It is a little bit like copy/paste, just that what you copy can be accessed any time again.

 

Creating a Building Block

 To create a building block you can simply right click on the operator you would like to turn into a building block. If you would like to use more than one operator in the building block you can group them together using a Subprocess operator.

1.png

After right-clicking, you can choose Save as Building Block and give it a name and a description.

2.png

After clicking on OK you are already done.

 

Expert Tip: The building blocks are stored in your .RapidMiner folder as XML and can be shared by sharing the XMLs.

Using a Building Block

If you would like to use the building block in a process, you can either press Ctrl+B or go to Edit->Insert Building Block. In the resulting window, simply choose your building block and click OK.

3.png

Calling a Process like a Function


If you want to "call" a process from another process, use the Execute Process operator. You can even pass data and macros around.
To pass data: in the inner process, connect the process input ports on the *left* of the process panel to receive input, and connect the result ports on the *right* to pass output to the calling process.
In the outer process, connect the data to the ports as you would do with any other operator.

 

Change the Execution Order of Processes

by RMStaff on ‎06-30-2016 10:49 AM

Question

RapidMiner is always executing one operator at a time. How can I change the order?

Answer

Changing the execution order is usually not necessary. There are only a few cases where you need to do it:

 

  • An operator needs the result of a former operator which cannot or should not be connected (e.g. Remember or Extract Macro)
  • You would like to inspect the result of one operation first (using e.g. breakpoints)


To do this you can click on the small blue icon on the upper right most edge of the Process Panel

1.png

Once you have clicked on it, you see the actual execution order. You can now right-click on the numbers

3.png

to make the execution of the operator as early as possible.

 

You can also change the order manually by clicking on the first operator (here: Multiply) and then on the operator which should be executed next (here: Validation).

Text Mining and the Word List

by RMStaff on ‎06-28-2016 07:22 AM

Symptoms

Using Process Documents (from Data) you can generate a tokenized example set from a given set of documents. If you use one Process Documents operator for training and another for testing, you might get the error "incompatible number of attributes" when you apply the model.

Diagnosis

The problem is probably that you did not transfer the word list from one Process Documents operator to the other. The word list mainly contains two pieces of information:

  • Which attributes to generate
  • The normalization

If you do not transfer the word list, words which do not occur in your documents won't create an attribute. With pruning, different words will be deleted from your bag of words. Another effect is that even if you create the same attributes, your normalization (of TF/IDF) might be different.

 

Solution

 Wordlist.png

Transfer the word list created in your training stream over to the application stream. This way you create the same attributes with the same normalization in both streams.

How do I configure Mac OS to use Java 7 instead of Java 6

by RMSupport on ‎05-26-2016 06:03 AM - edited on ‎06-23-2016 10:51 AM by Moderator

Important: This article is only relevant for old RapidMiner Studio versions which were not yet bundled as an App!

 

Problem:

After installing Java 7 there are at least two Java installations available on Mac OS. When running Java the operating system follows a link to the version currently marked as default. In some cases this link still points to the Java 6 installation despite having a functional Java 7 installation.

Solution:

Open a terminal and navigate to the directory /System/Library/Frameworks/JavaVM.framework/Versions by typing:

      cd /System/Library/Frameworks/JavaVM.framework/Versions
    

Make sure that you have a valid Java 7 installation before changing the link to the current version. You can do this by typing:

      ./A/Commands/java -version
    

The output should start with “java version 1.7…“ . If it does not, do not continue with the following steps but rather see “What is Java and how do I install it?” for instructions on how to install Java 7 on Mac.

To modify the link to point at the Java 7 installation type in the following two commands (you may have to enter your password):

      sudo mv Current CurrentBackUp
      sudo ln -s A Current
    

Mac OS should now use Java 7 by default.

Hint: To switch back to Java 6, first remove the new link and then restore the backup:

      sudo rm Current
      sudo mv CurrentBackUp Current
    

 

Where do I find log files in case there is a problem with RapidMiner Studio

by RMSupport on ‎05-26-2016 06:03 AM - edited on ‎06-13-2016 10:50 AM by RMStaff

To find log files for RapidMiner Studio, follow these steps:

  1. Go to your user home folder and switch to the .RapidMiner subfolder.
    1. On a Windows machine this will typically be C:\Users\<username>\.RapidMiner
  2. Open the rapidminer-studio.log file. It is generated each time RapidMiner Studio is started. The file for the pre-startup phase is called launcher.log.
  3. Please note that there will also be files like rapidminer-studio.log.1 and rapidminer-studio.log.2. These are older log files.
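
The location described in the steps above can also be derived programmatically, which helps when a support script needs to collect the log. This is only a sketch: the class name StudioLogLocator is made up for illustration, and it assumes the default .RapidMiner folder directly below the user home folder.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

class StudioLogLocator {
    // Build the expected log path below a given user home folder.
    static Path studioLog(String userHome) {
        return Paths.get(userHome, ".RapidMiner", "rapidminer-studio.log");
    }

    public static void main(String[] args) {
        // user.home is the standard Java system property for the home folder.
        Path log = studioLog(System.getProperty("user.home"));
        System.out.println(log + (Files.exists(log) ? " (found)" : " (not found)"));
    }
}
```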

 

You can also view most of the logs generated by studio directly within the studio interface. 

To find the logs from within the Studio interface:

  1. Click on the View menu.
  2. Then click on the "Show Panel" menu.
  3. Select "Logs" from the list of available views.

You can then drag and dock the view to a location that suits your style.

The context menu of the Log panel provides additional capabilities like changing the log level, clearing the existing log or saving the log to a file.

Importing Large Data Sets into the Cloud

by RMSupport on ‎05-26-2016 06:04 AM

When reading large files into the cloud inside of a process, you may run out of local memory on your machine, which will cause the reading of this data to fail. You may see the error "main memory limit reached". This happens because the system loads the file into local memory before streaming it to the cloud.

Instead of reading the file inside a process, load it into your cloud repository manually.

This can be done by selecting File > Import Data and choosing whichever import type is more appropriate.

The use of the import wizard should be the same until Step 5, where the cloud repository should be selected instead of the local one.

Once the data is in the cloud repository you will be able to execute processes without error.

What is Java and how do I install it

by RMSupport on ‎05-26-2016 06:03 AM

Problem:

RapidMiner Studio 6.x depends on the Java Runtime Environment (JRE) version 7. On Windows, it is shipped with RapidMiner Studio, on other operating systems you have to install it manually (unless it is shipped with your operating system).

Solution:

It is sufficient to download the Java Runtime Environment version 7. The development kit is not required.

Downloads are available for all major operating systems, and it is possible to install both Java 6 and Java 7 on the same machine. Note that the Windows version of RapidMiner ships its own copy of the Java 7 runtime, so no installation is required there.

Most Linux distributions come with recent Java versions. E.g., on Ubuntu you can install them with

      sudo apt-get install openjdk-7-jre
    

If both versions are installed, the default can be configured by using

      sudo update-alternatives --config java
    

After the installation, the java executable should be on the path, i.e. if you type

      java -version
    

on the command prompt you should see the installed version number.
In order to tell RapidMiner Studio where Java is installed, set the environment variable JAVA_HOME to point to your Java installation directory. You can set this variable globally or inside the start scripts. If you cannot set it globally, go to your RapidMiner Studio installation folder and follow these steps:

  1. Edit the file RapidMiner-Studio.sh

  2. Remove/Comment out everything in the Searching for Java block

  3. Add the following line instead (adapt the path to your desired Java installation):

              JAVA="/opt/bin/java7"
            
  4. Start RapidMiner Studio again via this script.

How do I upgrade from RapidMiner 5 to RapidMiner Studio 6 and keep my settings

by RMSupport on ‎05-26-2016 06:03 AM

Problem:

It is not possible to update directly from RapidMiner 5 to RapidMiner Studio 6, so you need to download it from the website and install it separately. Also RapidMiner Studio 6 uses a new folder to store settings and default repositories.

Solution:

All processes created with RapidMiner 5 can still be used in RapidMiner Studio 6. However, RapidMiner Studio 6 reads configuration settings from a different location than RapidMiner 5 does. Therefore, when starting RapidMiner Studio 6, you will not see your old repositories, preferences, and other configuration options.

To keep your old settings, copy the entire folder .RapidMiner5 to .RapidMiner in your user folder.

To restore your RapidMiner 5 repositories, follow these steps:

  1. Locate the folders .RapidMiner5 and .RapidMiner in your user folder. The first is used by RapidMiner 5 whereas RapidMiner Studio 6 uses the second.

  2. Move or copy the repository folders in .RapidMiner5/repositories/ to a place where you want to store data, e.g. .RapidMiner/repositories/.

  3. Start RapidMiner Studio 6.

  4. In the toolbar of the “Repositories” view, click “Add Repository” (first icon, showing a server with a “+“).

  5. In the dialog, select “New local repository” and press “Next”.

  6. Assign a name in the “Alias” text field.

  7. Uncheck “Use standard location” (the standard location is inside the folder .RapidMiner/repositories).

  8. Select the location where you copied your repository and click Finish.

Repeat steps 4-8 for each repository you want to re-add.

How do I start RapidMiner Studio regardless, even if the .exe and the scripts fail

by RMSupport on ‎05-26-2016 06:03 AM

To start RapidMiner Studio regardless, you can open the command line, go to the location where you installed RapidMiner Studio, go to the /lib folder and type in:

      java -jar launcher.jar
    

Note that when RapidMiner Studio is started this way, updating via the marketplace in the application will never work. Also note that this may allocate too little or too much main memory for the current license, so possibly

      -Xmx1024m
    

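needs to be added to the above command to specify the maximum amount of main memory RapidMiner Studio will use, in megabytes. The java command treats only options placed before -jar as JVM options, so the full call would be java -Xmx1024m -jar launcher.jar.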
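
To see how much memory a JVM actually received from a given -Xmx value, a small stand-alone check can help. This is a sketch only; HeapCheck is a hypothetical helper class, not something shipped with RapidMiner Studio, and it reports the limit of the JVM it runs in, not of a running Studio instance.

```java
class HeapCheck {
    // Maximum heap the running JVM may use, in megabytes.
    static long maxHeapMegabytes() {
        return Runtime.getRuntime().maxMemory() / (1024L * 1024L);
    }

    public static void main(String[] args) {
        System.out.println("Max heap: " + maxHeapMegabytes() + " MB");
    }
}
```

Running it as java -Xmx1024m HeapCheck should report a value close to 1024; depending on the JVM and garbage collector the number can be somewhat lower.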

Furthermore, if using a start script, updating RapidMiner Studio will not work if it is installed in a protected location on Windows like C:\Program Files.

What is the Safe Mode

by RMSupport on ‎05-26-2016 06:03 AM

RapidMiner Studio asks the user whether to start in Safe Mode if the last startup failed. Safe Mode disables all extensions in the hope that RapidMiner Studio can start successfully without them. If starting RapidMiner Studio fails without Safe Mode, move all extensions to a backup folder (see “Where are plugins for RapidMiner Studio stored?” for the two possible locations of extensions). Then re-add them one by one, starting RapidMiner Studio each time, until startup fails; the extension added last is the broken one.

Why does RapidMiner Studio fail to update

by RMSupport on ‎05-26-2016 06:03 AM

Problem:

RapidMiner Studio needs correct privileges to access the RapidMiner Studio installation folder and it must have been started with one of the GUI scripts or the .exe file.

Solution:

To correctly update RapidMiner Studio, follow these steps:

On Windows:

  1. Start RapidMiner Studio via the RapidMiner Studio.exe and download the update.

  2. Restart RapidMiner Studio when prompted to do so.

  3. Grant admin privileges to the updater when prompted.

If using the RapidMiner-Studio.bat instead, make sure RapidMiner Studio is not installed in a protected location like C:\Program Files, otherwise the update will always fail.

On Linux:

  1. Start RapidMiner Studio via the RapidMiner-Studio.sh script and download the update.

  2. Restart RapidMiner Studio when prompted to do so.

On Mac:

  1. Start RapidMiner Studio via double-clicking “RapidMiner Studio.app” and download the update.

  2. Restart RapidMiner Studio when prompted to do so.

Note that the update will never work when launching RapidMiner Studio via the rapidminer.jar or launcher.jar directly.

Why does RapidMiner Studio fail to start on Windows

by RMSupport on ‎05-26-2016 06:03 AM

Problem:

The automatically determined startup settings may be wrong or some other unknown problem exists.

Solution:

The exact cause may be difficult to determine because there is no debug output of the .exe itself. To try and locate the problem, follow these steps:

  1. Go to your user home folder and switch to the /.RapidMiner subfolder.

  2. Open the launcher.log file. It is generated each time prior to RapidMiner Studio startup.

  3. Check if there are any errors or the parameters given to the JVM are broken.

To start RapidMiner Studio regardless, you can use the RapidMiner-Studio.bat file. If that also fails, please start the .bat file from the command line to observe possible errors.

Note that if using said script, updating RapidMiner Studio will not work if it is installed in a protected location like C:\Program Files.

How do I use Windows authentication for Microsoft SQL Server

by RMSupport on ‎05-26-2016 06:03 AM

Symptom:

The authentication for MSSQL databases does not work.

Problem:

Microsoft SQL Server can use integrated Windows authentication to connect without explicitly specifying a username and password, but rather authenticating via the logged in Windows user. This authentication mechanism can also be configured in RapidMiner Studio.

Solution:

This behavior can be configured as a regular JDBC driver property, as is described in this article: “How do I configure properties of database connections defined in RapidMiner Studio?”

If the JTDS driver shipped with RapidMiner Studio for Microsoft SQL Server is used, the respective property name is integratedSecurity . This property must be set to “true”. Setting driver properties is possible via the “Advanced” button in RapidMiner Studio's connection editor.

In order for this functionality to work properly, a platform-dependent native library file needs to be installed. It can be obtained from http://sourceforge.net/projects/jtds/files/jtds/. Download the ZIP file, locate the file x86/SSO/ntlmauth.dll or x64/SSO/ntlmauth.dll, depending on your architecture, and copy it to a directory on the path, i.e. one listed in the Windows %PATH% environment variable. The lib folder of your Java Runtime installation is also a good candidate. If you cannot copy it into a directory on the %PATH%, you need to specify the location as a parameter to the RapidMiner-Studio.bat startup script.

To do so, call the aforementioned .bat file with the following parameter (java.library.path expects the directory that contains ntlmauth.dll):

      -Djava.library.path=C:\Your\Path\To\ntlmauth.dll
    

How do I start RapidMiner Studio without GUI in command line mode

by RMSupport on ‎05-26-2016 06:03 AM

RapidMiner Studio comes with a command line version which can be used to execute processes from the command line or from batch files (Windows). To do so, you need to have saved a working RapidMiner process in your repository. You can then open the command line, navigate to your RapidMiner Studio installation folder, and go to the scripts subfolder. Call the rapidminer-batch.bat / rapidminer-batch.sh file (.bat on Windows, otherwise .sh) and pass the absolute repository location of your process as an argument. See the following example to execute the process called “Testprocess” in the default “Local Repository” on Windows:

      rapidminer-batch.bat "//Local Repository/Testprocess"
    

Using Store and Retrieve operators inside such a process provides a good basis for automation of process execution. For further automation and much greater flexibility, the usage of RapidMiner Server is necessary.

Which are the most important settings files

by RMSupport on ‎05-26-2016 06:03 AM

See “Where does RapidMiner Studio store its settings?” to find out where the settings files are kept.

Important files:

  • connections.xml - contains database entries

  • gui.properties - contains size and location of the main user interface window

  • jdbc_properties.xml - contains JDBC driver settings for custom databases

  • rapidminer-studio-settings.cfg - contains all settings which can be set via the preferences menu

  • repositories.xml - contains the location of all user-defined repositories

  • secrets.xml - contains credentials for connections

Where are the RapidMiner Studio licenses stored

by RMSupport on ‎05-26-2016 06:03 AM

RapidMiner Studio stores the licenses inside the .RapidMiner/licenses/rapidminer-studio folder in the user home directory. Licenses are simply text files which contain the license string.

Note that the naming convention for them needs to be kept, otherwise RapidMiner Studio will not recognize the license file.

How do I configure properties of database connections defined in RapidMiner Studio

by RMSupport on ‎05-26-2016 06:03 AM

To define a database connection in RapidMiner Studio, go to Tools > Manage database connections and create a database connection. On the configuration panel, click on “Advanced” to see a list of available properties.

Please consult the documentation of your JDBC driver and database to find out which properties can be configured and what they mean.

Where does RapidMiner Studio store its settings

by RMSupport on ‎05-26-2016 06:03 AM

RapidMiner Studio stores all settings inside the .RapidMiner folder in the user home directory. To move RapidMiner Studio to a different computer, just create a backup of said folder and copy it to the other machine. Note that this folder also contains the licenses, so take care not to distribute them to other people accidentally.

This folder also contains the default location for repositories.

Why does RapidMiner Studio fail to start on Mac OS

by RMSupport on ‎05-26-2016 06:03 AM

Problem:

RapidMiner Studio 6.x depends on Java version 7, which users have to install themselves. This in turn requires Mac OS 10.7.3 or newer.

Solution:

For downloading and installing Java 7, see the following article: “What is Java and how do I install it?”
If you have Java 6 and 7 installed in parallel, make sure you are using the correct one. To that end, open a terminal and type

      java -version
    

Java 7 will report a version number of 1.7.0. You can then start RapidMiner Studio 6 by just double-clicking “RapidMiner Studio.app”. If that fails, try running RapidMiner-Studio.sh from a terminal. If you see the error message “Unsupported major.minor version 51.0”, this means that you are still using Java 6.
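
The number in that error message is a class file version: 50 corresponds to Java 6 and 51 to Java 7. The following sketch prints what the Java installation on your path supports. The class name ClassFileVersion is made up for illustration; java.version and java.class.version are standard Java system properties.

```java
class ClassFileVersion {
    // Class file major versions map to Java releases as major - 44
    // (50 -> Java 6, 51 -> Java 7, 52 -> Java 8).
    static int javaRelease(int classFileMajor) {
        return classFileMajor - 44;
    }

    public static void main(String[] args) {
        String classVersion = System.getProperty("java.class.version"); // e.g. "51.0" on Java 7
        int major = (int) Double.parseDouble(classVersion);
        System.out.println("java.version = " + System.getProperty("java.version"));
        System.out.println("supports class files up to version " + classVersion
            + " (Java " + javaRelease(major) + ")");
    }
}
```

If the reported class file version is 50.0, the java command on the path is still Java 6, which matches the “Unsupported major.minor version 51.0” error.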

The GUI of RapidMiner Studio appears broken, what can I do

by RMSupport on ‎05-26-2016 06:03 AM

Sometimes it can happen that a perspective becomes broken. There are two ways to rectify that:

  • In RapidMiner Studio, go to the broken perspective, and select “View” -> “Restore default perspective” in the menu bar.
  • Go to the RapidMiner Studio settings folder (see “Where does RapidMiner Studio store its settings?” for the location) and delete the vlperspective-predefined-xyz.xml files and restart RapidMiner Studio.

How do I add database drivers for new databases in RapidMiner Studio

by RMSupport on ‎05-26-2016 06:03 AM

To add a driver for a database not supported out of the box, follow these steps:

  1. Download the JDBC driver .jar file for it first. Google should help in that regard.

  2. Place the file inside the RapidMiner Studio installation folder in the lib/jdbc subfolder.

  3. Start RapidMiner Studio, and go to “Tools” -> “Manage Database Drivers”.

  4. Click “Add” and enter the properties of that specific database (again, Google will help there). Point it to the driver .jar file you copied into the aforementioned folder and select the correct Driver class (most of the time, the automatically selected one should be correct).

  5. Restart RapidMiner Studio.

You should now see the driver as available in the “Tools” -> “Show Database Drivers” dialog.

If that is not the case, you can also manually add a driver. To do so, locate the jdbc_properties.xml file (see “Where does RapidMiner Studio store its settings?” ) and edit it as follows:

      <drivers>
        <driver dbnameseparator="DRIVER_SEPARATOR" defaultport="DRIVER_PORT" driver_jar="C:\Program Files\RapidMiner\RapidMiner Studio\lib\jdbc\DRIVER_JAR.jar" drivers="package.structure.for.DriverClass" name="DB_NAME" urlprefix="JDBC_URL_PREFIX"/>
      </drivers>
    

Inside the outer <drivers> element, there may be multiple <driver> elements (one for each additional driver). Edit the above to match the required settings for the desired JDBC driver. RapidMiner Studio must be restarted afterwards.
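
Because a hand-edited XML file is easy to break, it can be worth checking that it is still well-formed before restarting. The sketch below parses a drivers fragment with the XML parser from the Java standard library and lists each driver's name and driver class. The class name JdbcPropertiesCheck and all values in the sample (ExampleDB, com.example.jdbc.Driver and so on) are placeholders for illustration, not real driver settings; only the attribute names are taken from the fragment above.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

class JdbcPropertiesCheck {
    // Parse the XML and return one "name -> driver class" entry per <driver> element.
    static List<String> driverSummaries(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            NodeList drivers = doc.getElementsByTagName("driver");
            List<String> result = new ArrayList<>();
            for (int i = 0; i < drivers.getLength(); i++) {
                Element driver = (Element) drivers.item(i);
                result.add(driver.getAttribute("name") + " -> " + driver.getAttribute("drivers"));
            }
            return result;
        } catch (Exception e) {
            // Thrown when the XML is not well-formed, e.g. a missing quote or closing tag.
            throw new IllegalArgumentException("Driver file is not valid XML: " + e.getMessage(), e);
        }
    }

    public static void main(String[] args) {
        // Placeholder values; replace them with the settings of your own driver.
        String xml = "<drivers>"
            + "<driver dbnameseparator=\"/\" defaultport=\"1234\""
            + " driver_jar=\"C:\\drivers\\example.jar\""
            + " drivers=\"com.example.jdbc.Driver\" name=\"ExampleDB\""
            + " urlprefix=\"jdbc:example://\"/>"
            + "</drivers>";
        for (String summary : driverSummaries(xml)) {
            System.out.println(summary);
        }
    }
}
```

To check a real file, its content could be read with java.nio.file.Files.readAllBytes and passed to driverSummaries as a string.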

Where are plugins for RapidMiner Studio stored

by RMSupport on ‎05-26-2016 06:03 AM

RapidMiner Studio stores plugins installed via the marketplace from within RapidMiner Studio in the .RapidMiner/managed folder in the user home directory. You can install new plugins via the marketplace inside the application or by copying the plugin into the lib/plugins folder in your RapidMiner Studio installation directory. To uninstall a plugin, go to “View” -> “Manage Extensions” and select the red 'X' to remove it.

Note that to manually uninstall a plugin that was installed from the marketplace, the extensions.xml file inside the .RapidMiner/managed folder has to be updated as well by removing the <extension> ... </extension> element for that extension.
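The removal of the <extension> element can be sketched in Python as follows. The exact layout of extensions.xml is not described here, so the sketch deliberately avoids relying on specific attribute names: it removes every <extension> element whose attributes or content mention a given string. The function name remove_extension, the sample XML, and the plugin names in it are hypothetical and serve only as an illustration.

```python
import xml.etree.ElementTree as ET


def remove_extension(xml_text, needle):
    """Drop every <extension> element that mentions `needle` anywhere."""
    root = ET.fromstring(xml_text)
    # ElementTree has no parent pointers, so build a child -> parent map.
    parents = {child: parent for parent in root.iter() for child in parent}
    for ext in list(root.iter("extension")):
        if ext is root:
            continue  # never remove the document root itself
        if needle in ET.tostring(ext, encoding="unicode"):
            parents[ext].remove(ext)
    return ET.tostring(root, encoding="unicode")


# Hypothetical sample; the real file may be structured differently.
sample = """<extensions>
  <extension><id>hypothetical_text_plugin</id></extension>
  <extension><id>hypothetical_web_plugin</id></extension>
</extensions>"""

cleaned = remove_extension(sample, "hypothetical_text_plugin")
print(cleaned)
```

Inspect your own extensions.xml first to pick a search string that matches only the extension you want to remove, and keep a backup of the file.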

I have funny characters in my example sets. I suspect an encoding problem.

by RMSupport on ‎05-26-2016 06:03 AM

Problem:

Encoding settings of the database, the settings of a database connection configured in RapidMiner Studio or Server, or the JBoss instance that hosts RapidMiner Server are incorrect. Many file input operators can also specify an encoding.
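To see how such characters come about, here is a small Python illustration that is independent of RapidMiner. It encodes a sample string as UTF-8 and then decodes the bytes as Latin-1, which is one common mismatch; the sample text is made up for this sketch.

```python
original = "Müller café"

# Bytes as a UTF-8 source would store them.
raw = original.encode("utf-8")

# Reading those bytes with the wrong encoding yields the "funny characters".
garbled = raw.decode("latin-1")
print(garbled)  # MÃ¼ller cafÃ©

# Reversing the wrong step recovers the text, as long as no bytes were lost.
repaired = garbled.encode("latin-1").decode("utf-8")
print(repaired == original)  # True
```

If your example sets show pairs such as "Ã¼" or "Ã©" where a single accented letter should be, a mismatch of this kind somewhere between the data source and RapidMiner is a likely cause.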

Solution:

You should use the utf8 encoding wherever possible. The encoding can be set at the following levels:

  • Database : In MySQL, use “ALTER DATABASE xxx DEFAULT CHARACTER SET utf8”

  • Table : Newly created tables inherit the database's default character set; a different one can be specified in the CREATE statement.

  • RapidMiner Studio JDBC connection : Set the appropriate connection properties (see below for a list). In RapidMiner Studio this is possible via Tools > Manage Database Connections > Advanced.

The encodingName you want to use is almost always utf8. The exact name of the JDBC property depends on the database. Known values are:

  • MySQL: characterEncoding

  • MS SQL Server via JTDS driver: CHARSET

  • Oracle: charset
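The list above can be kept as a small lookup table, for example when documenting connections for several databases. The following Python sketch only restates the property names listed above; the dictionary keys and the function name encoding_property are labels chosen for this illustration.

```python
# Property names as listed above; the keys are labels chosen for this sketch.
ENCODING_PROPERTY = {
    "mysql": "characterEncoding",
    "mssql-jtds": "CHARSET",
    "oracle": "charset",
}


def encoding_property(database, encoding="utf8"):
    """Return the (name, value) pair to enter as a connection property."""
    try:
        return ENCODING_PROPERTY[database], encoding
    except KeyError:
        raise ValueError(f"no known encoding property for {database!r}") from None


print(encoding_property("mysql"))  # ('characterEncoding', 'utf8')
```

The returned pair is what you would enter as the property name and value in the Advanced dialog of the database connection.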

Processes can configure the encoding via parameters of input operators.

My results view won't load! What do I do?

by RMSupport on ‎05-26-2016 06:03 AM

In versions 7.0.0 and 7.0.1, a rare issue can prevent the results from showing when a process finishes.

The following workaround restores the view:

1) Shut down Studio.
2) Open the “.RapidMiner” folder and delete the file vlperspective-predefined-result.xml from it.
3) Start Studio and check whether the results view loads again.
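The file deletion in the workaround can also be scripted. The Python sketch below assumes that the “.RapidMiner” folder is located in the user home directory; the function name reset_results_perspective and its optional home parameter are made up for this illustration.

```python
from pathlib import Path


def reset_results_perspective(home=None):
    """Delete the saved results perspective; return True if a file was removed."""
    # Assumption: the .RapidMiner folder lives in the user home directory.
    home = Path(home) if home is not None else Path.home()
    target = home / ".RapidMiner" / "vlperspective-predefined-result.xml"
    if target.is_file():
        target.unlink()
        return True
    return False
```

Call reset_results_perspective() only while Studio is shut down. A return value of False means the file was not found, in which case nothing was changed.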

How do I configure a proxy in RapidMiner Studio?

by RMSupport on ‎05-26-2016 06:03 AM

In RapidMiner Studio, proxy settings can be made via Tools > Preferences > System. Proxies can be enabled independently for the individual protocols HTTP, HTTPS, and FTP. Ask your network administrator for the proxy host, port, and credentials. Note that a proxy is often not needed inside an intranet. If you want to connect RapidMiner Studio to RapidMiner Server, you may therefore want to bypass the proxy. In that case, enter the hostname of the machine running RapidMiner Server into the fields http.nonProxyHosts and https.nonProxyHosts. RapidMiner Studio needs to be restarted for these settings to take effect.
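If several hosts should bypass the proxy, note that Java's standard http.nonProxyHosts system property takes a list of host patterns separated by the | character. The Python sketch below builds such a value, on the assumption that the RapidMiner Studio fields of the same name follow that Java convention. The function name non_proxy_hosts and the host names are hypothetical.

```python
def non_proxy_hosts(hosts):
    """Join host names or patterns into a Java-style nonProxyHosts value."""
    cleaned = []
    for host in hosts:
        host = host.strip()
        if not host:
            continue  # skip empty entries
        if "|" in host:
            raise ValueError(f"host must not contain '|': {host!r}")
        if host not in cleaned:
            cleaned.append(host)  # keep order, drop duplicates
    return "|".join(cleaned)


# Hypothetical host names for illustration.
value = non_proxy_hosts(["rmserver.example.com", "localhost"])
print(value)  # rmserver.example.com|localhost
```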