
Release of the version 0.6.1 of the Operator Toolbox

tftemme Administrator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, RMResearcher, Member Posts: 164 RM Research
edited December 2018 in Knowledge Base

Dear Community,

 

We are glad to announce the release of version 0.6.1 of the Operator Toolbox extension (Marketplace Link).

The new version includes two new Operators and enhances the functionality of two existing Operators.

 

Smote Upsampling

 

There are situations in which you have only a small number of Examples for one of your classes and want to upsample them, either to provide your machine learning algorithm with a larger number of Examples or to equalize the class distribution. Until now, you could do this with the Sample (Bootstrapping) Operator. But if you want to upsample with similar but not identical Examples, you can now use the new Smote Upsampling Operator.

 

The Operator implements the Synthetic Minority Over-sampling Technique (SMOTE), as proposed by Chawla et al., Journal of Artificial Intelligence Research 16 (2002), 321-357.

 

The Operator upsamples only the minority class. To generate a new Example, it first picks a random Example of the minority class. Then the k nearest neighbors of this Example (also from the minority class) are determined, and one of them is chosen at random. The new Example is created on the line between the two Examples.
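
The steps described above can be sketched in a few lines of Python. This is only an illustration of the principle, not the Operator's actual implementation: the function name `smote_upsample`, its parameters, the Euclidean distance measure, and the uniformly random position on the connecting line are all assumptions made for this example.

```python
import math
import random


def smote_upsample(minority, n_new, k=5, seed=None):
    """Create n_new synthetic points from a list of minority-class points.

    Hypothetical helper for illustration; each point is a list of floats.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority Examples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # cannot have more neighbors than other points
    synthetic = []
    for _ in range(n_new):
        # 1. Pick a random Example of the minority class.
        i = rng.randrange(len(minority))
        base = minority[i]
        # 2. Determine its k nearest neighbors within the minority class
        #    (Euclidean distance is an assumption here).
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: math.dist(base, p))
        # 3. Choose one of these neighbors at random.
        neighbor = rng.choice(others[:k])
        # 4. Place the new Example at a random position on the line
        #    between the base Example and the chosen neighbor.
        t = rng.random()
        synthetic.append([b + t * (n - b) for b, n in zip(base, neighbor)])
    return synthetic


minority = [[1.0, 1.0], [1.2, 0.9], [0.8, 1.1], [1.1, 1.3]]
new_points = smote_upsample(minority, n_new=6, k=2, seed=42)
```

Because every synthetic point lies between two existing minority Examples, the new Examples are similar to the original ones without being exact copies.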

 

Figure 1 illustrates the basic principle.

 

 

Figure 1: Illustration of the basic principle of the Smote Upsampling algorithm. All Examples are from the minority class.

Generate Univariate Series

 

The new Operator Generate Univariate Series is an enhanced version of the old Generate Date Series Operator of the Operator Toolbox extension. Besides generating an equally spaced date series, the new Operator can also generate an equidistant real-valued series. With the parameter data_type, the user can now specify whether a numeric (linearly spaced) series or a date series should be generated. For a real-valued series, the min and max values and the step size can be specified. For a date series, the parameters already known from the Generate Date Series Operator are used to generate the series.

 

Figure 2 shows a process in which the new Operator is used to generate a series with values between -pi (-180°) and 2*pi (360°) with a step size of pi/9 (20°). The Generate Attributes Operator is then used to calculate the sine of these values.
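
For readers who want to reproduce the numbers outside of RapidMiner, the same series can be approximated with a short Python sketch. It covers only the numeric case, and the function name `generate_series` as well as the assumption that the max value is included in the series are ours, not taken from the Operator.

```python
import math


def generate_series(min_value, max_value, step):
    """Return equidistant values from min_value up to max_value.

    Hypothetical helper; including max_value itself is an assumption.
    """
    if step <= 0:
        raise ValueError("step must be positive")
    # Compute the number of steps once and multiply, instead of adding
    # the step repeatedly, so floating-point errors do not accumulate.
    n_steps = int(math.floor((max_value - min_value) / step + 1e-9))
    return [min_value + i * step for i in range(n_steps + 1)]


# Series x between -pi and 2*pi with a step size of pi/9
x = generate_series(-math.pi, 2 * math.pi, math.pi / 9)

# Counterpart of the Generate Attributes step: the sine of each value
sin_x = [math.sin(v) for v in x]
```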

 

Figure 2: Process to generate a real-valued series (called x) between -pi and 2*pi with a step size of pi/9. The parameters of the Generate Univariate Series Operator are shown, as well as the result of the sin(x) calculation in Generate Attributes.

The process shown is also provided as a tutorial process in the Operator help. The old Generate Date Series Operator is now deprecated and will be removed from the extension in the future.

 

Enhancements

 

The Generate ExampleSet Operator can now parse all data as nominal Attributes.

The Group Into Collection Operator now preserves the special roles set for Attributes in the input ExampleSet in the ExampleSets of the resulting Collection.

Comments

  • Manar Member Posts: 9 Newbie
    Thank you so much, but I have a question, please.
    Can we use SMOTE together with TF-IDF in text classification?
     
  • MartinLiebig Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, University Professor Posts: 3,507 RM Data Scientist
    Of course, you can apply it to TF-IDF-transformed texts.
    - Sr. Director Data Solutions, Altair RapidMiner -
    Dortmund, Germany