RapidMiner

Community Manager
‎08-15-2017 12:38 PM

RapidMiner 7.6 is out - improvements in Radoop, Mail Security and more


Hi all -

 

Yes, exciting news today with the release of RapidMiner 7.6.  Here are some highlights:

 

  • Studio: Notification emails now support all modern connection security and authentication mechanisms, such as TLS 1.2 + PFS. The help panel text for the most-used operators has been fully reviewed, so explanations are now clearer and more useful. Java has been updated to 8u141 for Windows and Mac, and there are lots of improvements in missing / error data handling.

 

  • Server: Admins can now set recursive folder permissions. Web services perform better under heavy load, logging of the installation process is improved, and email notifications are now secure (as in Studio). Java has been updated to 8u141 for Windows and Mac, and there are lots of improvements in missing / error data handling.

 

  • Radoop: Support for standard and premium Microsoft Azure HDInsight, container re-use support for Hive-on-Tez as well as Hive-on-Spark, support for HiveServer2 High Availability, upgrade to Hadoop 2.8.1

 

AND an amazing new Studio extension: KERAS Deep Learning.  Check out @jpuente's great new KB article and share your notes on @Thomas_Ott's new thread.

 

Enjoy!


Scott

Scott Genzer
Senior Community Manager
RapidMiner, Inc.
  • Keras
  • mail
  • radoop
  • server
  • Studio
RM Certified Expert
‎05-18-2017 02:16 PM

If you happen to work at WeWork Tysons in VA or in the general Washington DC area, come down and say hi! I'm going to be speaking at the Spark DC Meetup group this coming Tuesday (5/23) at 6PM. You can get more Meetup details here.


  • Meetup
  • radoop
  • Spark
RM Certified Expert
‎02-20-2017 08:31 AM

SparkRM is a new Radoop operator - but not just any new operator to be added to the 70+ collection that the Radoop extension includes - it’s an operator that opens a wealth of new use cases for exploiting and analyzing Hadoop data with RapidMiner.


By: Jesus Puente

 


 

SparkRM is a meta-operator, which means that you can double-click on it to open a new canvas where you can design a new process (similar to what you would find in “Split Validation”, for instance). What’s special about SparkRM is that, even though it is a Radoop operator, the inner process has to be designed using regular, non-Radoop RapidMiner operators. Whatever operators or sub-processes you place inside SparkRM are packaged and pushed to Hadoop for parallel execution.
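
As a rough mental model only (this is plain Python, not RapidMiner or Radoop code), the idea of running one inner process on every partition in parallel could be sketched as below. The names `inner_process` and `run_on_partitions` are made up for illustration, and a local thread pool stands in for the Hadoop cluster.

```python
from concurrent.futures import ThreadPoolExecutor


def inner_process(partition):
    """Hypothetical stand-in for the sub-process designed inside SparkRM."""
    return [value * 2 for value in partition]


def run_on_partitions(partitions, process):
    """Apply the same process to every partition (thread pool = assumed cluster)."""
    with ThreadPoolExecutor() as pool:
        # pool.map keeps the results in the same order as the input partitions.
        return list(pool.map(process, partitions))


partitions = [[1, 2], [3, 4], [5]]
results = run_on_partitions(partitions, inner_process)
```

The point of the sketch is just that the inner process itself knows nothing about the cluster; it only ever sees one partition at a time.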

 

Let’s imagine you have a lot of text data in your Hadoop environment and you want to analyze it using RapidMiner’s Text Processing Extension. Well, now you can. You can read the documents and feed them into the SparkRM operator.

The data is passed on to the non-Radoop sub-process inside. You can process, tokenize, create word lists, find expressions and n-grams, and so on, all within the Hadoop cluster.
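
To make the text steps a bit more concrete, here is a minimal Python sketch of tokenizing, building n-grams and creating a word list for the documents of a single partition. It is not how the Text Processing Extension is implemented; the function names and the simple regex tokenizer are assumptions chosen for illustration.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens (simplified tokenizer)."""
    return re.findall(r"[a-z]+", text.lower())


def ngrams(tokens, n):
    """Return every run of n consecutive tokens, joined with underscores."""
    return ["_".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def word_list(documents, max_n=2):
    """Count all tokens and n-grams (up to max_n) across one partition's documents."""
    counts = Counter()
    for doc in documents:
        tokens = tokenize(doc)
        for n in range(1, max_n + 1):
            counts.update(ngrams(tokens, n))
    return counts


partition = ["Hadoop stores big data", "Big data needs Hadoop"]
counts = word_list(partition)
```

Each partition would produce its own word list this way, so two partitions can easily end up with different sets of attributes.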

 

A typical process would look like:

[Figure: SparkRM Process]

 

And this is what you would have inside the SparkRM operator:

 

[Figure: SparkRM Text Mining Extension]

 

Some typical parameters of SparkRM include the file format (textfile or parquet) and the partitioning mode.

 

[Figure: SparkRM Parameters]

 

Once the task is finished, the result is returned as usual through the output ports. The first output port is for data sets, and its output can be merged. If the data coming from the different partitions is consistent (same metadata), the operator simply appends everything together. If not, there is an option to “resolve schema conflicts” that adds the necessary missing values so that the full data set contains all the information from all the partitions. This is especially useful when analyzing text, because the word list of one text will probably not be the same as that of another.
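
The merging behaviour described above can be illustrated with a small Python sketch. It only mimics the idea (append when the columns match, otherwise fill the gaps with missing values); the function `merge_partitions`, its `resolve_schema_conflicts` flag and the use of `None` as the missing value are assumptions for illustration, not the operator's actual implementation.

```python
def merge_partitions(partitions, resolve_schema_conflicts=True, missing=None):
    """Append the rows (dicts) of all partitions into one list.

    If partitions have different columns, either fill the gaps with `missing`
    or raise an error, depending on `resolve_schema_conflicts`.
    """
    # Collect every column name in first-seen order.
    columns = []
    for rows in partitions:
        for row in rows:
            for name in row:
                if name not in columns:
                    columns.append(name)

    merged = []
    for rows in partitions:
        for row in rows:
            if len(row) != len(columns) and not resolve_schema_conflicts:
                raise ValueError("partitions have different columns")
            # row.get fills in the missing value for absent columns.
            merged.append({name: row.get(name, missing) for name in columns})
    return merged


# Two word lists with partly different words, as in the text example.
partition_a = [{"doc": 1, "hadoop": 2, "spark": 1}]
partition_b = [{"doc": 2, "hadoop": 1, "hive": 3}]
merged = merge_partitions([partition_a, partition_b])
```

Here the first row gets a missing value for `hive` and the second one for `spark`, so the merged data set carries every word seen in any partition.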

I have described an example for text processing, but you can imagine any other extension or algorithm that’s not in Radoop: Series Forecasting, Deep Learning, Neural Networks, Process Mining, etc.

 

 

 

  • radoop
  • SparkRM