RapidMiner

Keras Deep Learning extension

by RMStaff on 08-15-2017 12:01 PM - edited Saturday by Community Manager

Keras is a high-level neural network API that supports popular deep learning libraries such as TensorFlow, Microsoft Cognitive Toolkit, and Theano.


The RapidMiner Keras extension provides a set of operators that allow an easy visual configuration of Deep Learning network structures and layers. Calculations are pushed into the Python-based backend libraries, so you can leverage the computing power of GPUs and grid environments. 

The extension makes use of an existing Keras installation. This article shows how to do a simple deployment of Keras and how to configure the Keras extension to connect to it.

 

Let's review several options:

 

Anaconda on macOS

 

Warning: As of version 1.2, TensorFlow no longer provides GPU support on macOS. 

  1. Download and install Anaconda from: https://www.continuum.io/downloads#macos
  2. Create a new environment by typing in the command line: conda create -n keras
  3. Activate the created environment by typing in the command line: source activate keras
  4. Install pandas by typing in the command line: conda install pandas
  5. Install scikit-learn by typing in the command line: conda install scikit-learn
  6. Install keras by typing in the command line: conda install -c conda-forge keras
  7. Install graphviz by typing in the command line: conda install -c anaconda graphviz
  8. Install pydotplus by typing in the command line: conda install -c conda-forge pydotplus
  9. In RapidMiner Studio, open the Keras and Python Scripting panels in Preferences and specify the path to the Python executable of your new conda environment.

 

You’re good to go!


Anaconda on Windows

 

Warning: Due to issues with package dependencies, it is not currently possible to install graphviz and pydot in a conda environment on Windows, and consequently to visualise the model graph in the results panel.

 

  1. Download and install Anaconda from: https://www.continuum.io/downloads#windows
  2. Create a new environment with Python 3.5.2 by typing in the command line: conda create -n Python35 python=3.5.2
  3. Activate the created environment by typing in the command line: activate Python35
  4. Install pandas by typing in the command line: conda install pandas
  5. Install scikit-learn by typing in the command line: conda install scikit-learn
  6. Install keras by typing in the command line: conda install -c jaikumarm keras=2.0.4
  7. In RapidMiner Studio, open the Keras and Python Scripting panels in Preferences and specify the path to the Python executable of your new conda environment.

 

You’re good to go!

 

 

Windows

 

  1. Download and install Python 3.5.2 from: https://www.python.org/downloads/release/python-352/
     Note: only Python 3.5.2 works on Windows.
  2. Install numpy with Intel Math Kernel library.
  3. Install pandas from the command line: pip3 install pandas
  4. Install graphviz from the command line: pip3 install graphviz
  5. Install pydot from the command line: pip3 install pydot
  6. Install TensorFlow.
    • If you would like to install TensorFlow with GPU support, please see the instructions here: https://www.tensorflow.org/install/install_windows
    • If you would like to install TensorFlow with CPU support only, run from the command line: pip3 install --upgrade tensorflow
  7. Install Keras from the command line: pip3 install keras

 

You’re good to go!

 

 

 

RapidMiner extension

 

  1. Install the Keras extension from the RapidMiner Marketplace.
  2. Install the RapidMiner Python Scripting extension from the Marketplace if it is not already installed.
  3. Restart RapidMiner Studio.
  4. In Studio, go to Settings (Menu) > Preferences and navigate to the “Python Scripting” tab/page on the left. Provide the path to the Python executable and click Test to confirm that it works.
  5. In Studio, go to Settings (Menu) > Preferences and navigate to the “Keras” tab/page on the left. Provide the path to the Python executable and click Test to confirm that it works.
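
If you are unsure which path to enter, one option is to run a short script with the interpreter of the environment you created. The sketch below prints the interpreter path and lists any packages it cannot find. The REQUIRED list and the helper name missing_packages are assumptions made for this example, so adjust them to whatever you installed.

```python
import importlib.util
import sys

# Packages the installation steps ask for (an assumed list; adjust to your setup).
REQUIRED = ["pandas", "sklearn", "keras"]


def missing_packages(names):
    """Return the names that this interpreter cannot locate."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # This is the path to paste into the RapidMiner preferences.
    print("Python executable:", sys.executable)
    missing = missing_packages(REQUIRED)
    print("Missing packages:", ", ".join(missing) if missing else "none")
```

Run it from the activated environment (for example with python followed by the script name) so that the printed path belongs to that environment and not to the default installation.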

 

Try out a few sample processes from the “Keras Sample” in the repository view.


 

 

Comments
Moderator
Moderator

I'm getting some "script terminated early" errors. Running on Anaconda for Windows and followed the install instructions. Happens with all the Keras examples I try.

jacobcybulski
Regular Contributor

I have tried it on Anaconda / Tensorflow / Keras with Ubuntu 16.04. Works very well, except that placing breakpoints interferes with the closing of Keras processes, which results in turning numeric predictions into nominal and ends up in error (only in the s&p example).

jacobcybulski
Regular Contributor

Worth reminding potential users that if you follow the TensorFlow installation instructions, you end up installing Keras in a TensorFlow environment. It is then imperative that the RapidMiner preferences point to the Python in that environment rather than the one in the default Anaconda ROOT location.

jacobcybulski
Regular Contributor

Note that most of the issues listed here have been addressed in version 1.0.1 of the Keras extension, which was released only one week after the original post!

 

Some preliminary observations on the first release of Keras extension, which I've tested on Ubuntu 16.04 with Anaconda 4.3, Python 3.5, Keras with Tensorflow back end. I have used King County data set from Kaggle to test it all out.

 

  • This extension is a fantastic start and so the following critical comments are here only to assist in further improvement of the extension
  • Only Keras sequential models are available
  • RM would be ideal for functional models, which are currently not supported
  • Keras Model does not accept separate training and validation data sets (common practice in the Python community) and forces you to rely on the validation split option, which means that many pre-defined data sets (such as those from Kaggle) cannot be tested in RM
  • Keras Model has no local random seed to be set but this would be very useful for repeatability
  • Keras Model seems to be running on a CPU or on one GPU only, there is no way of controlling which GPU is to be used and to switch to another at any point in processing
  • Keras Model predefined metrics are suitable for classification only; metrics for regression are needed as well, e.g. "mae", which works well when it is "injected" directly into the ".rmp" file
  • Keras Model default metric "None" causes a syntax error
  • Keras Model allows only one metric to be defined
  • Keras Model should output the history of training and validation loss and of any other metrics defined (obtained from model.fit, history variable)
  • Apply Keras Model randomly changes numeric predictions into nominal, so you need to pass the data through Guess Types operator, which seems to work well so far
  • Standard performance measures handle all output from Apply Keras Model
  • Tensorboard callback hangs when it has no write access to the directory (on Linux RM runs as a separate process, so it will not have access to the created folders)
  • Tensorboard callback works great, just remember to use validation split to get validation metrics displayed
  • ModelCheckpoint works just fine
  • EarlyStopping callback also works well
  • There is no Model Save / Model Save Weights or Model Load / Model Load Weights operators (also any of the model to / from json operators for inter-connectivity)
  • There is no way to utilise models saved as HDF5 checkpoints, I could not even read the HDF5 checkpoint files back into RM anyway
  • However, the Keras model can be stored and then retrieved back (as KerasIOModel) to be used by Apply Keras Model operator
  • No Keras text pre-processing, I assume RM text processing is to be used instead
  • No Keras image pre-processing, and no obvious replacement in RM for this
  • No Keras batch by batch processing (model.train_on_batch), which could take advantage of multiple GPUs or appending of batches to the current GPU memory
  • Also not sure if Keras Model could accept anything but a data frame, if so it is curious why the model requires the input shape (the shape for a data frame is always standard)
  • It would be great for Build Keras Model to guess the shape for a data frame input rather than being constantly caught by incorrect number of attributes passed
  • Currently all parameters are passed into Python as lines of text, so any mistake produces a Python syntax error
  • If the package is to stay as a Python interface then it would be great to include a field to accommodate own Python code which could be loaded prior to Keras operators

Great work Keras team -- Jacob

 

P.S. Results on the King County house price prediction are not that great, but there is plenty of scope for further improvement. For your reference: RMSE=$76,966, MAE=$54,851 +/- $53,991, Corr=0.921, which was unexpectedly better than the result from H2O Gradient Boosted Trees, which are (perhaps) better suited to the task. Try to improve if you can!
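
For anyone who wants to reproduce figures of this kind outside RapidMiner, the three measures can be computed from actual and predicted values as in the minimal sketch below. The function name regression_metrics and the toy numbers are assumptions for illustration and are not the King County data.

```python
import math


def regression_metrics(actual, predicted):
    """Return (rmse, mae, pearson_r) for two equal-length sequences."""
    n = len(actual)
    errors = [p - a for a, p in zip(actual, predicted)]

    # Root mean squared error and mean absolute error.
    rmse = math.sqrt(sum(e * e for e in errors) / n)
    mae = sum(abs(e) for e in errors) / n

    # Pearson correlation between actual and predicted values.
    mean_a = sum(actual) / n
    mean_p = sum(predicted) / n
    cov = sum((a - mean_a) * (p - mean_p) for a, p in zip(actual, predicted))
    var_a = sum((a - mean_a) ** 2 for a in actual)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    corr = cov / math.sqrt(var_a * var_p)
    return rmse, mae, corr


# Toy values: every prediction is off by exactly 1.
rmse, mae, corr = regression_metrics([1, 2, 3], [2, 3, 4])
```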

hughesfleming68
Super Contributor

Thanks Jacob for the detailed write up. FYI to anyone experimenting.... I did dive into this with a working Tensorflow 1.3 and Python 3.6.1 Anaconda installation. I have had no problems or errors running the examples from Rapidminer. 

 

There may be other issues that show up later but so far so good. New users will have to look at this or they will be lost.

https://keras.io/getting-started/sequential-model-guide/

 

Alex

RMStaff
RMStaff

@jacobcybulski i was the person in charge of developing the keras extension. thank you for the feedback! it'll be very useful--we're getting to work on your suggestions. let me also answer a couple of your comments on here.

 

Keras Model seems to be running on a CPU or on one GPU only, there is no way of controlling which GPU is to be used and to switch to another at any point in processing

 

there is indeed no way to choose a GPU as of yet but in our test all the available gpu's were being used. are you sure you could only use one?

 

There is no way to utilise models saved as HDF5 checkpoints, I could not even read the HDF5 checkpoint files back into RM anyway

 

the Python Scripting Extension, which the Keras extension relies on, currently doesn't support serialising to HDF5, so this will have to wait.

 

Also not sure if Keras Model could accept anything but a data frame, if so it is curious why the model requires the input shape (the shape for a data frame is always standard)

 

indeed, the superoperator can only handle examplesets. however, different input layers require different input shapes. if you start with a dense layer, then the input shape could be easily deduced. on the other hand convolutional or recurrent layers require specifying an input shape different than the simple number of features. for example, when using a conv1d layer, the input_shape needs to be (batch_size, timesteps, input_dim) and the pre-processing is done automatically by rapidminer. this is shown in the sample processes
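
To make the shape difference concrete, here is a minimal pure-Python sketch of turning flat rows (one row per example) into the three-dimensional layout a conv1d or recurrent layer consumes. In plain Keras the data tensor has shape (batch_size, timesteps, input_dim), while the input_shape argument of the first layer lists only the per-sample part, (timesteps, input_dim). The function name to_conv1d_batch and the assumption that columns are ordered timestep by timestep are made up for this example; how the extension arranges the columns internally is not shown here.

```python
def to_conv1d_batch(rows, timesteps, input_dim):
    """Reshape flat rows into nested lists shaped (batch, timesteps, input_dim)."""
    batch = []
    for row in rows:
        if len(row) != timesteps * input_dim:
            raise ValueError("row length must equal timesteps * input_dim")
        # Assumed column order: all features of step 0, then step 1, and so on.
        steps = [row[t * input_dim:(t + 1) * input_dim] for t in range(timesteps)]
        batch.append(steps)
    return batch


# Two examples with 6 attributes each, read as 3 timesteps of 2 features.
rows = [[1, 2, 3, 4, 5, 6], [7, 8, 9, 10, 11, 12]]
batch = to_conv1d_batch(rows, timesteps=3, input_dim=2)
print(len(batch), len(batch[0]), len(batch[0][0]))  # prints: 2 3 2
```

With a dense first layer the same rows would be passed as they are, and the input shape is just the number of attributes (6 here).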

jacobcybulski
Regular Contributor

Thank you so much @dgrzech for your feedback. I will check the multiple GPUs but this will have to wait as my multi-GPU system is currently out. I understand now the need for input shape, my mind was set on the super operator with the dense layer first. Excellent work, I am very keen on using Keras in RapidMiner as it is greatly simplifying access to this excellent deep architecture.

Jacob

jacobcybulski
Regular Contributor

Looks like we have a very speedy update on Keras 1.0.1, this is a great effort @dgrzech. I have quickly tested the new release and can confirm that lots of previous issues have been fixed very effectively! This time all tests were done in Win 10, same versions of Anaconda, Tensorflow and Keras as before.

 

  • Now Keras Models take separate training and validation data sets
  • We can set the local random seed for shuffle repeatability
  • Variety of metrics can be defined for both classification and regression
  • Keras history is now produced for charting the training performance
  • So far my tests show that Keras Model correctly assigns label types
  • All callbacks work great (e.g. TensorBoard, ModelCheckpoint, EarlyStopping)
  • ModelCheckpoint works just fine (not sure how to read it back as yet)

This was terrific! A couple of new issues that you could possibly untangle for me:

  • Is it possible to call your own Python code in callbacks or optimizers - it seems, no problem here. However, I am not sure how to sneak the code in (would Keras communicate with Python extension somehow? at the moment it seems like a new process is launched)
  • Also at this point in time, I am not sure how to pass multi-label examples for training or create and test auto-encoders - the old RM trick of looping through the labels makes no sense for deep learning models

Jacob

RMStaff
RMStaff

again thank you for your feedback @jacobcybulski! let me answer your questions.

 

Is it possible to call your own Python code in callbacks or optimizers

 

not yet, but it should be made possible in the next update of the extension

 

I am not sure how to pass multi-label examples

 

as you correctly noticed rapidminer doesn't currently support multiple label columns so this isn't possible for now but we're working on it