RapidMiner

09-16-2017 07:15 PM

Keras is a high-level neural network API that runs on top of popular deep learning libraries such as TensorFlow, Microsoft Cognitive Toolkit (CNTK), and Theano.


The RapidMiner Keras extension provides a set of operators that make it easy to configure deep learning network structures and layers visually. Calculations are pushed into the Python-based backend libraries, so you can leverage the computing power of GPUs and grid environments.

The extension makes use of an existing Keras installation. This article shows how to do a simple deployment of Keras and how to configure the Keras extension to connect to it.

 

Let's review several options:

 

Anaconda on MacOS

 

Warning: As of version 1.2, TensorFlow no longer provides GPU support on macOS. 

  1. Download and install Anaconda from: https://www.continuum.io/downloads#macos
  2. Create a new environment by typing in the command line: conda create -n keras
  3. Activate the new environment by typing in the command line: source activate keras
  4. Install pandas by typing in the command line: conda install pandas
  5. Install scikit-learn by typing in the command line: conda install scikit-learn
  6. Install Keras by typing in the command line: conda install -c conda-forge keras
  7. Install graphviz by typing in the command line: conda install -c anaconda graphviz
  8. Install pydotplus by typing in the command line: conda install -c conda-forge pydotplus
  9. In the Keras and Python Scripting panels of the RapidMiner Studio preferences, specify the path to the Python executable of your new conda environment.
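Before you point RapidMiner at the new environment, you can check it from that environment's own Python. The following is a minimal sketch that uses only the standard library. The module list is an assumption based on the steps above (scikit-learn is imported as `sklearn`), and `check_modules` is a hypothetical helper, not part of the extension. The script also prints `sys.executable`, which is the path to enter in the preferences.

```python
import importlib.util
import sys

# Import names for the packages installed above (an assumed list; scikit-learn
# is imported as "sklearn"). Adjust it to match your own steps.
REQUIRED_MODULES = ["pandas", "sklearn", "keras", "pydotplus"]


def check_modules(names):
    """Map each module name to True if Python can find it, without importing it."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


if __name__ == "__main__":
    # Run this with the environment's Python, e.g. after "source activate keras".
    print("Python executable: " + sys.executable)
    for name, found in sorted(check_modules(REQUIRED_MODULES).items()):
        print("{:<10} {}".format(name, "OK" if found else "MISSING"))
```

`find_spec` only locates a module. It does not import it, so a package that is present but broken (for example, a Keras whose backend fails to load) can still show as OK.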

 

You’re good to go!


Anaconda on Windows

 

Warning: Due to issues with package dependencies, it is not currently possible to install graphviz and pydot in a conda environment on Windows, so the model graph cannot be visualised in the results panel.

 

  1. Download and install Anaconda from: https://www.continuum.io/downloads#windows
  2. Create a new environment with Python 3.5.2 by typing in the command line: conda create -n Python35 python=3.5.2
  3. Activate the new environment by typing in the command line: activate Python35
  4. Install pandas by typing in the command line: conda install pandas
  5. Install scikit-learn by typing in the command line: conda install scikit-learn
  6. Install Keras by typing in the command line: conda install -c jaikumarm keras=2.0.4
  7. In the Keras and Python Scripting panels of the RapidMiner Studio preferences, specify the path to the Python executable of your new conda environment.

 

You’re good to go!

 

 

Windows

 

  1. Download and install Python 3.5.2 from: https://www.python.org/downloads/release/python-352/

Note: only Python 3.5.2 works on Windows.

  2. Install numpy with the Intel Math Kernel Library.
  3. Install pandas from the command line: pip3 install pandas
  4. Install graphviz from the command line: pip3 install graphviz
  5. Install pydot from the command line: pip3 install pydot
  6. Install TensorFlow.
    • If you would like to install TensorFlow with GPU support, please see the instructions here: https://www.tensorflow.org/install/install_windows
    • If you would like to install TensorFlow with CPU support only, run from the command line: pip3 install --upgrade tensorflow
  7. Install Keras from the command line: pip3 install keras
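These steps depend on the Python version, so a quick check with the same `python` you use from the command line can save time. The sketch below is minimal. It checks only the major and minor version (3.5), which is looser than the 3.5.2 stated above. `is_python_35` is a hypothetical helper name.

```python
import sys


def is_python_35(version_info=None):
    """Return True if the given (or current) interpreter version is 3.5.x."""
    if version_info is None:
        version_info = sys.version_info
    return tuple(version_info[:2]) == (3, 5)


if __name__ == "__main__":
    v = sys.version_info
    print("Python {}.{}.{} at {}".format(v[0], v[1], v[2], sys.executable))
    if not is_python_35():
        print("Warning: the steps above expect Python 3.5.2 on Windows.")
```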

 

You’re good to go!

 

 

 

RapidMiner extension

 

  1. Install the Keras extension from the RapidMiner Marketplace.
  2. Install the RapidMiner Python Scripting extension from the Marketplace if it is not already installed.
  3. Restart RapidMiner Studio.
  4. Inside your Studio Client, go to Settings (Menu) > Preferences and navigate to the “Python Scripting” tab/page on the left. Provide the path to the Python executable and click Test to make sure it succeeds.
  5. Inside your Studio Client, go to Settings (Menu) > Preferences and navigate to the “Keras” tab/page on the left. Provide the path to the Python executable and click Test to make sure it succeeds.
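Both preference pages need the interpreter inside the conda environment, not the one in the Anaconda root. The quickest way to get that path is to activate the environment and run `python -c "import sys; print(sys.executable)"`. The sketch below instead builds the expected path from the standard conda layout, which is an assumption: `<root>/envs/<name>/bin/python` on macOS and Linux, and `<root>\envs\<name>\python.exe` on Windows. `conda_env_python` and the example user folders are hypothetical.

```python
from pathlib import PurePosixPath, PureWindowsPath


def conda_env_python(conda_root, env_name, windows):
    """Return the expected interpreter path inside a named conda environment.

    Assumes the default conda layout: <root>/envs/<name>/bin/python on macOS
    and Linux, and <root>\\envs\\<name>\\python.exe on Windows.
    """
    if windows:
        return str(PureWindowsPath(conda_root, "envs", env_name, "python.exe"))
    return str(PurePosixPath(conda_root, "envs", env_name, "bin", "python"))


# Hypothetical install locations for the environments created above.
print(conda_env_python("/Users/me/anaconda3", "keras", windows=False))
print(conda_env_python(r"C:\Users\me\Anaconda3", "Python35", windows=True))
```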

 

Try out a few of the sample processes under “Keras Sample” in the Repository view.


 

 

Comments
RM Certified Expert

I'm getting some errors about the script terminating early. I'm running Anaconda for Windows and followed the install steps. It happens with all the Keras examples I try.

Contributor II jacobcybulski

I have tried it on Anaconda / TensorFlow / Keras with Ubuntu 16.04. It works very well, except that placing breakpoints interferes with the closing of Keras processes; this turns numeric predictions into nominal ones and ends in an error (only in the S&P example).

Contributor II jacobcybulski

It is worth reminding potential users that if you follow the TensorFlow installation instructions, you end up installing Keras in a TensorFlow environment. It is then imperative that the RapidMiner preferences point to the Python executable in that environment rather than to the one in the default Anaconda root location.

Contributor II jacobcybulski

Note that most of the issues listed here have been addressed in version 1.0.1 of the Keras extension, which was released only one week after the original post!

 

Some preliminary observations on the first release of the Keras extension, which I've tested on Ubuntu 16.04 with Anaconda 4.3, Python 3.5, and Keras with the TensorFlow backend. I used the King County data set from Kaggle to test it all out.

 

  • This extension is a fantastic start, so the following critical comments are only meant to help improve it further
  • Only Keras sequential models are available
  • RM would be ideal for functional models, which are currently not supported
  • Keras Model does not accept separate training and validation data sets (common practice in the Python community) and forces you to rely on the validation split option, which means that many predefined data sets (such as those from Kaggle) cannot be tested in RM
  • Keras Model has no local random seed setting, which would be very useful for repeatability
  • Keras Model seems to run on a CPU or on one GPU only; there is no way to control which GPU is used or to switch to another at any point in processing
  • Keras Model's predefined metrics are suitable for classification only; you also need metrics for regression-style measurements, e.g. "mae", which works well when it is "injected" directly into the ".rmp" file
  • Keras Model's default metric "None" causes a syntax error
  • Keras Model allows only one metric to be defined
  • Keras Model should output the history of training and validation loss and of any other metrics defined (obtained from the history returned by model.fit)
  • Apply Keras Model randomly changes numeric predictions into nominal ones, so you need to pass the data through the Guess Types operator, which seems to work well so far
  • Standard performance measures handle all output from Apply Keras Model
  • The TensorBoard callback hangs when it has no write access to the directory (on Linux, RM runs as a separate process, so it will not have access to the created folders)
  • The TensorBoard callback works great; just remember to use the validation split to get validation metrics displayed
  • ModelCheckpoint works just fine
  • The EarlyStopping callback also works well
  • There are no Model Save / Model Save Weights or Model Load / Model Load Weights operators (nor any model to / from JSON operators for interconnectivity)
  • There is no way to use models saved as HDF5 checkpoints; I could not even read the HDF5 checkpoint files back into RM
  • However, the Keras model can be stored and then retrieved (as KerasIOModel) to be used by the Apply Keras Model operator
  • No Keras text preprocessing; I assume RM text processing is to be used instead
  • No Keras image preprocessing, and no obvious replacement for it in RM
  • No Keras batch-by-batch processing (model.train_on_batch), which could take advantage of multiple GPUs or of appending batches to the current GPU memory
  • I am also not sure whether Keras Model can accept anything but a data frame; if it cannot, it is curious why the model requires the input shape (the shape of a data frame is always standard)
  • It would be great for Build Keras Model to guess the shape of a data frame input rather than having users constantly caught out by an incorrect number of attributes
  • Currently all parameters are lines passed into Python, so any mistake generates a Python syntax error
  • If the package is to stay a Python interface, it would be great to include a field for your own Python code, which could be loaded prior to the Keras operators
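Because the parameters are handed to Python verbatim, a missing comma or quote only shows up as a Python syntax error when the process runs. As a minimal sketch, a parameter string can be checked with the standard `ast` module before you paste it into an operator. `check_expression` is a hypothetical helper, not part of the extension. It checks syntax only, so names such as `TensorBoard` are not resolved.

```python
import ast


def check_expression(text):
    """Return None if `text` parses as one Python expression, else an error message."""
    try:
        ast.parse(text, mode="eval")
    except SyntaxError as err:
        return "{} (column {})".format(err.msg, err.offset)
    return None


good = 'TensorBoard(log_dir="./logs", histogram_freq=0, write_graph=True)'
bad = 'TensorBoard(log_dir="./logs", histogram_freq=0 write_graph=True)'
print(check_expression(good))  # None
print(check_expression(bad))   # an error message; wording varies by Python version
```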

Great work Keras team -- Jacob

 

P.S. Results on the King County house price prediction are not that great, but there is plenty of scope for further improvement. For your reference: RMSE=$76,966, MAE=$54,851 +/- $53,991, Corr=0.921. Unexpectedly, this was better than the result from H2O Gradient Boosted Trees, which are perhaps better suited to the task. Try to improve on it if you can!
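RapidMiner's performance operators compute these figures themselves. The sketch below only shows the arithmetic behind them. It assumes the "+/-" after MAE is the population standard deviation of the absolute errors. Under that assumption RMSE² = MAE² + SD², and sqrt(54,851² + 53,991²) ≈ 76,965, which agrees with the reported RMSE of 76,966 to within the rounding of the inputs. `regression_report` is a hypothetical helper.

```python
import math


def regression_report(actual, predicted):
    """Return RMSE, MAE, the spread of the absolute errors, and Pearson correlation."""
    n = len(actual)
    if n != len(predicted) or n < 2:
        raise ValueError("need two sequences of equal length (at least 2 values)")
    errors = [p - a for a, p in zip(actual, predicted)]
    abs_errors = [abs(e) for e in errors]
    rmse = math.sqrt(sum(e * e for e in errors) / n)
    mae = sum(abs_errors) / n
    # Population standard deviation of |error| (assumed meaning of "MAE +/- x").
    mae_sd = math.sqrt(sum((e - mae) ** 2 for e in abs_errors) / n)
    mean_a = sum(actual) / n
    mean_p = sum(predicted) / n
    cov = sum((a - mean_a) * (p - mean_p) for a, p in zip(actual, predicted))
    var_a = sum((a - mean_a) ** 2 for a in actual)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    corr = cov / math.sqrt(var_a * var_p)
    return {"rmse": rmse, "mae": mae, "mae_sd": mae_sd, "corr": corr}


# Consistency check on the reported figures: RMSE**2 == MAE**2 + SD**2.
print(round(math.hypot(54851, 53991)))  # 76965
```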

Contributor II hughesfleming68

Thanks, Jacob, for the detailed write-up. FYI to anyone experimenting: I dove into this with a working TensorFlow 1.3 and Python 3.6.1 Anaconda installation. I have had no problems or errors running the examples from RapidMiner.

 

There may be other issues that show up later but so far so good. New users will have to look at this or they will be lost.

https://keras.io/getting-started/sequential-model-guide/

 

Alex

RM Staff

@jacobcybulski I was the person in charge of developing the Keras extension. Thank you for the feedback! It'll be very useful; we're getting to work on your suggestions. Let me also answer a couple of your comments here.

 

Keras Model seems to be running on a CPU or on one GPU only, there is no way of controlling which GPU is to be used and to switch to another at any point in processing

 

There is indeed no way to choose a GPU as of yet, but in our tests all the available GPUs were being used. Are you sure you could only use one?

 

There is no way to utilise models saved as HDF5 checkpoints, I could not even read the HDF5 checkpoint files back into RM anyway

 

The Python Scripting extension, which the Keras extension relies on, currently doesn't support serialising to HDF5, so this will have to wait.

 

Also not sure if Keras Model could accept anything but a data frame, if so it is curious why the model requires the input shape (the shape for a data frame is always standard)

 

Indeed, the superoperator can only handle ExampleSets. However, different input layers require different input shapes. If you start with a dense layer, the input shape could easily be deduced. Convolutional or recurrent layers, on the other hand, require an input shape other than the simple number of features. For example, when using a Conv1D layer, the input_shape needs to be (batch_size, timesteps, input_dim), and the preprocessing is done automatically by RapidMiner. This is shown in the sample processes.
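To make the shape idea concrete, here is a plain-Python sketch of turning flat ExampleSet rows into nested lists of shape (batch_size, timesteps, input_dim). How RapidMiner actually orders attributes during its automatic preprocessing is not described in the post. The time-step-by-time-step ordering below is an assumption, and `to_sequences` is a hypothetical helper. Note also that in plain Keras the batch dimension is normally left out of a layer's `input_shape` argument, so check how the extension's parameter expects it.

```python
def to_sequences(rows, timesteps):
    """Reshape flat rows into nested lists shaped (batch_size, timesteps, input_dim).

    Assumes every row holds timesteps * input_dim values, ordered time step by
    time step (all features of step 0, then all features of step 1, ...).
    """
    if timesteps <= 0 or not rows:
        raise ValueError("need a positive timesteps value and at least one row")
    width = len(rows[0])
    if width % timesteps:
        raise ValueError("row length {} is not divisible by {}".format(width, timesteps))
    input_dim = width // timesteps
    batch = []
    for row in rows:
        if len(row) != width:
            raise ValueError("all rows must have the same number of attributes")
        batch.append([list(row[t * input_dim:(t + 1) * input_dim]) for t in range(timesteps)])
    return batch


rows = [[1, 2, 3, 4, 5, 6], [7, 8, 9, 10, 11, 12]]
seq = to_sequences(rows, timesteps=3)
print(len(seq), len(seq[0]), len(seq[0][0]))  # 2 3 2
```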

Contributor II jacobcybulski

Thank you so much for your feedback, @dgrzech. I will check the multiple GPUs, but this will have to wait as my multi-GPU system is currently out of action. I now understand the need for the input shape; my mind was set on the superoperator with a dense layer first. Excellent work. I am very keen on using Keras in RapidMiner, as it greatly simplifies access to this excellent deep architecture.

Jacob

Contributor II jacobcybulski

Looks like we have a very speedy update in version 1.0.1 of the Keras extension; this is a great effort, @dgrzech. I have quickly tested the new release and can confirm that lots of previous issues have been fixed very effectively! This time all tests were done on Windows 10, with the same versions of Anaconda, TensorFlow and Keras as before.

 

  • Keras Model now takes separate training and validation data sets
  • We can set the local random seed for shuffle repeatability
  • A variety of metrics can be defined for both classification and regression
  • Keras history is now produced for charting the training performance
  • So far my tests show that Keras Model correctly assigns label types
  • All callbacks work great (e.g. TensorBoard, ModelCheckpoint, EarlyStopping)
  • ModelCheckpoint works just fine (not sure how to read it back as yet)
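To illustrate what a local seed buys: a random generator with its own seed produces the same shuffle order on every run and leaves any other code's randomness alone. This is only an illustration of the idea in plain Python. How the extension seeds NumPy or TensorFlow internally is not shown in the thread, and full run-to-run repeatability, especially on a GPU, can need more than a shuffle seed. `shuffled_order` is a hypothetical helper.

```python
import random


def shuffled_order(n, seed):
    """Return a permutation of range(n) that depends only on `seed`.

    A local random.Random instance is used, so the global random state is untouched.
    """
    rng = random.Random(seed)
    order = list(range(n))
    rng.shuffle(order)
    return order


print(shuffled_order(8, seed=1) == shuffled_order(8, seed=1))  # True
```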

This was terrific! A couple of new issues that you could possibly untangle for me:

  • Is it possible to call your own Python code in callbacks or optimizers? In principle this seems to be no problem; however, I am not sure how to sneak the code in (would Keras communicate with the Python extension somehow? At the moment it seems a new process is launched)
  • Also, at this point in time I am not sure how to pass multi-label examples for training, or how to create and test autoencoders; the old RM trick of looping through the labels makes no sense for deep learning models

Jacob

RM Staff

Again, thank you for your feedback, @jacobcybulski! Let me answer your questions.

 

Is it possible to call your own Python code in callbacks or optimizers

 

Not yet, but it should be made possible in the next update of the extension.

 

I am not sure how to pass multi-label examples

 

As you correctly noticed, RapidMiner doesn't currently support multiple label columns, so this isn't possible for now, but we're working on it.

Learner III Montse

I had some problems installing the Keras extension in RapidMiner, which I have since solved.
Following the Anaconda on Windows steps, after installing the Python Scripting and Keras extensions, I had a problem testing Keras in RapidMiner Studio under Settings (Menu) > Preferences > Keras, with the following messages:
  • Graphviz not installed
  • Pydot not installed

 

To solve that:

1. Go to Anaconda prompt

2. Activate the last created environment by typing in the command line: activate Python35

3. Install pip by typing in the command line: conda install pip

4. Install graphviz by typing in the command line: pip install graphviz

5. Install pydot by typing in the command line: pip install pydot

6. Inside your Studio Client, go to Settings (Menu) > Preferences and navigate to the “Keras” tab/page on the left. Provide the path to the Python executable (the one in your new conda environment) and click Test to make sure it succeeds.
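The two messages concern the Python packages. The pip packages are only Python interfaces, though, and the Graphviz program's `dot` executable also has to be installed and on PATH for graphs to be drawn. The minimal sketch below reports all three from the environment's Python. `graphviz_status` is a hypothetical helper.

```python
import importlib.util
import shutil


def graphviz_status():
    """Report whether pydot, the graphviz Python package, and Graphviz's `dot` are available."""
    return {
        "pydot module": importlib.util.find_spec("pydot") is not None,
        "graphviz module": importlib.util.find_spec("graphviz") is not None,
        # The Python packages alone are not enough: the `dot` program from
        # Graphviz itself must be found on PATH as well.
        "dot executable": shutil.which("dot") is not None,
    }


if __name__ == "__main__":
    for item, ok in sorted(graphviz_status().items()):
        print("{:<16} {}".format(item, "OK" if ok else "MISSING"))
```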

 

Good luck!

RM Certified Analyst

Thanks to all for the very helpful information in the posts above.  I'm a Newbie to using the Keras extension and Tensorboard, but I was able to get through the setup and visualize the Callback (loss) in the "Boston Housing Prices" sample RM process.

 

Can anyone please point me in the right direction for visualizing other metrics (accuracy, etc.) in TensorBoard? I have experimented a bit, but it appears I have a bit to learn about possible syntax variations for the "callbacks" parameter of the Keras operator that relate specifically to TensorBoard. I assume I need to adapt the syntax of the provided TensorBoard call in the "callbacks" parameter of the Keras operator.

 

For example, I have used the following callback for the "Boston Housing Prices" (with 1024 epochs) sample process:

 

TensorBoard(log_dir="c:\TensorboardLogDir", histogram_freq=256, write_graph=True, write_images=True, embeddings_freq=256, embeddings_layer_names=None, embeddings_metadata=None)

 

Upon looking at the callback output in TensorBoard, there are no histograms. As a syntax reference, I looked at https://keras.io/callbacks/#tensorboard. When I added the write_grads=True argument to the callback, this threw an error in RM Studio.
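A side note on the log_dir values in this thread: they are ordinary Python strings containing Windows backslashes. "c:\TensorboardLogDir" happens to work because \T is not an escape sequence. A folder such as c:\temp, however, would silently turn \t into a tab character, and recent Python versions also warn about unrecognised escapes. Forward slashes or a raw string avoid this. As a sketch, the hypothetical helper below builds the callback text with repr(), so the path is always escaped correctly. It does not address the histogram question itself.

```python
import ast


def tensorboard_callback(log_dir, **options):
    """Build the text of a TensorBoard(...) call for the operator's callbacks field.

    repr() writes every value as a valid Python literal, so the backslashes in a
    Windows path are escaped rather than read as escape sequences. Options are
    sorted so the output is the same on Python 3.5, where **kwargs is unordered.
    """
    args = ["log_dir={!r}".format(log_dir)]
    args += ["{}={!r}".format(key, value) for key, value in sorted(options.items())]
    text = "TensorBoard({})".format(", ".join(args))
    ast.parse(text, mode="eval")  # raises SyntaxError if the text is not valid Python
    return text


print(tensorboard_callback(r"C:\TensorboardLogDir", histogram_freq=0, write_graph=True))
```

The printed call writes the path as 'C:\\TensorboardLogDir', which Python reads back as exactly the folder name that was passed in.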

 

 

Any guidance much appreciated.   Best wishes, Michael

Learner III Montse

Hi Michael,

 

Do you know which version of Keras you have? Maybe 2.0.4 (if you have followed these steps).

You need to upgrade Keras to the latest release (2.0.5 or higher). Previous versions do not support the write_grads argument.

 

To upgrade Keras:

1. Go to the Anaconda prompt.

2. Make a clone of your Python35 environment (as a backup): conda create --name Python35_Old --clone Python35

3. Check that the clone has been created: conda info --envs

4. Activate the Python35 environment: activate Python35

5. Uninstall Keras: conda uninstall keras

6. Install the new version of Keras: conda install -c conda-forge keras
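To confirm which Keras an environment actually has after the upgrade, run `conda list keras`, or print `keras.__version__` from that environment's Python. Comparing versions as plain strings is error-prone: "2.0.10" sorts before "2.0.5" as text. The sketch below compares the numeric parts instead. The 2.0.5 threshold is the one given in this post, and both function names are hypothetical.

```python
def version_tuple(version):
    """Turn "2.0.6" into (2, 0, 6); stops at the first part that is not a number."""
    parts = []
    for piece in version.split("."):
        if not piece.isdigit():
            break
        parts.append(int(piece))
    return tuple(parts)


def supports_write_grads(keras_version, minimum="2.0.5"):
    """True if keras_version is at least `minimum` (2.0.5, per the post above)."""
    return version_tuple(keras_version) >= version_tuple(minimum)


for v in ["2.0.4", "2.0.5", "2.0.6", "2.0.10"]:
    print(v, supports_write_grads(v))
```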

 

Now you can use the write_grads argument.

RM Certified Analyst

Thanks very much for your advice - and yes, I have 2.0.4.  Will follow the steps you suggested.  

 

Do you know of any resource(s) that explain the various syntactical possibilities for the "callbacks" parameter of the "Keras Model" operator, as well as how one could visualise metrics other than loss? Best wishes, Michael

RM Certified Analyst

Hi:

 

After installing Keras 2.0.6, I get the following error messages from a callback (Keras operator):

TensorBoard(log_dir="c:\TensorboardLogDir", histogram_freq=32, write_graph=True,  write_images=True, embeddings_freq=32, embeddings_layer_names=None, embeddings_metadata=None)

This callback didn't throw an error using Keras 2.0.4.   There is a reference to saving a script - does this mean I need to include a complete path to a file that would contain the script (i.e. output)?  Any suggestions?    Thanks for considering this if possible and best wishes, Michael

 

[Screenshot: Keras operator error message]

Learner III Montse

Do the Python Scripting and Keras extensions in RapidMiner point to the same environment in which you installed this new version of Keras?

RM Certified Analyst

Yes, within RapidMiner Preferences for Keras and Python Scripting, the path points to the Python environment with Keras 2.0.6.

 

As per your suggestion, I cloned the original environment (with Keras 2.0.4). I then uninstalled 2.0.4 from that environment and installed 2.0.6. Interestingly enough, despite the error, the process generates an output file in my log directory, but only graph images, no scalar values. Best wishes, Michael

RM Certified Analyst

Realized I should have tried the callback referenced above using Keras 2.0.4 and python.exe from within the cloned environment.  The callback works as it did before, outputting scalar values and a graph image.  Would like to measure accuracy as well as loss, but am not sure how to configure the callback to output both metrics.  Perhaps the callback syntax is different with Keras 2.0.6?    Best wishes, Michael

Learner III Montse

I'm sorry, but I don't know what the problem is in your case...

I've upgraded Keras (2.0.6) and added write_grads=True to the callback. The syntax is the same:

TensorBoard(log_dir='./logs', histogram_freq=0, write_graph=True, write_images=False, embeddings_freq=0, embeddings_layer_names=None, embeddings_metadata=None,write_grads=True )

...and it works fine.

Maybe Keras was not installed correctly?

You can try to list all the packages installed in this environment with: conda list.

Keras has to be installed for Python 3.5.

 

 

[Screenshot: Keras package]

 

 

RM Certified Analyst

Thanks for your message. 

 

I had no problems installing 2.0.6, and the conda list command showed that 2.0.6 is installed on my system.

 

I used your callback:

 

TensorBoard(log_dir="C:\TensorboardLogDir", histogram_freq=0, write_graph=True, write_images=False, embeddings_freq=0, embeddings_layer_names=None, embeddings_metadata=None,write_grads=True )

 

and it works fine on my system under 2.0.6 as well. I think my issue occurred because my callback was slightly different:

 

TensorBoard(log_dir="C:\TensorboardLogDir", histogram_freq=32, write_graph=True, write_grads=True,  write_images=False, embeddings_freq=32, embeddings_layer_names=None, embeddings_metadata=None)

 

My callback uses the value 32 for the "histogram_freq" and "embeddings_freq" parameters, as per the guidance from https://keras.io/callbacks/#tensorboard (see below), and the "write_grads" parameter comes earlier in my callback than it does in yours. You have "write_grads" as the last item in your callback.

 

Guidance from keras.io is below:

 

[Screenshot: TensorBoard callback parameters from keras.io]

I was hoping to see gradient histograms in my TensorBoard visualisation, which is why I set "histogram_freq" and "embeddings_freq" to 32 as per the screenshot above; to see gradient histograms, "histogram_freq" has to be greater than 0. I think that setting those parameters to 32 caused the issue I wrote about.

 

Using your callback, I am able to see scalar values and graphs, but no histograms, which is also what I could see before I started experimenting with setting "histogram_freq" and "embeddings_freq" to non-zero values and adding "write_grads" to my callback.

 

Perhaps this is because I am using Python 3.6.1, which is the version of Python that Anaconda for 64 bit Windows currently installs.

 

I would also like to visualize the accuracy ("acc") metric. Any idea how I can visualize a metric other than "loss"? Thanks for reading this and for any further suggestions, and best wishes, Michael

RM Certified Analyst

Wanted you to know that this callback works fine:

 

TensorBoard(log_dir="C:\TensorboardLogDir", histogram_freq=32, write_graph=True, write_grads=True, write_images=True, embeddings_freq=0, embeddings_layer_names=None, embeddings_metadata=None)

 

but no histograms. Setting the "embeddings_freq" parameter to 0 prevents any RM error messages, but it might be preventing the creation of the gradient histograms. Best wishes, Michael

Contributor II jacobcybulski

@Montse, in my case installing Keras for RapidMiner on Win10 was a bit trickier, and the main issues are pydot and graphviz. When you install them both from Anaconda, RapidMiner will see them, but the necessary software is still not on the system and things will fail later. Here is my advice, assuming you have the current versions of Anaconda, TensorFlow and Keras.

 

Installing Keras 

Make sure you install Keras in the previously defined TensorFlow environment. In most cases this was Python 3.5, although recently I have tried Python 3.6 with success. Then activate the environment.

  • From the command line execute: activate tensorflow 

Then install a number of packages (the last two are for visualisation only):

  • cuDNN (if using GPU, see above) 
  • conda install HDF5 
  • conda install h5py (the previous may be downgraded at this point in time) 
  • conda install graphviz (you may be lucky) 
  • pip install pydot (you may be lucky) 

Now you should be able to finish installing Keras.

  • pip install keras 

You can now try running RapidMiner, installing the Keras plugin and configuring it by pointing it to the Python within the "tensorflow" environment. Most likely RapidMiner will be happy with the installation but may not be able to display the graphs, or it will fail with an error later on.

 

If you were not lucky then graphviz and / or pydot need to be properly installed, try these: 

  • Download "graphviz-2.38.msi" from  
    http://www.graphviz.org/Download_windows.php 
  • Execute the "graphviz-2.38.msi" file 
  • Add the graphviz bin folder to the PATH system environment variable  
    (Example: "C:\Graphviz2.38\bin") 
  • You may need one more step, i.e. installing the python-graphviz package: 
    conda install python-graphviz 

You should then be able to see the graphs produced by Keras within RapidMiner.
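A PATH change made in the System settings only reaches programs started after the change, so RapidMiner Studio and any open Anaconda Prompt need to be restarted. For a quick experiment in a single Python session or a child process, you can get the same effect by passing a modified copy of the environment. The sketch below builds such a copy. `with_path_entry` is a hypothetical helper, and the Graphviz folder is the example path from the steps above.

```python
import os


def with_path_entry(env, directory):
    """Return a copy of `env` whose PATH starts with `directory` (added only once).

    The comparison is exact, so on Windows a differently-cased duplicate is not detected.
    """
    new_env = dict(env)
    entries = [e for e in new_env.get("PATH", "").split(os.pathsep) if e]
    if directory not in entries:
        entries.insert(0, directory)
    new_env["PATH"] = os.pathsep.join(entries)
    return new_env


# Example: an environment for a child process (e.g. subprocess.run(..., env=env))
# in which the Graphviz bin folder from the steps above is on PATH.
env = with_path_entry(os.environ, r"C:\Graphviz2.38\bin")
```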

 

However, I have found on a few systems this was not enough and I had to do the following (I have no idea why but I have found this remedy).

  • Go to the Anaconda Prompt using the Start menu (make sure to right-click and select "Run as Administrator"; we may get permission issues if the Prompt is not opened as Administrator) 
  • Execute the command: conda install graphviz 
  • Execute the command: pip install git+https://github.com/nlhepler/pydot.git (You will need GIT for this)

In all cases, you may wish to check if things worked out.

  • Execute the command "conda list" and make sure pydot and graphviz modules are listed. 
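With a long environment, this check is easy to script. As a sketch, save the listing with `conda list > packages.txt` and pass the file to the script below. `installed_packages` and `missing` are hypothetical helpers. The parsing assumes the usual `conda list` layout of "#" header lines followed by one package per line, name first.

```python
import sys


def installed_packages(conda_list_output):
    """Return the set of package names from the text printed by `conda list`."""
    names = set()
    for line in conda_list_output.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and the header comments
        names.add(line.split()[0].lower())
    return names


def missing(conda_list_output, required=("pydot", "graphviz")):
    """Return the required package names that do not appear in the listing."""
    have = installed_packages(conda_list_output)
    return [name for name in required if name not in have]


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1]) as f:
        print("Missing:", missing(f.read()) or "none")
```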

Good luck -- Jacob

RM Certified Analyst

Hi Jacob:

 

Many thanks for your detailed post with many helpful points to try.  

 

I did the same thing that you did re: Graphviz - I downloaded the .msi, installed it, set the path statement, and was able to add it to my Python environment.  I should also add that I installed the CPU version of Tensorflow, not the GPU version, as the GPU version has many dependencies.  Does your setup use CPU or GPU?

 

To be clear, I can currently see scalar values and graphs in TensorBoard. What I can't see are histograms and embeddings. I have very little idea how to configure a callback to generate histograms or embeddings. Perhaps I cannot see histograms or embeddings with a CPU install and need a GPU. What do you think?

 

Also, the only metric I can see scalar values for is "loss".  I would also like to see other metrics, such as accuracy. Do you have any suggestions as to how the callback should be changed in order to show accuracy (or any other metrics)?

 

To make sure that I understand your suggestions correctly:

 

I should set up a Tensorflow environment

I should activate that tensorflow environment

then Install Tensorflow  (perhaps also the Microsoft CNTK and Theano)

then Install HDF5 

then Install h5py 

then Install graphviz 

then Install pydot

then Install keras (2.0.6)

then Install python-graphviz 

 

Thanks for confirming that I have understood you correctly, and I hope I will get a chance to return your kindness some day.

 

Best wishes, Michael ;-)

 

Learner I potto
After trying all the suggested steps to install Keras on a Windows 10 computer, I am giving up. The documentation for installing the RapidMiner extension is very poor, and suggestions on how to fix the numerous errors you may experience during installation are scattered across various forums. I have used RapidMiner over the last few years in the courses I teach and for research, but I conclude that the Keras installation is not ready for prime time. If someone from RapidMiner is reading this posting, please provide concise instructions on how to get the Keras extension to work properly. Thanks!
Contributor II jacobcybulski

Hi @potto, when you say this plugin is hard to install, you are not wrong.

 

Having said this, the Keras extension is not a self-contained package but an interface to a large deep learning stack. To make it work in Win10 with GPUs you need to install: the NVIDIA CUDA development kit, NVIDIA cuDNN, Anaconda (Python with Anaconda environments), TensorFlow (with specific libraries, some via conda and some via pip), Keras (with quite a few prerequisite libraries), and some additional Win10 software such as Graphviz. However, if you do succeed in installing it, it frees you from writing hundreds of lines of Python to develop even the simplest deep learning application. Instead, you can focus on high-level modelling tasks, quickly preprocess your data and later report your results. I see the Keras plugin as an opportunity for business developers and researchers to dive into deep learning without the (major) pain of becoming programmers first.

 

At this stage, unless you want to make a major investment in enterprise infrastructure, Keras is the only tool in the RapidMiner world that turns a (relatively) cheap gaming computer into a GPU deep learning machine using mainstream software (H2O, TensorFlow and Keras). The last sentence could possibly be generalised somewhat, as this is not yet possible in the majority of other workflow analytics software, e.g. SAS Enterprise Miner and IBM SPSS Modeler (RapidMiner's current cohabitants of the Gartner leadership quadrant). KNIME has implemented its deep learning with GPUs using Deeplearning4j, which has yet to prove itself on the market, and it sits on top of Keras for Python integration.

 

If you want to avoid the hassle of installing the Keras stack, RapidMiner has a nice H2O interface and its deep learning capability, and we are waiting for the GPU support to become available really soon!

 

Jacob

 

P.S. I think at the moment, Keras for the RapidMiner user is hard; however, RapidMiner for the Keras user is trivial.

Contributor II jacobcybulski

Edit: No problems with frequencies in Tensorboard - need to upgrade your Keras to the current version! See the next post by Michael Martin.

 

@M_Martin, sorry for the delay, it has been a busy week. I had no luck with histograms and embeddings in TensorBoard callbacks from Keras. In fact, I had no luck doing this in Python either; worse, I managed to crash Python every time the frequencies for these were non-zero. I have seen a posting suggesting that to get these displayed from Keras you'd need to write your own summary statements within your own implementation of the fit generator. So we may have to wait for a new version of Keras to deliver this functionality, and then it would be available from RapidMiner.

 

However, when it comes to displaying other metrics in your TensorBoard scalars, simply add them to the "metrics" list, which is revealed when you click the "use metrics" option of "Build Keras Sequential Model" (currently the Keras plugin restricts you to only one). Make sure that you have validation data available for this, either by supplying it on input or by defining the validation split (otherwise you'll get a Python error).
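One detail is worth knowing when you rely on the validation split. Keras's `fit(validation_split=...)` is documented to hold out the last fraction of the samples, taken before any shuffling. If an ExampleSet is sorted (by date, by label, and so on), the validation data is therefore not a random sample. Below is a plain-Python sketch of that split. The function is hypothetical, and it is an assumption that the extension passes rows to Keras in ExampleSet order.

```python
def keras_style_split(rows, validation_split):
    """Split rows like Keras' validation_split: the last fraction becomes validation data."""
    if not 0.0 <= validation_split < 1.0:
        raise ValueError("validation_split must be in [0, 1)")
    split_at = int(len(rows) * (1.0 - validation_split))
    return rows[:split_at], rows[split_at:]


train, val = keras_style_split(list(range(10)), 0.2)
print(train, val)  # [0, 1, 2, 3, 4, 5, 6, 7] [8, 9]
```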

 

And yes, your list of steps is correct, I'd add "install Graphviz.exe (and include its bin in the PATH)".

 

Jacob

RM Certified Analyst

Thursday

 

Hi Colleagues:

 

Have made some real progress re: the questions I have raised so I would like to share what I have come up with in the hope it will be helpful. Many thanks to the people who answered my posts in this KB thread who helped me get started. ;-)

 

As far as setting up Keras in RapidMiner (and other related Python packages Keras needs), here's what seems to be working for me on several Dev boxes in my shop (caveat: all boxes are running Windows 7 64 bit, with SP1 - all machines have either 16 or 32 GB of RAM and I7 processors).

 

1. Install the Keras extension from the RapidMiner Extensions Marketplace into RapidMiner Studio

2. Download and install Anaconda 3 (https://www.anaconda.com/)  Get the appropriate version (64 or 32 bit).

3. Run the Anaconda Navigator (should now be in your Windows Program Group)

4. Create a new environment using the Anaconda Navigator.  The Navigator (as of this writing) will suggest using Python 3.6, but there is also an option for Python 3.5,  Tick on Python 3.5 as I understand that the RapidMiner Keras extension was developed using Python 3.5.  I named my Environment py35.

5. After Anaconda creates the environment, open the Anaconda Prompt (it should be listed as a shortcut in the Start Menu or within the Anaconda program group). Though I didn't have to, you could right-click the icon for the Anaconda Prompt and select "Run as Administrator".

6. You need to activate the new environment you just created. From the prompt, type: activate <environment name>. If you named your environment py35, you would type: activate py35. Then hit Enter/Return. After a few seconds, the Anaconda prompt will return and the environment will be active. The new environment name should now be part of the Anaconda prompt.

7. To see a list of existing Python packages in your new environment, type conda list and then Enter/Return.  You should see a short list of packages.

8. You now need to install a few more Python packages.  Type conda install pandas and then Enter.  After a few seconds, you will be asked to confirm that you want to do the installation.  Type y and the downloading and installation of pandas (and other dependent packages) will begin. When the installation has finished, you'll be returned to the Anaconda prompt for your environment.

9. Type conda install scikit-learn and then Enter. Confirm that you want to do the install, and it should start. When the install is done, you'll be returned to the prompt for your Anaconda environment.

10. You now need to install a package named Graphviz, which requires some extra steps. Go to http://www.graphviz.org/download_windows.php and download graphviz-2.38.msi. Then run the downloaded msi file to install Graphviz (which is a Windows Forms application).

11. Then open the Windows Control Panel, select the System app, and then Advanced System Settings --> Environment Variables. Add the path to the Graphviz executable to (at least) the PATH environment variable for your user account: append C:\Program Files (x86)\Graphviz2.38\bin to your existing PATH, typing a semicolon in front of it to separate it from the previous entry. For good measure (though it may not be strictly necessary), I also added the following directories to my PATH:

 

C:\Users\YourUserName\Anaconda3\envs\py35;C:\Users\YourUserName\Anaconda3\Scripts;C:\Users\YourUserName\Anaconda3\envs\py35\Lib\site-packages  (remember to type a semicolon after C:\Program Files (x86)\Graphviz2.38\bin before typing another entry).

 

Substitute your Windows user account name for YourUserName above (and your own environment name for py35, if you named it differently).

 

12. To confirm that your PATH environment variable has been updated, open a new Command Prompt window, type path and press Enter. The value of your PATH environment variable will be echoed to the screen. If it doesn't include the entries you just added, reboot your system and check again.

13. Assuming your PATH has been updated, install graphviz from within the Anaconda Prompt for your environment (which should now also show up as a shortcut in the Start Menu or within the Anaconda program group) by typing conda install graphviz and then Enter.

14. Then install the pydot package by typing pip install pydot and then Enter.

15. Last but not least, install Keras (recently updated to version 2.0.6) by typing conda install -c conda-forge keras and then Enter. After you confirm the install, Keras and numerous dependent packages will be installed, and you'll be back at the Anaconda prompt for your environment.

16. If you want to use Tensorboard to visualize your models, install the latest version of Tensorflow and Tensorboard by typing

           pip install --ignore-installed --upgrade tensorflow   

and then Enter.  Quite a few packages will be installed, and you'll be back at the Anaconda prompt.   
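Looking back at the PATH edits above: a short Python script can report which of the suggested directories are still missing from PATH, which is handy when the echoed PATH is too long to read comfortably. This is a minimal sketch, not part of the original setup; `missing_path_entries` is a hypothetical helper, and `YourUserName` / `py35` are the same placeholders used above.

```python
import ntpath  # Windows path rules; usable on any OS for string comparison
import os

# Directories suggested above. YourUserName and py35 are placeholders:
# replace them with your own account and environment names.
REQUIRED = [
    r"C:\Program Files (x86)\Graphviz2.38\bin",
    r"C:\Users\YourUserName\Anaconda3\envs\py35",
    r"C:\Users\YourUserName\Anaconda3\Scripts",
    r"C:\Users\YourUserName\Anaconda3\envs\py35\Lib\site-packages",
]


def normalize(entry):
    """Normalize a Windows path: lower-case, backslashes, no trailing separator."""
    return ntpath.normcase(ntpath.normpath(entry.strip()))


def missing_path_entries(path_value, required=REQUIRED):
    """Return the required directories absent from a ';'-separated PATH string."""
    present = {normalize(p) for p in path_value.split(";") if p.strip()}
    return [d for d in required if normalize(d) not in present]


if __name__ == "__main__":
    # os.environ holds PATH as it was when this Python process started,
    # so run the script from a newly opened prompt after editing PATH.
    missing = missing_path_entries(os.environ.get("PATH", ""))
    for entry in missing:
        print("Missing from PATH:", entry)
    if not missing:
        print("All suggested directories are on PATH.")
```

The comparison ignores capitalization and trailing backslashes, since Windows treats `C:\Program Files (x86)\Graphviz2.38\bin\` and `c:\program files (x86)\graphviz2.38\bin` as the same directory.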

 

For info re: Tensorflow and Tensorboard, visit https://www.tensorflow.org

 

If you type conda list and then Enter, you will see that your environment now contains many more packages.
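As a complement to conda list, a short script run with the environment's python can confirm that the interpreter RapidMiner will use actually sees the packages. A minimal sketch, assuming the import names below (scikit-learn is imported as `sklearn`); `check_modules` is a hypothetical helper. Note that `importlib.util.find_spec` only locates a package without importing it, so a package with broken dependencies can still show up as found.

```python
import importlib.util

# Import names for the packages installed above. scikit-learn is imported
# as "sklearn"; tensorflow only matters if you installed it for Tensorboard.
PACKAGES = ["pandas", "sklearn", "keras", "pydot", "tensorflow"]


def check_modules(names):
    """Map each top-level module name to True if this interpreter can locate it.

    find_spec only locates the package; it does not import it.
    """
    return {name: importlib.util.find_spec(name) is not None for name in names}


if __name__ == "__main__":
    # Run with the environment's python (e.g. after "activate py35").
    for name, found in sorted(check_modules(PACKAGES).items()):
        print("{:12s} {}".format(name, "found" if found else "MISSING"))
```

The code avoids f-strings so that it also runs on the Python 3.5 environment recommended above.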

 

The last configuration step needs to occur within RapidMiner Studio by selecting Settings -> Preferences and telling RapidMiner Studio where python.exe is within your Python environment.  By default, the complete path should be:

 

 C:\Users\YourUserName\Anaconda3\envs\YourEnvironmentName\python.exe  

 

Click the disk icon to the left of the Test button and navigate to python.exe within your environment twice: once for the "Keras" option and once for the "Python Scripting" option in the Preferences dialog. Be sure to click the "Test" button both times. If there are no errors, you'll get a message box stating that Python has been detected within Anaconda. On all my Dev boxes there were no errors; hopefully there will be none on your system either.
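If you're unsure of the exact path to enter, the interpreter can report it itself: with the environment active, `python -c "import sys; print(sys.executable)"` prints it. The same idea as a small script is below; `interpreter_path` is just an illustrative name.

```python
import sys


def interpreter_path():
    """Return the full path of the Python interpreter running this code."""
    return sys.executable


if __name__ == "__main__":
    # With the environment active, this prints that environment's python.exe,
    # the value to enter for both the Keras and Python Scripting preferences.
    print(interpreter_path())
```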

 

Some or all of the setup commands above may not work on Windows 10, but Windows 10 does allow you to run programs in compatibility mode, so experimenting with compatibility settings might help.

 

The installation described above is a CPU installation as opposed to a GPU installation. GPU installations run Keras models faster, but they have specific hardware requirements and the install is tricky. For more info on this subject visit https://www.google.ca/search?q=keras+gpu+installation&oq=keras+gpu+installation&aqs=chrome..69i57j69...

 

You should (hopefully) now be able to run the Keras samples provided with RapidMiner which are in the repository under the entry Keras Samples.

 

 

If you want to give Tensorboard a spin, you will need to adjust a few default settings of the Keras Model operator in the process. If you don't want to try Tensorboard, you can simply run the process and see the outputs in the RapidMiner Results panel.

 

I attached some screenshots with some example settings if you want to try Tensorboard.   Start with the screenshot of the process parameters panel below:   

 

[Screenshot: Keras Setup]

 

Select a loss metric from the dropdown opposite the loss parameter (I selected mean_squared_error). 

 

Tick the "use metric" checkbox below the decay parameter. Then click the "Edit enumeration" button next to the "metric" parameter. A dialog will open in which you can select an additional metric to visualize. In theory you should be able to select several metrics, but on my systems selecting more than one causes the process to crash. The error message states that the metric name (a concatenation of the metrics in the enumeration) is invalid, even though the process XML would appear to prevent that kind of error. After selecting a single metric from the enumeration (I chose mape), click OK.

 

Enter .25 for the "validation split" parameter (just below the verbose parameter). I think that setting the validation split parameter to a non-zero value enables the display of histograms and distributions in Tensorboard.
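For reference, Keras documents that `validation_split` holds out the last fraction of the samples (taken before shuffling), trains on the rest, and evaluates the loss and metrics on the held-out part at the end of each epoch. Assuming the RapidMiner parameter is passed straight through to Keras' `fit`, .25 would hold out the last quarter of the example set. The plain-Python sketch below mirrors that arithmetic; `split_for_validation` is a hypothetical name, not a Keras function.

```python
def split_for_validation(samples, validation_split):
    """Mimic Keras' validation_split: hold out the LAST fraction of samples.

    The split point is int(n * (1 - validation_split)); samples from that
    index onward are used for validation, the rest for training.
    """
    if not 0.0 <= validation_split < 1.0:
        raise ValueError("validation_split must be in [0, 1)")
    split_at = int(len(samples) * (1.0 - validation_split))
    return samples[:split_at], samples[split_at:]


if __name__ == "__main__":
    rows = list(range(100))  # stand-in for 100 examples
    train, val = split_for_validation(rows, 0.25)
    print(len(train), len(val))  # 75 25
```

Because the held-out rows are the last ones, an example set sorted by date (such as the S&P 500 sample) is validated on its most recent quarter.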

 

Click on the "Edit enumeration" button opposite the callbacks parameter.  Another dialog will open.  Clicking on the dropdown menu will expose several default template callback statements.  The attached screenshot shows three different callback types (RM allows multiple choices for callbacks) that I set.   I don't fully understand how to construct callbacks, especially what you need to specify in order to see embeddings and checkpoints.  You will see that one of the callbacks creates a checkpoint file, but it doesn't display in Tensorboard, and I'm not sure why. I would appreciate any guidance on this point.  The CSV Logger outputs a csv file containing information re: the values of the metrics you selected during the training process.  The screenshot below shows the three callbacks I configured:

 

[Screenshot: Sample Callbacks]

 

As configured in the screenshot, you should be able to see scalar values, images, graphs, distributions, and histograms in Tensorboard. I think that setting histogram_freq to 1 and write_grads=True enables histograms and distributions to display in Tensorboard, as long as the validation split parameter (see the first screenshot above) is set to a non-zero value.

 

The next step is to use Windows Explorer to create a directory to which the callbacks will write the log files that Tensorboard reads when RapidMiner executes the process. On my systems, it's C:\TensorboardLogDir

 

Below is a screenshot of the S&P 500 regression sample process with a few additions: I added a Performance operator, and I connected the "his" and "exa" ports of the Keras Model operator to process "res" ports. The "his" (history) port delivers the same information as the csv file generated by the CSV Logger callback.

 

[Screenshot: Process Design View]

 

After configuring the process, you can run it, which will take a few moments. If you go to your Tensorboard log directory, you should see three files, assuming you configured the callbacks as per the screenshot: an "events out" file, a csv file, and a checkpoint (.ckpt) file. If you set the "write_images" flag to True in the Tensorboard callback, the events file will be quite large (several hundred megabytes). If you set it to False, it will be around 100 megabytes (those were the file sizes on my systems).

 

Create a sub-directory (a new folder) within your Tensorboard log directory and move the three files to that new folder.
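This folder shuffling (needed again after each later run) can be scripted. A minimal sketch using only the Python standard library; `archive_run` is a hypothetical helper, and C:\TensorboardLogDir and the run name "1" are simply the example values from this post.

```python
import shutil
from pathlib import Path


def archive_run(log_dir, run_name):
    """Move the files lying directly in log_dir into a new sub-folder run_name.

    Earlier runs (already in sub-folders) are left alone, so each run ends up
    in its own folder, which Tensorboard then lists separately under "runs".
    Returns the sorted names of the files that were moved.
    """
    log_dir = Path(log_dir)
    files = [p for p in log_dir.iterdir() if p.is_file()]  # snapshot before moving
    run_dir = log_dir / run_name
    run_dir.mkdir(exist_ok=False)  # raises FileExistsError if the run folder exists
    for f in files:
        shutil.move(str(f), str(run_dir / f.name))
    return sorted(f.name for f in files)
```

Usage, after a process run: `archive_run(r"C:\TensorboardLogDir", "1")`, then `"2"` for the next run, and so on.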

 

The next step is to start Tensorboard.  From within your Anaconda prompt for your Python environment, type:

 

tensorboard --logdir="Your Tensorboard Log Directory Name" --host=127.0.0.1 

 

and then Enter.

 

I needed to put quotes around the log directory name. On my systems, I typed:

 

tensorboard --logdir=C:\"TensorboardLogDir" --host=127.0.0.1

 

After a few seconds, Tensorboard will load and display a message to the effect that you should open a browser and navigate to http://127.0.0.1:6006
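The quoting trouble above comes from how the command line is parsed. Launching Tensorboard from Python with an argument list sidesteps it, because Python quotes each argument as needed. A sketch, assuming the Anaconda environment with Tensorboard is active so that the `tensorboard` command is on PATH; `tensorboard_command` and `launch_tensorboard` are hypothetical helpers, while `--logdir`, `--host` and `--port` are Tensorboard's own flags (6006 is its default port).

```python
import subprocess


def tensorboard_command(log_dir, host="127.0.0.1", port=6006):
    """Build the Tensorboard command line as a list of separate arguments."""
    return ["tensorboard", "--logdir=" + str(log_dir),
            "--host=" + host, "--port=" + str(port)]


def launch_tensorboard(log_dir):
    """Start Tensorboard and wait until it exits (stop it with Ctrl+C).

    The arguments are passed as a list, so no shell parses them and a log
    directory containing spaces needs no extra quoting.
    """
    return subprocess.run(tensorboard_command(log_dir))
```

Usage: `launch_tensorboard(r"C:\TensorboardLogDir")`, then open http://127.0.0.1:6006 in a browser.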

 

After entering that URL in your browser, Tensorboard should load after a few seconds. On the left-hand side of the screen, near the middle, you'll see the word "runs". Below it there should be a list with two items, C\ and C\<subdirectory name>; on my systems it's C\1. Click the SCALARS heading at the top of the browser window and then click C\<subdirectory name> below "runs". Four visualisations of scalar values should appear. On my screen, these four values are loss, mean_absolute_percentage_error, val_loss, and val_mean_absolute_percentage_error. I'm not sure why I get four metrics as opposed to two, and I don't understand the difference between "loss" and "val_loss", or between "mean_absolute_percentage_error" and "val_mean_absolute_percentage_error". Any guidance on this would be appreciated. The values for all of the visualisations you will see in Tensorboard come from the "events out" file.
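One likely explanation: when the validation split is non-zero, Keras evaluates the loss and every selected metric twice per epoch, once on the training portion and once on the held-out validation portion, and logs the latter under the same name prefixed with `val_`. Two quantities (loss and mape) therefore yield four charts, and a `val_` curve that rises while its training counterpart keeps falling is the usual sign of overfitting. The sketch below pairs those columns in the csv written by the CSV Logger callback; `train_vs_validation` is a hypothetical helper.

```python
import csv


def train_vs_validation(csv_path):
    """Pair each logged metric with its 'val_' counterpart, epoch by epoch.

    Assumes the layout Keras' CSVLogger writes: an 'epoch' column followed by
    one column per logged value (loss, val_loss, ...). Returns a dict such as
    {"loss": [(train, validation), ...]} with one tuple per epoch.
    """
    with open(csv_path, newline="") as f:
        reader = csv.DictReader(f)
        fields = reader.fieldnames or []
        rows = list(reader)
    pairs = {}
    for name in fields:
        if name == "epoch" or name.startswith("val_"):
            continue
        val_name = "val_" + name
        if val_name in fields:
            pairs[name] = [(float(r[name]), float(r[val_name])) for r in rows]
    return pairs
```

Usage (the file path is hypothetical): `train_vs_validation(r"C:\TensorboardLogDir\1\training.csv")["loss"][-1]` gives the training and validation loss of the last epoch.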

 

If your callback included images, clicking IMAGES at the top of the browser window will show images representing your model. I think images are most useful in text mining, but perhaps I'm missing something. Clicking GRAPHS shows a graph of the neural network, and clicking DISTRIBUTIONS and HISTOGRAMS shows data about the training progress (the behaviour of the layers in the network) over the 256 epochs the training ran.

 

There is an option called INACTIVE next to HISTOGRAMS.  If you click on the INACTIVE option and then click on Projector, you'll get an error message stating that a checkpoint file has not been saved, even though I configured a callback to write a checkpoint file.  I'm sure I'm missing something, any suggestions appreciated.

 

You can then go back to RapidMiner, change the process in some way, and then run the process again.  Three more files will be written to your Tensorboard logging directory.  Create a new subfolder and move the new files into that subfolder.  

 

Go to Tensorboard in your browser and reload the page. You will then see both of your subfolders under "runs" near the middle of the left-hand side of your browser window. You can select one or several of these subfolders and the corresponding visualisation(s) will appear. I haven't found a way to toggle between vizzes except by putting the files related to each run in their own subdirectory underneath the main Tensorboard log directory.

 

I hope the above has been helpful. There is still quite a bit of material I don't really understand yet (especially about callbacks), but at the very least I hope the above will help one get through the setup and get a feel for what's going on. There's a very informative webinar about Keras and RapidMiner at https://rapidminer.com/resource/state-deep-learning/. It's also on YouTube. There are also many good resources on the web regarding Tensorflow and Tensorboard, one of the best being tensorflow.org.

 

Best wishes, Michael

RM Certified Analyst

Postscript to the previous post:

 

I described how to modify the PATH environment variable as part of installing the Graphviz package. I also said that for good measure (though it may not be absolutely required) one should add the following directories to the PATH environment variable:

 

C:\Users\YourUserName\Anaconda3\envs\py35;C:\Users\YourUserName\Anaconda3\Scripts;C:\Users\YourUserName\Anaconda3\envs\py35\Lib\site-packages  (remember to type a semicolon after C:\Program Files (x86)\Graphviz2.38\bin before typing another entry).

 

You should substitute your own Python environment name for my environment name (py35) in the statement above.

 

There was also one typo pertaining to what histograms and distributions show in Tensorboard. As far as I can tell, these elements describe activity in the various layers of the network.

 

Michael

Contributor II jacobcybulski

@M_Martin Excellent summary, Michael. I usually install Tensorflow before Keras, but it really makes no difference; you can in fact switch the Keras backend to CNTK or Theano if you wish. Thanks for the pointers on the frequencies. I have upgraded Keras and its RM plugin to the most up-to-date versions and find no more crashes. Images are of course very useful when processing images, as you can see their "compression" in the middle of the neural net. Yet, they are generated for all data. -- Jacob

Contributor II jacobcybulski

Having successfully installed RapidMiner with the Keras extension on a number of machines, both Win10 and Ubuntu 16.04, I thought I'd add to the instructions by @M_Martin and provide a guide for those who wish to install Keras while running RapidMiner on Linux. Here are the Ubuntu 16.04 guidelines.

 

The following steps are needed to set up Keras on Ubuntu 16.04 LTS

It is important that you install the versions the documentation says to install and not anything "better".

NVIDIA CUDA Toolkit (only needed for GPU support)

  • CUDA 8 web site (at this stage Tensorflow does not work with CUDA 9):
    https://developer.nvidia.com/cuda-downloads
  • Check that you have a CUDA-compatible GPU, i.e.
    See the list on https://developer.nvidia.com/cuda-gpus
  • Download CUDA 8 GA2 x86_64 Deb for Ubuntu 16.04 (1.9 GB) + Patch 2 (128 MB)
    https://developer.nvidia.com/cuda-80-ga2-download-archive
  • Follow instructions to install drivers and toolkit
    If you have a newer driver, install the toolkit manually and skip the driver installation,
    place it in <cudapath> (e.g. /usr/local/, change to yours)
  • Set CUDA variables in your ~/.profile (or ~/.bash_profile), e.g.:
    export PATH=<cudapath>/cuda/bin${PATH:+:${PATH}}
    export LD_LIBRARY_PATH=<cudapath>/cuda/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}
    export LD_LIBRARY_PATH=<cudapath>/cuda/extras/CUPTI/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}
    export CUDA_HOME=<cudapath>/cuda
    (you may need to log out and log in again after editing this file)
  • Then install cuDNN 6, from web site: https://developer.nvidia.com/cudnn
    Note that I have not tested Tensorflow with cuDNN 7 but you could give it a go
    You may need to register for download, place it in <cudnnpath> (change it to yours)
  • Copy the following files into the CUDA Toolkit directory:
    $ cd <cudnnpath>
    $ sudo cp -P include/cudnn.h <cudapath>/cuda/include
    $ sudo cp -P lib64/libcudnn* <cudapath>/cuda/lib64
    $ sudo chmod a+r <cudapath>/cuda/lib64/libcudnn*
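Once the ~/.profile edits are in place, a quick check from Python can confirm that a new login shell actually sees them. A minimal sketch, assuming <cudapath> is /usr/local (so the toolkit lives in /usr/local/cuda); `check_cuda_env` is a hypothetical helper that only inspects the variables from the export lines above, not the driver, the GPU or the cuDNN files.

```python
import os
import posixpath


def check_cuda_env(env, cuda_home):
    """List problems with the CUDA variables from the export lines above.

    env is a mapping such as os.environ; cuda_home is <cudapath>/cuda.
    """
    def dirs(var):
        # PATH-style variables are ':'-separated on Linux.
        return {posixpath.normpath(p) for p in env.get(var, "").split(":") if p}

    problems = []
    home = env.get("CUDA_HOME", "")
    if not home or posixpath.normpath(home) != posixpath.normpath(cuda_home):
        problems.append("CUDA_HOME should be " + cuda_home)
    expected = [
        ("PATH", posixpath.join(cuda_home, "bin")),
        ("LD_LIBRARY_PATH", posixpath.join(cuda_home, "lib64")),
        ("LD_LIBRARY_PATH", posixpath.join(cuda_home, "extras", "CUPTI", "lib64")),
    ]
    for var, needed in expected:
        if posixpath.normpath(needed) not in dirs(var):
            problems.append("{} is missing {}".format(var, needed))
    return problems


if __name__ == "__main__":
    # Assumes <cudapath> is /usr/local, as in the example above.
    for line in check_cuda_env(os.environ, "/usr/local/cuda") or ["CUDA variables look OK"]:
        print(line)
```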

Anaconda

  • Web site:
    https://www.continuum.io/
  • Download for Python 3.6+, 64bit, from:
    https://www.continuum.io/downloads
  • Install in easily accessible location, e.g.:
    ~/anaconda3
  • Note that in the process of installing Anaconda, you will be prompted to add it to the PATH. If not, add it to your ~/.profile:
    PATH="<condapath>/anaconda3/bin:$HOME/bin:$HOME/.local/bin:$PATH"
    (you may need to log out and log in again after editing this file)
  • Open a new command line and test it, i.e.
    $ conda --version

Tensorflow (It now works with Python 3.6+)

  • Web site: https://www.tensorflow.org/
  • Follow the instructions for Anaconda, i.e. from the command line (recently updated):
    $ conda create -n tensorflow python=3.6 (or lower, e.g. v3.5)
    $ source activate tensorflow (you switch to Tensorflow environment)
  • Install for CPU only:
    (tensorflow) $ pip install --ignore-installed --upgrade tensorflow
  • Install with GPU support:
    (tensorflow) $ pip install --ignore-installed --upgrade tensorflow-gpu
  • Test your installation by writing a short program, e.g.
    (tensorflow) $ python
    >>> import tensorflow as tf
    >>> hello = tf.constant('Hello, TensorFlow!')
    >>> sess = tf.Session()
    >>> print(sess.run(hello))
    >>> ^D
  • You may also wish to optionally install in that environment (esp. if you decided to use a version of Python different from that in the root):
    (tensorflow) $ conda install numpy
    (tensorflow) $ conda install scipy
    (tensorflow) $ conda install scikit-learn
    (tensorflow) $ conda install pandas
    (tensorflow) $ conda install matplotlib
    (tensorflow) $ conda install jupyter

Keras

  • Make sure you install Keras in the previously defined Tensorflow environment, i.e. from the command line execute:
    $ source activate tensorflow
  • Then install a number of packages, i.e.
    Already installed cuDNN (if using GPU, see above)
    (tensorflow) $ conda install HDF5
    (tensorflow) $ conda install h5py (the previous may be downgraded)
    (tensorflow) $ conda install graphviz
    (tensorflow) $ conda install python-graphviz
    (tensorflow) $ pip install pydot
    (tensorflow) $ conda list (make sure pydot and graphviz are listed)
  • If you were not lucky with graphviz and / or pydot, you can leave their installation for later, as both are needed only for charting Keras models. Whenever I had issues with the installation of Keras, these two were the main cause. If all else fails, try googling around. You may, however, try this workaround to install the graphviz software separately (Anaconda only provides an interface to the software), i.e.
    $ sudo apt-get update
    $ sudo apt-get install graphviz
  • Finish installation with:
    (tensorflow) $ pip install Keras
  • You are now ready to use Keras in Python
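Since graphviz and pydot are where installs most often go wrong, it can help to check both halves before plotting a model. Keras' model plotting (keras.utils.plot_model) relies on a pydot-style Python module, which in turn calls the Graphviz `dot` program; Keras 2.0.x appears to try pydot_ng, pydotplus and pydot in that order (treat that list as an assumption). `graphviz_status` below is a hypothetical helper; run it with the tensorflow environment's python.

```python
import importlib.util
import shutil

# Pydot-style module names to look for (assumed from Keras 2.0.x behaviour).
PYDOT_MODULES = ("pydot_ng", "pydotplus", "pydot")


def graphviz_status(modules=PYDOT_MODULES, executable="dot"):
    """Report whether model-plotting prerequisites are visible to this Python.

    Both halves are needed: a pydot-style Python module, and the Graphviz
    'dot' program somewhere on PATH.
    """
    return {
        "pydot-style module": any(importlib.util.find_spec(m) is not None
                                  for m in modules),
        "dot on PATH": shutil.which(executable) is not None,
    }


if __name__ == "__main__":
    for item, ok in sorted(graphviz_status().items()):
        print("{:20s} {}".format(item, "OK" if ok else "MISSING"))
```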

RapidMiner Studio

  • Download and install RapidMiner Studio
  • Install Keras extension from the RapidMiner Marketplace
  • In Settings > Preferences > Keras tab,
    set Path to Python executable to Python within the Tensorflow environment, e.g.
    ~/anaconda3/envs/tensorflow/bin/python
  • You can start modeling with Keras in RapidMiner Studio

Jacob

Learner I kakkad2


Applying Convolutional Neural Networks (CNN) on the iris dataset

 

Hi everybody,

After connecting Keras and RapidMiner through Python:

 

I was trying to use the CNN operator within the Keras Model operator for classification. I used the "add core layer" operator and it worked just fine. But when I use a CNN layer, I get all kinds of errors depending on which dataset/parameters are in use. Please suggest alternative actions if you have an idea of what the problem might be. I would simply like to use the CNN operator as one of the layers inside the Keras Model operator in my neural network. Thank you!

RM Certified Expert

I want to highlight one important last step, which is in the documentation above but which I missed:

 

Make sure your Python Scripting and Keras paths point to the same Python environment in Windows. I had them set differently and my processes kept crashing.

 

So if you have a Windows machine and point your Keras to C:\Anaconda3\envs\Python35\python.exe, make sure your Python Scripting path is C:\Anaconda3\envs\Python35\python.exe as well.
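To tell a real mismatch from a cosmetic one (Windows ignores capitalization and accepts both / and \ in paths), the two values from the Preferences dialog can be compared after normalization. A sketch; `same_python` is a hypothetical helper, and it only compares strings, without resolving links or checking that the file exists.

```python
import ntpath  # Windows path rules, regardless of the OS running the check


def same_python(keras_path, scripting_path):
    """True if two Windows paths name the same python.exe after normalization."""
    def norm(p):
        # normpath unifies separators; normcase lower-cases for Windows.
        return ntpath.normcase(ntpath.normpath(p.strip()))
    return norm(keras_path) == norm(scripting_path)
```

For example, `C:\Anaconda3\envs\Python35\python.exe` and `c:/anaconda3/envs/Python35/python.exe` compare as the same interpreter, while `C:\Anaconda3\python.exe` (the root environment) does not.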

Contributor II jacobcybulski

I wonder, @dgrzech, whether anything has changed in the Keras extension to allow selecting / splitting the work between different GPUs and / or including one's own Python code in the Keras process? We are setting up a lab full of machines with multiple GPUs, capable of running RapidMiner models built by researchers from our business school, i.e. people who are not very technical.

Jacob

RM Research

@potto sorry for the late reply. I'm sorry to hear that our instructions aren't detailed enough. Maybe you could try using Microsoft's Cognitive Toolkit as a backend for Keras. It's easier to install on Windows, and it makes no difference to you when using our Keras extension, since only the way things are executed behind the curtain changes.

 

  1. Follow the instructions in Microsoft's installation guide to install the Cognitive Toolkit for Python. Make sure to choose the version matching your Python version, and select the one with GPU support if you want to execute your processes on GPUs as well.
  2. Install Keras by running `conda install keras` if you are using Anaconda or `pip install keras` if not.
  3. Run Keras once, e.g. by opening a command prompt, starting python and running `import keras`.
  4. A `keras.json` file should now exist in a hidden folder called `.keras` in your user's home directory. It might look like this:
    {
        "backend": "tensorflow",
        "floatx": "float32",
        "image_data_format": "channels_last",
        "epsilon": 1e-07
    }
  5. Change the 'backend' value in the json file to 'cntk', save the file.
  6. Point the RapidMiner Keras Extension to the Python you are using for CNTK (Cognitive Toolkit).
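Steps 4 and 5 can also be done from Python. A minimal sketch using only the standard library; `set_keras_backend` is a hypothetical helper, and it assumes keras.json already exists (i.e. Keras has been run once, as in step 3).

```python
import json
import os


def set_keras_backend(backend, config_path=None):
    """Set the "backend" entry in keras.json and return the previous value.

    config_path defaults to ~/.keras/keras.json; the other keys (floatx,
    epsilon, image_data_format) are written back unchanged.
    """
    if config_path is None:
        config_path = os.path.join(os.path.expanduser("~"), ".keras", "keras.json")
    with open(config_path) as f:
        config = json.load(f)
    previous = config.get("backend")
    config["backend"] = backend
    with open(config_path, "w") as f:
        json.dump(config, f, indent=4)
    return previous
```

Usage: `set_keras_backend("cntk")`. Keep in mind that Keras also reads a KERAS_BACKEND environment variable, which takes precedence over keras.json; if it is set, editing the file alone will not switch the backend.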

Hope this helps.

 

Regards,

Philipp

RM Research

Hi @jacobcybulski, right now the Keras extension has no capability to add your own Python code to the execution or to change the GPU handling. But the upcoming RapidMiner Server 8.0 release might help with configuring your lab to assign given nodes with GPUs to certain queues.

 

Some information about the new architecture:

Regards,

Philipp