Cross Validation: the illusion of reliable performance estimation

by Community Manager ‎02-14-2017 08:47 AM - edited ‎02-14-2017 08:48 AM

A 2010 paper from one of the founders of Radoop on the common pitfalls of misusing Cross Validation and selecting models. This is a must-read, and thanks for digging this up, @mschmitz.


I remember this presentation from 2010 in Dortmund!

Walmart Sales Prediction Using RapidMiner

by Community Manager on ‎01-24-2017 10:15 AM

An article by Nagarjun Singharavelu of the New Jersey Institute of Technology, using Kaggle-sourced data sets.


The analysis looks at sales data across 45 Walmart stores and studies the effectiveness of discounting strategies.








Effective CRM using Predictive Analytics

by Community Manager ‎12-14-2016 04:44 AM - edited ‎12-14-2016 04:47 AM


By Antonios Chorianopoulos @Antonios


A step-by-step guide to data mining applications in CRM.

Following a handbook approach, this book bridges the gap between analytics and their use in everyday marketing, providing guidance on solving real business problems using data mining techniques.

The book is organized into three parts. Part one provides a methodological roadmap, covering both the business and the technical aspects. The data mining process is presented in detail, along with specific guidelines for developing optimized acquisition, cross-selling, deep-selling, up-selling, and retention campaigns, as well as effective customer segmentation schemes.

In part two, some of the most useful data mining algorithms are explained in a simple and comprehensive way for business users with no technical expertise.

Part three is packed with real-world case studies that use three leading data mining tools: IBM SPSS Modeler, RapidMiner, and Data Mining for Excel. Case studies from industries including banking, retail, and telecommunications are presented in detail to serve as templates for developing similar applications.

Data Mining: A Tutorial-Based Primer, Second Edition (by Richard Roiger) is now available!

by Community Manager ‎12-08-2016 07:24 AM - edited ‎12-08-2016 08:39 AM



Two of the book’s 14 chapters are devoted to learning how to use RapidMiner Studio for building models to solve complex problems. Five additional chapters use RapidMiner Studio for one or several data mining tutorials on attribute selection, dealing with imbalanced data, outlier analysis, time-series analysis, mining textual data, cluster analysis, and more. Please visit  to examine the table of contents and supplements associated with the text.


"Dr. Roiger does an excellent job of describing in step by step detail formulae involved in various data mining algorithms, along with illustrations. In addition, his tutorials in Weka software provide excellent grounding for students in comprehending the underpinnings of Machine Learning as applied to Data Mining. The inclusion of RapidMiner software tutorials and examples in the book is also a definite plus since it is one of the most popular Data Mining software platforms in use today."


--Robert Hughes, Golden Gate University, San Francisco, CA, USA



How to Steal a Predictive Model

by Community Manager on ‎10-04-2016 06:15 AM

Validation of Data Mining Advanced Technology in Clinical Medicine

by Community Manager on ‎10-04-2016 05:45 AM

Authors: Abd Elrazek M Aly Abd Elrazek (1) and Ahmed Elbanna (2)


1 Department of Hepatology & Gastroenterology, Aswan School of Medicine, Aswan University, Aswan, Egypt.

2 Department of Computer Science and Technology, TU Chemnitz, Germany.


Published June 2016



A Little Light Reading

by Community Manager ‎05-26-2016 04:15 PM - edited ‎07-19-2016 11:36 AM

An Introduction to Data Science

   Jeffrey Stanton, 2013

School of Data Handbook

   School of Data, 2015

Data Jujitsu: The Art of Turning Data into Product

   DJ Patil, 2012

The Data Science Handbook

   Carl Shan, Henry Wang, William Chen, & Max Song, 2015

The Data Analytics Handbook

   Brian Liou, Tristan Tao, & Declan Shener, 2015

Data Driven: Creating a Data Culture

   Hilary Mason & DJ Patil, 2015

Building Data Science Teams

   DJ Patil, 2011

Understanding the Chief Data Officer

   Julie Steele, 2015

The Elements of Data Analytic Style

   Jeff Leek, 2015

Hadoop: The Definitive Guide

   Tom White, 2011

Data-Intensive Text Processing with MapReduce

   Jimmy Lin & Chris Dyer, 2010

Learn SQL The Hard Way

   Zed A. Shaw, 2010

SQL Tutorial 

   Tutorials Point

Introduction to Machine Learning 

   Amnon Shashua, 2008

Machine Learning

   Abdelhamid Mellouk & Abdennacer Chebira, 450

Machine Learning – The Complete Guide


Social Media Mining: An Introduction

   Reza Zafarani, Mohammad Ali Abbasi, & Huan Liu, 2014

Data Mining: Practical Machine Learning Tools and Techniques

Ian H. Witten & Eibe Frank, 2005

Mining of Massive Datasets

Jure Leskovec, Anand Rajaraman, & Jeff Ullman, 2014

A Programmer’s Guide to Data Mining

Ron Zacharski, 2015

Probabilistic Programming & Bayesian Methods for Hackers

Cam Davidson-Pilon, 2015

Data Mining Techniques For Marketing, Sales, and Customer Relationship Management

Michael J.A. Berry & Gordon S. Linoff, 2004

Inductive Logic Programming: Techniques and Applications

Nada Lavrac & Saso Dzeroski, 1994

Pattern Recognition and Machine Learning

Christopher M. Bishop, 2006

Machine Learning, Neural and Statistical Classification

D. Michie, D.J. Spiegelhalter, & C.C. Taylor, 1999

Information Theory, Inference, and Learning Algorithms

David J.C. MacKay, 2005

Bayesian Reasoning and Machine Learning

David Barber, 2014

Gaussian Processes for Machine Learning

C. E. Rasmussen & C. K. I. Williams, 2006

Reinforcement Learning: An Introduction

Richard S. Sutton & Andrew G. Barto, 2012

Algorithms for Reinforcement Learning

Csaba Szepesvari, 2009

Big Data, Data Mining, and Machine Learning

Jared Dean, 2014

Modeling With Data

Ben Klemens, 2008

Deep Learning

Yoshua Bengio, Ian J. Goodfellow, & Aaron Courville, 2015

Neural Networks and Deep Learning

Michael Nielsen, 2015

Data Mining and Analysis: Fundamental Concepts and Algorithms

Mohammed J. Zaki & Wagner Meira Jr., 2014

Theory and Applications for Advanced Text Mining

Shigeaki Sakurai, 2012

Think Stats: Exploratory Data Analysis in Python

Allen B. Downey, 2014

Think Bayes: Bayesian Statistics Made Simple

Allen B. Downey, 2012

The Elements of Statistical Learning: Data Mining, Inference, and Prediction

Trevor Hastie, Robert Tibshirani, & Jerome Friedman, 2008

A First Course in Design and Analysis of Experiments

Gary W. Oehlert, 2010

D3 Tips and Tricks

Malcolm Maclean, 2015

Interactive Data Visualization for the Web

Scott Murray, 2013

Disruptive Possibilities: How Big Data Changes Everything

Jeffrey Needham, 2013

Real-Time Big Data Analytics: Emerging Architecture

Mike Barlow, 2013

Big Data Now: 2012 Edition

O’Reilly Media, Inc., 2012

Computer Vision

Richard Szeliski, 2010

Concise Computer Vision

Reinhard Klette, 2010