RapidMiner - the "Apple of Data Science"?
Like many of you, I spent my childhood during the late 1970s / early 1980s, which was really the beginning of the "PC era" - at least through my eyes. The first computer I ever saw was the Radio Shack TRS-80 Model I, which my elementary school bought in 1981 for goodness-knows-what-reason. My buddy and I placed out of the 4th and 5th grade math curricula in one year, and hence, for 5th grade, our "math class" consisted of putting the two of us in front of this TRS-80 for 45 minutes every day and leaving us to our own devices (pardon the pun). In a year we taught ourselves BASIC and were able to read/write to a wonderful cassette tape drive. We had a pretty good swagger going around school due to our BASIC prowess - we knew how to use this machine better than anyone in the building. Life was good.
a Radio Shack TRS-80 Model I (source: Wikipedia)
During this time my mother, a math prodigy in her youth, went back to school and was in the 1st cohort of "computer science" masters candidates at Pace University (now called the "Seidenberg School of Computer Science and Information Systems", founded in 1983). I often went with her to the mainframe center, where she and I wrote software on stacks of punch cards - programs that took hours to compile.
a stack of punch cards ready to be compiled (source: Wikipedia)
She landed a job right afterward with Carl Zeiss, Inc. - charged with pioneering the idea of connecting a microscope to a "PC". Having a PC back then was a novelty, and hence yet again I had a nice swagger as a person who could navigate MS-DOS at home on my own (ok, my mom's) computer. Like many of my peers, I made a nice living on the side setting up computers for people, building databases (dBASE III), creating spreadsheets (Lotus 1-2-3), and word processing (WordPerfect).
my mother's computer (circa 1985) (source: Pinterest)
College was more of the same. I moved from MS-DOS to Unix and spent much of my time "finger-ing" and "ping-ing" my friends over the new internet, writing email, and using emacs for code. MS Windows was getting popular by this time, but Apple's computers and their GUI were considered "not serious" and "watered down" by us serious computer people. It was "good for graphics", some admitted, but all agreed that there was no way a drag-and-drop GUI would be useful beyond the toy phase. Because you could not see what the computer was doing "under the hood", the thinking went, you could do things on an Apple without understanding what it was actually doing. And this was viewed as very dangerous. Outwardly we said you could get into real trouble with your computer, but deep down we were probably threatened by the idea of "non-computer-people" intruding into our geeky, members-only world.
Time moved on and at some point I "saw the light" - moved 100% from PC to Mac - and I remain a diehard Mac user to this day. Having Mac OS X built on a Unix kernel was a huge plus, but more importantly, I saw how the Mac OS was designed to help you do things correctly, and to prevent you from doing stupid things (like accidentally downloading 100 viruses or deleting your hard drive). In the current age, Mac OS has 100% of the functionality of a PC (if not more) but delivers it in a way that lowers the threshold for access to a computer's capabilities. It is "serious computing for the masses", and the swagger that we all earned in the 1980s has become a source of mockery rather than admiration. "Why on earth would you use command-line operators to do things that I can do with one click?", people would say. It sounds quaint now, but it really hurt back then.
My first Mac, circa 1996? (source: Wikipedia)
Fast forward to today and the world of data science. The vast majority of people in this field use Python for their work, followed by R and some other code-based environments (how Excel makes this list is beyond me). And I will argue that the same swagger we had for command-line operating systems like MS-DOS and Unix in the 1980s has resurfaced in the data science community today with Python and R. "If you're serious about data science, you must be coding" is a common phrase seen on StackExchange and other platforms. Follow aggregators such as @machinelearnbot on Twitter and you will be inundated with such swagger. The prevailing school of thought says that using drag-and-drop platforms for data science, like RapidMiner, is "not for serious data scientists. How can you be serious if you're not coding?"
Case in point: the Kaggle competition platform. I think Kaggle is amazing - it is a platform where people with complex data science problems can leverage the entire world's brains in a fun, cost-effective way. But the swagger there is tremendous. If you're not solving these challenges in Python or R, you're not taken seriously. Often the challenges are not even ALLOWED to be solved any other way. Why? They will say that it's to keep it all open-source, blah blah blah. Hogwash. The entire RapidMiner core is completely open-source, and the majority of RapidMiner users work with the free license. I believe it's the data science "swagger" that looks at platforms like RapidMiner in the same glasses-down-the-nose manner in which we viewed Apple computers in 1989. "How can you possibly solve a 'serious' data science problem in a few minutes via drag-and-drop?"
As someone who now has the privilege to work for RapidMiner, I will say that the onus is on us, and our community, to design the software so that we can continue to lower the threshold for people to access the groundbreaking tools of data science, exactly the way Apple did in the 1980s. And like Apple, we must guide the user toward effective methods and techniques, and thwart ineffective, unethical, and invalid ones. It is our mission to take ANY user with data and enable her/him to do real data science - quickly and simply. If we heed the advice of sages such as George Santayana, perhaps RapidMiner can become the "Apple of Data Science." And wouldn't that be nice?
George Santayana (source: Wikipedia)
"Progress, far from consisting in change, depends on retentiveness. When change is absolute there remains no being to improve and no direction is set for possible improvement: and when experience is not retained, as among savages, infancy is perpetual. Those who cannot remember the past are condemned to repeat it." (George Santayana - "The Life of Reason" - 1905-1906)