RapidMiner - the "Apple of Data Science"?
Hello Community,
Like many of you, I spent my childhood during the late 1970s / early 1980s, which was really the beginning of the "PC era" - at least through my eyes. The first computer I ever saw was the Radio Shack TRS-80 Model 1, which my elementary school bought in 1981 for goodness-knows-what reason. My buddy and I placed out of the 4th and 5th grade math curricula in one year and hence, for 5th grade, our "math class" consisted of putting the two of us in front of this TRS-80 for 45 minutes every day and leaving us to our own devices (pardon the pun). In a year we taught ourselves BASIC and were able to read/write to a wonderful cassette tape drive. We had a pretty good swagger going around school due to our BASIC prowess - we knew how to use this machine better than anyone in the building. Life was good.
a RadioShack TRS-80 Model 1 (source: Wikipedia)
During this time my mother, a math prodigy in her youth, went back to school and was in the first cohort of "computer science" master's candidates at Pace University (now called the "Seidenberg School of Computer Science and Information Systems", founded in 1983). I often went with her to the mainframe center where she and I wrote software with stacks of punchcards that took hours to compile.
a stack of punchcards ready to be compiled (source: Wikipedia)
She received a job right afterwards with Carl Zeiss, Inc. - charged with pioneering the idea of connecting a microscope with a "PC". Having a PC back then was a novelty, and hence yet again I had a nice swagger being a person who could navigate MS-DOS at home with my own (ok, my mom's) computer. Like many of my peers, I made a nice living on the side setting up computers for people, building databases (dBase III), creating spreadsheets (Lotus 1-2-3), and word processing (WordPerfect).
my mother's computer (circa 1985) (source: Pinterest)
College was more of the same. I moved from MS-DOS to Unix and spent much of my time "finger-ing" and "ping-ing" my friends over the new internet, writing email, and using emacs for code. MS Windows was getting popular by this time, but Apple's computers and its GUI were considered "not serious" and "watered down" for us serious computer people. It was "good for graphics", some admitted, but all agreed that there was no way that a drag-and-drop GUI would be useful beyond the toy phase. If you could not see what the computer was doing "under the hood", the thinking went, then you could do things on an Apple without understanding what it was doing. And this was viewed as very dangerous. Outwardly we said you could get into real trouble with your computer, and deep down we were probably threatened by the idea of "non-computer-people" intruding into our geeky, members-only world.
Time moved on and at some point I "saw the light" - moved 100% from PC to Mac - and still remain a diehard Mac user to this day. Having Mac OS X built on a Unix kernel was a huge plus, but more importantly, I saw how the Mac OS was designed to help you do things correctly, and prevent you from doing stupid things (like accidentally downloading 100 viruses or deleting your hard drive). In the current age, Mac OS has 100% of the functionality of a PC (if not more so) but does it in a way that lowers the threshold for access to a computer's capabilities. It is "serious computing for the masses", and the swagger that we all earned in the 1980s has become a source of mockery rather than admiration. "Why on earth would you use command line operators to do things that I can do with one click?", people would say. It sounds quaint now, but it really hurt back then.
My first Mac, circa 1996? (source: Wikipedia)
Fast forward to today and the world of data science. The vast majority of people in this field use Python for this work, followed by R and some other code-based environments (how Excel makes this list is beyond me). And I will argue that the same swagger that we had in the 1980s for command-line operating systems like MS-DOS and Unix has resurfaced in today's data science community with Python and R. "If you're serious about data science, you must be coding" is a common phrase seen on StackExchange and other platforms. Follow aggregators such as @machinelearnbot on Twitter and you will be inundated with such swagger. The prevailing school of thought says that using drag-and-drop platforms for data science, like RapidMiner, is "not for serious data scientists. How can you be serious if you're not coding?"
(source: Twitter)
Case in point is the Kaggle Competition platform. I think Kaggle is amazing - it is a platform where people with complex data science problems can leverage the entire world's brains in a fun, cost-effective way. But the swagger there is tremendous. If you're not solving these challenges in Python or R, you're not taken seriously. Often the challenges are not even ALLOWED to be solved any other way. Why? They will say that it's to keep it all open-source, blah blah blah. Hogwash. The entire RapidMiner core is completely open-source and the majority of RapidMiner users work with the free license. I believe that it's the data science "swagger" that looks at platforms like RapidMiner in the same glasses-down-the-nose manner that we viewed Apple computers in 1989. "How can you possibly solve a 'serious' data science problem in a few minutes via drag-and-drop?"
(source: kdnuggets.com)
As someone who now has the privilege to work for RapidMiner, I will say that the onus is on us, and our community, to design the software so that we can continue to lower the threshold for people to access the groundbreaking tools of data science, exactly the way Apple did in the 1980s. And like Apple, we must guide the user toward effective methods and techniques, and thwart ineffective, unethical, and invalid ones. It is our mission to take ANY user with data and enable her/him to do real data science - fast and simple. If we heed the advice of sages such as George Santayana, perhaps RapidMiner can become the "Apple of Data Science." And wouldn't that be nice?
George Santayana (source: Wikipedia)
"Progress, far from consisting in change, depends on retentiveness. When change is absolute there remains no being to improve and no direction is set for possible improvement: and when experience is not retained, as among savages, infancy is perpetual. Those who cannot remember the past are condemned to repeat it." (George Santayana - "The Life of Reason" - 1905-1906)
Answers
Hi Scott!
I liked your article and agree with many of the points in it.
However, in my opinion Apple is nowadays limiting users, at least on the mobile platforms (and is trying to move into the same direction on the desktop platform). I always refer to their products as a "nicely decorated jail".
RapidMiner, in contrast, is open. You can start quickly and solve data science tasks easily. But you can also integrate the R and Python work that you and your colleagues did before and put time and love into. You can use the Groovy scripting to easily extend RapidMiner with libraries from the vast Java ecosystem, as I did despite not being much of a developer. And as you mentioned, the core is even open source.
So if Apple was as open as RapidMiner, your comparison would be more valid. I think that the RapidMiner way is the model that should be followed, not the Apple one.
Regards,
Balázs
Hello Balázs,
Thanks for your comments (and reading!). I would say that Apple has ALWAYS created "nicely decorated jails" - it was a big deal in Steve Jobs' era and still remains so. It's the age-old compromise between controlling the environment (and hence controlling stability and UI/UX) and opening up for others. We have seen this play out in the Android vs iOS arena, the iTunes vs Google Play arena, the Windows vs MacOS arena, and several other examples. Developers always want things to be open (so they can develop without Big Brother vetting); users want a seamless experience and to just get things done. My point is that "Data Science Tools" is becoming a technology with wide-reaching implications for society, and it is attracting the same hand-wringing that PCs did in the 1980s.
So back to RapidMiner, I completely agree that a balance between open-source and control of the UI/UX is ideal, and that RapidMiner is doing a great job trying to strike that balance. Let's continue doing so.
Scott