I have been watching with keen interest how, over the past couple of years, Gartner has redefined its Magic Quadrants & Market Guides for the various layers of what could be called the Enterprise Analytics pyramid.
Broadly, the following layers emerge:
- Data warehouses and database management systems (including data lakes) that form the foundation
- Classic enterprise reporting platforms that sit on top (for system of record reporting)
- New age Business Intelligence & Analytics platforms that enable governed data discovery
- Advanced Analytics platforms

Each of these layers has a dedicated Magic Quadrant or Market Guide. A clear call-out from Gartner says that these distinct layers reflect today's market context and seller-buyer dynamics, but some layers could potentially fuse in future (as already evidenced by some interesting partnerships between players across these layers, e.g. Alteryx & Tableau).
After going through the analysis, one name stood out, especially in the Advanced Analytics Platforms Magic Quadrant: KNIME – the Konstanz Information Miner. Though I had only a brief tryst with KNIME a few years back, their consistent position in the Leaders quadrant (in 2015 too), with a significant lead over other respected players on the 'Completeness of Vision' axis while also maintaining a healthy position on 'Ability to Execute', prompted me to take another, closer look.
What I see is quite impressive.
1. An open-source data science workbench that helps build visually intuitive data science workflows. See the headline image – it doesn't require much explanation to understand the model training & validation workflow followed. (By the way, I used one month of the airline on-time performance dataset, about half a million records, for a rough-draft logistic regression model to predict delays as a function of weekday/time/distance/origin/destination.) I could see KNIME offering a very good alternative to coding-intensive open-source platforms – for analytic users who understand the concepts and steps of even relatively complex algorithms, but don't have the time or expertise to code them in detail (which would otherwise put those algorithms out of reach). Importantly, this designer workbench is free of charge and readily available for download and use.
2. Commercially licensed extensions that add value: since it's been a while since my previous usage, I was pleased to discover the Big Data Connectors that read from / write to HDFS & Hive, and the Spark Executor (and/or Cluster Executor in some cases), which lets workflows leverage centrally located big data / cluster compute infrastructure on a need basis – very neatly thought through and architected, while maintaining the core principle of workflow design.
3. Collaboration & productivity extensions (also commercially licensed), such as on-premise or cloud server deployments, and utilities that collapse an entire workflow into an encrypted meta-node (relevant if there's a need to protect the analytic workflow as intellectual property), are very useful too.
4. Ability to integrate with other tools (R/Python/..), visualization tools such as Tableau and Spotfire, social media APIs, etc., which helps in constructing & orchestrating the end-to-end workflow across multiple platforms – all from within KNIME, without having to hop between different places.
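For readers more comfortable in code than in a visual canvas, the train-and-validate workflow described in point 1 can be sketched in a few lines of Python. This is a minimal illustration, not the actual KNIME workflow: the data here is synthetic (the original used one month of the airline on-time performance dataset, roughly half a million records), and the feature names and coefficients are purely illustrative assumptions.

```python
# A rough code equivalent of the visual train/validate workflow:
# a logistic regression predicting delay from a few schedule features.
# Data is synthetic; the delay-generating formula below is invented
# solely so the example is self-contained and runnable.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(42)
n = 5000
weekday = rng.integers(0, 7, n)       # day of week (0-6)
dep_hour = rng.integers(0, 24, n)     # scheduled departure hour
distance = rng.uniform(100, 2500, n)  # flight distance in miles

# Synthetic target: later departures and longer flights delay more often
logit = -2.0 + 0.12 * (dep_hour - 12) + 0.0004 * distance
delayed = (rng.random(n) < 1.0 / (1.0 + np.exp(-logit))).astype(int)

# Partitioning node -> train/test split; Learner node -> fit;
# Predictor + Scorer nodes -> predict and score on the hold-out set
X = np.column_stack([weekday, dep_hour, distance])
X_train, X_test, y_train, y_test = train_test_split(
    X, delayed, test_size=0.3, random_state=42)

model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
acc = accuracy_score(y_test, model.predict(X_test))
print(f"hold-out accuracy: {acc:.3f}")
```

Each line maps loosely onto a node in the visual workflow (reader, partitioning, learner, predictor, scorer) – which is exactly the trade-off discussed above: the code buys flexibility, the canvas buys accessibility.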
My top preference in the open-source space would still be R- or Python-based analytic solutions in the form of reusable notebooks, for the freedom they offer over the analytic code and hence their immense flexibility. I also see & hear that the visual workflow concept is much more widely adopted in the industry – especially among broader enterprise analytics user groups – through tools such as Alteryx (I am looking to review the Alteryx designer soon). Still, it is nice to note the unique proposition KNIME offers through an interesting combination of cost, performance & functionality. Well done, and definitely worth adding to the repertoire of data scientists – both the code-savvy and the not-so.