Microsoft’s Cortana Analytics Suite (CAS) seems to be getting market traction. This post will not list the offerings contained within this suite, as that is well covered in Microsoft’s own literature. What this post will attempt to cover, though, is ‘your’ perspective.
Why is CAS touted as being an ‘end to end’ Big Data & Analytics platform?
CAS has components for ingesting data from various sources, storing the data, transforming it, performing analytics on it, and visualizing the results. So, you get all the key tools that you’d want in a Data Engineering or Data Science project under one roof.
Do I need to have PhDs on my team to use CAS?
Let’s take a step back and understand why Microsoft invested in creating this suite. It is well known that enterprises and consumers are swimming in a sea of data. This data can potentially help everyone make better and more informed decisions. But here’s the catch. In order to glean useful insights from the data, some fairly advanced algorithms need to be applied to it. Most enterprises today do not have these skills, and the gap will not be bridged that easily. Microsoft’s CAS fits well into this gap.
Azure Machine Learning (ML), which is one of the components contained within CAS, provides a good collection of advanced algorithms for solving business problems. These can be used by non-PhDs as well. For instance, a Software or Data Engineer with a *good* grasp of analytics concepts can decide which of these algorithms fit a business problem, evaluate them, pick the right one, explain the results to the business stakeholders, and deploy the chosen model into production. Even as you read this, several Software and Data Engineers are making this transition into Data Science. Not a cakewalk, but highly doable. Expect to see more of this upskilling as we go along.
So, does that make PhDs obsolete? Absolutely not. PhDs come in for creating new algorithms, enhancing existing ones and providing other deep expertise as needed.
This is Microsoft’s way of ‘democratizing’ Data Science.
Is Microsoft alone on this path?
Not really. IBM, Google and some others have offerings in this area.
So how do I know which one to pick?
Hate to say ‘it depends’, but you know, it does depend upon several factors. For instance, do your business use cases call for some serious integration with the existing IT systems? If yes, what kind of a shop are you: Microsoft or non-Microsoft? Is your company open to using open source tools, or is there a very strong cultural preference for commercial tools? Is there an aversion towards cloud, driven by internal or external (say, regulatory) factors? This list goes on…
Note that, in the above points, we have not included a head-to-head comparison between the product stacks. Why? That is because many of the product features (or lack thereof) are fairly transient. A feature that is missing from a stack today might appear in a couple of months’ time, given the aggressive pace of innovation and the investments being made by the vendor companies.
In other words, you should base your Big Data and Analytics stack decision less on the ‘current’ product features, and more on the overall fit with your current IT ecosystem.
A key benefit of this approach is that it delivers incremental business benefits through short, iterative forays into Big Data and Analytics, rather than a ‘big bang’ approach of spending millions and setting up a grandiose infrastructure that you’re not 100% confident will provide proportionate business benefits.
Can I benefit from using just one or two modules within CAS?
Yes. CAS is just a well-rounded collection of components. While they do talk well with each other, not all of them are needed for every conceivable project. You can still get real business benefits by using individual components on their own. For instance, a prediction model hosted within Azure ML can be called by your on-premises application as needed. The predicted value can then act as an input to the next step in your on-premises workflow.
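To make the last example a little more concrete, here is a minimal Python sketch of an on-premises application calling a hosted prediction model over HTTPS and feeding the result into its next step. Treat everything specific in it as an assumption made for illustration: the scoring URL, the API key, the `{"features": ...}` payload shape, the bearer-token header, the 0.5 threshold and the function names are all hypothetical. The real request and response formats depend on how your own model is published, so check the details shown for your web service before adapting this.

```python
import json
import urllib.request

# Hypothetical placeholders: substitute the values of your own published service.
SCORING_URL = "https://example.invalid/score"
API_KEY = "replace-with-your-key"


def build_request(features, url=SCORING_URL, api_key=API_KEY):
    """Build (but do not send) an HTTP POST carrying the input features as JSON.

    The payload shape {"features": {...}} is an assumption for illustration;
    a real service defines its own input schema.
    """
    body = json.dumps({"features": features}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={
            "Content-Type": "application/json",
            "Authorization": "Bearer " + api_key,
        },
        method="POST",
    )


def predict(features, timeout=10):
    """Send the request and return the decoded JSON response."""
    request = build_request(features)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


def next_workflow_step(score, threshold=0.5):
    """Hypothetical on-premises step that consumes the predicted value."""
    return "escalate" if score >= threshold else "continue"
```

In a real workflow you would call `predict(...)` with the record you want scored, pull the predicted value out of the response according to your service’s output schema, and hand it to something like `next_workflow_step`. Nothing else from the suite needs to be involved.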
Hope the above paragraphs answer at least some of the questions playing around in your mind.
About the Author
Jayaprakash (JP) Nair is an IT industry veteran with about 19 years of experience across multiple domains, technologies and project methodologies, with start-ups as well as MNCs (Dell and Siemens). He has lived and worked extensively in various geographies (US, Dubai, India), managing multi-cultural teams and owning turnkey projects worth up to several million dollars. He is an Agile coach and Enterprise Architecture (TOGAF) practitioner, and a technologist at heart, with a penchant for teaching. He currently heads the Big Data & Analytics CoE at Aspire Systems.