Now, before we get into it, I have to admit that I am somewhat of a Data Science sceptic. Not that I do not believe in Data Science; quite the opposite, I think it is a critical field in the data analytics world that will eventually mature enough to be the dominant force in insight generation. However, having been around hundreds of Data Science-related initiatives, I have become quite cynical about how practical these initiatives actually are. By my estimate, 90% of these initiatives do not get past the POC phase (in other words, they do not get deployed to production), even though close to 100% of them are advertised internally and externally as an unbelievable success.
With that in mind, I was surprised to come across this article (We Don’t Need Data Scientists, We Need Data Engineers) written by a legit data scientist, Mihail Eric. In my experience, it is hard to find a data scientist who would not consider their trade the top of the data analytics food pyramid. Data scientists (in their own minds) are the apex predators, and the data engineers are there to do the menial and basic work. Mihail, however, put some elbow grease into his research and concluded in his article that demand is actually higher for data engineers than for data scientists, as you can see in the excerpt below:
TLDR: There are 70% more open roles at companies in data engineering as compared to data science. As we train the next generation of data and machine learning practitioners, let’s place more emphasis on engineering skills.
I think he makes some interesting points in his article, and his distribution of demand is definitely thought-provoking, as you can see below:
However, I cannot say that I fully agree with his analysis.
- In Mihail’s opinion, demand for data engineers is 70% higher than demand for data scientists. Unfortunately, this number is not intuitive, and I think his approach to arriving at it is flawed. We cannot talk about perceived demand for any of these positions without understanding how big the talent pool is for each role. I think the data engineering pool is an order of magnitude larger than that of data scientists. I also think that it takes significantly less time to fill data engineering positions than data science positions. A count of open roles at a single point in time therefore tends to understate data engineering demand and overstate data science demand, which means that the chart above is not very representative of reality in and of itself.
- Mihail states that demand for data scientists may be negatively affected by how the field is maturing, with frameworks like TensorFlow and PyTorch becoming better and better every day. In other words, the tools are getting better, therefore we do not need as many bodies. The tools are getting better, but the batting average of AI initiatives is not. I still see about 10% of them, or even fewer, getting productionized. Why the success rate is so low is beyond the scope of this post, but let us just say that business acumen is still a thing and no tool seems to make it less relevant yet.
- Lastly, I disagree with Mihail’s assessment that Data Science is somehow far along in Gartner’s Hype Cycle. I think that, overall, the field of Data Science is still very much at the Peak of Inflated Expectations, whereas I think he believes that it is now climbing the Slope of Enlightenment.
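To make the first point a bit more concrete, here is a rough sketch of why a snapshot of open roles can mislead. It leans on Little's law (average items in a system = arrival rate × average time in the system), rearranged to estimate how many roles open per month. Only the 70% gap comes from the article; the time-to-fill values are purely my own assumptions for illustration, not measured data.

```python
def monthly_hiring_rate(open_roles: float, months_to_fill: float) -> float:
    """Estimate roles opened per month from a snapshot of open roles.

    Little's law rearranged: arrival rate = items in system / time in system.
    """
    if months_to_fill <= 0:
        raise ValueError("months_to_fill must be positive")
    return open_roles / months_to_fill


# Snapshot counts chosen only to reproduce the quoted 70% gap.
de_open, ds_open = 170, 100

# HYPOTHETICAL time-to-fill values (in months); these are assumptions,
# not figures from the article or from any survey.
de_months_to_fill, ds_months_to_fill = 1.0, 3.0

de_rate = monthly_hiring_rate(de_open, de_months_to_fill)
ds_rate = monthly_hiring_rate(ds_open, ds_months_to_fill)

snapshot_ratio = de_open / ds_open  # what a chart of open roles shows
flow_ratio = de_rate / ds_rate      # roles actually opened per month

print(f"Open-role snapshot ratio (DE/DS): {snapshot_ratio:.2f}")
print(f"Monthly hiring-rate ratio (DE/DS): {flow_ratio:.2f}")
```

Under these made-up fill times, the same snapshot would correspond to a much larger gap in roles actually opened per month. The exact figures do not matter; the point is that open-role counts alone cannot be read as demand without knowing how long each kind of role stays open.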
I think that the field of Data Science is very far from being mature, and therefore we need more data scientists. But we need data scientists who know how to work with data, who have the business acumen to really understand the issues we are trying to solve, and who can effectively communicate their discoveries back to a business audience in language that the audience can actually understand.