The confusion is real. Many companies hire a Data Scientist or a Data Engineer without realizing that the two skill sets are almost completely different, only to regret the decision later. The only similarity between them is that both work with data. There is actually a very clear distinction between a Data Scientist and a Data Engineer.
The one who lays the path for us
A Data Engineer's responsibility is to create a way for us to comfortably pull out any data we want. They plan what our data infrastructure will look like, build the data pipeline processes, and make sure that every piece of data we want to work on is available to us.
Based on that description, the skill set should look like this:
- Data Infrastructure. From databases (RDBMS, NoSQL) and big data processing tools (MapReduce, Spark) to server operating systems (Ubuntu, *nix).
- Data Cleaning. I will cover this topic separately later.
- Basic Data Transformation. Knowing how to change data from one form to another and move it from one place to another.
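To make the last item concrete, here is a minimal sketch of a basic transformation in Python: CSV text converted into JSON lines. The function name, the column names, and the sample rows are all made up for illustration; a real pipeline would read from and write to actual storage.

```python
import csv
import io
import json


def csv_to_json_lines(csv_text):
    """Convert CSV text (with a header row) into a list of JSON strings, one per row."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [json.dumps(row, sort_keys=True) for row in reader]


# Hypothetical sample data; the column names are assumptions for this example.
SAMPLE_CSV = "user_id,event\n1,login\n2,purchase\n"

json_lines = csv_to_json_lines(SAMPLE_CSV)
for line in json_lines:
    print(line)
```

Note that `csv.DictReader` reads every value as a string, so converting types (numbers, dates) would be a further transformation step.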
The one who brings out insights
As opposed to the Data Engineer, the Data Scientist turns the available data into valuable insights. They use the product built by the Data Engineer and run the data pipeline process, using ETL to change the data into more workable datasets and then turning those into actionable insights. Even better, they can build those insights directly into the product, for example as recommendations.
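As a rough illustration of that flow, here is a small ETL sketch in Python. It is only an example under stated assumptions: the raw records, the field names, and the `purchases` table are hypothetical, and an in-memory SQLite database stands in for whatever storage the Data Engineer has actually set up.

```python
import sqlite3

# Hypothetical raw purchase log; field names and values are assumptions.
RAW_EVENTS = [
    {"customer": "alice", "product": "book", "amount": "12.50"},
    {"customer": "bob", "product": "pen", "amount": "1.20"},
    {"customer": "alice", "product": "pen", "amount": "1.20"},
    {"customer": "carol", "product": "book", "amount": ""},  # bad record, no amount
]


def extract():
    """Extract: in practice this would read from the Data Engineer's pipeline."""
    return RAW_EVENTS


def transform(events):
    """Transform: drop records without an amount and convert amounts to floats."""
    return [
        (e["customer"], e["product"], float(e["amount"]))
        for e in events
        if e["amount"]
    ]


def load(rows, conn):
    """Load: write the cleaned rows into a table that is easy to query."""
    conn.execute("CREATE TABLE purchases (customer TEXT, product TEXT, amount REAL)")
    conn.executemany("INSERT INTO purchases VALUES (?, ?, ?)", rows)
    conn.commit()


conn = sqlite3.connect(":memory:")
load(transform(extract()), conn)

# A simple "insight": how often each product is bought, most popular first.
top_products = conn.execute(
    "SELECT product, COUNT(*) AS n FROM purchases "
    "GROUP BY product ORDER BY n DESC, product"
).fetchall()
print(top_products)
```

The ranked product list at the end is the kind of output that could feed a very simple recommendation, such as suggesting the most frequently bought product.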
So the skill set would be:
- ETL. I have covered this topic here.
- Python or R programming language
- SQL queries and NoSQL queries
- Big data processing (Spark, MapReduce)
- Nice to have: Machine Learning and Deep Learning. Look at the list of Machine Learning projects I post.
Okay, I know what the skill sets are. What’s next?
Here are some real cases you can refer to.
You want to start working with data
You probably don’t know where to start. This means you need to lay a foundation first, before hiring a full-fledged Data Scientist. At this stage, you probably need a Data Engineer who can open a path for you by storing all of your log data and preparing the right infrastructure.
You may prefer to hire a Data Engineer who also has a subset of the Data Scientist skill set, such as SQL queries, Python, and statistics. This will cover your needs while you get started.
Okay, I have the data stored well. Do I need a Data Scientist?
Yes, at this point you need a Data Scientist. You can tell them about your business: how it has been doing lately, what kind of improvement you need, and so on. They will process your data and give insights back to you. Even better, they may come to know the real state of your business better than you do.
I have both of them now, but I can see they are overwhelmed with work. How do I scale the team?
There are two ways to scale those two teams: by specialization and by function.
By specialization, I mean that each team member takes on more specialized tasks. For example, a machine learning scientist will lean toward building more sophisticated machine learning models, while a data analyst will lean toward statistical analysis of historical data.
Function is about which part of the business each member serves. It is the old-school way to scale. You will have a business intelligence team that serves marketing and management, and the usual data science team that serves the product team. This data science team can be split according to your products: if you have three products, you might have three data science teams.
The same applies to Data Engineers. Each member could specialize in a more advanced technology, like the Google TPU, and they could also be split by function, attached to the BI or Data Science teams.
At the end of the day, you are the one who knows your team best. Based on the current situation and your needs, you can hire the right one. Remember, don’t get caught up in the hype.
You get it or you lose it. You choose!
Please share it if you like it!