Data Engineering vs Data Science vs Data Analytics
“3 billion gigabytes of data was created between the beginning of civilisation and 2003. In 2023, we create that much every day.”
Data has become a global talking point over the last 10 years. Computers powerful enough to process large datasets, the availability of that computing power and of data storage through cloud providers, and companies willing to invest in these technologies have together created data jobs in every industry.
Being a “data-driven” person with an “analytical mind” is no longer enough to break into this industry. It is important to research the possibilities the big data landscape has to offer and choose the profile that fits your ambitions and skillset. The first mistake people make when switching into or starting a data career is not understanding the different job profiles and blindly applying to any job that has ‘data’ in the title and ‘SQL’, ‘Python’ or ‘database’ in its description.
This story aims to help you understand the difference between three dominant data jobs: data analysts, data engineers and data scientists. At the end, I will also shed some light on other technical and non-technical data job profiles.
The journey of data :
Throughout the journey of data, different data professionals act on the data and create value for their stakeholders.
Data Engineers :
Data Engineers are the first responders to data in most organisations. They are responsible for collecting data and bringing it into the organisation, preparing it for the other teams that create value from it, and developing the infrastructure that supports these teams’ data needs.
They build data pipelines that move data from a source to a destination.
They clean raw data and give it a structure.
They build tests on data to validate its quality so that other teams can rely on it.
They have to build cost-efficient and optimised solutions for data pipelines, data storage and data manipulation.
Skills :
Strong programming skills to build scalable and efficient data pipelines (SQL, Python, Spark, Java)
Orchestration tools that schedule and execute data pipelines (Airflow, Dagster)
Cloud computing technologies
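To make these responsibilities a little more concrete, here is a minimal sketch of a tiny batch pipeline in Python, using only the standard library. It extracts rows from a CSV source, cleans them, runs simple data-quality tests and loads the result into a SQLite table. The sample data, the `orders` table and the specific checks are hypothetical, chosen for illustration; real pipelines would typically run on tools like Spark and be scheduled by an orchestrator such as Airflow or Dagster rather than living in a single script.

```python
import csv
import io
import sqlite3

# Hypothetical raw export from a source system, with typical problems:
# stray whitespace, inconsistent casing, a missing amount and a duplicate row.
RAW_CSV = """order_id,customer,amount
1, Alice ,19.99
2,Bob,
3,carol,5.50
3,carol,5.50
"""


def extract(text):
    """Extract: read raw rows from a CSV source (here an in-memory string)."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: clean values, drop incomplete rows and duplicates, apply types."""
    seen, cleaned = set(), []
    for row in rows:
        if not row["amount"].strip():
            continue  # skip rows with a missing amount
        order_id = int(row["order_id"])
        if order_id in seen:
            continue  # skip duplicate orders
        seen.add(order_id)
        cleaned.append((order_id, row["customer"].strip().title(), float(row["amount"])))
    return cleaned


def validate(rows):
    """Data-quality tests that downstream teams can rely on."""
    ids = [order_id for order_id, _, _ in rows]
    if len(ids) != len(set(ids)):
        raise ValueError("order_id must be unique")
    if any(amount < 0 for _, _, amount in rows):
        raise ValueError("amount must be non-negative")
    return rows


def load(rows, conn):
    """Load: write the cleaned rows into a structured destination table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orders "
        "(order_id INTEGER PRIMARY KEY, customer TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)
    conn.commit()


def run_pipeline(text, conn):
    load(validate(transform(extract(text))), conn)


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    run_pipeline(RAW_CSV, conn)
    print(conn.execute("SELECT * FROM orders ORDER BY order_id").fetchall())
    # → [(1, 'Alice', 19.99), (3, 'Carol', 5.5)]
```

Keeping extract, transform, validate and load as separate functions mirrors the way pipelines are often split into independent steps that an orchestrator can schedule, retry and monitor.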
Data Analysts :
Data Analysts are experts who understand what the data means. Where most people see rows and columns, a data analyst understands the data and narrates the story behind it. All data is stored and processed for a reason: it helps answer business questions. You can ask a data analyst a question and they will dig into the data to find the answer.
They have a good understanding of the domain. (product data analyst / marketing data analyst / supply chain data analyst / purchase data analyst …)
They have good communication skills and can present analysis to non-technical stakeholders.
They have strong analytical skills: they can look at data and understand the patterns it shows.
Skills :
SQL, Python for data manipulation
Data Visualisation skills
Domain / business understanding
Analytical reasoning
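As an illustration of the SQL side of the role, here is a small sketch of an analyst turning a business question (“which product categories bring in the most revenue, and what share of the total?”) into a query. It uses Python’s built-in `sqlite3` module and a hypothetical `sales` table with made-up figures; in practice the query would run against warehouse tables prepared by data engineers.

```python
import sqlite3

# Hypothetical sales records: (month, category, amount).
SALES = [
    ("2023-01", "Books", 120.0),
    ("2023-01", "Games", 300.0),
    ("2023-02", "Books", 180.0),
    ("2023-02", "Games", 200.0),
    ("2023-02", "Music", 200.0),
]


def build_db(rows):
    """Create an in-memory database standing in for a warehouse table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (month TEXT, category TEXT, amount REAL)")
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)
    return conn


def revenue_by_category(conn):
    """Answer: which categories bring in the most revenue, and what share of the total?"""
    query = """
        SELECT category,
               SUM(amount) AS revenue,
               ROUND(100.0 * SUM(amount) / (SELECT SUM(amount) FROM sales), 1) AS pct_of_total
        FROM sales
        GROUP BY category
        ORDER BY revenue DESC
    """
    return conn.execute(query).fetchall()


if __name__ == "__main__":
    for category, revenue, pct in revenue_by_category(build_db(SALES)):
        print(f"{category}: {revenue:.2f} ({pct}% of total)")
```

The query itself is the easy part; the analyst’s value lies in knowing which question to ask and explaining what the answer means to stakeholders.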
Data Scientists :
Data scientists are problem solvers: they look at a business problem and build a data-driven solution to it. Their solutions can range from complex mathematical models that predict the future to simple rule-based instructions. They are good at math and statistics.
An organisation might face problems such as users unsubscribing from a service, products that do not sell, or fraud; the possibilities are endless. Data engineers bring in data from the sources relevant to these problems, and the data scientist’s job is to use this data to solve them. Data scientists use statistical modelling to predict future events and produce forecasts, process natural language, classify images and videos, and identify fraudulent activity. These solutions do not just help businesses make decisions; they can actually solve the problems to some extent.
Apply Machine Learning and AI concepts to business problems
Derive informative variables (features) from data, train algorithms on them, and use these algorithms to predict and forecast events
Validate the algorithms and improve how accurately they solve business problems
Skills :
Strong programming skills to build ML and AI algorithms (Python, R)
Good business understanding
Strong academic and theoretical understanding of statistics
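The workflow described above (generate features, train a model, validate it) can be sketched with a deliberately tiny example. The sketch below, assuming hypothetical churn data, derives a “login trend” feature from raw monthly login counts, learns a single threshold rule (a decision stump) from training customers, and checks its accuracy on held-out customers. It is a toy built on the standard library only, not a recommended method; real projects would typically use a library such as scikit-learn and far more data.

```python
# Hypothetical raw customer records: monthly login counts, support tickets, churn label.
TRAIN_RAW = [
    {"logins": [10, 9, 11], "tickets": 0, "churned": False},
    {"logins": [8, 6, 3], "tickets": 2, "churned": True},
    {"logins": [5, 5, 6], "tickets": 1, "churned": False},
    {"logins": [12, 7, 2], "tickets": 4, "churned": True},
    {"logins": [6, 6, 5], "tickets": 3, "churned": False},
    {"logins": [9, 4, 1], "tickets": 0, "churned": True},
]
VALID_RAW = [
    {"logins": [7, 7, 8], "tickets": 2, "churned": False},
    {"logins": [10, 5, 2], "tickets": 1, "churned": True},
    {"logins": [4, 3, 4], "tickets": 0, "churned": False},
    {"logins": [11, 9, 8], "tickets": 3, "churned": True},
]
FEATURES = ["login_trend", "tickets"]


def make_features(record):
    """Feature engineering: turn a raw record into (features, label)."""
    features = {
        "login_trend": record["logins"][-1] - record["logins"][0],  # negative = declining usage
        "tickets": record["tickets"],
    }
    return features, record["churned"]


def predict(features, rule):
    """Apply a one-feature threshold rule (a decision stump); True means 'will churn'."""
    name, threshold, direction = rule
    value = features[name]
    return value <= threshold if direction == "<=" else value > threshold


def accuracy(rows, rule):
    """Fraction of rows whose prediction matches the label."""
    return sum(predict(x, rule) == y for x, y in rows) / len(rows)


def train_stump(rows):
    """Training: try every feature, observed threshold and direction; keep the most accurate rule."""
    best_rule, best_acc = None, -1.0
    for name in FEATURES:
        for threshold in sorted({x[name] for x, _ in rows}):
            for direction in ("<=", ">"):
                rule = (name, threshold, direction)
                acc = accuracy(rows, rule)
                if acc > best_acc:
                    best_rule, best_acc = rule, acc
    return best_rule


if __name__ == "__main__":
    train = [make_features(r) for r in TRAIN_RAW]
    valid = [make_features(r) for r in VALID_RAW]
    rule = train_stump(train)
    print("learned rule:", rule)
    print("train accuracy:", accuracy(train, rule))
    print("validation accuracy:", accuracy(valid, rule))
```

With this toy data the learned rule (churn if the login trend is -5 or lower) fits the training set perfectly but misses one held-out customer, which is exactly why validating on unseen data matters.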
Let’s also look at some other data careers :
Business Intelligence Engineers :
BI engineers create solutions through which non-data professionals can easily access data and derive insights and analytics from it. A key skill for BI engineers is creating reports, dashboards and visualisations that make data easy to understand. They use BI tools such as Tableau, Microsoft Power BI, Google Looker Studio, Periscope, etc.
Data Operations (DataOps) Engineers :
DataOps Engineers are responsible for the day-to-day operations along the journey of data. They maintain the solutions created by data engineers: testing data, monitoring pipelines, updating data models and pipelines as business needs change, creating operational processes for the business, and carrying out day-to-day updates, communications and analysis related to the data.
Analytics Engineers :
Analytics Engineers focus on (often complex) data transformations. Their key responsibility is to understand the business needs and then clean and transform raw or warehouse data so that it yields the desired insights and analytics.
Data Architects :
Data architects are planners who use computer science and system design skills to plan an organisation’s data, infrastructure, databases and storage, as well as the deployment of its data science and analytics solutions. They forecast what the organisation could need in the future to support its data, and deploy solutions accordingly.
Data Strategists :
Data Strategists are non-technical professionals with a strong understanding of both business needs and the data. They build relationships between different stakeholders, plan project execution, and manage the integration of data across platforms and teams.
Data Product Owners :
Data product owners are responsible for the correct use of data and for defining the actions that teams can take on it. As the owners of the data in an organisation, they oversee SLAs (service-level agreements), compliance and documentation, and decide whether a given dataset is relevant for a project or an application.
These are some of the most sought-after data professionals in the industry, but the list is not exhaustive. As you can see, many technical and non-technical stakeholders work with data. Just like the amount of data produced globally, the possibilities of a career in data are endless.