How to become a Data Engineer

And the skills to master it

What are the skills required to become a data engineer?

A data engineer is a tech professional who builds and maintains the infrastructure that allows organizations to collect, store, and use data effectively.

The following courses will help you become a data engineer.

1. Computer Science Fundamentals (if you don’t have a CS background)

edX — Introduction to Computer Science and Programming Using Python

edX — CS50’s Introduction to Computer Science

Coursera — Computer Communications Specialization

Book — Grokking Algorithms: An illustrated guide

2. Programming Language

Take any of these courses. Your main goal here is to understand how to write basic Python code and how to work with different datasets!

DataCamp — Data Engineering With Python

Coursera — Python for Everybody Specialization (do this if you don’t know anything about Python)

edX — Python Basics for Data Science

Udemy — Python Bootcamps: Learn Python Programming and Code Training

Practice Projects:

  • Scrape data using the BeautifulSoup library, e.g., from Amazon, Covid, Wikipedia, or any website you like

  • Build A Calculator Using Python
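As a starting point for the calculator project, here is a minimal sketch. The names `OPERATIONS` and `calculate` are made up for illustration, and this is only one of many ways to structure it; the idea is simply to map each operator symbol to a function and look it up.

```python
import operator

# Map each supported symbol to the function that implements it.
# (Hypothetical structure for illustration; extend it as you like.)
OPERATIONS = {
    "+": operator.add,
    "-": operator.sub,
    "*": operator.mul,
    "/": operator.truediv,
}


def calculate(left: float, symbol: str, right: float) -> float:
    """Apply a single arithmetic operation to two numbers."""
    if symbol not in OPERATIONS:
        raise ValueError(f"Unsupported operator: {symbol!r}")
    if symbol == "/" and right == 0:
        raise ZeroDivisionError("Cannot divide by zero")
    return OPERATIONS[symbol](left, right)


if __name__ == "__main__":
    print(calculate(6, "*", 7))   # 42
    print(calculate(10, "/", 4))  # 2.5
```

From here you could read the numbers and the operator from `input()`, or extend the dictionary with more operations, which is good practice for handling bad input and writing small functions.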

3. SQL (Structured Query Language)

Learn the basics of SQL and how to write queries. Once you complete a course, make sure you get hands-on practice on HackerRank or any website you like!

Udemy — The Complete SQL Bootcamp for the Manipulation and Analysis of Data (Recommended)

Coursera — SQL for Data Science

DataCamp — Intro to SQL

Practice SQL here
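To give a sense of the kind of query worth practicing, here is a small sketch that runs SQL from Python using the built-in `sqlite3` module. The `orders` table, its columns, and the sample rows are invented for illustration and do not come from any of the courses above.

```python
import sqlite3


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory database with a small, made-up orders table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, amount REAL)"
    )
    conn.executemany(
        "INSERT INTO orders (customer, amount) VALUES (?, ?)",
        [("alice", 30.0), ("bob", 12.5), ("alice", 20.0)],
    )
    conn.commit()
    return conn


def total_per_customer(conn: sqlite3.Connection) -> list:
    """Aggregate order amounts per customer, largest total first."""
    query = """
        SELECT customer, SUM(amount) AS total
        FROM orders
        GROUP BY customer
        ORDER BY total DESC
    """
    return conn.execute(query).fetchall()


if __name__ == "__main__":
    connection = build_demo_db()
    for customer, total in total_per_customer(connection):
        print(customer, total)
    connection.close()
```

SELECT, GROUP BY, and ORDER BY cover a large share of beginner exercises, so being comfortable with queries like this one is a reasonable first milestone before moving on to joins and window functions.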

4. Basics Of Linux

Why Linux? Because you will work with many remote machines, connecting to them over SSH and running operations on them, so it’s important to learn the basics.

You don’t have to memorize all the commands, but you should understand what they do and how to write them.

Udemy — Linux for Beginners: Linux Basics (Recommended)

Coursera — Linux Fundamentals

Do Hands-On Project

5. Big Data Fundamentals

This section is theoretical: you need to understand how big data systems work and their history.

Coursera — Big Data Specialization (Recommended)

edX — Big Data Fundamentals

Udemy — Learn Big Data: The Hadoop Ecosystem Masterclass (Do this if you want to learn about legacy systems)

Data Warehouse Fundamentals

As in the previous section, this is mostly theory and understanding of concepts.

Coursera — Data Warehousing for Business Intelligence Specialization (recommended for deep dive)

Udemy — Data Warehouse Fundamentals for Beginners (recommended for quick learning)

6. Learn Batch/Realtime Streaming Pipeline Building

Batch Pipeline (Spark)

DataCamp — Big Data Fundamentals with PySpark (recommended)

Udemy — Spark and Python for Big Data with PySpark

7. Realtime Streaming (Kafka)

Udemy — Apache Kafka Course for Beginners: Learn Kafka Online (check this)

edX — Building ETL and Data Pipelines with Bash, Airflow, and Kafka

8. Data Orchestration (Airflow)

Udemy — The Complete Hands-On Introduction to Apache Airflow

DataCamp — Airflow

9. Dashboard Tool

There are two ways to visualize data: one using code and the other using a tool, so I have added both here.

Udemy — Python Data Analysis & Visualization Masterclass (Using Code)

Udemy — Tableau 10: Training on How to Use Tableau For Data Science (Using Tool)

Coursera — Data Visualization with Tableau Specialization

Udemy — Microsoft Power BI with Desktop Training Course

10. Cloud Computing

This is an advanced section. Take the courses, then get the certification to add value to your resume. If you are new, start with AWS, but if you already know another cloud, you can go with that instead!

AWS (Amazon Web Services)

Udemy — Ultimate AWS Certified Cloud Practitioner

Udemy — Ultimate AWS Certified Solutions Architect Associate (SAA)

GCP (Google Cloud Platform)

Coursera — Cloud Data Engineer Professional Certificate

Microsoft Azure

Udemy — AZ-900: Microsoft Azure Fundamentals

Udemy — Azure Data Engineer Certified:8 COURSE BUNDLE

Once you have learned about the different services, consider doing some hands-on projects.

Do Hands-On — Data Engineering Cloud Project Series (AWS)

Do Hands-On — YouTube Data Analysis Project (AWS)