Job Detail - Data Engineer III

Job Description for Data Engineer (primarily) with MLOps flavour

Skills Required:

  1. Good understanding of cloud technologies: Azure/AWS (primarily Azure).
  2. 4+ years of experience in data warehousing projects, with hands-on experience in data modeling, data normalization techniques, slowly changing dimensions, and star and snowflake schemas.
  3. Strong command of advanced SQL.
  4. Experience with both structured and unstructured data (RDBMS and NoSQL databases; experience with vector databases is a plus).
  5. 4+ years of experience developing and maintaining data pipelines (ETL).
  6. Experience handling big data with Spark and cloud-hosted data stores such as Snowflake, Redis, and SQL Server.
  7. Experience in performance optimization of both data pipelines and complex SQL queries.
  8. 2-3 years of programming experience in Python is a must, including knowledge of modular and object-oriented programming.
  9. Technical know-how to convert Jupyter notebooks or Python code written by data scientists into robust, production-ready code. The preferred MLOps platforms are Azure Machine Learning, Spark/Databricks, and MLflow.



Roles & Responsibilities:

  1. Collaborate with business analysts, data scientists and other stakeholders to understand their data needs and requirements.
  2. Build data pipelines on Azure Data Factory, architecting them so that they are easy to maintain.
  3. Write documentation (Technical Specification Documents) for the ETL pipelines you develop.
  4. Take ownership of various data pipelines.
  5. Review and audit data pipelines built by peers.
  6. Debug data pipelines as soon as possible when they fail.
  7. Respond quickly to ad hoc requests for data reports based on SQL queries.
  8. Think critically when building data pipelines: explicit, easily comprehensible logic is always better than complex data transformations.
  9. Optimize data pipelines based on their resource utilization and execution time.
  10. Convert code written by data scientists into production-ready code, preferably on the Azure ML layer.