Data Engineer

De Heus Asia

Sofic Tower, 10 Mai Chí Thọ Street, District 2, An Khanh, Ho Chi Minh City

At office


Job Domain:

Food and Beverage

Consumer Goods

Manufacturing and Engineering

Agriculture

JOB PURPOSE:

The Data Engineer is responsible for designing, building, operating, and optimizing the Lakehouse data platform on Microsoft Azure, with a strong focus on Azure Databricks, Unity Catalog, Delta Lake, and related integration services. This role ensures data is ingested, transformed, governed, and served in a stable, secure, scalable, and AI/GenAI-ready manner to support reporting, advanced analytics, self-service data use, and machine learning use cases both now and in the future.
The position works closely with business teams, analysts, data visualization specialists, Data Champions, IT, and regional stakeholders to deliver production-grade data pipelines with strong emphasis on data quality, observability, performance, cost efficiency, and operational excellence across the platform.
In addition, the Data Engineer proactively leverages AI tools (including Azure OpenAI and GenAI assistants) to improve coding productivity, documentation, troubleshooting, and automation, while designing the data platform with an AI-ready architecture capable of supporting feature engineering, model training, and real-time inference in the future.

JOB DESCRIPTION:

1. Lakehouse Platform Development & Maintenance

Design and build Lakehouse architecture on Azure Databricks using Unity Catalog, Delta Lake, and Medallion Architecture; develop production-grade pipelines using Azure Data Factory, Databricks Workflows, Auto Loader, and modern integration methods.
Apply incremental loading, change data capture (CDC), MERGE operations, schema evolution, and Liquid Clustering; implement monitoring, alerting, logging, and observability; and automate table maintenance (OPTIMIZE, VACUUM, Predictive Optimization).

2. Data Governance, Security & Compliance

Implement Unity Catalog as the centralized governance foundation; manage metadata, lineage, and access control across catalog/schema/table/view/volume; apply row filters and column masking for sensitive data.
Maintain data quality standards, automated quality checks, audit logging, an authorization matrix, and data catalog documentation, and ensure compliance with data protection requirements.

3. AI/ML Foundation & Feature Engineering

Prepare gold-layer tables for analytics and as training data for ML; support MLflow and Feature Store (where applicable); and design streaming pipelines for real-time and near-real-time use cases.
Leverage Azure OpenAI and GenAI tools to improve coding productivity, documentation, troubleshooting, automation, and data profiling.

4. Cost & Performance Optimization

Optimize compute and storage costs through auto-scaling, spot instances, workload-aware sizing, job scheduling, SQL/PySpark tuning, and cost anomaly alerting.

5. Collaboration, Documentation & Continuous Improvement

Work with business stakeholders, analysts, visualization specialists, Data Champions, IT, and regional teams; document architecture, pipelines, runbooks, and the knowledge base; use Git, Databricks Repos, and CI/CD; and commit to continuous learning.

Qualifications

Bachelor's degree in Information Technology, Computer Science, Data Science, Information Systems, or a related field.
Strong preference for candidates holding Microsoft Azure Data Engineer Associate (DP-203), Azure Fundamentals (AZ-900), Databricks Data Engineer Associate/Professional, or equivalent certifications.

Experience

3-5 years of experience as a Data Engineer or in a similar enterprise data role, including hands-on experience with Azure Databricks, Azure Data Factory, Delta Lake, Unity Catalog, and Medallion Architecture, as well as operational support, optimization, streaming pipelines, and integration/CI-CD practices.

Competencies

Proficiency in SQL/Spark SQL, Python, and PySpark; strong command of ETL/ELT, data modeling, Lakehouse/data warehouse concepts, Azure Databricks, Data Factory, Event Hubs, Synapse Link, ADLS Gen2, Blob Storage, API integration, and structured, semi-structured, and unstructured data.
Deep understanding of data governance, data quality frameworks, lineage, RBAC/ABAC, row-level security, column masking, regulatory compliance (e.g., GDPR, HIPAA), data observability, CI/CD, and infrastructure as code (IaC).
AI/ML and GenAI readiness: data preparation for AI/ML, MLflow, Feature Store concepts, and use of Azure OpenAI, GitHub Copilot, and other GenAI assistants for coding, documentation, profiling, and troubleshooting.
Systems thinking; the ability to standardize ways of working through reusable patterns, clear documentation, version control, testing, and dev/test/prod deployment; and continuous learning about Microsoft Fabric, the evolution of Synapse, and Databricks' positioning.

Language(s)