
Data Engineer

De Heus Asia
Sofic Building, 10 Mai Chí Thọ Street, District 2, An Khanh, Ho Chi Minh City
At office
Posted 5 days ago
Job Domain:
Food and Beverage
Consumer Goods
Manufacturing and Engineering
Agriculture

Top 3 reasons to join us

  • Professional & global working environment
  • Lots of training & development opportunities
  • Health insurance from an international provider

Job description

JOB PURPOSE:

  • The Data Engineer is responsible for designing, building, operating, and optimizing the Lakehouse data platform on Microsoft Azure, with a strong focus on Azure Databricks, Unity Catalog, Delta Lake, and related integration services. This role ensures data is ingested, transformed, governed, and served in a stable, secure, scalable, and AI/GenAI-ready manner to support reporting, advanced analytics, self-service data use, and machine learning use cases both now and in the future.
  • The position works closely with business teams, analysts, data visualization specialists, Data Champions, IT, and regional stakeholders to deliver production-grade data pipelines with strong emphasis on data quality, observability, performance, cost efficiency, and operational excellence across the platform.
  • In addition, the Data Engineer proactively leverages AI tools (including Azure OpenAI and GenAI assistants) to improve coding productivity, documentation, troubleshooting, and automation, while designing the data platform with an AI-ready architecture capable of supporting feature engineering, model training, and real-time inference in the future.

JOB DESCRIPTION:

1. Lakehouse Platform Development & Maintenance

  • Design and build Lakehouse architecture on Azure Databricks using Unity Catalog, Delta Lake, and Medallion Architecture; develop production-grade pipelines using Azure Data Factory, Databricks Workflows, Auto Loader, and modern integration methods.
  • Apply incremental loading, change data capture (CDC), MERGE operations, schema evolution, and Liquid Clustering; implement monitoring, alerting, logging, and observability; and automate table maintenance (OPTIMIZE, VACUUM, Predictive Optimization).

2. Data Governance, Security & Compliance

  • Implement Unity Catalog as the centralized governance foundation; manage metadata, lineage, and access control across catalog/schema/table/view/volume; apply row filters and column masking for sensitive data.
  • Maintain data quality standards, automated quality checks, audit logging, an authorization matrix, and data catalog documentation, and ensure compliance with data protection requirements.

3. AI/ML Foundation & Feature Engineering

  • Prepare gold-layer tables for analytics and training data; support MLflow, Feature Store (if applicable), and design streaming pipelines for real-time/near-real-time use cases.
  • Leverage Azure OpenAI and GenAI tools to improve coding productivity, documentation, troubleshooting, automation, and data profiling.

4. Cost & Performance Optimization

  • Optimize compute and storage costs through auto-scaling, spot instances, workload-aware sizing, job scheduling, SQL/PySpark tuning, and cost anomaly alerting.

5. Collaboration, Documentation & Continuous Improvement

  • Work with business stakeholders, analysts, visualization specialists, Data Champions, IT, and regional teams; document the architecture, pipelines, runbooks, and knowledge base; use Git, Databricks Repos, and CI/CD; and pursue continuous learning.

Your skills and experience

Qualifications

  • Bachelor's degree in Information Technology, Computer Science, Data Science, Information Systems, or a related field.
  • Candidates holding Microsoft Azure Data Engineer Associate (DP-203), Azure Fundamentals (AZ-900), Databricks Data Engineer Associate/Professional, or equivalent certifications are strongly preferred.

Experience

  • 3-5 years of experience as a Data Engineer or in a similar enterprise data role, with hands-on experience in Azure Databricks, Azure Data Factory, Delta Lake, Unity Catalog, Medallion Architecture, operational support, optimization, streaming pipelines, and integration/CI-CD practices.

Competencies

  • Proficiency in SQL/Spark SQL, Python, and PySpark; mastery of ETL/ELT, data modeling, Lakehouse/data warehouse concepts, Azure Databricks, Data Factory, Event Hubs, Synapse Link, ADLS Gen2, Blob Storage, API integration, and structured/semi-structured/unstructured data.
  • Deep understanding of data governance, quality frameworks, lineage, RBAC/ABAC, row-level security, column masks, compliance (GDPR, HIPAA), data observability, CI/CD & IaC.
  • AI/ML and GenAI ready: data preparation for AI/ML, MLflow, Feature Store concepts, leveraging Azure OpenAI/GitHub Copilot/GenAI assistants for coding, documentation, profiling, and troubleshooting.
  • Systems thinking and the ability to standardize ways of working through reusable patterns, clear documentation, version control, testing, and dev/test/prod deployment, plus continuous learning about Microsoft Fabric, the evolution of Synapse, and Databricks' positioning.

Language(s)

  • Proficiency in English (reading, writing, technical communication).

Why you'll love working here

  • 13th month salary, KPI bonus, annual salary review
  • Annual Health Check, Year End Party
  • Company Trip, Team Building, Family Day
  • Health Insurance (Generali care 24/24 Insurance)
  • Birthday Gift, Staff Gift

De Heus LLC

Company type: Non-IT
Company industry: Agriculture
Company size: 1000+ employees
Country: Vietnam
Working days: Monday - Friday
Overtime policy: No OT
