Job Expertise:
Job Domain:
Food and Beverage
Consumer Goods
Manufacturing and Engineering
Agriculture
Top 3 reasons to join us
- Professional & global working environment
- Lots of training & development opportunities
- Health insurance from an international provider
Job description
JOB PURPOSE:
- The Data Engineer is responsible for designing, building, operating, and optimizing the Lakehouse data platform on Microsoft Azure, with a strong focus on Azure Databricks, Unity Catalog, Delta Lake, and related integration services. This role ensures data is ingested, transformed, governed, and served in a stable, secure, scalable, and AI/GenAI-ready manner to support reporting, advanced analytics, self-service data use, and machine learning use cases both now and in the future.
- The position works closely with business teams, analysts, data visualization specialists, Data Champions, IT, and regional stakeholders to deliver production-grade data pipelines with strong emphasis on data quality, observability, performance, cost efficiency, and operational excellence across the platform.
- In addition, the Data Engineer proactively leverages AI tools (including Azure OpenAI and GenAI assistants) to improve coding productivity, documentation, troubleshooting, and automation, while designing the data platform with an AI-ready architecture capable of supporting feature engineering, model training, and real-time inference in the future.
JOB DESCRIPTION:
1. Lakehouse Platform Development & Maintenance
- Design and build Lakehouse architecture on Azure Databricks using Unity Catalog, Delta Lake, and Medallion Architecture; develop production-grade pipelines using Azure Data Factory, Databricks Workflows, Auto Loader, and modern integration methods.
- Apply incremental loading, CDC, merge operations, schema evolution, Liquid Clustering, monitoring, alerting, logging, observability, and automated table maintenance (OPTIMIZE, VACUUM, Predictive Optimization).
2. Data Governance, Security & Compliance
- Implement Unity Catalog as the centralized governance foundation; manage metadata, lineage, and access control across catalog/schema/table/view/volume; apply row filters and column masking for sensitive data.
- Maintain data quality standards, automated quality checks, audit logging, authorization matrix, data catalog documentation, and compliance with data protection requirements.
3. AI/ML Foundation & Feature Engineering
- Prepare gold-layer tables for analytics and training data; support MLflow, Feature Store (if applicable), and design streaming pipelines for real-time/near-real-time use cases.
- Leverage Azure OpenAI and GenAI tools to improve coding productivity, documentation, troubleshooting, automation, and data profiling.
4. Cost & Performance Optimization
- Optimize compute and storage costs through auto-scaling, spot instances, workload-aware sizing, job scheduling, SQL/PySpark tuning, and cost anomaly alerting.
5. Collaboration, Documentation & Continuous Improvement
- Work with business stakeholders, analysts, visualization specialists, Data Champions, IT, and regional teams; document architecture, pipelines, runbooks, and the knowledge base; use Git, Databricks Repos, and CI/CD, and keep learning continuously.
Your skills and experience
Qualifications
- Bachelor's degree in Information Technology, Computer Science, Data Science, Information Systems, or a related field.
- Preference is given to candidates with Microsoft Azure Data Engineer Associate (DP-203), Azure Fundamentals (AZ-900), Databricks Data Engineer Associate/Professional, or equivalent certifications.
Experience
- At least 3 to 5 years of experience as a Data Engineer or in a similar enterprise data role, with hands-on experience in Azure Databricks, Azure Data Factory, Delta Lake, Unity Catalog, Medallion Architecture, operational support, optimization, streaming pipelines, and integration/CI-CD practices.
Competencies
- Proficiency in SQL/Spark SQL, Python, and PySpark; mastery of ETL/ELT, data modeling, Lakehouse/data warehouse concepts, Azure Databricks, Data Factory, Event Hubs, Synapse Link, ADLS Gen2, Blob, API integration, and structured/semi-structured/unstructured data.
- Deep understanding of data governance, quality frameworks, lineage, RBAC/ABAC, row-level security, column masks, compliance (GDPR, HIPAA), data observability, CI/CD & IaC.
- AI/ML and GenAI ready: data preparation for AI/ML, MLflow, Feature Store concepts, leveraging Azure OpenAI/GitHub Copilot/GenAI assistants for coding, documentation, profiling, and troubleshooting.
- Systems thinking and the ability to standardize ways of working through reusable patterns, clear documentation, version control, testing, and dev/test/prod deployment, combined with continuous learning about Fabric, Synapse evolution, and Databricks positioning.
Language(s)
- Proficiency in English (reading, writing, technical communication).
Why you'll love working here
- 13th month salary, KPI bonus, annual salary review
- Annual Health Check, Year End Party
- Company Trip, Team Building, Family Day
- Health Insurance (Generali care 24/24 Insurance)
- Birthday Gift, Staff Gift
De Heus LLC
Company type
Non-IT
Company industry
Agriculture
Company size
1000+ employees
Country
Vietnam
Working days
Monday - Friday
Overtime policy
No OT