Sofic Building, 10 Mai Chí Thọ Street, District 2, An Khánh, Ho Chi Minh City
On-site
Posted 5 days ago
Expertise:
Industry:
Food and Beverage
Consumer Goods
Manufacturing and Engineering
Agriculture

3 reasons to join the company

  • Professional & global working environment
  • Lots of training & development opportunities
  • Health insurance from an international provider

Job description

JOB PURPOSE:

  • The Data Engineer is responsible for designing, building, operating, and optimizing the Lakehouse data platform on Microsoft Azure, with a strong focus on Azure Databricks, Unity Catalog, Delta Lake, and related integration services. This role ensures data is ingested, transformed, governed, and served in a stable, secure, scalable, and AI/GenAI-ready manner to support reporting, advanced analytics, self-service data use, and machine learning use cases both now and in the future.
  • The position works closely with business teams, analysts, data visualization specialists, Data Champions, IT, and regional stakeholders to deliver production-grade data pipelines with strong emphasis on data quality, observability, performance, cost efficiency, and operational excellence across the platform.
  • In addition, the Data Engineer proactively leverages AI tools (including Azure OpenAI and GenAI assistants) to improve coding productivity, documentation, troubleshooting, and automation, while designing the data platform with an AI-ready architecture capable of supporting feature engineering, model training, and real-time inference in the future.

JOB DESCRIPTION:

1. Lakehouse Platform Development & Maintenance

  • Design and build Lakehouse architecture on Azure Databricks using Unity Catalog, Delta Lake, and Medallion Architecture; develop production-grade pipelines using Azure Data Factory, Databricks Workflows, Auto Loader, and modern integration methods.
  • Apply incremental loading, CDC, merge operations, schema evolution, Liquid Clustering, monitoring, alerting, logging, observability, and automated table maintenance (OPTIMIZE, VACUUM, Predictive Optimization).

2. Data Governance, Security & Compliance

  • Implement Unity Catalog as the centralized governance foundation; manage metadata, lineage, and access control across catalog/schema/table/view/volume; apply row filters and column masking for sensitive data.
  • Maintain data quality standards, automated quality checks, audit logging, authorization matrix, data catalog documentation, and compliance with data protection requirements.

3. AI/ML Foundation & Feature Engineering

  • Prepare gold-layer tables for analytics and training data; support MLflow, Feature Store (if applicable), and design streaming pipelines for real-time/near-real-time use cases.
  • Leverage Azure OpenAI and GenAI tools to improve coding productivity, documentation, troubleshooting, automation, and data profiling.

4. Cost & Performance Optimization

  • Optimize compute and storage costs through auto-scaling, spot instances, workload-aware sizing, job scheduling, SQL/PySpark tuning, and cost anomaly alerting.

5. Collaboration, Documentation & Continuous Improvement

  • Work with business stakeholders, analysts, visualization specialists, Data Champions, IT, and regional teams; document architecture, pipelines, runbooks, and the knowledge base; use Git, Databricks Repos, and CI/CD; and keep learning continuously.

Job requirements

Qualifications

  • Bachelor's degree in Information Technology, Computer Science, Data Science, Information Systems, or a related field.
  • Preference is given to candidates with Microsoft Azure Data Engineer Associate (DP-203), Azure Fundamentals (AZ-900), Databricks Data Engineer Associate/Professional, or equivalent certifications.

Experience

  • 3-5 years of experience as a Data Engineer or in a similar enterprise data role, with hands-on experience in Azure Databricks, Azure Data Factory, Delta Lake, Unity Catalog, Medallion Architecture, operational support, optimization, streaming pipelines, and integration/CI-CD practices.

Competencies

  • Proficiency in SQL/Spark SQL, Python, and PySpark; mastery of ETL/ELT, data modeling, Lakehouse/data warehouse concepts, Azure Databricks, Data Factory, Event Hubs, Synapse Link, ADLS Gen2, Blob Storage, API integration, and structured/semi-structured/unstructured data.
  • Deep understanding of data governance, quality frameworks, lineage, RBAC/ABAC, row-level security, column masks, compliance (GDPR, HIPAA), data observability, CI/CD & IaC.
  • AI/ML and GenAI ready: data preparation for AI/ML, MLflow, Feature Store concepts, leveraging Azure OpenAI/GitHub Copilot/GenAI assistants for coding, documentation, profiling, and troubleshooting.
  • Systems thinking: the ability to standardize ways of working and build reusable patterns, backed by clear documentation, version control, testing, and dev/test/prod deployment, plus continuous learning about Fabric, Synapse evolution, and Databricks positioning.

Language(s)

  • Proficiency in English (reading, writing, technical communication).

Why you'll love working here

  • 13th month salary, KPI bonus, annual salary review
  • Annual Health Check, Year End Party
  • Company Trip, Team Building, Family Day
  • Health Insurance (Generali care 24/24 Insurance)
  • Birthday Gift, Staff Gift

De Heus LLC

Company type
Non-IT
Company industry
Agriculture
Company size
1000+ employees
Country
Vietnam
Working days
Monday - Friday
Overtime policy
No OT
