Job Description
About the Role
You will own the pipelines that move every event, transaction, and Change Data Capture (CDC) record from our products into a ClickHouse lakehouse and back out to the tools that act on it. Concretely, you own the medallion architecture (Bronze → Silver → Gold) that turns raw events and revenue postbacks into the curated, aggregated tables that power Business Intelligence (BI), marketing attribution, and our emerging Machine Learning / Artificial Intelligence (ML/AI) workloads.
This is a hands-on, end-to-end data engineering role on a modern open-source stack — not a narrow ticket-taking position. You will touch ingestion, storage, transformation, orchestration, and activation, and you will be the person who decides how a given dataset should be modeled and served.
What You'll Work On
Ingestion & event collection
- Maintain and extend event collection through RudderStack (our Customer Data Platform), PostHog, GA4, and server-side event sources across the web app and mobile apps.
- Land raw revenue events from Stripe and Apple In-App Purchase (IAP) postbacks reliably and idempotently — billing data has zero tolerance for loss or duplication.
Integration & storage
- Run Airbyte connectors, including Change Data Capture from internal production databases, into the AWS S3 data lake.
- Manage the S3 → ClickHouse load path and keep the lake organized, partitioned, and cost-efficient.
Transformation — the medallion core
- Build and operate the Bronze → Silver → Gold pipelines in ClickHouse.
- Make correct use of ClickHouse's MergeTree-family table engines: ReplacingMergeTree for deduplicated, CDC-driven Silver tables and AggregatingMergeTree for the Gold aggregates that BI and ML consume.
- Model subscription-domain data — trials, conversions, Monthly Recurring Revenue (MRR), churn, Lifetime Value (LTV) — so that one number means one thing everywhere.
Orchestration & reliability
- Author, schedule, and monitor pipelines in Apache Airflow.
- Build data-quality checks, freshness Service Level Agreements (SLAs), alerting, and lineage so failures are caught before stakeholders see them.
Activation & reverse pipelines
- Sync curated data out to BigQuery and to the activation layer: ad platforms for conversion tracking (Google Ads, Meta Ads, Apple Ads, and the TikTok/Reddit Ads channels as they come online), plus Braze, Singular, and Firebase.
- Feed the Gold layer to AWS QuickSight dashboards and prepare clean, well-documented feature tables for ML/AI use cases.
Job Requirements
Required
- 7+ years building production data pipelines, with strong, idiomatic SQL and Python.
- Hands-on experience with a columnar analytical warehouse — ClickHouse strongly preferred.
- Solid grasp of dimensional and medallion data modeling, and of when to denormalize, deduplicate, or pre-aggregate.
- Experience orchestrating pipelines with Apache Airflow.
- Experience with an Extract-Load-Transform (ELT) / ingestion tool — Airbyte preferred — and a working understanding of Change Data Capture.
- Comfort on AWS, especially S3 as a data lake.
- Track record of treating data quality, observability, and idempotency as first-class concerns, not afterthoughts.
Why You'll Love Working Here
- Full gross salary from Day 1 of Probation (2 months)
- Annual performance-based bonus (discretionary)
- Compulsory Social, Health & Unemployment Insurance from Day 1 of Probation
- Supplemental health insurance — Allianz / Bảo Việt
- MacBook provided for work
- 12 days of paid annual leave per year, plus 1 additional day for every 5 consecutive years
- Public holidays per Vietnamese law
- Training & professional development provided as needed
- Free parking
Công ty TNHH Stravo Vietnam