Lead DevOps Engineer (Go, Python)

PAVE
34/4 Trần Khánh Dư, phường Tân Định, Quận 1, TP Hồ Chí Minh
Flexible
Posted 19 hours ago
Industry: Software Products and Web Services

3 Reasons to Join the Company

  • Hybrid and flexible working environment
  • Innovative Product
  • Growth Opportunities

Job Description

We're seeking an experienced Lead DevOps Engineer to spearhead our critical infrastructure transformation as PAVE.ai scales to enterprise level. This role will lead the strategic migration from Google Cloud Platform to AWS while building and managing a high-performing DevOps team. As Lead DevOps Engineer at PAVE.ai, you'll architect enterprise-grade infrastructure, establish site reliability engineering practices, and ensure 99.9%+ uptime for our vehicle inspection platform serving global automotive enterprises. This is a pivotal role that will define our infrastructure strategy and operational excellence as we process millions of vehicle inspections for dealerships, fleet operators, insurers, and vehicle marketplaces worldwide.

Cloud Migration Leadership

  • Lead and execute the complete migration strategy from GCP to AWS, ensuring zero downtime
  • Design and implement AWS enterprise architecture following Well-Architected Framework principles
  • Create detailed migration roadmaps with clear milestones, risk assessments, and rollback plans
  • Architect hybrid cloud solutions during transition phase to maintain business continuity
  • Optimize costs during and after migration while improving performance and reliability
  • Document migration processes and create runbooks for knowledge transfer

Team Leadership & Development

  • Build and lead a world-class DevOps team, including hiring, mentoring, and performance management
  • Define team structure, roles, and responsibilities for 24/7 operational coverage
  • Establish DevOps culture and best practices across the engineering organization
  • Create career development paths and training programs for team members
  • Foster collaboration between DevOps, development, and security teams
  • Lead incident response and post-mortem processes to drive continuous improvement

Site Reliability Engineering (SRE)

  • Establish and maintain SLIs, SLOs, and SLAs for all critical services
  • Design and implement comprehensive monitoring and observability strategies
  • Build automated incident detection and response systems
  • Ensure 99.9%+ uptime for production systems through proactive reliability engineering
  • Implement chaos engineering practices to identify and fix potential failures
  • Create capacity planning models to support 10x growth
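
As a rough illustration of what a 99.9% uptime target implies in practice, the sketch below converts an availability SLO into an allowed-downtime budget. The function name and the 30-day window are assumptions made for this example; they are not taken from PAVE's own tooling or process.

```python
def downtime_budget_minutes(slo_percent: float, window_days: int = 30) -> float:
    """Return the minutes of downtime an availability SLO allows over a window.

    Hypothetical helper for illustration only; the 30-day default window
    is an assumption, not a stated PAVE policy.
    """
    if not 0 < slo_percent <= 100:
        raise ValueError("slo_percent must be in (0, 100]")
    total_minutes = window_days * 24 * 60
    # The error budget is the fraction of the window allowed to be unavailable.
    return total_minutes * (1 - slo_percent / 100)


if __name__ == "__main__":
    for slo in (99.0, 99.9, 99.99):
        budget = downtime_budget_minutes(slo)
        print(f"{slo}% over 30 days -> {budget:.1f} min of allowed downtime")
```

Under these assumptions, a 99.9% target leaves roughly 43 minutes of downtime per 30 days, compared with about 7 hours at 99%.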

Infrastructure & Automation

  • Design scalable, secure, and cost-effective AWS infrastructure for enterprise workloads
  • Implement Infrastructure as Code (IaC) using Terraform/CloudFormation
  • Build CI/CD pipelines supporting multiple deployment strategies (blue-green, canary)
  • Automate security compliance and governance using AWS native tools
  • Implement auto-scaling and self-healing infrastructure
  • Design disaster recovery and business continuity strategies
  • Develop and enhance logging systems and observability tools (ongoing improvement initiative)

Enterprise Platform Development

  • Architect multi-tenant infrastructure supporting enterprise isolation requirements
  • Implement enterprise-grade security including VPN, SSO, and zero-trust networking
  • Design data residency and compliance solutions for global operations
  • Build platform services for logging, monitoring, secrets management, and service mesh
  • Create developer self-service platforms to accelerate delivery
  • Establish FinOps practices for cloud cost optimization

Strategic Planning

  • Develop long-term infrastructure roadmap aligned with business objectives
  • Partner with leadership to define technology strategy and investments
  • Evaluate and introduce new technologies to improve operational efficiency
  • Create business cases for infrastructure investments with ROI analysis
  • Establish vendor relationships and manage AWS enterprise support
  • Drive infrastructure standardization and consolidation initiatives

Success Metrics

  • Complete GCP to AWS migration within 6 months with zero critical incidents
  • Achieve and maintain 99.9% uptime across all production services
  • Reduce infrastructure costs by 30% while improving performance
  • Build and retain a high-performing DevOps team with <10% attrition
  • Increase deployment frequency from weekly to multiple times daily
  • Reduce MTTR (Mean Time To Recovery) by 50%

Job Requirements

Technical Skills

Cloud Expertise:

  • Expert-level AWS knowledge (Solutions Architect Professional preferred)
  • Strong GCP experience with migration expertise
  • Multi-cloud architecture and management
  • AWS services mastery: EC2, ECS/EKS, Lambda, RDS, S3, CloudFront, Route53
  • Cloud networking: VPC, Transit Gateway, Direct Connect, Global Accelerator
  • Security services: IAM, KMS, WAF, Shield, GuardDuty, Security Hub

DevOps & Automation:

  • Infrastructure as Code: Terraform, CloudFormation, AWS CDK
  • Configuration management: Ansible, Chef, or Puppet
  • CI/CD platforms: Jenkins, GitLab CI, GitHub Actions, AWS CodePipeline
  • Container orchestration: Kubernetes (EKS), Docker, Helm
  • GitOps practices with ArgoCD or Flux
  • Scripting languages: Bash, Go, Python

Site Reliability:

  • Monitoring/Observability: Prometheus, Grafana, ELK, Datadog, New Relic
  • APM and distributed tracing: OpenTelemetry, Jaeger
  • Incident management: PagerDuty, Opsgenie
  • SRE practices: Error budgets, SLI/SLO definition, toil reduction
  • Performance tuning and capacity planning
  • Chaos engineering tools: Gremlin, Chaos Monkey

Leadership Skills

  • Proven ability to lead and inspire technical teams
  • Experience managing remote and distributed teams
  • Strong project management and organizational skills
  • Excellent stakeholder management across technical and business teams
  • Budget management and cost optimization experience
  • Change management expertise for large-scale transformations

Soft Skills

  • Excellent written and verbal communication skills in both English and Vietnamese
  • Strategic thinking with ability to balance long-term vision with immediate needs
  • Strong problem-solving skills with calm demeanor during incidents
  • Ability to influence and drive consensus across organizations
  • Mentoring mindset with passion for developing talent
  • Adaptable to rapidly changing requirements and technologies

Experience

  • 7+ years of DevOps/SRE experience with 3+ years in a leadership role
  • Proven experience leading large-scale cloud migrations (GCP to AWS preferred)
  • Track record of managing DevOps teams of 4+ engineers
  • Experience with enterprise B2B SaaS platforms at scale (millions of requests/day)
  • Demonstrated success improving system reliability from <99% to 99.9%+

Preferred Qualifications

  • AWS Certified DevOps Engineer or Solutions Architect Professional
  • Experience with AI/ML workload infrastructure and GPU clusters
  • Knowledge of automotive industry compliance and regulations
  • Experience with computer vision and image processing pipelines
  • Serverless architecture and event-driven systems
  • FinOps certification or demonstrated cost optimization achievements
  • Experience with regulated environments (SOC2, ISO 27001, GDPR)
  • Contributions to open-source DevOps/SRE tools
  • Public speaking experience at DevOps/SRE conferences
  • Experience scaling startups to enterprise level

Why You'll Love Working Here

1. Competitive Compensation & Perks

  • Attractive salary package.
  • 15 days of annual leave.
  • 13th-month bonus.
  • Premium healthcare coverage for you and your family.
  • Thoughtful appreciation gifts throughout the year.

2. Growth & Learning Opportunities

  • Work on cutting-edge, large-scale products in the car inspection field.
  • Clear career paths for both technical experts and aspiring leaders.
  • Continuous learning programs to sharpen your skills and grow your career.
  • Learn from everything, everywhere—but be a smart copy-paster, not a copycat!
  • Be ready to embrace and implement new ideas in a fast-paced environment.

3. An Inspiring Workplace

  • Flexible hybrid work model and a strong focus on work-life balance.
  • A modern, fully equipped office with a well-stocked pantry.
  • Be motivated, creative, and passionate—we can’t ask for more!
  • Respect and care for your teammates, your environment, and even yourself.
  • Treat yourself well, and while you’re at it, save the Earth too.

4. A Mindset for Growth

  • Have the courage to move fast, stay flexible, and take full responsibility for every single line of code.
  • Always look back at your work and strive to make it better—nothing is perfect, and that’s where you come in.
  • It’s okay to be late sometimes, but make sure you’re fully accountable and aware of your actions.

5. A Dynamic and Open Culture

  • We don’t stick rigidly to the game plan, so feel free to add or remove your own “blah blah” from this list. 😉

We invest in giving our people’s ideas a chance.

Company type: Product
Company industry: Software Products and Web Services
Company size: 151-300 employees
Country: Canada
Working days: Monday - Friday
Overtime policy: No OT
