Lead DevOps Engineer (Go, Python)

PAVE

34/4 Trần Khánh Dư, Tân Định Ward, District 1, Ho Chi Minh City

Hybrid

Posted 20 hours ago

Skills:

AWS, Golang, Python, CI/CD, English

Job Expertise:

DevOps Engineer

Job Domain:

Software Products and Web Services

Top 3 reasons to join us

Hybrid and flexible working environment
Innovative Product
Growth Opportunities

Job description

We're seeking an experienced Lead DevOps Engineer to spearhead our critical infrastructure transformation as PAVE.ai scales to enterprise level. This role will lead the strategic migration from Google Cloud Platform to AWS while building and managing a high-performing DevOps team. As Lead DevOps Engineer at PAVE.ai, you'll architect enterprise-grade infrastructure, establish site reliability engineering practices, and ensure 99.9%+ uptime for our vehicle inspection platform serving global automotive enterprises. This is a pivotal role that will define our infrastructure strategy and operational excellence as we process millions of vehicle inspections for dealerships, fleet operators, insurers, and vehicle marketplaces worldwide.

Cloud Migration Leadership

Lead and execute the complete migration strategy from GCP to AWS, ensuring zero downtime
Design and implement AWS enterprise architecture following Well-Architected Framework principles
Create detailed migration roadmaps with clear milestones, risk assessments, and rollback plans
Architect hybrid cloud solutions during transition phase to maintain business continuity
Optimize costs during and after migration while improving performance and reliability
Document migration processes and create runbooks for knowledge transfer

Team Leadership & Development

Build and lead a world-class DevOps team, including hiring, mentoring, and performance management
Define team structure, roles, and responsibilities for 24/7 operational coverage
Establish DevOps culture and best practices across the engineering organization
Create career development paths and training programs for team members
Foster collaboration between DevOps, development, and security teams
Lead incident response and post-mortem processes to drive continuous improvement

Site Reliability Engineering (SRE)

Establish and maintain SLIs, SLOs, and SLAs for all critical services
Design and implement comprehensive monitoring and observability strategies
Build automated incident detection and response systems
Ensure 99.9%+ uptime for production systems through proactive reliability engineering
Implement chaos engineering practices to identify and fix potential failures
Create capacity planning models to support 10x growth

Infrastructure & Automation

Design scalable, secure, and cost-effective AWS infrastructure for enterprise workloads
Implement Infrastructure as Code (IaC) using Terraform/CloudFormation
Build CI/CD pipelines supporting multiple deployment strategies (blue-green, canary)
Automate security compliance and governance using AWS native tools
Implement auto-scaling and self-healing infrastructure
Design disaster recovery and business continuity strategies
Develop and enhance logging systems and observability tools (ongoing improvement initiative)

Enterprise Platform Development

Architect multi-tenant infrastructure supporting enterprise isolation requirements
Implement enterprise-grade security including VPN, SSO, and zero-trust networking
Design data residency and compliance solutions for global operations
Build platform services for logging, monitoring, secrets management, and service mesh
Create developer self-service platforms to accelerate delivery
Establish FinOps practices for cloud cost optimization

Strategic Planning

Develop long-term infrastructure roadmap aligned with business objectives
Partner with leadership to define technology strategy and investments
Evaluate and introduce new technologies to improve operational efficiency
Create business cases for infrastructure investments with ROI analysis
Establish vendor relationships and manage AWS enterprise support
Drive infrastructure standardization and consolidation initiatives

Success Metrics

Complete GCP to AWS migration within 6 months with zero critical incidents
Achieve and maintain 99.9% uptime across all production services
Reduce infrastructure costs by 30% while improving performance
Build and retain a high-performing DevOps team with <10% attrition
Increase deployment frequency from weekly to multiple times daily
Reduce MTTR (Mean Time To Recovery) by 50%

Your skills and experience

Technical Skills

Cloud Expertise:

Expert-level AWS knowledge (Solutions Architect Professional preferred)
Strong GCP experience with migration expertise
Multi-cloud architecture and management
AWS services mastery: EC2, ECS/EKS, Lambda, RDS, S3, CloudFront, Route53
Cloud networking: VPC, Transit Gateway, Direct Connect, Global Accelerator
Security services: IAM, KMS, WAF, Shield, GuardDuty, Security Hub

DevOps & Automation:

Infrastructure as Code: Terraform, CloudFormation, AWS CDK
Configuration management: Ansible, Chef, or Puppet
CI/CD platforms: Jenkins, GitLab CI, GitHub Actions, AWS CodePipeline
Container orchestration: Kubernetes (EKS), Docker, Helm
GitOps practices with ArgoCD or Flux
Scripting languages: Bash, Go, Python

Site Reliability:

Monitoring/Observability: Prometheus, Grafana, ELK, Datadog, New Relic
APM and distributed tracing: OpenTelemetry, Jaeger
Incident management: PagerDuty, Opsgenie
SRE practices: Error budgets, SLI/SLO definition, toil reduction
Performance tuning and capacity planning
Chaos engineering tools: Gremlin, Chaos Monkey

Leadership Skills

Proven ability to lead and inspire technical teams
Experience managing remote and distributed teams
Strong project management and organizational skills
Excellent stakeholder management across technical and business teams
Budget management and cost optimization experience
Change management expertise for large-scale transformations

Soft Skills

Excellent written and verbal communication skills in both English and Vietnamese
Strategic thinking with ability to balance long-term vision with immediate needs
Strong problem-solving skills with calm demeanor during incidents
Ability to influence and drive consensus across organizations
Mentoring mindset with passion for developing talent
Adaptable to rapidly changing requirements and technologies

Experience

7+ years of DevOps/SRE experience with 3+ years in a leadership role
Proven experience leading large-scale cloud migrations (GCP to AWS preferred)
Track record of managing DevOps teams of 4+ engineers
Experience with enterprise B2B SaaS platforms at scale (millions of requests/day)
Demonstrated success improving system reliability from <99% to 99.9%+

Preferred Qualifications

AWS Certified DevOps Engineer or Solutions Architect Professional
Experience with AI/ML workload infrastructure and GPU clusters
Knowledge of automotive industry compliance and regulations
Experience with computer vision and image processing pipelines
Serverless architecture and event-driven systems
FinOps certification or demonstrated cost optimization achievements
Experience with regulated environments (SOC2, ISO 27001, GDPR)
Contributions to open-source DevOps/SRE tools
Public speaking experience at DevOps/SRE conferences
Experience scaling startups to enterprise level

Why you'll love working here

1. Competitive Compensation & Perks

Attractive salary package.
15 days of annual leave.
13th-month bonus.
Premium healthcare coverage for you and your family.
Thoughtful appreciation gifts throughout the year.

2. Growth & Learning Opportunities

Work on cutting-edge, large-scale products in the car inspection field.
Clear career paths for both technical experts and aspiring leaders.
Continuous learning programs to sharpen your skills and grow your career.
Learn from everything, everywhere—but be a smart copy-paster, not a copycat!
Be ready to embrace and implement new ideas in a fast-paced environment.

3. An Inspiring Workplace

Flexible hybrid work model and a strong focus on work-life balance.
A modern, fully equipped office with a well-stocked pantry.
Be motivated, creative, and passionate—we can’t ask for more!
Respect and care for your teammates, your environment, and even yourself.
Treat yourself well, and while you’re at it, save the Earth too.

4. A Mindset for Growth

Have the courage to move fast, stay flexible, and take full responsibility for every single line of code.
Always look back at your work and strive to make it better—nothing is perfect, and that’s where you come in.
It’s okay to be late sometimes, but make sure you’re fully accountable and aware of your actions.

5. A Dynamic and Open Culture