3 Reasons to Join the Company
- Hybrid and flexible working environment
- Innovative Product
- Growth Opportunities
Job Description
We're seeking an experienced Lead DevOps Engineer to spearhead our critical infrastructure transformation as PAVE.ai scales to enterprise level. This role will lead the strategic migration from Google Cloud Platform to AWS while building and managing a high-performing DevOps team. As Lead DevOps Engineer at PAVE.ai, you'll architect enterprise-grade infrastructure, establish site reliability engineering practices, and ensure 99.9%+ uptime for our vehicle inspection platform serving global automotive enterprises. This is a pivotal role that will define our infrastructure strategy and operational excellence as we process millions of vehicle inspections for dealerships, fleet operators, insurers, and vehicle marketplaces worldwide.
Cloud Migration Leadership
- Lead and execute the complete migration strategy from GCP to AWS, ensuring zero downtime
- Design and implement AWS enterprise architecture following AWS Well-Architected Framework principles
- Create detailed migration roadmaps with clear milestones, risk assessments, and rollback plans
- Architect hybrid cloud solutions during the transition phase to maintain business continuity
- Optimize costs during and after migration while improving performance and reliability
- Document migration processes and create runbooks for knowledge transfer
Team Leadership & Development
- Build and lead a world-class DevOps team, including hiring, mentoring, and performance management
- Define team structure, roles, and responsibilities for 24/7 operational coverage
- Establish DevOps culture and best practices across the engineering organization
- Create career development paths and training programs for team members
- Foster collaboration between DevOps, development, and security teams
- Lead incident response and post-mortem processes to drive continuous improvement
Site Reliability Engineering (SRE)
- Establish and maintain SLIs, SLOs, and SLAs for all critical services
- Design and implement comprehensive monitoring and observability strategies
- Build automated incident detection and response systems
- Ensure 99.9%+ uptime for production systems through proactive reliability engineering
- Implement chaos engineering practices to identify and fix potential failures
- Create capacity planning models to support 10x growth
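A 99.9% availability SLO is easier to reason about as an error budget: the downtime the target tolerates within a measurement window. The minimal Python sketch below assumes a time-based availability SLI and uses 30-day and 365-day windows purely for illustration. The posting does not say how uptime is measured, and the function name is hypothetical.

```python
from datetime import timedelta


def error_budget(slo: float, window: timedelta) -> timedelta:
    """Return the downtime an availability SLO allows over a window.

    `slo` is a fraction, e.g. 0.999 for 99.9%. Assumes a time-based SLI
    (the service is either up or down), not a request-based one.
    """
    if not 0.0 < slo < 1.0:
        raise ValueError("slo must be strictly between 0 and 1")
    # The budget is the fraction of the window that the SLO leaves for failure.
    return window * (1.0 - slo)


if __name__ == "__main__":
    print(error_budget(0.999, timedelta(days=30)))   # 0:43:12
    print(error_budget(0.999, timedelta(days=365)))  # 8:45:36
```

At 99.9%, about 43 minutes of downtime in a 30-day window spends the entire budget, so a single long incident can use it up.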
Infrastructure & Automation
- Design scalable, secure, and cost-effective AWS infrastructure for enterprise workloads
- Implement Infrastructure as Code (IaC) using Terraform/CloudFormation
- Build CI/CD pipelines supporting multiple deployment strategies (blue-green, canary)
- Automate security compliance and governance using AWS native tools
- Implement auto-scaling and self-healing infrastructure
- Design disaster recovery and business continuity strategies
- Develop and enhance logging systems and observability tools (ongoing improvement initiative)
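In a canary release, a small share of traffic goes to the new version. The new version is promoted only if it performs no worse than the current one. Below is a minimal sketch of an automated promotion gate in Python. `TrafficStats`, `canary_decision`, and every threshold are hypothetical, and a real gate would usually compare latency and saturation as well as error rate.

```python
from dataclasses import dataclass


@dataclass
class TrafficStats:
    """Request and error counts observed for one deployment over the same interval."""
    requests: int
    errors: int

    @property
    def error_rate(self) -> float:
        return self.errors / self.requests if self.requests else 0.0


def canary_decision(baseline: TrafficStats, canary: TrafficStats,
                    max_ratio: float = 1.5, min_requests: int = 500,
                    error_floor: float = 0.001) -> str:
    """Return "wait", "promote", or "rollback" for a canary release.

    All thresholds are illustrative defaults, not recommended values.
    """
    if canary.requests < min_requests:
        return "wait"  # too little canary traffic for a meaningful comparison
    # Tolerate some noise relative to the baseline; the absolute floor keeps a
    # baseline with zero errors from failing every canary on a single error.
    allowed = max(baseline.error_rate * max_ratio, error_floor)
    return "promote" if canary.error_rate <= allowed else "rollback"


if __name__ == "__main__":
    stable = TrafficStats(requests=20_000, errors=40)  # 0.2% errors
    print(canary_decision(stable, TrafficStats(requests=1_000, errors=2)))   # promote
    print(canary_decision(stable, TrafficStats(requests=1_000, errors=15)))  # rollback
    print(canary_decision(stable, TrafficStats(requests=200, errors=0)))     # wait
```

Comparing against the live baseline rather than a fixed threshold keeps the gate from blaming the canary for an incident that affects both versions.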
Enterprise Platform Development
- Architect multi-tenant infrastructure supporting enterprise isolation requirements
- Implement enterprise-grade security including VPN, SSO, and zero-trust networking
- Design data residency and compliance solutions for global operations
- Build platform services for logging, monitoring, secrets management, and service mesh
- Create developer self-service platforms to accelerate delivery
- Establish FinOps practices for cloud cost optimization
Strategic Planning
- Develop long-term infrastructure roadmap aligned with business objectives
- Partner with leadership to define technology strategy and investments
- Evaluate and introduce new technologies to improve operational efficiency
- Create business cases for infrastructure investments with ROI analysis
- Establish vendor relationships and manage AWS enterprise support
- Drive infrastructure standardization and consolidation initiatives
Success Metrics
- Complete GCP to AWS migration within 6 months with zero critical incidents
- Achieve and maintain 99.9% uptime across all production services
- Reduce infrastructure costs by 30% while improving performance
- Build and retain a high-performing DevOps team with <10% attrition
- Increase deployment frequency from weekly to multiple times daily
- Reduce MTTR (Mean Time To Recovery) by 50%
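MTTR is the average time from an incident starting to service being restored. The minimal sketch below assumes each incident is recorded as a (detected, recovered) timestamp pair. Teams differ on whether the clock starts at failure onset or at detection, so treat that choice as an assumption. The incident data and the `mttr` helper are made up for illustration.

```python
from datetime import datetime, timedelta


def mttr(incidents: list[tuple[datetime, datetime]]) -> timedelta:
    """Mean time to recovery over (detected_at, recovered_at) pairs."""
    if not incidents:
        raise ValueError("at least one incident is required")
    total = sum((recovered - detected for detected, recovered in incidents), timedelta())
    return total / len(incidents)


if __name__ == "__main__":
    # Made-up incidents lasting 30, 90, and 60 minutes.
    incidents = [
        (datetime(2024, 1, 3, 10, 0), datetime(2024, 1, 3, 10, 30)),
        (datetime(2024, 2, 7, 22, 15), datetime(2024, 2, 7, 23, 45)),
        (datetime(2024, 3, 1, 4, 0), datetime(2024, 3, 1, 5, 0)),
    ]
    baseline = mttr(incidents)
    print(baseline)      # 1:00:00
    print(baseline / 2)  # 0:30:00, the target after a 50% reduction
```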
Job Requirements
Technical Skills
Cloud Expertise:
- Expert-level AWS knowledge (Solutions Architect Professional preferred)
- Strong GCP experience with migration expertise
- Multi-cloud architecture and management
- AWS services mastery: EC2, ECS/EKS, Lambda, RDS, S3, CloudFront, Route 53
- Cloud networking: VPC, Transit Gateway, Direct Connect, Global Accelerator
- Security services: IAM, KMS, WAF, Shield, GuardDuty, Security Hub
DevOps & Automation:
- Infrastructure as Code: Terraform, CloudFormation, AWS CDK
- Configuration management: Ansible, Chef, or Puppet
- CI/CD platforms: Jenkins, GitLab CI, GitHub Actions, AWS CodePipeline
- Container orchestration: Kubernetes (EKS), Docker, Helm
- GitOps practices with ArgoCD or Flux
- Scripting and programming languages: Bash, Python, Go
Site Reliability:
- Monitoring/Observability: Prometheus, Grafana, ELK, Datadog, New Relic
- APM and distributed tracing: OpenTelemetry, Jaeger
- Incident management: PagerDuty, Opsgenie
- SRE practices: Error budgets, SLI/SLO definition, toil reduction
- Performance tuning and capacity planning
- Chaos engineering tools: Gremlin, Chaos Monkey
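Error budgets are usually enforced through burn-rate alerts. Burn rate is the observed error ratio divided by the ratio the SLO allows, so a burn rate of 1 spends exactly the budget over the SLO window. The sketch below follows the multiwindow pattern from Google's SRE Workbook. Paging at a burn rate of 14.4 over both a 1-hour and a 5-minute window corresponds to spending 2% of a 30-day budget in one hour. The function names and the 99.9% default are illustrative.

```python
def burn_rate(error_ratio: float, slo: float) -> float:
    """How fast the error budget is being spent; 1.0 means exactly on budget."""
    return error_ratio / (1.0 - slo)


def should_page(long_window_ratio: float, short_window_ratio: float,
                slo: float = 0.999, threshold: float = 14.4) -> bool:
    """Page only if both windows burn at or above the threshold.

    The long window (e.g. 1 hour) shows the burn is significant; the short
    window (e.g. 5 minutes) shows it is still happening, so the alert
    resets quickly once the problem is fixed.
    """
    return (burn_rate(long_window_ratio, slo) >= threshold
            and burn_rate(short_window_ratio, slo) >= threshold)


if __name__ == "__main__":
    # 2% of requests failing over the last hour and 3% over the last 5 minutes.
    print(should_page(0.02, 0.03))   # True
    # Same hour, but the last 5 minutes have recovered to 0.5% errors.
    print(should_page(0.02, 0.005))  # False
```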
Leadership Skills
- Proven ability to lead and inspire technical teams
- Experience managing remote and distributed teams
- Strong project management and organizational skills
- Excellent stakeholder management across technical and business teams
- Budget management and cost optimization experience
- Change management expertise for large-scale transformations
Soft Skills
- Excellent written and verbal communication skills in both English and Vietnamese
- Strategic thinking with ability to balance long-term vision with immediate needs
- Strong problem-solving skills with calm demeanor during incidents
- Ability to influence and drive consensus across organizations
- Mentoring mindset with passion for developing talent
- Adaptable to rapidly changing requirements and technologies
Experience
- 7+ years of DevOps/SRE experience with 3+ years in a leadership role
- Proven experience leading large-scale cloud migrations (GCP to AWS preferred)
- Track record of managing DevOps teams of 4+ engineers
- Experience with enterprise B2B SaaS platforms at scale (millions of requests/day)
- Demonstrated success improving system reliability from <99% to 99.9%+
Preferred Qualifications
- AWS Certified DevOps Engineer or Solutions Architect Professional
- Experience with AI/ML workload infrastructure and GPU clusters
- Knowledge of automotive industry compliance and regulations
- Experience with computer vision and image processing pipelines
- Serverless architecture and event-driven systems
- FinOps certification or demonstrated cost optimization achievements
- Experience with regulated environments (SOC2, ISO 27001, GDPR)
- Contributions to open-source DevOps/SRE tools
- Public speaking experience at DevOps/SRE conferences
- Experience scaling startups to enterprise level
Why You'll Love Working Here
1. Competitive Compensation & Perks
- Attractive salary package.
- 15 days of annual leave.
- 13th-month bonus.
- Premium healthcare coverage for you and your family.
- Thoughtful appreciation gifts throughout the year.
2. Growth & Learning Opportunities
- Work on cutting-edge, large-scale products in the car inspection field.
- Clear career paths for both technical experts and aspiring leaders.
- Continuous learning programs to sharpen your skills and grow your career.
- Learn from everything, everywhere—but be a smart copy-paster, not a copycat!
- Be ready to embrace and implement new ideas in a fast-paced environment.
3. An Inspiring Workplace
- Flexible hybrid work model and a strong focus on work-life balance.
- A modern, fully equipped office with a well-stocked pantry.
- Be motivated, creative, and passionate—we can’t ask for more!
- Respect and care for your teammates, your environment, and even yourself.
- Treat yourself well, and while you’re at it, save the Earth too.
4. A Mindset for Growth
- Have the courage to move fast, stay flexible, and take full responsibility for every single line of code.
- Always look back at your work and strive to make it better—nothing is perfect, and that’s where you come in.
- It’s okay to be late sometimes, but make sure you’re fully accountable and aware of your actions.
5. A Dynamic and Open Culture
- We don’t stick rigidly to the game plan, so feel free to add or remove your own “blah blah” from this list. 😉
Similar Jobs for You
Systems Engineer (DevOps, System Admin, Python)
On-site | Ho Chi Minh City - Hanoi | Posted 19 hours ago
Site Reliability Engineer (Python, Bash, or Go)
Flexible | Ho Chi Minh City | Posted 19 hours ago
Remote Backend Developer (Python/NodeJS, AI Automation)
Remote | Ho Chi Minh City - Hanoi - Da Nang | Posted 1 day ago
AI Technical Lead (Machine Learning/ LLM/ Python)
On-site | Ho Chi Minh City | Posted 4 days ago