Senior Site Reliability Engineer (ELK, GCP, Kubernetes)

VSOL
Build People, Build Culture, Build Product
New Office
Glory Home Building, 3rd Floor, 215 B10 Nguyen Van Huong Street, An Khanh Ward, Thu Duc City, Ho Chi Minh
22nd Floor, Ngoc Khanh Plaza Building, No. 1 Pham Huy Thong Street, Giang Vo Ward, Ba Dinh, Ha Noi
At office
Posted 3 days ago
Job Domain:
IT Services and IT Consulting
Software Development Outsourcing
Software Products and Web Services

Top 3 reasons to join us

  • Onsite opportunities in UAE & Saudi Arabia
  • Premium Health insurance for employees & family
  • 14+ days of Annual leave & 5 days of Outing leave

Job description

VSOL is a digital enabler with a mission to help public and private organizations evolve their businesses through data and technology. We provide an end-to-end service from consulting to execution that drives the growth and innovation of our clients. As VSOL is in a phase of rapid expansion, we offer a dynamic, creative environment that accelerates your personal and professional development. We are looking for talented individuals eager to develop in international markets while contributing to the company’s future in a constructive and supportive manner.

 

Responsibilities:

  • Lead deployment and management of web applications, ensuring stability, scalability and reliability.
  • Design and manage reliability solutions for hybrid environments (cloud and on-premises), optimizing for availability and performance.
  • Orchestrate and administer containerized applications using Kubernetes, focusing on efficient deployment and runtime management.
  • Administer systems, including Geographic Information System (GIS) platforms and SQL Server databases, maintaining data integrity and high performance.
  • Analyze and mitigate service disruptions, developing strategic preventative measures to minimize downtime.
  • Apply an understanding of network engineering principles.
  • Participate in evaluation and integration of new technologies, enhancing service reliability and operational capabilities.
  • Develop and automate the collection of critical system health metrics, using tools such as the ELK Stack.
  • Manage major incident response efforts, ensuring effective resolution to maintain system stability.
  • Coordinate with cross-functional teams to align SRE practices with business objectives and IT standards.
  • Create and review technical documentation for system architecture and operational procedures.
  • Ensure regulatory compliance and carry out security assessments, implementing best practices to protect system integrity.
  • Participate in pager-duty rotations, resolving critical incidents.

Note: The position may require international travel for continuous periods of 3-6 months.

Your skills and experience

Requirements

  • Over 4 years of experience with cloud environments and containerization technologies, including designing and implementing scalable, resilient infrastructure solutions using GCP (or other cloud platforms) and Kubernetes.
  • Experience with monitoring and logging tools such as ELK Stack.
  • Demonstrated excellence in network management, advanced troubleshooting, and system optimization, with a focus on enhancing efficiency and reducing downtime.
  • Experience in IT, with advanced expertise in network engineering and system administration.
  • Experience in site reliability practices; experience with GIS platforms is a plus.
  • Strong skills in scripting and automation; Python and Bash are a big plus.
  • Good knowledge of GitOps tools (e.g., Argo CD, FluxCD).
  • Knowledge of security frameworks and compliance standards.

Qualifications:

  • Bachelor’s degree in Computer Science, Information Technology, or a related field.
  • Cisco Certified Network Associate (CCNA) is a plus.
  • Certified Kubernetes Administrator (CKA) is a plus.
  • Written and spoken English communication skills at CEFR B1 level or above.

Why you'll love working here

  • Work in an English-speaking start-up environment, with the opportunity to be part of an innovation team and global projects
  • Onsite opportunities in UAE (United Arab Emirates) and KSA (Kingdom of Saudi Arabia)
  • 13th-month salary bonus
  • Premium Health insurance for employees and family members (depending on level), annual health check, and government insurance during probation
  • 14+ days of Annual leave and 5 days of Outing leave
  • Lunch allowance and free parking
  • Taxi & phone allowance (depending on level)

Innovating Technology for Seamless Digital Transformation

Company type
IT Product
Company industry
IT Services and IT Consulting
Company size
51-150 employees
Country
United Arab Emirates
Working days
Monday - Friday
Overtime policy
No OT
