Site Reliability Engineer - SRE


Job Description

Role and Responsibilities

  • Develop the best possible product or capabilities.
  • Identify issues in the product and provide the best possible solution or resolution.
  • Perform performance analysis, continual improvement, and capacity planning for the AWS cloud.
  • Develop tools that support operations or engineering teams - Automation
  • Participate in on-call incident response.
  • Author system processes, rules, and best practices.
  • Understand all aspects of the product and troubleshoot service issues.
  • Perform audits of service usage - Measure everything
  • Plan and contribute to release/deployment roadmaps.
  • Perform service incident exercises, load testing, and other capacity management activities.
  • Monitor the health of the systems and services.
  • Experiment or receive training to expand skills - Continuous Learning


Qualifications

  • B.Tech, M.Tech, or MCA in Computer Science/Information Technology.
  • 3 to 10 years of work experience preferred
  • Programming skills - Java, Python, Go, or scripting languages
  • Fundamentals of Unix and SQL commands
  • Understanding of distributed systems/microservices
  • Problem-solving skills in cloud or data center environments
  • Teamwork and collaboration
  • Excellent written and verbal communication
  • Experience with Infrastructure as Code tools - Terraform or CloudFormation

Key Skills

Unix; Computer science; Automation; Machine learning; thermal; Troubleshooting; Information technology; Monitoring; SQL
