CareerLink Africa

Site Reliability Engineer (SRE)

Nairobi, Kenya · Full-Time · On-site
Mid level
KES 50k–100k · Posted Today

About this role

A growing technology team in Nairobi is hiring an experienced Site Reliability Engineer (SRE). This full-time, on-site position suits professionals with a strong interest in cloud infrastructure, automation, system reliability, and operational excellence.

The role focuses on ensuring the availability, scalability, and performance of cloud-based systems while helping build resilient infrastructure that supports business growth.

Job Overview

Position: Site Reliability Engineer (SRE)

Employment Type: Full-Time, On-site

Location: Nairobi, Kenya

Industry: Information Technology & Engineering

Experience Required: 3–5 Years

Salary Range: KSh 50,000 – KSh 100,000 per Month

Education Level: Bachelor's Degree, Higher National Diploma (HND), or Equivalent Qualification

About the Role

As a Site Reliability Engineer, you will be responsible for maintaining highly available systems, improving deployment processes, automating operational tasks, and strengthening infrastructure reliability.

You will work closely with development teams to ensure applications are scalable, observable, and capable of delivering a seamless user experience.

Key Responsibilities

Build and Maintain CI/CD Pipelines

  • Design and implement reliable Continuous Integration and Continuous Deployment (CI/CD) pipelines.

  • Improve software delivery speed while maintaining system stability.

  • Support efficient and repeatable deployment processes.

Cloud Infrastructure Management

  • Manage cloud environments and services hosted on AWS.

  • Deploy and maintain applications using:

    • Amazon ECS

    • Amazon EC2

    • Application Load Balancers (ALB)

  • Ensure infrastructure remains scalable, secure, and cost-efficient.

Monitoring and Observability

  • Monitor system performance and application health.

  • Create dashboards, alerts, and reporting mechanisms.

  • Analyze logs and metrics to identify and resolve issues proactively.

Automation and Scripting

  • Automate repetitive operational tasks using Python and Bash.

  • Improve operational efficiency through workflow automation.

  • Reduce manual intervention across infrastructure processes.

Incident Management

  • Lead incident response activities during system disruptions.

  • Conduct root cause analysis following incidents.

  • Implement corrective measures to prevent recurring issues.

Disaster Recovery and Resilience

  • Develop and maintain disaster recovery strategies.

  • Manage backups and failover mechanisms.

  • Conduct resilience testing to ensure business continuity.

Developer Collaboration

  • Partner with software engineering teams to optimize deployment workflows.

  • Improve application instrumentation and observability.

  • Support performance optimization initiatives.

Required Qualifications

Education

Applicants should possess:

  • A Bachelor's Degree in Computer Science, Information Technology, Engineering, or a related field; or

  • An equivalent Higher National Diploma (HND).

Professional Experience

Candidates should have:

  • 3 to 5 years of experience in one or more of the following areas:

    • Site Reliability Engineering (SRE)

    • DevOps Engineering

    • Cloud Operations

    • Infrastructure Engineering

  • Proven experience managing production cloud environments.

Required Technical Skills

AWS Cloud Expertise

Strong hands-on experience with:

  • Amazon ECS

  • Amazon EC2

  • Application Load Balancer (ALB)

  • Cloud infrastructure management

Monitoring and Observability Tools

Experience working with:

  • Prometheus

  • Grafana

  • Loki

  • ELK Stack

Containerization

  • Advanced knowledge of Docker.

  • Experience deploying and managing containerized applications.

CI/CD Development

  • Building and maintaining deployment pipelines.

  • Automating software release processes.

Programming and Automation

Proficiency in:

  • Python

  • Bash scripting

Troubleshooting

  • Strong analytical and problem-solving skills.

  • Ability to investigate and resolve complex production issues efficiently.

Preferred Qualifications

Candidates with the following additional skills will have an advantage:

Infrastructure as Code (IaC)

  • Experience using Terraform for infrastructure automation.

Kubernetes Experience

  • Familiarity with Kubernetes environments, especially Amazon EKS.

Database Operations

  • Knowledge of MongoDB Atlas administration and monitoring.

Cost Optimization

  • Experience improving cloud resource utilization and reducing infrastructure costs.

What Success Looks Like

Successful performance in this role will result in:

Reliable Systems

  • Highly available and scalable infrastructure.

  • Reduced downtime and improved service reliability.

Enhanced Visibility

  • Clear monitoring and observability across all applications and services.

  • Faster issue detection and response.

Improved Incident Management

  • Reduced frequency of critical incidents.

  • Faster recovery times when issues occur.

Increased Automation

  • Efficient deployment and operational workflows.

  • Reduced manual processes and improved productivity.

Why Join This Opportunity?

This position offers the chance to work with modern cloud technologies, automation tools, and scalable infrastructure solutions. It is an excellent opportunity for professionals looking to deepen their expertise in Site Reliability Engineering while contributing to high-impact technology projects.

Conclusion

If you have experience in DevOps, cloud infrastructure, or Site Reliability Engineering and are passionate about building reliable and scalable systems, this Nairobi-based opportunity could be the next step in your technology career.

Application Deadline

18 July 2026