Site Reliability Engineer (SRE) at Renmoney

Job Overview

Location
Lagos, Lagos
Job Type
Full Time
Date Posted
7 days ago

Additional Details

Job ID
153201

Job Description






The Role




  • The Site Reliability Engineer (SRE) is responsible for ensuring the availability, reliability, scalability, and performance of business-critical applications and infrastructure. The role combines software engineering and operations expertise to automate processes, improve platform stability, and enhance system observability.



What You Will Do




  • Design, implement, and maintain highly available and scalable infrastructure.

  • Monitor production systems and proactively identify performance bottlenecks.

  • Manage incident response, root cause analysis (RCA), and problem management activities.

  • Develop automation scripts and tools to improve operational efficiency.

  • Implement and maintain CI/CD pipelines.

  • Manage cloud infrastructure across AWS and hybrid environments.

  • Configure and maintain observability platforms including monitoring, logging, and alerting solutions.

  • Define and track SLIs, SLOs, and error budgets.

  • Support application deployments and release management processes.

  • Collaborate with Engineering, Security, Data, and Product teams to improve system reliability.

  • Perform capacity planning and disaster recovery testing.

  • Ensure infrastructure and systems comply with security and regulatory requirements.



Requirements



What You Bring



Education




  • Bachelor’s degree in Computer Science, Information Technology, Engineering, or a related field.



Experience




  • 4–7 years of experience in Site Reliability Engineering, DevOps, Cloud Engineering, or Infrastructure Operations.

  • Experience supporting mission-critical financial services or fintech platforms is an advantage.



Technical Skills




  • Strong knowledge of AWS services (EC2, ECS/EKS, RDS, Lambda, VPC, IAM, CloudWatch).

  • Experience with Infrastructure as Code (Terraform, CloudFormation).

  • Knowledge of containerization technologies (Docker, Kubernetes).

  • Experience with CI/CD tools (GitHub Actions, GitLab CI/CD, Jenkins, Azure DevOps).

  • Experience with monitoring tools such as Datadog, Prometheus, Grafana, New Relic, or ELK Stack.

  • Strong Linux administration skills.

  • Experience with scripting languages (Python, Bash, PowerShell).

  • Understanding of networking, DNS, load balancing, VPNs, and security controls.



Preferred Certifications




  • AWS Certified Solutions Architect.

  • AWS SysOps Administrator.

  • Kubernetes Certifications (CKA/CKAD).

  • HashiCorp Terraform Associate.

Key Competencies

  • Problem-solving and analytical thinking.

  • Incident management and troubleshooting.

  • Automation mindset.

  • Strong communication and collaboration.

  • Attention to detail.



This Role Is Ideal For You If




  • You enjoy solving complex infrastructure and reliability challenges.

  • You are passionate about automation and reducing operational overhead.

  • You thrive in highly available, customer-facing environments where uptime matters.

  • You enjoy working across Engineering, Security, Data, and Product teams to improve system performance.

  • You are proactive and constantly seek opportunities to improve reliability, scalability, and efficiency.



You May Not Enjoy This Role If




  • You prefer manual processes over automation.

  • You are uncomfortable responding to production incidents and troubleshooting critical issues.

  • You prefer working in isolated environments with limited collaboration.

  • You are not interested in continuous learning or in keeping up with evolving cloud technologies.


