Site Reliability Engineer at InterSwitch

Job Overview

Location
Lagos, Lagos
Job Type
Full Time
Date Posted
3 years ago

Additional Details

Job ID
16962

Job Description

  • Application Deadline: Thu, 8 Sep 2022 00:00:00 GMT
  • Position: Site Reliability Engineer

  • Job Type: Full Time

  • Qualification: BA/BSc/HND

  • Experience: 1 year

  • Location: Lagos

  • Job Field: ICT / Computer



Job Description



  • Manage the availability and capacity of the Core Applications. Support the Applications and ensure their optimal performance. Set up new Applications in the company’s environment.


Duties and Responsibilities



  • Deployment of Applications

  • Support the deployment of Applications on the production environment

  • Implement projects involving the setup and deployment of new Applications and the enhancement of existing Applications

  • Automation

  • Automate the activities involved in the management of Applications

  • Application Environment Management

  • Ensure 24x7 Availability of all Core Applications

  • Carry out capacity planning to ensure Applications are always available to meet demand

  • Create visibility into site health and key performance indicators of the Application Systems

  • Ensure up-to-date patching of the Application Systems and their full compliance with security standards

  • Ensure up-to-date documentation of all Core Applications and of the changes made to them

  • Balance feature development speed and reliability with well-defined Service Level Objectives (SLOs) and Service Level Indicators (SLIs)

  • Monitor Systems

  • Monitor the performance, health, and capacity of:

    • Servers

    • Databases

    • Services

    • Storage

    • Network Links



  • Use a variety of monitoring tools such as Nagios, SolarWinds, Kibana, PagerDuty, AppDynamics, etc.

  • Troubleshooting

  • Troubleshoot reported issues, and proactively identify areas in need of optimization

  • Work with technical support engineers to resolve critical incidents

  • Create and update clear troubleshooting guides for Applications

  • Request Fulfilment

  • Implement requests relevant to the operation and enhancement of the Core Processing Applications


Qualifications



  • Academic Qualification(s) - Good First Degree in Computer Science / Computer Engineering or other related fields

  • Professional Qualification(s) - Service Management certifications (e.g. ITIL) are an advantage.

  • Experience (Number of relevant years) - Minimum of one (1) year of relevant experience.


Requirements:



  • Expertise in Linux and Windows operating systems and in shell scripting

  • Technical experience working with cloud technologies

  • Build and Deployment Management (Jenkins) in a CI/CD workflow

  • Experience with Chef, Puppet or Ansible, automating all aspects of system and server management

  • Good understanding of distributed systems and of container infrastructure and orchestration technologies such as Docker and Kubernetes

  • Good understanding of SLOs and SLIs for Applications

  • Experience with DNS, Networking and High Availability solutions

  • Proficient in at least one of the following languages: Python, Ruby, Go

  • Ability to work across teams to continuously analyze system performance in production, troubleshoot reported issues, and proactively identify areas in need of optimization

  • Previous experience with developing and driving real-time monitoring solutions that provide visibility into site health and key performance indicators

  • Working knowledge of databases

  • Working understanding of load-balancing technologies

  • Working understanding of IT service management (Incident, Problem, Change and Knowledge management).

  • Ability to work within a technical team of support engineers through day-to-day operations and critical incidents.

