We are looking for an experienced Principal Site Reliability Engineer to join our Professional Services team and deliver Software and DevSecOps projects. You will report to a Site Reliability Engineering Manager. As a Principal Site Reliability Engineer, you will act as technical lead on multiple projects simultaneously and represent the senior technical leadership within our organisation.
SRE/DevOps is one of our core competencies. You will be part of a highly skilled team that continuously innovates and delivers high-value solutions to clients across various industries on all public clouds (AWS, Azure, GCP, etc.). Technologies we work with daily include Kubernetes, Helm, Terraform, GitOps, OPA, Calico, and Linkerd, to name a few.
What you will be doing
Design and build advanced cloud-native infrastructure
Guide technical discussions with clients and build technical roadmaps
Collaborate with the Engineering Director(s) to (re)design architecture
Assist the Site Reliability Engineering Manager with resource planning
Assist engineering managers with building career paths for individuals wishing to be promoted to Principal Engineers
Teach, mentor, and advise other domain experts and individual contributors across several teams
Document processes and monitor performance metrics
Guide conversations to remove blockers and encourage collaboration across teams
Constantly improve the stability, scalability, security, cost-effectiveness, and operational excellence of our clients' systems
Continuously discover, evaluate, and implement new technologies to maximise development efficiency and security
Conduct infrastructure planning, testing, and development
Provide technical leadership on multiple projects
What you must have
At least 7 years' experience working in a DevOps/SRE team
Extensive experience in DevOps/SRE, team management and collaboration
Advanced knowledge of best practices related to data encryption and cybersecurity
Advanced knowledge of the general DevOps/SRE landscape, architectures, and emerging technologies
Cloud experience, preferably GCP, Azure and AWS
Experience in Observability Practices and Incident Management
Extensive experience with Prometheus, Grafana, the Elastic Stack and all versions of Beats, especially within Kubernetes
Experience with Infrastructure as Code, preferably Terraform
Experience with general automation and config management, preferably Ansible
Extensive experience building and maintaining Kubernetes clusters and workloads
Strong foundation in basic networking and security concepts
Ability to build robust CI/CD pipelines
Familiarity with relational and non-relational databases
Solid understanding of Linux operating systems
Qualities & Behaviours
Exceptional interpersonal and communication skills
A zest for automation
Comfortable working as a remote team member and leader
Ability to keep up to date with DevOps/SRE best practices, trends and innovation
Passionate about mentoring and growing technical skills within the team