Job Description
- Implement and roll out microservices architecture and infrastructure.
- Optimize engineers' software development throughput and performance.
- Help implement an international strategy across our infrastructure and services.
- Manage and maintain multiple datacenters across the globe.
- Decouple older monolithic systems and update them with modern architectural approaches.
- Directly contribute to increasing platform uptime and SLA metrics.
- Provide engineers with scalable and safe service management.
- Design and build tools to improve deployment efficiency and gain visibility into production systems.
- Collaborate effectively with multiple development teams.
- Translate development team use cases into infrastructure requirements.
Experience & Qualifications
- B.Sc. in Computer Science or equivalent.
- 5+ years of relevant large-scale systems operations experience.
- 2+ years of experience automating production systems with Python, Go, or similar languages.
Skills & Knowledge:
- 3+ years of experience as a DevOps engineer, SRE (Site Reliability Engineer), PE (Production Engineer), or similar.
- Production-level Kubernetes, Helm, and Terraform experience.
- Experience with GitOps workflows for changes to environments.
- Experience with observability tools such as Prometheus, Grafana, ELK, CloudWatch, and tracing tools.
- Ability to work from high-level objectives, maintaining and developing infrastructure to support a more efficient release cycle.
- Experience with a cloud provider such as AWS.
- Experience building automation workflows (CI/CD) with tools such as GitHub Actions, Bitbucket Pipelines, or Jenkins.
- Competency in Python, Golang, Ruby or similar.
- Experience deploying and maintaining Docker applications and Kubernetes clusters in production.
- Experience using AWS services programmatically (AWS certification preferred).
- Experience working with relational databases.
- Expert in Linux system administration, including storage, networks, and services.
- Experience deploying and maintaining large distributed systems such as Hadoop, Kafka, Spark, ZooKeeper, Cassandra, or MongoDB is a big plus.
- Experience automating cloud infrastructure with Terraform is a strong plus.