Are you a Site Reliability Engineer who does not just ship code, but also learns from it and improves it? Look no further!
You will:
- Oversee the operation and performance of a large-scale, live Kubernetes cluster
- Understand, monitor, scale, and improve systems such as big data applications, integration components, and APIs
- Oversee deployments and application/system performance
- Plan workloads accordingly
- Automate!
You should:
- have a relevant degree
- have 4 years of production experience as an SRE
- have strong hands-on experience with Linux, networking, Bash/Python, Jenkins, Ansible, monitoring tools, and Kafka
- have Kubernetes administration experience, preferably including setting up a cluster from scratch