Site Reliability Engineer

Employment Type:
IT Jobs
Job Role:
Site Management Jobs, Software Engineering Jobs
Hong Kong
Mobile Jobs
Job Ref:

We’ll trust you to:

Investigate, triage, and troubleshoot production problems as they occur
Create and maintain common and integrated standards with respect to logging, latency, troubleshooting, and monitoring
Develop and maintain tools used in investigating production problems
Review and influence the design and standards of the software
Measure current capacity, predict future capacity needs and make suggestions accordingly
Automate deployment, configuration management, quality checks (including functional and capacity testing), and the response to production problems
Facilitate continuous integration/continuous deployment

You’ll need to have:

3+ years of experience programming in Python
Demonstrated understanding of how production systems are put together, and experience triaging and resolving problems in them
Strong knowledge of Linux systems

We’d love to see:

Experience programming in C/C++
Familiarity with configuration management tools such as Chef, Puppet, Ansible or SaltStack
Practical knowledge of networking protocols such as TCP, UDP and IP
Familiarity with monitoring tools such as Splunk, ELK, Grafana, Nagios
Perl, Java or JavaScript experience
Experience with virtualization technologies such as Vagrant, Terraform, VMware, KVM
Knowledge of cloud technologies (OpenStack, AWS, Rackspace, CloudFoundry, OpenShift, WSO2)
Experience with big data technologies such as Hadoop, Spark, Cassandra
Knowledge of containerization technologies such as Docker, Mesos, CoreOS, Kubernetes
