Site Reliability Engineer

What you’ll do

Build and run large-scale, highly distributed, fault-tolerant systems.
Develop tools and automated solutions in support of hosted services.
Handle and resolve issues escalated in the production environment.
Troubleshoot performance, reliability, and scalability issues.
Collaborate with application engineers and train developers as needed.

What you need to succeed

This position requires strong technical skills, excellent communication and problem-solving skills, and the ability to engage and interact with other teams.
The ideal candidate will have skills and experience operating and supporting Internet-hosted applications and protocols.
You will build on industry-leading infrastructure tools and technologies such as Terraform, Chef, and AWS to create tailored solutions to challenging problems at scale.

Must have:

BSc in Computer Science or equivalent
2+ years of programming experience with web technologies and infrastructure automation
Knowledge of best engineering practices for building high-performance, reliable, and scalable web services
Experience in administration and automation of Linux servers
Ability to dig deep, debug, and troubleshoot problems in distributed systems
Git proficiency
Willingness to be part of a team on-call rotation

Technologies we use:

Amazon Web Services
Datadog, Nagios, New Relic
Terraform, Chef
Python, Ruby
PostgreSQL, Elasticsearch
Nginx, HAProxy
