Oversee the data collection platform, recommending changes or updates and scaling it to fit the size of the data.
Use streaming and batch technologies to build an analytics pipeline at scale.
Write code to parse log streams and gather the metric data needed to drive graphs for both business and support of a global gaming environment.
Build out, scale, and support our graphing and metric environment. You won't just be crunching the numbers; you'll be building the systems that do it.
Own the jobs and processes that collect data from our cloud environments and from all the services inside them.
Continuously work on improving and tuning the data collection, what we monitor, and how we alert.
Own the content that powers the operational dashboards displayed throughout our organization; work closely with developers and guide them to write usable metrics and logs.
Ability to lead and execute projects from start to finish.
Build and maintain systems used internally by both the operations and development teams; be part of a rotation for 24/7 on-call support for monitoring and logging systems.
5+ years of professional experience in a high-availability data center environment serving large web services.
5+ years of Linux administration in a production SaaS/cloud environment; understanding of high-throughput message buses.
Understanding of log collection agents for pulling metric data from AWS, CloudWatch, CloudTrail, and other metrics stored in S3.
Knowledge of Chef or other configuration management tools; experience with Grafana or other metric dashboard tools.
Involvement in the Open Source community (code writing, patch submission, bug reporting, etc.) is a huge plus.
Proficiency in one or more programming languages, such as Java, Scala, Golang, or Python.
Experience with big data and distributed systems technologies like HBase, Hadoop, Kafka, Storm, ZooKeeper, Cassandra, Redis, Elasticsearch, OpenTSDB, and InfluxDB.