Principal System Reliability Engineer Jobs in Dubai – UAE at HungerStation
Title: Principal System Reliability Engineer
Location: Dubai – UAE
Type: Full Time
Category: IT/Tech, Engineering
Position: Principal System Reliability Engineer United Arab Emirates 2020-08-07
Technology – Engineering, Dubai, United Arab Emirates, Full time
System Reliability Engineers (also known as Site Reliability Engineers, or SREs) are responsible for keeping all user-facing services (most notably HungerStation.com) and many other HungerStation production systems running smoothly 24/7/365. SREs are a blend of operations gear-heads and software crafters who apply sound engineering principles, operational discipline and mature automation, and who specialize in systems, whether that is networking, the Linux kernel, or a specific interest in scaling, algorithms, or distributed systems.
Keep abreast of the latest hardware development methodologies in order to provide best-in-class hardware solutions.
Assist in managing technical hardware support activities for internal customers in order to establish optimum customer service levels and incident resolution.
Provide hardware support to the Technology team in order to support continuous operations for the business.
Provide data to execute root cause analysis (RCA).
Keep a record of all issues/incidents and provide analytical reports (RCA/Impact Analysis) for resolution implementation and to keep senior management informed.
Report on and resolve all hardware malfunctions, anomalies, and issues in a timely and accurate manner, and provide guidance for resolutions to increase the operational capability of the Department and the Business.
Improve continuous integration/continuous delivery (CI/CD) practices within the Organization to ensure optimum operational levels are reached.
Requirements
9-12 years of relevant experience.
Bachelor's degree in a relevant field is required.
Master's degree in a relevant field is preferred.
You may be a fit for this role if you:
Think about systems – edge cases, failure modes, behaviors, specific implementations.
Know your way around Linux and the Unix shell.
Know what config management systems like Terraform, Ansible, Chef … etc. are used for.
Have strong programming skills – Ruby and/or Go.
Have an urge to collaborate and communicate asynchronously.
Have an urge to document all the things so you don't need to learn the same thing twice.
Have a proactive, go-for-it attitude. When you see something broken, you can't help but fix it.
Have an urge to deliver quickly and iterate fast.
Share our values, and work in accordance with those values.
Have experience with Docker, Kubernetes, Prometheus and other cloud-native tools.