Service Reliability Engineer for the IT Service Management (ITSM) team.
Our team ensures that the reliability and performance of both internal and externally visible systems match users' needs, while sustaining a fast rate of improvement.
We use a mix of open source, SaaS and internally developed tools to manage and support the life cycle of our services (deploy, configure, upgrade, monitor, optimize).
Responsibilities
- maintain live services by monitoring performance indicators such as availability, throughput and latency
- grow systems by advocating for changes that enhance reliability, performance and serviceability
- build, deploy, improve, scale and augment our Kubernetes-based services
- work closely with development team members to design and deploy new services
- work with the development teams on the CI/CD processes
- practice corrective incident response
- perform and automate deployment, maintenance and upgrades across a fleet of servers and devices
- draft and maintain service documentation and processes
- perform routine system audits
- identify bottlenecks and automate repetitive tasks
- work with technologies such as Cassandra, PostgreSQL, RabbitMQ, SaltStack, Kafka, MQTT, LAMP and more
- take part in a 24/7 on-call rotation, handling incidents and providing L2 support
Requirements
- 7+ years of experience with Linux
- university degree in Computer Science, Engineering or a related field
- understanding of IP networking
- strong command line skills
- strong scripting skills
- experience with virtualization technologies (Xen, KVM)
- experience with DB administration
- experience with cloud platforms (AWS EC2 and Google Compute Engine)
- experience in one or more of the following: C, C++, Python, Go
- experience with monitoring tools (Zabbix, Prometheus, etc.)
- experience with containers
- experience with Kubernetes
- familiarity with storage technologies such as RAID, LVM, SAN, etc. is a plus
Personality traits
- ability to take initiative and the discipline to work reliably with little supervision
- be able to work in a fast-paced environment without losing focus
- follow team policies, document all work and changes
- value visibility and maintainability over complex hacks
- be able to manage day-to-day assignments and deliver solutions as a proactive team member
- automate routine tasks with attention to user feedback, user-friendliness and serviceability
- have a systematic problem-solving approach, coupled with strong communication skills and a sense of ownership and drive
Benefits
- Private health insurance plan
- Coverage of educational expenses for courses, certifications and books
- The opportunity to work closely with a highly motivated, multicultural team in a dynamic, fast-paced environment that offers room for international career development