Cogent Labs is looking for a Site Reliability Engineer (3+ years of relevant experience) to help create innovative AI-based services. Successful candidates will join a highly skilled and growing team, and should be able to help plan high-quality backend solutions, maintain service SLOs and cloud infrastructure, and set up effective tooling, monitoring, and alerting.

Required experience and competencies

  • Understanding of large-scale complex systems from a reliability perspective
  • Experience working with Kubernetes and container-based applications
  • Deep understanding of networking and the ability to troubleshoot network issues
  • Coding abilities in Python, JavaScript, or Go
  • Experience with cloud computing platforms (particularly GCP) is a plus

Responsibilities

  • Setting up and maintaining service SLOs
  • Specifying and developing scalable and performant cloud infrastructure
  • Developing and maintaining a comprehensive continuous integration/deployment system
  • Maintaining monitoring/alerting and measuring availability, latency, and overall system health
  • System design consulting, developing software platforms and frameworks, capacity planning and launch reviews

Team culture

The Cogent Labs engineering department is continuously working to develop a culture that fosters and rewards the following qualities:

  • Team effort: A cohesive team can be more effective than an isolated prodigy. Engineers are expected to work well in groups and look for opportunities to empower their colleagues.
  • Responsibility: Take responsibility for your own tasks and hold others responsible for theirs.
  • Self-improvement: Create an environment where engineers can focus on their engineering tasks and self-improvement without excessive outside disturbances.
  • Experimentation: Engineers should have some freedom to experiment with new ideas and technologies, as this could ultimately translate into better products or valuable new IP.
  • Quality: Maintain a mindset of developing high-quality features and code.