




Summary: Seeking a Senior Site Reliability Engineer/DevOps to optimize, maintain, and scale IT infrastructure, combining software and systems engineering for distributed, fault-tolerant systems.

Highlights:

1. Spearhead efforts in optimizing, maintaining, and scaling IT infrastructure
2. Design, build, and maintain infrastructure for speedy software development
3. Foster a culture of continuous improvement, testing, and automation

We are looking for a **Senior Site Reliability Engineer (SRE)/DevOps** who will spearhead the efforts in optimizing, maintaining, and scaling our IT infrastructure and operations. This role combines software and systems engineering to build and run large-scale, distributed, fault-tolerant systems. The ideal candidate will have a strong background in software development, system administration, and a keen interest in network operations and architecture.

**Responsibilities**

* Design, build and maintain the infrastructure and tools to allow for the speedy development and release of software
* Ensure continuous availability, performance and scalability of production systems and services
* Implement automation tools for efficient operations and response to system alerts and issues
* Collaborate closely with the development team to improve the reliability and performance of the system
* Develop and maintain operational documentation and specifications on system builds and operational processes
* Monitor and report on service level objectives for a given application's services
* Establish key performance indicators in cooperation with business and product owners
* Foster a culture of continuous improvement, testing and automation

**Requirements**

* Bachelor's or Master's degree in Computer Science, Information Technology or related field
* 3+ years of experience in an SRE/DevOps role with a proven track record of scaling and automating large-scale systems
* Understanding of cloud computing services, preferably AWS, Azure or GCP
* Proficiency in scripting languages such as Python and Bash along with infrastructure as code tools such as Terraform and CloudFormation
* Skills in container orchestration tools such as Kubernetes and Docker
* Knowledge of CI/CD pipelines and tools such as Jenkins and GitLab CI
* Familiarity with monitoring and alerting tools such as Prometheus, Grafana and New Relic
* Excellent leadership and communication skills
* English proficiency at B2 level or higher

**We offer**

* International projects with top brands
* Work with global teams of highly skilled, diverse peers
* Healthcare benefits
* Employee financial programs
* Paid time off and sick leave
* Upskilling, reskilling and certification courses
* Unlimited access to the LinkedIn Learning library and 22,000+ courses
* Global career opportunities
* Volunteer and community involvement opportunities
* EPAM Employee Groups
* Award-winning culture recognized by Glassdoor, Newsweek and LinkedIn


