




Summary: Seeking an experienced Lead Site Reliability Engineer to spearhead infrastructure reliability initiatives, guide a team, and drive operational excellence across cloud-based platforms.

Highlights:

1. Lead design and evolution of resilient, scalable infrastructure
2. Mentor and guide a team of engineers, fostering technical growth
3. Shape technical strategy and drive operational excellence

We are seeking an experienced **Lead Site Reliability Engineer** to spearhead our infrastructure reliability initiatives and guide a team of talented engineers. In this role, you will shape technical strategy, mentor team members and drive operational excellence across our cloud-based platforms and distributed services.

EPAM is a leading global provider of digital platform engineering and development services. We are committed to having a positive impact on our customers, our employees, and our communities. We embrace a dynamic and inclusive culture. Here you will collaborate with multi-national teams, contribute to a myriad of innovative projects that deliver the most creative and cutting-edge solutions, and have an opportunity to continuously learn and grow. No matter where you are located, you will join a dedicated, creative, and diverse community that will help you discover your fullest potential.

**Responsibilities**

* Lead the design and evolution of resilient, scalable infrastructure across multiple cloud providers
* Mentor and guide a team of engineers, fostering technical growth and best practices
* Define reliability standards, SLOs and operational policies for production environments
* Architect automation frameworks to streamline deployments and infrastructure management
* Oversee CI/CD strategy and ensure efficient software delivery workflows
* Coordinate incident response efforts and lead post-mortem analyses to prevent recurrence
* Partner with engineering leadership to align reliability goals with business priorities
* Champion observability practices to enhance system visibility and proactive issue detection
* Provide technical direction for microservices and event-driven architecture initiatives
* Evaluate emerging tools and technologies to enhance the reliability ecosystem
* Drive capacity planning, cost optimization and performance tuning across platforms

**Requirements**

* 5+ years of experience in DevOps or Site Reliability Engineering
* Expertise in AWS, Azure and GCP
* Competency in Kubernetes, Terraform and Ansible
* Skills in GitHub and Jenkins
* Knowledge of microservices, APIs and event-driven processing
* Strong written and verbal English communication skills (B2+)

**We offer**

* Connectivity Bonus (25,000 ARS paid with the salary receipt at the end of each month as a non-wage item).
* Medicina Prepaga (private health coverage for the employee and their immediate family).
* Paternity Leave (two additional days on top of what is established by law, for a total of 4 days).
* Discount card.
* English Training (English lessons twice per week).
* Training Program (access to multiple customized training plans according to the needs of each role within the company).
* Marriage Bonus (the company doubles the allowance established by law that ANSES offers).
* Referral Program (a referral bonus is paid when a candidate referred by an employee joins the company).
* External Agreements and Discounts.
* Vacation: 14 calendar days a year.

*By applying to our role, you are agreeing that your personal data may be used as set out in EPAM's Privacy Notice and Policy.*


