




Summary: Seeking a Senior SRE to enhance cloud reliability, contribute to incident response, and drive continuous improvement in an Azure environment.

Highlights:

1. Opportunity to help build a brand through reliable, high-performing services
2. Hands-on role in a high-stakes environment, shaping the maturity of SRE processes
3. Chance to mentor and upskill team members in SRE principles and Azure

We just launched services for our client in Azure, and service health is our top priority. As we build our brand through reliable, high-performing services, we are seeking a **Senior SRE** who can contribute immediately to incident response, troubleshooting, and the ongoing improvement of our cloud reliability.

This is a hands-on role for someone who thrives in high-stakes environments, can work effectively where SRE processes are still immature, and is passionate about both firefighting and building for the future.

**Responsibilities**

* Develop and automate operational processes to improve system reliability, scalability, and performance
* Collaborate with development and operations teams to embed reliability best practices into the SDLC
* Rapidly respond to and resolve service incidents in our Azure environment, minimizing downtime and customer impact
* Lead root cause analysis and post-incident reviews, driving actionable improvements
* Design, implement, and maintain robust monitoring, alerting, and observability solutions for all critical services
* Proactively identify and address reliability risks before they impact customers
* Help establish and mature SRE practices, including incident management, blameless postmortems, and SLO/SLI definition
* Mentor and upskill team members in SRE principles and Azure best practices
* Analyze trends in incidents and outages to drive long-term improvements
* Champion a culture of reliability, accountability, and continuous learning

**Requirements**

* 3+ years in SRE, DevOps, or related roles, with a strong track record in cloud environments (Azure experience required)
* Deep expertise in troubleshooting distributed systems, networking, and cloud-native architectures
* Hands-on experience with Azure monitoring, logging, and automation tools (Azure Monitor, Log Analytics, Application Insights, ARM, Bicep, Terraform)
* Proficiency in at least one scripting or programming language (Python, PowerShell, Bash)
* Strong understanding of incident management, on-call operations, and post-incident analysis
* Experience implementing observability solutions and defining SLOs/SLIs
* Excellent communication skills and the ability to work cross-functionally in high-pressure situations
* English proficiency at B2 level or higher

**Nice to have**

* Proficiency in Python
* Azure certifications (Azure Solutions Architect, Azure DevOps Engineer)
* Experience in environments with low SRE process maturity, building practices from the ground up
* Familiarity with CI/CD pipelines and infrastructure as code
* Experience mentoring or leading SRE/DevOps teams


