




Summary: We are seeking a Chief Cloud Engineer to lead operational excellence across the cloud platform by owning observability, incident response, resilience, and disaster recovery.

Highlights:

1. Spearhead operational excellence across the cloud platform
2. Lead teams from a technical standpoint and influence project direction
3. Apply automation and AI-assisted tools for operational efficiency

We are searching for a **Chief Cloud Engineer** to become part of our team. You will spearhead operational excellence across the cloud platform by taking ownership of observability, incident response, resilience, and disaster recovery. This position guarantees that the "run" side matches the strength of the "build" side, ensuring cloud workloads stay healthy, compliant, and high-performing.

**Responsibilities**

* Take ownership of operational health dashboards, alert thresholds, and incident response playbooks for the cloud platform
* Direct on-call rotations, coordinate major incident resolution, and lead post-incident reviews
* Deploy and sustain Disaster Recovery (DR) solutions for core applications, covering DNS routing strategies and low-RTO repositories
* Oversee patching pipelines, golden images, container registries, backups, and automated resilience testing
* Collaborate with platform engineers to channel operational insights into architecture enhancements and the roadmap
* Apply automation and AI-assisted tools to correlate anomalies, minimize noise, and speed up root-cause analysis
* Train product teams on DR patterns, operational best practices, and shared responsibilities

**Requirements**

* A Bachelor's or Master's degree in Computer Science, Computer Engineering, or equivalent professional background
* A minimum of 7 years of relevant professional experience
* At least 2 years of leadership and team management background, with the ability to lead teams from a technical standpoint, influence project direction, promote technical best practices, and deliver high-quality outcomes
* Involvement in at least 2 full-cycle projects, or participation in multiple projects spanning different phases of the development lifecycle
* Practical experience in cloud operations or SRE positions with deep exposure to AWS or similar hyperscale platforms
* Advanced capabilities in monitoring, alerting, logging, and incident management tooling
* Demonstrated record of carrying out disaster recovery strategies, backup regimes, and resilience testing
* Strong understanding of patching processes, golden AMI and container image management, and change control governance
* Hands-on experience automating operational workflows to lower MTTR and toil using tools such as Python, Lambda, and runbooks
* Acquaintance with AI-assisted observability and correlation tooling, along with the ability to operationalize it
* Strong communication abilities for on-call coordination and stakeholder updates
* Outstanding spoken and written English communication skills (B2+ level or higher)


