DevOps Engineer - SRE & Observability

Indeed

Full-time

Onsite

No experience limit

No degree limit

Description

Job Summary: Coderio is seeking a DevOps Engineer - SRE & Observability to design, implement, and maintain the observability strategy and ensure proactive system health.

Key Highlights:

1. Strategic and highly visible role within a modern engineering culture
2. Collaborative international team and strong technical leadership
3. Career development and growth opportunities within Coderio

**About Coderio**

Coderio designs and delivers scalable digital solutions for global enterprises. With a solid technical foundation and a product-oriented mindset, our teams lead complex software projects from architecture through execution. We value autonomy, clear communication, and technical excellence. We collaborate closely with international teams and partners to build technology that makes an impact.

Learn more: http://coderio.com

**What We’re Looking For**

We are seeking a **DevOps Engineer - SRE & Observability** to ensure proactive system health. This position focuses on end-to-end observability and efficient incident response, aiming to guarantee that end-user experience remains unaffected.

### **Responsibilities:**

* Design, implement, and maintain the observability strategy
* Measure and optimize SLOs, contributing to reduced MTTR (Mean Time To Repair)
* Lead incident analysis and produce actionable postmortems
* Present reliability and performance metrics to stakeholders and executive leadership

### **Technical Requirements:**

* **1. Observability Stack**
  * Advanced experience in monitoring and metrics using Prometheus, Grafana, Datadog, or New Relic
  * Centralized log management using ELK Stack (Elasticsearch, Logstash, Kibana), Splunk, or Graylog
  * Implementation of distributed tracing using OpenTelemetry, Jaeger, or Honeycomb to identify bottlenecks in microservices
* **2. Site Reliability Engineering (SRE Core)**
  * Ability to define and configure SLIs and SLOs aligned with business expectations
  * Knowledge of Error Budget management to determine when to prioritize stability over new features
  * Experience leading blameless postmortem processes and root cause analysis (RCA)
* **3. Automation and Platform**
  * Proficiency in Infrastructure as Code using Terraform or CloudFormation for automated deployment of monitoring agents
  * Solid knowledge of Kubernetes/OpenShift with cluster-level metric collection (Kube-state-metrics, Node Exporter)
  * Ability to automate alert responses and runbooks using Python, Go, or Bash
* **4. Alert Management and Incident Response**
  * Configuration of intelligent alerts to reduce noise and operational fatigue using PagerDuty, Opsgenie, or VictorOps
  * Proficiency in rapid diagnostic techniques in production environments under pressure, with focus on reducing MTTR

### **Benefits**

* 100% remote work
* Long-term commitment, with autonomy and impact
* Strategic and highly visible role within a modern engineering culture
* Collaborative international team and strong technical leadership
* Career development and growth opportunities within Coderio

**Why Join Coderio?**

At Coderio, we value talent regardless of location. We are a remote-first company passionate about technology, collaborative work, and fair compensation. We offer an inclusive, challenging environment with real growth opportunities. If you’re motivated to build impactful solutions, we’re waiting for you. Apply now.

We may use artificial intelligence (AI) tools to support parts of the hiring process, such as reviewing applications, analyzing resumes, or assessing responses. These tools assist our recruitment team but do not replace human judgment. Final hiring decisions are ultimately made by humans. If you would like more information about how your data is processed, please contact us.
