




**Job Summary**

We are looking for a DevOps/SRE Engineer to own our cloud infrastructure, designing resilient systems and automating processes to keep the platform running continuously.

**Key Highlights**

1. Own the infrastructure, not just maintain it
2. Automate everything that can be automated
3. Make a direct impact on thousands of calls every day

**About Kleva**

Kleva is an AI-powered agent platform that automates collections management. We build intelligent systems that decide when and how to contact customers and hold real conversations with them. We do not build simple automated bots.

https://youtu.be/PtoAX02XuO8

**About the Role**

We are looking for someone to own Kleva’s cloud infrastructure and to build and scale it from within. This is not a reactive operations role. It is for someone who treats infrastructure as a product: designing resilient systems, automating everything possible, and making sure our voice agents never fail, even across thousands of simultaneous calls.

You will be responsible for keeping the platform running without interruption and at scale: enabling fast, safe deployments, making sure monitoring catches issues before customers experience them, and letting the infrastructure grow with the business without becoming unmanageable.

**Responsibilities**

* Design, implement, and maintain Kleva’s cloud infrastructure (AWS/GCP)
* Build and optimize CI/CD pipelines for continuous, secure deployment
* Manage container orchestration (Kubernetes/Docker) and ensure high availability
* Implement end-to-end observability: logging, metrics, alerts, and dashboards
* Ensure the voice pipeline infrastructure delivers low latency, high concurrency, and zero downtime
* Manage the multi-tenant architecture and guarantee isolation between customers
* Automate operational tasks and minimize manual work
* Optimize infrastructure costs without sacrificing performance
* Implement security and compliance practices across every layer of the stack

**Consider applying if…**

* You want to own the infrastructure, not just maintain it
* You enjoy automating everything that can be automated
* You want direct impact: the infrastructure you build will support thousands of calls per day
* You prefer autonomy and ownership over bureaucracy and endless tickets

**We’re not a good fit if…**

* You expect a stable, well-defined environment where everything is documented
* You prefer executing what others design rather than making architectural decisions
* You’re uncomfortable with ambiguity. In early-stage companies, priorities shift rapidly.

**Requirements**

* 3-5 years of experience in DevOps, SRE, or cloud infrastructure
* Solid experience with AWS or GCP (EC2, EKS, Cloud Run, RDS, or equivalents)
* Production-level Kubernetes and Docker expertise
* Experience building CI/CD pipelines (GitHub Actions, GitLab CI, or similar)
* Proficiency with Infrastructure as Code (Terraform, Pulumi, or similar)
* Ability to diagnose and resolve performance and availability issues under pressure

**Desirable**

* Production experience with voice or telephony systems
* Familiarity with observability tools (Datadog, Grafana, Prometheus)
* Experience with real-time messaging technologies (WebSockets, Kafka, or similar)
* Background in fintech or high-availability SaaS
* Experience in early-stage startups

**Benefits**

* Competitive salary in USD
* Three weeks of paid vacation per year
* Hubs in Buenos Aires and Mexico City for those who choose a hybrid model, with a fully remote option also available
* In every interaction, we value results over egos and hierarchies


