Middle DevOps Engineer
Indeed
Full-time
Onsite
No experience limit
No degree limit
Description

Summary: We are seeking a Middle DevOps Engineer to manage Kubernetes GPU orchestration with Volcano, ensuring stable Linux compute platforms for AI and research teams.

Highlights:

1. Run Kubernetes GPU orchestration with Volcano for AI and research teams
2. Automate day-to-day operations with Python and UNIX shell scripting
3. Help build efficient, dependable compute infrastructure

We are hiring a Middle DevOps Engineer to run Kubernetes GPU orchestration with Volcano and keep Linux compute platforms stable for AI and research teams. You will automate day-to-day operations with Python and UNIX shell scripting, tune scheduling and quotas, and work in a client-facing delivery setup. Apply now to help build efficient, dependable compute infrastructure.

**Responsibilities**

* Provision and support GPU-capable Kubernetes clusters plus independent Linux compute nodes to maximize scheduling effectiveness and system performance
* Operate Volcano scheduling by configuring queues, controlling pod lifecycle, allocating GPU resources, and applying namespace quota controls
* Maintain Kubernetes environments by managing namespaces, RBAC, resource quotas, and workload isolation mechanisms
* Automate operational workflows by writing and updating Python and shell scripts for job submission, resource allocation, and monitoring
* Partner with orchestration, optimization, and observability teams to improve scheduling performance, utilization, and researcher outcomes
* Analyze and report on infrastructure health and resource usage to drive continuous optimization
* Implement upgrades to infrastructure, tooling, and automation to improve scalability, performance, and user experience
* Assist with operational processes that ensure researchers have an effective environment for AI and computational projects

**Requirements**

* 2+ years of hands-on experience in DevOps or infrastructure engineering for complex, large-scale environments
* Strong knowledge of Kubernetes operations, including namespaces, pod placement and balancing, PVC, NFS, and resource quota management
* Practical experience operating Volcano for GPU workloads, including queue creation, priority handling, and Kubernetes integration
* Demonstrated experience managing GPU clusters across Kubernetes and standalone Linux setups used for high-performance computing
* Advanced ability in Python scripting to automate infrastructure tasks, job processing, and monitoring workflows
* Solid command of UNIX shell scripting (Bash or similar) to automate system routines and improve operations
* Strong Linux administration skills with troubleshooting, performance tuning, and configuration management experience
* Deep understanding of automation and orchestration concepts and tools for reliable, scalable infrastructure
* Excellent English communication skills (spoken and written) for direct interaction with clients and cross-functional teams

**Nice to have**

* Helm experience for Kubernetes application packaging and releases
* Observability knowledge with Prometheus, Grafana, and Loki for infrastructure monitoring
* Terraform familiarity for Infrastructure as Code and cloud resource automation
* Experience with Amazon EKS and Google GKE in multi-cloud Kubernetes setups
* Azure networking skills including VPN, ExpressRoute, and network security
* Use of AI coding assistants such as GitHub Copilot, ChatGPT, and Claude to boost code quality and productivity
* Knowledge of hybrid scheduling and optimization across cloud and on-premises compute

Source: Indeed
Sofía González
Indeed · HR

Company

Indeed
© 2025 Servanan International Pte. Ltd.