




Summary: Seeking a Lead DevOps Engineer to own and evolve an AWS platform for custom VDI and cloud playtesting services, driving infrastructure-as-code, ECS/EKS operations, and CI/CD standards.

Highlights:

1. Lead DevOps for custom VDI and cloud playtesting services on AWS
2. Drive infrastructure-as-code and optimize GPU EC2 cost/performance
3. Lead incident response and ensure platform reliability and scalability

We are hiring a **Lead DevOps Engineer** to own and evolve the AWS platform behind a custom VDI solution and cloud playtesting/streaming services. You will drive infrastructure-as-code, ECS/EKS operations, AWS Lambda automation, and GitHub Actions CI/CD standards while optimizing GPU EC2 cost/performance and leading incident response across the platform.

Apply now to help keep the platform reliable, efficient, and scalable.

**Responsibilities**

* Design, build, and maintain AWS infrastructure with Terraform
* Manage Terraform workflows and remote state through HashiCorp Cloud Platform (HCP)
* Own the end-to-end infrastructure lifecycle, including provisioning, upgrades, decommissioning, and operational hygiene
* Operate ECS clusters to deploy and run microservices that support the platforms
* Administer EKS clusters that host and enable GitHub Actions runners, including necessary platform customizations
* Optimize and right-size GPU-enabled EC2 capacity to meet user experience goals under strict cloud cost controls
* Assess scaling behavior continuously, monitor utilization, and identify performance bottlenecks
* Implement and maintain AWS Lambda functions that automate cleanup tasks, on-demand provisioning, and operational workflows
* Standardize and improve GitHub Actions pipelines for Terraform plan/apply workflows, infrastructure releases, and container image build/publish/deploy processes
* Lead troubleshooting and service restoration for platform-wide degradations such as VDI session drops, authentication issues, and machine/storage failures
* Coordinate incident resolution across teams by driving investigation, mitigation, and follow-up actions
* Create and keep current runbooks, operational documentation, and onboarding materials

**Requirements**

* 7+ years of proven experience in DevOps or platform engineering roles
* Deep expertise in AWS infrastructure architecture, provisioning, and full lifecycle management
* Hands-on proficiency with Terraform and HashiCorp Cloud Platform (HCP)
* Solid experience operating container orchestration using ECS and EKS
* Strong knowledge of GPU-enabled EC2 right-sizing, cloud cost management, and performance tuning
* Practical competency with AWS Lambda for event-driven automation
* Demonstrated background standardizing CI/CD using GitHub Actions pipelines
* Proven track record leading reliability engineering, troubleshooting, and incident resolution
* High ownership and accountability with the ability to work independently without close supervision
* Strong troubleshooting and systems thinking, staying calm and methodical during incidents
* Clear communication skills with both technical and non-technical stakeholders
* Effective prioritization in a Kanban workflow, balancing planned work with urgent interruptions
* English proficiency at B2 (Upper-Intermediate) level or higher

**Nice to have**

* Familiarity with Amazon GameLift Streams
* Understanding of streaming and playtesting platform needs
* Ability to triage urgent ad-hoc requests that fall outside the standard Kanban flow

**We offer**

* International projects with top brands
* Work with global teams of highly skilled, diverse peers
* Healthcare benefits
* Employee financial programs
* Paid time off and sick leave
* Upskilling, reskilling and certification courses
* Unlimited access to the LinkedIn Learning library and 22,000+ courses
* Global career opportunities
* Volunteer and community involvement opportunities
* EPAM Employee Groups
* Award-winning culture recognized by Glassdoor, Newsweek and LinkedIn


