Cloud NOC Engineer - AR

Indeed

Full-time

Onsite

No experience limit

No degree limit

Pje. Centenario 130, C1405 Cdad. Autónoma de Buenos Aires, Argentina

Description

Job Summary: We are seeking Cloud Support Engineers to proactively monitor the health of data centers, manage incidents, and ensure continuity of critical operations.

Key Highlights:

1. Critical role in ensuring continuity of large data center operations
2. Proactive 24/7 monitoring and end-to-end incident management
3. Work on critical, uninterrupted telecommunications infrastructure

### **Overview**

Whitestack deploys private clouds across multiple capitals in Latin America. At each of these sites, dozens or even hundreds of servers operate, interconnected via high-speed networks and designed to support mission-critical applications, including mobile operator voice traffic, that require availability levels approaching 99.999%.

For this reason, we are seeking top-tier engineers for our *Cloud Support* team. These roles are of high strategic importance: they ensure the continuity of large data center operations supporting the critical, uninterrupted telecommunications applications and infrastructure we deploy.

The **Cloud NOC Engineer** is the guardian of this infrastructure. Their mission is proactive 24/7 monitoring of data center health, detecting anomalies before they impact service. They serve as the first line of response, responsible for end-to-end incident management: from detection and ticket creation, through resolution of low- to medium-complexity failures, to structured technical escalation to L1/L2 levels.

**This role is available for remote work from the following locations: Mexico, Chile, Argentina, Colombia, Uruguay, and Peru.**

**Available shifts: Mexico, Colombia, and Peru starting at 1 PM / Argentina, Chile, and Uruguay starting at 8 AM.**

### **Responsibilities**

* Proactive Monitoring: Continuous surveillance of dashboards and alerts (physical infrastructure, virtual infrastructure, and services) to guarantee 99.999% availability.
* Incident Management (Triage): Receiving, categorizing, and prioritizing alerts; rigorously opening and tracking tickets using ITIL methodologies.
* Initial Technical Resolution: Diagnosing and resolving low- and medium-complexity failures (e.g., service restarts, log cleanup, quota adjustments, basic connectivity verification).
* Structured Escalation: Escalating to L1/L2 when complexity exceeds initial capability, with a complete technical report (logs, network traces, reproduction steps, and customer context).
* Case Documentation: Maintaining up-to-date event logs and knowledge base (KB) entries for recurring incidents.
* External Communication: Clearly and promptly notifying customers regarding system health status, maintenance windows, and ongoing incidents.
* Health Checks: Executing periodic validation routines to assess the health of production platforms.
* Ensuring compliance with SLAs for incidents and network/service availability.
* Generating and analyzing platform availability reports.

### **Requirements**

* Experience:
  + Minimum 1–2 years in Network Operations Centers (NOC), Tier-1 technical support, or system administration.
  + Experience handling tickets and support processes (Jira, ServiceNow, or similar), including clear documentation of diagnostics, evidence, and communications.
  + Experience with monitoring/observability tools such as Prometheus, Grafana, Elasticsearch, OpenSearch, OpenNMS; ability to read and interpret metrics, events, logs, and alerts.
  + Experience supporting mission-critical production systems, including incident management, coordination of production actions, escalation, and effective communication.
* Education:
  + Degree in Computer Engineering, Systems Engineering, Electronics Engineering, or related field.
* Specific Knowledge / Technical Requirements:
  + Linux in production environments: troubleshooting services and OS (systemd, journalctl), permissions/users, processes, filesystems, and networking.
  + Linux Networking: configuration and diagnostics of interfaces, VLANs, routes, bonding, and MTU; troubleshooting using tools such as tcpdump (sniffing), ip, ss, ethtool, ping/traceroute.
  + Kubernetes: production-level operation/administration and troubleshooting (Pods, Deployments/DaemonSets, Services, events/logs, readiness/liveness; familiarity with storage PV/PVC).
  + Virtualization: experience operating and supporting virtualized environments (KVM/VMware/Hyper-V or others), including diagnosis of common compute, network, and storage failures.
  + Automation: ability to automate repetitive tasks using Bash and Ansible and/or Python (information gathering, operational checks, basic remediation, safe production scripts).
  + Intermediate English proficiency to read/write technical documentation, update stakeholders, and interact with vendors/manufacturers during support cases.
* Professional Requirements:
  + Autonomy (to achieve optimal results)
  + Adherence to world-class standards
  + Goal orientation
  + Openness to learning new technologies
  + Analytical thinking
  + Teamwork (to coordinate with development and product deployment teams)
  + Rapid adaptation to a highly dynamic environment
* Desired Technical Requirements:
  + Experience with OpenStack (operation, troubleshooting, or administration) and/or KVM
  + Understanding of fixed or mobile network operations models
  + Experience integrating and operating open-source projects in production environments
  + Intermediate Networking: BGP, EVPN-VXLAN, etc.
  + Certifications: Linux, OpenStack, Kubernetes Administrator (CKA or equivalent)
  + Courses in Ansible and/or Bash scripting
  + Knowledge of ITIL (Incident, Request, Problem, Change Management) and/or Scrum

#### **About Us**

**Whitestack** is a leading Latin American company specializing in cloud solutions and hyper-scalable digital infrastructure. We leverage open-source technologies and industry-leading standards to drive digital transformation across the region.

We are a **Great Place to Work**, where innovation, collaboration, and personal development are core to our identity.

**Why join Whitestack?**

* International exposure: Participate in global initiatives and travel to collaborate with teams across countries.
* Real work-life balance: Policies tailored to your lifestyle, enabling autonomous, purpose-driven work.
* Clear career growth: A robust career path in both leadership and technology.
* Health first: Private medical insurance for you and your family.
* Unlimited learning: Access to courses, books, materials, and certification reimbursement.
* Languages for the world: Language courses so your growth knows no borders.
* Technology in your hands: Equipment renewal every 3 years… and it’s yours at the end of the term!
* Recognition for effort: Performance and project success bonuses.
* Time for you: Minimum 15 vacation days, a birthday day off, and extra breaks before National Holidays, Christmas, and New Year.
* Connection and fun: Budget for recreational and team-building activities.
* Innovation culture: Your ideas matter. We encourage strategic participation from any role.

Source: Indeed