




We are seeking an experienced **Senior Data Engineer** with expert-level skills in PySpark and hands-on experience building ETL pipelines, data lake architectures, and data feed integrations on AWS to join our team. You will work with both structured and unstructured data, ingesting from multiple on-premises and enterprise data sources such as SAP, Intelex, SQL databases, and OSI PI into AWS. This role offers the opportunity to contribute to large-scale data solutions and collaborate with cross-functional teams in a dynamic environment.

**Responsibilities**

* Design, develop, and optimize ETL pipelines using PySpark and AWS Glue Jobs to process large volumes of structured and unstructured data (see the Glue sketch after this posting)
* Orchestrate data workflows with Apache Airflow, ensuring reliable scheduling, dependency management, and robust error handling (see the Airflow sketch below)
* Build and maintain data feeds from on-premises and enterprise systems into AWS data lake environments
* Integrate with enterprise data sources, including SAP for ERP and operational data, Intelex for environmental, health, safety, and quality data, SQL databases for relational data, and OSI PI for real-time industrial and process historian data
* Develop and manage API interactions to extract data from on-premises services into AWS (see the feed sketch below)
* Handle data extraction, transformation, and loading across various formats and protocols
* Support the design and maintenance of AWS data lake architectures using Amazon S3, AWS Glue, and Lake Formation
* Ensure data is cataloged, partitioned, and optimized for analytics and reporting
* Implement data quality checks, validation, and lineage tracking across all pipelines (see the quality-check sketch below)

**Requirements**

* Minimum 3 years of experience in data engineering roles
* Advanced proficiency in Python and PySpark for data processing and pipeline development
* Strong background in Extract, Transform, Load (ETL) processes
* Experience orchestrating workflows with Apache Airflow
* Proven track record of building production-grade data pipelines on AWS
* Hands-on experience with AWS Glue Jobs for ETL processing
* Familiarity with Amazon S3, data lake patterns, and data cataloging techniques
* Experience using AWS-native monitoring and operational tools
* Skilled in integrating with enterprise systems via APIs, JDBC, or native connectors, including SAP, Intelex, SQL databases, and OSI PI
* Ability to work with both structured and unstructured data formats
* Excellent documentation, communication, and collaboration skills
* English communication skills at B2+ level or higher, both written and spoken

**Nice to have**

* Familiarity with energy, oil & gas, or industrial data environments
* Understanding of Drilling and Completions data flows and terminology
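To make the day-to-day work concrete, here is a minimal sketch of the kind of PySpark-based AWS Glue job the ETL responsibility describes: read from the Glue Data Catalog, transform with plain PySpark, and write partitioned Parquet back to S3. The database, table, bucket path, and column names are hypothetical placeholders, not details from this posting.

```python
# Minimal AWS Glue job sketch: catalog read -> PySpark transform -> S3 write.
import sys

from awsglue.context import GlueContext
from awsglue.dynamicframe import DynamicFrame
from awsglue.job import Job
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext
from pyspark.sql import functions as F

args = getResolvedOptions(sys.argv, ["JOB_NAME"])
glue_context = GlueContext(SparkContext.getOrCreate())
job = Job(glue_context)
job.init(args["JOB_NAME"], args)

# Read raw records previously crawled into the catalog (names assumed).
raw = glue_context.create_dynamic_frame.from_catalog(
    database="raw_zone", table_name="sensor_readings"
)

# Transform with plain PySpark: drop malformed rows and derive a date
# column to partition by.
df = (
    raw.toDF()
    .where(F.col("reading_value").isNotNull())
    .withColumn("reading_date", F.to_date("reading_timestamp"))
)

# Write back as partitioned Parquet so downstream queries can prune by date.
glue_context.write_dynamic_frame.from_options(
    frame=DynamicFrame.fromDF(df, glue_context, "curated"),
    connection_type="s3",
    connection_options={
        "path": "s3://example-data-lake/curated/sensor_readings/",
        "partitionKeys": ["reading_date"],
    },
    format="parquet",
)
job.commit()
```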
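The orchestration responsibility maps to Airflow DAGs like the hedged sketch below, which shows scheduling, retry-based error handling, and a linear dependency between an ingest step and a Glue ETL step. The DAG id, job name, and schedule are hypothetical, and the `GlueJobOperator` assumes the `apache-airflow-providers-amazon` package is installed.

```python
# Minimal Airflow DAG sketch: schedule, retries, and task dependencies.
from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.python import PythonOperator
from airflow.providers.amazon.aws.operators.glue import GlueJobOperator


def pull_feed_from_source() -> None:
    """Placeholder for an on-premises extract (e.g. an API or JDBC pull)."""
    ...


with DAG(
    dag_id="sensor_readings_daily",       # hypothetical DAG id
    start_date=datetime(2024, 1, 1),
    schedule="@daily",                    # `schedule` requires Airflow 2.4+
    catchup=False,
    default_args={
        # Robust error handling: retry transient failures with a delay.
        "retries": 3,
        "retry_delay": timedelta(minutes=5),
    },
) as dag:
    ingest = PythonOperator(
        task_id="ingest_raw_feed",
        python_callable=pull_feed_from_source,
    )

    transform = GlueJobOperator(
        task_id="run_glue_etl",
        job_name="curate_sensor_readings",  # hypothetical Glue job name
        wait_for_completion=True,
    )

    # Dependency management: the Glue job runs only after ingestion succeeds.
    ingest >> transform
```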
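For the API-driven feeds from on-premises services, a typical pattern is to land raw payloads in the data lake's raw zone and transform them later. The following feed sketch uses `requests` and `boto3`; the endpoint URL, bucket, and key layout are hypothetical placeholders, and real feeds would add authentication and pagination.

```python
# Minimal API-to-S3 feed sketch: pull JSON and land it in the raw zone.
import json
from datetime import datetime, timezone

import boto3
import requests

ENDPOINT = "https://onprem.example.internal/api/v1/readings"  # assumed URL


def land_feed_to_s3() -> str:
    response = requests.get(ENDPOINT, timeout=30)
    response.raise_for_status()

    # Partition raw landings by ingestion date for easy reprocessing.
    now = datetime.now(timezone.utc)
    key = f"raw/readings/ingest_date={now:%Y-%m-%d}/readings_{now:%H%M%S}.json"

    boto3.client("s3").put_object(
        Bucket="example-data-lake",  # hypothetical bucket name
        Key=key,
        Body=json.dumps(response.json()).encode("utf-8"),
    )
    return key
```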
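Finally, the quality-check sketch below illustrates the kind of validation the last responsibility refers to: simple completeness and uniqueness checks run in PySpark before data is published to the curated zone. Column names and thresholds are hypothetical.

```python
# Minimal PySpark data-quality sketch: fail fast on bad batches.
from pyspark.sql import DataFrame, SparkSession
from pyspark.sql import functions as F


def run_quality_checks(df: DataFrame) -> None:
    total = df.count()
    # Completeness: key business columns must not be null.
    null_keys = df.where(F.col("asset_id").isNull()).count()
    # Uniqueness: expect one reading per asset per timestamp.
    duplicates = total - df.dropDuplicates(["asset_id", "reading_timestamp"]).count()

    if null_keys > 0:
        raise ValueError(f"{null_keys} rows missing asset_id")
    if duplicates / max(total, 1) > 0.01:  # tolerate at most 1% duplicates
        raise ValueError(f"{duplicates} duplicate readings exceed threshold")


if __name__ == "__main__":
    spark = SparkSession.builder.appName("quality-checks").getOrCreate()
    sample = spark.createDataFrame(
        [("pump-1", "2024-01-01T00:00:00", 42.0)],
        ["asset_id", "reading_timestamp", "reading_value"],
    )
    run_quality_checks(sample)
```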


