AWS Glue Data Engineer / 2 months ago
Schaumburg, Illinois, United States
Job Description:
Job Details
Type: Full-time
Experience: 5-9 years
Functions: Consulting, Finance, Information Technology, Data Engineering & Analytics
Industries: Capital Markets, Investment Banking, Alternative Investments, Financial Services, Management Consulting, Information Technology and Services, Business Travel, Healthcare
Job Description
We are looking for an AWS Data Engineer with primary skills in Python and PySpark development who will design and build solutions for one of our Fortune 500 client programs. The program aims to build an Enterprise Data Lake on the AWS Cloud platform and to build data pipelines by developing several AWS Data Integration, Engineering & Analytics resources. There is no requirement for Machine Learning skills. This high-visibility, fast-paced key initiative will integrate data across internal and external sources, provide analytical insights, and integrate with the customer’s critical systems.
Key Responsibilities
Design, build and unit test applications on the Spark framework in Python.
Build Python and PySpark based applications that work with data in relational databases (e.g. Oracle), NoSQL databases (e.g. DynamoDB, MongoDB) and filesystems (e.g. S3, HDFS)
- Build AWS Lambda functions on the Python runtime leveraging the pandas, json, boto3, requests and avro libraries
- Build PySpark based data pipeline jobs on AWS Glue ETL, requiring in-depth knowledge of AWS Glue Dynamic Frames and Options
- Build Python based event-driven integration with Kafka Topics, leveraging Confluent Kafka libraries
Design and Build Generic, Reusable utility applications in Python
Build the Python programs across Glue ETL jobs and Lambda functions
Optimize performance for data access requirements by choosing the appropriate native Hadoop file formats (Avro, Parquet, ORC etc.) and compression codecs.
Design & build S3 buckets, tiers and lifecycle policies as the strategic storage layer for the "Data Lake"
Optimize performance of Spark applications in Hadoop using configurations around Spark Context, Spark-SQL, Data Frame, and Pair RDDs
Set up the Glue crawlers to catalog OracleDB tables, MongoDB collections and S3 objects
Configure Athena tables and SQL views based on Glue Cataloged datasets
Ability to monitor, troubleshoot and debug failures using AWS CloudWatch and Datadog
Ability to solve complex data-driven scenarios and triage defects and production issues
Participate in code release and production deployment.
Key Requirements
Bachelor’s Degree or equivalent in computer science or a related field, and a minimum of 5 years of experience
AWS certification in one of: Solutions Architect, Data Engineer or Data Analytics Specialty
Require 3+ years of hands-on experience in Python and PySpark programming
Require 2+ years of hands-on experience with AWS S3, Glue ETL & Catalog, Lambda Functions, Athena & Kafka
Require 1+ years of hands-on experience with Confluent Kafka integration
Require hands-on experience working with different file formats, e.g. avro, parquet, orc, json, xml
Require hands-on experience with the Python pandas, requests and boto3 modules
Require hands-on experience in writing complex SQL queries
Require hands-on experience using REST APIs
Require Financial Services industry experience
Preferred expertise in Snowflake, AWS Redshift & DynamoDB
Ability to use AWS services, predict application issues and design proactive resolutions
Required to take part in production rollouts of successfully implemented workflows and Collibra products
Require technical coordination skills to drive requirements and technical design with multiple teams
Require aptitude to help build skillsets within the organization
Education Level
Bachelor's Degree