Job Title

PySpark Developer

  • Position: PySpark Developer
  • Salary: $63-66/HR
  • Location: Remote
  • Work Eligibility: USC, GC, GC-EAD, TN, H1, H4-EAD, OPT-EAD, CPT
  • Job ID: 06270

Job Description

SUMMARY:

5+ years of experience handling Data Warehousing and Business Intelligence projects in the banking, finance, credit card, and insurance industries.
Designed and developed real-time streaming pipelines sourcing data from IoT devices; defined strategies for data lakes, data flow, retention, aggregation, and summarization to optimize the performance of analytics products.
Extensive experience in data analytics.
Good knowledge of the Hadoop architecture and its ecosystem.
Extensive hands-on experience with Hadoop technologies for data storage, query writing, processing, and analysis.
Experience migrating on-premises ETL processes to the cloud.
Worked with various Hadoop file formats.
Experience with data warehousing applications, responsible for the extraction, transformation, and loading (ETL) of data from multiple sources into a data warehouse.
Experience optimizing Hive SQL queries, DataStage jobs, and Spark jobs.
Implemented frameworks for data quality analysis, data governance, data trending, data validation, and data profiling using Spark, Python, and DB2 (a data-quality sketch follows this list).
Experience creating technical documentation for functional requirements, impact analysis, technical design, and data flow diagrams with MS Visio.
Experience delivering highly complex projects using Agile and Scrum methodologies.
Quick learner, up to date with industry trends; excellent written and oral communication, analytical, and problem-solving skills; a well-organized team player able to work independently.
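
For illustration, the following is a minimal PySpark sketch of the kind of data-quality and profiling checks described above. The input path and the business key used for the duplicate check are assumptions, not details from this posting.

```python
# Minimal data-quality/profiling sketch in PySpark.
# The source path and the "transaction_id" business key are illustrative assumptions.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("data_quality_profile").getOrCreate()

# Hypothetical input dataset
df = spark.read.parquet("s3://example-bucket/transactions/")

# Null-count profile: number of missing values per column
null_counts = df.select(
    [F.sum(F.col(c).isNull().cast("int")).alias(c) for c in df.columns]
)
null_counts.show()

# Duplicate check on the assumed business key
duplicates = df.groupBy("transaction_id").count().filter(F.col("count") > 1)
print("duplicate keys:", duplicates.count())
```
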
PROFESSIONAL EXPERIENCE:

Design and develop ETL integration patterns using Python on Spark.
Develop a framework for converting existing DataStage mappings to PySpark (Python and Spark) jobs.
Create PySpark data frames to bring in data from DB2 (see the DB2-to-Snowflake sketch after this list).
Translate business requirements into maintainable software components and understand their impact (technical and business).
Provide guidance to the development team working on PySpark as the ETL platform.
Optimize PySpark jobs to run on a Kubernetes cluster for faster data processing (see the Kubernetes configuration sketch after this list).
Provide workload estimates to the client.
Migrate on-premises ETL processes to the AWS cloud and Snowflake.
Implement a CI/CD (Continuous Integration and Continuous Delivery) pipeline for code deployment.
Review components developed by team members.
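
As an illustration of the DB2-sourcing and Snowflake-migration items above, here is a minimal sketch that reads a DB2 table over JDBC and lands it in Snowflake via the Snowflake Spark connector. Hosts, databases, table names, and credentials are placeholders, and the IBM DB2 JDBC driver and Snowflake Spark connector are assumed to be available on the Spark classpath.

```python
# Minimal sketch: source a DB2 table over JDBC and write it to Snowflake.
# Connection details, table names, and credentials are illustrative placeholders.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("db2_to_snowflake").getOrCreate()

# Read from DB2 (requires the IBM DB2 JDBC driver jar on the classpath)
db2_df = (
    spark.read.format("jdbc")
    .option("url", "jdbc:db2://db2-host.example.com:50000/SAMPLEDB")
    .option("driver", "com.ibm.db2.jcc.DB2Driver")
    .option("dbtable", "ETL_SCHEMA.CUSTOMERS")
    .option("user", "etl_user")
    .option("password", "********")
    .load()
)

# Write to Snowflake (requires the Snowflake Spark connector)
snowflake_options = {
    "sfURL": "example_account.snowflakecomputing.com",
    "sfUser": "etl_user",
    "sfPassword": "********",
    "sfDatabase": "ANALYTICS",
    "sfSchema": "STAGING",
    "sfWarehouse": "ETL_WH",
}

(
    db2_df.write.format("snowflake")
    .options(**snowflake_options)
    .option("dbtable", "CUSTOMERS")
    .mode("overwrite")
    .save()
)
```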
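
Similarly, a sketch of the kind of session-level configuration involved in running PySpark jobs on a Kubernetes cluster; in practice this is often supplied through spark-submit in cluster mode instead. The API server URL, container image, namespace, and executor sizing below are assumptions.

```python
# Sketch of Kubernetes-targeted Spark configuration from a PySpark session.
# The API server URL, image, namespace, and executor sizing are illustrative.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("etl_on_k8s")
    .master("k8s://https://k8s-apiserver.example.com:6443")
    .config("spark.kubernetes.container.image", "example-registry/spark-py:3.5.0")
    .config("spark.kubernetes.namespace", "etl")
    .config("spark.executor.instances", "4")
    .config("spark.executor.memory", "8g")
    .config("spark.executor.cores", "4")
    .getOrCreate()
)
```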