Job Description
1) Top requirements:
a. Python (object-oriented programming experience)
b. PySpark
c. Scala
d. SQL
e. Airflow
f. Experience with big data
2) Are any of them flexible?
Required:
• Strong verbal and written communication skills to effectively articulate messages to internal and external teams
• Hands-on experience with object-oriented programming in Python
• Experience with Python, PySpark, Scala, and SQL
• Experience designing, building, optimizing, and troubleshooting end-to-end big data pipelines over structured (relational and file-based) and semi-structured data (see the sketch after this list)
• Experience building metadata-driven data processing frameworks
• Strong experience in SQL, Python, PySpark, Scala, and shell scripting
• Experience working with Airflow
• Experience working with big data
• Ability to take ownership of a request from initial requirements through design, development, and production deployment
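As a rough illustration of the pipeline work described above (not the team's actual codebase), here is a minimal PySpark sketch that joins a relational extract with semi-structured JSON events; all paths, table names, and columns are hypothetical, and in practice a job like this would typically be scheduled from an Airflow DAG.

```python
# Illustrative sketch only -- paths, tables, and columns are hypothetical.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("orders_pipeline").getOrCreate()

# Structured source: relational extract landed as Parquet.
orders = spark.read.parquet("/landing/orders/")

# Semi-structured source: JSON event files.
events = spark.read.json("/landing/order_events/")

# Basic cleanup and enrichment.
enriched = (
    orders
    .join(events, on="order_id", how="left")
    .withColumn("order_date", F.to_date("order_ts"))
    .filter(F.col("status").isNotNull())
)

# Write partitioned output for downstream consumers.
enriched.write.mode("overwrite").partitionBy("order_date").parquet("/curated/orders/")
```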
Nice to have:
• Azure
• Azure Event Hubs
• Apache Kafka
• Streaming data (see the streaming sketch after this list)
• Cosmos DB / NoSQL databases
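For the streaming items, here is a hedged sketch of reading from Azure Event Hubs through its Kafka-compatible endpoint with Spark Structured Streaming; the namespace, topic, paths, and connection string are placeholders, and the job needs the spark-sql-kafka connector on the classpath at submit time.

```python
# Hedged sketch: all endpoints and credentials below are placeholders.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("events_stream").getOrCreate()

# Event Hubs exposes a Kafka-compatible endpoint, so the Kafka source applies.
stream = (
    spark.readStream
    .format("kafka")
    .option("kafka.bootstrap.servers", "<namespace>.servicebus.windows.net:9093")
    .option("subscribe", "order-events")
    .option("kafka.security.protocol", "SASL_SSL")
    .option("kafka.sasl.mechanism", "PLAIN")
    .option(
        "kafka.sasl.jaas.config",
        'org.apache.kafka.common.security.plain.PlainLoginModule required '
        'username="$ConnectionString" password="<event-hubs-connection-string>";',
    )
    .load()
)

# Kafka delivers the payload as bytes; cast to string before landing it.
parsed = stream.select(F.col("value").cast("string").alias("body"))

query = (
    parsed.writeStream
    .format("parquet")
    .option("path", "/curated/order_events/")
    .option("checkpointLocation", "/checkpoints/order_events/")
    .start()
)
```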
Additional notes:
• Common data model: the current common data model is a dimensional data warehouse (see the example after these notes)
• Workflow: talk to business partners and gather requirements
• The Azure stack is used to complete this work
• Data Factory and Functions: not necessarily needed
• Azure DevOps: experience with something similar, such as GitHub, is acceptable
• All other Azure requirements are needed
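As a hedged illustration of working against a dimensional data warehouse from the Spark stack described above, the sketch below runs a typical star-schema aggregation with Spark SQL; fact_sales, dim_date, and every column name are invented for the example.

```python
# Hypothetical star-schema query: join a fact table to a dimension and aggregate.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("star_schema_demo").getOrCreate()

spark.read.parquet("/warehouse/fact_sales/").createOrReplaceTempView("fact_sales")
spark.read.parquet("/warehouse/dim_date/").createOrReplaceTempView("dim_date")

monthly_revenue = spark.sql("""
    SELECT d.year, d.month, SUM(f.sale_amount) AS revenue
    FROM fact_sales f
    JOIN dim_date d ON f.date_key = d.date_key
    GROUP BY d.year, d.month
    ORDER BY d.year, d.month
""")
monthly_revenue.show()
```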
Interview topics:
• Dimensional modeling
• SQL – coding
• Spark – questions on optimization (see the sketch below)
• PySpark – coding
• Python – coding
• Shell scripting – if it is on the resume, be ready to speak to it
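For the Spark optimization questions, one classic example worth being able to explain is replacing a shuffle-heavy sort-merge join with a broadcast hash join when one side is small; the sketch below uses hypothetical table names.

```python
# Classic Spark optimization: broadcast a small dimension table to avoid
# shuffling a large fact table during the join. Table names are hypothetical.
from pyspark.sql import SparkSession
from pyspark.sql.functions import broadcast

spark = SparkSession.builder.appName("join_optimization").getOrCreate()

fact_sales = spark.read.parquet("/curated/fact_sales/")  # large
dim_store = spark.read.parquet("/curated/dim_store/")    # small

# Without the hint, Spark may pick a sort-merge join and shuffle both sides;
# broadcasting ships the small table to every executor instead.
result = fact_sales.join(broadcast(dim_store), on="store_key", how="inner")

result.explain()  # verify the plan shows BroadcastHashJoin
```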