Spark would be nice to have
AWS (EMR, Glue, S3)
Things we are looking for:
– Experienced with object-oriented and functional programming in Scala, Java, or C#, with at least a working knowledge of Scala and a desire to work with it
– Experienced with writing Spark ETLs (Spark/Scala, PySpark) and optimizing via the Spark UI
– Experienced with AWS infrastructure and infrastructure as code (Terraform), including AWS EMR, Lambda, Step Functions, IAM, SES, SNS, S3, Elastic Beanstalk, DocDB
– Experienced with implementing and maintaining continuous integration and delivery pipelines (GitLab CI/CD) to build, test, and deploy artifacts
– Believe in optimization, efficiency and testing
– Believe in incremental delivery and are able to turn complex business requirements into working software
We’re primarily looking for a Software Engineer who has a few years of experience as a Software Architect. We need someone with the experience and desire to architect solutions and possibly rework our current ones. It is also important to be able to fully build out these solutions and to handle maintenance and upgrades for our current applications and jobs. We have a large company’s budget but a small company’s freedoms: we decide how to run our team and build our solutions with no overhead or approvals, while avoiding unnecessary processes, hurdles, and meetings.
Our team’s primary responsibility is to work with the data science team to make their metrics and experimentation a reality with the data we process. The metrics and calculations run over ~20-30 TB of data daily and complete within 2-3 hours. We are also data engineers: we extract, transform, and load large amounts of data (~500 GB a day) from multiple sources into readable, usable tables. All of our production solutions and jobs run in AWS. Along with those responsibilities, we also create solutions to help our organization as a whole: if we see a need, we can design a solution to ease everyone’s day-to-day work. For example, we have an automated documentation application that reads through teams’ GitLab DDL repos for their current table documentation and posts it to our internal website, and we own an ingest encryption process that gives teams a single, configuration-driven solution to encrypt their PII. Our data is generally batch processed daily and hourly. It includes an event stream from our products and websites along with daily feeds of our company’s accounts, equipment, call data, and much more.