Spark Python Developer

Company Overview

We are a Boston-based investment manager that provides global and international equity investment strategies and fund products to institutional investors such as pension plans, endowments, foundations, and registered/unregistered commingled investment funds. We are registered with the U.S. Securities and Exchange Commission (SEC) as an investment adviser, and with the U.S. Commodity Futures Trading Commission (CFTC) as a commodity trading advisor and commodity pool operator. Our firm manages more than $90 billion for more than 175 client relationships in North America, Europe, and Australasia. Our offices are located at 200 Clarendon Street, Boston, Massachusetts.


Job Description

This is a ground-floor opportunity for an experienced Spark Python Developer to join the first wave of our rapidly growing Data Engineering team. The individual will be responsible for expanding and optimizing our Spark data feeds. The ideal candidate is an experienced developer, data pipeline builder, and data wrangler. The Data Engineering team supplies reliable data that our business analysts and data scientists use to make trading decisions that fund endowments and pension funds.

Responsibilities

  • Implement and support data pipelines using PySpark.
  • Assemble large, complex data sets that meet functional / non-functional business requirements.
  • Build processes that support data transformation, data structures, metadata, and dependency and workload management.
  • Work with stakeholders to resolve data-related technical issues and support their data infrastructure needs.
  • Implement solutions for managing and governing data quality, data quality metrics, metadata, lineage, and data profiling.

Qualifications

  • Bachelor’s degree is required, with an Accounting or Finance concentration or equivalent work experience.
  • 2+ years of experience building applications using Python.
  • Working knowledge of SQL, including query authoring, and experience with relational databases.
  • Strong analytical skills related to working with unstructured datasets.
  • A track record of manipulating, processing, and extracting value from large, disconnected datasets.
  • 1+ years of experience building Spark applications is preferred.
  • Working knowledge of HDFS, YARN, Hive/Impala, Spark, Spark SQL, Spark Streaming, Flume, and Kafka.
  • Proficiency in Linux, including Bash scripting.
  • Experience coding in object-oriented languages such as Scala, Java, C#, or C++ is preferred.
  • Experience building streaming data applications is a plus.


Qualified candidates can apply by sending their resume to No telephone calls please.