Job Description

#2436

Machine Learning Engineer

US Player

Shanghai

Hong Kong

Singapore

Quant - Development

Tower Research Capital is a leading quantitative trading firm founded in 1998. Tower has built its business on a high-performance platform and independent trading teams. We have a track record of innovation spanning more than 25 years and a reputation for discovering unique market opportunities. Tower is home to some of the world's best systematic trading and engineering talent. We empower portfolio managers to build their teams and strategies independently while providing the economies of scale that come from a large, global organization.

Engineers thrive at Tower while developing world-class electronic trading infrastructure. Our engineers solve challenging problems in low-latency programming, FPGA technology, hardware acceleration, and machine learning. Our ongoing investment in top engineering talent and technology ensures our platform remains unmatched in functionality, scalability, and performance.

At Tower, every employee plays a role in our success. Our Business Support teams are essential to building and maintaining the platform that powers everything we do, combining market access, data, compute, and research infrastructure with risk management, compliance, and a full suite of business services. These teams enable our trading and engineering teams to perform at their best. At Tower, employees will find a stimulating, results-oriented environment where highly intelligent and motivated colleagues inspire each other to reach their greatest potential.
Responsibilities

- Architecting and developing the next generation of Tower's machine learning research platform, with an emphasis on scalability, reliability, observability, and reproducibility
- Building infrastructure that enables large-scale experimentation, model training, and simulation across on-premises HPC and multi-cloud environments
- Partnering closely with quantitative researchers to understand evolving research workflows and translate them into robust platform capabilities
- Designing and optimizing distributed training pipelines for high-throughput, GPU-accelerated workloads
- Improving experiment management, model versioning, artifact tracking, and data lineage to ensure transparent and reproducible research
- Developing tools and frameworks that streamline feature engineering, dataset generation, and large-scale backtesting
- Leading initiatives to improve compute efficiency, resource scheduling, and workload isolation across heterogeneous environments
- Enhancing platform observability, including metrics, logging, tracing, and debugging capabilities tailored to ML workloads
- Supporting rapid iteration by implementing features and fixes on tight timelines while maintaining high engineering standards
- Contributing to long-term architectural decisions that enable the platform to scale with increasing data volumes and model complexity

Qualifications

- 2+ years of experience designing and building large-scale distributed systems, ideally in support of research or data-intensive workloads
- Strong programming experience in Python, with a focus on writing clean, maintainable, and high-performance code
- Experience developing and operating applications on Linux-based HPC clusters and/or cloud platforms
- Solid understanding of distributed computing concepts, parallel processing, and resource management
- Experience with GPU-based workloads and familiarity with modern ML frameworks (e.g., PyTorch, TensorFlow, JAX)
- Experience optimizing data pipelines and handling large-scale structured and unstructured datasets
- Strong troubleshooting skills, with the ability to debug complex, cross-layer system issues
- Ability to work independently in a fast-paced, research-driven environment
- Strong communication skills and experience collaborating directly with researchers or data scientists

Preferred Attributes

- Experience building internal ML platforms or research tooling at scale
- Familiarity with experiment tracking systems, workflow orchestration frameworks, and model lifecycle management
- Experience with containerization and orchestration technologies (e.g., Docker, Kubernetes)
- Exposure to quantitative finance, simulation systems, or other latency- and performance-sensitive domains

Contact Our Consultant

Renee Yang

Surrienta Consulting Ltd. © 2024