Machine Learning Engineer - Multi-Modality Foundation Model
Company: Zoox
Location: Boston
Posted on: April 1, 2026
Job Description:
The Perception team is pioneering the development of a
multi-modality foundation model to drive the next generation of
autonomous system intelligence. As a Multi-modality Foundation
Model Engineer, you will focus on building highly efficient,
production-ready multi-modality models. We are looking for experts
who have hands-on experience building multi-modality foundation
models—whether that involves AV-centric modalities (Vision, LiDAR,
Radar) or broader domains (Vision, Language, Text, Audio). You will
design, train, and deploy these models using Knowledge Distillation
(KD) to transfer capabilities from large-scale proprietary teacher
models to efficient student models capable of real-time, on-vehicle
inference.

In this role, you will:
- Build, pre-train, and evaluate large-scale multi-modality foundation
  models from the ground up, successfully aligning diverse data streams
  (e.g., Vision, LiDAR, Radar, Language, Audio).
- Define and execute the ML roadmap for deploying these multi-modality
  representations to the vehicle.
- Architect and implement Knowledge Distillation pipelines to compress
  large-capacity multi-modal teacher models into highly efficient,
  production-ready student models.
- Build high-quality training and evaluation datasets, applying advanced
  data-centric techniques to maximize cross-modal representation
  learning and student model convergence.
- Collaborate with downstream perception teams to integrate and validate
  the performance, robustness, and latency of your models in on-board
  production systems.

Qualifications:
- MS or PhD in Computer Science, Machine Learning, or a related technical
  field with demonstrated professional experience.
- Deep, proven expertise in building and training large-scale
  multi-modality foundation models (e.g., Vision-Language Models (VLMs),
  Vision-Audio-Text, or Vision-LiDAR-Radar architectures).
- Strong understanding of cross-modal alignment, multi-modal attention
  mechanisms, and large-scale pre-training techniques.
- Proven experience in Knowledge Distillation (KD), model compression,
  and training highly efficient student models for production
  environments.
- Proficiency in ML frameworks (e.g., PyTorch) and experience building
  large-scale ML training and evaluation pipelines.

Bonus Qualifications:
- Experience in the Autonomous Driving or robotics industry.
- Experience with model deployment, optimization, and hardware
  constraints (e.g., C++ for inference, TensorRT, quantization, pruning).
- Publications in top-tier conferences (CVPR, ICCV, NeurIPS, ICLR, ACL)
  related to multi-modality foundation models, cross-modal learning, or
  model compression.

Base Salary Range: $189,000 - $258,000 a year

There are
three major components to compensation for this position: salary,
Amazon Restricted Stock Units (RSUs), and Zoox Stock Appreciation
Rights. A sign-on bonus may be offered as part of the compensation
package. The listed range applies only to the base salary.
Compensation will vary based on geographic location and level.
Leveling, as well as positioning within a level, is determined by a
range of factors, including, but not limited to, a candidate's
relevant years of experience, domain knowledge, and interview
performance. The salary range listed in this posting is
representative of the range of levels Zoox is considering for this
position. Zoox also offers a comprehensive package of benefits,
including paid time off (e.g. sick leave, vacation, bereavement),
unpaid time off, Zoox Stock Appreciation Rights, Amazon RSUs,
health insurance, long-term care insurance, long-term and
short-term disability insurance, and life insurance.

About Zoox
Zoox is developing the first ground-up, fully autonomous vehicle
fleet and the supporting ecosystem required to bring this
technology to market. Sitting at the intersection of robotics,
machine learning, and design, Zoox aims to provide the next
generation of mobility-as-a-service in urban environments. We’re
looking for top talent that shares our passion and wants to be part
of a fast-moving and highly execution-oriented team.

Accommodations
If you need an accommodation to participate
in the application or interview process, please reach out to [email
protected] or your assigned recruiter.

A Final Note: We may use
artificial intelligence (AI) tools to support parts of the hiring
process, such as reviewing applications, analyzing resumes, or
assessing responses. These tools assist our recruitment team but do
not replace human judgment. Final hiring decisions are ultimately
made by humans. If you would like more information about how your
data is processed, please contact us.