Vice President of AI Infrastructure & Engineering

Location: San Francisco County, CA (posted 1 day ago)

Job Description

Reporting to: CEO


Position Overview

*Build and Scale the Training Platform:*

Design and maintain a high-performance training architecture supporting GPU clusters on the order of tens of thousands of cards.

*Key Focus:* Establish a unified task scheduling and model management system (MLOps). All model training, checkpoint storage, and code version control must be conducted exclusively through this platform.


*Engineering-Led Data Governance:*

Develop end-to-end data processing pipelines covering cleansing, annotation, versioning, and secure storage.

*Key Focus:* Implement rigorous data access audit logs and data lineage tracking to ensure the security and compliance of core data assets.



*Developer Experience as a Control Mechanism:*

Maximize experimental efficiency through standardized toolchains, enabling researchers to focus solely on algorithm development without managing underlying configurations.

*Key Focus:* Codify “best practices” into reusable code templates.


*System Reliability and Disaster Recovery:*

Implement fault tolerance, automated model snapshot archiving, and continuity protocols to prevent loss of critical assets due to single points of failure (hardware or human).



Qualifications

*Background:* 10+ years in distributed systems, cloud computing, or high-performance computing (HPC), with prior experience in core infrastructure teams at leading firms such as Google, Meta, AWS, or NVIDIA.

*Mindset:* Exceptional engineering rigor with a focus on building stable, scalable systems rather than solely pursuing algorithmic innovation. Service-oriented attitude with a commitment to empowering top-tier scientists.

*Technical Skills:* Proficiency in orchestration systems such as Kubernetes, Ray, or Slurm; familiarity with PyTorch distributed training frameworks; deep understanding of data security and access control mechanisms.
