- Infrastructure as-a-Service
- MLOps as-a-Service
- Data Management as-a-Service
- Synthetic Data Generation as-a-Service
Specifics:
- Design and implement scalable environments (both Cloud and On-Prem) with high throughput and utilization, running highly parallelized jobs across multiple clusters
- Design for optimized performance and “push-button” automation of AI Model Training and Inferencing test runs, including considerations for network switch & fabric, storage & caching, cluster orchestration, and run-time optimization
- Lead the customization of an MLOps platform for AV use cases
- Architect and implement data pipelines that efficiently ingest petabytes of fleet car data into a central data lake while ensuring proper indexing, data quality checks, and coarse labeling
- Leverage and tailor third-party synthetic data generation software for AV use cases
- Architect solutions that resolve pain points within the AV simulation space
- Work directly with key AV customers to understand their technology and deliver the best solutions
- Master's degree or equivalent experience in Computer Architecture, Computer Science, Electrical Engineering, or a related field
- 6+ years of proven experience in designing and developing production-level software that includes distributed backend systems, AI, and web application development
- Must have trail-blazing DNA (i.e., passion, skills, and the ability to bootstrap through learning and research), strong executive presence, and strong communication skills
- Familiarity with the Autonomous Vehicle Development lifecycle
- Experience in HPC/AI distributed computing environments leveraging Kubernetes orchestration and SLURM schedulers + optimization
- Well-versed in orchestrating and scheduling multiple parallel experiments (for example, AI model training) on pooled GPU resources in a Kubernetes cluster to maximize utilization and throughput while honoring priorities
- Extensive experience with Cloud, On-Prem, and HPC technologies
- Strong understanding of production-grade data architectures
- Possess advanced programming skills to build distributed storage and compute systems, backend services, microservices, and web technologies
- Knowledge of software-in-the-loop and hardware-in-the-loop testing
- Knowledge of TensorFlow, PyTorch, and other Deep Learning frameworks
- Ability to travel up to 50% on average, based on the work you do and the clients and industries/sectors you serve
- Limited immigration sponsorship may be available.