Autonomous Vehicle Infrastructure Systems Lead, Manager - Managed AI

AI & Analytics | Strategy & Analytics
Same job available in 26 locations

Atlanta, Georgia, United States

Austin, Texas, United States

Boston, Massachusetts, United States

Charlotte, North Carolina, United States

Chicago, Illinois, United States

Cincinnati, Ohio, United States

Cleveland, Ohio, United States

Costa Mesa, California, United States

Dallas, Texas, United States

Detroit, Michigan, United States

Houston, Texas, United States

Kansas City, Missouri, United States

Los Angeles, California, United States

McLean, Virginia, United States

Miami, Florida, United States

Minneapolis, Minnesota, United States

Nashville, Tennessee, United States

Parsippany, New Jersey, United States

Philadelphia, Pennsylvania, United States

Pittsburgh, Pennsylvania, United States

Sacramento, California, United States

San Diego, California, United States

San Jose, California, United States

Seattle, Washington, United States

St. Louis, Missouri, United States

Tampa, Florida, United States

Position Summary

The Team
The Deloitte Connected and Autonomous Vehicle (CAV) team is catalyzing and shaping the Autonomous Vehicle (AV) market through a suite of turnkey, as-a-service solutions that deliver improved performance and lower total cost of ownership. These solutions will empower Automotive customers to realize their autonomy ambitions as efficiently as possible.

High Level Role
We are looking for a seasoned, “hands-on” HPC/AI infrastructure systems leader who will drive the scope, detailed design, and deployment of AV infrastructure across on-prem, cloud, and hybrid environments. The key success measure of this prototype will be the delivery of Deloitte’s offering in POD configurations as a service for our customers with guaranteed SLAs and TCO targets.

Specifics:
  • Establish the detailed specification of the DGX A100 that reflects a representative customer’s planning, deployment, and ongoing operations optimization requirements for TCO, throughput, scalability, and flexibility across varied workloads
  • Set up the DGX/Super POD reference environment including DGX A100 compute nodes, fabrics (storage/compute), management networks & software (DeepOps), key system software for optimizing GPU communications I/O and application performance, and user run-time tools for SLURM and Kubernetes containers
  • Design and document the most efficient setup to meet success metrics (TCO, performance, scale). Specific areas of focus:
    • Network switch & fabric considerations for non-blocking, scalable bandwidth needs for best performance with varying dataset sizes & locations
    • Storage and caching hierarchy implementations based on training vs. inferencing workloads. Establish storage management guidelines for allocating RAM/NVMe (internal storage) and external high-speed storage (DDN, NetApp, etc.) to optimize the performance and cost of running varying datasets and workloads. Establish rules for when to enable the GPUDirect Storage (GDS) feature for workloads that need lower latency and faster I/O.
    • Management servers: infrastructure design and setup to enable user logins, provisioning (OS images and other internal infrastructure services for the pod), workload management (resource management and scheduling/orchestration), container management, and system monitoring/logging
    • Operations/run-time optimization of A100 compute resources (MIG partitions) for varying workloads to maximize the utilization and throughput of jobs being scheduled in a given node cluster
  • Validate the commercial model with the MVP operational run/playbook

Minimum Qualifications:
  • Bachelor's degree or equivalent experience in Computer Architecture, Computer Science, Electrical Engineering, or a related field. Advanced degree preferred
  • 6+ years of proven experience in the design, deployment, and operation of production-grade HPC environments leveraging both SLURM and Kubernetes clusters
  • Deep understanding of scale-out compute, networking, and external storage architectures for optimizing the performance and acceleration of AI/HPC workloads
  • Proven experience deploying, upgrading, migrating, and driving user adoption of sophisticated enterprise scale systems.
  • Prior software and solutions development background and proven ability to demonstrate complex new technologies
  • Programming skills to build distributed storage and compute systems, backend services, microservices, and web technologies
  • Well versed in agile methodology
  • Comfortable in a customer-focused, fast-paced environment
  • Ability to travel up to 50% on average, based on the work you do and the clients and industries/sectors you serve
  • Limited immigration sponsorship may be available

AI&DE23

Our people and culture

Our diverse, equitable, and inclusive culture empowers our people to be who they are, contribute their unique perspectives, and make a difference individually and collectively. It enables us to leverage different ideas and perspectives, and bring more creativity and innovation to help solve our clients' most complex challenges. This makes Deloitte one of the most rewarding places to work. Learn more about our inclusive culture.

Professional development

From entry-level employees to senior leaders, we believe there’s always room to learn. We offer opportunities to build new skills, take on leadership roles, and connect and grow through mentorship. From on-the-job learning experiences to formal development programs, our professionals have a variety of ways to continue to grow throughout their careers.


As used in this posting, "Deloitte" means Deloitte Consulting LLP, a subsidiary of Deloitte LLP. Please see www.deloitte.com/us/about for a detailed description of the legal structure of Deloitte LLP and its subsidiaries.

All qualified applicants will receive consideration for employment without regard to race, color, religion, sex, sexual orientation, gender identity, national origin, age, disability or protected veteran status, or any other legally protected basis, in accordance with applicable law.

Requisition code: 103234

SCAM ALERT

Caution against fraudulent job offers!

We have been informed of instances where jobseekers are led to believe in fictitious job opportunities with Deloitte US (“Deloitte”). In one or more such cases, false promises of actual or potential selection, or of initiation or completion of the recruitment formalities, appear to have been or are being made. Some jobseekers appear to have been asked to pay money to specified bank accounts of individuals or entities as a condition of their selection for a ‘job’ with Deloitte. These individuals or entities are in no way connected with Deloitte and do not represent or otherwise act on behalf of Deloitte.

We would like to clarify that:

  • At Deloitte, ethics and integrity are fundamental and not negotiable.
  • We are against corruption and neither offer bribes nor accept them, nor induce or permit any other party to make or receive bribes on our behalf.
  • We have not authorized any party or person to collect any money from jobseekers in any form whatsoever for promises of getting jobs in Deloitte.
  • We consider candidates on merit and provide an equal opportunity to eligible applicants.
  • No one other than designated Deloitte personnel (e.g., a Deloitte recruiter or Deloitte hiring partner) is permitted to extend any job offer from Deloitte.

Anyone who at any time has made or makes any payment to any party in exchange for promises of a job or selection for a job with Deloitte or any matter related to this (including those for ‘registration’, ‘verification’ or ‘security deposit’), or who otherwise engages with any such person who has made or makes fraudulent promises or offers, does so (or has done so) entirely at their own risk. Deloitte takes no responsibility or liability for any such unauthorized or fraudulent actions or engagements. We encourage jobseekers to exercise caution.