Large-Scale AI Engineering
Abstract
This course focuses on the engineering principles and practices required to develop and optimize large-scale AI systems. Students will gain hands-on experience with high-performance computing (HPC) infrastructure, with an emphasis on deploying and scaling AI models on advanced GPU clusters.
Learning Objectives
By the end of this course, students will be able to:
- Understand the architecture and components of large-scale AI systems.
- Apply HPC techniques to improve the performance of AI model training and inference.
- Implement optimizations, such as model parallelization, in AI workflows.
- Collaborate effectively in teams to improve AI system throughput and scalability.
Course Catalog
Learn more in the ETH Zurich course catalog entry.