Large-Scale AI Engineering

Contact

For any questions not covered by the information on these webpages or on Moodle (for enrolled participants), please reach out to the teaching assistants (TAs), preferably first through the Moodle course Q&A forum and only then via email. They will involve the lecturers when necessary.

Contact details (LSAIE Fall 2025):

  • head TA Junling Wang
  • TA Philipp

Abstract

This course focuses on the engineering principles and practices required to develop and optimize large-scale AI systems. Students will gain hands-on experience with high-performance computing (HPC) infrastructures, emphasizing the deployment and scaling of AI models on advanced GPU clusters.

Learning Objectives

By the end of this course, students will be able to:

  1. Understand the architecture and components of large-scale AI systems.
  2. Apply HPC techniques to enhance the performance of AI model training and inference.
  3. Implement optimizations, such as model parallelization, in AI workflows.
  4. Collaborate effectively in teams to improve AI system throughput and scalability.

“This course is great because it shows the industry side of things and makes topics like dragonfly networks super interesting.”
Master's Student in Computer Science

“I really enjoyed the course, and I had the chance to use it in my research!”
PhD Student in Computer Science at ETH Zurich

Course Catalog

Learn more in the ETH Zurich course catalog entry.
