Daniel Felix Ritchie School of Engineering and Computer Science
Daniels College of Business

Multimodal AI Essentials: Merging Text, Image, and Audio for Next-Generation AI Applications

This course shows you how combining modalities such as text, audio, video, and images extends what AI systems can do. Gain hands-on experience building visual question-answering models, generating personalized images with diffusion models, designing end-to-end multimodal applications, and fine-tuning multimodal models for specific tasks. You will come away with the tools, knowledge, and confidence to design and deploy your own state-of-the-art multimodal AI systems.
