This course shows you how combining modalities such as text, audio, video, and images lets AI systems handle tasks that no single modality can. Gain hands-on experience building visual question answering models, generating personalized images with diffusion models, designing end-to-end multimodal applications, and fine-tuning multimodal models for specific tasks. You will leave with the tools, knowledge, and confidence to design and deploy your own state-of-the-art multimodal AI systems.