NVIDIA Deep Learning Institute: Building AI Agents with Multimodal Models
Learn how to build neural network agents that reason across multiple data types using advanced fusion techniques, OCR, and NVIDIA AI Blueprints for real-world applications like robotics and healthcare.
In this course, you will learn about:
- Different data types and how to make them neural-network-ready
- Model fusion, and the differences between early, late, and intermediate fusion
- PDF extraction using OCR
- The difference between modality orchestration and agent orchestration
- Customization of NVIDIA AI Blueprints with Video Search and Summarization (VSS)
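
To make the fusion objective above more concrete, here is a minimal sketch of how early, late, and intermediate fusion differ in where two modalities get combined. It is not taken from the course materials: the helper names (`linear`, `encoder`, `early_fusion`, `late_fusion`, `intermediate_fusion`), the toy linear models, and the equal-weight average used for late fusion are all illustrative assumptions.

```python
from typing import Callable, List

Vector = List[float]


def linear(weights: Vector, bias: float = 0.0) -> Callable[[Vector], float]:
    """Hypothetical toy model: a single linear scoring function."""
    def score(x: Vector) -> float:
        if len(x) != len(weights):
            raise ValueError("input and weight lengths differ")
        return sum(w * v for w, v in zip(weights, x)) + bias
    return score


def encoder(rows: List[Vector]) -> Callable[[Vector], Vector]:
    """Hypothetical toy encoder: one linear unit per output dimension."""
    units = [linear(row) for row in rows]
    return lambda x: [unit(x) for unit in units]


def early_fusion(image: Vector, text: Vector,
                 joint_model: Callable[[Vector], float]) -> float:
    # Combine raw features first, then run one shared model.
    return joint_model(image + text)


def late_fusion(image: Vector, text: Vector,
                image_model: Callable[[Vector], float],
                text_model: Callable[[Vector], float]) -> float:
    # Run one model per modality, then combine the predictions.
    # Equal-weight averaging is an assumption; other rules are possible.
    return 0.5 * (image_model(image) + text_model(text))


def intermediate_fusion(image: Vector, text: Vector,
                        image_encoder: Callable[[Vector], Vector],
                        text_encoder: Callable[[Vector], Vector],
                        head: Callable[[Vector], float]) -> float:
    # Encode each modality separately, combine the embeddings,
    # then run a shared head on the fused representation.
    return head(image_encoder(image) + text_encoder(text))


if __name__ == "__main__":
    image_feats = [1.0, 2.0]  # made-up image features
    text_feats = [3.0]        # made-up text features

    print(early_fusion(image_feats, text_feats, linear([0.5, 0.5, 1.0])))
    print(late_fusion(image_feats, text_feats,
                      linear([1.0, 1.0]), linear([2.0])))
    print(intermediate_fusion(image_feats, text_feats,
                              encoder([[1.0, 0.0]]), encoder([[1.0]]),
                              linear([1.0, 1.0])))
```

In practice the toy linear functions would be replaced by trained networks; the sketch only shows at which stage the modalities meet.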