Maker
Type
Tags
Event Trigger
Action
Link
nodeType: multiModalLLMNode
Status: Available
Batch Trigger
About
The Multimodal Node returns text output from a selected large language model (LLM). It accepts both text and image inputs, which makes it useful for applications that process textual and visual data together, such as image captioning, where the text is generated from the content of an image.
What can I build?
- Develop applications that combine text and image processing for tasks such as image captioning.
- Create tools for automatic generation of descriptive content for visual data, enhancing accessibility.
- Build interactive applications where user actions on images are analyzed to generate contextual feedback.
- Design systems that maintain consistent tone and style across generated content based on previous interactions.
Available Functionality
Action
✅ Generates text output programmatically by submitting a prompt that includes multimodal content to selected LLMs.
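As a rough illustration of what "a prompt that includes multimodal content" might look like on the application side, the sketch below bundles a text prompt and one image into a single JSON-serializable payload. The function name `build_multimodal_payload` and the field names `prompt` and `image` are hypothetical, chosen for this example only; they are not the node's documented input schema. The image is encoded as a base64 data URL, which is one common way to carry image bytes in JSON.

```python
import base64
import json
import mimetypes


def build_multimodal_payload(prompt: str, image_bytes: bytes, filename: str) -> dict:
    """Bundle a text prompt and one image into a JSON-serializable dict.

    The field names ("prompt", "image") are hypothetical and only illustrate
    the idea of sending text and image content together.
    """
    # Infer the MIME type from the file extension, e.g. "image/png".
    mime_type, _ = mimetypes.guess_type(filename)
    if mime_type is None or not mime_type.startswith("image/"):
        raise ValueError(f"{filename!r} does not look like an image file")
    # Encode the raw bytes as base64 text so they can travel inside JSON.
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return {
        "prompt": prompt,
        "image": f"data:{mime_type};base64,{encoded}",
    }


payload = build_multimodal_payload(
    "Write a one-sentence caption for this image.",
    b"\x89PNG\r\n\x1a\n",  # placeholder bytes; a real app would read an image file
    "photo.png",
)
print(json.dumps(payload, indent=2))
```

Whether the node expects a data URL, a public image URL, or some other format should be checked against the node's configuration before reusing this shape.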
Setup Steps
- Drag / Select the Node as the Trigger node.
- Fill in the required parameters.
- Build the desired flow.
- Deploy the Project.
- Click Setup on the workflow editor to get the automatically generated instruction and add it in your application.
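The last step depends on the instruction generated under Setup, which this page does not reproduce. Purely as a sketch, and assuming the deployed flow is reachable as an HTTP endpoint that accepts a JSON body and a bearer token (an assumption, not something stated here), an application could prepare a request as follows. `FLOW_URL`, `API_KEY`, the helper `build_flow_request`, and the payload fields are all hypothetical placeholders to be replaced with the values from the generated instruction.

```python
import json
import urllib.request

# Hypothetical placeholders: copy the real endpoint and credentials from the
# instruction generated under Setup in the workflow editor.
FLOW_URL = "https://example.com/your-deployed-flow"
API_KEY = "YOUR_API_KEY"


def build_flow_request(payload: dict) -> urllib.request.Request:
    """Prepare (but do not send) a JSON POST request to the deployed flow."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        FLOW_URL,
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {API_KEY}",
        },
    )


request = build_flow_request(
    {"prompt": "Describe this image.", "image": "<image data>"}
)
print(request.get_method(), request.full_url)
# Sending is left to the application, e.g. urllib.request.urlopen(request).
```

The request is only constructed, not sent, so the sketch runs without network access or real credentials.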
Configuration
Action
Troubleshooting Common Issues
Built with this
Google Drive Sync
Slack Ask Bot