Configuring Vision Operations

Configure the [Image] Read by (Url or Base64) operation.

Configure the Image Read by (Url or Base64) Operation

The [Image] Read by (Url or Base64) operation reads and interprets an image based on a prompt.

Apply the [Image] Read by (Url or Base64) operation in various scenarios, such as:

  • Image Analysis

    Analyze images in business reports, presentations, or customer service scenarios.

  • Content Generation

    Describe images for blog posts, articles, or social media.

  • Visual Insights

    Extract insights from images in research or design projects.

To configure the [Image] Read by (Url or Base64) operation:

  1. Select the operation on the Anypoint Code Builder or Studio canvas.

  2. In the General properties tab for the operation, enter these values:

    • Prompt

      Enter the prompt for the operation.

    • Image

      Enter the URL or Base64-encoded string of the image file to read.

  3. In the Additional Request Attributes (optional) field, you can pass additional request attributes in the request payload to generate more relevant and precise LLM outputs, as shown in the following example.
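For example, with an OpenAI-compatible provider, the field could contain a JSON object like the one below. The attribute names shown (temperature and max_tokens) are illustrative OpenAI-style parameters, not an exhaustive or universal list; the supported names depend on the LLM provider:

{
    "temperature": 0.5,
    "max_tokens": 500
}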

This is the XML for this operation:

<ms-inference:read-image
  doc:id="dfbd1a61-6e98-4b5b-b77a-bfe031e70d45"
  config-ref="OpenAIConfig"
  doc:name="Read image">
    <ms-inference:prompt>
      <![CDATA[Describe what you see in this image in detail]]>
    </ms-inference:prompt>
    <ms-inference:image-url>
      <![CDATA[https://example.com/image.png]]>
    </ms-inference:image-url>
</ms-inference:read-image>
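This example reads the image from a URL. Because the Image field also accepts a Base64 string, you can instead read a local file, encode it, and pass the result as an expression. The following is a minimal sketch that assumes a configured File connector, a placeholder file path, and that the Image field evaluates DataWeave expressions:

<file:read
  doc:name="Read image file"
  config-ref="File_Config"
  path="images/photo.png"/>
<ee:transform doc:name="Encode to Base64">
  <ee:message>
    <ee:set-payload><![CDATA[%dw 2.0
import toBase64 from dw::core::Binaries
output application/java
---
toBase64(payload)]]></ee:set-payload>
  </ee:message>
</ee:transform>
<ms-inference:read-image
  config-ref="OpenAIConfig"
  doc:name="Read image">
    <ms-inference:prompt>
      <![CDATA[Describe what you see in this image in detail]]>
    </ms-inference:prompt>
    <ms-inference:image-url>
      <![CDATA[#[payload]]]>
    </ms-inference:image-url>
</ms-inference:read-image>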

Output Configuration

This operation responds with a JSON payload containing the main LLM response. This is an example response:

{
    "payload": {
        "response": "The image depicts the Eiffel Tower in Paris during a snowy day. The tower is partially covered in snow, and the surrounding trees and ground are also blanketed in snow. There is a pathway leading towards the Eiffel Tower, with a lamppost and some fencing along the sides. The overall scene has a serene and picturesque winter atmosphere."
    }
}
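To use the description downstream, reference the response field of the payload. As a minimal sketch, assuming the payload is returned with a JSON MIME type so that DataWeave can select into it, a Logger component can print just the generated text:

<logger
  level="INFO"
  doc:name="Log image description"
  message="#[payload.response]"/>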

The operation also returns attributes that aren’t part of the main JSON payload. These attributes include information about token usage, for example:

{
  "attributes": {
      "tokenUsage": {
          "inputCount": 267,
          "outputCount": 68,
          "totalCount": 335
      },
      "additionalAttributes": {
          "finish_reason": "stop",
          "model": "gpt-4o-mini",
          "id": "604ae573-8265-4dc0-b06e-457422f2fbd8"
      }
  }
}

  • tokenUsage: Token usage metadata returned as attributes

    • inputCount: Number of tokens used to process the input

    • outputCount: Number of tokens used to generate the output

    • totalCount: Total number of tokens used for input and output

  • additionalAttributes: Additional metadata from the LLM provider

    • finish_reason: The finish reason for the LLM response

    • model: The ID of the model used

    • id: The ID of the request
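For example, to monitor consumption, you can read the token counts from the operation’s attributes. This sketch assumes the attributes are read on the Mule message immediately after the operation, using the tokenUsage structure shown above:

<logger
  level="INFO"
  doc:name="Log token usage"
  message="#['Total tokens used: ' ++ (attributes.tokenUsage.totalCount as String)]"/>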

For Gemini Inference, additional parameters must be included within the GenerationConfig property of the request payload.
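For example, an Additional Request Attributes value for Gemini might nest sampling parameters as follows. This is an illustrative sketch; temperature and maxOutputTokens are standard Gemini GenerationConfig fields, but confirm the exact property names your provider expects:

{
    "generationConfig": {
        "temperature": 0.5,
        "maxOutputTokens": 500
    }
}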