Configuring Embeddings Operations
Embeddings operations include:
-
Embedding new store
-
Embedding add document to store
-
Embedding add folder to store
-
Embedding query from store
-
Embedding get info from store
Configure the Embedding New Store Operation
The Embedding new store operation creates a new in-memory embedding store and exports it to a physical file. The in-memory embedding store in MuleSoft AI Chain persists its data through file exports whenever its contents change.
To configure the Embedding new store operation:
-
Select the operation on the Anypoint Code Builder or Studio canvas.
-
In the General properties section, in Store Name, enter the full file path where the embedding store is saved.
If the file already exists, it is overwritten. Ensure the file path is accessible.
You can also use a DataWeave expression for this field, for example:
#[mule.home ++ "/apps/" ++ app.name ++ payload.storeName]
This is the XML configuration for this operation:
<ms-aichain:embedding-new-store
doc:name="Embedding new store"
doc:id="e3b52f5f-b765-4fad-9ecc-34755f386db4"
storeName='#[mule.home ++ "/apps/" ++ app.name ++ payload.storeName]'
/>
Output Configuration
This operation responds with a JSON payload that contains the status of the store.
This is an example response of the JSON payload:
{
"status": "created"
}
The operation also returns attributes that aren’t within the main JSON payload, which include information about the store, such as its name, for example:
{
"storeName": "knowledge-store"
}
Configure the Embedding Add Document to Store Operation
The Embedding add document to store operation adds a document to an embedding store and exports the store to a file. The document is ingested using in-memory embedding. The in-memory embedding store in MuleSoft AI Chain persists its data through file exports whenever its contents change.
To configure the Embedding add document to store operation:
-
Select the operation on the Anypoint Code Builder or Studio canvas.
-
In the General properties tab for the operation, enter these values:
-
Store Name
Contains the full file path for the embedding store to be saved. If the file already exists, it is overwritten. Ensure the file path is accessible.
-
Context Path
Contains the full file path for the document to be ingested into the embedding store.
-
Max Segment Size
Specifies the maximum size, in tokens or characters, of each segment that the document will be split into before being ingested into the embedding store.
-
Max Overlap Size
Defines the number of tokens or characters that will overlap between consecutive segments when splitting the document.
-
-
In the Context section for the operation, select File Type:
-
any
Automatically detects the file format and processes it accordingly. This option allows flexibility when the file type is unknown or varied.
-
text
Text files, such as JSON, XML, TXT, and CSV.
-
url
A single URL pointing to web content to ingest.
-
This is the XML configuration for this operation:
<ms-aichain:embedding-add-document-to-store
doc:name="Embedding add document to store"
doc:id="e8b73dbe-c897-4f77-85d0-aaf59476c408"
storeName='#["/Users/john.wick/Desktop/mac-demo/stores/" ++ payload.storeName]'
contextPath="#[payload.filePath]"
maxSegmentSizeInChars="#[payload.maxChunkSize]"
maxOverlapSizeInChars="#[payload.maxOverlapSize]"
fileType="#[payload.fileType]"
/>
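The Max Segment Size and Max Overlap Size properties above control how the document is chunked before ingestion. The connector performs this splitting internally; the following Python sketch only illustrates how a character-based segment size and overlap interact (the function name and parameters are illustrative, not connector API):

```python
def split_into_segments(text: str, max_segment_size: int, max_overlap_size: int) -> list[str]:
    """Split text into segments of at most max_segment_size characters,
    where each segment repeats the last max_overlap_size characters of
    the previous one, so context is preserved across segment boundaries."""
    if max_overlap_size >= max_segment_size:
        raise ValueError("overlap must be smaller than the segment size")
    step = max_segment_size - max_overlap_size  # how far each new segment advances
    segments = []
    for start in range(0, len(text), step):
        segments.append(text[start:start + max_segment_size])
        if start + max_segment_size >= len(text):
            break  # the rest of the text is already covered
    return segments

document = "abcdefghij" * 10  # a 100-character sample document
segments = split_into_segments(document, max_segment_size=40, max_overlap_size=10)
```

With these values, each segment shares its first 10 characters with the tail of the previous segment, which is why a larger overlap increases the number of segments (and therefore embeddings) produced from the same document.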
Output Configuration
This operation responds with a JSON payload containing the status of the store.
This is an example response of the JSON payload:
{
"status": "updated"
}
In addition to the main payload, file-related metadata attributes, such as the file path, store name, and file type, are returned separately, for example:
{
"filePath": "/Users/john.wick/Downloads/mulechain.txt",
"storeName": "/Users/john.wick/Downloads/knowledge-store",
"fileType": "text"
}
Configure the Embedding Add Folder to Store Operation
The Embedding add folder to store operation adds all files in a folder, including files in its subfolders, into an embedding store and exports the store to a file. The documents are ingested using in-memory embedding. The in-memory embedding store in MuleSoft AI Chain persists its data through file exports whenever its contents change.
To configure the Embedding add folder to store operation:
-
Select the operation on the Anypoint Code Builder or Studio canvas.
-
In the General properties tab for the operation, enter these values:
-
Store Name
Contains the full file path for the embedding store to be saved. If the file already exists, it is overwritten. Ensure the file path is accessible.
-
Context Path
Contains the full folder path to use for ingesting into the embedding store.
-
Max Segment Size
Specifies the maximum size, in tokens or characters, of each segment that the document is split into before the embedding store ingests it.
-
Max Overlap Size
Defines the number of tokens or characters that will overlap between consecutive segments when splitting the document.
-
-
In the Context section for the operation, select the File Type:
-
any
Automatically detects the file format and processes it accordingly. This option allows flexibility when the file type is unknown or varied.
-
text
Text files, such as JSON, XML, TXT, and CSV.
-
url
A single URL pointing to web content to ingest.
-
This is the XML configuration for this operation:
<ms-aichain:embedding-add-folder-to-store
doc:name="Embedding add folder to store"
doc:id="231a2afd-8cec-4a70-96c1-3ecef19d02db"
config-ref="MAC_AI_Llm_configuration"
storeName='#[mule.home ++ "/apps/" ++ app.name ++ "/knowledge-center.store"]'
folderPath="#[payload.folderPath]"
/>
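Because the operation ingests the folder recursively, every file under the configured path, including files in subfolders, contributes to the store, and the total appears later in the filesCount attribute. This Python sketch shows the idea with hypothetical helper names; it is not connector code:

```python
import tempfile
from pathlib import Path

def collect_files(folder_path: str) -> list[Path]:
    """Recursively gather every file under folder_path, including files
    in subfolders -- the set of documents the operation would ingest."""
    return [p for p in sorted(Path(folder_path).rglob("*")) if p.is_file()]

# Build a small folder tree and count its files, analogous to the
# filesCount attribute returned by the operation.
with tempfile.TemporaryDirectory() as root:
    (Path(root) / "sub").mkdir()
    (Path(root) / "a.txt").write_text("first document")
    (Path(root) / "sub" / "b.txt").write_text("second document")
    files_count = len(collect_files(root))
```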
Output Configuration
This operation returns a JSON payload containing the status of the store. In addition, folder-related metadata attributes, such as the folder path, file count, and store name, are provided separately from the main payload.
This is an example response of the JSON payload:
{
"status": "updated"
}
Along with the JSON payload, the operation returns attributes, which include information about the ingested folder, for example:
{
"folderPath": "/Users/john.wick/Downloads/files", (1)
"filesCount": 3, (2)
"storeName": "/Users/john.wick/Downloads/knowledge-store" (3)
}
1 | folderPath | Absolute path to the folder where the files are located
2 | filesCount | Total number of files in the specified folder
3 | storeName | Name or path of the knowledge store where the processed documents are stored
Configure the Embedding Query From Store Operation
The Embedding query from store operation retrieves information based on a plain text prompt using semantic search from an in-memory embedding store. This operation does not involve the use of a large language model (LLM). Instead, it directly searches the embedding store for relevant text segments based on the prompt. The embedding store is loaded into memory prior to retrieval.
To configure the Embedding query from store operation:
-
Select the operation on the Anypoint Code Builder or Studio canvas.
-
In the General properties tab for the operation, enter these values:
-
Store Name
Contains the full file path of the embedding store to query. Ensure the file path is accessible.
-
Question
The plain text prompt to send to the in-memory vector store. The prompt is converted into an embedding and used for semantic search to find similar text segments.
-
Max Results
Specifies the maximum number of results to be returned with the query.
-
Min Score
Defines the minimum score to be used to identify and return results.
-
Get Latest
If true, the store file is loaded each time before running this operation, which might slow down performance. It is best to use this flag only when building the knowledge store. After your app is deployed, set it to false for better performance.
-
This is the XML configuration for this operation:
<ms-aichain:embedding-query-from-store
doc:name="Embedding query from store"
doc:id="1ee361ea-e62a-4e0f-9c74-0363f8721052"
storeName='#[mule.home ++ "/apps/" ++ app.name ++ payload.storeName]'
question="#[payload.question]"
maxResults="#[payload.maxResults]"
minScore="#[payload.minScore]"
getLatest="true"
/>
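Max Results and Min Score work together: every stored segment is scored against the question embedding, results below the minimum score are discarded, and only the highest-scoring remainder is returned. This Python sketch illustrates that filtering with cosine similarity (a common choice for embedding stores; the data and function names here are illustrative):

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def query_store(store: list[tuple[str, list[float]]],
                question_embedding: list[float],
                max_results: int, min_score: float) -> list[tuple[str, float]]:
    """Score every stored segment against the question embedding, keep only
    matches at or above min_score, and return the top max_results of them."""
    scored = [(text, cosine_similarity(vec, question_embedding)) for text, vec in store]
    scored = [hit for hit in scored if hit[1] >= min_score]
    scored.sort(key=lambda hit: hit[1], reverse=True)
    return scored[:max_results]

# Tiny hand-made store with 2-dimensional embeddings for illustration.
store = [("CloudHub workers", [1.0, 0.0]),
         ("High availability", [0.9, 0.4]),
         ("Unrelated recipe", [0.0, 1.0])]
hits = query_store(store, question_embedding=[1.0, 0.1], max_results=3, min_score=0.7)
```

The third segment scores well below 0.7 and is filtered out, mirroring how a minScore of 0.7 kept only the closely related sources in the example response above.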
Output Configuration
This operation returns a JSON payload that contains the main response and a list of relevant sources retrieved from the knowledge store. Each source includes details such as the file path, text segment, and similarity score.
This is an example response of the JSON payload:
{
"response": "Networking Guide for more information on how to access an application in a specific CloudHub worker.",
"sources": [
{
"absoluteDirectoryPath": "/Users/john.wick/Documents/Downloads/patch 8",
"textSegment": "Networking Guide for more information on how to access an application in a specific CloudHub worker.",
"individualScore": 0.7865373025380039,
"file_name": "docs-runtime-manager__cloudhub_modules_ROOT_pages_cloudhub-fabric.adoc"
},
{
"absoluteDirectoryPath": "/Users/john.wick/Documents/Downloads/patch 8",
"textSegment": "= CloudHub High Availability Features",
"individualScore": 0.7845498154294348,
"file_name": "docs-runtime-manager__cloudhub_modules_ROOT_pages_cloudhub-fabric.adoc"
},
{
"absoluteDirectoryPath": "/Users/john.wick/Documents/Downloads/patch 8",
"textSegment": "[%header,cols=\"2*a\"]|===|VM Queues in On-Premises Applications |VM Queues in Applications deployed to CloudHub",
"individualScore": 0.757268680397361,
"file_name": "docs-runtime-manager__cloudhub_modules_ROOT_pages_cloudhub-fabric.adoc"
}
]
}
Additionally, query-related metadata is returned separately as attributes, for example:
{
"minScore": 0.7, (1)
"question": "Who is Amir", (2)
"maxResults": 3, (3)
"storeName": "/Users/john.wick/Downloads/embedding.store" (4)
}
1 | minScore | Minimum similarity score required for a result to be included in the response
2 | question | The original query or question submitted by the user
3 | maxResults | Maximum number of results that can be returned for the query
4 | storeName | The path or name of the knowledge store used to retrieve the data
Configure the Embedding Get Info from Store Operation
The Embedding get info from store operation retrieves information from an in-memory embedding store based on a plain text prompt. This operation uses a large language model (LLM) to enhance the response by interpreting the retrieved information and generating a more comprehensive or contextually enriched answer. The embedding store is loaded into memory prior to retrieval, and the LLM processes the results to refine the final response.
To configure the Embedding get info from store operation:
-
Select the operation on the Anypoint Code Builder or Studio canvas.
-
In the General properties tab for the operation, enter these values:
-
Data
The plain text prompt to send to the in-memory vector store. The prompt is converted into an embedding and used for semantic search to find similar text segments.
-
Store Name
Contains the full file path of the embedding store to load. Ensure the file path is accessible.
-
Get Latest
If true, the store file is loaded each time before running this operation, which might slow down performance. It is best to use this flag only when building the knowledge store. After your app is deployed, set it to false for better performance.
-
This is the XML configuration for this operation:
<ms-aichain:embedding-get-info-from-store
doc:name="Embedding get info from store"
doc:id="913ed660-0b4a-488a-8931-26c599e859b5"
config-ref="MuleSoft_AI_Chain_Config"
storeName='#["/Users/john.wick/Desktop/mac-demo/stores/" ++ payload.storeName]'
getLatest="true">
<ms-aichain:data><![CDATA[#[payload.prompt]]]></ms-aichain:data>
</ms-aichain:embedding-get-info-from-store>
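Unlike the query operation, this operation follows a retrieve-then-generate flow: the semantic search results are handed to the LLM, which interprets them into a final answer. This Python sketch shows the shape of that flow with stand-in functions (the retrieval and LLM callables here are hypothetical placeholders, not the connector's internals):

```python
def answer_with_llm(question, retrieve, llm):
    """Sketch of the get-info flow: semantic retrieval first, then the LLM
    interprets the retrieved segments to produce an enriched answer."""
    sources = retrieve(question)
    context = "\n".join(s["textSegment"] for s in sources)
    prompt = f"Answer using only this context:\n{context}\n\nQuestion: {question}"
    return {"response": llm(prompt), "sources": sources}

# Stand-ins for the store lookup and the LLM call (both hypothetical):
def fake_retrieve(question):
    return [{"textSegment": "= CloudHub High Availability Features...",
             "fileName": "cloudhub-fabric.adoc"}]

def fake_llm(prompt):
    return "CloudHub provides high availability features."

result = answer_with_llm("What is CloudHub HA?", fake_retrieve, fake_llm)
```

The returned structure mirrors the example response below: a generated response plus the sources that grounded it.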
Output Configuration
This operation returns a JSON payload that contains the main LLM response, along with a list of relevant sources retrieved from the knowledge store. Each source includes information such as the file path, file name, and a segment of relevant text. Additionally, token usage and query-related metadata are provided separately as attributes.
This is an example response of the JSON payload:
{
"response": "Runtime Manager is a feature within CloudHub that provides scalability, workload distribution, and added reliability to applications.",
"sources": [
{
"absoluteDirectoryPath": "/Users/john.wick/Documents/Downloads/patch 8",
"fileName": "docs-runtime-manager__cloudhub_modules_ROOT_pages_cloudhub-fabric.adoc",
"textSegment": "= CloudHub High Availability Features..."
}
]
}
Additionally, token usage and query-related metadata are returned separately as attributes, for example:
{
"tokenUsage": { (1)
"outputCount": 89,
"totalCount": 702,
"inputCount": 613
},
"additionalAttributes": { (2)
"getLatest": "true",
"question": "What is MuleChain",
"storeName": "/Users/john.wick/Downloads/knowledge-store"
}
}
1 | tokenUsage | Provides information on the token usage for the operation: inputCount, outputCount, and totalCount
2 | additionalAttributes | Includes metadata related to the query and store: getLatest, question, and storeName