
Configuring Embeddings Operations

Configure the Embedding New Store Operation

The Embedding new store operation creates a new in-memory embedding and exports it to a physical file. The in-memory embedding in MuleSoft AI Chain persists its data through file exports upon any changes.

To configure the Embedding new store operation:

  1. Select the operation on the Anypoint Code Builder or Studio canvas.

  2. In the General properties section, enter the full file path where the embedding store is saved in Store Name.

    If the file already exists, it is overwritten. Ensure the file path is accessible.

    You can also use a DataWeave expression for this field, for example:

    mule.home ++ "/apps/" ++ app.name ++ payload.storeName

This is the XML configuration for this operation:

<ms-aichain:embedding-new-store
  doc:name="Embedding new store"
  doc:id="e3b52f5f-b765-4fad-9ecc-34755f386db4"
  storeName='#[mule.home ++ "/apps/" ++ app.name ++ payload.storeName]'
/>
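
The expression above joins its parts by plain string concatenation, so no path separator is added between app.name and payload.storeName. The following Python sketch mirrors that behavior to show why the store name would need to begin with a slash. The function name and the sample values are hypothetical and are not part of the connector.

```python
def store_path(mule_home: str, app_name: str, store_name: str) -> str:
    """Mirror mule.home ++ "/apps/" ++ app.name ++ payload.storeName.

    Plain concatenation: no separator is inserted between the parts.
    """
    return mule_home + "/apps/" + app_name + store_name


# Hypothetical values standing in for the Mule runtime properties.
with_slash = store_path("/opt/mule", "demo-app", "/knowledge.store")
without_slash = store_path("/opt/mule", "demo-app", "knowledge.store")

print(with_slash)     # /opt/mule/apps/demo-app/knowledge.store
print(without_slash)  # /opt/mule/apps/demo-appknowledge.store
```

In the second call the store file name runs into the application name, which is probably not the intended location.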

Output Configuration

This operation responds with a JSON payload that contains the status of the store.

This is an example response of the JSON payload:

{
    "status": "created"
}

The operation also returns attributes that aren’t within the main JSON payload, such as the store name, for example:

{
  "storeName": "knowledge-store"
}

Configure the Embedding Add Document to Store Operation

The Embedding add document to store operation adds a document to an embedding store and exports the store to a file. The document is ingested into an in-memory embedding store. The in-memory embedding in MuleSoft AI Chain persists its data through file exports upon any changes.

To configure the Embedding add document to store operation:

  1. Select the operation on the Anypoint Code Builder or Studio canvas.

  2. In the General properties tab for the operation, enter these values:

    • Store Name

      Contains the full file path for the embedding store to be saved. If the file already exists, it is overwritten. Ensure the file path is accessible.

    • Context Path

      Contains the full file path for the document to be ingested into the embedding store.

    • Max Segment Size

      Specifies the maximum size, in tokens or characters, of each segment that the document will be split into before being ingested into the embedding store.

    • Max Overlap Size

      Defines the number of tokens or characters that will overlap between consecutive segments when splitting the document.

  3. In the Context section for the operation, select File Type:

    • any

      Automatically detects the file format and processes it accordingly. This option allows flexibility when the file type is unknown or varied.

    • text

      Text files, such as JSON, XML, TXT, and CSV.

    • url

      A single URL pointing to web content to ingest.

This is the XML configuration for this operation:

<ms-aichain:embedding-add-document-to-store
  doc:name="Embedding add document to store"
  doc:id="e8b73dbe-c897-4f77-85d0-aaf59476c408"
  storeName='#["/Users/john.wick/Desktop/mac-demo/stores/" ++ payload.storeName]'
  contextPath="#[payload.filePath]"
  maxSegmentSizeInChars="#[payload.maxChunkSize]"
  maxOverlapSizeInChars="#[payload.maxOverlapSize]"
  fileType="#[payload.fileType]"
/>
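
To illustrate how Max Segment Size and Max Overlap Size interact, the following Python sketch splits a string into fixed-width segments measured in characters, where each segment repeats the tail of the previous one. It is a simplified illustration only: the function name is hypothetical, and the connector's actual splitter may behave differently, for example by respecting sentence or paragraph boundaries.

```python
def split_with_overlap(text: str, max_segment_size: int, max_overlap_size: int) -> list[str]:
    """Split text into segments of at most max_segment_size characters.

    Consecutive segments share max_overlap_size characters.
    """
    if max_segment_size <= 0:
        raise ValueError("max_segment_size must be positive")
    if not 0 <= max_overlap_size < max_segment_size:
        raise ValueError("max_overlap_size must be >= 0 and smaller than max_segment_size")

    step = max_segment_size - max_overlap_size  # how far each new segment advances
    segments = []
    start = 0
    while start < len(text):
        segments.append(text[start:start + max_segment_size])
        if start + max_segment_size >= len(text):
            break  # the last segment already reaches the end of the text
        start += step
    return segments


segments = split_with_overlap("abcdefghij", max_segment_size=4, max_overlap_size=1)
print(segments)  # ['abcd', 'defg', 'ghij']
```

A larger overlap keeps more shared context between neighboring segments at the cost of storing more segments for the same document.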

Output Configuration

This operation responds with a JSON payload containing the status of the store.

This is an example response of the JSON payload:

{
    "status": "updated"
}

In addition to the main payload, file-related metadata attributes, such as the file path, store name, and file type, are returned separately, for example:

{
  "filePath": "/Users/john.wick/Downloads/mulechain.txt",
  "storeName": "/Users/john.wick/Downloads/knowledge-store",
  "fileType": "text"
}

Configure the Embedding Add Folder to Store Operation

The Embedding add folder to store operation adds all files in a folder, including files in its subfolders, to an embedding store and exports the store to a file. The documents are ingested into an in-memory embedding store. The in-memory embedding in MuleSoft AI Chain persists its data through file exports upon any changes.

To configure the Embedding add folder to store operation:

  1. Select the operation on the Anypoint Code Builder or Studio canvas.

  2. In the General properties tab for the operation, enter these values:

    • Store Name

      Contains the full file path for the embedding store to be saved. If the file already exists, it is overwritten. Ensure the file path is accessible.

    • Context Path

      Contains the full folder path to use for ingesting into the embedding store.

    • Max Segment Size

      Specifies the maximum size, in tokens or characters, of each segment that the document is split into before the embedding store ingests it.

    • Max Overlap Size

      Defines the number of tokens or characters that will overlap between consecutive segments when splitting the document.

  3. In the Context section for the operation, select the File Type:

    • any

      Automatically detects the file format and processes it accordingly. This option allows flexibility when the file type is unknown or varied.

    • text

      Text files, such as JSON, XML, TXT, and CSV.

    • url

      A single URL pointing to web content to ingest.

This is the XML configuration for this operation:

<ms-aichain:embedding-add-folder-to-store
  doc:name="Embedding add folder to store"
  doc:id="231a2afd-8cec-4a70-96c1-3ecef19d02db"
  config-ref="MAC_AI_Llm_configuration"
  storeName='#[mule.home ++ "/apps/" ++ app.name ++ "/knowledge-center.store"]'
  folderPath="#[payload.folderPath]"
/>

Output Configuration

This operation returns a JSON payload containing the status of the store. In addition, folder-related metadata attributes, such as the folder path, file count, and store name, are provided separately from the main payload.

This is an example response of the JSON payload:

{
    "status": "updated"
}

Along with the JSON payload, the operation returns attributes, which include information about the ingested folder, for example:

{
  "folderPath": "/Users/john.wick/Downloads/files", (1)
  "filesCount": 3, (2)
  "storeName": "/Users/john.wick/Downloads/knowledge-store" (3)
}
1 folderPath: Absolute path to the folder where the files are located
2 filesCount: Total number of files in the specified folder
3 storeName: Name or path of the knowledge store where the processed documents are stored
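
As a rough illustration of how a folder with subfolders maps to these attributes, the following Python sketch walks a temporary folder recursively and builds a dictionary with the same keys. The helper names are hypothetical, and whether the connector counts subfolder files in filesCount exactly this way is an assumption.

```python
import tempfile
from pathlib import Path


def list_files(folder_path: str) -> list[Path]:
    """Return every regular file under folder_path, including subfolders."""
    return sorted(p for p in Path(folder_path).rglob("*") if p.is_file())


def build_attributes(folder_path: str, store_name: str) -> dict:
    """Build a dictionary shaped like the attributes shown above."""
    return {
        "folderPath": folder_path,
        "filesCount": len(list_files(folder_path)),
        "storeName": store_name,
    }


# Create a small demo folder: one file at the top level, two in a subfolder.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "a.txt").write_text("one")
    (Path(tmp) / "sub").mkdir()
    (Path(tmp) / "sub" / "b.txt").write_text("two")
    (Path(tmp) / "sub" / "c.txt").write_text("three")
    demo_attributes = build_attributes(tmp, "knowledge-store")

print(demo_attributes["filesCount"])  # 3
```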

Configure the Embedding Query From Store Operation

The Embedding query from store operation retrieves information based on a plain text prompt using semantic search from an in-memory embedding store. This operation does not involve the use of a large language model (LLM). Instead, it directly searches the embedding store for relevant text segments based on the prompt. The embedding store is loaded into memory prior to retrieval.

To configure the Embedding query from store operation:

  1. Select the operation on the Anypoint Code Builder or Studio canvas.

  2. In the General properties tab for the operation, enter these values:

    • Store Name

      Contains the full file path of the embedding store to query. Ensure the file path is accessible.

    • Question

      The plain text prompt to send to the in-memory vector store. The prompt is converted into an embedding and used for semantic search to find similar text segments.

    • Max Results

      Specifies the maximum number of results to be returned with the query.

    • Min Score

      Defines the minimum similarity score that a result must reach to be returned.

    • Get Latest

      If true, the store file is loaded each time before running this operation, which might slow down performance. It is best to use this flag only when building the knowledge store. After your app is deployed, set it to false for better performance.

This is the XML configuration for this operation:

<ms-aichain:embedding-query-from-store
  doc:name="Embedding query from store"
  doc:id="1ee361ea-e62a-4e0f-9c74-0363f8721052"
  storeName='#[mule.home ++ "/apps/" ++ app.name ++ payload.storeName]'
  question="#[payload.question]"
  maxResults="#[payload.maxResults]"
  minScore="#[payload.minScore]"
  getLatest="true"
/>
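
To show how Max Results and Min Score shape the result list, the following Python sketch scores stored segments against a question vector, drops the ones below the minimum score, and returns the best matches first. It is a conceptual sketch under stated assumptions: the function names, the tiny two-dimensional vectors, the use of cosine similarity, and the inclusive threshold are all hypothetical and may not match how the connector scores results.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two vectors, or 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def query_store(store, question_vector, max_results: int, min_score: float) -> list[dict]:
    """Return up to max_results segments scoring at least min_score, best first."""
    scored = [
        {"textSegment": text, "individualScore": cosine_similarity(vector, question_vector)}
        for text, vector in store
    ]
    kept = [s for s in scored if s["individualScore"] >= min_score]
    kept.sort(key=lambda s: s["individualScore"], reverse=True)
    return kept[:max_results]


# Hypothetical store: (text segment, embedding vector) pairs.
store = [
    ("CloudHub workers", [1.0, 0.0]),
    ("VM queues", [0.6, 0.8]),
    ("Unrelated", [0.0, 1.0]),
]
results = query_store(store, [1.0, 0.0], max_results=3, min_score=0.5)
print([r["textSegment"] for r in results])  # ['CloudHub workers', 'VM queues']
```

Raising the minimum score trims weak matches, while lowering the maximum number of results caps the list even when many segments pass the threshold.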

Output Configuration

This operation returns a JSON payload that contains the main response and a list of relevant sources retrieved from the knowledge store. Each source includes details such as the file path, text segment, and similarity score.

This is an example response of the JSON payload:

{
  "response": "Networking Guide for more information on how to access an application in a specific CloudHub worker.",
  "sources": [
      {
          "absoluteDirectoryPath": "/Users/john.wick/Documents/Downloads/patch 8",
          "textSegment": "Networking Guide for more information on how to access an application in a specific CloudHub worker.",
          "individualScore": 0.7865373025380039,
          "file_name": "docs-runtime-manager__cloudhub_modules_ROOT_pages_cloudhub-fabric.adoc"
      },
      {
          "absoluteDirectoryPath": "/Users/john.wick/Documents/Downloads/patch 8",
          "textSegment": "= CloudHub High Availability Features",
          "individualScore": 0.7845498154294348,
          "file_name": "docs-runtime-manager__cloudhub_modules_ROOT_pages_cloudhub-fabric.adoc"
      },
      {
          "absoluteDirectoryPath": "/Users/john.wick/Documents/Downloads/patch 8",
          "textSegment": "[%header,cols=\"2*a\"]|===|VM Queues in On-Premises Applications |VM Queues in Applications deployed to CloudHub",
          "individualScore": 0.757268680397361,
          "file_name": "docs-runtime-manager__cloudhub_modules_ROOT_pages_cloudhub-fabric.adoc"
      }
  ]
}

Additionally, query-related metadata is returned separately as attributes, for example:

{
  "minScore": 0.7, (1)
  "question": "Who is Amir", (2)
  "maxResults": 3, (3)
  "storeName": "/Users/john.wick/Downloads/embedding.store" (4)
}
1 minScore: Minimum similarity score required for a result to be included in the response
2 question: The original query or question submitted by the user
3 maxResults: Maximum number of results that can be returned for the query
4 storeName: The path or name of the knowledge store used to retrieve the data

Configure the Embedding Get Info from Store Operation

The Embedding get info from store operation retrieves information from an in-memory embedding store based on a plain text prompt. This operation uses a large language model (LLM) to enhance the response by interpreting the retrieved information and generating a more comprehensive or contextually enriched answer. The embedding store is loaded into memory prior to retrieval, and the LLM processes the results to refine the final response.

To configure the Embedding get info from store operation:

  1. Select the operation on the Anypoint Code Builder or Studio canvas.

  2. In the General properties tab for the operation, enter these values:

    • Data

      The plain text prompt to send to the in-memory vector store. The prompt is converted into an embedding and used for semantic search to find similar text segments.

    • Store Name

      Contains the full file path of the embedding store to retrieve information from. Ensure the file path is accessible.

    • Get Latest

      If true, the store file is loaded each time before running this operation, which might slow down performance. It is best to use this flag only when building the knowledge store. After your app is deployed, set it to false for better performance.

This is the XML configuration for this operation:

<ms-aichain:embedding-get-info-from-store
    doc:name="Embedding get info from store"
    doc:id="913ed660-0b4a-488a-8931-26c599e859b5"
    config-ref="MuleSoft_AI_Chain_Config"
    storeName='#["/Users/john.wick/Desktop/mac-demo/stores/" ++ payload.storeName]'
    getLatest="true">
    <ms-aichain:data><![CDATA[#[payload.prompt]]]></ms-aichain:data>
</ms-aichain:embedding-get-info-from-store>
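
The performance trade-off behind Get Latest can be pictured as a cache in front of the store file. The following Python sketch reloads the file on every call when the flag is true and otherwise reuses the copy already in memory. The cache, the function name, and the JSON file format are hypothetical and are used only to illustrate the idea, not to describe the connector's internals.

```python
import json
import os
import tempfile

_store_cache: dict[str, dict] = {}  # hypothetical in-memory copies, keyed by store path


def load_store(store_name: str, get_latest: bool) -> dict:
    """Return the store, re-reading the file only when needed or requested."""
    if get_latest or store_name not in _store_cache:
        with open(store_name, encoding="utf-8") as f:
            _store_cache[store_name] = json.load(f)
    return _store_cache[store_name]


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "demo.store")
    with open(path, "w", encoding="utf-8") as f:
        json.dump({"segments": 1}, f)
    first = load_store(path, get_latest=False)["segments"]   # first call reads the file

    with open(path, "w", encoding="utf-8") as f:
        json.dump({"segments": 2}, f)                         # the store changes on disk
    cached = load_store(path, get_latest=False)["segments"]  # stale in-memory copy
    fresh = load_store(path, get_latest=True)["segments"]    # forced reload

print(first, cached, fresh)  # 1 1 2
```

The sketch shows both sides of the flag: skipping the reload avoids file I/O on every call, but changes written to the store file are not seen until the store is loaded again.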

Output Configuration

This operation returns a JSON payload that contains the main LLM response, along with a list of relevant sources retrieved from the knowledge store. Each source includes information such as the file path, file name, and a segment of relevant text.

This is an example response of the JSON payload:

{
  "response": "Runtime Manager is a feature within CloudHub that provides scalability, workload distribution, and added reliability to applications.",
  "sources": [
      {
          "absoluteDirectoryPath": "/Users/john.wick/Documents/Downloads/patch 8",
          "fileName": "docs-runtime-manager__cloudhub_modules_ROOT_pages_cloudhub-fabric.adoc",
          "textSegment": "= CloudHub High Availability Features..."
      }
  ]
}

Additionally, token usage and query-related metadata are returned separately as attributes, for example:

{
  "tokenUsage": { (1)
      "outputCount": 89,
      "totalCount": 702,
      "inputCount": 613
  },
  "additionalAttributes": { (2)
      "getLatest": "true",
      "question": "What is MuleChain",
      "storeName": "/Users/john.wick/Downloads/knowledge-store"
  }
}
1 tokenUsage: Provides information on the token usage for the operation:
  • outputCount

    Number of tokens generated in the response

  • totalCount

    Total number of tokens processed for the entire operation, including input and output

  • inputCount

    Number of tokens processed from the input query or document

2 additionalAttributes: Includes metadata related to the query and store:
  • getLatest

    Indicates whether the knowledge store is reloaded for each operation (true/false).

  • question

    The original query or question submitted by the user.

  • storeName

    The path or name of the knowledge store used in the operation.
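
As a small worked example of reading these attributes, the following Python sketch parses the sample values shown above (with the callout markers removed) and checks that totalCount equals inputCount plus outputCount, which holds for this sample. The helper name is hypothetical, and treating the sum as a general rule is an assumption based on the field descriptions. Note that getLatest arrives as the string "true" in this sample, so it is compared as text.

```python
import json

attributes = json.loads("""
{
  "tokenUsage": {"outputCount": 89, "totalCount": 702, "inputCount": 613},
  "additionalAttributes": {
    "getLatest": "true",
    "question": "What is MuleChain",
    "storeName": "/Users/john.wick/Downloads/knowledge-store"
  }
}
""")


def total_matches(usage: dict) -> bool:
    """Check that the total equals input plus output tokens."""
    return usage["inputCount"] + usage["outputCount"] == usage["totalCount"]


usage = attributes["tokenUsage"]
reloads_each_call = attributes["additionalAttributes"]["getLatest"] == "true"

print(total_matches(usage))  # True
print(reloads_each_call)     # True
```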
