Configuring Embeddings Operations
Embeddings operations include:
-
Embedding new store
-
Embedding add document to store
-
Embedding add folder to store
-
Embedding query from store
-
Embedding get info from store
Configure the Embedding New Store Operation
The Embedding new store operation creates a new in-memory embedding store and exports it to a physical file. The in-memory embedding store in MuleSoft AI Chain persists its data through file exports whenever its contents change.
To configure the Embedding new store operation:
-
Select the operation on the Anypoint Code Builder or Studio canvas.
-
In the General properties section, in Store Name, enter the full file path where the embedding store is saved.
If the file already exists, it is overwritten. Ensure the file path is accessible.
You can also use a DataWeave expression for this field, for example:
#[mule.home ++ "/apps/" ++ app.name ++ payload.storeName]
This is the XML configuration for this operation:
<ms-aichain:embedding-new-store
doc:name="Embedding new store"
doc:id="e3b52f5f-b765-4fad-9ecc-34755f386db4"
storeName='#[mule.home ++ "/apps/" ++ app.name ++ payload.storeName]'
/>
Output Configuration
This operation responds with a JSON payload that contains the status of the store.
This is an example response of the JSON payload:
{
"status": "created"
}
The operation also returns attributes that aren’t within the main JSON payload, which include information about the store, such as its name, for example:
{
"storeName": "knowledge-store"
}
Configure the Embedding Add Document to Store Operation
The Embedding add document to store operation adds a document to an embedding store and exports the store to a file. The document is ingested using in-memory embedding. The in-memory embedding store in MuleSoft AI Chain persists its data through file exports whenever its contents change.
To configure the Embedding add document to store operation:
-
Select the operation on the Anypoint Code Builder or Studio canvas.
-
In the General properties tab for the operation, enter these values:
-
Store Name
Contains the full file path for the embedding store to be saved. If the file already exists, it is overwritten. Ensure the file path is accessible.
-
Context Path
Contains the full file path for the document to be ingested into the embedding store.
-
Max Segment Size
Specifies the maximum size, in tokens or characters, of each segment that the document will be split into before being ingested into the embedding store.
-
Max Overlap Size
Defines the number of tokens or characters that will overlap between consecutive segments when splitting the document.
-
-
In the Context section for the operation, select File Type:
-
any
Automatically detects the file format and processes it accordingly. This option allows flexibility when the file type is unknown or varied.
-
text
Text files, such as JSON, XML, TXT, and CSV.
-
url
A single URL pointing to web content to ingest.
-
This is the XML configuration for this operation:
<ms-aichain:embedding-add-document-to-store
doc:name="Embedding add document to store"
doc:id="e8b73dbe-c897-4f77-85d0-aaf59476c408"
storeName='#["/Users/john.wick/Desktop/mac-demo/stores/" ++ payload.storeName]'
contextPath="#[payload.filePath]"
maxSegmentSizeInChars="#[payload.maxChunkSize]"
maxOverlapSizeInChars="#[payload.maxOverlapSize]"
fileType="#[payload.fileType]"
/>
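The Max Segment Size and Max Overlap Size properties above control how the document is chunked before ingestion. The connector performs this splitting internally; the following Python sketch only illustrates how a character-based segment size and overlap interact (the function name and parameters are illustrative, not connector API):

```python
def split_into_segments(text: str, max_segment_size: int, max_overlap_size: int) -> list[str]:
    """Split text into segments of at most max_segment_size characters,
    where each segment repeats the last max_overlap_size characters of
    the previous one, so context is preserved across segment boundaries."""
    if max_overlap_size >= max_segment_size:
        raise ValueError("overlap must be smaller than the segment size")
    step = max_segment_size - max_overlap_size  # how far each new segment advances
    segments = []
    for start in range(0, len(text), step):
        segments.append(text[start:start + max_segment_size])
        if start + max_segment_size >= len(text):
            break  # the rest of the text is already covered
    return segments

document = "abcdefghij" * 10  # a 100-character sample document
segments = split_into_segments(document, max_segment_size=40, max_overlap_size=10)
```

With these values, each segment shares its first 10 characters with the tail of the previous segment, which is why a larger overlap increases the number of segments (and therefore embeddings) produced from the same document.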
Output Configuration
This operation responds with a JSON payload containing the status of the store.
This is an example response of the JSON payload:
{
"status": "updated"
}
In addition to the main payload, file-related metadata attributes, such as the file path, store name, and file type, are returned separately, for example:
{
"filePath": "/Users/john.wick/Downloads/mulechain.txt",
"storeName": "/Users/john.wick/Downloads/knowledge-store",
"fileType": "text"
}
Configure the Embedding Add Folder to Store Operation
The Embedding add folder to store operation adds all files in a folder, including files in its subfolders, into an embedding store and exports the store to a file. The documents are ingested using in-memory embedding. The in-memory embedding store in MuleSoft AI Chain persists its data through file exports whenever its contents change.
To configure the Embedding add folder to store operation:
-
Select the operation on the Anypoint Code Builder or Studio canvas.
-
In the General properties tab for the operation, enter these values:
-
Store Name
Contains the full file path for the embedding store to be saved. If the file already exists, it is overwritten. Ensure the file path is accessible.
-
Context Path
Contains the full folder path to use for ingesting into the embedding store.
-
Max Segment Size
Specifies the maximum size, in tokens or characters, of each segment that the document is split into before the embedding store ingests it.
-
Max Overlap Size
Defines the number of tokens or characters that will overlap between consecutive segments when splitting the document.
-
-
In the Context section for the operation, select the File Type:
-
any
Automatically detects the file format and processes it accordingly. This option allows flexibility when the file type is unknown or varied.
-
text
Text files, such as JSON, XML, TXT, and CSV.
-
url
A single URL pointing to web content to ingest.
-
This is the XML configuration for this operation:
<ms-aichain:embedding-add-folder-to-store
doc:name="Embedding add folder to store"
doc:id="231a2afd-8cec-4a70-96c1-3ecef19d02db"
config-ref="MAC_AI_Llm_configuration"
storeName='#[mule.home ++ "/apps/" ++ app.name ++ "/knowledge-center.store"]'
folderPath="#[payload.folderPath]"
/>
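Because the operation ingests the folder recursively, every file under the configured path, including files in subfolders, contributes to the store, and the total appears later in the filesCount attribute. This Python sketch shows the idea with hypothetical helper names; it is not connector code:

```python
import tempfile
from pathlib import Path

def collect_files(folder_path: str) -> list[Path]:
    """Recursively gather every file under folder_path, including files
    in subfolders -- the set of documents the operation would ingest."""
    return [p for p in sorted(Path(folder_path).rglob("*")) if p.is_file()]

# Build a small folder tree and count its files, analogous to the
# filesCount attribute returned by the operation.
with tempfile.TemporaryDirectory() as root:
    (Path(root) / "sub").mkdir()
    (Path(root) / "a.txt").write_text("first document")
    (Path(root) / "sub" / "b.txt").write_text("second document")
    files_count = len(collect_files(root))
```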
Output Configuration
This operation returns a JSON payload containing the status of the store. In addition, folder-related metadata attributes, such as the folder path, file count, and store name, are provided separately from the main payload.
This is an example response of the JSON payload:
{
"status": "updated"
}
Along with the JSON payload, the operation returns attributes, which include information about the ingested folder, for example:
{
"folderPath": "/Users/john.wick/Downloads/files", (1)
"filesCount": 3, (2)
"storeName": "/Users/john.wick/Downloads/knowledge-store" (3)
}
1 | folderPath | Absolute path to the folder where the files are located
2 | filesCount | Total number of files in the specified folder
3 | storeName | Name or path of the knowledge store where the processed documents are stored
Configure the Embedding Query From Store Operation
The Embedding query from store operation retrieves information based on a plain text prompt using semantic search from an in-memory embedding store. This operation does not involve the use of a large language model (LLM). Instead, it directly searches the embedding store for relevant text segments based on the prompt. The embedding store is loaded into memory prior to retrieval.
To configure the Embedding query from store operation:
-
Select the operation on the Anypoint Code Builder or Studio canvas.
-
In the General properties tab for the operation, enter these values:
-
Store Name
Contains the full file path of the embedding store to query. Ensure the file path is accessible.
-
Question
The plain text prompt to send to the in-memory vector store. The prompt is converted into an embedding and used for semantic search to find similar text segments.
-
Max Results
Specifies the maximum number of results to be returned with the query.
-
Min Score
Defines the minimum score to be used to identify and return results.
-
Get Latest
If true, the store file is loaded each time before running this operation, which might slow down performance. It is best to use this flag only when building the knowledge store. After your app is deployed, set it to false for better performance.
-
This is the XML configuration for this operation:
<ms-aichain:embedding-query-from-store
doc:name="Embedding query from store"
doc:id="1ee361ea-e62a-4e0f-9c74-0363f8721052"
storeName='#[mule.home ++ "/apps/" ++ app.name ++ payload.storeName]'
question="#[payload.question]"
maxResults="#[payload.maxResults]"
minScore="#[payload.minScore]"
getLatest="true"
/>
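Max Results and Min Score work together: every stored segment is scored against the question embedding, results below the minimum score are discarded, and only the highest-scoring remainder is returned. This Python sketch illustrates that filtering with cosine similarity (a common choice for embedding stores; the data and function names here are illustrative):

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def query_store(store: list[tuple[str, list[float]]],
                question_embedding: list[float],
                max_results: int, min_score: float) -> list[tuple[str, float]]:
    """Score every stored segment against the question embedding, keep only
    matches at or above min_score, and return the top max_results of them."""
    scored = [(text, cosine_similarity(vec, question_embedding)) for text, vec in store]
    scored = [hit for hit in scored if hit[1] >= min_score]
    scored.sort(key=lambda hit: hit[1], reverse=True)
    return scored[:max_results]

# Tiny hand-made store with 2-dimensional embeddings for illustration.
store = [("CloudHub workers", [1.0, 0.0]),
         ("High availability", [0.9, 0.4]),
         ("Unrelated recipe", [0.0, 1.0])]
hits = query_store(store, question_embedding=[1.0, 0.1], max_results=3, min_score=0.7)
```

The third segment scores well below 0.7 and is filtered out, mirroring how a minScore of 0.7 kept only the closely related sources in the example response above.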
Output Configuration
This operation returns a JSON payload that contains the main response and a list of relevant sources retrieved from the knowledge store. Each source includes details such as the file path, text segment, and similarity score.
This is an example response of the JSON payload:
{
"response": "Networking Guide for more information on how to access an application in a specific CloudHub worker.",
"sources": [
{
"absoluteDirectoryPath": "/Users/john.wick/Documents/Downloads/patch 8",
"textSegment": "Networking Guide for more information on how to access an application in a specific CloudHub worker.",
"individualScore": 0.7865373025380039,
"file_name": "docs-runtime-manager__cloudhub_modules_ROOT_pages_cloudhub-fabric.adoc"
},
{
"absoluteDirectoryPath": "/Users/john.wick/Documents/Downloads/patch 8",
"textSegment": "= CloudHub High Availability Features",
"individualScore": 0.7845498154294348,
"file_name": "docs-runtime-manager__cloudhub_modules_ROOT_pages_cloudhub-fabric.adoc"
},
{
"absoluteDirectoryPath": "/Users/john.wick/Documents/Downloads/patch 8",
"textSegment": "[%header,cols=\"2*a\"]|===|VM Queues in On-Premises Applications |VM Queues in Applications deployed to CloudHub",
"individualScore": 0.757268680397361,
"file_name": "docs-runtime-manager__cloudhub_modules_ROOT_pages_cloudhub-fabric.adoc"
}
]
}
Additionally, query-related metadata is returned separately as attributes, for example:
{
"minScore": 0.7, (1)
"question": "Who is Amir", (2)
"maxResults": 3, (3)
"storeName": "/Users/john.wick/Downloads/embedding.store" (4)
}
1 | minScore | Minimum similarity score required for a result to be included in the response
2 | question | The original query or question submitted by the user
3 | maxResults | Maximum number of results that can be returned for the query
4 | storeName | The path or name of the knowledge store used to retrieve the data
Configure the Embedding Get Info from Store Operation
The Embedding get info from store operation retrieves information from an in-memory embedding store based on a plain text prompt. This operation uses a large language model (LLM) to enhance the response by interpreting the retrieved information and generating a more comprehensive or contextually enriched answer. The embedding store is loaded into memory prior to retrieval, and the LLM processes the results to refine the final response.
To configure the Embedding get info from store operation:
-
Select the operation on the Anypoint Code Builder or Studio canvas.
-
In the General properties tab for the operation, enter these values:
-
Data
The plain text prompt to send to the in-memory vector store. The prompt is converted into an embedding and used for semantic search to find similar text segments.
-
Store Name
Contains the full file path of the embedding store to load. Ensure the file path is accessible.
-
Get Latest
If true, the store file is loaded each time before running this operation, which might slow down performance. It is best to use this flag only when building the knowledge store. After your app is deployed, set it to false for better performance.
-
This is the XML configuration for this operation:
<ms-aichain:embedding-get-info-from-store
doc:name="Embedding get info from store"
doc:id="913ed660-0b4a-488a-8931-26c599e859b5"
config-ref="MuleSoft_AI_Chain_Config"
storeName='#["/Users/john.wick/Desktop/mac-demo/stores/" ++ payload.storeName]'
getLatest="true">
<ms-aichain:data><![CDATA[#[payload.prompt]]]></ms-aichain:data>
</ms-aichain:embedding-get-info-from-store>
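Unlike the query operation, this operation follows a retrieve-then-generate flow: the semantic search results are handed to the LLM, which interprets them into a final answer. This Python sketch shows the shape of that flow with stand-in functions (the retrieval and LLM callables here are hypothetical placeholders, not the connector's internals):

```python
def answer_with_llm(question, retrieve, llm):
    """Sketch of the get-info flow: semantic retrieval first, then the LLM
    interprets the retrieved segments to produce an enriched answer."""
    sources = retrieve(question)
    context = "\n".join(s["textSegment"] for s in sources)
    prompt = f"Answer using only this context:\n{context}\n\nQuestion: {question}"
    return {"response": llm(prompt), "sources": sources}

# Stand-ins for the store lookup and the LLM call (both hypothetical):
def fake_retrieve(question):
    return [{"textSegment": "= CloudHub High Availability Features...",
             "fileName": "cloudhub-fabric.adoc"}]

def fake_llm(prompt):
    return "CloudHub provides high availability features."

result = answer_with_llm("What is CloudHub HA?", fake_retrieve, fake_llm)
```

The returned structure mirrors the example response below: a generated response plus the sources that grounded it.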
Output Configuration
This operation returns a JSON payload that contains the main LLM response, along with a list of relevant sources retrieved from the knowledge store. Each source includes information such as the file path, file name, and a segment of relevant text. Additionally, token usage and query-related metadata are provided separately as attributes.
This is an example response of the JSON payload:
{
"response": "Runtime Manager is a feature within CloudHub that provides scalability, workload distribution, and added reliability to applications.",
"sources": [
{
"absoluteDirectoryPath": "/Users/john.wick/Documents/Downloads/patch 8",
"fileName": "docs-runtime-manager__cloudhub_modules_ROOT_pages_cloudhub-fabric.adoc",
"textSegment": "= CloudHub High Availability Features..."
}
]
}
Additionally, token usage and query-related metadata are returned separately as attributes, for example:
{
"tokenUsage": { (1)
"outputCount": 89,
"totalCount": 702,
"inputCount": 613
},
"additionalAttributes": { (2)
"getLatest": "true",
"question": "What is MuleChain",
"storeName": "/Users/john.wick/Downloads/knowledge-store"
}
}
1 | tokenUsage | Provides information on the token usage for the operation: inputCount, outputCount, and totalCount
2 | additionalAttributes | Includes metadata related to the query and store: getLatest, question, and storeName