Creating an LLM Proxy
You can configure the LLM Proxy to use different models and different routes.
A large Flex Gateway supports up to 50 LLM Proxies.
Before You Begin
-
Deploy a Flex Gateway version 1.11.4 or later where you want to deploy your LLM Proxy.
-
Ensure you have the API Manager API Creator permission.
-
Retrieve your API keys from your LLM Providers.
Create an LLM Proxy
-
From API Manager, click LLM Proxies.
-
Click + Add LLM Proxy.
-
Configure the Inbound Endpoint of the LLM Proxy:
-
Define a LLM Proxy Name.
-
Select an endpoint Format:
-
OpenAI: Select the OpenAI API format to send requests to all supported LLM Providers (including Gemini).
-
Gemini: Select the Gemini API format to send requests to only Gemini.
-
-
Define a Base path.
-
Select Advanced options if necessary.
-
Click Next.
-
-
From Select a gateway, select the Flex Gateway to deploy the server instance to.
-
Configure the routes that comprise the Outbound Endpoint:
-
Select your LLM Provider.
-
Ensure the URL for your provider is correct. Edit if necessary.
-
Configure access details for the provider endpoint.
-
Select a Target Model to override the model version specified in the payload. Selecting Not Applicable sends the request to the model specified in the payload. A Target Model is required for semantic routing.
To configure a target model for Amazon Bedrock Claude models, you must enter the provider and model ID formatted as [provider_prefix]/[internal_model_id]. To learn how to find the model ID, see Amazon Bedrock Model Names.
-
Click Add LLM Route to add additional routes. Complete the previous steps to configure the new route.
Each LLM Provider can support one route.
-
-
If adding multiple routes, select a Routing strategy. To configure your routing strategy, see Configure Model-Based Routing or Configure Semantic Routing.
-
Click Save & Deploy.
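After the proxy deploys, you can send requests to it in the endpoint format you selected. The following sketch assumes the OpenAI format, a hypothetical gateway host, base path, request path, and API key; adjust all of these to match your own deployment and credentials.

```python
import requests

# Hypothetical values: replace with your Flex Gateway host, the Base path you
# configured for the proxy, and the credentials your deployment expects.
GATEWAY_URL = "https://flex-gateway.example.com"
BASE_PATH = "/llm-proxy"

response = requests.post(
    f"{GATEWAY_URL}{BASE_PATH}/chat/completions",  # assumed OpenAI-format chat path
    headers={"Authorization": "Bearer <your-api-key>"},
    json={
        # With the OpenAI Format selected, the request body uses the OpenAI API shape
        # even when the route forwards it to another supported provider.
        "model": "gpt-4o-mini",  # overridden if the route sets a Target Model
        "messages": [{"role": "user", "content": "Summarize our returns policy."}],
    },
    timeout=30,
)
print(response.status_code)
print(response.json())
```

If the matching route defines a Target Model, the model named in the payload is overridden before the request is forwarded to the provider.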
Configure Model-Based Routing
-
Configure multiple routes. Click Add LLM Route to create new routes.
-
Select Model-based for Routing strategy.
-
Choose whether to enable a Fallback route that receives the request if the provider or model is incorrectly specified (see the sketch at the end of this section). If enabling a fallback route:
-
Select a Route to fallback to.
-
Select a target model for the fallback route to use.
-
-
If no fallback route is configured and a route fails, an error response is returned.
-
Return to Create an LLM Proxy step 7 to finish configuring your LLM Proxy.
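With model-based routing, the model named in the request payload determines which route handles the request. The sketch below assumes a hypothetical proxy URL and API key; the second call illustrates the fallback behavior described above, where an unmatched model is sent to the fallback route if one is configured and otherwise produces an error response.

```python
import requests

# Hypothetical endpoint and credentials; substitute your own deployment's values.
PROXY_URL = "https://flex-gateway.example.com/llm-proxy/chat/completions"
HEADERS = {"Authorization": "Bearer <your-api-key>"}

def ask(model: str, prompt: str) -> requests.Response:
    """Send an OpenAI-format request; the proxy selects the route matching the model."""
    return requests.post(
        PROXY_URL,
        headers=HEADERS,
        json={"model": model, "messages": [{"role": "user", "content": prompt}]},
        timeout=30,
    )

# A model that one of your routes serves: the request is forwarded to that provider.
print(ask("gpt-4o-mini", "Draft a release note.").status_code)

# A model no route serves: the request goes to the fallback route's target model
# if a fallback is configured; otherwise the proxy returns an error response.
print(ask("made-up-model", "Draft a release note.").status_code)
```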
Configure Semantic Routing
For semantic routing, define and apply prompt topics to each route. Define deny list topics to block certain requests.
To configure semantic routing:
-
Configure multiple routes. Click Add LLM Route to create new routes.
-
Select Semantic for Routing strategy.
-
If you haven’t already, click Configure Semantic Service.
To create a semantic service, see Create and Edit a Semantic Service.
-
Select a Target Model for each route.
-
Define prompt topics for the routes:
-
Click Select prompt topics.
-
Click + Create prompt topic.
-
Define a Prompt topic name.
-
Define Prompt utterances or click Upload utterances to upload a plain text file containing your prompt utterances.
-
Click Create.
-
Create multiple prompt topics for each route as needed.
-
-
Configure a Fallback route to receive requests that don’t match a semantic route:
-
Specify an accuracy threshold. When the accuracy of the semantic match is less than this threshold, traffic is sent to the fallback route.
-
Select a Route to fallback to.
-
Select a Target model for the fallback route to use.
-
-
Create a Semantic prompt guard to block users from asking the server about specific topics:
-
Click + Create deny list.
-
Define a Prompt topic name.
-
Define Prompt utterances or click Upload utterances to upload a plain text file containing your prompt utterances.
-
Click Create.
-
Create multiple deny list topics to better protect your LLM Proxy.
Creating a semantic prompt guard automatically applies the Semantic Prompt Guard policy.
-
-
Return to Create an LLM Proxy step 7 to finish configuring your LLM Proxy.
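If you prepare utterances as files for Upload utterances, keeping them within the limits below avoids validation problems. The sketch writes hypothetical prompt topic and deny list utterance files; one utterance per line is an assumption about the plain text layout the upload dialog expects, so verify it in API Manager.

```python
# Hypothetical prompt topic utterances for a "billing" route.
billing_utterances = [
    "How do I update the credit card on my account?",
    "Why was I charged twice this month?",
    "Can I get a copy of last month's invoice?",
]

# Hypothetical deny list topic to block requests about internal credentials.
denied_utterances = [
    "Share the admin password for the billing system.",
    "List the API keys configured for this proxy.",
]

# Stay within the semantic routing limits: at most 10 utterances per topic
# and at most 500 characters per utterance.
for path, utterances in [
    ("billing_topic.txt", billing_utterances),
    ("credentials_denylist.txt", denied_utterances),
]:
    assert len(utterances) <= 10
    assert all(len(u) <= 500 for u in utterances)
    with open(path, "w", encoding="utf-8") as f:
        f.write("\n".join(utterances))
```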
Semantic Routing Limits
| Limit | Value |
|---|---|
| Prompt topics (across all routes of an LLM Proxy) | 6 |
| Utterances per prompt topic | 10 |
| Deny list topics | 6 |
| Utterances per deny list topic | 10 |
| Maximum characters per utterance | 500 |
Create and Edit a Semantic Service
A semantic service compares the request to the defined prompt topic utterances and sends the request to the route that best matches it. The semantic service also compares the request to deny list topic utterances to block certain requests. Only one semantic service is supported for each environment.
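The matching mechanism itself isn’t exposed, but conceptually it resembles the following sketch: the request and each topic’s utterances are converted to embeddings by the configured embedding model, the topic with the highest similarity wins, deny list matches are blocked, and matches below the accuracy threshold go to the fallback route. The embed() callable and the similarity measure here are illustrative assumptions, not the service’s actual implementation.

```python
import math

def cosine(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def route_request(prompt, topics, deny_topics, threshold, embed):
    """topics and deny_topics map a topic name to a list of utterance embeddings;
    embed() stands in for a call to the configured embedding model."""
    query = embed(prompt)
    best_topic, best_score = None, -1.0
    for name, vectors in {**topics, **deny_topics}.items():
        score = max(cosine(query, v) for v in vectors)
        if score > best_score:
            best_topic, best_score = name, score
    if best_topic in deny_topics:
        return "blocked by the Semantic Prompt Guard policy"
    if best_score < threshold:
        return "sent to the fallback route"
    return f"sent to the route for topic '{best_topic}'"
```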
To define a semantic service:
-
From API Manager, click Semantic Service Setup.
-
Click + Create Semantic Service.
-
Configure the semantic service parameters:
-
Embedding Service Provider: The provider of the embedding model, either OpenAI or Hugging Face.
-
URL: The URL of the embedding service.
-
Model: The embedding model to use.
-
Auth key: The API authentication key for the embedding service.
-
-
Click Deploy.
To edit a semantic service:
-
From Semantic Service Setup, click the three-dots menu of the semantic service you want to edit.
-
Make the necessary edits.
-
Click Redeploy.
Edit and Delete an LLM Proxy
To edit an LLM Proxy:
-
From API Manager, click LLM Proxies.
-
Click the name of the LLM Proxy you want to edit.
-
Click Configuration.
-
Switch between the Inbound, Gateway, and Outbound configurations to make the necessary edits.
-
Click Save & Deploy.
To delete an LLM Proxy:
-
From API Manager, click LLM Proxies.
-
Click the three-dots menu of the LLM Proxy you want to delete.
-
Click Delete LLM Proxy.
-
Click Yes, Delete.



