LLM Proxy Overview

LLM Proxy provides a unified access layer for multiple Large Language Model (LLM) providers. LLM Proxies are deployed to Flex Gateway to enable governance, intelligent routing, and cost management for AI applications.

LLM Proxy is supported on Managed Flex Gateway and Self-Managed Flex Gateway running in Connected Mode.

By creating a proxy, you define a single LLM service that can receive requests for multiple providers, which simplifies the developer experience. You can add new models to the service without changing the endpoint.

Figure: An LLM Proxy with a single endpoint routing requests to multiple LLM providers

Depending on its configuration, the proxy either sends the request to the model the user specifies or dynamically routes the request to the provider that best matches the request content:

  • Model-Based Routing: Static routing. The user specifies what model the LLM Proxy should send the request to.

  • Semantic Routing: Dynamic routing. The LLM Proxy chooses which model to send the request to based on the request content.
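
Regardless of which routing mode is configured, clients call one endpoint with one request shape. The sketch below builds an OpenAI-style /chat/completions payload; the proxy URL and model names are assumptions for illustration, not values from your deployment.

```python
import json

# Hypothetical single proxy endpoint; replace with your Flex Gateway URL.
PROXY_URL = "https://flex-gateway.example.com/llm/chat/completions"


def build_chat_request(model: str, prompt: str) -> dict:
    """Build an OpenAI-style /chat/completions payload.

    The same payload shape is sent to the one proxy endpoint no matter
    which backing provider ultimately serves the request.
    """
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }


# The endpoint stays the same even when new models are added behind it.
payload = build_chat_request("gpt-4o", "Summarize our Q3 results.")
print(json.dumps(payload))
```

Because every provider sits behind the same endpoint and payload shape, swapping or adding a model is a proxy-side change rather than a client-side one.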

Supported LLM Providers

LLM Proxy supports these LLM Providers and API endpoints:

LLM Provider                          | /chat/completions | /responses
OpenAI                                | Y                 | Y
Gemini                                | Y                 | Y
Azure (OpenAI)                        | Y                 | Y
Bedrock (Anthropic Claude Models)     | Y                 | Y

Model-Based Routing

Model-based routing is static routing: the user specifies in the request which model the LLM Proxy should send the request to. If the proxy configuration specifies a target model, LLM Proxy can override the model version the user provides in the request.
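
The override behavior can be sketched as follows. This is a minimal illustration, not the proxy's implementation; the payload field names and the `target_model` parameter are assumptions.

```python
def apply_model_override(payload: dict, target_model: str = None) -> dict:
    """Sketch of model-based routing's override step.

    If the proxy route is configured with a target model, it replaces
    whatever model version the caller put in the request; otherwise the
    caller's choice passes through unchanged.
    """
    if target_model is not None:
        # Return a copy so the caller's payload is not mutated.
        return {**payload, "model": target_model}
    return payload


# Caller asked for gpt-3.5-turbo, but the route pins gpt-4o.
routed = apply_model_override({"model": "gpt-3.5-turbo"}, target_model="gpt-4o")
print(routed["model"])  # → gpt-4o
```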

Semantic Routing

Semantic routing is dynamic routing where the LLM Proxy chooses which model to send the request to. For semantic routing, the user creates prompt topics for each route. When a request is sent to the LLM Proxy, a semantic service compares the request to the defined topic utterances and sends the request to the route that best matches it.
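
The matching step can be sketched with a toy similarity measure. This is a stand-in for the proxy's semantic service, which would use embedding models rather than the bag-of-words cosine similarity used here; the route names and utterances are invented for illustration.

```python
from collections import Counter
from math import sqrt


def _cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words vectors."""
    dot = sum(a[t] * b[t] for t in a)
    na = sqrt(sum(v * v for v in a.values()))
    nb = sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def choose_route(request_text: str, routes: dict) -> str:
    """Pick the route whose topic utterances best match the request.

    For each route, score the request against every utterance and keep
    the best score; the highest-scoring route wins.
    """
    req = Counter(request_text.lower().split())
    best_route, best_score = None, -1.0
    for route, utterances in routes.items():
        score = max(_cosine(req, Counter(u.lower().split())) for u in utterances)
        if score > best_score:
            best_route, best_score = route, score
    return best_route


routes = {
    "code-model": ["write a python function", "fix this bug in my code"],
    "chat-model": ["summarize this meeting", "draft an email reply"],
}
print(choose_route("please fix a bug in my python code", routes))  # → code-model
```

Note how the limits below shape this design: each route's topic holds a bounded set of utterances, so the semantic service compares each request against a small, fixed corpus.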

LLM Proxy Limits

Up to 50 LLM Proxies are supported per Large Flex Gateway.

Semantic Routing Limits

Limit                                              | Value
Prompt topics (across all routes of an LLM Proxy)  | 6
Utterances per prompt topic                        | 10
Deny list topics                                   | 6
Utterances per deny list topic                     | 10
Maximum characters per utterance                   | 500