
Configuring Moderation Operations

Configure the [Toxicity] Detection by Text operation.

Configure the Toxicity Detection by Text Operation

The [Toxicity] Detection by Text operation classifies and scores potentially harmful content in user input or LLM responses.

Apply the [Toxicity] Detection by Text operation in various scenarios, such as for:

  • Toxic Inputs Detection

    Detect and block toxic user input before it is sent to the LLM.

  • Harmful Responses Detection

    Filter out LLM responses that users might consider toxic or offensive.

To configure the [Toxicity] Detection by Text operation:

  1. Select the operation on the Anypoint Code Builder or Studio canvas.

  2. In the General properties tab for the operation, enter these values:

    • Text

      Text to check for harmful content.

  3. Optionally, in the Additional Request Attributes field, pass additional attributes in the request payload to generate more relevant and precise LLM outputs.

This is the XML for this operation:

<ms-inference:toxicity-detection-text
  doc:name="Toxicity detection text"
  doc:id="b5770a5b-d3f9-47ba-acec-ab0bd41e4188"
  config-ref="OpenAIConfig">
    <ms-inference:text>
      <![CDATA[You are fat]]>
    </ms-inference:text>
</ms-inference:toxicity-detection-text>
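
For the Toxic Inputs Detection scenario, you can route on the flag that the operation returns. The following flow is a minimal sketch, not taken from the product examples: it assumes the user's text is available in a hypothetical vars.userPrompt variable, that the Text parameter accepts a DataWeave expression, and that the flagged field is readable at payload.flagged after the operation runs (adjust the path if your response nests it differently).

<flow name="block-toxic-input-flow">
  <!-- Score the user's text before it reaches the LLM. -->
  <ms-inference:toxicity-detection-text
    doc:name="Toxicity detection text"
    config-ref="OpenAIConfig">
      <ms-inference:text>#[vars.userPrompt]</ms-inference:text>
  </ms-inference:toxicity-detection-text>
  <choice doc:name="Flagged?">
    <when expression="#[payload.flagged]">
      <!-- APP:TOXIC_INPUT is an illustrative error type; use your own naming. -->
      <raise-error type="APP:TOXIC_INPUT"
                   description="Input rejected by toxicity detection."/>
    </when>
    <otherwise>
      <!-- Safe to continue: send vars.userPrompt to the LLM here. -->
      <logger level="INFO" message="Input passed the toxicity check."/>
    </otherwise>
  </choice>
</flow>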

Output Configuration

This operation returns a JSON payload containing the toxicity flag and a score for each category. This is an example response:

{
  "payload": {
    "flagged": true,
    "categories": [
      {
        "illicit/violent": 0.0000025466403947055455,
        "self-harm/instructions": 0.00023480495744356635,
        "harassment": 0.9798945372458964,
        "violence/graphic": 0.000005920916517463734,
        "illicit": 0.000013552078562406772,
        "self-harm/intent": 0.0002233150331012493,
        "hate/threatening": 0.0000012029639084557005,
        "sexual/minors": 0.0000024300240743279605,
        "harassment/threatening": 0.0007499928075102617,
        "hate": 0.00720390551996062,
        "self-harm": 0.0004822186797755494,
        "sexual": 0.00012644219446392274,
        "violence": 0.0004960569708019355
      }
    ]
  }
}
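
If you need more than the Boolean flag, you can inspect the individual category scores in a Transform Message component. The following DataWeave script is an illustrative sketch, not part of the connector documentation: it assumes the JSON above is the message payload and returns the flag together with any categories whose score exceeds a chosen threshold.

%dw 2.0
output application/json
// Hypothetical threshold for treating a category as high risk; tune it to your needs.
var threshold = 0.5
// Handle both shapes: a response wrapped in a top-level "payload" key (as shown above)
// or the inner object as the payload itself.
var result = payload.payload default payload
// The example response holds all category scores in a single-element categories array.
var scores = result.categories[0]
---
{
  flagged: result.flagged,
  highRiskCategories: scores filterObject ((score, category) -> score > threshold)
}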