A FastAPI application that provides comprehensive content validation and transformation endpoints using various guardrail technologies including Presidio, Guardrails AI, and local evaluation models.
The application follows a modular architecture with separate modules for different functionalities:
- `main.py`: FastAPI application with route definitions
- `guardrail/`: Directory containing all guardrail implementations
  - `pii_redaction_presidio.py`: PII detection and redaction using Presidio
  - `pii_detection_guardrails_ai.py`: PII detection using Guardrails AI
  - `nsfw_filtering_local_eval.py`: NSFW content filtering using the local Unitary toxic classification model
  - `drug_mention_guardrails_ai.py`: Drug mention detection using Guardrails AI
  - `web_sanitization_guardrails_ai.py`: Web content sanitization using Guardrails AI
- `entities.py`: Pydantic models for request/response validation
The Guardrail Server currently exposes two main endpoints for validation:
- `POST /pii-redaction` - Validates and optionally transforms incoming OpenAI chat completion requests before they are processed. Uses Presidio to detect and redact Personally Identifiable Information (PII) from messages. Possible responses:
  - `null` - Guardrails passed, no transformation needed for the input.
  - `ChatCompletionCreateParams` - Content was transformed; returns the modified request with PII redacted.
  - `HTTP 400/500` - Guardrails failed, with error details for the input.
- `POST /nsfw-filtering` - Validates and optionally transforms outgoing OpenAI chat completion responses to filter out NSFW content. Uses the Unitary toxic classification model to detect toxic, sexually explicit, and obscene content. Possible responses:
  - `null` - Guardrails passed, no transformation needed for the output.
  - `HTTP 400/500` - Guardrails failed, with error details for the output.
docker build --build-arg GUARDRAILS_TOKEN="<GUARDRAILS_AI_TOKEN>" -t custom-guardrails-template:latest .
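Once the image is built, you can run it locally; the port mapping below assumes the server listens on port 8000 inside the container, matching the run instructions later in this README:

```bash
docker run --rm -p 8000:8000 custom-guardrails-template:latest
```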
Note: The `requestBody` is accessible within the endpoint and can be used if needed for custom processing.
`InputGuardrailRequest` attributes:
- `requestBody` (CompletionCreateParams): The input payload sent to the guardrail server.
- `config` (dict): Configuration options for the guardrail server.
- `context` (RequestContext): Contextual information such as user and metadata.
`OutputGuardrailRequest` attributes:
- `requestBody` (CompletionCreateParams): The input payload originally sent to the model.
- `responseBody` (ChatCompletion): The model's output to be checked by the guardrail server.
- `config` (dict): Configuration options for the guardrail server.
- `context` (RequestContext): Contextual information such as user and metadata.
`RequestContext` attributes:
- `user` (Subject): Information about the user, team, or virtual account making the request.
- `metadata` (dict[str, str]): Additional metadata relevant to the request.
The `config` field is a dictionary of arbitrary request configuration. These options are set when you create a custom guardrail integration and are passed to the guardrail server as-is, so you can use them in your guardrail logic.
For more information about the config options, refer to the Truefoundry documentation.
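For reference, here is a simplified sketch of what the models in `entities.py` might look like; field names follow the attributes listed above, but the actual definitions in the repository may differ:

```python
from openai.types.chat import ChatCompletion, CompletionCreateParams
from pydantic import BaseModel


class Subject(BaseModel):
    subjectId: str
    subjectType: str
    subjectSlug: str
    subjectDisplayName: str


class RequestContext(BaseModel):
    user: Subject
    metadata: dict[str, str] = {}


class InputGuardrailRequest(BaseModel):
    requestBody: CompletionCreateParams
    config: dict = {}
    context: RequestContext


class OutputGuardrailRequest(BaseModel):
    requestBody: CompletionCreateParams
    responseBody: ChatCompletion
    config: dict = {}
    context: RequestContext
```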
- Install dependencies:
pip install -r requirements.txt
- Run the server:
python main.py
Or using uvicorn directly:
uvicorn main:app --host 0.0.0.0 --port 8000 --reload
The server will start on http://localhost:8000
To deploy this guardrail server to Truefoundry, please refer to the official documentation: Getting Started with Deployment.
You can fork this repository and deploy it directly from your GitHub account using the Truefoundry platform. The documentation provides detailed instructions on connecting your GitHub repo and configuring the deployment.
For the latest and most accurate deployment steps, always consult the Truefoundry docs linked above.
Health check endpoint that returns server status.
PII redaction endpoint for validating and potentially transforming incoming OpenAI chat completion requests.
NSFW filtering endpoint for validating and potentially transforming outgoing OpenAI chat completion responses to filter inappropriate content.
Request Body:
{
"requestBody": {
"messages": [
{
"role": "user",
"content": "Hello, how are you?"
}
],
"model": "gpt-3.5-turbo",
"temperature": 0.7
},
"config": {
"check_content": true,
"transform_input": false
},
"context": {
"user": {
"subjectId": "123",
"subjectType": "user",
"subjectSlug": "john_doe@truefoundry.com",
"subjectDisplayName": "John Doe"
},
"metadata": {
"ip_address": "192.168.1.1",
"session_id": "abc123"
}
}
}
curl -X POST "http://localhost:8000/pii-redaction" \
-H "Content-Type: application/json" \
-d '{
"requestBody": {
"messages": [
{"role": "user", "content": "Hello world"}
],
"model": "gpt-3.5-turbo"
},
"config": {"check_content": true},
"context": {
"user": {
"subjectId": "123",
"subjectType": "user",
"subjectSlug": "john_doe@truefoundry.com",
"subjectDisplayName": "John Doe"
},
"metadata": {
"ip_address": "192.168.1.1",
"session_id": "abc123"
}
}
}'
curl -X POST "http://localhost:8000/pii-redaction" \
-H "Content-Type: application/json" \
-d '{
"requestBody": {
"messages": [
{"role": "user", "content": "Hello John, How are you?"}
],
"model": "gpt-3.5-turbo"
},
"config": {"transform_input": true},
"context": {"user": {"subjectId": "123", "subjectType": "user", "subjectSlug": "john_doe@truefoundry.com", "subjectDisplayName": "John Doe"}}
}'
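When PII is detected and `transform_input` is enabled, the endpoint returns the modified `ChatCompletionCreateParams`. For the request above, the redacted result might look like this (assuming Presidio's default anonymization, which replaces detected entities with placeholders such as `<PERSON>`):

```json
{
  "messages": [
    {"role": "user", "content": "Hello <PERSON>, How are you?"}
  ],
  "model": "gpt-3.5-turbo"
}
```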
curl -X POST "http://localhost:8000/nsfw-filtering" \
-H "Content-Type: application/json" \
-d '{
"requestBody": {
"messages": [
{
"role": "user",
"content": "Hello"
}
],
"model": "gpt-3.5-turbo"
},
"responseBody": {
"id": "chatcmpl-123",
"object": "chat.completion",
"created": 1677652288,
"model": "gpt-3.5-turbo",
"choices": [
{
"index": 0,
"message": {
"role": "assistant",
"content": "Hi, how are you?"
},
"finish_reason": "stop"
}
],
"usage": {
"prompt_tokens": 1,
"completion_tokens": 10,
"total_tokens": 11
}
},
"config": {
"transform_output": true
},
"context": {
"user": {
"subjectId": "123",
"subjectType": "user",
"subjectSlug": "john_doe@truefoundry.com",
"subjectDisplayName": "John Doe"
},
"metadata": {
"environment": "production"
}
}
}'
curl -X POST "http://localhost:8000/nsfw-filtering" \
-H "Content-Type: application/json" \
-d '{
"requestBody": {
"messages": [
{
"role": "user",
"content": "Tell me what word does we usually use for breasts?"
}
],
"model": "gpt-3.5-turbo"
},
"responseBody": {
"id": "chatcmpl-123",
"object": "chat.completion",
"created": 1677652288,
"model": "gpt-3.5-turbo",
"choices": [
{
"index": 0,
"message": {
"role": "assistant",
"content": "Usually we use the word 'boobs' for breasts"
},
"finish_reason": "stop"
}
],
"usage": {
"prompt_tokens": 1,
"completion_tokens": 10,
"total_tokens": 11
}
},
"config": {
"transform_output": true
},
"context": {
"user": {
"subjectId": "123",
"subjectType": "user",
"subjectSlug": "john_doe@truefoundry.com",
"subjectDisplayName": "John Doe"
},
"metadata": {
"environment": "production"
}
}
}'
The PII redaction endpoint uses Presidio to detect and remove Personally Identifiable Information (PII) from incoming messages, ensuring that sensitive information is anonymized before further processing. Link to the library: [Presidio](https://github.com/microsoft/presidio)
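As a rough illustration (not the template's exact implementation in `guardrail/pii_redaction_presidio.py`), Presidio's analyzer and anonymizer can be combined like this:

```python
from presidio_analyzer import AnalyzerEngine
from presidio_anonymizer import AnonymizerEngine

analyzer = AnalyzerEngine()
anonymizer = AnonymizerEngine()


def redact_pii(text: str) -> str:
    """Detect PII entities in the text and replace them with placeholders like <PERSON>."""
    results = analyzer.analyze(text=text, language="en")
    return anonymizer.anonymize(text=text, analyzer_results=results).text
```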
The NSFW filtering endpoint can be used to validate and optionally transform the response from the LLM before returning it to the client. If the output is transformed (e.g., content is modified or formatted), the endpoint returns the modified response body. The NSFW filtering uses the Unitary toxic classification model with configurable thresholds for toxicity, sexual content, and obscenity detection. Link to the model: [unitary/unbiased-toxic-roberta](https://huggingface.co/unitary/unbiased-toxic-roberta)
The modular architecture makes it easy to customize the guardrail logic:
- PII Redaction: Modify `guardrail/pii_redaction_presidio.py` to customize PII detection and redaction rules
- NSFW Filtering (Local): Modify `guardrail/nsfw_filtering_local_eval.py` to customize content filtering thresholds and rules
- Request/Response Models: Modify `entities.py` to add new fields or validation rules
Replace the example guardrail logic in the respective files with your own implementation. By default, the local NSFW filtering is configured as follows:
- Thresholds: 0.2 for toxicity, sexual_explicit, and obscene content
- Model: Unitary unbiased-toxic-roberta
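As a rough sketch of how these defaults can be applied (this standalone example is not the exact code in `guardrail/nsfw_filtering_local_eval.py`), the model can be loaded through the Hugging Face `transformers` pipeline and the per-label scores compared against the 0.2 thresholds:

```python
from transformers import pipeline

# Multi-label toxicity classifier used by the NSFW filter
classifier = pipeline(
    "text-classification",
    model="unitary/unbiased-toxic-roberta",
    top_k=None,  # return scores for every label, not just the top one
)

# Labels and thresholds documented above
THRESHOLDS = {"toxicity": 0.2, "sexual_explicit": 0.2, "obscene": 0.2}


def is_nsfw(text: str) -> bool:
    """Return True if any monitored label score exceeds its threshold."""
    # Multi-label model: apply a sigmoid per label rather than a softmax
    outputs = classifier(text, function_to_apply="sigmoid")
    # Depending on the transformers version, a single input returns either a flat
    # list of {label, score} dicts or a list nested one level deeper.
    items = outputs[0] if isinstance(outputs[0], list) else outputs
    scores = {item["label"]: item["score"] for item in items}
    return any(scores.get(label, 0.0) > limit for label, limit in THRESHOLDS.items())
```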
This section provides comprehensive guidance on how to add new Guardrails AI validators to your guardrail server.
Before adding Guardrails AI validators, ensure you have:
- Guardrails AI Token: Obtain a token from Guardrails AI
- Environment Setup: Set the `GUARDRAILS_TOKEN` environment variable
- Dependencies: Ensure `guardrails-ai` and `guardrails-ai[api]` are installed
To set up Guardrails AI, you need to define the following function in your `setup.py` file and ensure it is called before any other application logic (such as importing or running your FastAPI app):
# setup.py handles the Guardrails configuration
import os
import subprocess

# Token from the GUARDRAILS_TOKEN environment variable (see Prerequisites above)
GUARDRAILS_TOKEN = os.environ["GUARDRAILS_TOKEN"]


def setup_guardrails():
    # Configure the Guardrails CLI non-interactively with your token
    subprocess.run([
        "guardrails", "configure",
        "--disable-metrics",
        "--disable-remote-inferencing",
        "--token", GUARDRAILS_TOKEN
    ], check=True)
    # Install the validators used by this template from the Guardrails Hub
    subprocess.run([
        "guardrails", "hub", "install", "hub://guardrails/detect_pii"
    ], check=True)
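For example, a hypothetical entrypoint (here called `run.py`; adjust module names to your project layout) could run the setup before the FastAPI app is imported:

```python
# run.py (hypothetical entrypoint, not part of the template)
import uvicorn

from setup import setup_guardrails

# Configure Guardrails AI and install hub validators before importing the app,
# because the guardrail modules import validators from guardrails.hub at import time.
setup_guardrails()

if __name__ == "__main__":
    uvicorn.run("main:app", host="0.0.0.0", port=8000)
```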
This template includes example Guardrails AI validators to help you get started. You can use these as references when adding your own.
| Validator | Purpose | Hub URL | File |
|---|---|---|---|
| `DetectPII` | Detects Personally Identifiable Information | `hub://guardrails/detect_pii` | `pii_detection_guardrails_ai.py` |
| `MentionsDrugs` | Detects drug mentions in content | `hub://cartesia/mentions_drugs` | `drug_mention_guardrails_ai.py` |
| `WebSanitization` | Sanitizes web content and detects malicious code | `hub://guardrails/web_sanitization` | `web_sanitization_guardrails_ai.py` |
Use these examples as a template for integrating additional Guardrails AI validators into your project.
Add the validator installation to `setup.py`:
def setup_guardrails():
    # ... existing setup code ...

    # Add your new validator
    subprocess.run([
        "guardrails", "hub", "install", "hub://your-org/your-validator"
    ], check=True)
Create a new file in the `guardrail/` directory following this pattern:
For Input Validation (e.g., `guardrail/your_validator_guardrails_ai.py`):
from typing import Optional

from fastapi import HTTPException
from guardrails import Guard
from guardrails.hub import YourValidator  # Import your validator

from entities import InputGuardrailRequest

# Set up the Guard with the validator
guard = Guard().use(YourValidator, on_fail="exception")


def your_validator_function(request: InputGuardrailRequest) -> Optional[dict]:
    """
    Validate input using a Guardrails AI validator.

    Args:
        request: Input guardrail request containing messages to validate

    Returns:
        None if validation passes; raises HTTPException if validation fails
    """
    try:
        messages = request.requestBody.get("messages", [])
        for message in messages:
            if isinstance(message, dict) and message.get("content"):
                guard.validate(message["content"])
        return None
    except Exception as e:
        raise HTTPException(status_code=400, detail=str(e))
For Output Validation (e.g., `guardrail/your_output_validator_guardrails_ai.py`):
from typing import Optional

from fastapi import HTTPException
from guardrails import Guard
from guardrails.hub import YourOutputValidator  # Import your validator

from entities import OutputGuardrailRequest

# Set up the Guard with the validator
guard = Guard().use(YourOutputValidator, on_fail="exception")


def your_output_validator_function(request: OutputGuardrailRequest) -> Optional[dict]:
    """
    Validate output using a Guardrails AI validator.

    Args:
        request: Output guardrail request containing the response to validate

    Returns:
        None if validation passes; raises HTTPException if validation fails
    """
    try:
        for choice in request.responseBody.get("choices", []):
            if "content" in choice.get("message", {}):
                guard.validate(choice["message"]["content"])
        return None
    except Exception as e:
        raise HTTPException(status_code=400, detail=str(e))
Import and register your validator in `main.py`:
# Add import
from guardrail.your_validator_guardrails_ai import your_validator_function
# Add route
app.add_api_route("/your-endpoint", endpoint=your_validator_function, methods=["POST"])
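You can then call the new endpoint the same way as the built-in ones; the endpoint name and payload below are the placeholders from the steps above:

```bash
curl -X POST "http://localhost:8000/your-endpoint" \
  -H "Content-Type: application/json" \
  -d '{
    "requestBody": {
      "messages": [{"role": "user", "content": "Hello world"}],
      "model": "gpt-3.5-turbo"
    },
    "config": {},
    "context": {
      "user": {"subjectId": "123", "subjectType": "user", "subjectSlug": "john_doe@truefoundry.com", "subjectDisplayName": "John Doe"},
      "metadata": {}
    }
  }'
```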
- Error Handling: Always wrap validator calls in try/except blocks
- HTTP Status Codes: Use appropriate status codes (400 for validation failures, 500 for server errors)
- Logging: Consider adding logging for debugging and monitoring
- Testing: Test your validators with various inputs including edge cases
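For example, the error handling and logging practices above could be combined in a validator like this (a sketch with illustrative names, not code from the template):

```python
import logging

from fastapi import HTTPException

from entities import InputGuardrailRequest

logger = logging.getLogger(__name__)


def run_checks(text: str) -> None:
    """Placeholder for your validation logic; raise ValueError on failure."""
    if not text.strip():
        raise ValueError("empty message content")


def checked_validator(request: InputGuardrailRequest) -> None:
    try:
        for message in request.requestBody.get("messages", []):
            if isinstance(message, dict) and message.get("content"):
                run_checks(message["content"])
        return None
    except ValueError as e:
        # Validation failure: reject the request with a 400
        logger.warning("Guardrail validation failed: %s", e)
        raise HTTPException(status_code=400, detail=str(e))
    except Exception as e:
        # Unexpected error: surface it as a 500 so callers can tell it apart
        logger.exception("Guardrail server error")
        raise HTTPException(status_code=500, detail=str(e))
```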
Currently, only PII redaction and NSFW filtering endpoints are exposed. To add new guardrail functionality:
- Create a new guardrail implementation file in the `guardrail/` directory
- Follow the existing pattern for input or output validation
- Add the route to `main.py` using `app.add_api_route()`
- Update this README with the new endpoint documentation