Conversation

@V2arK V2arK (Contributor) commented Aug 29, 2025

Changes Made

SDK Updates (centml/sdk/api.py)

CServe v3 Support

  • ✅ Added CreateCServeV3DeploymentRequest import
  • ✅ Enhanced get_cserve() with auto-detection (tries V3 first, falls back to V2)
  • Changed create_cserve() to default to V3 API (breaking change)
  • ✅ Added backward compatibility with explicit create_cserve_v2() method
  • ✅ Enhanced update_cserve() with version validation and unified interface

Inference v3 Support

  • ✅ Added CreateInferenceV3DeploymentRequest import
  • ✅ Enhanced get_inference() with auto-detection (tries V3 first, falls back to V2; see the sketch after this list)
  • Changed create_inference() to default to V3 API
  • ✅ Added explicit create_inference_v2() and create_inference_v3() methods
  • ✅ Added update_inference() with version validation and unified interface
  • ✅ Added version detection utilities: detect_inference_deployment_version()
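
As a rough illustration, the V3-first / V2-fallback pattern behind get_cserve() and get_inference() looks like the sketch below (the endpoint method names and exception handling here are hypothetical stand-ins, not the actual platform client API):

def _get_inference_any_version(api, deployment_id: int):
    try:
        # Prefer the V3 endpoint: V3 deployments are only visible there.
        return api.get_inference_v3_deployment(deployment_id)   # hypothetical call
    except Exception:
        # Fall back to the V2 endpoint for legacy deployments.
        return api.get_inference_deployment(deployment_id)      # hypothetical call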

CLI Updates (centml/cli/cluster.py)

  • ✅ Added CSERVE_V3 and INFERENCE_V3 deployment type mappings
  • Made both cserve and inference commands default to V3
  • ✅ Implemented robust _get_replica_info() helper for field mapping (min_scale → min_replicas; see the sketch after this list)
  • ✅ Enhanced display logic for both V2 and V3 deployments with unified interface
  • ✅ Enhanced error handling for missing recipe properties
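
For reference, a minimal sketch of the field mapping that _get_replica_info() performs (assumed logic; the actual helper in centml/cli/cluster.py may differ in detail):

def _get_replica_info(deployment):
    # V3 deployments expose min_replicas / max_replicas ...
    if hasattr(deployment, "min_replicas") and deployment.min_replicas is not None:
        return {"min": deployment.min_replicas, "max": deployment.max_replicas}
    # ... while V2 deployments expose min_scale / max_scale.
    return {"min": deployment.min_scale, "max": deployment.max_scale}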

Examples

  • ✅ Updated create_cserve.py to use V3 by default with rollout strategy demonstration
  • ✅ Updated create_inference.py to use V3 by default with V3-specific features

Key Features

| Feature | V2 | V3 |
| --- | --- | --- |
| Field Names | min_scale, max_scale | min_replicas, max_replicas |
| Rollout Strategy | — | max_surge, max_unavailable |
| Initial Replicas | — | initial_replicas |
| Default API | ❌ (previously the default) | ✅ (new default) |
| Auto-Detection | ✅ | ✅ |
| CLI Command | Works via auto-detection | Default behavior |

🧪 Testing Results

🔬 Inference v3 Creation & Auto-Detection

$ python -c "SDK creation test for inference v3..."
Creating inference v3 deployment...
Created V3 deployment ID: 4237
Testing get_inference auto-detection with V3...
Retrieved deployment: nginx-v3-test
Deployment type: DeploymentType.INFERENCE_V3
Min replicas: 1
Max replicas: 2
Image URL: nginxinc/nginx-unprivileged
Port: 8080

✅ V3 SDK creation test successful! Deployment ID: 4237

🖥️ CLI Display - Inference v3

$ centml cluster get inference 4237
╭────────────┬────────────────────────────────────────╮
│ Name       │ nginx-v3-test                          │
│ Status     │ ready                                  │
│ Endpoint   │ nginx-v3-test.d691afed.c-09.centml.com │
│ Created at │ 2025-09-02 22:25:00                    │
│ Hardware   │ small (1x H200)                        │
│ Cost       │ 4.1 credits/hr                         │
╰────────────┴────────────────────────────────────────╯
Additional deployment configurations:
╭───────────────────────┬─────────────────────────────╮
│ Image                 │ nginxinc/nginx-unprivileged │
│ Container port        │ 8080                        │
│ Healthcheck           │ /                           │
│ Replicas              │ {'min': 1, 'max': 2}        │  ← V3 fields
│ Environment variables │ {'TEST': 'inference_v3'}    │
│ Max concurrency       │ 10                          │
╰───────────────────────┴─────────────────────────────╯

🔬 CServe v3 Creation & Testing

$ python -c "SDK creation test for cserve v3..."
Creating cserve v3 deployment...
Created V3 deployment ID: 4240
Testing get_cserve auto-detection with V3...
Retrieved deployment: cserve-v3-test
Deployment type: DeploymentType.CSERVE_V3
Model: microsoft/DialoGPT-small
Min replicas: 1
Max replicas: 2

✅ V3 CServe SDK creation test successful! Deployment ID: 4240

🖥️ CLI Display - CServe v3

$ centml cluster get cserve 4240
╭────────────┬─────────────────────────────────────────╮
│ Name       │ cserve-v3-test                          │
│ Status     │ unknown                                 │
│ Endpoint   │ cserve-v3-test.d691afed.c-09.centml.com │
│ Created at │ 2025-09-02 22:29:01                     │
│ Hardware   │ small (1x H200)                         │
│ Cost       │ 4.1 credits/hr                          │
╰────────────┴─────────────────────────────────────────╯
Additional deployment configurations:
╭────────────────────┬──────────────────────────────╮
│ Hugging face model │ microsoft/DialoGPT-small     │
│ Parallelism        │ {'tensor': 1, 'pipeline': 1} │
│ Replicas           │ {'min': 1, 'max': 2}         │  ← V3 fields
│ Max concurrency    │ 50                           │
╰────────────────────┴──────────────────────────────╯

🔬 Backward Compatibility - V2 Still Works

$ python -c "SDK creation test for inference v2..."
Creating inference v2 deployment...
Created V2 deployment ID: 4238
Testing get_inference auto-detection with V2...
Retrieved deployment: nginx-v2-test
Deployment type: DeploymentType.INFERENCE_V2
Min scale: 1          ← V2 terminology
Max scale: 2          ← V2 terminology

✅ V2 SDK creation test successful! Deployment ID: 4238

🔧 Version Detection Working

$ python -c "Version detection tests..."
Testing cserve version detection...
V3 Deployment 4240 detected as: v3
V2 Deployment 4241 detected as: v2

Testing deployment version detection from response objects...
V3 deployment response detected as: v3
V2 deployment response detected as: v2

Testing both deployments can be retrieved with get_cserve()...
V3 deployment name: cserve-v3-test, type: DeploymentType.CSERVE_V3
V2 deployment name: cserve-v2-test, type: DeploymentType.CSERVE_V2

✅ All version detection tests passed!
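
For illustration, a hedged usage sketch of the detection utility (assumed here to be callable with a deployment ID on the client; the tests above also show detection directly from response objects, and the real signature may differ):

from centml.sdk.api import get_centml_client

with get_centml_client() as cclient:
    version = cclient.detect_inference_deployment_version(4237)  # assumption: accepts a deployment ID
    print(f"Deployment 4237 detected as: {version}")             # expected: 'v3'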

📝 Update Functionality - V3 Updates Working

$ python -c "Update tests for inference v3..."
✅ Version detection test successful!
Detected version: v3

Testing update functionality (keeping same name)...
Updating deployment...
✅ Update successful!
Verifying update...
Max replicas: 3         # Updated from 2 → 3
Healthcheck: /health    # Updated from / → /health  
Concurrency: 20         # Updated from 10 → 20
Env vars: {'TEST': 'updated_v3', 'NEW_VAR': 'added'}

🛡️ Cross-Version Validation - Error Handling

$ python -c "Cross-version validation tests..."
Testing cross-version validation (should fail)...
✅ V2→V3 validation working correctly: Deployment 4238 is Inference V2, but you provided a V3 request. Please use CreateInferenceDeploymentRequest instead.

✅ V3→V2 validation working correctly: Deployment 4239 is Inference V3, but you provided a V2 request. Please use CreateInferenceV3DeploymentRequest instead.

✅ V2→V3 validation working correctly: Deployment 4241 is CServe V2, but you provided a V3 request. Please use CreateCServeV2DeploymentRequest instead.

✅ V3→V2 validation working correctly: Deployment 4240 is CServe V3, but you provided a V2 request. Please use CreateCServeV3DeploymentRequest instead.
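
For illustration, a minimal sketch of how a caller might handle this guard (the update_inference(deployment_id, request) signature and the ValueError type are assumptions for this sketch, not confirmed API):

def safe_update(cclient, deployment_id, request):
    try:
        return cclient.update_inference(deployment_id, request)  # assumed unified update signature
    except ValueError as err:
        # e.g. "Deployment 4238 is Inference V2, but you provided a V3 request ..."
        print(f"Version mismatch rejected: {err}")
        return None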

🔄 CLI After Updates - Field Mapping Working

$ centml cluster get inference 4237  # After V3 update
Additional deployment configurations:
╭───────────────────────┬────────────────────────────────────────────╮
│ Image                 │ nginxinc/nginx-unprivileged                │
│ Container port        │ 8080                                       │
│ Healthcheck           │ /health                                    │  ✅ Updated
│ Replicas              │ {'min': 1, 'max': 3}                       │  ✅ Updated
│ Environment variables │ {'TEST': 'updated_v3', 'NEW_VAR': 'added'} │  ✅ Updated  
│ Max concurrency       │ 20                                         │  ✅ Updated
╰───────────────────────┴────────────────────────────────────────────╯

$ centml cluster get cserve 4240  # After V3 update  
Additional deployment configurations:
╭────────────────────┬──────────────────────────────╮
│ Hugging face model │ microsoft/DialoGPT-small     │
│ Parallelism        │ {'tensor': 1, 'pipeline': 1} │
│ Replicas           │ {'min': 1, 'max': 3}         │  ✅ Updated from 2 → 3
│ Max concurrency    │ 100                          │  ✅ Updated from 50 → 100
╰────────────────────┴──────────────────────────────╯

🧹 Testing Summary

✅ Test Environment: Cluster 1011, Hardware Instance 1090 (same as deployment 4221)
✅ Test Deployments Created: 6 total (3 cserve: 4240-v3, 4241-v2, temp; 3 inference: 4237-v3, 4238-v2, temp)
✅ All deployments tested successfully and cleaned up
✅ Zero linting errors introduced
✅ All imports work correctly
✅ Auto-detection working seamlessly for both deployment types
✅ Cross-version validation prevents user errors with helpful messages

🎯 CServe & Inference V3 in Action - CLI & SDK Snippets

⚡ SDK Examples

1. Inference V3 with Rollout Strategy (New Default)

from centml.sdk.api import get_centml_client
from centml.sdk import CreateInferenceV3DeploymentRequest

with get_centml_client() as cclient:
    config = CreateInferenceV3DeploymentRequest(
        name='nginx-v3',
        cluster_id=1011,
        hardware_instance_id=1090,
        image_url='nginxinc/nginx-unprivileged',
        port=8080,
        min_replicas=1,      # V3 terminology
        max_replicas=3,      # V3 terminology
        initial_replicas=1,  # V3 field - initial number of replicas
        # V3 rollout strategy parameters
        max_surge=1,         # Allow 1 extra pod during updates
        max_unavailable=0,   # Keep all pods available during updates
        healthcheck='/health',
        concurrency=20,
    )
    
    # Uses V3 API by default now!
    response = cclient.create_inference(config)  
    print(f"✅ V3 Inference Deployment: {response.id}")

2. CServe V3 with Rollout Strategy (Default)

from centml.sdk.api import get_centml_client
from centml.sdk import CreateCServeV3DeploymentRequest, CServeV2Recipe

with get_centml_client() as cclient:
    config = CreateCServeV3DeploymentRequest(
        name='llm-zero-downtime',
        cluster_id=1011,
        hardware_instance_id=1090,
        recipe=CServeV2Recipe(model='microsoft/DialoGPT-small'),
        min_replicas=2,      # V3 terminology
        max_replicas=5,      # V3 terminology
        max_surge=1,         # V3 rollout parameter - allows 1 extra pod during updates
        max_unavailable=0,   # V3 rollout parameter - keep all pods available
        concurrency=100,
    )

    response = cclient.create_cserve(config)      # Uses V3 by default
    deployment = cclient.get_cserve(response.id)  # Auto-detects V3
    print(f"Replicas: min={deployment.min_replicas}, max={deployment.max_replicas}")
    # Output: Replicas: min=2, max=5

3. Auto-Detection in Action

from centml.sdk.api import get_centml_client

# Works seamlessly with both V2 and V3 deployments
def get_deployment_info(deployment_id):
    with get_centml_client() as cclient:
        # Single method works for both versions
        deployment = cclient.get_inference(deployment_id)  # Auto-detects V2/V3
        
        # Field mapping handled automatically
        replica_info = cclient._get_replica_info(deployment)
        print(f"Replicas: {replica_info}")
        
        # V3 deployment: {'min': 1, 'max': 3} from min_replicas/max_replicas
        # V2 deployment: {'min': 1, 'max': 3} from min_scale/max_scale

4. V2 Backward Compatibility

from centml.sdk import CreateInferenceDeploymentRequest, CreateCServeV2DeploymentRequest

# V2 still works with explicit methods
inference_v2 = CreateInferenceDeploymentRequest(
    name='legacy-inference',
    min_scale=1,      # V2 terminology
    max_scale=4,      # V2 terminology  
    # ... other fields
)
response = cclient.create_inference_v2(inference_v2)  # Explicit V2 call

cserve_v2 = CreateCServeV2DeploymentRequest(
    name='legacy-cserve',
    min_scale=1,      # V2 terminology
    max_scale=3,      # V2 terminology
    # ... other fields
)
response = cclient.create_cserve_v2(cserve_v2)  # Explicit V2 call

Migration Guide

For SDK Users

# Before (V2 was default)
CreateInferenceDeploymentRequest(min_scale=1, max_scale=5)
CreateCServeV2DeploymentRequest(min_scale=1, max_scale=5)

# After (V3 is now default) 
CreateInferenceV3DeploymentRequest(
    min_replicas=1, max_replicas=5,    # V3 terminology
    max_surge=1, max_unavailable=0     # New V3 rollout capabilities
)
CreateCServeV3DeploymentRequest(
    min_replicas=1, max_replicas=5,    # V3 terminology  
    max_surge=1, max_unavailable=0     # New V3 rollout capabilities
)

For CLI Users

# Both inference and cserve now default to V3
centml cluster get inference 123   # Auto-detects V2/V3
centml cluster get cserve 456      # Auto-detects V2/V3

# Field mapping works seamlessly for both versions
# V2: Shows {'min': 1, 'max': 3} from min_scale/max_scale
# V3: Shows {'min': 1, 'max': 3} from min_replicas/max_replicas

Rollout Strategy Note

V3 rollout parameters (max_surge, max_unavailable) are accepted during creation and influence platform orchestration behavior for zero-downtime deployments, but are not returned in deployment responses (by platform design).

Backward Compatibility

  • ✅ All existing V2 workflows continue to work without changes
  • ✅ V2 API methods remain available (create_inference_v2, create_cserve_v2)
  • ✅ Auto-detection seamlessly handles both V2 and V3 deployments
  • ✅ No breaking changes to existing V2 deployments
  • ✅ CLI provides unified interface regardless of deployment version

The test outputs above demonstrate that:

  • ✅ Both v2 and v3 deployments create successfully for both CServe and Inference
  • ✅ Auto-detection works seamlessly across all deployment types and versions
  • ✅ CLI displays all versions using unified field mapping
  • ✅ Updates work correctly and are reflected in CLI immediately
  • ✅ Cross-version validation prevents errors with helpful messages
  • ✅ All functionality tested end-to-end with real deployments on production hardware

@V2arK V2arK marked this pull request as draft August 29, 2025 19:29
@V2arK V2arK self-assigned this Aug 29, 2025
@V2arK V2arK marked this pull request as ready for review August 29, 2025 19:36
@V2arK V2arK requested a review from anandj91 September 2, 2025 22:41
@V2arK V2arK force-pushed the honglin/cserve_v3 branch from 57ae722 to 7924f56 on September 2, 2025 22:48
@anandj91 anandj91 requested a review from michaelshin September 5, 2025 20:40
@michaelshin michaelshin (Contributor) left a comment

Looks good with me

Comment on lines 73 to 78
def create_inference_v2(self, request: CreateInferenceDeploymentRequest):
return self._api.create_inference_deployment_deployments_inference_post(request)

def create_inference_v3(self, request: CreateInferenceV3DeploymentRequest):
return self._api.create_inference_v3_deployment_deployments_inference_v3_post(request)

we don't need these two. we should only allow creating v3 deployments

@anandj91 anandj91 merged commit ec6bf4a into main Sep 10, 2025
3 of 4 checks passed
@anandj91 anandj91 deleted the honglin/cserve_v3 branch September 10, 2025 18:36