Deploying ML models with Modal and Ouro

Deploy your machine learning model to Modal and add it to Ouro for others to use

11 minute read

Last updated August 12, 2025

This guide demonstrates how to deploy any type of ML model to Modal's serverless platform and create APIs ready for Ouro integration. Modal provides serverless GPU infrastructure that automatically scales from zero to thousands of containers, which makes it well suited to hosting ML models that you share on Ouro.

Add your model to Ouro for a user-friendly interface, rate limiting, monetization, and integration with datasets and files across the Ouro platform.

Deploying to Modal

Prerequisites and setup

Before starting, you'll need a Modal account and a basic Python development environment.

Install the Modal CLI:

pip install modal
modal setup

Modal uses GitHub for authentication. Run modal setup and follow the prompts to connect your GitHub account.

Core workflow

The workflow has five steps: prepare a container image, expose FastAPI endpoints (which also gives you an OpenAPI spec), add simple authentication for Ouro, deploy with modal deploy, and monitor the result. Modal handles infrastructure and scaling; you focus on code and functionality.

Model containerization and environment setup

Modal uses a declarative approach to define container environments. Create a base image with your model's dependencies:

import modal
 
# Define container image with ML dependencies
ml_image = (
    modal.Image.debian_slim(python_version="3.12")
    .pip_install(
        "fastapi[standard]==0.115.4",
        "torch>=2.0.0",
        "torchvision",
        "transformers>=4.30.0",
        "scikit-learn>=1.3.0",
        "tensorflow>=2.13.0",
        "numpy>=1.24.0",
        "pandas>=2.0.0",
        "pillow>=10.0.0",
        "pydantic>=2.0.0"
    )
    .apt_install("git", "wget", "curl")
)
 
app = modal.App("ml-model-api", image=ml_image)

Modal caches images, so repeated deploys with the same deps are fast. If you have model artifacts, you can download them once to a persistent volume and mount it to your container.

Modal's guide is a great resource for a more detailed example.
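
The download-once idea can be sketched with the standard library alone. This is a minimal illustration, assuming cache_dir is the path where a Modal volume is mounted (for example /cache in the PyTorch section of this guide); ensure_cached and the download callback are hypothetical names, not part of Modal's API.

```python
from pathlib import Path
from typing import Callable


def ensure_cached(cache_dir: str, filename: str, download: Callable[[Path], None]) -> Path:
    """Return the cached artifact path, calling `download` only when the file is missing."""
    target = Path(cache_dir) / filename
    if not target.exists():
        target.parent.mkdir(parents=True, exist_ok=True)
        # Write to a temporary name first so an interrupted download never
        # leaves a half-written file that looks like a valid cache entry.
        partial = target.with_name(target.name + ".partial")
        download(partial)
        partial.replace(target)
    return target
```

Inside a container, a call such as ensure_cached("/cache", "pytorch_model.pt", fetch_weights) would run the hypothetical fetch_weights callback only on the first start, provided the mounted volume persists the write.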

FastAPI and OpenAPI

FastAPI generates an OpenAPI spec that Ouro can consume.

  • JSON https://your-modal-app.modal.run/openapi.json
  • Docs https://your-modal-app.modal.run/docs
  • ReDoc https://your-modal-app.modal.run/redoc

We recommend copying the JSON from the /openapi.json endpoint into a .json file for Ouro integration.
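
Copying the spec by hand works, but the same step can be scripted. The sketch below uses only the standard library; the function names are illustrative, and the URL in the usage comment is the placeholder from this guide, not a real deployment.

```python
import json
import urllib.request
from pathlib import Path


def openapi_url(base_url: str) -> str:
    """Build the spec URL from the app's base URL."""
    return base_url.rstrip("/") + "/openapi.json"


def save_spec(spec: dict, path: str = "openapi.json") -> Path:
    """Check for the top-level keys FastAPI emits, then write the spec to disk."""
    missing = [key for key in ("openapi", "info", "paths") if key not in spec]
    if missing:
        raise ValueError(f"Not an OpenAPI document, missing: {missing}")
    out = Path(path)
    out.write_text(json.dumps(spec, indent=2), encoding="utf-8")
    return out


def fetch_spec(base_url: str, timeout: float = 30.0) -> dict:
    """Download and parse the spec from a running deployment."""
    with urllib.request.urlopen(openapi_url(base_url), timeout=timeout) as resp:
        return json.load(resp)


# Usage (placeholder URL):
# save_spec(fetch_spec("https://your-modal-app.modal.run"))
```

The key check is only a sanity test that the download is a spec and not an error page; it does not validate the document against the OpenAPI schema.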

Header-based authentication for Ouro integration

To prevent others from calling your API without permission, add header-based authentication that Ouro can satisfy. You can skip this if the API is free and you don't need to control who uses it.

In your service's settings on Ouro, set authentication to Ouro and store the secret it issues as OURO_SERVICE_SECRET in your Modal secrets. The dependency below then checks every request for that secret:

from fastapi import FastAPI, Header, HTTPException, Depends, Request
import hmac
import os
import time
import logging

# Authentication utilities
async def validate_ouro_authentication(
    request: Request,
    authorization: str = Header(None, alias="Authorization"),
):
    """Validate Ouro platform authentication using `Authorization: Basic <token>` header"""

    # Required header: Authorization: Basic <token>
    if not authorization or not authorization.lower().startswith("basic "):
        client_host = request.client.host if request.client else "unknown"
        logging.warning(f"Missing or invalid Authorization header from {client_host}")
        raise HTTPException(status_code=401, detail="Missing Ouro auth header")

    token = authorization.split(" ", 1)[1].strip()

    # Validate against stored token
    expected_secret = os.environ.get("OURO_SERVICE_SECRET")

    if not expected_secret:
        logging.error("Missing env var OURO_SERVICE_SECRET for Ouro authentication")
        raise HTTPException(status_code=500, detail="Server configuration error")

    # Constant-time comparison avoids leaking the secret through response timing
    if not hmac.compare_digest(token.encode("utf-8"), expected_secret.encode("utf-8")):
        logging.warning("Invalid Ouro token")
        raise HTTPException(status_code=401, detail="Invalid Ouro credentials")

    # Log successful authentication
    logging.info("Successful Ouro authentication")

    return {"platform": "ouro", "authenticated_at": time.time()}

# Apply authentication to endpoints
# (web_app, PredictionInput, and process_ml_inference are defined elsewhere in your app)
@web_app.post("/inference/ouro")
async def inference_for_ouro(
    request: PredictionInput,
    auth: dict = Depends(validate_ouro_authentication)
):
    """ML inference endpoint specifically for Ouro integration"""
    try:
        # Your ML processing logic
        result = await process_ml_inference(request)
 
        return {
            **result,
            "authenticated_via": auth["platform"],
            "request_id": f"ouro_{int(time.time())}"
        }
 
    except Exception as e:
        logging.error(f"Ouro inference error: {str(e)}")
        raise HTTPException(status_code=500, detail="Inference processing failed")
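
The header check itself does not depend on FastAPI, so it can be pulled into plain functions and unit-tested without starting a server. A minimal sketch, assuming the same `Authorization: Basic <token>` convention described above; extract_token and token_matches are hypothetical helper names.

```python
import hmac
from typing import Optional


def extract_token(authorization: Optional[str]) -> Optional[str]:
    """Return the token from an `Authorization: Basic <token>` value, or None if malformed."""
    if not authorization:
        return None
    scheme, _, token = authorization.partition(" ")
    token = token.strip()
    if scheme.lower() != "basic" or not token:
        return None
    return token


def token_matches(authorization: Optional[str], expected_secret: str) -> bool:
    """Compare the presented token with the secret in constant time."""
    token = extract_token(authorization)
    if token is None:
        return False
    return hmac.compare_digest(token.encode("utf-8"), expected_secret.encode("utf-8"))
```

A dependency like validate_ouro_authentication could call token_matches and raise the 401 when it returns False, which keeps the parsing rules in one testable place.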

PyTorch model deployment pattern

PyTorch models often benefit from GPU resources and proper state management. Here's a production-ready pattern:

# `app` and `ml_image` come from the image definition earlier in this guide
import logging
import time
from typing import List, Optional

import modal
import torch
from fastapi import FastAPI, HTTPException
from pydantic import BaseModel, Field
 
# Request/Response schemas for OpenAPI generation
class PredictionInput(BaseModel):
    features: List[float] = Field(..., description="Input feature vector")
    model_version: Optional[str] = Field("v1.0", description="Model version to use")
 
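    # Pydantic v2 style config. protected_namespaces=() silences the warning
    # some Pydantic v2 releases emit for field names starting with "model_".
    model_config = {
        "protected_namespaces": (),
        "json_schema_extra": {
            "example": {
                "features": [1.0, 2.0, 3.0, 4.0],
                "model_version": "v1.0"
            }
        },
    }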
 
class PredictionOutput(BaseModel):
    prediction: float = Field(..., description="Model prediction")
    confidence: float = Field(..., description="Prediction confidence score")
    model_version: str = Field(..., description="Model version used")
    inference_time_ms: float = Field(..., description="Inference time in milliseconds")
 
@app.cls(
    image=ml_image,
    gpu="T4",  # Use T4 GPU for cost-effective inference
    memory=4096,  # 4GB RAM
    volumes={"/cache": modal.Volume.from_name("model-cache", create_if_missing=True)},
    secrets=[modal.Secret.from_name("ml-model-secrets")]
)
class PyTorchModelService:
    @modal.enter()
    def load_model(self):
        """Load model once when container starts"""
        import time
        start_time = time.time()
 
        # Load model from cache or download
        model_path = "/cache/pytorch_model.pt"
        self.device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
 
        try:
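            # The cache holds a fully pickled module, so weights_only must be
            # False on recent PyTorch releases; only load files you trust.
            self.model = torch.load(model_path, map_location=self.device, weights_only=False)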
        except FileNotFoundError:
            # Download model if not cached
            self.model = self._download_and_cache_model(model_path)
 
        self.model.eval()
        self.model_version = "v1.0"
 
        load_time = time.time() - start_time
        logging.info(f"Model loaded in {load_time:.2f}s on {self.device}")
 
    def _download_and_cache_model(self, cache_path: str):
        """Download model and save to cache"""
        # Implementation depends on your model source
        model = torch.nn.Linear(4, 1)  # Example model
        torch.save(model, cache_path)
        return model
 
    @modal.method()
    def predict(self, input_data: PredictionInput) -> PredictionOutput:
        import time
        start_time = time.time()
 
        try:
            # Prepare input tensor
            features = torch.FloatTensor([input_data.features]).to(self.device)
 
            # Perform inference
            with torch.no_grad():
                output = self.model(features)
                prediction = output.item()
 
                # Calculate confidence using sigmoid
                confidence = torch.sigmoid(output).item()
 
            inference_time = (time.time() - start_time) * 1000  # Convert to ms
 
            return PredictionOutput(
                prediction=prediction,
                confidence=confidence,
                model_version=self.model_version,
                inference_time_ms=inference_time
            )
 
        except Exception as e:
            logging.error(f"Prediction error: {str(e)}")
            raise HTTPException(status_code=500, detail=f"Inference failed: {str(e)}")
 
    @modal.asgi_app()
    def create_fastapi_app(self):
        web_app = FastAPI(
            title="PyTorch ML Model API",
            description="Production PyTorch model serving with Modal",
            version="1.0.0",
            docs_url="/docs",
            openapi_url="/openapi.json"
        )
 
        @web_app.post("/predict", response_model=PredictionOutput)
        async def predict_endpoint(request: PredictionInput):
            # Run the method in this container instead of dispatching a remote call
            return self.predict.local(request)
 
        @web_app.get("/health")
        async def health_check():
            return {"status": "healthy", "model_loaded": True, "device": str(self.device)}
 
        return web_app

Key concepts: stateful model load, GPU detection, volume cache, structured errors.
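
One detail in the pattern above deserves a note: the confidence value is the sigmoid of the raw model output. That reads as a probability only when the model was trained to emit a logit, as in binary classification; for a regression head it is just a squashed copy of the prediction. The scalar sketch below shows what torch.sigmoid computes, written with the standard library and arranged so large inputs do not overflow.

```python
import math


def sigmoid(x: float) -> float:
    """Numerically stable logistic function, the scalar equivalent of torch.sigmoid."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    # For negative inputs, rewrite the formula so exp() never overflows.
    z = math.exp(x)
    return z / (1.0 + z)
```

If your model outputs something other than a logit, it is usually clearer to drop the confidence field or compute it from a quantity your model actually calibrates.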

Deployment commands and workflow

Development workflow: Run modal serve to start a development server on Modal with a temporary URL and live reload, then send a test request against your Ouro-authenticated route.

# Start development server with live reload
modal serve app.py

You can test your endpoints with tools like Postman, curl, or add a private version of your API to Ouro and test it there.
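
For a scripted check, the request can also be built in Python. This is a sketch under stated assumptions: the route is the /inference/ouro endpoint from the authentication section, the body follows the PredictionInput schema, and the URL and token in the usage comment are placeholders.

```python
import json
import urllib.request
from typing import List


def build_inference_request(base_url: str, token: str, features: List[float]) -> urllib.request.Request:
    """Build a POST to /inference/ouro carrying the Authorization: Basic <token> header."""
    body = json.dumps({"features": features, "model_version": "v1.0"}).encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/inference/ouro",
        data=body,
        method="POST",
        headers={
            "Authorization": f"Basic {token}",
            "Content-Type": "application/json",
        },
    )


# Usage against a running `modal serve` session (placeholder URL and token):
# req = build_inference_request("https://your-modal-app.modal.run", "your-ouro-secret", [1.0, 2.0, 3.0, 4.0])
# with urllib.request.urlopen(req, timeout=60) as resp:
#     print(resp.status, json.load(resp))
```

Sending the same request without the header, or with a wrong token, should come back as a 401 if the dependency is wired up correctly.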

Production deployment: Deploy to production, then use logs and the Modal dashboard to monitor behavior and costs.

# Deploy to production
modal deploy app.py

# Stream logs for the deployed app (takes the app name, not the file)
modal app logs ml-model-api

# List deployed apps and their current state
modal app list

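Secrets management: Store your Ouro credentials and other tokens as Modal secrets. A secret only reaches your code if its name matches the one you pass to modal.Secret.from_name in your app or class definition.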

# Create secrets for Ouro integration
modal secret create ouro-api-credentials \
  OURO_API_KEY=your-ouro-key \
  OURO_SERVICE_SECRET=your-ouro-secret

OpenAPI notes

OpenAPI is auto-generated; use the JSON at /openapi.json when integrating with Ouro. Copy the object at this endpoint and paste it into an openapi.json file. You'll be able to upload this file directly to Ouro to import your API.

Monitoring and cost optimization

Use Modal dashboards for metrics, logs, resource usage, and cost by function. Tune resources and scaling for savings:

# Optimize for cost-effectiveness
@app.function(
    # Choose appropriate GPU for workload
    gpu="T4",          # Most cost-effective for small models
    # gpu="A100",      # For large models requiring more VRAM

    # Optimize scaling parameters
    min_containers=0,          # Scale to zero when idle
    scaledown_window=60,       # Seconds an idle container waits before shutting down
    max_containers=10,         # Limit maximum spend

    # Optimize resource allocation
    memory=2048,               # Only allocate needed memory (MiB)
    cpu=2                      # Adjust CPU cores based on workload
)
def run_inference(features: list[float]) -> float:
    # Hypothetical placeholder; the decorator needs a function to attach to
    raise NotImplementedError("replace with your model call")

Add your Modal model to Ouro

Publish your Modal-hosted model on Ouro to monetize per request. Ouro helps make your model discoverable, usable, and monetizable.

How Ouro works

Ouro is a collaborative marketplace for APIs, alongside datasets and files. Its asset-linking system connects your API to relevant resources for discovery and workflows.

Optimize your Modal deployment for Ouro

Modify your Modal function to use FastAPI with comprehensive OpenAPI documentation. Here's the recommended pattern:

import modal
from fastapi import FastAPI, HTTPException
from pydantic import BaseModel
from typing import Optional, List
 
app = modal.App("ml-model-ouro")
web_app = FastAPI(
    title="Your ML Model API",
    description="Detailed description of your model's capabilities, use cases, and limitations",
    version="1.0.0",
    contact={
        "name": "Your Name",
        "email": "your.email@example.com"
    }
)
 
class PredictionRequest(BaseModel):
    input_data: List[float]
    model_parameters: Optional[dict] = None
 
    # Pydantic v2 renamed schema_extra to json_schema_extra
    model_config = {
        "json_schema_extra": {
            "example": {
                "input_data": [1.0, 2.5, 3.2],
                "model_parameters": {"temperature": 0.7}
            }
        }
    }
 
class PredictionResponse(BaseModel):
    prediction: float
    confidence: float
    model_version: str
    processing_time_ms: float
 
@web_app.post("/predict", response_model=PredictionResponse)
async def predict(request: PredictionRequest):
    """
    Generate predictions using the ML model.
 
    This endpoint processes input data through our trained model
    and returns predictions with confidence scores.
    """
    try:
        # Your model inference logic here
        result = your_model_function(request.input_data)
 
        return PredictionResponse(
            prediction=result["prediction"],
            confidence=result["confidence"],
            model_version="1.0.0",
            processing_time_ms=result["processing_time"]
        )
    except Exception as e:
        raise HTTPException(status_code=400, detail=f"Prediction failed: {str(e)}")
 
@app.function(
    image=modal.Image.debian_slim().pip_install(["fastapi", "your-ml-dependencies"]),
    scaledown_window=300  # current name for the older container_idle_timeout
)
@modal.asgi_app()
def fastapi_app():
    return web_app

Deploy this to Modal and note your endpoint URL (e.g., https://your-org--ml-model-ouro-fastapi-app.modal.run).
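
The example URL above follows a visible pattern: workspace, two hyphens, then the app name and function name joined by a hyphen, with underscores turned into hyphens. The helper below only reproduces that pattern as an assumption read off the example; the URL printed by modal deploy is the one to trust, and the function name here is hypothetical.

```python
def expected_modal_url(workspace: str, app_name: str, function_name: str) -> str:
    """Guess the web URL by following the pattern of the example in this guide."""
    label = f"{app_name}-{function_name}".replace("_", "-").lower()
    return f"https://{workspace}--{label}.modal.run"
```

This can be handy for filling in the Base URL field before the first deploy finishes, as long as you confirm it against the deploy output afterwards.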

Enhanced OpenAPI specification for Ouro integration

Ouro provides utilities to enhance your OpenAPI specification with custom metadata. Use the ouro-py package to add Ouro-specific extensions:

from fastapi import FastAPI, Header
from fastapi.openapi.utils import get_openapi
from ouro.utils import get_custom_openapi, ouro_field
from typing import Optional
 
web_app = FastAPI(
    title="Your ML Model API",
    description="Production ML model serving with Ouro integration",
    version="1.0.0"
)
 
# Apply Ouro's custom OpenAPI enhancements
web_app.openapi = get_custom_openapi(web_app, get_openapi)
 
@web_app.post("/predict")
@ouro_field("x-ouro-input-asset-type", "file")      # Specify input type
@ouro_field("x-ouro-output-asset-type", "file")  # Specify output type
async def predict(
    request: PredictionRequest,
    # Ouro passes these headers to identify context
    ouro_route_id: Optional[str] = Header(None, alias="ouro-route-id"),
    ouro_route_org_id: Optional[str] = Header(None, alias="ouro-route-org-id"),
    ouro_route_team_id: Optional[str] = Header(None, alias="ouro-route-team-id"),
    ouro_action_id: Optional[str] = Header(None, alias="ouro-action-id"),
):
    """
    Process input and generate predictions.
 
    The Ouro headers provide context about:
    - ouro_route_id: The service route being accessed
    - ouro_route_org_id: Organization ID for asset creation context
    - ouro_route_team_id: Team ID for asset creation context
    - ouro_action_id: Unique action ID for logging and tracking
    """
    from ouro import Ouro
    import os
 
    # Initialize Ouro SDK with your API key
    ouro = Ouro(api_key=os.environ["OURO_API_KEY"])
 
    # Log progress to Ouro's action system
    if ouro_action_id:
        ouro.client.post(
            f"/actions/{ouro_action_id}/log",
            json={
                "message": "Starting model inference...",
                "asset_id": ouro_route_id,
                "level": "info",
            }
        )
 
    # Your model inference logic here
    result = your_model_function(request)

    # Serialize the result (assumed JSON-serializable) and base64-encode it
    import base64
    import json
    encoded = base64.b64encode(json.dumps(result).encode("utf-8")).decode("ascii")

    # Return result with Ouro metadata for asset creation
    return {
        "prediction": result,
        "file": {  # This creates a file asset in Ouro
            "name": "Prediction output",
            "description": "Model prediction output",
            "filename": "prediction.json",
            "type": "application/json",
            "extension": "json",
            "base64": encoded,
            "org_id": ouro_route_org_id,  # Use Ouro context
            "team_id": ouro_route_team_id,
        }
    }

The ouro_field decorator adds custom OpenAPI extensions that enable:

  • Asset type specification: Define input/output asset types (file, dataset, post, etc.)
  • Asset filtering: Specify compatible input asset formats
  • Automatic integration: Your API appears in relevant asset workflows

Access your enhanced OpenAPI spec at https://your-modal-endpoint/openapi.json.
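
The file object returned by the endpoint above can be assembled by a small helper, which also makes the base64 step explicit. This is a sketch using only the standard library; build_file_asset is a hypothetical name, the keys simply mirror the response shown in this guide, and the result is assumed to be JSON-serializable.

```python
import base64
import json
from typing import Any, Optional


def build_file_asset(result: Any, name: str, org_id: Optional[str], team_id: Optional[str]) -> dict:
    """Wrap a JSON-serializable result in the `file` object shape used in this guide."""
    raw = json.dumps(result).encode("utf-8")
    return {
        "name": name,
        "description": "Model prediction output",
        "filename": "prediction.json",
        "type": "application/json",
        "extension": "json",
        # base64 turns the raw bytes into text that is safe inside a JSON response
        "base64": base64.b64encode(raw).decode("ascii"),
        "org_id": org_id,
        "team_id": team_id,
    }
```

Decoding the base64 field and parsing it as JSON should give back the original result, which is a quick way to verify the payload before handing it to Ouro.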

Account setup and API configuration

Create your Ouro account and configure API access:

  1. Account registration: Sign up at ouro.foundation
  2. API key generation: Navigate to Settings → API Keys to create an API key for the Ouro SDK
  3. Store API key in Modal: Add your Ouro API key as a Modal secret:
    modal secret create ouro-credentials OURO_API_KEY=your-api-key-here

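As an additional, best-effort check, you can allowlist Ouro's domains in your Modal function. The Origin and Referer headers are set by the client and can be spoofed, so keep the token check above as the real access control:

from urllib.parse import urlparse

from fastapi import Request
from fastapi.responses import JSONResponse

ALLOWED_DOMAINS = ["api.ouro.foundation", "ouro.foundation"]

def _host_allowed(header_value: str) -> bool:
    # Compare the parsed hostname; a substring test would also accept
    # hosts such as "ouro.foundation.example.com".
    host = urlparse(header_value).hostname or ""
    return any(host == d or host.endswith("." + d) for d in ALLOWED_DOMAINS)

@web_app.middleware("http")
async def verify_origin(request: Request, call_next):
    origin = request.headers.get("origin", "")
    referer = request.headers.get("referer", "")

    # Allow requests without origin (direct API calls) and Ouro platform access
    if (not origin and not referer) or _host_allowed(origin) or _host_allowed(referer):
        return await call_next(request)

    # Return the response directly: an HTTPException raised inside middleware
    # bypasses FastAPI's exception handlers and surfaces as a 500.
    return JSONResponse(status_code=403, content={"detail": "Access denied"})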

Creating your service on Ouro

Navigate to Ouro's service creation interface:

  1. Access service creation: Go to ouro.foundation/services/create
  2. Upload OpenAPI spec: Upload your OpenAPI JSON file
  3. Service configuration:
    • Name: Choose a descriptive, discoverable name
    • Description: Detailed explanation of your model's capabilities, use cases, and performance characteristics
    • Team: Select the team you want to add the service to
    • Base URL: Your Modal endpoint URL
    • Authentication: Choose Ouro authentication. The platform issues a secret token; validate incoming requests using the Authorization: Basic <token> header (as shown above).

Ouro automatically extracts all endpoint request and response schemas from your OpenAPI specification. You'll still be able to modify all configuration after importing your API using the UI.
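
Before uploading, it can help to preview which routes the spec actually contains. The sketch below walks the paths object of an OpenAPI document with the standard library; list_routes is a hypothetical helper, and it only approximates what an importer would see, since Ouro's own parsing is not documented here.

```python
import json
from typing import Iterator, Tuple

HTTP_METHODS = {"get", "post", "put", "patch", "delete", "head", "options"}


def list_routes(spec: dict) -> Iterator[Tuple[str, str, str]]:
    """Yield (METHOD, path, summary) for every operation in an OpenAPI document."""
    for path, item in sorted(spec.get("paths", {}).items()):
        for method, operation in item.items():
            # Path items can also hold non-operation keys such as "parameters"
            if method.lower() in HTTP_METHODS:
                yield method.upper(), path, operation.get("summary", "")


# Usage: preview the file saved from /openapi.json
# with open("openapi.json", encoding="utf-8") as fh:
#     for method, path, summary in list_routes(json.load(fh)):
#         print(f"{method:6} {path}  {summary}")
```

Routes you do not want to publish, such as a health check, are easier to spot in this listing than in the raw JSON.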

Monetization setup

Anyone can start monetizing their APIs on Ouro. Configure pricing on each route by editing the route:

  • Access control: Set to monetized
  • Pay-per-use: Set per-request pricing
  • Usage quotas: Optional rate limiting and usage caps

Now, anytime someone uses your API, they'll be charged and you'll receive a Bitcoin payment to your wallet.

Community and discoverability

Adding your API is just the beginning. Maximize your API's success through Ouro's collaborative features:

Join relevant teams:

  • Machine Learning Team: Core community for ML practitioners
  • Domain-specific teams: Join teams related to your API's application area (finance, healthcare, etc.)
  • Data Science Team: Broader community interested in data processing APIs

Create supporting content:

  • Posts, example workflows with datasets, tutorials, and Q&A

Asset integration strategy:

  • Link to relevant datasets, build example workflows, and cross-promote with dataset owners

Integration patterns and best practices

Recommended architecture: A quick view of the request flow through Ouro to your Modal service.

User Request → Ouro Platform → Your Modal API → ML Model → Response

Advanced features and asset integration

Custom asset workflows: Your ML API can automatically appear in Ouro's asset transformation recommendations. When users have compatible assets, your API will be suggested in the sidebar for data processing workflows.

With these steps, your Modal model can be published, discovered, and monetized on Ouro.

