Deploying ML models with Modal and Ouro
Deploy your machine learning model to Modal and add it to Ouro for others to use
Last updated August 12, 2025
This guide demonstrates how to deploy an ML model to Modal's serverless platform and expose it through an API that is ready for Ouro integration. Modal provides serverless GPU infrastructure that automatically scales from zero to thousands of containers, which makes it well suited to hosting ML models for sharing on Ouro.
Add your model to Ouro for a user-friendly interface, rate limiting, monetization, and integration with datasets and files across the Ouro platform.

Deploying to Modal
Prerequisites and setup
Before starting, you'll need a Modal account and a basic Python development environment.
Install the Modal CLI:
pip install modal
modal setup
Modal uses GitHub for authentication. Run modal setup and follow the prompts to connect your GitHub account.
Core workflow
Prepare a container image, expose FastAPI endpoints (with OpenAPI), add simple auth for Ouro, deploy with modal deploy, then monitor. Modal handles infrastructure and scaling; you focus on code and functionality.
Model containerization and environment setup
Modal uses a declarative approach to define container environments. Create a base image with your model's dependencies:
import modal

# Define container image with ML dependencies
ml_image = (
    modal.Image.debian_slim(python_version="3.12")
    .pip_install(
        "fastapi[standard]==0.115.4",
        "torch>=2.0.0",
        "torchvision",
        "transformers>=4.30.0",
        "scikit-learn>=1.3.0",
        "tensorflow>=2.13.0",
        "numpy>=1.24.0",
        "pandas>=2.0.0",
        "pillow>=10.0.0",
        "pydantic>=2.0.0",
    )
    .apt_install("git", "wget", "curl")
)

app = modal.App("ml-model-api", image=ml_image)
Modal caches images, so repeated deploys with the same deps are fast. If you have model artifacts, you can download them once to a persistent volume and mount it to your container.
Modal's guide is a great resource for a more detailed example.
FastAPI and OpenAPI
FastAPI generates an OpenAPI spec that Ouro can consume.
- JSON: https://your-modal-app.modal.run/openapi.json
- Docs: https://your-modal-app.modal.run/docs
- ReDoc: https://your-modal-app.modal.run/redoc
We recommend copying the JSON from the /openapi.json endpoint into a .json file for Ouro integration.
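Copying the spec by hand works, but the same step can be scripted. The sketch below is one possible approach using only the standard library; the function names `fetch_text` and `save_openapi_spec` are hypothetical, and the URL you pass in is whatever endpoint Modal assigned to your app.

```python
import json
from pathlib import Path
from typing import Callable
from urllib.request import urlopen


def fetch_text(url: str) -> str:
    """Download a URL and return the response body as text."""
    with urlopen(url, timeout=30) as response:
        return response.read().decode("utf-8")


def save_openapi_spec(
    base_url: str,
    out_path: str = "openapi.json",
    fetch: Callable[[str], str] = fetch_text,
) -> dict:
    """Fetch <base_url>/openapi.json, sanity-check it, and write it to disk."""
    spec = json.loads(fetch(base_url.rstrip("/") + "/openapi.json"))
    # An OpenAPI document must declare these top-level keys.
    for key in ("openapi", "info", "paths"):
        if key not in spec:
            raise ValueError(f"not an OpenAPI document: missing '{key}'")
    Path(out_path).write_text(json.dumps(spec, indent=2), encoding="utf-8")
    return spec
```

Calling `save_openapi_spec("https://your-modal-app.modal.run")` writes `openapi.json` to the current directory. The `fetch` parameter exists so the function can be exercised without network access.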
Header-based authentication for Ouro integration
To prevent others from using your API without permission, implement header-based authentication that Ouro can verify. You may not need this if you are not monetizing the API and don't mind who uses it.
In the service settings, set authentication to Ouro and add the secret as OURO_SERVICE_SECRET. The code below reads it from the environment.
The following dependency validates the header on each request:
import hmac
import logging
import os
import time
from typing import Optional

from fastapi import Depends, Header, HTTPException, Request

# Authentication utilities
async def validate_ouro_authentication(
    request: Request,
    authorization: Optional[str] = Header(None, alias="Authorization"),
):
    """Validate Ouro platform authentication using `Authorization: Basic <token>` header"""
    # Required header: Authorization: Basic <token>
    if not authorization or not authorization.lower().startswith("basic "):
        client_host = request.client.host if request.client else "unknown"
        logging.warning(f"Missing or invalid Authorization header from {client_host}")
        raise HTTPException(status_code=401, detail="Missing Ouro auth header")
    token = authorization.split(" ", 1)[1].strip()
    # Validate against stored token
    expected_secret = os.environ.get("OURO_SERVICE_SECRET")
    if not expected_secret:
        logging.error("Missing env var OURO_SERVICE_SECRET for Ouro authentication")
        raise HTTPException(status_code=500, detail="Server configuration error")
    # Constant-time comparison avoids leaking the secret through response timing
    if not hmac.compare_digest(token.encode(), expected_secret.encode()):
        logging.warning("Invalid Ouro token")
        raise HTTPException(status_code=401, detail="Invalid Ouro credentials")
    # Log successful authentication
    logging.info("Successful Ouro authentication")
    return {"platform": "ouro", "authenticated_at": time.time()}

# Apply authentication to endpoints.
# web_app, PredictionInput, and process_ml_inference are defined elsewhere in your app.
@web_app.post("/inference/ouro")
async def inference_for_ouro(
    request: PredictionInput,
    auth: dict = Depends(validate_ouro_authentication),
):
    """ML inference endpoint specifically for Ouro integration"""
    try:
        # Your ML processing logic
        result = await process_ml_inference(request)
        return {
            **result,
            "authenticated_via": auth["platform"],
            "request_id": f"ouro_{int(time.time())}",
        }
    except Exception as e:
        logging.error(f"Ouro inference error: {str(e)}")
        raise HTTPException(status_code=500, detail="Inference processing failed")
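The header check itself does not depend on FastAPI, so it can be pulled into a small helper that is easy to unit test. This is a minimal sketch, not Ouro's implementation: the names `extract_basic_token` and `is_authorized` are hypothetical, and it assumes, as the code above does, that the value after `Basic` is the raw secret rather than a base64-encoded `user:password` pair.

```python
import hmac
from typing import Optional


def extract_basic_token(authorization: Optional[str]) -> Optional[str]:
    """Return the token from an `Authorization: Basic <token>` header, or None."""
    if not authorization:
        return None
    scheme, _, token = authorization.partition(" ")
    if scheme.lower() != "basic" or not token.strip():
        return None
    return token.strip()


def is_authorized(authorization: Optional[str], expected_secret: str) -> bool:
    """Compare the presented token with the expected secret in constant time."""
    token = extract_basic_token(authorization)
    if token is None or not expected_secret:
        return False
    # hmac.compare_digest takes the same time whether or not the values match.
    return hmac.compare_digest(token.encode("utf-8"), expected_secret.encode("utf-8"))
```

A FastAPI dependency can then call `is_authorized(header_value, os.environ["OURO_SERVICE_SECRET"])` and raise a 401 when it returns `False`.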
PyTorch model deployment pattern
PyTorch models often benefit from GPU resources and proper state management. Here's a production-ready pattern:
import logging
import time
from typing import List, Optional

import modal
import torch
from fastapi import FastAPI, HTTPException
from pydantic import BaseModel, Field

# `app` and `ml_image` are the objects defined in the container setup above.

# Request/Response schemas for OpenAPI generation
class PredictionInput(BaseModel):
    # Pydantic v2 config; protected_namespaces=() permits field names starting with "model_"
    model_config = {
        "protected_namespaces": (),
        "json_schema_extra": {
            "example": {
                "features": [1.0, 2.0, 3.0, 4.0],
                "model_version": "v1.0",
            }
        },
    }
    features: List[float] = Field(..., description="Input feature vector")
    model_version: Optional[str] = Field("v1.0", description="Model version to use")

class PredictionOutput(BaseModel):
    model_config = {"protected_namespaces": ()}
    prediction: float = Field(..., description="Model prediction")
    confidence: float = Field(..., description="Prediction confidence score")
    model_version: str = Field(..., description="Model version used")
    inference_time_ms: float = Field(..., description="Inference time in milliseconds")

@app.cls(
    image=ml_image,
    gpu="T4",  # Use T4 GPU for cost-effective inference
    memory=4096,  # 4GB RAM
    volumes={"/cache": modal.Volume.from_name("model-cache", create_if_missing=True)},
    secrets=[modal.Secret.from_name("ml-model-secrets")],
)
class PyTorchModelService:
    @modal.enter()
    def load_model(self):
        """Load model once when container starts"""
        start_time = time.time()
        # Load model from cache or download
        model_path = "/cache/pytorch_model.pt"
        self.device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
        try:
            # weights_only=False because the whole module is pickled, not just a state_dict
            self.model = torch.load(model_path, map_location=self.device, weights_only=False)
        except FileNotFoundError:
            # Download model if not cached
            self.model = self._download_and_cache_model(model_path)
        self.model.to(self.device)
        self.model.eval()
        self.model_version = "v1.0"
        load_time = time.time() - start_time
        logging.info(f"Model loaded in {load_time:.2f}s on {self.device}")

    def _download_and_cache_model(self, cache_path: str):
        """Download model and save to cache"""
        # Implementation depends on your model source
        model = torch.nn.Linear(4, 1)  # Example model
        torch.save(model, cache_path)
        return model

    @modal.method()
    def predict(self, input_data: PredictionInput) -> PredictionOutput:
        start_time = time.time()
        try:
            # Prepare input tensor
            features = torch.tensor([input_data.features], dtype=torch.float32, device=self.device)
            # Perform inference
            with torch.no_grad():
                output = self.model(features)
            prediction = output.item()
            # Calculate confidence using sigmoid
            confidence = torch.sigmoid(output).item()
            inference_time = (time.time() - start_time) * 1000  # Convert to ms
            return PredictionOutput(
                prediction=prediction,
                confidence=confidence,
                model_version=self.model_version,
                inference_time_ms=inference_time,
            )
        except Exception as e:
            logging.error(f"Prediction error: {str(e)}")
            raise HTTPException(status_code=500, detail=f"Inference failed: {str(e)}")

    @modal.asgi_app()
    def create_fastapi_app(self):
        web_app = FastAPI(
            title="PyTorch ML Model API",
            description="Production PyTorch model serving with Modal",
            version="1.0.0",
            docs_url="/docs",
            openapi_url="/openapi.json",
        )

        @web_app.post("/predict", response_model=PredictionOutput)
        async def predict_endpoint(request: PredictionInput):
            # .local() runs the method in this container rather than as a remote call
            return self.predict.local(request)

        @web_app.get("/health")
        async def health_check():
            return {"status": "healthy", "model_loaded": True, "device": str(self.device)}

        return web_app
Key concepts: stateful model load, GPU detection, volume cache, structured errors.
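The volume cache idea (try the cached copy first, build and store it on a miss) can be shown without Modal or PyTorch. The sketch below is illustrative only: `load_or_build` is a hypothetical helper, and `pickle` stands in for `torch.save` and `torch.load`. In a Modal container, `cache_path` would point inside the mounted volume, for example under `/cache`.

```python
import pickle
from pathlib import Path
from typing import Any, Callable


def load_or_build(cache_path: Path, build: Callable[[], Any]) -> Any:
    """Return the cached object if present; otherwise build it and cache it."""
    if cache_path.exists():
        with cache_path.open("rb") as fh:
            return pickle.load(fh)
    obj = build()
    cache_path.parent.mkdir(parents=True, exist_ok=True)
    # Write to a temporary file first so a crash never leaves a partial cache entry.
    tmp_path = cache_path.with_suffix(cache_path.suffix + ".tmp")
    with tmp_path.open("wb") as fh:
        pickle.dump(obj, fh)
    tmp_path.replace(cache_path)
    return obj
```

The first call pays the cost of `build()`; later calls, including those from new containers that mount the same volume, read the cached file instead.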
Deployment commands and workflow
Development workflow: Run a local dev server, then send a test request against your Ouro-authenticated route.
# Start development server with live reload
modal serve app.py
You can test your endpoints with tools like Postman, curl, or add a private version of your API to Ouro and test it there.
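If you prefer a scripted test over Postman or curl, a request to the authenticated route can be assembled with the standard library. This is a sketch under assumptions: `build_inference_request` is a hypothetical helper, the route and payload follow the `/inference/ouro` and `PredictionInput` examples in this guide, and the base URL is whatever `modal serve` prints for your app.

```python
import json
from urllib.request import Request


def build_inference_request(base_url: str, secret: str, features: list) -> Request:
    """Build (but do not send) a POST to the Ouro-authenticated route."""
    body = json.dumps({"features": features, "model_version": "v1.0"}).encode("utf-8")
    return Request(
        url=base_url.rstrip("/") + "/inference/ouro",
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            # Same scheme the server-side check expects: Basic <token>
            "Authorization": f"Basic {secret}",
        },
    )
```

To send it, pass the returned object to `urllib.request.urlopen` and decode the JSON body of the response. A 401 response means the secret did not match `OURO_SERVICE_SECRET` on the server.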
Production deployment: Deploy to production, then use logs and the Modal dashboard to monitor behavior and costs.
# Deploy to production
modal deploy app.py
# Stream logs for the deployed app (referenced by app name)
modal app logs ml-model-api
# Per-function metrics and costs are shown in the Modal dashboard
Secrets management: Store your Ouro credentials and other tokens as Modal secrets. The secret name must match the name your code passes to modal.Secret.from_name().
# Create secrets for Ouro integration
modal secret create ouro-api-credentials \
    OURO_API_KEY=your-ouro-key \
    OURO_SERVICE_SECRET=your-ouro-secret
OpenAPI notes
OpenAPI is auto-generated; use the JSON at /openapi.json when integrating with Ouro.
Copy the object at this endpoint and paste it into an openapi.json file.
You'll be able to upload this file directly to Ouro to import your API.
Monitoring and cost optimization
Use Modal dashboards for metrics, logs, resource usage, and cost by function. Tune resources and scaling for savings:
# Optimize for cost-effectiveness
@app.function(
    # Choose appropriate GPU for workload
    gpu="T4",  # Most cost-effective for small models
    # gpu="A100",  # For large models requiring more VRAM
    # Optimize scaling parameters
    min_containers=0,  # Scale to zero when idle
    scaledown_window=60,  # Quick scale-down for cost savings
    max_containers=10,  # Limit maximum spend
    # Optimize resource allocation
    memory=2048,  # Only allocate needed memory
    cpu=2,  # Adjust CPU cores based on workload
)
def run_inference(features: list[float]) -> float:
    ...  # Your inference code goes here
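To see why the scale-down window matters, it helps to bound how long a container stays up per day. The sketch below is a rough estimate under stated assumptions, not Modal's billing formula: it models a single container, assumes the container keeps running (and accruing cost) for the idle window before it scales down, and uses a hypothetical function name. Multiply the result by the per-second price of your chosen GPU to get a cost range.

```python
SECONDS_PER_DAY = 86_400.0


def daily_container_seconds(
    requests_per_day: int,
    seconds_per_request: float,
    scaledown_window: float,
) -> tuple:
    """Return (lower, upper) bounds on seconds one container stays up per day."""
    if requests_per_day <= 0:
        return (0.0, 0.0)
    busy = requests_per_day * seconds_per_request
    # Best case: requests arrive back to back, so only one idle window is paid.
    lower = busy + scaledown_window
    # Worst case: every request arrives alone and is followed by a full idle window.
    upper = busy + requests_per_day * scaledown_window
    return (min(lower, SECONDS_PER_DAY), min(upper, SECONDS_PER_DAY))


low, high = daily_container_seconds(
    requests_per_day=1000, seconds_per_request=0.5, scaledown_window=60
)
print(low, high)  # 560.0 60500.0
```

In this example the busy time is only 500 seconds, yet sparse traffic can keep the container up for more than 100 times longer, which is the case for a short `scaledown_window` when requests are infrequent.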
Add your Modal model to Ouro
Publish your Modal-hosted model on Ouro to monetize per request. Ouro helps make your model discoverable, usable, and monetizable.
How Ouro works
Ouro is a collaborative marketplace for APIs, alongside datasets and files. Its asset-linking system connects your API to relevant resources for discovery and workflows.
Optimize your Modal deployment for Ouro
Modify your Modal function to use FastAPI with comprehensive OpenAPI documentation. Here's the recommended pattern:
import modal
from fastapi import FastAPI, HTTPException
from pydantic import BaseModel
from typing import Optional, List

app = modal.App("ml-model-ouro")

web_app = FastAPI(
    title="Your ML Model API",
    description="Detailed description of your model's capabilities, use cases, and limitations",
    version="1.0.0",
    contact={
        "name": "Your Name",
        "email": "your.email@example.com",
    },
)

class PredictionRequest(BaseModel):
    # Pydantic v2 uses model_config and json_schema_extra (schema_extra was the v1 name);
    # protected_namespaces=() permits field names starting with "model_"
    model_config = {
        "protected_namespaces": (),
        "json_schema_extra": {
            "example": {
                "input_data": [1.0, 2.5, 3.2],
                "model_parameters": {"temperature": 0.7},
            }
        },
    }
    input_data: List[float]
    model_parameters: Optional[dict] = None

class PredictionResponse(BaseModel):
    model_config = {"protected_namespaces": ()}
    prediction: float
    confidence: float
    model_version: str
    processing_time_ms: float

@web_app.post("/predict", response_model=PredictionResponse)
async def predict(request: PredictionRequest):
    """
    Generate predictions using the ML model.

    This endpoint processes input data through our trained model
    and returns predictions with confidence scores.
    """
    try:
        # Your model inference logic here (your_model_function is a placeholder)
        result = your_model_function(request.input_data)
        return PredictionResponse(
            prediction=result["prediction"],
            confidence=result["confidence"],
            model_version="1.0.0",
            processing_time_ms=result["processing_time"],
        )
    except Exception as e:
        raise HTTPException(status_code=400, detail=f"Prediction failed: {str(e)}")

@app.function(
    image=modal.Image.debian_slim().pip_install("fastapi[standard]", "your-ml-dependencies"),
    scaledown_window=300,  # Keep idle containers for 5 minutes (formerly container_idle_timeout)
)
@modal.asgi_app()
def fastapi_app():
    return web_app
Deploy this to Modal and note your endpoint URL (e.g., https://your-org--ml-model-ouro-fastapi-app.modal.run).
Enhanced OpenAPI specification for Ouro integration
Ouro provides utilities to enhance your OpenAPI specification with custom metadata.
Use the ouro-py package to add Ouro-specific extensions:
import os
from typing import Optional

from fastapi import FastAPI, Header
from fastapi.openapi.utils import get_openapi
from ouro import Ouro
from ouro.utils import get_custom_openapi, ouro_field

web_app = FastAPI(
    title="Your ML Model API",
    description="Production ML model serving with Ouro integration",
    version="1.0.0",
)

# Apply Ouro's custom OpenAPI enhancements
web_app.openapi = get_custom_openapi(web_app, get_openapi)

@web_app.post("/predict")
@ouro_field("x-ouro-input-asset-type", "file")  # Specify input type
@ouro_field("x-ouro-output-asset-type", "file")  # Specify output type
async def predict(
    request: PredictionRequest,
    # Ouro passes these headers to identify context
    ouro_route_id: Optional[str] = Header(None, alias="ouro-route-id"),
    ouro_route_org_id: Optional[str] = Header(None, alias="ouro-route-org-id"),
    ouro_route_team_id: Optional[str] = Header(None, alias="ouro-route-team-id"),
    ouro_action_id: Optional[str] = Header(None, alias="ouro-action-id"),
):
    """
    Process input and generate predictions.

    The Ouro headers provide context about:
    - ouro_route_id: The service route being accessed
    - ouro_route_org_id: Organization ID for asset creation context
    - ouro_route_team_id: Team ID for asset creation context
    - ouro_action_id: Unique action ID for logging and tracking
    """
    # Initialize Ouro SDK with your API key
    ouro = Ouro(api_key=os.environ["OURO_API_KEY"])
    # Log progress to Ouro's action system
    if ouro_action_id:
        ouro.client.post(
            f"/actions/{ouro_action_id}/log",
            json={
                "message": "Starting model inference...",
                "asset_id": ouro_route_id,
                "level": "info",
            },
        )
    # Your model inference logic here (your_model_function is a placeholder)
    result = your_model_function(request)
    # Return result with Ouro metadata for asset creation
    return {
        "prediction": result,
        "file": {  # This creates a file asset in Ouro
            # Assumes PredictionRequest defines an `input_name` field
            "name": f"Prediction for {request.input_name}",
            "description": "Model prediction output",
            "filename": "prediction.json",
            "type": "application/json",
            "extension": "json",
            # base64_encode is a helper you provide; it is not defined in this guide
            "base64": base64_encode(result),
            "org_id": ouro_route_org_id,  # Use Ouro context
            "team_id": ouro_route_team_id,
        },
    }
The ouro_field decorator adds custom OpenAPI extensions that enable:
- Asset type specification: Define input/output asset types (file, dataset, post, etc.)
- Asset filtering: Specify compatible input asset formats
- Automatic integration: Your API appears in relevant asset workflows
Access your enhanced OpenAPI spec at https://your-modal-endpoint/openapi.json.
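The file payload in the example above calls a `base64_encode` helper that this guide does not define. A minimal sketch follows, assuming the prediction result is JSON-serializable and that Ouro expects the file contents as base64 text; `base64_decode` is an extra hypothetical helper for checking a payload locally.

```python
import base64
import json
from typing import Any


def base64_encode(result: Any) -> str:
    """Serialize a JSON-compatible result and return it as base64 text."""
    raw = json.dumps(result, ensure_ascii=False).encode("utf-8")
    return base64.b64encode(raw).decode("ascii")


def base64_decode(encoded: str) -> Any:
    """Inverse of base64_encode, useful for checking a payload locally."""
    return json.loads(base64.b64decode(encoded).decode("utf-8"))


print(base64_encode({"a": 1}))  # eyJhIjogMX0=
```

For binary outputs such as images, skip the JSON step and pass the raw bytes straight to `base64.b64encode`.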
Account setup and API configuration
Create your Ouro account and configure API access:
- Account registration: Sign up at ouro.foundation
- API key generation: Navigate to Settings → API Keys to create an API key for the Ouro SDK
- Store API key in Modal: Add your Ouro API key as a Modal secret:
modal secret create ouro-credentials OURO_API_KEY=your-api-key-here
For enhanced security, implement domain whitelisting in your Modal function to restrict access to Ouro's infrastructure. Clients can set the Origin and Referer headers to any value, so treat this as a supplement to the token check rather than a replacement:
from fastapi import Request
from fastapi.responses import JSONResponse

ALLOWED_DOMAINS = ["api.ouro.foundation", "ouro.foundation"]

@web_app.middleware("http")
async def verify_origin(request: Request, call_next):
    origin = request.headers.get("origin", "")
    referer = request.headers.get("referer", "")
    # Allow direct API testing and Ouro platform access
    if any(domain in origin or domain in referer for domain in ALLOWED_DOMAINS):
        return await call_next(request)
    # Allow requests without origin (direct API calls)
    if not origin and not referer:
        return await call_next(request)
    # Return the response directly: an HTTPException raised inside middleware
    # bypasses FastAPI's exception handlers and surfaces as a 500
    return JSONResponse(status_code=403, content={"detail": "Access denied"})
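A substring test such as `domain in origin` also accepts hosts like `ouro.foundation.evil.example` or URLs that merely mention the domain in a query string. A stricter variant, sketched here with a hypothetical `host_is_allowed` helper, parses the header value and compares the hostname exactly. Whether subdomains other than the two listed should be accepted is an assumption you would need to confirm for your setup.

```python
from urllib.parse import urlsplit

ALLOWED_DOMAINS = ("api.ouro.foundation", "ouro.foundation")


def host_is_allowed(header_value: str, allowed=ALLOWED_DOMAINS) -> bool:
    """True if the URL in an Origin or Referer header points at an allowed host."""
    # urlsplit returns hostname=None for empty or scheme-less values.
    host = (urlsplit(header_value).hostname or "").lower()
    return host in allowed
```

In the middleware, the `any(...)` condition would become `host_is_allowed(origin) or host_is_allowed(referer)`.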
Creating your service on Ouro
Navigate to Ouro's service creation interface:
- Access service creation: Go to ouro.foundation/services/create
- Upload OpenAPI spec: Upload your OpenAPI JSON file
- Service configuration:
  - Name: Choose a descriptive, discoverable name
  - Description: Detailed explanation of your model's capabilities, use cases, and performance characteristics
  - Team: Select the team you want to add the service to
  - Base URL: Your Modal endpoint URL
  - Authentication: Choose Ouro authentication. The platform issues a secret token; validate incoming requests using the Authorization: Basic <token> header (as shown above).
Ouro automatically extracts all endpoint request and response schemas from your OpenAPI specification. After importing your API, you can still modify all of its configuration in the UI.
Monetization setup
Anyone can start monetizing their APIs on Ouro. Configure pricing on each route by editing the route:
- Access control: Set to monetized
- Pay-per-use: Set per-request pricing
- Usage quotas: Optional rate limiting and usage caps.
Now, anytime someone uses your API, they'll be charged and you'll receive a Bitcoin payment to your wallet.
Community and discoverability
Adding your API is just the beginning. Maximize your API's success through Ouro's collaborative features:
Join relevant teams:
- Machine Learning Team: Core community for ML practitioners
- Domain-specific teams: Join teams related to your API's application area (finance, healthcare, etc.)
- Data Science Team: Broader community interested in data processing APIs
Create supporting content:
- Posts, example workflows with datasets, tutorials, and Q&A
Asset integration strategy:
- Link to relevant datasets, build example workflows, and cross-promote with dataset owners
Integration patterns and best practices
Recommended architecture: A quick view of the request flow through Ouro to your Modal service.
User Request → Ouro Platform → Your Modal API → ML Model → Response
Advanced features and asset integration
Custom asset workflows: Your ML API can automatically appear in Ouro's asset transformation recommendations. When users have compatible assets, your API will be suggested in the sidebar for data processing workflows.
With these steps, your Modal model can be published, discovered, and monetized on Ouro.