Deploy your machine learning model to Modal and add it to Ouro for others to use
Last updated May 5, 2026
This guide demonstrates how to deploy any type of ML model to Modal's serverless platform and expose it as an API ready for Ouro integration. Modal provides serverless GPU infrastructure that automatically scales from zero to thousands of containers, which makes it well suited to hosting ML models you want to share on Ouro.
Add your model to Ouro for a user-friendly interface, rate limiting, monetization, and integration with datasets and files across the Ouro platform.
Before starting, you'll need a Modal account and a basic Python development environment.
Install the Modal CLI and authenticate:
pip install modal
modal setup
Modal uses GitHub for authentication. Run modal setup and follow the prompts to connect your GitHub account; the CLI then stores an API token on your machine.
The workflow: prepare a container image, expose FastAPI endpoints (which generate an OpenAPI spec), add simple authentication for Ouro, deploy with modal deploy, and monitor. Modal handles infrastructure and scaling, so you can focus on your code and functionality.
Modal uses a declarative approach to define container environments. Create a base image with your model's dependencies:
import modal

# Define the container image with ML dependencies.
# Keep only the frameworks your model actually needs; each one adds build time and image size.
ml_image = (
    modal.Image.debian_slim(python_version="3.12")
    .pip_install(
        "fastapi[standard]==0.115.4",
        "torch>=2.0.0",
        "torchvision",
        "transformers>=4.30.0",
        "scikit-learn>=1.3.0",
        "tensorflow>=2.13.0",
        "numpy>=1.24.0",
        "pandas>=2.0.0",
        "pillow>=10.0.0",
        "pydantic>=2.0.0",
    )
    .apt_install("git", "wget", "curl")
)

app = modal.App("ml-model-api", image=ml_image)
Modal caches images, so repeated deploys with the same dependencies are fast. If you have model artifacts, download them once to a persistent modal.Volume and mount it into your containers instead of fetching them on every start.
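You can sketch the download-once pattern without Modal. In this minimal sketch, load_or_download and the download callable are hypothetical names. Inside a Modal container, cache_dir would be the path where you mount a modal.Volume (for example /cache). After writing a new file, you would call the volume's commit() so that other containers see it.

```python
import os
import tempfile
from pathlib import Path
from typing import Callable


def load_or_download(cache_dir: str, filename: str, download: Callable[[], bytes]) -> bytes:
    """Return cached artifact bytes, calling download() only on a cache miss."""
    path = Path(cache_dir) / filename
    if path.exists():
        return path.read_bytes()  # cache hit: no network access
    data = download()
    path.parent.mkdir(parents=True, exist_ok=True)
    # Write to a temporary file, then rename, so readers never see a partial file
    tmp_path = path.parent / (path.name + ".tmp")
    tmp_path.write_bytes(data)
    os.replace(tmp_path, path)
    return data


if __name__ == "__main__":
    # Local demo: a temporary directory stands in for a mounted Volume path such as /cache
    with tempfile.TemporaryDirectory() as cache_dir:
        first = load_or_download(cache_dir, "model.bin", lambda: b"fake-weights")
        second = load_or_download(cache_dir, "model.bin", lambda: b"never-called")
        print(first == second)  # True: the second call reads the cached file
```

The write-then-rename step prevents a crashed download from leaving a truncated file that later containers would treat as a valid cache hit.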
Modal's guide is a great resource for a more detailed example.
FastAPI generates an OpenAPI spec that Ouro can consume.
Once deployed, FastAPI serves the spec and two documentation UIs:
https://your-modal-app.modal.run/openapi.json (raw OpenAPI spec)
https://your-modal-app.modal.run/docs (Swagger UI)
https://your-modal-app.modal.run/redoc (ReDoc)
We recommend copying the JSON from the /openapi.json endpoint into a .json file for Ouro integration.
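Here is a sketch of that step using only the standard library. The helpers fetch_openapi, list_routes, and save_spec are hypothetical names. They download the spec from a deployment URL, list the routes Ouro will see, and save the file.

```python
import json
import urllib.request
from pathlib import Path

# Keys of an OpenAPI path item that are HTTP operations (others are metadata)
HTTP_METHODS = {"get", "put", "post", "delete", "patch", "options", "head", "trace"}


def fetch_openapi(base_url: str, timeout: float = 30.0) -> dict:
    """Download the OpenAPI document that FastAPI serves at /openapi.json."""
    url = base_url.rstrip("/") + "/openapi.json"
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


def list_routes(spec: dict) -> list[str]:
    """Return sorted 'METHOD /path' strings for every operation in the spec."""
    routes = []
    for path, item in spec.get("paths", {}).items():
        for key in item:
            if key.lower() in HTTP_METHODS:
                routes.append(f"{key.upper()} {path}")
    return sorted(routes)


def save_spec(spec: dict, filename: str = "openapi.json") -> Path:
    """Write the spec as pretty-printed JSON, ready to upload to Ouro."""
    out = Path(filename)
    out.write_text(json.dumps(spec, indent=2), encoding="utf-8")
    return out
```

For example, run spec = fetch_openapi("https://your-modal-app.modal.run") with your own URL in place of the placeholder, then print(list_routes(spec)) and save_spec(spec). A route missing from that list is also missing from the spec you upload to Ouro.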
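To keep others from using your API without permission, add header-based authentication. Ouro sends a shared secret in the Authorization header of each request, and your API rejects requests without the correct value. You may not need this if your API is free to use and you don't mind who calls it.
In the Ouro service settings, set authentication to Ouro. Then make the same secret available to your Modal app as the OURO_SERVICE_SECRET environment variable (for example, through a Modal secret).
Finally, implement a simple header check in your API. The validate_ouro_authentication dependency in the code at the end of this guide rejects any request whose Authorization: Basic <token> header does not match OURO_SERVICE_SECRET.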
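PyTorch models usually benefit from a GPU and from loading the model once per container rather than once per request. The PyTorchModelService class in the code at the end of this guide shows a production-ready pattern.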
Key concepts: stateful model load, GPU detection, volume cache, structured errors.
Development workflow: Run modal serve to start a temporary deployment that reloads when you save changes, then send a test request to your Ouro-authenticated route.
# Start a development deployment with live reload
modal serve app.py
modal serve prints a temporary URL for each web endpoint. You can test your endpoints with tools like Postman or curl, or add a private version of your API to Ouro and test it there.
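For a quick scripted check, the sketch below builds a request that carries the same Authorization: Basic <token> header the authentication example validates. It assumes the /inference/ouro route and the features payload from that example. build_test_request and send are hypothetical helpers, and base_url is the temporary URL that modal serve prints.

```python
import json
import urllib.request


def build_test_request(base_url: str, secret: str, payload: dict) -> urllib.request.Request:
    """Build a POST to /inference/ouro with the header Ouro sends."""
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/inference/ouro",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Basic {secret}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def send(request: urllib.request.Request, timeout: float = 60.0) -> dict:
    """Send the request and decode the JSON response body."""
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.load(resp)
```

Call send(build_test_request(base_url, secret, {"features": [1.0, 2.0, 3.0, 4.0]})). With a wrong secret, urlopen raises urllib.error.HTTPError with status 401.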
Production deployment: Deploy to production, then use logs and the Modal dashboard to monitor behavior and costs.
# Deploy to production
modal deploy app.py
# Stream logs for the deployed app (use the name passed to modal.App)
modal app logs ml-model-api
# List deployed apps and their status
modal app list
For function performance, use the Modal dashboard.
Secrets management: Store your Ouro credentials and other tokens as Modal secrets.
# Create secrets for Ouro integration
modal secret create ouro-api-credentials \
OURO_API_KEY=your-ouro-key \
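OURO_SERVICE_SECRET=your-ouro-secret
Attach the secret to your function or class with secrets=[modal.Secret.from_name("ouro-api-credentials")]. Its keys are then exposed as environment variables, which is how the authentication example reads os.environ["OURO_SERVICE_SECRET"]. The name you pass to from_name must match the name used in modal secret create.
OpenAPI is auto-generated; use the JSON at /openapi.json when integrating with Ouro.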
Copy the object at this endpoint and paste it into an openapi.json file.
You'll be able to upload this file directly to Ouro to import your API.
Use Modal dashboards for metrics, logs, resource usage, and cost by function. Tune resources and scaling for savings:
# Optimize for cost-effectiveness
@app.function(
    # Choose an appropriate GPU for the workload
    gpu="T4",  # Most cost-effective for small models
    # gpu="A100",  # For large models requiring more VRAM
    # Scaling parameters
    min_containers=0,  # Scale to zero when idle
    scaledown_window=60,  # Seconds a container may stay idle before shutting down
    max_containers=10,  # Cap concurrent containers to limit maximum spend
    # Resource allocation
    memory=2048,  # Memory request in MiB; allocate only what you need
    cpu=2,  # CPU cores; adjust to the workload
)
def run_inference(features: list[float]) -> float:
    ...  # placeholder: your inference code
Publish your Modal-hosted model on Ouro to monetize it per request. Ouro helps make your model discoverable, usable, and monetizable.
Ouro is a collaborative marketplace for APIs, alongside datasets and files. Its asset-linking system connects your API to relevant resources for discovery and workflows.
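Serve your model through FastAPI so that it publishes comprehensive OpenAPI documentation. The recommended pattern is the ml-model-ouro app in the code at the end of this guide, with its /predict route and fastapi_app function.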
Deploy this to Modal and note your endpoint URL (e.g., https://your-org--ml-model-ouro-fastapi-app.modal.run).
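If you script your integration, a helper like the one below can reconstruct the default URL. It is only an approximation based on the example URL above. The pattern is the workspace, then "--", then the app and function names joined by a hyphen, with underscores turned into hyphens. Modal can adjust the label, for example when it is too long, so treat the URL printed by modal deploy as authoritative. modal_endpoint_url is a hypothetical name.

```python
def modal_endpoint_url(workspace: str, app_name: str, function_name: str) -> str:
    """Approximate the default Modal URL for a web function (no label shortening)."""
    # Pattern taken from this guide's example:
    # https://<workspace>--<app-name>-<function-name>.modal.run
    label = f"{workspace}--{app_name}-{function_name}".replace("_", "-").lower()
    return f"https://{label}.modal.run"
```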
Ouro provides utilities to enhance your OpenAPI specification with custom metadata.
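Use the ouro-py package (imported as ouro) to add Ouro-specific extensions; the ouro_field example in the code at the end of this guide shows the full pattern.
The ouro_field decorator attaches custom x-ouro-* OpenAPI extensions to a route. For example, x-ouro-input-assets declares which Ouro assets the route accepts as inputs, and x-ouro-output-assets declares the assets it returns.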
For file inputs, use:
input_filter for a broad category like audio, video, or image
input_file_extensions for one or more exact extensions such as ["xy", "xye"]
Ouro still accepts older single-input fields like x-ouro-input-asset-type, but keyed x-ouro-input-assets and x-ouro-output-assets are recommended for new specs. See the route input and output assets guide for multiple input/output examples.
Access your enhanced OpenAPI spec at https://your-modal-endpoint/openapi.json.
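To confirm that the decorators took effect, you can search the downloaded spec for x-ouro-* keys. This sketch makes no assumption about where ouro_field places its extensions. It walks the whole document and reports a JSON-path-like location for each match. find_ouro_extensions is a hypothetical helper name.

```python
from typing import Any, Iterator


def find_ouro_extensions(node: Any, path: str = "$") -> Iterator[tuple[str, Any]]:
    """Yield (location, value) for every key that starts with 'x-ouro-'."""
    if isinstance(node, dict):
        for key, value in node.items():
            location = f"{path}.{key}"
            if isinstance(key, str) and key.startswith("x-ouro-"):
                yield location, value
            # Keep walking: extensions can sit at any depth
            yield from find_ouro_extensions(value, location)
    elif isinstance(node, list):
        for index, item in enumerate(node):
            yield from find_ouro_extensions(item, f"{path}[{index}]")
```

For example, load the file with spec = json.load(open("openapi.json")) and print each location and value yielded by find_ouro_extensions(spec). An empty result suggests the custom OpenAPI hook was not applied to the app.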
Create your Ouro account and configure API access:
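modal secret create ouro-credentials OURO_API_KEY=your-api-key-here
For extra protection, you can also allowlist requests by domain. The verify_origin middleware in the code at the end of this guide checks the Origin and Referer headers against Ouro's domains. Treat it as defense in depth only: clients set those headers themselves, and requests without them are let through, so the Authorization check remains your actual access control.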
Navigate to Ouro's service creation interface and upload your openapi.json file. If you added authentication, select Ouro authentication so that each request carries the Authorization: Basic <token> header your API checks.
Ouro automatically extracts all endpoint request and response schemas from your OpenAPI specification. You can still modify the configuration in the UI after importing your API.
Anyone can start monetizing their APIs on Ouro. Configure pricing for each route by editing that route.
From then on, every time someone uses your API, they are charged and you receive a Bitcoin payment to your wallet.
Adding your API is just the beginning. Maximize your API's success through Ouro's collaborative features:
Join relevant teams:
Create supporting content:
Asset integration strategy:
Recommended architecture: A quick view of the request flow through Ouro to your Modal service.
User Request → Ouro Platform → Your Modal API → ML Model → Response
Custom asset workflows: Your ML API can automatically appear in Ouro's asset transformation recommendations. When users have compatible assets, your API will be suggested in the sidebar for data processing workflows.
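The file asset returned by the ouro_field example at the end of this guide is a plain dict with a base64-encoded payload. This sketch factors that dict into a helper, make_json_file_asset (a hypothetical name). The keys are copied from that example; Ouro's route input and output assets guide, not this code, defines which keys are required.

```python
import base64
import json
from typing import Any, Optional


def make_json_file_asset(
    name: str,
    data: Any,
    org_id: Optional[str],
    team_id: Optional[str],
    description: str = "Model prediction output",
    filename: str = "prediction.json",
) -> dict:
    """Build a JSON file-asset dict using the keys from the ouro_field example."""
    raw = json.dumps(data).encode("utf-8")  # data must be JSON-serializable
    return {
        "name": name,
        "description": description,
        "filename": filename,
        "type": "application/json",
        "extension": "json",
        "base64": base64.b64encode(raw).decode("ascii"),
        "org_id": org_id,
        "team_id": team_id,
    }
```

Inside a route, you would pass the ouro_route_org_id and ouro_route_team_id header values as org_id and team_id.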
With these steps, your Modal model can be published, discovered, and monetized on Ouro.
# Authentication example: validate the Authorization header that Ouro sends
import hmac
import logging
import os
import time
import uuid
from typing import Optional

from fastapi import Depends, FastAPI, Header, HTTPException, Request
from pydantic import BaseModel

web_app = FastAPI()


class PredictionInput(BaseModel):
    # Placeholder request schema; replace with your own
    features: list[float]


async def process_ml_inference(payload: PredictionInput) -> dict:
    # Placeholder for your model call; must return a dict
    return {"prediction": sum(payload.features)}


# Authentication utilities
async def validate_ouro_authentication(
    request: Request,
    authorization: Optional[str] = Header(None, alias="Authorization"),
) -> dict:
    """Validate Ouro platform authentication using the `Authorization: Basic <token>` header."""
    # Required header: Authorization: Basic <token>
    if not authorization or not authorization.lower().startswith("basic "):
        client_host = request.client.host if request.client else "unknown"
        logging.warning(f"Missing or invalid Authorization header from {client_host}")
        raise HTTPException(status_code=401, detail="Missing Ouro auth header")
    token = authorization.split(" ", 1)[1].strip()

    # Validate against the shared secret injected by a Modal secret
    expected_secret = os.environ.get("OURO_SERVICE_SECRET")
    if not expected_secret:
        logging.error("Missing env var OURO_SERVICE_SECRET for Ouro authentication")
        raise HTTPException(status_code=500, detail="Server configuration error")
    # Constant-time comparison so response timing does not leak the secret
    if not hmac.compare_digest(token.encode(), expected_secret.encode()):
        logging.warning("Invalid Ouro token")
        raise HTTPException(status_code=401, detail="Invalid Ouro credentials")

    logging.info("Successful Ouro authentication")
    return {"platform": "ouro", "authenticated_at": time.time()}


# Apply authentication to endpoints
@web_app.post("/inference/ouro")
async def inference_for_ouro(
    payload: PredictionInput,
    auth: dict = Depends(validate_ouro_authentication),
):
    """ML inference endpoint specifically for Ouro integration"""
    try:
        result = await process_ml_inference(payload)
    except Exception:
        logging.exception("Ouro inference error")
        raise HTTPException(status_code=500, detail="Inference processing failed")
    return {
        **result,
        "authenticated_via": auth["platform"],
        "request_id": f"ouro_{uuid.uuid4().hex}",  # unique even for simultaneous requests
    }


# PyTorch model service example
from fastapi import FastAPI, HTTPException
from pydantic import BaseModel, ConfigDict, Field
from typing import List, Optional
import logging
import time

import modal

# torch is installed in ml_image; import it only where that image runs
with ml_image.imports():
    import torch

# Persistent volume for cached weights (ml_image and app come from the image section above)
model_cache = modal.Volume.from_name("model-cache", create_if_missing=True)


# Request/Response schemas for OpenAPI generation
class PredictionInput(BaseModel):
    features: List[float] = Field(..., description="Input feature vector")
    model_version: Optional[str] = Field("v1.0", description="Model version to use")

    model_config = ConfigDict(
        protected_namespaces=(),  # allow the model_version field name
        json_schema_extra={
            "example": {"features": [1.0, 2.0, 3.0, 4.0], "model_version": "v1.0"}
        },
    )


class PredictionOutput(BaseModel):
    prediction: float = Field(..., description="Model prediction")
    confidence: float = Field(..., description="Prediction confidence score")
    model_version: str = Field(..., description="Model version used")
    inference_time_ms: float = Field(..., description="Inference time in milliseconds")

    model_config = ConfigDict(protected_namespaces=())


@app.cls(
    image=ml_image,
    gpu="T4",  # Cost-effective GPU for small models
    memory=4096,  # 4 GiB RAM (Modal memory is in MiB)
    volumes={"/cache": model_cache},
    secrets=[modal.Secret.from_name("ml-model-secrets")],
)
class PyTorchModelService:
    @modal.enter()
    def load_model(self):
        """Load the model once when the container starts."""
        start_time = time.time()
        model_path = "/cache/pytorch_model.pt"
        self.device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
        try:
            # weights_only=False is needed to unpickle a full nn.Module on torch>=2.6;
            # only load files you created yourself
            self.model = torch.load(model_path, map_location=self.device, weights_only=False)
        except FileNotFoundError:
            # Download the model if it is not cached yet
            self.model = self._download_and_cache_model(model_path)
        self.model.to(self.device)  # a freshly downloaded model starts on the CPU
        self.model.eval()
        self.model_version = "v1.0"
        logging.info(f"Model loaded in {time.time() - start_time:.2f}s on {self.device}")

    def _download_and_cache_model(self, cache_path: str):
        """Download the model and save it to the cache volume."""
        # Implementation depends on your model source
        model = torch.nn.Linear(4, 1)  # Example model
        torch.save(model, cache_path)
        model_cache.commit()  # persist the file so other containers can load it
        return model

    def _run_inference(self, input_data: PredictionInput) -> PredictionOutput:
        start_time = time.time()
        try:
            features = torch.tensor(
                [input_data.features], dtype=torch.float32, device=self.device
            )
            with torch.no_grad():
                output = self.model(features)
            prediction = output.item()
            # Sigmoid of the raw output as a simple confidence score
            confidence = torch.sigmoid(output).item()
        except Exception as e:
            logging.error(f"Prediction error: {e}")
            raise HTTPException(status_code=500, detail=f"Inference failed: {e}")
        return PredictionOutput(
            prediction=prediction,
            confidence=confidence,
            model_version=self.model_version,
            inference_time_ms=(time.time() - start_time) * 1000,  # convert to ms
        )

    @modal.method()
    def predict(self, input_data: PredictionInput) -> PredictionOutput:
        """Remote entry point, e.g. PyTorchModelService().predict.remote(data)."""
        return self._run_inference(input_data)

    @modal.asgi_app()
    def create_fastapi_app(self):
        web_app = FastAPI(
            title="PyTorch ML Model API",
            description="Production PyTorch model serving with Modal",
            version="1.0.0",
            docs_url="/docs",
            openapi_url="/openapi.json",
        )

        # Plain def: FastAPI runs it in a thread pool, so GPU work does not block the event loop
        @web_app.post("/predict", response_model=PredictionOutput)
        def predict_endpoint(request: PredictionInput):
            return self._run_inference(request)

        @web_app.get("/health")
        def health_check():
            return {"status": "healthy", "model_loaded": True, "device": str(self.device)}

        return web_app


# FastAPI app with comprehensive OpenAPI documentation
import modal
from fastapi import FastAPI, HTTPException
from pydantic import BaseModel, ConfigDict
from typing import List, Optional

app = modal.App("ml-model-ouro")

web_app = FastAPI(
    title="Your ML Model API",
    description="Detailed description of your model's capabilities, use cases, and limitations",
    version="1.0.0",
    contact={
        "name": "Your Name",
        "email": "you@example.com",
    },
)


class PredictionRequest(BaseModel):
    input_data: List[float]
    model_parameters: Optional[dict] = None

    # json_schema_extra is the Pydantic v2 name for v1's schema_extra
    model_config = ConfigDict(
        protected_namespaces=(),  # allow the model_parameters field name
        json_schema_extra={
            "example": {
                "input_data": [1.0, 2.5, 3.2],
                "model_parameters": {"temperature": 0.7},
            }
        },
    )


class PredictionResponse(BaseModel):
    prediction: float
    confidence: float
    model_version: str
    processing_time_ms: float

    model_config = ConfigDict(protected_namespaces=())


@web_app.post("/predict", response_model=PredictionResponse)
def predict(request: PredictionRequest):
    """
    Generate predictions using the ML model.

    This endpoint processes input data through our trained model
    and returns predictions with confidence scores.
    """
    try:
        # your_model_function is a placeholder for your inference code; it should
        # return a dict with "prediction", "confidence", and "processing_time" (ms)
        result = your_model_function(request.input_data)
        return PredictionResponse(
            prediction=result["prediction"],
            confidence=result["confidence"],
            model_version="1.0.0",
            processing_time_ms=result["processing_time"],
        )
    except Exception as e:
        raise HTTPException(status_code=400, detail=f"Prediction failed: {e}")


@app.function(
    # Add your model's own packages alongside FastAPI
    image=modal.Image.debian_slim().pip_install("fastapi[standard]"),
    scaledown_window=300,  # keep idle containers for 5 minutes (formerly container_idle_timeout)
)
@modal.asgi_app()
def fastapi_app():
    return web_app


# Ouro-specific OpenAPI extensions with ouro-py
from fastapi import FastAPI, Header
from fastapi.openapi.utils import get_openapi
from ouro import Ouro
from ouro.utils import get_custom_openapi, ouro_field
import base64
import json
import os
from typing import Optional

web_app = FastAPI(
    title="Your ML Model API",
    description="Production ML model serving with Ouro integration",
    version="1.0.0",
)

# Apply Ouro's custom OpenAPI enhancements
web_app.openapi = get_custom_openapi(web_app, get_openapi)


@web_app.post("/predict")
@ouro_field(
    "x-ouro-input-assets",
    {
        "input_file": {
            "asset_type": "file",
            "input_file_extensions": ["xy", "xye"],
        }
    },
)
@ouro_field(
    "x-ouro-output-assets",
    {
        "prediction_file": {
            "asset_type": "file",
            "file_extensions": ["json"],
            "primary": True,
        }
    },
)
async def predict(
    # PredictionRequest is your request schema; this example assumes it has an input_name field
    request: PredictionRequest,
    # Ouro passes these headers to identify context
    ouro_route_id: Optional[str] = Header(None, alias="ouro-route-id"),
    ouro_route_org_id: Optional[str] = Header(None, alias="ouro-route-org-id"),
    ouro_route_team_id: Optional[str] = Header(None, alias="ouro-route-team-id"),
    ouro_action_id: Optional[str] = Header(None, alias="ouro-action-id"),
):
    """
    Process input and generate predictions.

    The Ouro headers provide context about:
    - ouro_route_id: The service route being accessed
    - ouro_route_org_id: Organization ID for asset creation context
    - ouro_route_team_id: Team ID for asset creation context
    - ouro_action_id: Unique action ID for logging and tracking
    """
    # Initialize the Ouro SDK with your API key (provided by a Modal secret)
    ouro = Ouro(api_key=os.environ["OURO_API_KEY"])

    # Log progress to Ouro's action system
    if ouro_action_id:
        ouro.client.post(
            f"/actions/{ouro_action_id}/log",
            json={
                "message": "Starting model inference...",
                "asset_id": ouro_route_id,
                "level": "info",
            },
        )

    # Your model inference logic here; the result must be JSON-serializable
    result = your_model_function(request)

    # Return the result with Ouro metadata for asset creation
    return {
        "prediction": result,
        "prediction_file": {  # This creates a file asset in Ouro
            "name": f"Prediction for {request.input_name}",
            "description": "Model prediction output",
            "filename": "prediction.json",
            "type": "application/json",
            "extension": "json",
            "base64": base64.b64encode(json.dumps(result).encode("utf-8")).decode("ascii"),
            "org_id": ouro_route_org_id,  # Use Ouro context
            "team_id": ouro_route_team_id,
        },
    }


# Optional: Origin/Referer allowlisting middleware
from fastapi import Request, HTTPException
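from urllib.parse import urlparse

from fastapi.responses import JSONResponse

ALLOWED_DOMAINS = ["api.ouro.foundation", "ouro.foundation"]


def _is_allowed(url: str) -> bool:
    """True if the URL's hostname is an allowed domain or a subdomain of one."""
    host = (urlparse(url).hostname or "").lower()
    # Exact hostname match; a plain substring test would also accept e.g. ouro.foundation.evil.com
    return any(host == d or host.endswith("." + d) for d in ALLOWED_DOMAINS)


@web_app.middleware("http")
async def verify_origin(request: Request, call_next):
    # Origin and Referer are client-controlled, so this complements,
    # rather than replaces, the Authorization check
    origin = request.headers.get("origin", "")
    referer = request.headers.get("referer", "")

    # Allow Ouro platform access
    if _is_allowed(origin) or _is_allowed(referer):
        return await call_next(request)

    # Allow requests without origin or referer (direct API calls)
    if not origin and not referer:
        return await call_next(request)

    # Return the response directly: an HTTPException raised inside HTTP middleware
    # is not turned into a 403 by FastAPI's exception handlers (it surfaces as a 500)
    return JSONResponse(status_code=403, content={"detail": "Access denied"})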