T4 and A100 GPUs available on-demand

The Production Runtime for Agentic AI

Turn async Python functions into durable, auto-scaling microservices with a single decorator. GPU inference, auth, and observability included.

Deploy to Azure, AWS, or GCP. Scale to zero. Pay only when your code runs.

agent.py
@cortex.azure(
    name="analyst-agent",
    compute="standard.gpu.sm",  # GPU Provisioned
    auth="bearer",              # Security Handled
    scale={"min": 0, "max": 5}  # Scale to Zero
)
async def process(ctx, agent, data):
    ctx.log.info("Reasoning...")
    return await agent.run(data)
terminal
$ cortexrun deploy azure
Provisioning Azure Resources (West Europe)...
> Service analyst-agent active.
Endpoint: https://api.cortexrun.cloud/api/process
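
Calling the deployed endpoint could then look like this; the token and request body are illustrative, and auth="bearer" above implies a standard Authorization header:

$ curl -X POST https://api.cortexrun.cloud/api/process \
    -H "Authorization: Bearer $CORTEX_TOKEN" \
    -H "Content-Type: application/json" \
    -d '{"data": "Summarize the Q3 report"}'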

Trusted by AI teams at

Anthropic · Scale AI · Weights & Biases · Hugging Face · Cohere · Runway

2.4M+ Deployments · 12,000+ Developers · 99.99% Uptime SLA · < 100ms Cold Start

Deploy native workloads seamlessly on

Microsoft Azure
AWS
Google Cloud
Docker
Kubernetes

Stop Managing Infrastructure. Start Shipping Agents.

DevOps overhead shouldn't slow down your AI development.

The Old Way

  • Maintain Dockerfiles, k8s.yaml manifests, and Helm charts.
  • Manually configure JWT validation and API keys.
  • "It works on my machine" but fails in the cloud.
  • Stateless functions time out after 60s.

The CortexRun Way

  • One Python Decorator handles everything.
  • Built-in Auth (bearer, api_key, oauth2).
  • Full Local Simulation (cortexrun dev).
  • Durable Execution for long-running Agents (sketched below).
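
A sketch of what a durable long-running agent could look like. The ctx.checkpoint helper is hypothetical, shown only to illustrate intermediate state that survives restarts; the decorator shape follows the hero example above.

import cortex  # assumed module name, matching the @cortex.azure decorator above

@cortex.azure(name="batch-researcher", scale={"min": 0, "max": 2})
async def research(ctx, agent, tasks):
    results = []
    for task in tasks:
        result = await agent.run(task)
        # Hypothetical: persist progress so a restart resumes from here
        await ctx.checkpoint(task_id=task["id"], result=result)
        results.append(result)
    return results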

Everything You Need. Nothing You Don't.

Production-grade features, configured in seconds.

The Protective Shield

CortexRun wraps your agent in a deployment layer that handles timeouts, routing, and recovery. You write the logic; we handle the physics.

Zero-Config Observability

Metrics, distributed tracing, and structured logging are injected into the ctx object automatically. No Datadog setup required.
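
In handler code, that might look like the sketch below. ctx.log.info appears in the hero example; ctx.trace and ctx.metrics are assumed names for the injected tracing and metrics handles.

async def process(ctx, agent, data):
    ctx.log.info("request received")             # structured logging (as shown above)
    with ctx.trace("agent.run"):                 # assumed: opens a distributed-tracing span
        result = await agent.run(data)
    ctx.metrics.increment("requests.processed")  # assumed: emits a counter metric
    return result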

Intelligent Auto-Scaling

Scale to zero to save costs when idle. Burst to max instances during traffic spikes. Configurable via scale={'min': 0, 'max': 10}.
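
Applied to a function, that is a single decorator argument (shape taken from the hero example):

@cortex.aws(name="spiky-worker", scale={"min": 0, "max": 10})
async def handle(ctx, agent, data):
    # Idles at zero instances; bursts scale out to at most 10
    return await agent.run(data)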

GPU Access

Need T4 or A100 GPUs for inference? Just set compute='standard.gpu.sm'. We handle the drivers and provisioning.
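
For example, using the tier names from the table further down (other arguments omitted for brevity):

@cortex.azure(name="embedder", compute="standard.gpu.sm")  # provisions an NVIDIA T4
async def embed(ctx, agent, data):
    return await agent.run(data)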

Day 2 Operations

What Happens After Deploy

Deployment is just the beginning. CortexRun handles the 3am pages, the compliance audits, and the "who changed what" questions.

Instant Rollbacks

One command to revert. Every deployment is versioned. Roll back in under 30 seconds.

$ cortexrun rollback --version=v2.3.1

Blue/Green Deploys

Zero-downtime deployments with automatic traffic shifting and health checks.

$ cortexrun deploy --strategy=blue-green

Secret Management

Encrypted at rest, injected at runtime. Rotate without redeploying.

$ cortexrun secrets set OPENAI_KEY=sk-...
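
How the secret surfaces inside your function is an assumption here; a common pattern would be plain environment-variable injection:

import os

openai_key = os.environ["OPENAI_KEY"]  # assumption: secrets are injected as env vars at runtime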

Real-time Metrics

Latency, throughput, error rates. Auto-exported to your observability stack.

$ cortexrun metrics --tail

Alerting Built-in

PagerDuty, Slack, webhooks. Configure thresholds, get notified.

$ cortexrun alerts add --on='error_rate>1%'

Audit Logging

Every deploy, every config change, every access. Exportable for compliance.

$ cortexrun audit --last=30d --export=csv

From Zero to Production in Three Commands

No learning curve. Just shipping.

Step 1

Init

Scaffold a new project with best practices baked in.

$ cortexrun init my-project

Step 2

Develop

Run a full production simulation locally with hot reload.

$ cortexrun dev

Step 3

Deploy

Ship to any cloud with a single command.

$ cortexrun deploy aws

Compute Tiers for Every Workload

From webhooks to heavy ML inference. Pay only for what you use.

Tier             Resources            Best For
standard.cpu.xs  0.25 vCPU / 256 MB   Webhooks, simple logic
standard.cpu.md  1.0 vCPU / 1 GB      Data processing, orchestration
standard.gpu.sm  NVIDIA T4            Light inference, embeddings
standard.gpu.lg  NVIDIA A100          LLM fine-tuning, heavy inference

From the Community

Engineers Ship Faster with CortexRun

"We cut our infra team from 4 engineers to 1. CortexRun handles what used to take us weeks of Kubernetes wrestling."

Sarah Chen
Head of Engineering, Series B AI Startup

"The GPU provisioning alone saved us $40k/month. No more paying for idle A100s. Scale to zero actually works."

Marcus Rodriguez
ML Platform Lead, Fortune 500 Tech

"From 'it works on my laptop' to production in 20 minutes. My team ships features now instead of debugging YAML."

David Park
CTO, YC W24

Enterprise Ready

Security and Compliance Built In

We've passed the security reviews at Fortune 500 companies. Your infosec team will thank you.

GDPR Compliant · SOC2 Type II · HIPAA Ready · ISO 27001

Request Security Documentation

SOC2 Type II

Annual audits. Report available under NDA.

HIPAA Ready

BAA available for healthcare workloads.

99.99% SLA

Contractual uptime with financial credits.

SSO / SAML

Okta, Azure AD, Google Workspace.

Dedicated Infra

Isolated compute in your preferred region.

Priority Support

< 1 hour response. Dedicated Slack channel.

Simple Pricing

One Price. Everyone.

Pay per request. No tiers, no limits, no surprises. Scale to zero = pay zero.

Compute Pricing

CPU      (standard.cpu.*)    $0.001 per request
GPU T4   (standard.gpu.sm)   $0.01 per request
GPU A100 (standard.gpu.lg)   $0.05 per request

Billed monthly. No minimum. No egress fees. Volume discounts available.

Support & Security Packages

Same compute rates. Choose your support level.

Community

For individuals and side projects

$0/month
  • Pay-per-request compute
  • Community Discord
  • Public documentation
  • Standard regions

Pro Support (Most Popular)

For teams that need faster answers

$199/month
  • Everything in Community
  • < 4 hour response time
  • Private Slack channel
  • Architecture review (1x/quarter)
  • Multi-region deployment

Enterprise

For compliance and security requirements

Custom
  • Everything in Pro
  • SSO / SAML
  • SOC2 Type II report
  • BAA / HIPAA
  • 99.99% SLA
  • Dedicated support engineer
  • Custom contracts

Frequently Asked Questions

Everything you need to know before you ship.

Am I locked into a single cloud?

No. You can switch clouds by changing @cortex.azure to @cortex.aws. Your application logic remains 100% standard Python. We believe in portability, not lock-in.
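
As an illustration, the swap is a one-line change and the function body stays identical (minimal arguments, following the hero example):

# Azure today
@cortex.azure(name="analyst-agent")
async def process(ctx, agent, data):
    return await agent.run(data)

# AWS tomorrow: only the decorator changes
@cortex.aws(name="analyst-agent")
async def process(ctx, agent, data):
    return await agent.run(data)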

How do I develop and test locally?

Use cortexrun dev to run a full production simulation locally, including auth checks and metric emission. What runs locally runs identically in the cloud.

How is security handled?

Security shifts left: authentication is enforced at the edge before traffic ever reaches your function, and all secrets are encrypted at rest and in transit.

What about cold starts?

Intelligent pre-warming and optimized container images keep cold starts under 100ms for CPU workloads. GPU workloads use persistent instances for zero cold starts.

How does billing work?

You pay only for the compute your requests consume. Scale-to-zero means you pay nothing when idle. No hidden fees, no egress charges: transparent per-request pricing, as listed above.
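
A worked example at the rates listed above (volumes are illustrative):

50,000 CPU requests   × $0.001 = $50.00
2,000 GPU T4 requests × $0.01  = $20.00
Monthly total                  = $70.00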