T4 and A100 GPUs available on-demand

The Production Runtime for Agentic AI

Turn async Python functions into durable, auto-scaling microservices with a single decorator. GPU inference, auth, and observability included.

Deploy to Azure, AWS, or GCP. Scale to zero. Pay only when your code runs.

agent.py
@cortex.azure(
    name="analyst-agent",
    compute="standard.gpu.sm",  # GPU Provisioned
    auth="bearer",              # Security Handled
    scale={"min": 0, "max": 5}  # Scale to Zero
)
async def process(ctx, agent, data):
    ctx.log.info("Reasoning...")
    return await agent.run(data)
terminal
$ cortexrun deploy azure
Provisioning Azure Resources (West Europe)...
> Service analyst-agent active.
Endpoint: https://api.cortexrun.cloud/api/process
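
Calling the deployed endpoint could then look like this; the token and request body are illustrative, and auth="bearer" above implies a standard Authorization header:

$ curl -X POST https://api.cortexrun.cloud/api/process \
    -H "Authorization: Bearer $CORTEX_TOKEN" \
    -H "Content-Type: application/json" \
    -d '{"data": "Summarize the Q3 report"}'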

Trusted by AI teams at

Anthropic · Scale AI · Weights & Biases · Hugging Face · Cohere · Runway

2.4M+ Deployments · 12,000+ Developers · 99.99% Uptime SLA · < 100ms Cold Start

Deploy native workloads seamlessly on

Microsoft Azure
AWS
Google Cloud
Docker
Kubernetes

Stop Managing Infrastructure. Start Shipping Agents.

DevOps overhead shouldn't slow down your AI development.

The Old Way

  • Maintain Dockerfiles, k8s.yaml manifests, and Helm charts.
  • Manually configure JWT validation and API keys.
  • "It works on my machine" but fails in the cloud.
  • Stateless functions time out after 60s.

The CortexRun Way

  • One Python Decorator handles everything.
  • Built-in Auth (bearer, api_key, oauth2).
  • Full Local Simulation (cortexrun dev).
  • Durable Execution for long-running Agents (sketched below).
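
A sketch of what a durable long-running agent could look like. The ctx.checkpoint helper is hypothetical, shown only to illustrate intermediate state that survives restarts; the decorator shape follows the hero example above.

import cortex  # assumed module name, matching the @cortex.azure decorator above

@cortex.azure(name="batch-researcher", scale={"min": 0, "max": 2})
async def research(ctx, agent, tasks):
    results = []
    for task in tasks:
        result = await agent.run(task)
        # Hypothetical: persist progress so a restart resumes from here
        await ctx.checkpoint(task_id=task["id"], result=result)
        results.append(result)
    return results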

Everything You Need. Nothing You Don't.

Production-grade features, configured in seconds.

The Protective Shield

CortexRun wraps your agent in a deployment layer that handles timeouts, routing, and recovery. You write the logic; we handle the physics.

Zero-Config Observability

Metrics, distributed tracing, and structured logging are injected into the ctx object automatically. No Datadog setup required.
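
In handler code, that might look like the sketch below. ctx.log.info appears in the hero example; ctx.trace and ctx.metrics are assumed names for the injected tracing and metrics handles.

async def process(ctx, agent, data):
    ctx.log.info("request received")             # structured logging (as shown above)
    with ctx.trace("agent.run"):                 # assumed: opens a distributed-tracing span
        result = await agent.run(data)
    ctx.metrics.increment("requests.processed")  # assumed: emits a counter metric
    return result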

Intelligent Auto-Scaling

Scale to zero to save costs when idle. Burst to max instances during traffic spikes. Configurable via scale={'min': 0, 'max': 10}.
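
Applied to a function, that is a single decorator argument (shape taken from the hero example):

@cortex.aws(name="spiky-worker", scale={"min": 0, "max": 10})
async def handle(ctx, agent, data):
    # Idles at zero instances; bursts scale out to at most 10
    return await agent.run(data)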

GPU Access

Need T4 or A100 GPUs for inference? Just set compute='standard.gpu.sm'. We handle the drivers and provisioning.
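
For example, using the tier names from the table further down (other arguments omitted for brevity):

@cortex.azure(name="embedder", compute="standard.gpu.sm")  # provisions an NVIDIA T4
async def embed(ctx, agent, data):
    return await agent.run(data)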

Day 2 Operations

What Happens After Deploy

Deployment is just the beginning. CortexRun handles the 3am pages, the compliance audits, and the "who changed what" questions.

Instant Rollbacks

One command to revert. Every deployment is versioned. Roll back in under 30 seconds.

$ cortexrun rollback --version=v2.3.1

Blue/Green Deploys

Zero-downtime deployments with automatic traffic shifting and health checks.

$ cortexrun deploy --strategy=blue-green

Secret Management

Encrypted at rest, injected at runtime. Rotate without redeploying.

$ cortexrun secrets set OPENAI_KEY=sk-...
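
How the secret surfaces inside your function is an assumption here; a common pattern would be plain environment-variable injection:

import os

openai_key = os.environ["OPENAI_KEY"]  # assumption: secrets are injected as env vars at runtime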

Real-time Metrics

Latency, throughput, error rates. Auto-exported to your observability stack.

$ cortexrun metrics --tail

Alerting Built-in

PagerDuty, Slack, webhooks. Configure thresholds, get notified.

$ cortexrun alerts add --on='error_rate>1%'

Audit Logging

Every deploy, every config change, every access. Exportable for compliance.

$ cortexrun audit --last=30d --export=csv

From Zero to Production in Three Commands

No learning curve. Just shipping.

Step 1

Init

Scaffold a new project with best practices baked in.

$ cortexrun init my-project

Step 2

Develop

Run a full production simulation locally with hot reload.

$ cortexrun dev

Step 3

Deploy

Ship to any cloud with a single command.

$ cortexrun deploy aws

Compute Tiers for Every Workload

From webhooks to heavy ML inference. Pay only for what you use.

Tier             Resources            Best For
standard.cpu.xs  0.25 vCPU / 256 MB   Webhooks, simple logic
standard.cpu.md  1.0 vCPU / 1 GB      Data processing, orchestration
standard.gpu.sm  NVIDIA T4            Light inference, embeddings
standard.gpu.lg  NVIDIA A100          LLM fine-tuning, heavy inference

From the Community

Engineers Ship Faster with CortexRun

"We cut our infra team from 4 engineers to 1. CortexRun handles what used to take us weeks of Kubernetes wrestling."

Sarah Chen
Head of Engineering, Series B AI Startup

"The GPU provisioning alone saved us $40k/month. No more paying for idle A100s. Scale to zero actually works."

Marcus Rodriguez
ML Platform Lead, Fortune 500 Tech

"From 'it works on my laptop' to production in 20 minutes. My team ships features now instead of debugging YAML."

David Park
CTO, YC W24

Enterprise Ready

Security and Compliance Built In

We've passed the security reviews at Fortune 500 companies. Your infosec team will thank you.

GDPR Compliant · SOC2 Type II · HIPAA Ready · ISO 27001

Request Security Documentation

SOC2 Type II

Annual audits. Report available under NDA.

HIPAA Ready

BAA available for healthcare workloads.

99.99% SLA

Contractual uptime with financial credits.

SSO / SAML

Okta, Azure AD, Google Workspace.

Dedicated Infra

Isolated compute in your preferred region.

Priority Support

< 1 hour response. Dedicated Slack channel.

Simple Pricing

One Price. Everyone.

Pay per request. No tiers, no limits, no surprises. Scale to zero = pay zero.

Compute Pricing

CPU      (standard.cpu.*)    $0.001 per request
GPU T4   (standard.gpu.sm)   $0.01 per request
GPU A100 (standard.gpu.lg)   $0.05 per request

Billed monthly. No minimum. No egress fees. Volume discounts available.

Support & Security Packages

Same compute rates. Choose your support level.

Community

For individuals and side projects

$0/month
  • Pay-per-request compute
  • Community Discord
  • Public documentation
  • Standard regions

Pro Support (Most Popular)

For teams that need faster answers

$199/month
  • Everything in Community
  • < 4 hour response time
  • Private Slack channel
  • Architecture review (1x/quarter)
  • Multi-region deployment

Enterprise

For compliance and security requirements

Custom
  • Everything in Pro
  • SSO / SAML
  • SOC2 Type II report
  • BAA / HIPAA
  • 99.99% SLA
  • Dedicated support engineer
  • Custom contracts

Frequently Asked Questions

Everything you need to know before you ship.

Am I locked into a single cloud?

No. You can switch clouds by changing @cortex.azure to @cortex.aws. Your application logic remains 100% standard Python. We believe in portability, not lock-in.
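
As an illustration, the swap is a one-line change and the function body stays identical (minimal arguments, following the hero example):

# Azure today
@cortex.azure(name="analyst-agent")
async def process(ctx, agent, data):
    return await agent.run(data)

# AWS tomorrow: only the decorator changes
@cortex.aws(name="analyst-agent")
async def process(ctx, agent, data):
    return await agent.run(data)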

How do I develop and test locally?

Use cortexrun dev to run a full production simulation locally, including auth checks and metric emission. What runs locally runs identically in the cloud.

How is security handled?

Security shifts left: authentication is enforced at the edge before traffic ever reaches your function, and all secrets are encrypted at rest and in transit.

What about cold starts?

Intelligent pre-warming and optimized container images keep cold starts under 100ms for CPU workloads. GPU workloads use persistent instances for zero cold starts.

How does billing work?

You pay only for the compute your requests consume. Scale-to-zero means you pay nothing when idle. No hidden fees, no egress charges: transparent per-request pricing, as listed above.
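
A worked example at the rates listed above (volumes are illustrative):

50,000 CPU requests   × $0.001 = $50.00
2,000 GPU T4 requests × $0.01  = $20.00
Monthly total                  = $70.00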