The Production Runtime for Agentic AI
Turn async Python functions into durable, auto-scaling microservices with a single decorator. GPU inference, auth, and observability included.
Deploy to Azure, AWS, or GCP. Scale to zero. Pay only when your code runs.
```python
@cortex.azure(
    name="analyst-agent",
    compute="standard.gpu.sm",   # GPU provisioned
    auth="bearer",               # Security handled
    scale={"min": 0, "max": 5},  # Scale to zero
)
async def process(ctx, agent, data):
    ctx.log.info("Reasoning...")
    return await agent.run(data)
```
Stop Managing Infrastructure. Start Shipping Agents.
DevOps overhead shouldn't slow down your AI development.
The Old Way
- Manage Dockerfile, k8s.yaml, and Helm charts.
- Manually configure JWT validation & API Keys.
- "It works on my machine" but fails in cloud.
- Stateless functions time out after 60s.
The CortexRun Way
- One Python Decorator handles everything.
- Built-in Auth (bearer, api_key, oauth2).
- Full Local Simulation (cortexrun dev).
- Durable Execution for long-running Agents (see the sketch below).
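To make that concrete, here is a minimal sketch combining those pieces. It reuses the decorator API from the example above; `auth="api_key"` is one of the listed built-in modes, and the multi-step handler body is purely illustrative:

```python
import cortex  # assumed package name, matching the decorator above

@cortex.azure(
    name="report-agent",
    compute="standard.cpu.md",
    auth="api_key",              # one of: bearer, api_key, oauth2
    scale={"min": 0, "max": 3},
)
async def generate_report(ctx, agent, data):
    # Durable execution: a multi-step agent run that would blow past
    # a typical 60s serverless timeout can run to completion here.
    ctx.log.info("Starting long-running report generation...")
    outline = await agent.run({"task": "outline", "input": data})
    draft = await agent.run({"task": "draft", "input": outline})
    return draft
```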
Everything You Need. Nothing You Don't.
Production-grade features, configured in seconds.
The Protective Shield
CortexRun wraps your agent in a deployment layer that handles timeouts, routing, and recovery. You write the logic; we handle the physics.
Zero-Config Observability
Metrics, distributed tracing, and structured logging are injected into the ctx object automatically. No Datadog setup required.
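As an illustration of what that injection might look like in handler code — `ctx.log` appears in the example above, while the `ctx.trace` and `ctx.metrics` accessors are hypothetical names, not confirmed API:

```python
async def process(ctx, agent, data):
    # Structured logging: ctx.log is shown in the hero example.
    ctx.log.info("Received request")

    # Hypothetical accessors for the auto-injected telemetry:
    with ctx.trace.span("agent-reasoning"):      # distributed tracing
        result = await agent.run(data)
    ctx.metrics.increment("requests_processed")  # custom metric
    return result
```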
Intelligent Auto-Scaling
Scale to zero to save costs when idle. Burst to max instances during traffic spikes. Configurable via scale={'min': 0, 'max': 10}.
GPU Access
Need T4 or A100 GPUs for inference? Just set compute='standard.gpu.sm'. We handle the drivers and provisioning.
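Putting the scaling and GPU knobs together in one decorator call (both settings are quoted from this page; the handler body is illustrative):

```python
@cortex.azure(
    name="embedding-service",
    compute="standard.gpu.sm",    # provisions an NVIDIA T4
    scale={"min": 0, "max": 10},  # scale to zero when idle
)
async def embed(ctx, agent, data):
    # Illustrative GPU inference; the runtime handles drivers and provisioning.
    ctx.log.info("Running inference on standard.gpu.sm")
    return await agent.run(data)
```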
Day 2 Operations
What Happens After Deploy
Deployment is just the beginning. CortexRun handles the 3am pages, the compliance audits, and the "who changed what" questions.
Instant Rollbacks
One command to revert. Every deployment is versioned. Roll back in under 30 seconds.
Blue/Green Deploys
Zero-downtime deployments with automatic traffic shifting and health checks.
Secret Management
Encrypted at rest, injected at runtime. Rotate without redeploying.
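In handler code, runtime injection might look roughly like this; `ctx.secrets` is a hypothetical accessor used for illustration, and the key name is a placeholder:

```python
async def process(ctx, agent, data):
    # Hypothetical: secrets are decrypted and injected at runtime,
    # so rotating a key never requires a redeploy.
    api_key = ctx.secrets["OPENAI_API_KEY"]
    return await agent.run(data, api_key=api_key)
```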
Real-time Metrics
Latency, throughput, error rates. Auto-exported to your observability stack.
Alerting Built-in
PagerDuty, Slack, webhooks. Configure thresholds, get notified.
Audit Logging
Every deploy, every config change, every access. Exportable for compliance.
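As a sketch, these operations might map to CLI subcommands like the following. Only `cortexrun dev` appears elsewhere on this page, so every subcommand and flag below is a hypothetical spelling:

```bash
cortexrun rollback analyst-agent --to v41   # instant rollback (hypothetical)
cortexrun secrets set MY_API_KEY=...        # rotate without redeploying
cortexrun metrics analyst-agent --live      # real-time latency/error rates
cortexrun audit export --format csv         # audit log for compliance
```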
From Zero to Production in Three Commands
No learning curve. Just shipping.
Init
Scaffold a new project with best practices baked in.
Develop
Run a full production simulation locally with hot reload.
Deploy
Ship to any cloud with a single command.
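In command form — `cortexrun dev` is named on this page, while the `init` and `deploy` spellings and flags are assumptions:

```bash
cortexrun init my-agent          # scaffold a project (assumed spelling)
cortexrun dev                    # full local production simulation, hot reload
cortexrun deploy --cloud azure   # ship to your cloud (assumed flags)
```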
Compute Tiers for Every Workload
From webhooks to heavy ML inference. Pay only for what you use.
| Tier | Resources | Best For |
|---|---|---|
| standard.cpu.xs | 0.25 vCPU / 256 MB | Webhooks, simple logic |
| standard.cpu.md | 1.0 vCPU / 1 GB | Data processing, orchestration |
| standard.gpu.sm | NVIDIA T4 | Light inference, embeddings |
| standard.gpu.lg | NVIDIA A100 | LLM fine-tuning, heavy inference |
From the Community
Engineers Ship Faster with CortexRun
"We cut our infra team from 4 engineers to 1. CortexRun handles what used to take us weeks of Kubernetes wrestling."
"The GPU provisioning alone saved us $40k/month. No more paying for idle A100s. Scale to zero actually works."
"From 'it works on my laptop' to production in 20 minutes. My team ships features now instead of debugging YAML."
Enterprise Ready
Security and Compliance Built In
We've passed security reviews at Fortune 500 companies. Your infosec team will thank you.
SOC2 Type II
Annual audits. Report available under NDA.
HIPAA Ready
BAA available for healthcare workloads.
99.99% SLA
Contractual uptime with financial credits.
SSO / SAML
Okta, Azure AD, Google Workspace.
Dedicated Infra
Isolated compute in your preferred region.
Priority Support
< 1 hour response. Dedicated Slack channel.
Simple Pricing
One Price. Everyone.
Pay per request. No tiers, no limits, no surprises. Scale to zero = pay zero.
Compute Pricing
standard.cpu.* · standard.gpu.sm · standard.gpu.lg
Billed monthly. No minimum. No egress fees. Volume discounts available.
Support & Security Packages
Same compute rates. Choose your support level.
Community
For individuals and side projects
- Pay-per-request compute
- Community Discord
- Public documentation
- Standard regions
Pro Support
For teams that need faster answers
- Everything in Community
- < 4 hour response time
- Private Slack channel
- Architecture review (1x/quarter)
- Multi-region deployment
Enterprise
For compliance and security requirements
- Everything in Pro
- SSO / SAML
- SOC2 Type II report
- BAA / HIPAA
- 99.99% SLA
- Dedicated support engineer
- Custom contracts
Frequently Asked Questions
Everything you need to know before you ship.
Am I locked into a single cloud?
No. You can switch clouds by changing @cortex.azure to @cortex.aws. Your logic remains 100% standard Python. We believe in portability, not lock-in.
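The swap in code form, using only the two decorator names from the answer above (the handler body is unchanged):

```python
# Before: Azure
@cortex.azure(name="analyst-agent", compute="standard.cpu.md")
async def process(ctx, agent, data):
    return await agent.run(data)

# After: AWS — only the decorator changes
@cortex.aws(name="analyst-agent", compute="standard.cpu.md")
async def process(ctx, agent, data):
    return await agent.run(data)
```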
How do I test before deploying?
Use cortexrun dev to run a full production simulation locally, including auth checks and metric emission. What runs locally runs identically in the cloud.
How is security handled?
Security is "shift left": authentication is enforced at the edge before traffic ever reaches your function. All secrets are encrypted at rest and in transit.
What about cold starts?
Our intelligent pre-warming and optimized container images keep cold starts under 100ms for CPU workloads. GPU workloads use persistent instances for zero cold starts.
How does billing work?
Pay only for compute time consumed. Scale-to-zero means you pay nothing when idle. No hidden fees, no egress charges. Transparent pricing per vCPU-second and GPU-second.
