How we run untrusted customer code at scale
7 hours ago
- #Untrusted Code
- #Runtime Isolation
- #API Integrations
- Nango is a code-first platform for building product API integrations. It handles over 150 million function executions per month across on-demand actions, long-running syncs, and bursty webhooks.
- Running untrusted customer code required strong isolation from internal systems, between tenants, and between executions to ensure security and fairness.
- Nango initially used vm2, an in-process Node.js sandbox, but abandoned it in 2023 because of sandbox-escape vulnerabilities. It then moved to a runner model, which improved isolation by running customer code in dedicated services.
- The runner model faced issues with resource fairness and observability, leading to a move to AWS Lambda in late 2025 for hardware-virtualized microVM isolation per execution and improved debugging.
- To work within Lambda's 15-minute time limit, Nango caps each run at 10 minutes and uses checkpoints so that long syncs can resume where they left off, which also makes them more reliable than single long jobs.
- Tenant isolation on Lambda was achieved by pinning each customer's executions to their own Lambda functions. This prevents cross-customer environment reuse and reduces security risk, though it raised the share of cold starts to around 9%.
- Internal debates highlighted the trade-offs: per-customer functions need warming mechanisms to offset cold starts, whereas a stronger sandbox would remove the need for tenant isolation altogether. Incremental security improvements were prioritized.
- A comparison with other platforms (e.g., E2B, Fly, Modal) shows a trend toward hardware-level isolation (microVMs, gVisor) for untrusted code, while V8-isolate solutions such as Cloudflare Workers were deemed unsuitable for Nango's npm packages.
- Key lessons: do not treat in-process sandboxes as true isolation, align security boundaries with product needs, adapt runtimes to workloads with clear workarounds, and prefer short, resumable jobs over long-running ones for resilience.
- Future steps aim for tighter isolation per code function within Lambda, and Nango remains open-source and hiring for roles focused on secure code execution at scale.
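
The time-capped, checkpointed runs described above can be illustrated with a minimal TypeScript sketch. It is not Nango's implementation: `Checkpoint`, `runSync`, `runToCompletion`, the page-based cursor, and the injectable clock are all hypothetical names and assumptions chosen for illustration. The idea is that each run works until its time budget is spent, returns a checkpoint, and a scheduler re-invokes it from that checkpoint until the sync is done.

```typescript
// Hypothetical checkpoint shape: index of the next unprocessed page.
interface Checkpoint {
  cursor: number;
}

interface RunResult {
  done: boolean;          // true once every page has been processed
  checkpoint: Checkpoint; // where the next run should resume
  processed: number;      // records handled during this run
}

// One time-capped run. It processes whole pages until the data is exhausted
// or the budget is spent, so the returned checkpoint never points mid-page.
function runSync(
  pages: string[][],
  start: Checkpoint,
  budgetMs: number,
  now: () => number = Date.now, // injectable clock (assumption, eases testing)
): RunResult {
  const deadline = now() + budgetMs;
  let cursor = start.cursor;
  let processed = 0;

  while (cursor < pages.length) {
    if (now() >= deadline) {
      // Budget spent: stop cleanly and hand back a resumable checkpoint.
      return { done: false, checkpoint: { cursor }, processed };
    }
    const page = pages[cursor] ?? [];
    processed += page.length; // stand-in for real per-record work
    cursor += 1;              // advance only after the page is fully handled
  }
  return { done: true, checkpoint: { cursor }, processed };
}

// Stand-in for a scheduler: keep re-invoking from the last checkpoint.
function runToCompletion(
  pages: string[][],
  budgetMs: number,
  now: () => number = Date.now,
): { runs: number; total: number } {
  let checkpoint: Checkpoint = { cursor: 0 };
  let runs = 0;
  let total = 0;

  for (;;) {
    const result = runSync(pages, checkpoint, budgetMs, now);
    runs += 1;
    total += result.processed;
    checkpoint = result.checkpoint;
    if (result.done) return { runs, total };
    if (result.processed === 0) {
      // Guard against looping forever when the budget is too small.
      throw new Error("run made no progress within its time budget");
    }
  }
}
```

In a real deployment the checkpoint would be persisted between invocations rather than held in memory, and the budget would sit comfortably below the platform's hard limit (10 minutes against Lambda's 15, in the setup described above).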