Enterprise-grade file uploads, part 2: serverless at the front door

Second article in a series on what "enterprise-grade" actually means for user file uploads.

In part 1 we moved file bytes off the API entirely — the client uploads directly to S3/GCS, and the API becomes a permission-granting service. What it actually does for an upload request now looks like this:

  1. Authenticate the caller.
  2. Check quotas and policy.
  3. Write a pending row to the database.
  4. Sign a URL with the right constraints.
  5. Return the URL.

Total work: a database round trip, an HMAC, a JSON response. Sub-100 ms of CPU. Stateless. Idempotent in any reasonable design.
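
As a concrete sketch, here is roughly what steps 1-5 can look like in Go with the AWS SDK for Go v2. The authenticate, checkQuota, and insertPendingUpload helpers are hypothetical stand-ins for your own auth, policy, and persistence layers; the bucket name is a placeholder.

```go
// Sketch of the permission-granting handler. authenticate, checkQuota, and
// insertPendingUpload are hypothetical stubs; only the presigning uses the real SDK.
package main

import (
	"context"
	"encoding/json"
	"net/http"
	"time"

	"github.com/aws/aws-sdk-go-v2/aws"
	"github.com/aws/aws-sdk-go-v2/service/s3"
)

type user struct{ ID string }

func authenticate(r *http.Request) (user, error)                      { return user{ID: "u1"}, nil }  // stub
func checkQuota(ctx context.Context, u user) error                    { return nil }                  // stub
func insertPendingUpload(ctx context.Context, u user) (string, error) { return u.ID + "/obj1", nil }  // stub

func handleUploadRequest(presigner *s3.PresignClient) http.HandlerFunc {
	return func(w http.ResponseWriter, r *http.Request) {
		ctx := r.Context()
		u, err := authenticate(r) // 1. authenticate the caller
		if err != nil {
			http.Error(w, "unauthorized", http.StatusUnauthorized)
			return
		}
		if err := checkQuota(ctx, u); err != nil { // 2. quotas and policy
			http.Error(w, "quota exceeded", http.StatusForbidden)
			return
		}
		key, err := insertPendingUpload(ctx, u) // 3. pending row in the DB
		if err != nil {
			http.Error(w, "internal error", http.StatusInternalServerError)
			return
		}
		// 4. sign a URL with the right constraints (bucket, key, expiry)
		signed, err := presigner.PresignPutObject(ctx, &s3.PutObjectInput{
			Bucket: aws.String("uploads-bucket"), // placeholder
			Key:    aws.String(key),
		}, func(o *s3.PresignOptions) { o.Expires = 15 * time.Minute })
		if err != nil {
			http.Error(w, "internal error", http.StatusInternalServerError)
			return
		}
		// 5. return the URL
		json.NewEncoder(w).Encode(map[string]string{"url": signed.URL})
	}
}
```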

This is the textbook shape of a serverless workload. And once your API looks like this, the question stops being "should we run a fleet of VMs" and starts being "why are we?"

This article makes the case for putting the upload API on Lambda or Cloud Run, with current 2026 pricing, real scaling numbers, and the cases where you'd choose otherwise.

What "serverless" actually means here

For the purposes of this discussion: the unit of deployment is a request-handling container or function, and the platform schedules, scales, and bills it per request. AWS Lambda, Google Cloud Run, Azure Container Apps, Cloudflare Workers, Fly Machines — different shapes, same property.

Concretely you stop owning:

  • The OS image and its patch cadence
  • Autoscaling group config (target CPU, min/max instances, cooldown)
  • The load balancer in front of the app
  • Capacity planning ("how big should this instance be?")
  • The 3 AM page when the instance fills its disk

You start owning:

  • A container image (or a code bundle for native runtimes)
  • A request handler
  • A deployment artifact and its rollout policy

That's the entire mental model.
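
To make that concrete: for an HTTP service, the handler plus the artifact can be as small as the following (Cloud Run injects PORT; the health-check path is your own choice):

```go
// The whole deployable unit for a Cloud Run-style platform: one binary,
// one handler, listening on the platform-injected port.
package main

import (
	"log"
	"net/http"
	"os"
)

func main() {
	port := os.Getenv("PORT") // Cloud Run sets this; default for local runs
	if port == "" {
		port = "8080"
	}
	http.HandleFunc("/healthz", func(w http.ResponseWriter, r *http.Request) {
		w.WriteHeader(http.StatusOK)
	})
	log.Fatal(http.ListenAndServe(":"+port, nil))
}
```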

The economics of the upload API

Concrete pricing as of May 2026:

AWS Lambda + API Gateway

  • Lambda: $0.20 per million requests, $0.0000166667 per GB-second compute
  • API Gateway HTTP API: $1.00 per million requests (REST API: $3.50)
  • Effective request cost: ~$1.20 per million (HTTP API), excluding compute
  • Free tier: 1M Lambda requests + 400k GB-s/month forever

Google Cloud Run

  • $0.40 per million requests (HTTP entry point included)
  • $0.000024 per vCPU-second + $0.0000025 per GiB-second (request-billed)
  • $0.000018 per vCPU-second + $0.000002 per GiB-second (instance-billed)
  • Free tier: 2M requests + 360k vCPU-s + 180k GiB-s/month forever

Let's price a real scenario: 10 million signed-URL requests per month, each taking 50 ms of CPU at 256 MB / 0.25 vCPU.

| Option | Setup | Monthly cost |
| --- | --- | --- |
| Always-on VM (t3.small, 2 vCPU, 2 GB, 24/7) | classic | ~$15 + ALB ~$22 + bandwidth + ops time = ~$40-60 |
| Lambda + API Gateway HTTP | 256 MB, 50 ms avg | ~$2 (Lambda) + ~$10 (API GW) = ~$12 |
| Cloud Run | 0.25 vCPU, 256 MB, request-billed, concurrency=80 | ~$3-4 |
| Cloud Run with min-instances=1 (no cold starts) | same | ~$15-20 |
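
Sanity-checking the Cloud Run line: 10 M requests at 50 ms each is 500,000 request-seconds, i.e. 125,000 vCPU-seconds (at 0.25 vCPU) and 125,000 GiB-seconds (at 256 MB ≈ 0.25 GiB), both inside the free tier above. What's left is the request fee: (10 M − 2 M free) × $0.40 per million ≈ $3.20, which is where the ~$3-4 comes from.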

At 10 M requests/month, Cloud Run with request billing is roughly 1/10 the cost of a sized-for-peak VM. The VM has to be big enough to handle the peak hour; Cloud Run scales the resources to the request and shrinks back to zero between requests.

The same math holds at 100 M/month: Cloud Run climbs to maybe $40-80, while the VM option gets worse, not better; peak provisioning forces you to a fleet (4-8 instances behind a load balancer), and you're at $200-400/mo plus the team's time keeping it healthy.

The crossover where "always-on VM is actually cheaper" exists, and it's around sustained 1,000+ RPS, 24 hours a day, with low CPU variance — i.e., uniformly busy. Upload APIs are almost never that. They have day/night cycles, weekly cycles, marketing-launch spikes, end-of-quarter spikes. The price of always-on is paying for the peak you might not hit.

Why this workload fits serverless

A few properties of the upload API line up with what serverless does well:

Burstiness. Marketing emails go out at 9 AM and 200 users upload KYC documents in the next 5 minutes. End-of-quarter, every customer logs in to download statements. With a VM fleet you either over-provision (pay 80% for capacity you use 5% of the time) or you under-provision and brown out during the spike. Serverless platforms can absorb 0 → 1,000 concurrent in seconds without human intervention.

Statelessness. Every signed-URL request is independent. There's no in-memory cache that matters. The only "warm" thing you care about is the DB connection pool (Lambda's one-request-per-environment model fights it, Cloud Run handles it via per-container concurrency; more below), and it lives one hop away in the same VPC.

Small per-request CPU. An HMAC plus a SQL INSERT. Tens of milliseconds. Lambda bills in 1 ms increments; Cloud Run rounds CPU time up to the nearest 100 ms. Either way, you pay for what you used, give or take a rounding increment.

Idle periods. Many products have hours per day with near-zero upload traffic. Scale-to-zero turns those hours into $0 — you literally pay nothing while idle, including the load balancer (Cloud Run's HTTP endpoint is free; Lambda function URLs are free too; only API Gateway adds a per-request component).

Independent failure domains. A bad deploy of the upload API doesn't take down the user API, the auth service, or the worker pipeline. Each is its own function/service.

Lambda vs Cloud Run for this use case

Both work. The honest comparison for an upload-permission API:

Concurrency model — the big difference

Lambda dispatches one request per execution environment. Ten concurrent requests = ten Lambda containers, each with its own connection to the DB. Connection-pool exhaustion is the single most common production failure mode for Lambda + RDS. Mitigations exist (RDS Proxy, lazy connection), but the model fundamentally fights you.

Cloud Run lets one container handle up to 80 concurrent requests by default (configurable up to 1,000). For an I/O-bound workload — which signed-URL issuance entirely is — this means one Cloud Run instance can do the work of 80 Lambda instances, with one shared DB connection pool. The cost difference is often an order of magnitude for the same throughput at the same latency.
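
A sketch of what that buys you in code, assuming pgx for Postgres (DATABASE_URL is a placeholder): the pool below is created once per instance and shared by every request that instance handles concurrently, where the Lambda model would give each in-flight request its own environment and its own connections.

```go
// One process, one pool, up to `concurrency` in-flight requests sharing it.
package main

import (
	"context"
	"log"
	"net/http"
	"os"

	"github.com/jackc/pgx/v5/pgxpool"
)

var pool *pgxpool.Pool // initialized once per instance, shared across requests

func main() {
	var err error
	pool, err = pgxpool.New(context.Background(), os.Getenv("DATABASE_URL"))
	if err != nil {
		log.Fatal(err)
	}
	defer pool.Close()

	http.HandleFunc("/sign", func(w http.ResponseWriter, r *http.Request) {
		// Each of the (up to 80) concurrent requests borrows a connection
		// from the same pool instead of opening its own.
		var id int64
		err := pool.QueryRow(r.Context(),
			`INSERT INTO uploads (status) VALUES ('pending') RETURNING id`).Scan(&id)
		if err != nil {
			http.Error(w, "internal error", http.StatusInternalServerError)
			return
		}
		w.WriteHeader(http.StatusOK)
	})
	log.Fatal(http.ListenAndServe(":8080", nil))
}
```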

Cold starts in 2026

The shape of cold starts has changed materially since the early Lambda days:

  • Go binaries on either platform: 50-200 ms cold start. Imperceptible to most users.
  • Python on Lambda: 200-500 ms for typical apps, worse if you import a lot of SDKs at module load.
  • Java on Lambda with SnapStart: ~150 ms (down from 2-5 s pre-SnapStart). SnapStart is free for Java — no excuse not to use it for Java/Kotlin.
  • Node.js on either: 100-300 ms typical.
  • Container images on Lambda: similar to zip if the image is small; can balloon to 1-2 s for fat images.

For the i-filer-api use case (Go, signed URL + DB write), the cold start is well under the user's perception threshold even for the first request after idle. SLO-driven services with sub-100 ms p99 requirements need mitigation; almost nothing else does.

Cold-start mitigations, ranked by cost

  1. Choose a fast runtime. Go, Rust, Node ESM. Free.
  2. Cloud Run min-instances=1. Keeps one warm instance always running. ~$10-20/month for a small service. The right default for any user-facing API.
  3. Lambda SnapStart. Free for Java (priced for Python and .NET), ~90% cold-start reduction.
  4. Lambda Provisioned Concurrency. Reliable, expensive. Real-world case: 5 instances of provisioned concurrency pushed a Lambda's monthly cost from $400 to $2,100. Use only when you genuinely need sub-50 ms p99 on a low-traffic service.

The honest take: cold starts are overblown for properly engineered Go/JS/Python services on Cloud Run with min-instances=1. They're a real problem if you're running a large JVM app on Lambda without SnapStart — and that's a code/runtime choice you can usually fix.
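
On Cloud Run, mitigation #2 is a single deploy flag. A sketch, with service, image, and region names as placeholders:

```sh
gcloud run deploy upload-api \
  --image=europe-docker.pkg.dev/PROJECT/repo/upload-api:TAG \
  --region=europe-west1 \
  --concurrency=80 \
  --min-instances=1
```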

When Lambda wins anyway

  • Deep AWS integration (S3 event triggers, DynamoDB streams, Kinesis, SQS, EventBridge directly invoking the function). You can do this on Cloud Run too with Pub/Sub and EventArc, but if you're already deep in AWS, Lambda is the simpler shape.
  • Function-as-a-Service code style (single file, single handler). Cloud Run wants a container, which is more setup though arguably more portable.
  • A larger free tier in some narrow cases, helped historically by the unbilled INIT phase (though Lambda started billing INIT in August 2025, narrowing this gap).

When Cloud Run wins anyway

  • The workload is HTTP/REST/gRPC and you want first-class HTTP semantics, not "Lambda invocation events translated to HTTP via API Gateway."
  • You want concurrency per instance (i.e., almost any non-CPU-bound workload).
  • You like containers (portable, locally testable, same artifact dev → prod).
  • You want CPU-priced billing instead of memory-priced billing, which matters for memory-light workloads.
  • You're already on GCP.

For an upload-permission API specifically: Cloud Run is the better default because of per-container concurrency and HTTP-native pricing. Lambda is a fine choice if you're 100% AWS.

Scaling, in numbers

What "auto-scales" actually means at the platform level:

Lambda

  • 1,000 new execution environments per 10 seconds, per function, per region (the separate burst limits were retired with the 2023 scaling change)
  • 10,000 concurrent executions soft default per region, raisable to hundreds of thousands
  • No queueing — overflow returns 429s (or buffers if invoked async via SQS/SNS)
  • "Warm" environment lifecycle: ~5-15 minutes idle before reclamation

Cloud Run

  • Scale targets 60% CPU utilization with 80 concurrent requests per instance (defaults)
  • Default max instances: 100; soft limit raisable to 1,000+
  • Built-in request queue with backpressure rather than immediate 429s
  • "Warm" container lifecycle: similar timing, configurable

In both cases, you go from 0 instances to thousands without paging anyone. The capacity planning meeting goes away. The "we need to scale up before the campaign launch" Slack message goes away. The "the autoscaling group didn't react fast enough" post-mortem goes away. Real operational wins.

Operational benefits (the part the cost calculator doesn't show)

The dollars-per-request math sells serverless. The reasons teams stay on serverless are operational:

  • No OS to patch. You don't have to know there was a CVE in libssl this week.
  • No "instance is unhealthy". The platform reaps and replaces unhealthy instances; you ship a health check, the platform polls it.
  • No deploy choreography. No "drain the LB, roll instances, watch the canary, finalize." gcloud run deploy or aws lambda update-function-code and traffic shifts atomically (with traffic splits if you want a canary).
  • Logs are centralized by default. Cloud Run → Cloud Logging, Lambda → CloudWatch. You don't run an agent on each instance.
  • Compliance surface is the platform's. SOC 2 / ISO 27001 / HIPAA: AWS and GCP carry the runtime; you carry the code. You inherit a lot of audit evidence for free.
  • Zero "snowflake" servers. No "wait, why is prod-3 different from prod-1?" — there's no prod-3.
  • Disaster recovery is easier. Redeploy the image in a different region; the platform takes care of the rest. Stateful infra is the hard part of DR, and you've already moved state to S3 and a managed DB.

Add up the engineering hours you don't spend on this, and the salary cost dwarfs the compute bill either way.

Patterns that work well on serverless

The upload-permission API. What this whole article is about. Auth → quota → sign → return. Perfect fit.

Webhook receivers. Stripe / Plaid / Twilio webhooks. Spiky, idempotent, small. Receive, validate signature, enqueue, return 200. Serverless eats this for breakfast.
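
The shape in code, with a deliberately generic HMAC check standing in for each provider's own signature scheme (Stripe, Twilio, etc. define their own header formats), and enqueue as a hypothetical queue client:

```go
// Receive, validate signature, enqueue, return 200. No real work inline.
package main

import (
	"context"
	"crypto/hmac"
	"crypto/sha256"
	"encoding/hex"
	"io"
	"net/http"
)

func webhookHandler(secret []byte, enqueue func(context.Context, []byte) error) http.HandlerFunc {
	return func(w http.ResponseWriter, r *http.Request) {
		body, err := io.ReadAll(http.MaxBytesReader(w, r.Body, 1<<20)) // cap payload at 1 MiB
		if err != nil {
			http.Error(w, "payload too large", http.StatusRequestEntityTooLarge)
			return
		}
		// Generic HMAC-SHA256 check; real providers define their own scheme.
		mac := hmac.New(sha256.New, secret)
		mac.Write(body)
		want := hex.EncodeToString(mac.Sum(nil))
		if !hmac.Equal([]byte(r.Header.Get("X-Signature")), []byte(want)) {
			http.Error(w, "bad signature", http.StatusUnauthorized)
			return
		}
		// Hand off to a queue and acknowledge fast; the worker does the rest.
		if err := enqueue(r.Context(), body); err != nil {
			http.Error(w, "enqueue failed", http.StatusInternalServerError)
			return
		}
		w.WriteHeader(http.StatusOK)
	}
}
```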

Async workers behind a queue. Pub/Sub or SQS → Cloud Run/Lambda. The image resize worker in this codebase is exactly this shape. Each message → one invocation → one resize. The platform handles concurrency and retry.
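
With a push subscription, Pub/Sub delivers each message to Cloud Run as an HTTP POST carrying the payload base64-encoded in a JSON envelope. A sketch of the receiving side (resizeImage stands in for the actual worker logic):

```go
package main

import (
	"context"
	"encoding/base64"
	"encoding/json"
	"net/http"
)

// Pub/Sub push envelope: one POST per message.
type pushEnvelope struct {
	Message struct {
		Data      string `json:"data"` // base64-encoded payload
		MessageID string `json:"messageId"`
	} `json:"message"`
	Subscription string `json:"subscription"`
}

func resizeImage(ctx context.Context, payload []byte) error { return nil } // stub

func resizeHandler(w http.ResponseWriter, r *http.Request) {
	var env pushEnvelope
	if err := json.NewDecoder(r.Body).Decode(&env); err != nil {
		http.Error(w, "bad envelope", http.StatusBadRequest)
		return
	}
	payload, err := base64.StdEncoding.DecodeString(env.Message.Data)
	if err != nil {
		http.Error(w, "bad payload", http.StatusBadRequest)
		return
	}
	if err := resizeImage(r.Context(), payload); err != nil {
		// A non-2xx response makes Pub/Sub redeliver, so keep the work idempotent.
		http.Error(w, "retry", http.StatusInternalServerError)
		return
	}
	w.WriteHeader(http.StatusNoContent) // ack
}
```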

Scheduled jobs. Cron-style work — nightly reports, daily cleanup, KYC re-checks. Cloud Run Jobs or EventBridge → Lambda. You pay for the minutes the job runs and nothing else.

API endpoints with bursty traffic and tight cost discipline. Marketing-funnel APIs, OAuth callbacks, signup flows. The "you must answer in 100 ms or the user goes away" surface.

Patterns that don't fit (or fit poorly)

Being honest: serverless is not the answer to everything.

  • High-RPS sustained workloads. A service doing 5,000 RPS 24/7 with low variance is cheaper on VMs or Kubernetes. The break-even is somewhere in the 1,000-5,000 sustained RPS range depending on memory.
  • Long-running work. Lambda caps at 15 minutes; Cloud Run requests at 60 minutes. Video transcoding of a feature film does not fit. Use Cloud Run Jobs, ECS, or batch services for this.
  • WebSockets / persistent connections. Lambda has WebSocket API support via API Gateway, but it's awkward and bills per minute of connection. Cloud Run supports WebSockets natively but the request-billed model still wants connections to be relatively short. For chat / live data, use long-lived containers (GKE, ECS, Fly Machines).
  • GPU-heavy ML inference. Lambda doesn't do GPUs. Cloud Run has GPU support but capacity is constrained. Most teams use SageMaker, Vertex AI, or Modal/Replicate for this.
  • Strict sub-50 ms p99 latency. Provisioned concurrency mitigates but doesn't eliminate variance. Latency-critical services (HFT, ad bidding) belong on long-lived instances.
  • Stateful in-process work. In-memory caches that take 30 seconds to warm up. Containers that hold open hundreds of upstream connections. Anything where the per-request cost is dominated by warm-up.
  • Compliance regimes that forbid multi-tenancy. Some FedRAMP High / specific government workloads require dedicated tenancy. Lambda and Cloud Run run on shared infrastructure.

A useful rule: if your service is a small, stateless transformation between an HTTP request and a database/storage operation, it belongs on serverless. If it owns long-running state, persistent connections, or sustained CPU, it doesn't.

Concrete migration shape for the upload API

For the i-filer-api specifically, the migration path looks like:

  1. The upload-permission HTTP API → Cloud Run. Same Echo handlers, same Go code, just deployed as a container image. The current podman-on-VM setup actually maps almost 1:1 — Cloud Run runs containers too (deploy sketch after this list).
  2. The image-resize worker → Cloud Run Jobs or pub/sub-triggered Cloud Run. Already async, already idempotent. Pub/Sub → Cloud Run = native fit.
  3. The database connection stays on the same managed Postgres (Cloud SQL or a self-hosted instance in the VPC). Cloud Run connects via the VPC connector.
  4. The webhook endpoints from Stripe / Kratos / etc. → Cloud Run. Already small, already stateless, already burst-prone.
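
Steps 1 and 3 together are roughly one deploy command. Names are placeholders, and --vpc-connector assumes a Serverless VPC Access connector already exists:

```sh
gcloud run deploy i-filer-api \
  --image=europe-docker.pkg.dev/PROJECT/repo/i-filer-api:TAG \
  --region=europe-west1 \
  --vpc-connector=my-connector \
  --set-secrets=DATABASE_URL=db-url:latest
```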

What stays on long-lived infrastructure:

  • Postgres
  • Object storage (already there)
  • Any long-running daemons (the EVM relayer, if it holds persistent RPC connections)
  • Anything that needs sub-50 ms p99 (not the upload API)

The cost change going from one always-on VM running multiple services (the current alpha server pattern) to per-service Cloud Run is usually a wash until you grow — at which point the VM has to be doubled in size and the Cloud Run services scale linearly with traffic. Cost grows with revenue, not in step changes.

The hidden cost: discipline

The dollars are obvious. The discipline isn't:

  • Code that takes 5 seconds to start now costs you on every cold start. Lazy imports become a profitable habit.
  • Reading config from a remote source on each cold start adds up. Cache at process start; the container lives 5-15 min, plenty of room (see the sketch after this list).
  • DB connections must be reused inside an instance and recycled correctly when the instance is reclaimed. This is a footgun — your DB pool must handle the instance lifecycle.
  • Logging discipline matters more. You can't ssh in and tail. Structured logs to a real backend, request IDs propagated everywhere, traces if you can afford the OTLP cost.
  • Per-request cost makes inefficient code expensive. A handler that takes 500 ms instead of 50 ms isn't 10× slower — it's 10× more expensive. Profiling becomes a budget item, not an optional luxury.
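
A sketch of the config and connection habits in Go. loadConfig stands in for whatever remote source you actually use (Secret Manager, SSM, ...), and the shutdown hook relies on Cloud Run sending SIGTERM before reclaiming an instance:

```go
package main

import (
	"context"
	"log"
	"net/http"
	"os"
	"os/signal"
	"syscall"

	"github.com/jackc/pgx/v5/pgxpool"
)

// loadConfig is a stand-in for your remote config source.
// Called once per instance, not once per request.
func loadConfig(ctx context.Context) (string, error) {
	return os.Getenv("DATABASE_URL"), nil
}

func main() {
	ctx := context.Background()

	dsn, err := loadConfig(ctx) // cached for the instance's lifetime
	if err != nil {
		log.Fatal(err)
	}
	pool, err := pgxpool.New(ctx, dsn) // one pool, reused by every request
	if err != nil {
		log.Fatal(err)
	}

	// Cloud Run sends SIGTERM before reclaiming the instance: close the
	// pool so the DB isn't left holding orphaned connections.
	stop := make(chan os.Signal, 1)
	signal.Notify(stop, syscall.SIGTERM)
	go func() {
		<-stop
		pool.Close()
		os.Exit(0)
	}()

	// ... register handlers that use pool, then:
	log.Fatal(http.ListenAndServe(":8080", nil))
}
```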

The teams that are happy on serverless tend to be the teams that already had this discipline. The teams that are unhappy tend to be the teams that ported their slow monolith handler-by-handler and were surprised by the bill.

What to measure

Before / after a migration, instrument:

  • p50 / p99 latency from the client's perspective. Not from inside the function — from the browser/mobile client, including TLS, DNS, queueing.
  • Cold-start rate. What fraction of requests hit a cold container? At what percentile does it stop mattering? (A cheap way to tag this is sketched after this list.)
  • Cost per million requests. From the bill, not the calculator.
  • Concurrent invocations. Are you hitting the platform's concurrency ceiling? Time to raise it before traffic catches up.
  • Database connection count. Going up sharply after a Lambda migration is the canary for "we forgot RDS Proxy."
  • Init-phase duration. Now that AWS bills for it, it's a real budget line. Optimize it like you would any hot loop.
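
Most of these come from the bill and the platform dashboards; the cold-start rate is the one worth tagging yourself. A minimal sketch — note it measures server-side duration, not the client-perceived latency above, and the JSON log format is a choice, not a platform requirement:

```go
package main

import (
	"log"
	"net/http"
	"sync/atomic"
	"time"
)

// served flips to true after the first request, so exactly one request
// per instance gets labeled as a cold start.
var served atomic.Bool

func instrument(next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		start := time.Now()
		cold := !served.Swap(true)
		next.ServeHTTP(w, r)
		// One structured line per request; Cloud Logging / CloudWatch ingest it as-is.
		log.Printf(`{"path":%q,"ms":%d,"cold_start":%t}`,
			r.URL.Path, time.Since(start).Milliseconds(), cold)
	})
}
```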

Wrap

For the upload API specifically — small, stateless, bursty, sub-100ms-of-CPU work — serverless is the right architectural choice in 2026. The economics are clean, the operational story is simpler than a fleet, scaling is automatic, and the workload's shape lines up exactly with what the platforms are good at. The two real risks (cold starts and per-request cost discipline) are tractable with normal engineering hygiene.

The choice between Lambda and Cloud Run for this workload is real, not religious. Cloud Run's per-container concurrency makes it usually 5-10× cheaper for I/O-bound APIs. Lambda is the simpler choice if you're deep in AWS event sources. Both are vastly better than provisioning VMs for a workload that idles 60% of the day.

The next article in the series will pick up where this one leaves off: the worker pipeline that turns a pending upload into a ready one — virus scanning, content-type verification, metadata extraction, variant generation. That work is also serverless-shaped (event-driven, bursty, parallelizable), and the same economic argument applies, but with more interesting failure modes.


Series

  1. Direct-to-S3 uploads — moving file bytes off the API.
  2. Serverless at the front door (this article) — running the permission-granting API on Lambda / Cloud Run.
  3. The async worker pipeline — virus scanning, content verification, variant generation. (next)
  4. Multipart and resumable uploads in the browser.
  5. Variant generation: thumbnails, transcoding, OCR.
  6. Signed download URLs, CDN integration, access control.
  7. Lifecycle, retention, and GDPR-compliant deletion.
  8. Observability and forensics for file pipelines.
