4 changes: 2 additions & 2 deletions serverless/overview.mdx
@@ -3,7 +3,7 @@ title: "Overview"
description: "Pay-as-you-go compute for AI models and compute-intensive workloads."
---

-import { EndpointTooltip, WorkersTooltip, WorkerTooltip, HandlerFunctionTooltip, RequestTooltip, ColdStartTooltip, CachedModelsTooltip, PodTooltip, RunpodHubTooltip, PublicEndpointTooltip, JobTooltip, LoadBalancingEndpointTooltip, QueueBasedEndpointsTooltip } from "/snippets/tooltips.jsx";
+import { EndpointTooltip, WorkersTooltip, WorkerTooltip, HandlerFunctionTooltip, RequestTooltip, PodTooltip, RunpodHubTooltip, PublicEndpointTooltip, JobTooltip, LoadBalancingEndpointTooltip, QueueBasedEndpointsTooltip } from "/snippets/tooltips.jsx";

Runpod Serverless is a cloud computing platform that lets you run AI models and compute-intensive workloads without managing servers. You only pay for the actual compute time you use, with no idle costs when your application isn't processing requests.

@@ -126,7 +126,7 @@ flowchart TD

A "cold start" refers to the time between when an endpoint with no running workers receives a request, and when a worker is fully "warmed up" and ready to handle the request. This generally involves starting the container, loading models into GPU memory, and initializing runtime environments. Larger models take longer to load into memory, increasing cold start time and, by extension, request response time.

-Minimizing <ColdStartTooltip />s is key to creating a responsive and cost-effective endpoint. You can reduce cold starts by using <CachedModelsTooltip />, enabling [FlashBoot](/serverless/endpoints/endpoint-configurations#flashboot), setting [active worker counts](/serverless/endpoints/endpoint-configurations#active-min-workers) above zero.
+Minimizing cold starts is key to creating a responsive and cost-effective endpoint. You can reduce cold starts by using [cached models](/serverless/endpoints/model-caching), enabling [FlashBoot](/serverless/endpoints/endpoint-configurations#flashboot), or setting [active worker counts](/serverless/endpoints/endpoint-configurations#active-min-workers) above zero.
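A complementary way to keep cold-start cost from repeating on every request is to load the model at module scope in the worker, so the expensive load runs once per worker rather than once per job. A minimal sketch, where `load_model` is a hypothetical stand-in for your framework's model loading (the commented `runpod.serverless.start` call is the SDK's documented entrypoint):

```python
# Sketch of a Serverless handler that keeps the model warm across requests.
# `load_model` is a hypothetical placeholder for an expensive load
# (e.g. reading weights into GPU memory).

def load_model():
    # Placeholder: substitute your real model-loading code here.
    return {"name": "demo-model"}

MODEL = load_model()  # Runs once, when the worker container starts (cold start).

def handler(job):
    # Called for every request; the model is already resident in memory.
    prompt = job["input"].get("prompt", "")
    return {"model": MODEL["name"], "output": f"echo: {prompt}"}

# In a real worker you would register the handler with the Runpod SDK:
# import runpod
# runpod.serverless.start({"handler": handler})
```

Because `MODEL` lives at module scope, subsequent requests to a warm worker skip the load entirely; only the first request after scale-from-zero pays the cold-start price.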

### [Load balancing endpoints](/serverless/load-balancing/overview)
