4 changes: 2 additions & 2 deletions serverless/overview.mdx
@@ -3,7 +3,7 @@ title: "Overview"
description: "Pay-as-you-go compute for AI models and compute-intensive workloads."
---

-import { EndpointTooltip, WorkersTooltip, WorkerTooltip, HandlerFunctionTooltip, RequestTooltip, ColdStartTooltip, CachedModelsTooltip, PodTooltip, RunpodHubTooltip, PublicEndpointTooltip, JobTooltip, LoadBalancingEndpointTooltip, QueueBasedEndpointsTooltip } from "/snippets/tooltips.jsx";
+import { EndpointTooltip, WorkersTooltip, WorkerTooltip, HandlerFunctionTooltip, RequestTooltip, PodTooltip, RunpodHubTooltip, PublicEndpointTooltip, JobTooltip, LoadBalancingEndpointTooltip, QueueBasedEndpointsTooltip } from "/snippets/tooltips.jsx";

Runpod Serverless is a cloud computing platform that lets you run AI models and compute-intensive workloads without managing servers. You only pay for the actual compute time you use, with no idle costs when your application isn't processing requests.

@@ -126,7 +126,7 @@ flowchart TD

A "cold start" refers to the time between when an endpoint with no running workers receives a request, and when a worker is fully "warmed up" and ready to handle the request. This generally involves starting the container, loading models into GPU memory, and initializing runtime environments. Larger models take longer to load into memory, increasing cold start time and, by extension, request response time.

-Minimizing <ColdStartTooltip />s is key to creating a responsive and cost-effective endpoint. You can reduce cold starts by using <CachedModelsTooltip />, enabling [FlashBoot](/serverless/endpoints/endpoint-configurations#flashboot), setting [active worker counts](/serverless/endpoints/endpoint-configurations#active-min-workers) above zero.
+Minimizing cold starts is key to creating a responsive and cost-effective endpoint. You can reduce cold starts by using [cached models](/serverless/endpoints/model-caching), enabling [FlashBoot](/serverless/endpoints/endpoint-configurations#flashboot), or setting [active worker counts](/serverless/endpoints/endpoint-configurations#active-min-workers) above zero.
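A complementary way to keep cold-start cost from repeating on every request is to load the model at module scope in the worker, so the expensive load runs once per worker rather than once per job. A minimal sketch, where `load_model` is a hypothetical stand-in for your framework's model loading (the commented `runpod.serverless.start` call is the SDK's documented entrypoint):

```python
# Sketch of a Serverless handler that keeps the model warm across requests.
# `load_model` is a hypothetical placeholder for an expensive load
# (e.g. reading weights into GPU memory).

def load_model():
    # Placeholder: substitute your real model-loading code here.
    return {"name": "demo-model"}

MODEL = load_model()  # Runs once, when the worker container starts (cold start).

def handler(job):
    # Called for every request; the model is already resident in memory.
    prompt = job["input"].get("prompt", "")
    return {"model": MODEL["name"], "output": f"echo: {prompt}"}

# In a real worker you would register the handler with the Runpod SDK:
# import runpod
# runpod.serverless.start({"handler": handler})
```

Because `MODEL` lives at module scope, subsequent requests to a warm worker skip the load entirely; only the first request after scale-from-zero pays the cold-start price.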

### [Load balancing endpoints](/serverless/load-balancing/overview)
