Today, we’re announcing that U.S. and EU multi-region endpoints for Claude on Vertex AI are available in public preview. By pooling capacity across multiple regions, these endpoints dynamically route requests to help improve reliability, while ensuring your data processing remains within your preferred geography to meet your compliance needs.
This approach lets you deploy with the resilience of a more distributed architecture without sacrificing your data residency requirements. Please visit our Vertex AI documentation for detailed instructions and to start building today.
What are multi-region endpoints and when should you use them?
When using Claude on Vertex AI, you previously had two choices:
- Regional endpoints (e.g., us-central1): Keep data and processing within a single specific location. Best for low latency.
- Global endpoints: Route traffic anywhere in the world where capacity is available. Best for maximum capacity and lowest cost.
Multi-region endpoints serve as the "middle ground". They allow Vertex AI to automatically shift traffic between different regions within a single geography (for example, moving traffic between us-central1 and us-east4).
When is a multi-region endpoint the right choice?
- Data residency compliance: Your organization requires that data stays within the U.S. or EU, but you want to avoid being tied to a single region.
- Enhanced reliability: You want to protect your application against a single-region outage or capacity constraint without sending traffic globally.
- Simplified traffic management: Instead of managing failover logic between regional endpoints — like us-central1 and us-west1 — yourself, the multi-region endpoint handles it automatically.
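To make the last point concrete, here is a minimal sketch of the client-side failover loop that a multi-region endpoint makes unnecessary. `call_model` is a hypothetical stand-in for an actual Vertex AI request; only the routing logic is the point:

```python
# Manual failover: the logic you would otherwise maintain yourself.
REGIONS = ["us-central1", "us-east4"]  # ordered fallback list of U.S. regions


def with_manual_failover(call_model, prompt):
    """Try each U.S. region in order until one succeeds."""
    last_error = None
    for region in REGIONS:
        try:
            return call_model(region, prompt)
        except RuntimeError as err:  # e.g., a capacity or outage error
            last_error = err
    raise last_error


def with_multi_region(call_model, prompt):
    """With a multi-region endpoint, Vertex AI routes within the geography
    for you — the client sends every request to one location."""
    return call_model("us", prompt)
```

The first function is the kind of code you can delete once the multi-region endpoint owns the routing decision.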
Comparing your endpoint options
| Feature | Global endpoint | Multi-region endpoint | Regional endpoint |
| --- | --- | --- | --- |
| Availability | Maximum (global failover) | High (multi-region failover) | Dependent on single-region capacity (multi-zone failover) |
| Data residency | No data residency guarantees | Restricted to a specific geography (e.g., U.S., EU) | Restricted to a specific region (e.g., Iowa) |
| Latency | Variable (based on global routing) | Optimized within the geography | Lowest (if the user is near the region) |
| Quota | Independent global quota | Shared geography-based quota | Region-specific quota |
| Models supported* | | Coming soon | |

*Select regions and consumption modes
Full support for prompt caching
Multi-region endpoints fully support prompt caching. When a request is sent to a multi-region endpoint, Vertex AI attempts to route it to the specific region where your prompt is already cached to ensure the lowest possible latency and cost. If that specific region is under heavy load, the system intelligently balances the request to the next available region within that geography to maintain uptime.
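Nothing about prompt caching changes in the request itself — you mark the reusable prefix as cacheable exactly as you would against a regional endpoint. A minimal sketch of such a request body, using the standard Anthropic `cache_control` field (the version string and field names follow the published Anthropic-on-Vertex request format; the exact values here are illustrative):

```python
def build_cached_request(system_prompt: str, user_message: str) -> dict:
    """Build a Claude-on-Vertex request body with a cacheable system prompt."""
    return {
        "anthropic_version": "vertex-2023-10-31",
        "max_tokens": 1024,
        "system": [
            {
                "type": "text",
                "text": system_prompt,
                # Mark the long, reused prefix as cacheable. Vertex AI then
                # tries to route follow-up requests to the region that
                # already holds this cache entry.
                "cache_control": {"type": "ephemeral"},
            }
        ],
        "messages": [{"role": "user", "content": user_message}],
    }
```

The body is identical whether you send it to `us-central1` or to the `us` multi-region endpoint; only the location in the URL differs.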
Best practices
To maximize performance, we recommend using multi-region endpoints as your default for production workloads that require residency within the U.S. or EU.
- Monitor quotas: Multi-region endpoints use their own quota pools, separate from single-region quotas.
- Consistency: For the best caching performance, stick to one location (e.g., the US multi-region endpoint) per workload.
- Pricing: Multi-region endpoint requests follow the standard Claude on Vertex AI pay-as-you-go pricing model. Prices may vary across locations and are generally lower on global endpoints.
How to get started
Integrating multi-region endpoints for Claude models requires a simple change to your configuration.
Step 1: Ensure you have enabled a supported Claude model in your Vertex AI project.
Step 2: Update your API base URL or location variable. Instead of a specific region like us-central1, use the multi-region identifier us or eu.
Example cURL request. This sketch assumes the multi-region endpoint follows the same URL pattern as the global endpoint (the bare `aiplatform.googleapis.com` hostname) with `us` as the location; `PROJECT_ID` and `MODEL_ID` are placeholders for your own project and an enabled Claude model:

```bash
curl -X POST \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  -H "Content-Type: application/json" \
  "https://aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/us/publishers/anthropic/models/MODEL_ID:streamRawPredict" \
  -d '{
    "anthropic_version": "vertex-2023-10-31",
    "max_tokens": 256,
    "messages": [{"role": "user", "content": "Hello from the US multi-region endpoint"}]
  }'
```