Today, we’re announcing that U.S. and EU multi-region endpoints for Claude on Vertex AI are available in public preview. By pooling capacity across multiple regions, these endpoints dynamically route requests to help improve reliability, while ensuring your data processing remains within your preferred geography to meet your compliance needs.
This approach lets you deploy with the resilience of a more distributed architecture without sacrificing your data residency requirements. Please visit our Vertex AI documentation for detailed instructions and to start building today.
What are multi-region endpoints and when should you use them?
When using Claude on Vertex AI, you previously had two choices:
- Regional endpoints (e.g., us-central1): Keep data and processing within a single specific location. Best for low latency.
- Global endpoints: Route traffic anywhere in the world where capacity is available. Best for maximum capacity and lowest cost.
Multi-region endpoints serve as the "middle ground". They allow Vertex AI to automatically shift traffic between different regions within a single geography (for example, moving traffic between us-central1 and us-east4).
When is a multi-region endpoint the right choice?
- Data residency compliance: Your organization requires that data stays within the U.S. or EU, but you want to avoid being tied to a single region.
- Enhanced reliability: You want to protect your application against a single-region outage or capacity constraint without sending traffic globally.
- Simplified traffic management: Instead of managing failover logic between regional endpoints — like us-central1 and us-west1 — yourself, the multi-region endpoint handles it automatically.
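To make the last point concrete, here is a minimal sketch of the client-side failover loop that a multi-region endpoint makes unnecessary. `call_model` is a hypothetical stand-in for an actual Vertex AI request; only the routing logic is the point:

```python
# Manual failover: the logic you would otherwise maintain yourself.
REGIONS = ["us-central1", "us-east4"]  # ordered fallback list of U.S. regions


def with_manual_failover(call_model, prompt):
    """Try each U.S. region in order until one succeeds."""
    last_error = None
    for region in REGIONS:
        try:
            return call_model(region, prompt)
        except RuntimeError as err:  # e.g., a capacity or outage error
            last_error = err
    raise last_error


def with_multi_region(call_model, prompt):
    """With a multi-region endpoint, Vertex AI routes within the geography
    for you — the client sends every request to one location."""
    return call_model("us", prompt)
```

The first function is the kind of code you can delete once the multi-region endpoint owns the routing decision.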
Comparing your endpoint options
| Feature | Global endpoint | Multi-region endpoint | Regional endpoint |
| --- | --- | --- | --- |
| Availability | Maximum (global failover) | High (multi-region failover) | Dependent on single-region capacity (multi-zone failover) |
| Data residency | No data residency guarantees | Restricted to a specific geography (e.g., U.S., EU) | Restricted to a specific region (e.g., Iowa) |
| Latency | Variable (based on global routing) | Optimized within the geography | Lowest (if the user is near the region) |
| Quota | Independent global quota | Shared geography-based quota | Region-specific quota |
| Models supported* | | Coming soon | |

*Select regions and consumption modes
Full support for prompt caching
Multi-region endpoints fully support prompt caching. When a request is sent to a multi-region endpoint, Vertex AI attempts to route it to the specific region where your prompt is already cached to ensure the lowest possible latency and cost. If that specific region is under heavy load, the system intelligently balances the request to the next available region within that geography to maintain uptime.
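Nothing about prompt caching changes in the request itself — you mark the reusable prefix as cacheable exactly as you would against a regional endpoint. A minimal sketch of such a request body, using the standard Anthropic `cache_control` field (the version string and field names follow the published Anthropic-on-Vertex request format; the exact values here are illustrative):

```python
def build_cached_request(system_prompt: str, user_message: str) -> dict:
    """Build a Claude-on-Vertex request body with a cacheable system prompt."""
    return {
        "anthropic_version": "vertex-2023-10-31",
        "max_tokens": 1024,
        "system": [
            {
                "type": "text",
                "text": system_prompt,
                # Mark the long, reused prefix as cacheable. Vertex AI then
                # tries to route follow-up requests to the region that
                # already holds this cache entry.
                "cache_control": {"type": "ephemeral"},
            }
        ],
        "messages": [{"role": "user", "content": user_message}],
    }
```

The body is identical whether you send it to `us-central1` or to the `us` multi-region endpoint; only the location in the URL differs.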
Best practices
To maximize performance, we recommend using multi-region endpoints as your default for production workloads that require residency within the U.S. or EU.
- Monitor quotas: Multi-region endpoints use their own quota pools, separate from single-region quotas.
- Consistency: For the best caching performance, stick to one location (e.g., the US multi-region endpoint) per workload.
- Pricing: Multi-region endpoint requests follow the standard Claude on Vertex AI pay-as-you-go pricing model. Prices may vary across locations and are generally lower on global endpoints.
How to get started
Integrating multi-region endpoints for Claude models requires a simple change to your configuration.
Step 1: Ensure you have enabled a supported Claude model in your Vertex AI project.
Step 2: Update your API base URL or location variable. Instead of a specific region like us-central1, use the multi-region identifier us or eu.
Example cURL request. This sketch assumes the multi-region endpoint follows the same URL pattern as the global endpoint (the bare `aiplatform.googleapis.com` hostname) with `us` as the location; `PROJECT_ID` and `MODEL_ID` are placeholders for your own project and an enabled Claude model:

```bash
curl -X POST \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  -H "Content-Type: application/json" \
  "https://aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/us/publishers/anthropic/models/MODEL_ID:streamRawPredict" \
  -d '{
    "anthropic_version": "vertex-2023-10-31",
    "max_tokens": 256,
    "messages": [{"role": "user", "content": "Hello from the US multi-region endpoint"}]
  }'
```