Faster Inference

Model Gateway can be used with Azure OpenAI to provide fast load balancing and failover across multiple regions.

Here is a diagram of a typical load balancing setup with Azure OpenAI and Model Gateway:

[Diagram: Azure OpenAI load balancing with Model Gateway]
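
For illustration, the sketch below shows what such a setup replaces on the client side: a hand-rolled round-robin with failover across two Azure OpenAI regional endpoints. It is a minimal sketch only, not Model Gateway's configuration or API; the endpoint URLs, API keys, and deployment name are placeholders.

```python
# Illustrative client-side sketch only (not Model Gateway's configuration):
# round-robin across Azure OpenAI regional endpoints, failing over on errors.
# Endpoint URLs, API keys, and the deployment name below are placeholders.
import itertools

from openai import APIError, AzureOpenAI

REGIONS = [
    {"endpoint": "https://my-resource-eastus.openai.azure.com", "api_key": "placeholder-key-1"},
    {"endpoint": "https://my-resource-westeurope.openai.azure.com", "api_key": "placeholder-key-2"},
]

clients = [
    AzureOpenAI(
        azure_endpoint=region["endpoint"],
        api_key=region["api_key"],
        api_version="2024-02-01",
    )
    for region in REGIONS
]
rotation = itertools.cycle(range(len(clients)))


def chat(messages, deployment="my-deployment"):
    """Send a chat completion, trying regions in round-robin order with failover."""
    start = next(rotation)
    for offset in range(len(clients)):
        client = clients[(start + offset) % len(clients)]
        try:
            return client.chat.completions.create(model=deployment, messages=messages)
        except APIError:
            continue  # rate limits, timeouts, and server errors all subclass APIError; try the next region
    raise RuntimeError("All regions failed")


response = chat([{"role": "user", "content": "Hello!"}])
print(response.choices[0].message.content)
```

With Model Gateway in front of your deployments, this rotation and failover logic lives in the gateway rather than in every client.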

Routing to the fastest region

Model Gateway also supports dynamic load balancing to the fastest available region of any cloud you use. To enable this feature, please contact us.
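
Conceptually, routing to the fastest region means measuring how quickly each regional endpoint currently responds and sending traffic to the winner. The sketch below illustrates that idea only; it is not Model Gateway's implementation, and the endpoint URLs are placeholders.

```python
# Conceptual sketch of latency-based region selection (not Model Gateway's
# implementation): probe each endpoint and pick the one that answers quickest.
# The endpoint URLs are placeholders.
import time

import httpx

ENDPOINTS = {
    "eastus": "https://my-resource-eastus.openai.azure.com",
    "westeurope": "https://my-resource-westeurope.openai.azure.com",
}


def fastest_region() -> str:
    """Return the region whose endpoint currently responds with the lowest latency."""
    latencies = {}
    for region, url in ENDPOINTS.items():
        start = time.monotonic()
        try:
            httpx.get(url, timeout=2.0)  # lightweight reachability probe
            latencies[region] = time.monotonic() - start
        except httpx.HTTPError:
            latencies[region] = float("inf")  # unreachable regions are never selected
    return min(latencies, key=latencies.get)


print(f"Routing the next request to: {fastest_region()}")
```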