Faster Inference

Model Gateway can be used with Azure OpenAI to provide fast load balancing and failover across multiple regions.

Here is a diagram of a typical load balancing setup with Azure OpenAI and Model Gateway:

[Diagram: Azure OpenAI load balancing with Model Gateway]
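
For illustration, the sketch below shows what such a setup replaces on the client side: a hand-rolled round-robin with failover across two Azure OpenAI regional endpoints. It is a minimal sketch only, not Model Gateway's configuration or API; the endpoint URLs, API keys, and deployment name are placeholders.

```python
# Illustrative client-side sketch only (not Model Gateway's configuration):
# round-robin across Azure OpenAI regional endpoints, failing over on errors.
# Endpoint URLs, API keys, and the deployment name below are placeholders.
import itertools

from openai import APIError, AzureOpenAI

REGIONS = [
    {"endpoint": "https://my-resource-eastus.openai.azure.com", "api_key": "placeholder-key-1"},
    {"endpoint": "https://my-resource-westeurope.openai.azure.com", "api_key": "placeholder-key-2"},
]

clients = [
    AzureOpenAI(
        azure_endpoint=region["endpoint"],
        api_key=region["api_key"],
        api_version="2024-02-01",
    )
    for region in REGIONS
]
rotation = itertools.cycle(range(len(clients)))


def chat(messages, deployment="my-deployment"):
    """Send a chat completion, trying regions in round-robin order with failover."""
    start = next(rotation)
    for offset in range(len(clients)):
        client = clients[(start + offset) % len(clients)]
        try:
            return client.chat.completions.create(model=deployment, messages=messages)
        except APIError:
            continue  # rate limits, timeouts, and server errors all subclass APIError; try the next region
    raise RuntimeError("All regions failed")


response = chat([{"role": "user", "content": "Hello!"}])
print(response.choices[0].message.content)
```

With Model Gateway in front of your deployments, this rotation and failover logic lives in the gateway rather than in every client.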

Routing to the fastest region

Model Gateway also supports dynamic load balancing to the fastest available region of any cloud you use. To enable this feature, please contact us.
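
Conceptually, routing to the fastest region means measuring how quickly each regional endpoint currently responds and sending traffic to the winner. The sketch below illustrates that idea only; it is not Model Gateway's implementation, and the endpoint URLs are placeholders.

```python
# Conceptual sketch of latency-based region selection (not Model Gateway's
# implementation): probe each endpoint and pick the one that answers quickest.
# The endpoint URLs are placeholders.
import time

import httpx

ENDPOINTS = {
    "eastus": "https://my-resource-eastus.openai.azure.com",
    "westeurope": "https://my-resource-westeurope.openai.azure.com",
}


def fastest_region() -> str:
    """Return the region whose endpoint currently responds with the lowest latency."""
    latencies = {}
    for region, url in ENDPOINTS.items():
        start = time.monotonic()
        try:
            httpx.get(url, timeout=2.0)  # lightweight reachability probe
            latencies[region] = time.monotonic() - start
        except httpx.HTTPError:
            latencies[region] = float("inf")  # unreachable regions are never selected
    return min(latencies, key=latencies.get)


print(f"Routing the next request to: {fastest_region()}")
```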