Get up to 15x faster responses from the OpenAI GPT API with Model Gateway
Model Gateway is a robust, open-source intermediary platform that streamlines and manages AI inference requests from your client applications to various AI service providers.
The fastest GPT response
The model you ❤️, but up to 15x faster
We monitor the OpenAI Platform and all Azure OpenAI data centers, and we route each request to the fastest provider and region that is reliable at that moment. Enjoy your favorite OpenAI GPT models, only much faster.
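In concept, active routing continuously measures each candidate endpoint and sends traffic to the current leader. The Python sketch below is purely illustrative: the endpoint URLs and the probe-based scoring are hypothetical stand-ins, not Model Gateway's actual routing logic.

import time
import httpx

# Hypothetical candidate endpoints; a real deployment would list its own regions.
ENDPOINTS = [
    "https://eastus.example.openai.azure.com",
    "https://swedencentral.example.openai.azure.com",
    "https://api.openai.com",
]

def probe_latency(url: str) -> float:
    """Round-trip time of a lightweight probe; unreachable endpoints score infinity."""
    try:
        start = time.monotonic()
        httpx.get(url, timeout=2.0)
        return time.monotonic() - start
    except httpx.HTTPError:
        return float("inf")  # never select an endpoint that does not answer

def fastest_endpoint() -> str:
    """Route to whichever endpoint currently answers fastest."""
    return min(ENDPOINTS, key=probe_latency)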
- Fastest Possible Inference: Get up to 15x more output tokens per second with active routing compared to using static endpoints.
- Load Balancing and Failover: Distributes load across multiple endpoints and regions to ensure high availability and redundancy (see the failover sketch after this list).
- Easy Integration: Keep using your favorite AI libraries; Model Gateway is compatible with all major ones.
- Integration with Multiple AI Providers: Connects seamlessly with Azure OpenAI, OpenAI, Ollama, and more for flexible, scalable integration.
- Administrative Interface: Manage configurations and monitor performance through a user-friendly UI and a GraphQL API.
- Secure and Configurable: Handles API keys and tokens securely, with advanced configuration options for customized needs.
- Secure By Default: Security is our top priority; we use the latest security standards to keep communication safe.
- Privacy Guaranteed: All your data belongs to you. Host Model Gateway on your own infrastructure.
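To make the load-balancing and failover idea concrete, here is a minimal Python sketch of sequential failover across two backends. The backend URLs and model name are hypothetical, and this illustrates the pattern only; Model Gateway performs this for you behind a single endpoint.

import openai

# Hypothetical backend URLs; the gateway hides these behind one endpoint.
BACKENDS = ["http://backend-a:8080/v1", "http://backend-b:8080/v1"]

def chat_with_failover(messages):
    """Try each backend in order, falling back to the next on any API error."""
    last_error = None
    for base_url in BACKENDS:
        client = openai.OpenAI(api_key="sk-...", base_url=base_url)
        try:
            return client.chat.completions.create(model="gpt-4o", messages=messages)
        except openai.OpenAIError as err:
            last_error = err  # remember the failure and try the next backend
    raise last_error  # every backend failed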
Super-simple integration
No additional dependencies, no complex setup. Just a simple configuration.
Python:
from openai import AzureOpenAI

MODELGW_API_KEY = "sk-..."

client = AzureOpenAI(
    api_key=MODELGW_API_KEY,
    api_version="2023-05-15",
    azure_endpoint="http://modelgw:4001",
)

completion = client.chat.completions.create(
    messages=[{"role": "user", "content": "Hello there!"}],
    model="auto",  # set your model to "auto"
)
print(completion)
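The same client can also stream tokens incrementally via the OpenAI SDK's standard stream=True flag. This assumes the gateway passes OpenAI's streamed responses through unchanged, so check the Model Gateway docs before relying on it.

# Streaming with the same client; assumes the gateway forwards
# OpenAI's streamed (server-sent events) responses unchanged.
stream = client.chat.completions.create(
    messages=[{"role": "user", "content": "Hello there!"}],
    model="auto",
    stream=True,
)
for chunk in stream:
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="")
print()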
Node.js:
import OpenAI from 'openai';

const modelgwApiKey = 'sk-...';

const openai = new OpenAI({
  apiKey: modelgwApiKey,
  baseURL: 'http://modelgw:4001/openai/deployments/auto',
  defaultQuery: { 'api-version': '2023-05-15' },
  defaultHeaders: { 'api-key': modelgwApiKey },
});

async function main() {
  const result = await openai.chat.completions.create({
    model: 'auto', // set your model to "auto"
    messages: [{ role: 'user', content: 'Hello there!' }],
  });
  console.log(JSON.stringify(result, null, 4));
}

main().catch((err) => {
  console.error(err);
  process.exit(1);
});
Java:
import com.azure.ai.openai.OpenAIClient;
import com.azure.ai.openai.OpenAIClientBuilder;
import com.azure.ai.openai.models.*;
import com.azure.core.credential.KeyCredential;
import java.util.ArrayList;
import java.util.List;

String modelgwApiKey = "sk-...";

OpenAIClient client = new OpenAIClientBuilder()
    .credential(new KeyCredential(modelgwApiKey))
    .endpoint("http://modelgw:4001")
    .buildClient();

List<ChatRequestMessage> chatMessages = new ArrayList<>();
chatMessages.add(new ChatRequestUserMessage("Hello there!"));

// "auto" lets Model Gateway pick the fastest deployment
ChatCompletions chatCompletions = client.getChatCompletions("auto",
    new ChatCompletionsOptions(chatMessages));
for (ChatChoice choice : chatCompletions.getChoices()) {
    ChatResponseMessage message = choice.getMessage();
    System.out.println(message.getContent());
}
PHP:
<?php
require 'vendor/autoload.php';

$modelgwApiKey = 'sk-...';

$client = OpenAI::factory()
    ->withBaseUri('http://modelgw:4001/openai/deployments/auto')
    ->withHttpHeader('api-key', $modelgwApiKey)
    ->withQueryParam('api-version', '2023-05-15')
    ->make();

$response = $client->chat()->create([
    'model' => 'auto', // set your model to "auto"
    'messages' => [
        ['role' => 'user', 'content' => 'Hello there!'],
    ],
]);

foreach ($response->choices as $result) {
    echo $result->message->content;
}
curl "http://modelgw:4001/openai/deployments/auto/chat/completions?api-version=2023-05-15" \
-H "Content-Type: application/json" \
-H "api-key: $MODELGW_API_KEY" \
-d "{ \"messages\": [{ \"role\": \"user\", \"content\": \"Hello there\!\" } ], \"model\": \"auto\" }"
Looking for faster OpenAI GPT inference?
Get in touch with us!
Pricing
Choose the plan that works for you.
Open-source
The essentials for centralized and reliable AI inference. Say goodbye to API errors and timeouts.
Free
View on GitHub
- Self-hosted
- Automatic failover
- Unlimited gateways
- Unlimited requests/month
Open-source Plus
⚡️ Fastest inference
Routing to the fastest available regions of cloud AI providers such as Azure OpenAI Service.
Custom
Contact us
- Self-hosted or managed
- Automatic failover
- Routing to the fastest region
- Up to 15x faster inference
- Support
Frequently asked questions
Get in touch
We are here to help and answer any questions you might have. We look forward to hearing from you.
Headquarters
- hello@modelgw.com
Support
- support@modelgw.com