Get up to 15x faster responses from the OpenAI GPT API with Model Gateway
Model Gateway is a robust, open-source intermediary platform that streamlines and manages AI inference requests from your client applications to various AI service providers.
The fastest GPT response
The model you ❤️, but up to 15x faster
We monitor the OpenAI Platform and all Azure OpenAI data centers, and we route each request to the fastest provider and region that is reliable at that moment. Enjoy your favorite OpenAI GPT models, only much faster.
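In concept, active routing continuously measures each candidate endpoint and sends traffic to the current leader. The Python sketch below is purely illustrative: the endpoint URLs and the probe-based scoring are hypothetical stand-ins, not Model Gateway's actual routing logic.

import time
import httpx

# Hypothetical candidate endpoints; a real deployment would list its own regions.
ENDPOINTS = [
    "https://eastus.example.openai.azure.com",
    "https://swedencentral.example.openai.azure.com",
    "https://api.openai.com",
]

def probe_latency(url: str) -> float:
    """Round-trip time of a lightweight probe; unreachable endpoints score infinity."""
    try:
        start = time.monotonic()
        httpx.get(url, timeout=2.0)
        return time.monotonic() - start
    except httpx.HTTPError:
        return float("inf")  # never select an endpoint that does not answer

def fastest_endpoint() -> str:
    """Route to whichever endpoint currently answers fastest."""
    return min(ENDPOINTS, key=probe_latency)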
- Fastest Possible Inference: Get up to 15x more output tokens per second with active routing compared to using static endpoints.
- Load Balancing and Failover: Distributes load across multiple endpoints and regions to ensure high availability and redundancy (see the failover sketch after this list).
- Easy Integration: Keep using your favorite AI libraries; Model Gateway is compatible with all major ones.
- Integration with Multiple AI Providers: Connects seamlessly with Azure OpenAI, OpenAI, Ollama, and more for flexible, scalable integration.
- Administrative Interface: Manage configurations and monitor performance through a user-friendly UI and a GraphQL API.
- Secure and Configurable: Handles API keys and tokens securely, with advanced configuration options for customized needs.
- Secure By Default: Security is our top priority; we use the latest security standards to keep communication safe.
- Privacy Guaranteed: All your data belongs to you. Host Model Gateway on your own infrastructure.
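To make the load-balancing and failover idea concrete, here is a minimal Python sketch of sequential failover across two backends. The backend URLs and model name are hypothetical, and this illustrates the pattern only; Model Gateway performs this for you behind a single endpoint.

import openai

# Hypothetical backend URLs; the gateway hides these behind one endpoint.
BACKENDS = ["http://backend-a:8080/v1", "http://backend-b:8080/v1"]

def chat_with_failover(messages):
    """Try each backend in order, falling back to the next on any API error."""
    last_error = None
    for base_url in BACKENDS:
        client = openai.OpenAI(api_key="sk-...", base_url=base_url)
        try:
            return client.chat.completions.create(model="gpt-4o", messages=messages)
        except openai.OpenAIError as err:
            last_error = err  # remember the failure and try the next backend
    raise last_error  # every backend failed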
Super-simple integration
No additional dependencies, no complex setup. Just a simple configuration.
Python:
from openai import AzureOpenAI

MODELGW_API_KEY = "sk-..."

client = AzureOpenAI(
    api_key=MODELGW_API_KEY,
    api_version="2023-05-15",
    azure_endpoint="http://modelgw:4001",
)

completion = client.chat.completions.create(
    messages=[{"role": "user", "content": "Hello there!"}],
    model="auto",  # set your model to "auto"
)
print(completion)
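The same client can also stream tokens incrementally via the OpenAI SDK's standard stream=True flag. This assumes the gateway passes OpenAI's streamed responses through unchanged, so check the Model Gateway docs before relying on it.

# Streaming with the same client; assumes the gateway forwards
# OpenAI's streamed (server-sent events) responses unchanged.
stream = client.chat.completions.create(
    messages=[{"role": "user", "content": "Hello there!"}],
    model="auto",
    stream=True,
)
for chunk in stream:
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="")
print()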
Node.js:
import OpenAI from 'openai';

const modelgwApiKey = 'sk-...';

const openai = new OpenAI({
  apiKey: modelgwApiKey,
  baseURL: 'http://modelgw:4001/openai/deployments/auto',
  defaultQuery: { 'api-version': '2023-05-15' },
  defaultHeaders: { 'api-key': modelgwApiKey },
});

async function main() {
  const result = await openai.chat.completions.create({
    model: 'auto', // set your model to "auto"
    messages: [{ role: 'user', content: 'Hello there!' }],
  });
  console.log(JSON.stringify(result, null, 4));
}

main().catch((err) => {
  console.error(err);
  process.exit(1);
});
Java:
import com.azure.ai.openai.OpenAIClient;
import com.azure.ai.openai.OpenAIClientBuilder;
import com.azure.ai.openai.models.*;
import com.azure.core.credential.KeyCredential;
import java.util.ArrayList;
import java.util.List;

String modelgwApiKey = "sk-...";

OpenAIClient client = new OpenAIClientBuilder()
    .credential(new KeyCredential(modelgwApiKey))
    .endpoint("http://modelgw:4001")
    .buildClient();

List<ChatRequestMessage> chatMessages = new ArrayList<>();
chatMessages.add(new ChatRequestUserMessage("Hello there!"));

// "auto" lets Model Gateway pick the fastest deployment
ChatCompletions chatCompletions = client.getChatCompletions("auto",
    new ChatCompletionsOptions(chatMessages));
for (ChatChoice choice : chatCompletions.getChoices()) {
    ChatResponseMessage message = choice.getMessage();
    System.out.println(message.getContent());
}
PHP:
<?php
require 'vendor/autoload.php';

$modelgwApiKey = 'sk-...';

$client = OpenAI::factory()
    ->withBaseUri('http://modelgw:4001/openai/deployments/auto')
    ->withHttpHeader('api-key', $modelgwApiKey)
    ->withQueryParam('api-version', '2023-05-15')
    ->make();

$response = $client->chat()->create([
    'model' => 'auto', // set your model to "auto"
    'messages' => [
        ['role' => 'user', 'content' => 'Hello there!'],
    ],
]);

foreach ($response->choices as $result) {
    echo $result->message->content;
}
curl "http://modelgw:4001/openai/deployments/auto/chat/completions?api-version=2023-05-15" \
-H "Content-Type: application/json" \
-H "api-key: $MODELGW_API_KEY" \
-d "{ \"messages\": [{ \"role\": \"user\", \"content\": \"Hello there\!\" } ], \"model\": \"auto\" }"
Looking for faster OpenAI GPT inference?
Get in touch with us!
Pricing
Choose the plan that works for you.
Open-source
The essentials for centralized and reliable AI inference. Say goodbye to API errors and timeouts.
Free
View on GitHub
- Self-hosted
- Automatic failover
- Unlimited gateways
- Unlimited requests/month
Open-source Plus
⚡️ Fastest inference
Routing to the fastest available regions of cloud AI providers such as Azure OpenAI Service.
Custom
Contact us
- Self-hosted or managed
- Automatic failover
- Routing to the fastest region
- Up to 15x faster inference
- Support
Frequently asked questions
Get in touch
We are here to help and answer any questions you might have. We look forward to hearing from you.
Headquarters
- hello@modelgw.com
Support
- support@modelgw.com