AI Deployments
Large Language Model (LLM)
What's the Role of the Language Model in a Gen AI System?
The language model serves as the pivotal component in a Generative AI system. It is responsible for generating answers to user queries, essentially acting as the "brain" of the operation.
Which LLMs Can I Choose From?
We currently offer a selection of language models to best suit your needs:
- OpenAI:
  - gpt-5: The most advanced and powerful model, for tasks requiring maximum capability.
  - gpt-5-chat: Optimized for conversational flow and dialogue.
  - gpt-5-mini: A smaller, faster version balanced for general-purpose tasks.
  - gpt-5-nano: The most lightweight and efficient version, ideal for simple, high-speed applications.
  - gpt-4.5-preview: Provides early access to the next generation of models with the latest features.
  - gpt-4.1: A highly capable model for complex reasoning and in-depth analysis.
  - gpt-4o: The flagship multimodal model, optimized for a superior balance of speed, intelligence, and cost.
  - gpt-4o-mini: A more streamlined and economical version of GPT-4o.
  - ... and more, as new models are frequently added upon their release.
- Microsoft Azure OpenAI:
  - Benefit from the full power of OpenAI's models, hosted on Azure's secure and scalable infrastructure. The list of available models is the same as for OpenAI, including the gpt-5, gpt-4.1, and gpt-4o families.
- Anthropic:
  - claude-4-sonnet-latest: A newer generation model with enhanced capabilities.
  - claude-3-5-sonnet-20240620: A powerful and very fast model, excellent for a wide range of tasks.
  - claude-3-opus-latest: The most powerful model in the Claude 3 family for highly complex tasks.
  - claude-3-sonnet-latest: An ideal balance of intelligence and speed for enterprise workloads.
  - claude-3-haiku-latest: The fastest and most compact model for near-instant responsiveness.
- Mistral:
  - mistral-large-latest: Mistral's flagship model with top-tier reasoning capabilities.
  - codestral-latest: An open-weight model specialized for code generation tasks.
  - mistral-small-latest: A fast and cost-effective model for high-volume tasks.
  - pixtral-large-latest: A multimodal model for advanced use cases combining images and text.
- Google Vertex AI:
  - gemini-2-flash: The next generation of Google's fast and efficient models.
  - gemini-1.5-pro: A highly capable multimodal model with a very large context window.
  - gemini-1.5-flash: A lighter, faster, and more cost-efficient version of Gemini 1.5 Pro.
  - gemma-3-9b-it: The latest generation of Google's open, instruction-tuned models.
  - codegemma: A specialized model for code generation and software development tasks.
- Deepseek:
  - deepseek-chat: A powerful model designed for general conversational use.
  - deepseek-reasoner: A specialized model optimized for complex logical reasoning and problem-solving.
- LLMProxy / Lite LLM:
  - These providers offer a flexible gateway to a vast range of other models: specify the model's direct path (e.g., openai/gpt-4o-mini).
We are continuously expanding our offerings, so stay tuned for more options.
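As an illustration of the direct-path convention used by the gateway providers, the sketch below assembles a request payload whose model field is prefixed with the provider name. The function name and payload fields are hypothetical; the exact request format depends on your deployment.

```python
# Hypothetical sketch: addressing a model through an LLM gateway
# by its direct path ("<provider>/<model>"). Field names are illustrative.

def build_gateway_request(provider: str, model: str, prompt: str) -> dict:
    """Assemble a request payload whose model field uses the direct path."""
    return {
        "model": f"{provider}/{model}",  # e.g. "openai/gpt-4o-mini"
        "messages": [{"role": "user", "content": prompt}],
    }

request = build_gateway_request("openai", "gpt-4o-mini", "Hello!")
print(request["model"])  # → openai/gpt-4o-mini
```

The same pattern works for any provider the gateway routes to, e.g. `anthropic/claude-3-haiku-latest`.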
How to Configure Your AI Deployment?
Setting up your AI model is a straightforward process. The configuration is divided into two parts: general settings that are common to all models, and specific parameters that change depending on the model you select.
General Settings
These fields are the foundation of your AI deployment:
- Model name: A custom name you give to this specific deployment for easy identification.
- Provider: The company providing the language model (e.g., OpenAI, Google, Anthropic).
- Text embeddings: The model used to convert text into numerical representations for semantic understanding.
- Base model: The specific language model you want to use (e.g., gpt-4o, gpt-5).
- API token: Your secret API key from the provider to authorize the requests.
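Taken together, a deployment built from these fields might look like the following sketch. The key names and placeholder values are assumptions for illustration; the actual form uses the labels above.

```python
# Illustrative sketch of the general settings as a configuration mapping.
# Keys mirror the form fields above; values are placeholders, not real credentials.

deployment = {
    "model_name": "support-assistant",            # custom name for this deployment
    "provider": "OpenAI",                         # company providing the model
    "text_embeddings": "text-embedding-3-small",  # model for numerical text representations
    "base_model": "gpt-4o",                       # the language model to use
    "api_token": "YOUR_API_TOKEN",                # secret key from the provider
}

# Every general setting should be filled in before the deployment is saved.
required = {"model_name", "provider", "text_embeddings", "base_model", "api_token"}
assert required <= deployment.keys()
```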
Model-Specific Parameters
The parameters to fine-tune the model's behavior will appear once you have selected a "Base model". There are two main configuration types:
Standard Configuration (for models like GPT-4o)
This configuration uses "Temperature" to control the creativity of the model. It is typically found on previous-generation models.
- Temperature: Defines the level of creativity of the AI. A lower value (e.g., 0.2) makes the output more focused and deterministic, while a higher value increases its creativity and randomness.
- Max tokens to generate: Sets a hard limit on the number of tokens the model will generate in a single response. Be careful, as this can cut off longer answers.
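To see why a lower temperature makes the output more deterministic, consider how temperature rescales token probabilities before sampling. This is a generic illustration of the principle, not the provider's exact implementation; the logit values are made up.

```python
import math

def softmax_with_temperature(logits, temperature):
    """Convert raw token scores to probabilities, dividing by temperature first.
    A low temperature sharpens the distribution; a high one flattens it."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.5]  # hypothetical scores for three candidate tokens
cold = softmax_with_temperature(logits, 0.2)  # focused, near-deterministic
hot = softmax_with_temperature(logits, 1.5)   # more varied and random

# At temperature 0.2 the top token dominates; at 1.5 the probability spreads out.
print(round(cold[0], 3), round(hot[0], 3))
```

With temperature 0.2 the top token receives almost all of the probability mass, so the model nearly always picks it; with 1.5 the alternatives stay plausible, which is what makes sampled output feel more creative.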
Advanced Configuration (for models like GPT-5)
This modern configuration provides more intuitive controls over the model's thought process and the detail of its response.

Reasoning Effort
This parameter influences how deeply the model thinks before answering. A higher effort can lead to more accurate and well-structured responses but may slightly increase the response time. The available options are:
- Minimal: For the fastest possible answers with surface-level reasoning.
- Low: For basic reasoning, suitable for simple and direct questions.
- Medium: A good balance between the depth of thought and the speed of response. Ideal for most use cases.
- High: For in-depth analysis and complex thinking, perfect for challenging problems that require a thorough breakdown.
Verbosity
This controls the length and level of detail of the answer. It allows you to choose between a very brief response and a complete explanation. The available options are:
- Minimal: Provides the shortest and most direct answer possible.
- Low: Generates a concise response that gets straight to the point.
- Medium: Offers a response with a balanced level of detail, providing context without being overly long.
- High: Produces a complete and detailed explanation, covering all aspects of the query.
Max tokens to generate
This parameter works the same way as in the standard configuration, setting a hard limit on the length of the response.
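Putting the advanced controls together, a request's model-specific block might be assembled like this sketch. The option values mirror the lists above; the function and exact field names are assumptions and may differ from the API your deployment uses.

```python
# Illustrative sketch: validating and assembling the advanced parameters.
# Allowed values mirror the Reasoning Effort and Verbosity options above.

REASONING_EFFORTS = {"minimal", "low", "medium", "high"}
VERBOSITY_LEVELS = {"minimal", "low", "medium", "high"}

def advanced_params(reasoning_effort="medium", verbosity="medium", max_tokens=1024):
    """Build the model-specific parameter block, rejecting unknown options."""
    if reasoning_effort not in REASONING_EFFORTS:
        raise ValueError(f"unknown reasoning effort: {reasoning_effort}")
    if verbosity not in VERBOSITY_LEVELS:
        raise ValueError(f"unknown verbosity: {verbosity}")
    return {
        "reasoning_effort": reasoning_effort,  # how deeply the model thinks first
        "verbosity": verbosity,                # length and detail of the answer
        "max_tokens": max_tokens,              # hard limit on response length
    }

# A deep-thinking but terse configuration for a challenging problem:
params = advanced_params(reasoning_effort="high", verbosity="low")
```

The defaults reflect the "Medium" recommendation above; validating early surfaces typos in option names before a request is ever sent.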