AI Deployments

Large Language Model (LLM)

What Is the Role of the Language Model in a Generative AI System?

The language model serves as the pivotal component in a Generative AI system. It is responsible for generating answers to user queries, essentially acting as the "brain" of the operation.

Which LLMs Can I Choose From?

We currently offer a selection of language models to best suit your needs:

  • OpenAI:

    • gpt-5: The most advanced and powerful model, for tasks requiring maximum capability.
    • gpt-5-chat: Optimized for conversational flow and dialogue.
    • gpt-5-mini: A smaller, faster version balanced for general-purpose tasks.
    • gpt-5-nano: The most lightweight and efficient version, ideal for simple, high-speed applications.
    • gpt-4.5-preview: Provides early access to the next generation of models with the latest features.
    • gpt-4.1: A highly capable model for complex reasoning and in-depth analysis.
    • gpt-4o: The flagship multimodal model, optimized for a superior balance of speed, intelligence, and cost.
    • gpt-4o-mini: A more streamlined and economical version of GPT-4o.
    • ... and more, as new models are frequently added upon their release.
  • Microsoft Azure OpenAI:

    • Benefit from the full power of OpenAI's models, hosted on Azure's secure and scalable infrastructure. The list of available models is the same as for OpenAI, including the gpt-5, gpt-4.1, and gpt-4o families.
  • Anthropic:

    • claude-4-sonnet-latest: A newer generation model with enhanced capabilities.
    • claude-3-5-sonnet-20240620: A powerful and very fast model, excellent for a wide range of tasks.
    • claude-3-opus-latest: The most powerful model in the Claude 3 family for highly complex tasks.
    • claude-3-sonnet-latest: An ideal balance of intelligence and speed for enterprise workloads.
    • claude-3-haiku-latest: The fastest and most compact model for near-instant responsiveness.
  • Mistral:

    • mistral-large-latest: Mistral's flagship model with top-tier reasoning capabilities.
    • codestral-latest: An open-weight model specialized for code generation tasks.
    • mistral-small-latest: A fast and cost-effective model for high-volume tasks.
    • pixtral-large-latest: A multimodal model capable of reasoning over both text and images, suited to advanced use cases.
  • Google Vertex AI:

    • gemini-2-flash: The next generation of Google's fast and efficient models.
    • gemini-1.5-pro: A highly capable multimodal model with a very large context window.
    • gemini-1.5-flash: A lighter, faster, and more cost-efficient version of Gemini 1.5 Pro.
    • gemma-3-9b-it: The latest generation of Google's open, instruction-tuned models.
    • codegemma: A specialized model for code generation and software development tasks.
  • Deepseek:

    • deepseek-chat: A powerful model designed for general conversational use.
    • deepseek-reasoner: A specialized model optimized for complex logical reasoning and problem-solving.
  • LLMProxy / Lite LLM:

    • These providers offer a flexible gateway to use a vast range of other models by specifying their direct path (e.g., openai/gpt-4o-mini).
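
For illustration, a request routed through such a gateway might look like the sketch below. It assumes the open-source litellm Python package and an OPENAI_API_KEY set in the environment; the "provider/model" path selects the backend.

    # Illustrative sketch only: assumes the open-source `litellm` package and an
    # OPENAI_API_KEY environment variable; adapt the model path to your gateway.
    import litellm

    response = litellm.completion(
        model="openai/gpt-4o-mini",  # direct path: provider prefix + model name
        messages=[{"role": "user", "content": "Summarize our refund policy in one sentence."}],
    )
    print(response.choices[0].message.content)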

We are continuously expanding our offerings, so stay tuned for more options.

How to Configure Your AI Deployment?

Setting up your AI model is a straightforward process. The configuration is divided into two parts: general settings common to all models, and specific parameters that change depending on the model you select.

General Settings

These fields are the foundation of your AI deployment (an illustrative sketch follows the list):

  • Model name: A custom name you give to this specific deployment for easy identification.
  • Provider: The company providing the language model (e.g., OpenAI, Google, Anthropic).
  • Text embeddings: The model used to convert text into numerical representations for semantic understanding.
  • Base model: The specific language model you want to use (e.g., gpt-4o, gpt-5).
  • API token: Your secret API key from the provider to authorize the requests.
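
As a purely illustrative sketch, the general settings above could be gathered in a structure like the one below. The field names are hypothetical and simply mirror the form fields; they are not an API of this platform.

    import os

    # Hypothetical deployment definition mirroring the general settings above.
    deployment = {
        "model_name": "support-assistant",            # Model name: your own label for this deployment
        "provider": "OpenAI",                         # Provider of the language model
        "text_embeddings": "text-embedding-3-small",  # Text embeddings model (assumed identifier)
        "base_model": "gpt-4o",                       # Base model used for generation
        "api_token": os.environ["OPENAI_API_KEY"],    # API token: keep it out of source code
    }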

Model-Specific Parameters

The parameters to fine-tune the model's behavior will appear once you have selected a "Base model". There are two main configuration types:

  1. Standard Configuration (for models like GPT-4o)
    This configuration uses "Temperature" to control the creativity of the model. It is typically found on previous-generation models. A request sketch using these parameters follows the list below.

  • Temperature: Defines the level of creativity of the AI. A lower value (e.g., 0.2) makes the output more focused and deterministic, while a higher value increases its creativity and randomness.
  • Max tokens to generate: Sets a hard limit on the number of tokens the model will generate in a single response. Be careful, as this can cut off longer answers.
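
For example, a standard-configuration request sent directly to OpenAI's Chat Completions API might look like the sketch below (assuming the openai Python package and an API key in the environment):

    # Illustrative sketch: a standard-configuration call with Temperature and a token limit.
    from openai import OpenAI

    client = OpenAI()  # reads OPENAI_API_KEY from the environment

    completion = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": "Draft a two-sentence product description."}],
        temperature=0.2,  # lower = more focused and deterministic output
        max_tokens=300,   # hard limit on the length of the reply
    )
    print(completion.choices[0].message.content)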

  2. Advanced Configuration (for models like GPT-5)
    This modern configuration provides more intuitive controls over the model's thought process and the detail of its response. A request sketch using these controls appears at the end of this section.

Reasoning Effort

This parameter influences how deeply the model thinks before answering. A higher effort can lead to more accurate and well-structured responses but typically increases the response time. The available options are:

  • Minimal: For the fastest possible answers with surface-level reasoning.
  • Low: For basic reasoning, suitable for simple and direct questions.
  • Medium: A good balance between the depth of thought and the speed of response. Ideal for most use cases.
  • High: For in-depth analysis and complex thinking, perfect for challenging problems that require a thorough breakdown.

Verbosity

This controls the length and level of detail of the answer. It allows you to choose between a very brief response and a complete explanation. The available options are:

  • Minimal: Provides the shortest and most direct answer possible.
  • Low: Generates a concise response that gets straight to the point.
  • Medium: Offers a response with a balanced level of detail, providing context without being overly long.
  • High: Produces a complete and detailed explanation, covering all aspects of the query.

Max tokens to generate

This parameter functions the same way as in the standard configuration, setting a hard limit on the number of tokens the model will generate in a single response.
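
For illustration, an advanced-configuration request might look like the sketch below. It assumes OpenAI's Responses API for a GPT-5-family model; exact parameter names can differ between providers and API versions.

    # Illustrative sketch: an advanced-configuration call with reasoning effort,
    # verbosity, and an output-token limit (OpenAI Responses API assumed).
    from openai import OpenAI

    client = OpenAI()  # reads OPENAI_API_KEY from the environment

    response = client.responses.create(
        model="gpt-5",
        input="Compare the two migration strategies and recommend one.",
        reasoning={"effort": "medium"},  # Reasoning Effort
        text={"verbosity": "low"},       # Verbosity
        max_output_tokens=800,           # Max tokens to generate
    )
    print(response.output_text)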