AI Deployments

Large Language Model (LLM)

What's the Role of the Language Model in a Gen AI?

The language model serves as the pivotal component in a Generative AI system. It is responsible for generating answers to user queries, essentially acting as the "brain" of the operation.

Which LLMs Can I Choose From?

We currently offer a selection of language models to best suit your needs:

OpenAI:
- gpt-4: High-capacity model suitable for complex tasks requiring deep understanding and detailed responses.
- gpt-4o: Optimized version of gpt-4 for reduced latency and lower costs while maintaining high accuracy.
- gpt-4o-mini: Further streamlined version for even faster responses and minimal resource usage, ideal for lightweight applications.
- And more: new models are usually available a few days after their announcement.
Microsoft Azure OpenAI:
- gpt-4: Same as OpenAI's gpt-4, but hosted on Azure infrastructure, offering seamless integration with Microsoft cloud services.
- gpt-4o: Optimized variant, leveraging Azure’s scalability for enhanced performance.
- gpt-4o-mini: Lightweight version hosted on Azure, perfect for cost-sensitive or high-throughput use cases.
Anthropic:
- claude-3-sonnet-20240220
- claude-3-opus-20240229
- claude-3-sonnet-20240229
- claude-3-haiku-20240307
Mistral:
- open-mistral-7b
- open-mixtral-8x7b
- open-mistral-nemo
- mistral-small-latest
- mistral-medium-latest
- mistral-large-latest
Google:
- gemini-1.5-flash-001
- gemini-1.5-pro-001
- gemini-1.0-pro-001

We are continuously expanding our offerings, so stay tuned for more options.

What Are the Available Options for the LLM?

Here are the options available for the Large Language Model of your Gen AI:

Provider
Identifies the company responsible for the language model.
Example: OpenAI

Name
Specifies the variant of the model.
Example: gpt-4o

Temperature
Controls how much randomness the Generative AI uses when writing a response. A lower value gives more focused, repeatable answers; a higher value gives more creative but potentially less focused outputs.
Example: 0.2
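
To see why a low temperature gives more focused answers, here is a minimal sketch of temperature scaling as it is commonly applied when a model picks its next token. The function name and the scores are hypothetical and chosen for illustration; each provider applies temperature internally, and some also accept a value of 0, which this sketch does not handle.

```python
import math

def softmax_with_temperature(logits, temperature):
    """Turn raw token scores into probabilities, scaled by temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive in this sketch")
    scaled = [score / temperature for score in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical scores for three candidate next tokens.
logits = [2.0, 1.0, 0.5]

focused = softmax_with_temperature(logits, 0.2)   # low temperature
creative = softmax_with_temperature(logits, 1.0)  # higher temperature

print("temperature 0.2:", [round(p, 3) for p in focused])
print("temperature 1.0:", [round(p, 3) for p in creative])
```

At 0.2 almost all of the probability goes to the top-scoring token, so answers vary little between runs. At 1.0 the other candidates keep a meaningful share, which is where the extra variety comes from.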

Max Tokens to Generate
Sets a hard limit on the number of tokens the model will generate in each response. Exercise caution when setting this parameter: if the value is too low, longer answers are cut off before they finish.
Example: 256
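
The four options above can be pictured as a single settings object. The sketch below is illustrative only: the class name `LLMSettings`, its field names, and the validation rules are assumptions made for this example, not the platform's actual configuration format. The default values are the example values listed above.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class LLMSettings:
    """Hypothetical container for the four LLM options described above."""
    provider: str             # company behind the model, e.g. "OpenAI"
    name: str                 # model variant, e.g. "gpt-4o"
    temperature: float = 0.2  # lower = more focused, higher = more creative
    max_tokens: int = 256     # hard cap on tokens generated per response

    def __post_init__(self):
        # Basic sanity checks (assumed rules, for illustration only).
        if not self.provider or not self.name:
            raise ValueError("provider and name are required")
        if self.temperature < 0:
            raise ValueError("temperature must not be negative")
        if self.max_tokens < 1:
            raise ValueError("max_tokens must be at least 1")

settings = LLMSettings(provider="OpenAI", name="gpt-4o")
print(settings)
```

Checking the values up front means a mistake such as a negative temperature or a zero token limit is caught before any request is sent to the provider.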