Analytics > Latency

This report provides the following content:

In this case, the system being measured is a RAG (retrieval-augmented generation) agent.

To measure the performance of our system, we focus on the following key metrics:

Average Response Time: The mean time elapsed from the moment the user sends a question to the delivery of the last token in the response. The time to first token is typically much shorter: even if the total response time is 5 seconds, the first token might appear in just 1–2 seconds, so the experience still feels responsive.
Shortest Response Time: The quickest response recorded during the period, often observed with simple interactions such as greetings.
Longest Response Time: The slowest response recorded during the period. It is particularly useful for identifying the edge cases that cause delays, so they can be addressed to keep performance consistent across all interactions.
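
As an illustration of how these three metrics relate, the following is a minimal Python sketch, not the report's actual implementation. It assumes each interaction is logged with three timestamps in seconds; the `Interaction` class, its field names, and the `latency_summary` function are hypothetical names introduced for this example. It also computes the average time to first token mentioned above.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Interaction:
    # Hypothetical log record; all timestamps are in seconds.
    sent_at: float         # moment the user sends the question
    first_token_at: float  # moment the first response token is delivered
    last_token_at: float   # moment the last response token is delivered

    @property
    def response_time(self) -> float:
        # Full response time: question sent -> last token delivered.
        return self.last_token_at - self.sent_at

    @property
    def time_to_first_token(self) -> float:
        # Time until the user starts seeing the answer.
        return self.first_token_at - self.sent_at


def latency_summary(interactions):
    """Return average, shortest, and longest response time for a period."""
    if not interactions:
        raise ValueError("no interactions recorded in this period")
    times = [i.response_time for i in interactions]
    return {
        "average": mean(times),
        "shortest": min(times),
        "longest": max(times),
        "average_time_to_first_token": mean(
            i.time_to_first_token for i in interactions
        ),
    }


if __name__ == "__main__":
    # Made-up sample data: a greeting, a typical question, a slow edge case.
    period = [
        Interaction(sent_at=0.0, first_token_at=0.5, last_token_at=1.0),
        Interaction(sent_at=10.0, first_token_at=11.5, last_token_at=15.0),
        Interaction(sent_at=20.0, first_token_at=22.0, last_token_at=29.0),
    ]
    for name, seconds in latency_summary(period).items():
        print(f"{name}: {seconds:.2f} s")
```

In this made-up sample, the greeting sets the shortest response time and the slow edge case sets the longest, while the average sits between them. Because a single slow interaction can pull the average up, the longest response time is worth reading alongside it.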