Extra Metrics
The metrics on this page require extra information from Step 1.
Extra information: Metrics
Expected Answer: Context Precision, Context Recall, Context Entity Recall, Answer Similarity, Answer Correctness
Response Time: Latency
LLM Calls: LLM Calls
Conversation: User Frustration
Context Precision
Context Precision measures whether the relevant items in the retrieved context are ranked at the top. Ideally, all relevant chunks should appear at the highest ranks. Higher scores mean better precision.
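A minimal sketch of this rank-weighted idea, assuming relevance judgments (e.g. from an LLM judge) are already available as 0/1 flags per retrieved chunk; the score averages precision@k over the ranks that hold relevant chunks:

```python
def context_precision(relevance):
    """relevance: 0/1 flags, one per retrieved chunk in rank order,
    marking whether that chunk is relevant to the question."""
    score, hits = 0.0, 0
    for k, rel in enumerate(relevance, start=1):
        if rel:
            hits += 1
            score += hits / k  # precision@k, counted only at relevant ranks
    return score / hits if hits else 0.0

# relevant chunks ranked first score higher than the same chunks ranked last
print(context_precision([1, 1, 0]))  # 1.0
print(context_precision([0, 0, 1]))  # ~0.33
```

With the single relevant chunk ranked last, only one in three of the top ranks is useful, so the score drops even though the same information was retrieved.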
Context Recall
Context Recall measures how well the retrieved context matches the expected answer. Higher scores mean better performance.
To calculate this, each point in the expected answer is checked to see if it can be linked to the retrieved context. Ideally, all points in the expected answer should match the retrieved context.
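The attribution check above can be sketched as follows; real implementations use an LLM judge to decide whether a statement is supported, so the word-overlap test here is only a toy stand-in:

```python
def context_recall(expected_statements, context):
    """Fraction of expected-answer statements supported by the context.
    Toy attribution: a statement counts as supported when all of its
    words appear in the retrieved context."""
    ctx_words = set(context.lower().split())
    supported = [
        all(w in ctx_words for w in s.lower().split())
        for s in expected_statements
    ]
    return sum(supported) / len(expected_statements)

statements = ["paris is the capital", "france uses the euro"]
context = "paris is the capital of france"
print(context_recall(statements, context))  # 0.5
```

Only the first statement can be linked to the retrieved context, so recall is one out of two.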
Context Entity Recall
This metric measures how well the retrieved context includes the important entities from the expected answers. It compares the number of matching entities in both the expected answers and the retrieved context to the total number of entities in the expected answers.
In simple terms, it shows what fraction of the important entities the retriever found. This makes it a useful check on the retrieval system in use cases where specific entities matter.
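The ratio itself is a straightforward set operation; the sketch below assumes entity extraction (NER or an LLM) has already produced the two entity sets:

```python
def context_entity_recall(expected_entities, context_entities):
    """Entities common to expected answer and retrieved context,
    divided by the total entities in the expected answer."""
    expected = set(expected_entities)
    if not expected:
        return 0.0
    return len(expected & set(context_entities)) / len(expected)

print(context_entity_recall(
    {"Eiffel Tower", "Paris", "1889"},
    {"Paris", "Eiffel Tower", "Seine"},
))  # ~0.67 — two of the three expected entities were retrieved
```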
Answer Similarity
Answer Similarity measures how similar the meaning of the generated answer is to the expected answer. This is scored from 0 to 1, with higher scores indicating better alignment.
Assessing this similarity helps determine the quality of the generated answer. A cross-encoder model is used to calculate the similarity score.
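As a simplified stand-in for the cross-encoder (which scores a sentence pair directly), the sketch below compares two embedding vectors with cosine similarity and rescales the result to the 0-to-1 range; producing the embeddings themselves is assumed to happen elsewhere:

```python
import math

def similarity_score(a, b):
    """Cosine similarity of two equal-length embedding vectors,
    rescaled from [-1, 1] to [0, 1]."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return (dot / (na * nb) + 1) / 2

# identical vectors score 1.0; orthogonal (unrelated) ones score 0.5
print(similarity_score([1.0, 0.0], [1.0, 0.0]))  # 1.0
print(similarity_score([1.0, 0.0], [0.0, 1.0]))  # 0.5
```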
Answer Correctness
Answer Correctness measures how accurate the generated answer is compared to the expected answer. Scores range from 0 to 1, with higher scores meaning the answer is more accurate.
It looks at two main factors: how similar the meanings of the answers are and how factually correct they are. These factors are combined using a weighted system to create the correctness score.
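The weighted combination can be sketched as below; the two inputs are assumed to be computed separately (semantic similarity as above, factual correctness as an F1 over true/false statements), and the 0.75/0.25 split is an illustrative default, not a fixed specification:

```python
def answer_correctness(semantic_similarity, factual_f1, weight=0.75):
    """Blend factual correctness and semantic similarity into one
    0-to-1 correctness score, weighted toward factuality."""
    return weight * factual_f1 + (1 - weight) * semantic_similarity

print(answer_correctness(semantic_similarity=0.8, factual_f1=0.6))  # 0.65
```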
Latency
Latency measures whether the time it takes for the LLM to provide an answer stays under a specified number of seconds.
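A minimal pass/fail check of this kind, timing an arbitrary callable against a threshold (the function and threshold here are placeholders):

```python
import time

def within_latency(fn, threshold_seconds):
    """Run fn and report (result, passed), where passed says whether
    the call finished within the threshold."""
    start = time.perf_counter()
    result = fn()
    elapsed = time.perf_counter() - start
    return result, elapsed <= threshold_seconds

result, ok = within_latency(lambda: "answer", threshold_seconds=2.0)
print(ok)  # True for this trivial call
```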
LLM Calls
LLM Calls refers to how many times the system needs to call the LLM before it gives the final result.
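Counting calls is typically done with a thin wrapper around the client; the wrapped callable below is a stand-in for a real LLM client:

```python
class CountingClient:
    """Wraps an LLM callable and counts how many times it is invoked
    before the pipeline produces its final result."""
    def __init__(self, llm):
        self.llm = llm
        self.calls = 0

    def __call__(self, prompt):
        self.calls += 1
        return self.llm(prompt)

client = CountingClient(lambda prompt: f"echo: {prompt}")
client("rewrite the query")
client("answer with context")
print(client.calls)  # 2
```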
User Frustration
Administrators need to know if a user is feeling frustrated during their interaction. User frustration evaluation can be applied to a single exchange or to the whole conversation.