
Extra Metrics

The metrics on this page require extra information from Step 1.

| Information | Metric(s) |
| --- | --- |
| Expected Answer | Context Precision, Context Recall, Context Entity Recall, Answer Similarity, Answer Correctness |
| Response Time | Latency |
| LLM Calls | LLM Calls |
| Conversation | User Frustration |

Context Precision

Context Precision measures whether the relevant items in the retrieved context are ranked at the top. Ideally, every relevant chunk should appear early in the ranking. Higher scores mean better precision.
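A minimal sketch of this idea: average the precision@k at each position that holds a relevant chunk, so relevant chunks ranked near the top score higher. The 0/1 relevance flags are illustrative inputs; in practice a judge model decides relevance.

```python
def context_precision(relevance: list[int]) -> float:
    """Mean precision@k over the positions of relevant chunks.

    `relevance` holds a 0/1 flag per retrieved chunk (1 = relevant to
    the expected answer), in ranked order.
    """
    if not any(relevance):
        return 0.0
    score, hits = 0.0, 0
    for k, rel in enumerate(relevance, start=1):
        if rel:
            hits += 1
            score += hits / k  # precision@k at this relevant position
    return score / sum(relevance)

# The same relevant chunks score higher when ranked first.
print(context_precision([1, 1, 0, 0]))  # 1.0
print(context_precision([0, 0, 1, 1]))  # ~0.42
```

Note how the second call scores lower even though both rankings retrieve the same two relevant chunks: only their positions differ.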

Context Recall

Context Recall measures how well the retrieved context matches the expected answer. Higher scores mean better performance.

To calculate this, each point in the expected answer is checked to see if it can be linked to the retrieved context. Ideally, all points in the expected answer should match the retrieved context.
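The calculation above can be sketched as a simple ratio. The attribution flags are assumed inputs here; in a real pipeline an LLM judges whether each point is supported by the context.

```python
def context_recall(expected_points: list[str], attributed: list[bool]) -> float:
    """Fraction of expected-answer points that can be linked to the
    retrieved context. `attributed[i]` is True when point i is
    supported by the context."""
    if not expected_points:
        return 0.0
    return sum(attributed) / len(expected_points)

# Three of four points in the expected answer match the context.
print(context_recall(["a", "b", "c", "d"], [True, True, True, False]))  # 0.75
```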

Context Entity Recall

This metric measures how well the retrieved context includes the important entities from the expected answer. It compares the number of entities present in both the expected answer and the retrieved context to the total number of entities in the expected answer.

In simple terms, it shows what portion of the important entities were found. It can assess how well the retrieval system works by checking if it includes the important entities, which is crucial when those entities are significant.
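As a sketch, this reduces to a set-intersection ratio. Entity extraction itself (typically done by an NER model or an LLM) is assumed to have already produced the two sets; the example entities are made up.

```python
def context_entity_recall(expected_entities: set[str],
                          context_entities: set[str]) -> float:
    """Share of entities from the expected answer that also appear
    in the retrieved context."""
    if not expected_entities:
        return 0.0
    return len(expected_entities & context_entities) / len(expected_entities)

# Two of the three expected entities appear in the context.
print(context_entity_recall({"Paris", "Eiffel Tower", "1889"},
                            {"Paris", "1889", "France"}))  # ~0.67
```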

Answer Similarity

Answer Similarity measures how similar the meaning of the generated answer is to the expected answer. This is scored from 0 to 1, with higher scores indicating better alignment.

Assessing this similarity helps determine the quality of the generated answer. A cross-encoder model is used to calculate the similarity score.
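While the production metric relies on a cross-encoder, the underlying idea of scoring semantic closeness can be illustrated with cosine similarity over embedding vectors. The vectors below are toy stand-ins, not real sentence embeddings.

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two embedding vectors; for
    non-negative embeddings the result falls in [0, 1], with 1
    meaning identical direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

print(cosine_similarity([1.0, 0.0], [1.0, 0.0]))  # 1.0 (identical)
print(cosine_similarity([1.0, 0.0], [0.0, 1.0]))  # 0.0 (unrelated)
```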

Answer Correctness

Answer Correctness measures how accurate the generated answer is compared to the expected answer. Scores range from 0 to 1, with higher scores meaning the answer is more accurate.

It looks at two main factors: how similar the meanings of the answers are and how factually correct they are. These factors are combined using a weighted system to create the correctness score.
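The weighted combination described above can be sketched as follows. The weights and the two input scores are illustrative assumptions, not xBot's actual values; in practice the factual score is derived from statements the answer gets right, misses, or invents.

```python
def answer_correctness(factual_score: float, semantic_similarity: float,
                       weights: tuple[float, float] = (0.75, 0.25)) -> float:
    """Weighted combination of factual correctness and semantic
    similarity, both in [0, 1]. The 0.75/0.25 split is an assumed
    example weighting."""
    w_fact, w_sim = weights
    return (w_fact * factual_score + w_sim * semantic_similarity) / (w_fact + w_sim)

# A mostly factual answer with weak phrasing overlap still scores well.
print(answer_correctness(0.8, 0.4))  # 0.7
```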

Latency

Latency measures whether the time the LLM takes to provide an answer is below a specified threshold in seconds.
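As a pass/fail check this is a simple comparison. The 5-second default below is an assumed example value, not a threshold defined by xBot.

```python
def latency_ok(response_seconds: float, threshold_seconds: float = 5.0) -> bool:
    """True when the LLM answered within the threshold.
    The 5-second default is an illustrative assumption."""
    return response_seconds <= threshold_seconds

print(latency_ok(3.2))  # True
print(latency_ok(7.0))  # False
```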

LLM Calls

LLM Calls refers to how many times the system needs to call the LLM before it gives the final result.
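One way to capture this count is a thin wrapper around the model client. `llm_fn` here is a hypothetical stand-in for whatever callable hits the model; the counting idea, not the API, is the point.

```python
class CallCounter:
    """Wraps an LLM client callable and counts how many times it is
    invoked before the final result is produced."""

    def __init__(self, llm_fn):
        self.llm_fn = llm_fn
        self.calls = 0

    def __call__(self, *args, **kwargs):
        self.calls += 1  # tally every round trip to the model
        return self.llm_fn(*args, **kwargs)

llm = CallCounter(lambda prompt: f"answer to: {prompt}")
llm("rewrite the query")          # e.g. a query-rewriting step
llm("generate the final answer")  # the final generation step
print(llm.calls)  # 2
```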

User Frustration

Administrators need to know if a user is feeling frustrated during their interaction. User frustration evaluation can be applied to a single exchange or across the whole conversation to see if the user is becoming frustrated.
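In production this judgment is typically made by an LLM scoring each exchange or the whole conversation; the toy heuristic below only illustrates the per-turn vs. whole-conversation framing. The signal phrases are invented examples.

```python
# Illustrative frustration phrases only; a real evaluator uses an LLM judge.
FRUSTRATION_SIGNALS = {"this is useless", "not what i asked", "speak to a human"}

def frustration_score(user_turns: list[str]) -> float:
    """Share of user turns containing a frustration signal.
    Pass one turn for a single exchange, or every user turn to
    assess the whole conversation."""
    if not user_turns:
        return 0.0
    flagged = sum(any(sig in turn.lower() for sig in FRUSTRATION_SIGNALS)
                  for turn in user_turns)
    return flagged / len(user_turns)

print(frustration_score(["hello", "This is useless"]))  # 0.5
```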

