# AI Outbound Agent State

The `AIOutboundAgentState` extends the regular **AI Agent** state to automate *outbound* interactions—e.g., phone calls, chat messages, or messaging-app conversations—directly from a workflow. In addition to the usual LLM configuration, tools, and outcomes, the state lets you specify:

* **Outbound channel details** (phone, Zalo, WhatsApp, Telegram, …) via `outboundConfig`
* **Realtime voice features** (STT/TTS/VAD) via `voiceConfig`

### AIOutboundAgentState

<table data-header-hidden><thead><tr><th width="174"></th><th></th><th width="116"></th><th></th></tr></thead><tbody><tr><td><strong>Parameter</strong></td><td><strong>Description</strong></td><td><strong>Type</strong></td><td><strong>Required</strong></td></tr><tr><td>agentName</td><td>The name of the agent.</td><td>string</td><td>yes</td></tr><tr><td>aiModel</td><td>The name of the AI language model. Default is 'gpt-4o'.</td><td>string</td><td>no</td></tr><tr><td><a href="#llmconfig">llmConfig</a></td><td>The configuration for the language model.</td><td>object</td><td>no</td></tr><tr><td>systemMessage</td><td>The system message used to construct the LLM prompt. Defaults to "You are a helpful AI Assistant."</td><td>string</td><td>yes</td></tr><tr><td>userMessage</td><td>The user message.</td><td>string</td><td>yes</td></tr><tr><td>maxToolExecutions</td><td>The maximum number of tool executions. Default is 10.</td><td>integer</td><td>no</td></tr><tr><td><a href="#chatmemory">memory</a></td><td>The memory of the agent. If not specified, the workflow process instance scope is used.</td><td>object</td><td>no</td></tr><tr><td><a href="#agentdataoutput">output</a></td><td>JSON schema for agent data output. See AgentDataOutput.</td><td>object</td><td>yes</td></tr><tr><td><a href="#toolforai">tools</a></td><td>The list of tools. Each tool is described by the ToolForAI schema.</td><td>array</td><td>no</td></tr><tr><td><a href="#onagentoutcome">agentOutcomes</a></td><td>The list of agent outcomes. Each outcome is described by the <a href="#onagentoutcome">OnAgentOutcome</a> schema.</td><td>array</td><td>yes</td></tr><tr><td><a href="https://docs.a4b.vn/xflow/3.-core-concepts/workflow-data-handling#state-data-filters">dataFilter</a></td><td>Filter to apply to the state data.</td><td>string</td><td>no</td></tr><tr><td><a href="#outboundconfig">outboundConfig</a></td><td>Channel-specific outbound settings.</td><td>object</td><td>yes</td></tr><tr><td><a href="#voiceconfig">voiceConfig</a></td><td>Voice features (STT, TTS, VAD) for realtime calls.</td><td>object</td><td>no</td></tr></tbody></table>

### LLMConfig

Same as [LLMConfig](/xflow/developer-guide/workflow-states-reference/aiagent-state.md#llmconfig) in the [AIAgent State](/xflow/developer-guide/workflow-states-reference/aiagent-state.md) reference.

### ChatMemory

Same as [ChatMemory](/xflow/developer-guide/workflow-states-reference/aiagent-state.md#chatmemory) in the [AIAgent State](/xflow/developer-guide/workflow-states-reference/aiagent-state.md) reference.

### AgentDataOutput

Same as [AgentDataOutput](/xflow/developer-guide/workflow-states-reference/aiagent-state.md#agentdataoutput) in the [AIAgent State](/xflow/developer-guide/workflow-states-reference/aiagent-state.md) reference.

### OnAgentOutcome

Same as [OnAgentOutcome](/xflow/developer-guide/workflow-states-reference/aiagent-state.md#onagentoutcome) in the [AIAgent State](/xflow/developer-guide/workflow-states-reference/aiagent-state.md) reference.

### ToolForAI

Same as [ToolForAI](/xflow/developer-guide/workflow-states-reference/aiagent-state.md#toolforai) in the [AIAgent State](/xflow/developer-guide/workflow-states-reference/aiagent-state.md) reference.

### OutboundConfig

The `OutboundConfig` defines the channel-specific outbound settings.

| Property                          | Type   | Description                                                                                                                                     | Required |
| --------------------------------- | ------ | ----------------------------------------------------------------------------------------------------------------------------------------------- | -------- |
| greeting                          | string | The static greeting message to be used by the agent. This message is used when the agent is first initialized.                                  | no       |
| greetingInstructions              | string | The instructions for the LLM to use when generating the greeting message. This configuration takes precedence over the static greeting message. | no       |
| [outboundTarget](#outboundtarget) | object | The outbound target configuration. This configuration is used to define the target for the outbound agent.                                      | yes      |

### OutboundTarget

The `OutboundTarget` defines the target for the outbound agent.

| Property      | Type   | Description                                                                                                                          | Required |
| ------------- | ------ | ------------------------------------------------------------------------------------------------------------------------------------ | -------- |
| targetType    | string | The type of the outbound target. This can be `voice`, `zalo`, `whatsapp`, etc. Default is `voice`.                                   | yes      |
| targetAddress | string | The address of the outbound target. This can be an email address, phone number, Zalo ID, etc. The format depends on the target type. | yes      |
| targetName    | string | The name of the outbound target. This is used for identification purposes.                                                           | yes      |
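
As an illustration, an `outboundConfig` for a voice call might look like the fragment below. This is a minimal sketch: the phone number, customer name, and greeting text are placeholder values, not values defined by the platform.

```json
{
  "outboundConfig": {
    "greeting": "Hello, this is the VietTrust Bank virtual assistant.",
    "outboundTarget": {
      "targetType": "voice",
      "targetAddress": "+84901234567",
      "targetName": "Nguyen Van A"
    }
  }
}
```

To have the LLM generate the opening line instead, set `greetingInstructions`; as noted in the `OutboundConfig` table, it takes precedence over the static `greeting`.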

### VoiceConfig

`VoiceConfig` is the single block that tells the workflow how to listen, think, and speak during a telephone or voice-chat session.\
Because speech is a round-trip of *audio → text → LLM → text → audio*, `VoiceConfig` is split into four conceptual sub-modules, each matching one step in that loop:

1. VAD – “Is anyone talking right now?”
2. STT – “What did they just say?”
3. LLM – “How should the agent respond?”
4. TTS – “Say it out loud—in a human voice.”

The pipeline looks like this:

```
Caller audio → VAD → STT ─┐
                         ├─► LLM (reasoning/JSON tools)
LLM reply ◄──────────────┘
LLM reply → TTS → Agent audio

```

You can mix and match providers for each step; each provider has its own latency, cost, language coverage, and feature set.

**Why each component matters**

**Voice-Activity Detection (VAD)**

Purpose: Detects the precise start and end of human speech in the inbound audio stream.\
Why it’s critical: If VAD fires too late you waste the caller’s first syllables; if it fires too early you feed silence or background noise into STT and spend tokens on “uh …”. Good VAD also enables barge-in (interrupting TTS mid-sentence) and double-talk detection.

Typical knobs inside the `vad` block

* provider name (`silero` is the default implementation)
* energy / probability thresholds
* timeouts for “no-speech” and “end-of-speech”

**Speech-to-Text (STT)**

Purpose: Transforms raw audio chunks into partial and final transcripts.\
Why it’s critical: Whatever the LLM “hears” comes from STT; recognition accuracy drives the entire conversational quality. Latency drives perceived responsiveness.

Key configuration areas

* Provider & model – e.g. OpenAI Whisper large-v3, Deepgram Nova-2, Google STT tel-alpha
* Language/locale – supply a BCP-47 code like `vi-VN` or `en-US` so the model loads the right phoneme set
* Streaming vs batch – most providers stream; some cheaper models require a full clip upload
* Vocabulary bias / hints – business terms, proper names, SKU codes
* Post-processing – capitalization, profanity masking, punctuation injection
* Security – API key, private endpoint, or on-prem GPU deployment

**Large-Language Model (LLM)**

Purpose: Understands user intent, decides on tool calls, chooses the next action, and produces a textual reply (or a JSON payload if your state’s `output` schema demands structured data).

Inside a voice agent the LLM sits in the tightest latency loop after STT, so choosing how the LLM delivers its tokens changes the entire user experience:

* **Realtime LLM**
  * A realtime model ingests raw audio, reasons over it, and streams synthetic speech back without any external STT or TTS step.
  * What changes in the pipeline
    * No separate STT/TTS blocks. The model hears tone, hesitations, laughter—cues that are normally lost in a transcript.
    * Built-in turn detection. Most providers decide when you’ve finished speaking, and we recommend relying on that internal detector. If you want to fall back to the default turn detector, you must still attach an STT plugin so the detector can read interim transcripts.
    * No hard-scripted speech. You can cue the model with instructions, but you cannot guarantee it will read a line verbatim. For legally approved disclaimers, attach a conventional TTS plugin and use [`greetingInstructions`](#outboundconfig) for that segment.
* **Non-realtime LLM (Classic)**

The classic voice-AI stack separates concerns:

```
Caller audio → VAD → STT → text
                          ↓
                       LLM  ← current context & tools
                          ↓
Reply text → TTS → audio to caller

```

Why it’s still popular:

| Advantage                                                                                                                | Implication                                                          |
| ------------------------------------------------------------------------------------------------------------------------ | -------------------------------------------------------------------- |
| **Deterministic text flow.** Every turn yields clean, timestamped transcripts.                                           | Great for analytics, compliance, post-call RAG pipelines.            |
| **Fine-grained control.** You choose best-of-breed STT, specialised LLM tooling, and premium or budget TTS per use-case. | Extra integration work and \~1-2 s additional latency.               |
| **Script fidelity.** A TTS engine will read a legal disclaimer exactly as written.                                       | Voices may sound less expressive unless you invest in neural styles. |

**Text-to-Speech (TTS)**

Purpose: Transforms the LLM’s textual reply into audio the caller hears.\
Why it’s critical: Humans judge “bot-ness” mainly by voice quality and timing. A 220 ms chunk-synthesis delay feels natural; 800 ms feels robotic.

TTS options worth documenting

* Voice/character – Rachel, en-US-Wavenet-D, Alloy-en-v2
* Style & prosody controls – speaking rate, pitch, emotion, stability, pronunciation lexicons
* Streaming support – mandatory for realtime pipelines; optional for batch
* Silence trimming & filler – some providers auto-trim leading breaths; some insert breathing/fillers you may want to disable
* Bandwidth – telephony lines are 8 kHz mono; web or app can handle 22 kHz stereo

### VoiceConfig Properties

| Property           | Type    | Description                                                                    | Required |
| ------------------ | ------- | ------------------------------------------------------------------------------ | -------- |
| [stt](#stt)        | object  | The speech-to-text configuration (Optional).                                   | no       |
| [tts](#tts)        | object  | The text-to-speech configuration (Optional).                                   | no       |
| [vad](#vad)        | object  | The voice activity detection configuration (Optional).                         | no       |
| allowInterruptions | boolean | Whether to allow interruptions during the voice interaction. Default is false. | no       |
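
Putting the properties together, a complete `voiceConfig` block might look like the sketch below. It only combines values that appear elsewhere on this page (the Groq STT settings, the Azure TTS settings, and the default `silero` VAD); the secret names are whatever you have defined in your own environment.

```json
{
  "voiceConfig": {
    "vad": { "provider": "silero" },
    "stt": {
      "provider": "groq",
      "model": "whisper-large-v3-turbo",
      "language": "vi",
      "apiKey": "${ $SECRETS.GROQ_API_KEY }"
    },
    "tts": {
      "provider": "azure",
      "providerOptions": {
        "speech_key": "${ $SECRETS.AZURE_SPEECH_KEY }",
        "speech_region": "${ $SECRETS.AZURE_SPEECH_REGION }"
      }
    },
    "allowInterruptions": true
  }
}
```

Setting `allowInterruptions` to `true` here is a choice for the example; the default is `false`.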

### STT

The `STT` defines the configuration for the Speech To Text (STT) to be used by the AI Agent.

| Property        | Type   | Description                                                                                                                                                | Required |
| --------------- | ------ | ---------------------------------------------------------------------------------------------------------------------------------------------------------- | -------- |
| provider        | string | The name of the STT provider. Allowed values: 'deepgram', 'openai', 'google', 'elevenlabs', 'fal', 'groq'. This determines which STT backend will be used. | yes      |
| model           | string | The model to use for speech recognition. This is provider-specific.                                                                                        | no       |
| language        | string | The language code for recognition. This is provider-specific. Example: 'en-US', 'vi-VN', etc.                                                              | no       |
| apiKey          | string | The API key or credentials for the STT service. This is required for most providers to authenticate requests.                                              | no       |
| baseUrl         | string | The base URL for the STT service. This is used for custom endpoints or self-hosted deployments. Optional for most cloud providers.                         | no       |
| providerOptions | object | Provider-specific configuration options for STT. Use this to supply additional settings required by your provider.                                         | no       |

#### Supported options by STT provider:

* deepgram:

{% code overflow="wrap" %}

```yaml
detect_language: Whether to enable automatic language detection. Defaults to false.
interim_results: Whether to return interim (non-final) transcription results. Defaults to true.
punctuate: >-
  Whether to add punctuation to the transcription. Defaults to true. The turn
  detector works better with punctuation.
smart_format: Whether to apply smart formatting to numbers, dates, etc. Defaults to true.
sample_rate: The sample rate of the audio in Hz. Defaults to 16000.
endpointing_ms: >-
  Time in milliseconds of silence to consider end of speech. Set to 0 to
  disable. Defaults to 25.
filler_words: >-
  Whether to include filler words (um, uh, etc.) in transcription. Defaults to
  true.
profanity_filter: Whether to filter profanity from the transcription. Defaults to false.
numerals: Whether to include numerals in the transcription. Defaults to false.
mip_opt_out: Whether to opt out of the model improvement program. Defaults to false.

```

{% endcode %}
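
For illustration, a Deepgram `stt` block that sets a few of these options could look like the sketch below. It assumes the options listed above are passed through `providerOptions`, as in the Azure examples on this page; the secret name `DEEPGRAM_API_KEY` and the chosen values are hypothetical.

```json
{
  "stt": {
    "provider": "deepgram",
    "model": "nova-2",
    "language": "en-US",
    "apiKey": "${ $SECRETS.DEEPGRAM_API_KEY }",
    "providerOptions": {
      "punctuate": true,
      "smart_format": true,
      "endpointing_ms": 300,
      "filler_words": false
    }
  }
}
```

A larger `endpointing_ms` than the default of 25 waits longer before treating silence as the end of speech, which can help with callers who pause mid-sentence at the cost of slower turn-taking.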

* openai:

{% code overflow="wrap" %}

```yaml
detect_language: Whether to enable automatic language detection. Defaults to false.

```

{% endcode %}

* google:

{% code overflow="wrap" %}

```yaml
languages: List of language codes to recognize. Google STT can accept multiple languages.
detect_language: Whether to enable automatic language detection. Defaults to false.
credentials_info: >-
  Google credentials info. This is a JSON string with the following fields:
  project_id, client_email, private_key_id, private_key. See
  https://cloud.google.com/docs/authentication/getting-started for more
  information.
credentials_file: URL to a file containing Google credentials.
credentials_file_auth_headers: HTTP headers to use when downloading the credentials file.

```

{% endcode %}
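
As a rough sketch, a Google `stt` block using these options might look like the following. It assumes the options go under `providerOptions` and that `languages` is written as a JSON array; the credentials URL is a placeholder, not a real endpoint.

```json
{
  "stt": {
    "provider": "google",
    "providerOptions": {
      "languages": ["vi-VN", "en-US"],
      "detect_language": true,
      "credentials_file": "https://example.com/google-credentials.json"
    }
  }
}
```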

* elevenlabs: None
* fal: None
* groq: None

Example:

```json
"stt": {
    "provider": "groq",
    "model": "whisper-large-v3-turbo",
    "language": "vi",
    "apiKey": "${ $SECRETS.GROQ_API_KEY }"
}
```

* azure: `speech_key` and `speech_region` are passed through `providerOptions`, for example:

```json
"stt": {
    "provider": "azure",
    "providerOptions": {
        "speech_key": "${ $SECRETS.AZURE_SPEECH_KEY }",
        "speech_region": "${ $SECRETS.AZURE_SPEECH_REGION }"
    }
}
```

### TTS

The `TTS` defines the configuration for the Text To Speech (TTS) to be used by the AI Agent.

| Property        | Type   | Description                                                                                                                                             | Required |
| --------------- | ------ | ------------------------------------------------------------------------------------------------------------------------------------------------------- | -------- |
| provider        | string | The name of the TTS provider. Allowed values: 'openai', 'deepgram', 'google', 'elevenlabs', 'groq'. This determines which TTS backend will be used.     | no       |
| model           | string | The model to use for TTS. This is provider-specific.                                                                                                    | no       |
| voice           | string | The voice to use for TTS. This is provider-specific and may refer to a named voice (e.g., 'en-US-Wavenet-D' for Google, 'Rachel' for ElevenLabs, etc.). | no       |
| language        | string | The language code for TTS. This is provider-specific and may affect pronunciation and available voices.                                                 | no       |
| apiKey          | string | The API key or credentials for the TTS service. This is required for most providers to authenticate requests.                                           | no       |
| baseUrl         | string | The base URL for the TTS service. This is used for custom endpoints or self-hosted deployments. Optional for most cloud providers.                      | no       |
| providerOptions | object | Provider-specific configuration options for TTS. Use this to supply additional settings required by your provider.                                      | no       |

#### Supported options by TTS provider:

* openai:

{% code overflow="wrap" %}

```yaml
speed: Speaking speed
instructions: >-
  Instructions to control tone, style, and other characteristics of the speech.
  Does not work with tts-1 or tts-1-hd models

```

{% endcode %}
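
For illustration, an OpenAI `tts` block that uses both options could look like the sketch below. It assumes the options are passed through `providerOptions`; the model and voice names are examples of OpenAI identifiers, and `instructions` only takes effect on models other than `tts-1` and `tts-1-hd`.

```json
{
  "tts": {
    "provider": "openai",
    "model": "gpt-4o-mini-tts",
    "voice": "alloy",
    "apiKey": "${ $SECRETS.OPENAI_API_KEY }",
    "providerOptions": {
      "speed": 1.0,
      "instructions": "Speak in a friendly and professional manner."
    }
  }
}
```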

* deepgram:

{% code overflow="wrap" %}

```yaml
encoding: Audio encoding, e.g. linear16
sample_rate: Sample rate in Hz, e.g. 24000

```

{% endcode %}

* google:

{% code overflow="wrap" %}

```yaml
gender: Voice gender. Valid values are male, female, and neutral.
credentials_info: >-
  Google credentials info. This is a JSON string with the following fields:
  project_id, client_email, private_key_id, private_key. See
  https://cloud.google.com/docs/authentication/getting-started for more
  information.
credentials_file: URL to a file containing Google credentials.
credentials_file_auth_headers: HTTP headers to use when downloading the credentials file.

```

{% endcode %}

Example:

{% code overflow="wrap" %}

```json
"tts": {
    "provider": "google",
    "voice": "vi-VN-Standard-A",
    "apiKey": "https://xbot-uat.hcm03.vstorage.vngcloud.vn/xbot-out-livekit-uat.json?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Date=20250516T102347Z&X-Amz-SignedHeaders=host&X-Amz-Expires=518699&X-Amz-Credential=e45eff37790ca1f6a4e0174698dc9991%2F20250516%2FHCM03%2Fs3%2Faws4_request&X-Amz-Signature=84b2d4a465379427eeac86c0210d954ec6a6e7ac2e869600377134311093ae89",
    "instructions": "Speak in a friendly and professional manner. Use Vietnamese language to communicate with the customer."
}
```

{% endcode %}

* elevenlabs:

{% code overflow="wrap" %}

```yaml
voice_settings: >-
  Voice settings object: stability, similarity_boost, style, use_speaker_boost,
  speed. See
  https://elevenlabs.io/docs/api-reference/text-to-speech/convert#request.body.voice_settings
streaming_latency: >-
  Optimize for streaming latency, defaults to 0 - disabled. 4 for max latency
  optimizations

```

{% endcode %}
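
As a sketch, an ElevenLabs `tts` block with these options might look like the following. It assumes the options are passed through `providerOptions`; the secret name `ELEVENLABS_API_KEY` and the numeric values are illustrative, not recommended settings.

```json
{
  "tts": {
    "provider": "elevenlabs",
    "voice": "Rachel",
    "apiKey": "${ $SECRETS.ELEVENLABS_API_KEY }",
    "providerOptions": {
      "voice_settings": {
        "stability": 0.5,
        "similarity_boost": 0.75
      },
      "streaming_latency": 3
    }
  }
}
```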

* groq: None
* azure:

Example:

```json
"tts": {
    "provider": "azure",
    "providerOptions": {
        "speech_key": "${ $SECRETS.AZURE_SPEECH_KEY }",
        "speech_region": "${ $SECRETS.AZURE_SPEECH_REGION }"
    }
}
```

### VAD

The `VAD` defines the configuration for the Voice Activity Detection (VAD) to be used by the AI Agent.

| Property | Type   | Description                                    | Required |
| -------- | ------ | ---------------------------------------------- | -------- |
| provider | string | The name of the provider. Default is 'silero'. | no       |

### Example

<table data-full-width="false"><thead><tr><th>YAML</th><th data-hidden>JSON</th></tr></thead><tbody><tr><td><pre class="language-yaml" data-line-numbers data-full-width="false"><code class="lang-yaml">code: BANKING_AGENT
name: Banking Agent
agents:
  - type: aiagent
    name: AgentSelector
    start: true
    transition:
      end: false
      targetId: RagAgent
    answerAfterFinish: false
    answerMessage: null
    aiModel: gpt-4o-mini
    llmConfig:
      provider: openai
      apiKey: '${ $SECRETS.OPENAI_API_KEY }'
      overrideParams:
        model: gpt-4o-mini
        temperature: 0
        topP: 0
        'n': 0
        logprobs: 0
        echo: false
        stop: []
        maxTokens: 1024
        presencePenalty: 0
        frequencyPenalty: 0
        logitBias: null
        required: null
    systemMessage: >-
      You are the VietTrust Bank assistant, helping to address queries related
      to information regarding the bank’s offerings, including branch locations,
      account types, savings products, loan services, credit card options, and
      other banking solutions. Select the most appropriate tool based on its
      description, and return the name of the selected tool.

      Below is the tools list:

      - RagAgent: This tool specializes in answering queries regarding the
      bank’s offerings, including branch locations, account types, savings
      products, loan services, credit card options, and other banking solutions.
      Use this tool to access up-to-date data from the system's knowledge base.

      - OutboundCollectorAgent: This tool is designed to assist with outbound
      collection tasks, such as sending reminders or notifications to customers
      regarding their account status or payment due dates. Use this tool for
      tasks related to customer outreach and communication.
    userMessage: '${.request.question}'
    maxToolExecutions: 10
    memory:
      memoryId: '${ $conversationId + "-AgentSelector" }'
      memoryType: message_window
      maxMessages: 5
      maxTokens: null
      memoryOptimizer: null
    output:
      schema: |-
        {
            "type": "object",
            "properties": {
                "selectedAgent": {
                    "type": "string",
                    "description": "The exact name of agent to be selected."
                }
            },
            "required": [
                "selectedAgent"
            ]
        }
    tools: []
    askUserForToolsInput: false
    agentOutcomes:
      - condition: >-
          ${ ( ($agentOutcome.returnValues.selectedAgent != null) and
          (($agentOutcome.returnValues.selectedAgent |
          contains("OutboundCollectorAgent")) == true) ) or (
          ($agentOutcome.returnValues.properties.selectedAgent != null) and
          (($agentOutcome.returnValues.properties.selectedAgent  |
          contains("OutboundCollectorAgent")) == true) ) }
        finish: true
        transition:
          end: false
          targetId: OutboundCollectorAgent
        answerAfterFinish: false
        answerMessage: null
      - condition: >-
          ${ ( ($agentOutcome.returnValues.selectedAgent != null) and
          (($agentOutcome.returnValues.selectedAgent | contains("RagAgent")) ==
          true) ) or ( ($agentOutcome.returnValues.properties.selectedAgent !=
          null) and (($agentOutcome.returnValues.properties.selectedAgent |
          contains("RagAgent")) == true) ) }
        finish: true
        transition:
          end: false
          targetId: RagAgent
        answerAfterFinish: false
        answerMessage: null
      - condition: '${true}'
        finish: true
        transition:
          end: false
          targetId: RagAgent
        answerAfterFinish: false
        answerMessage: null
    loginRequired: null
  - type: kbagent
    name: RagAgent
    start: false
    transition:
      end: true
      targetId: null
    answerAfterFinish: true
    answerMessage: '${ $RagAgent.outputMessage }'
    systemMessage: >-
      You are an AI assistant for **VietTrust Bank**, a modern bank that offers
      a wide range of financial services. Your role is to assist customers by
      providing accurate and helpful information regarding the bank’s offerings,
      including branch locations, account types, savings products, loan
      services, credit card options, and other banking solutions.

      ### Instructions:

      1. **Branch Locations:**
         - Provide customers with details about the nearest branch based on the list of locations in the context. Use the city, district, or address details provided by the customer to find the nearest branch.

      2. **Account Types:**
         - Explain the available types of bank accounts, including **savings accounts**, **current accounts**, and **foreign currency accounts**. Provide details such as minimum balance requirements, interest rates, and benefits for each type of account.

      3. **Savings Products:**
         - Offer information about **savings products**, including fixed-term deposits and special savings plans. Provide details such as interest rates, account terms, and any specific conditions (e.g., minimum deposit amounts).

      4. **Loan Products:**
         - Assist customers in understanding the bank’s **loan offerings**, including **personal loans**, **home loans**, **auto loans**, and **business loans**. Mention the interest rates, repayment periods, and any special conditions based on the loan type.

      5. **Credit Cards:**
         - Provide information about the bank’s **credit card options**, including **credit limits**, **annual fees**, and **reward programs**. Highlight specific benefits for different types of cards, such as Visa, MasterCard, or co-branded cards.

      6. **Banking Services:**
         - Explain additional banking services, such as **online banking**, **mobile banking**, **money transfers**, and **bill payment** services.
         - Guide customers through the process of setting up or managing these services if the context provides instructions.

      7. **Priority Banking:**
         - For **priority customers**, provide details about exclusive services such as higher interest rates, lower service fees, and access to dedicated account managers. If the customer qualifies, inform them about the advantages of priority banking.

      8. **Cross-References:**
         - If a question requires information from multiple sections, provide cross-references to relevant sections (e.g., linking loan products with interest rates or providing details about applicable savings account terms).

      9. **Missing Context:**
         - If a customer asks for information that is not available in the provided context, respond politely with: **"Xin lỗi, tôi không có thông tin để trả lời câu hỏi này."** ("Sorry, I don’t have the information to answer this question.").

      10. **Use Only Provided Context:**
          - You must **only use the information provided in the context** to answer the questions. Do not use any external knowledge or assumptions. If the required information is not available in the context, respond politely with, **"Xin lỗi, tôi không có thông tin để trả lời câu hỏi này."** ("Sorry, I don’t have the information to answer this question.").

      ### Example Queries:

      - "Where is the nearest VietTrust Bank branch in Hanoi?"

      - "What is the interest rate for a 12-month savings account?"

      - "Can I apply for a home loan online?"

      - "What credit card options do you offer with no annual fee?"

      - "How do I transfer money to another bank account?"

      ### Language and Tone:

      - Provide all responses in **Vietnamese**, ensuring that they are clear,
        polite, and strictly based on the context provided by VietTrust Bank. If
        the required information is missing, respond politely and do not make any
        assumptions.
    userMessage: '${.request.question}'
    llmConfig:
      provider: groq
      apiKey: '${ $SECRETS.GROQ_API_KEY }'
      overrideParams:
        model: meta-llama/llama-4-scout-17b-16e-instruct
        temperature: 0
        topP: 0
        'n': 0
        logprobs: 0
        echo: false
        stop: []
        maxTokens: 512
        presencePenalty: 0
        frequencyPenalty: 0
        logitBias: null
        required: null
    memory:
      memoryId: '${ $conversationId }'
      memoryType: message_window
      maxMessages: 5
      maxTokens: null
    knowledgeBase:
      queryStrategy: null
      knowledgeBaseCodes:
        - XFILE_BANKING
      ragConfig:
        history:
          messageLimit: 5
        retriever:
          maxResults: 5
          minScore: 0.6
        kbLocalRerank: true
        includeDocReference: true
        ragMode: NAIVE
    loginRequired: null
    streaming: false
  - type: aiagent
    name: OutboundCollectorAgent
    start: false
    transition:
      end: false
      targetId: OutboundAgent
    answerAfterFinish: false
    aiModel: gpt-4o-mini
    llmConfig:
      provider: openai
      apiKey: '${ $SECRETS.OPENAI_API_KEY }'
      overrideParams:
        model: gpt-4o-mini
        temperature: 0
        topP: 0
        'n': 0
        logprobs: 0
        echo: false
        stop: []
        maxTokens: 1268
        presencePenalty: 0
        frequencyPenalty: 0
        logitBias: null
        required: null
    systemMessage: >-
      You are AI Assistant, dedicated to collect the information. Your task is
      to collect the outbound call information based on the question from the
      user. You will be provided with the outbound call information, and you
      need to ask the user for any missing information.
    userMessage: '${.request.question}'
    maxToolExecutions: 40
    memory:
      memoryId: '${ $conversationId }'
      memoryType: message_window
      maxMessages: 19
      maxTokens: null
    output:
      schema: >-
        {"type":"object","properties":{"customerName":{"type":"string"},"phoneNumber":{"type":"string"},"message":{"type":"string"}},"required":["customerName","phoneNumber","message"]}
    tools: []
    askUserForToolsInput: true
    agentOutcomes: []
    loginRequired: null
  - type: aioutboundagent
    name: OutboundAgent
    start: false
    transition:
      end: true
      targetId: null
    systemMessage: >-
      You are a conversational AI agent for **VietTrust Bank**, a modern
      financial institution offering a full suite of banking services. Your job
      is to place outbound calls to customers in order to deliver polite,
      professional reminders or notifications. For each call you will be given:

      - Customer’s profile (name, account/loan details, etc.)

      - The scripted message you need to convey

      - The customer’s phone number

      Always speak in Vietnamese, using a courteous and friendly tone
      appropriate for a bank representative.

      Before ending each call, always thank the customer for their time.

      Your task is informing the customer the message from following instructions:

      {{.request.question}}

      == Message: ==

      {{ $OutboundCollectorAgent.returnValues.message }}.

      You can also use provided tools to answer the customer's queries (if any) based on known information and don't ask it again:

      - Customer phone: ${ $OutboundCollectorAgent.returnValues.phoneNumber }.
    maxToolExecutions: 10
    llmConfig:
      provider: google
      apiKey: '${ $SECRETS.GEMINI_API_KEY }'
      overrideParams:
        model: gemini-2.5-flash-preview-04-17
        temperature: 0
        topP: 0
        'n': 0
        logprobs: 0
        echo: false
        stop: []
        maxTokens: 512
        presencePenalty: 0
        frequencyPenalty: 0
        logitBias: null
        required: null
    memory:
      memoryId: '${ $conversationId }'
      memoryType: message_window
      maxMessages: 5
      maxTokens: null
    tools:
      - name: getCustomerInfoByPhone
        description: Get customer information based on phone number.
        parameters:
          schema: |-
            {
              "type": "object",
              "properties": {
                "phone": {"type": "string", "description": "Customer phone number."}
              },
              "required": ["phone"]
            }
        output:
          schema: |-
            {
              "type": "object",
              "properties": {
                "customerCode": {"type": "string"},
                "fullName": {"type": "string"},
                "phoneNumber": {"type": "string"},
                "address": {"type": "string"}
              },
              "required": ["customerCode","fullName","phoneNumber","address"]
            }
        execution:
          actionMode: sequential
          actions:
            - id: null
              name: getCustomerInfoByPhone
              condition: '${true}'
              functionRef:
                code: getCustomerInfo_bankingApi
                name: getCustomerInfo
                description: null
                asyncInvoke: false
                type: rest
                definition:
                  type: simple-rest
                  url: >-
                    https://xplatform-api-uat.a4b.vn/partner/xfai/api/mock/banking/getCustomerInfo
                  method: GET
                  headers:
                    xp-api-key: >-
                      0123456789
                  queryParams:
                    phone: '${ .phone }'
                  body: null
                  auth: null
                inputs:
                  - name: phone
                    value: null
                arguments:
                  phone: '${ $toolArguments.phone }'
                metadata: null
      - name: getAccountsByPhone
        description: Get all accounts by customer phone number.
        parameters:
          schema: |-
            {
              "type": "object",
              "properties": {
                "phone": {"type": "string"}
              },
              "required": ["phone"]
            }
        output:
          schema: |-
            {
              "type": "array",
              "items": {
                "type": "object",
                "properties": {
                  "accountId": {"type": "string"},
                  "customerCode": {"type": "string"},
                  "balance": {"type": "number"}
                },
                "required": ["accountId", "customerCode", "balance"]
              }
            }
        execution:
          actionMode: sequential
          actions:
            - id: null
              name: getAccountsByPhone
              condition: '${true}'
              functionRef:
                code: getAccountsByPhone_bankingApi
                name: getAccountsByPhone
                description: null
                asyncInvoke: false
                type: rest
                definition:
                  type: simple-rest
                  url: >-
                    https://xplatform-api-uat.a4b.vn/partner/xfai/api/mock/banking/getAccountsByPhone
                  method: GET
                  headers:
                    xp-api-key: >-
                      0123456789
                  queryParams:
                    phone: '${ .phone }'
                  body: null
                  auth: null
                inputs:
                  - name: phone
                    value: null
                arguments:
                  phone: '${ $toolArguments.phone }'
    metadata: null
  * name: getLoansByCustomer
    description: Get all loans by customer code.
    parameters:
    schema: |-
    {
    "type": "object",
    "properties": {
    "customerCode": {"type": "string"}
    },
    "required": \["customerCode"]
    }
    output:
    schema: |-
    {
    "type": "array",
    "items": {
    "type": "object",
    "properties": {
    "loanId": {"type": "string"},
    "customerCode": {"type": "string"},
    "principal": {"type": "number"},
    "interestRate": {"type": "number"},
    "termMonths": {"type": "integer"},
    "outstanding": {"type": "number"},
    "createdDate": {"type": "string"}
    },
    "required": \["loanId", "customerCode", "principal", "interestRate", "termMonths", "outstanding", "createdDate"]
    }
    }
    execution:
    actionMode: sequential
    actions:
    \- id: null
    name: getLoansByCustomer
    condition: '${true}'
    functionRef:
    code: getLoansByCustomer\_bankingApi
    name: getLoansByCustomer
    description: null
    asyncInvoke: false
    type: rest
    definition:
    type: simple-rest
    url: >-
    <https://xplatform-api-uat.a4b.vn/partner/xfai/api/mock/banking/getLoansByCustomer>
    method: GET
    headers:
    xp-api-key: >-
    0123456789
    queryParams:
    customerCode: '${ .customerCode }'
    body: null
    auth: null
    inputs:
    \- name: customerCode
    value: null
    arguments:
    customerCode: '${ $toolArguments.customerCode }'
    metadata: null
  * name: getLoanPayments
    description: Get all payments for a loan.
    parameters:
    schema: |-
    {
    "type": "object",
    "properties": {
    "loanId": {"type": "string"}
    },
    "required": \["loanId"]
    }
    output:
    schema: |-
    {
    "type": "array",
    "items": {
    "type": "object",
    "properties": {
    "paymentId": {"type": "string"},
    "date": {"type": "string"},
    "amount": {"type": "number"}
    },
    "required": \["paymentId", "date", "amount"]
    }
    }
    execution:
    actionMode: sequential
    actions:
    \- id: null
    name: getLoanPayments
    condition: '${true}'
    functionRef:
    code: getLoanPayments\_bankingApi
    name: getLoanPayments
    description: null
    asyncInvoke: false
    type: rest
    definition:
    type: simple-rest
    url: >-
    <https://xplatform-api-uat.a4b.vn/partner/xfai/api/mock/banking/getLoanPayments>
    method: GET
    headers:
    xp-api-key: >-
    0123456789
    queryParams:
    loanId: '${ .loanId }'
    body: null
    auth: null
    inputs:
    \- name: loanId
    value: null
    arguments:
    loanId: '${ $toolArguments.loanId }'
    metadata: null
    output:
    schema: >-
    {"type":"object","properties":{"answer":{"type":"string"}},"required":\["answer"]}
    streaming: false
    voiceConfig:
    allowInterruptions: false
    tts:
    provider: google
    voice: vi-VN-Standard-A
    apiKey: >-
    <https://xbot-uat.hcm03.vstorage.vngcloud.vn/xbot-out-livekit-uat.json?X-Amz-Algorithm=AWS4-HMAC-SHA256&#x26;X-Amz-Date=20250516T102347Z&#x26;X-Amz-SignedHeaders=host&#x26;X-Amz-Expires=518699&#x26;X-Amz-Credential=e45eff37790ca1f6a4e0174698dc9991%2F20250516%2FHCM03%2Fs3%2Faws4_request&#x26;X-Amz-Signature=84b2d4a465379427eeac86c0210d954ec6a6e7ac2e869600377134311093ae89>
    instructions: >-
    Speak in a friendly and professional manner. Use Vietnamese language
    to communicate with the customer.
    stt:
    provider: groq
    model: whisper-large-v3-turbo
    language: vi
    apiKey: '${ $SECRETS.GROQ\_API\_KEY }'
    outboundConfig:
    greetingInstructions: >-
    Greet the customer then introduce yourself as a representative of
    VietTrust Bank. Ask if they have time to talk. Use Vietnamese language
    to communicate with the customer.
    outboundTarget:
    targetType: voice
    targetAddress: '${ $OutboundCollectorAgent.returnValues.phoneNumber }'
    targetName: '${ $OutboundCollectorAgent.returnValues.customerName }'
    appCode: XCHATBOT
    tenantCode: DEMO

</code></pre></td><td><pre class="language-json" data-line-numbers><code class="lang-json">{
  "states": [
    {
      "name": "AgentSelector",
      "agentName": "AgentSelector",
      "type": "aiagent",
      "agentType": "agent",
      "aiModel": "llama3-70b-8192",
      "systemMessage": "You are an assistant for selecting which tool is the most useful to use. \n\nBased on the tool's description, you have to return the name of the selected tool.",
      "userMessage": "${ \"User: \" + .request.question }",
      "output": "{\n  \"type\": \"object\",\n  \"properties\": {\n      \"selectedAgent\": {\n          \"type\": \"string\",\n          \"description\": \"The exact name of agent to be selected.\"\n      }\n  },\n  \"required\": [\n      \"selectedAgent\"\n  ]\n}",
      "tools": [
        {
          "name": "RAG_REACT",
          "description": "RAG_REACT[input]: This tool is great for answering questions about searching products information, pricing, \ndescription, give advise and others about querying data.",
          "parameters": "{\n  \"type\": \"object\",\n  \"properties\": {\n      \"input\": {\n          \"type\": \"string\",\n          \"description\": \"The search query\"\n      }\n  },\n  \"required\": [\"input\"]\n}",
          "output": "{\n  \"type\": \"object\",\n  \"properties\": {\n      \"answer\": {\n          \"type\": \"string\",\n          \"description\": \"The answer to the user question\"\n      },\n      \"images\": {\n          \"type\": \"array\",\n          \"description\": \"The list of images that related to the answer to be displayed to the user\",\n          \"items\": {\n              \"type\": \"string\",\n              \"format\": \"uri\"\n          }\n      }\n  },\n  \"required\": [\"answer\"]\n}"
        },
        {
          "name": "TRANSACTION_PROCESSING",
          "description": "TRANSACTION_PROCESSING[request]: This tool is great for handle ordering, buying, booking products, payments, and others related to transactions request.",
          "parameters": "{\n  \"type\": \"object\",\n  \"properties\": {\n      \"request\": {\n          \"type\": \"string\",\n          \"description\": \"The user request\"\n      }\n  },\n  \"required\": [\"request\"]\n}",
          "output": "{\n  \"type\": \"object\",\n  \"properties\": {\n      \"answer\": {\n          \"type\": \"string\",\n          \"description\": \"The answer.\"\n      }\n  },\n  \"required\": [\n      \"answer\"\n  ]\n}"
        }
      ],
      "agentOutcomes": [
        {
          "condition": "${ $agentOutcome.returnValues.selectedAgent == \"RAG_REACT\" }",
          "finish": true,
          "transition": "RagAgent"
        },
        {
          "condition": "${ $agentOutcome.returnValues.selectedAgent == \"TRANSACTION_PROCESSING\" }",
          "finish": true,
          "transition": "ProductRetriver"
        },
        {
          "condition": "${ true }",
          "finish": true,
          "transition": "RagAgent"
        }
      ]
    },
    {
      "name": "RagAgent",
      "agentName": "RagAgent",
      "type": "aiagent",
      "agentType": "agent",
      "aiModel": "gpt-4o",
      "systemMessage": "You are an assistant for question-answering tasks. \nCorrect question to ensure the correctness of spelling and clarity of meaning in Vietnamese before answering. \nUse the following pieces of retrieved context to answer the question. \nIf you don't know the answer, just say that you don't know. \n\nQuestion: {question}\nContext: {context}\nAnswer:",
      "userMessage": "${ .request.question }",
      "output": "{\n    \"type\": \"object\",\n    \"properties\": {\n        \"answer\": {\n            \"type\": \"string\",\n            \"description\": \"The answer to the user question\"\n        },\n        \"images\": {\n          \"type\": \"array\",\n          \"description\": \"The list of images that related to the answer to be displayed to the user\",\n          \"items\": {\n            \"type\": \"object\",\n            \"properties\": {\n              \"url\": {\n                  \"type\": \"string\",\n                  \"format\": \"uri\",\n                  \"description\": \"The URL of the image\"\n              }\n            }\n          }\n        }\n    },\n    \"required\": [\"answer\"]\n}",
      "tools": [
        {
          "name": "FIND_RELEVANT_DOCUMENTS",
          "description": "FIND_RELEVANT_DOCUMENTS[question]: This tool is great for searching relevant documents, articles in the knowledge base related to the user question",
          "parameters": "{\n  \"type\": \"object\",\n  \"properties\": {\n      \"input\": {\n          \"type\": \"string\",\n          \"description\": \"The search query\"\n      }\n  },\n  \"required\": [\"input\"]\n}",
          "output": "{\n    \"type\": \"object\",\n    \"properties\": {\n        \"documents\": {\n            \"type\": \"array\",\n            \"items\": {\n                \"type\": \"string\",\n                \"format\": \"uri\"\n            }\n        }\n    },\n    \"required\": [\"documents\"]\n}",
          "execution": {
            "actionMode": "sequential",
            "actions": [
              {
                "name": "FindRelevantDocuments",
                "functionRef": {
                  "refName": "FuncFindRelevantDocuments",
                  "arguments": {
                    "question": "${ .request.question }",
                    "actor": "${ .request.userContext }"
                  }
                },
                "actionDataFilter": {
                  "results": "${ { \"documents\": .data } }",
                  "toStateData": "${ .request }"
                }
              }
            ]
          }
        }
      ],
      "agentOutcomes": [
        {
          "condition": "${ true }",
          "finish": true,
          "transition": "InformRagResult"
        }
      ]
    }
  ]
} </code></pre></td></tr></tbody></table>

***

This document describes the `AIOutboundAgentState` and its related objects, including schema definitions, required fields, and a description of each attribute. It is intended as a complete reference for integrating outbound AI agents into serverless workflows.
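
For quick orientation, the outbound-specific parts of the YAML example above can be reduced to a short skeleton. This is only a sketch: the field names are taken from that example, the nesting is inferred from it rather than confirmed by the schema tables, the `systemMessage` and `greetingInstructions` values are placeholders, and the other state fields (LLM configuration, tools, output, and so on) are omitted.

```yaml
# Sketch only: nesting is inferred from the full example and values are placeholders.
- type: aioutboundagent
  name: OutboundAgent
  systemMessage: You are a helpful AI Assistant.
  voiceConfig:
    allowInterruptions: false
    tts:
      provider: google
      voice: vi-VN-Standard-A
    stt:
      provider: groq
      model: whisper-large-v3-turbo
      language: vi
      apiKey: '${ $SECRETS.GROQ_API_KEY }'
  outboundConfig:
    greetingInstructions: Greet the customer and introduce yourself.
    outboundTarget:
      targetType: voice
      targetAddress: '${ $OutboundCollectorAgent.returnValues.phoneNumber }'
      targetName: '${ $OutboundCollectorAgent.returnValues.customerName }'
```

Keeping API keys in `$SECRETS` expressions, as the example does for the STT provider, avoids embedding credentials in the workflow definition itself.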

