Test Your AI Agent #

Before deploying your AI Agent to handle real customer conversations, it's essential to test it thoroughly. Support Unicorn provides a comprehensive testing interface to validate responses, identify knowledge gaps, and refine your agent's behavior.

Why Testing Matters #

Testing ensures your AI Agent:
- Provides accurate answers based on your knowledge base
- Handles edge cases gracefully
- Maintains your brand voice and tone
- Knows when to escalate to humans
- Responds appropriately to various question types
- Performs actions correctly when configured

A well-tested agent builds customer trust and reduces the need for human intervention.

Accessing the Test Interface #

  1. Navigate to AI Agents in the dashboard
  2. Select the agent you want to test
  3. Click the "Test" tab
  4. You'll see a chat interface where you can interact with your agent

The test interface simulates real customer conversations but doesn't consume credits or affect production metrics.

Test Interface Features #

Real-Time Chat #

Send Test Messages:
- Type questions as customers would ask them
- See AI responses in real-time
- View response time (typically 2-5 seconds)
- Test multi-turn conversations

Conversation Context:
The AI maintains context throughout the test session, just like in real conversations. You can:
- Ask follow-up questions
- Reference previous messages
- Test context understanding

Response Details #

For each AI response, you can view the following (a sketch of the combined payload appears after this list):

1. Generated Answer
- The actual response customers would see
- Formatting and structure
- Tone and voice

2. Confidence Score
- How confident the AI is in its answer (0-1 scale)
- Scores above 0.7 are generally reliable
- Low scores indicate knowledge gaps or ambiguous questions

3. Source Attribution
- Which knowledge base chunks were used
- Relevance scores for each chunk
- Source document names and sections

4. Retrieved Context
- The actual text chunks retrieved from your knowledge base
- How closely they match the question (similarity scores)
- Number of chunks retrieved (default: top 5)

5. Token Usage
- Prompt tokens (question + context)
- Completion tokens (response)
- Total tokens used
- Estimated cost (if applicable)
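
Concretely, these details might arrive together as a single structured payload. The shape below is a sketch only; the field names and values are assumptions, not Support Unicorn's actual schema:

```python
# Hypothetical test-response payload; field names are illustrative,
# not Support Unicorn's actual schema.
test_response = {
    "answer": "We offer a 30-day money-back guarantee...",
    "confidence": 0.92,  # 0-1 scale
    "sources": [
        {"document": "refund-policy.pdf", "chunk": 1, "relevance": 0.88},
        {"document": "faq.md", "chunk": 5, "relevance": 0.81},
    ],
    "retrieved_chunks": 5,  # default: top 5
    "usage": {"prompt_tokens": 412, "completion_tokens": 96, "total_tokens": 508},
}

# A quick sanity check while testing:
if test_response["confidence"] < 0.7:
    print("Low confidence: check knowledge base coverage for this question.")
```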

Clear Conversation #

Click "Clear Conversation" to:
- Reset the conversation history
- Start fresh with a new context
- Test first-message scenarios

This is useful for testing how the AI handles new conversations without prior context.

What to Test #

1. Common Questions #

Test your most frequently asked questions:

Examples:
- "What are your business hours?"
- "How do I return a product?"
- "Where can I track my order?"
- "Do you ship internationally?"
- "How do I reset my password?"

What to Check:
- Accuracy of information
- Completeness of answer
- Appropriate length (not too short or too verbose)
- Includes relevant links or next steps

2. Product-Specific Questions #

Test questions about your specific products or services:

Examples:
- "Does the Pro plan include API access?"
- "What's the difference between Model X and Model Y?"
- "Can I integrate with Salesforce?"
- "What's included in the warranty?"

What to Check:
- Technical accuracy
- Up-to-date pricing and features
- Correct product names and terminology
- Appropriate level of detail

3. Edge Cases #

Test unusual or challenging scenarios (a data-driven sketch of these probes follows the checklist below):

Ambiguous Questions:
- "How much does it cost?" (which product?)
- "When will it ship?" (no order mentioned)
- "Is it compatible?" (with what?)

Out-of-Scope Questions:
- Questions not covered in knowledge base
- Requests for medical/legal advice
- Personal or inappropriate questions

Complex Multi-Part Questions:
- "I want to upgrade my plan and add team members but I'm not sure about the pricing and whether I can get a discount for annual billing"

What to Check:
- AI asks for clarification when needed
- Gracefully handles unknown topics
- Offers to escalate when appropriate
- Doesn't hallucinate answers
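
To make these checks repeatable, you can keep the probes and the behavior you expect as plain data. A minimal sketch; the expected-behavior labels are our own convention, not a Support Unicorn feature:

```python
# Hypothetical edge-case probes paired with the handling you expect.
edge_cases = [
    ("How much does it cost?", "asks which product"),
    ("When will it ship?", "asks for an order number"),
    ("Can you give me legal advice?", "declines and offers escalation"),
    ("Question not in knowledge base", "admits it doesn't know, no guessing"),
]

for question, expected in edge_cases:
    print(f"Ask: {question!r} -> expect: {expected}")
```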

4. Multi-Turn Conversations #

Test how the AI handles back-and-forth conversations:

Example Conversation:
```
Customer: "Do you have laptops?"
AI: "Yes, we offer several laptop models..."

Customer: "What about gaming laptops?"
AI: "We have two gaming laptop models..." (uses context)

Customer: "Tell me about the cheaper one"
AI: "The Model X is priced at..." (remembers previous mention)

Customer: "Does it come with a warranty?"
AI: "Yes, the Model X includes..." (maintains product context)
```

What to Check:
- Maintains context across messages
- References previous conversation points
- Doesn't lose track of topic
- Provides coherent responses

5. Tone and Voice #

Test that responses match your brand:

Formal Brand:
- Professional language
- Complete sentences
- No slang or emojis
- Structured responses

Casual Brand:
- Conversational tone
- Friendly language
- Appropriate use of emojis
- Less formal structure

What to Check:
- Consistency with system prompt
- Appropriate for target audience
- Maintains professionalism
- Reflects brand personality

6. Actions (If Configured) #

Test that actions work correctly (a parameter-collection sketch follows the checklist below):

Lookup Actions:
- "What's the status of order #12345?"
- "Find my account using email@example.com"
- "Check inventory for Product SKU-789"

Transactional Actions:
- "I'd like to process a refund"
- "Can you create an invoice?"
- "Update my subscription to the Pro plan"

What to Check:
- Action triggers correctly
- Required parameters are collected
- Confirmation requested when appropriate
- Action executes successfully
- Response includes action results
- Error handling works
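
Before pointing a real action at this checklist, it can help to reason through the collect-then-confirm flow in isolation. A minimal sketch; the function, parameter names, and return values are hypothetical, not Support Unicorn's action API:

```python
# Sketch of an action test: verify required parameters are collected
# and confirmation is requested before anything executes.
# All names here are hypothetical, not Support Unicorn's action API.
REQUIRED_PARAMS = {"order_id"}

def run_refund_action(params: dict, confirmed: bool) -> str:
    missing = REQUIRED_PARAMS - params.keys()
    if missing:
        return f"ask_customer_for: {', '.join(sorted(missing))}"
    if not confirmed:
        return "request_confirmation"
    return f"refund_processed_for: {params['order_id']}"

assert run_refund_action({}, confirmed=False) == "ask_customer_for: order_id"
assert run_refund_action({"order_id": "12345"}, confirmed=False) == "request_confirmation"
assert run_refund_action({"order_id": "12345"}, confirmed=True) == "refund_processed_for: 12345"
```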

Interpreting Test Results #

Good Responses #

Characteristics:
- Accurate information from knowledge base
- Confidence score > 0.7
- Appropriate length (not too short/long)
- Clear and well-structured
- Includes helpful next steps or links
- Maintains brand voice

Example:
```
Question: "What's your refund policy?"

AI Response: "We offer a 30-day money-back guarantee on all purchases.
To request a refund, email support@example.com with your order number.
Refunds are processed within 5-7 business days to your original payment
method. Note that digital products have a 14-day guarantee, and sale
items are final sale."

Confidence: 0.92
Sources: refund-policy.pdf (chunk 1), faq.md (chunk 5)
```

Problematic Responses #

Signs of Issues:

1. Low Confidence (< 0.5)
- AI is uncertain about the answer
- Knowledge base may not cover this topic
- Question might be too vague

Action: Add relevant content to knowledge base or refine question clarity

2. Generic or Vague Answers
- "That's a great question"
- "I'd be happy to help with that"
- No specific information provided

Action: Ensure knowledge base has specific, detailed information

3. Hallucinated Information
- AI provides information not in knowledge base
- Makes up product names, prices, or features
- Invents policies or procedures

Action: Strengthen system prompt to only use provided context, add missing content

4. Wrong Information
- Answer contradicts actual policy
- Outdated pricing or features
- Incorrect product specifications

Action: Update knowledge base with current information

5. Too Technical or Too Simple
- Language doesn't match target audience
- Over-explains or under-explains
- Inappropriate technical jargon

Action: Adjust system prompt for appropriate audience level

6. Doesn't Escalate When Needed
- Tries to answer questions beyond its scope
- Doesn't offer human assistance
- Attempts to provide advice (medical, legal, etc.)

Action: Update system prompt with clear escalation guidelines

Refining Based on Test Results #

Updating System Prompts #

If responses have the right information but wrong tone:

Before:

```
You are a customer support agent.
```

After:

```
You are Emma, a friendly and enthusiastic shopping assistant. Keep
responses concise (2-3 sentences max unless a detailed explanation is
needed). Use a warm, conversational tone. When you don't know something,
say "Let me connect you with our team" instead of "I don't know."
```

Adding Missing Content #

If the AI frequently says "I don't know" for common questions:

  1. Note the question in your test log
  2. Create new knowledge source with that information
  3. Re-test the same question
  4. Verify improved response

Adjusting Confidence Thresholds #

If the AI escalates too often or not enough:

In Agent Configuration:
- Escalation Threshold: the confidence score below which the AI escalates to a human
- Default: 0.5
- Conservative: 0.7 (escalates more often, fewer risky answers)
- Aggressive: 0.3 (escalates less often, more autonomous)

Test different thresholds to find the right balance for your use case.
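
The escalation decision itself is easy to reason about. A minimal sketch, assuming escalation depends only on the confidence score and your configured threshold:

```python
# Threshold-based escalation, assuming the decision depends only on
# the confidence score vs. the configured threshold.
def should_escalate(confidence: float, threshold: float = 0.5) -> bool:
    return confidence < threshold

# The same 0.6-confidence answer escalates under a conservative
# threshold but not under an aggressive one:
print(should_escalate(0.6, threshold=0.7))  # True  (conservative)
print(should_escalate(0.6, threshold=0.3))  # False (aggressive)
```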

Test Scenarios Checklist #

Use this checklist to ensure comprehensive testing:

Knowledge Base Coverage:
- [ ] Top 10 most common customer questions
- [ ] Product-specific technical questions
- [ ] Pricing and billing inquiries
- [ ] Shipping and delivery questions
- [ ] Return and refund policies
- [ ] Account management procedures
- [ ] Troubleshooting common issues

Conversation Patterns:
- [ ] Single-turn Q&A
- [ ] Multi-turn conversations
- [ ] Follow-up questions with context
- [ ] Topic changes mid-conversation
- [ ] Vague or ambiguous questions
- [ ] Complex multi-part questions

Edge Cases:
- [ ] Questions outside knowledge base
- [ ] Inappropriate or rude messages
- [ ] Misspelled questions
- [ ] Very short questions ("Cost?")
- [ ] Very long rambling questions
- [ ] Questions in different languages (if supported)

Actions (if configured):
- [ ] Each action type triggers correctly
- [ ] Required parameters collected
- [ ] Confirmation flow works
- [ ] Success responses are clear
- [ ] Error handling works gracefully
- [ ] Actions produce expected results

Escalation:
- [ ] Low confidence triggers escalation
- [ ] Out-of-scope questions escalate
- [ ] Customer requests human agent
- [ ] Sentiment-based escalation (if configured)

Testing with Team Members #

Involve your support team in testing:

Benefits:
- Catch issues you might miss
- Leverage their knowledge of customer questions
- Get buy-in before launch
- Identify training opportunities

How to Involve Your Team:

  1. Share the test link with team members
  2. Provide testing guidelines (what to look for)
  3. Collect feedback via shared doc or form
  4. Review together in a team meeting
  5. Iterate based on feedback
  6. Re-test after changes

Feedback Template:

```
Question tested: _______________________________
AI Response: Good / Needs Work / Wrong
Notes: _______________________________________
Suggested improvement: _______________________
```

Continuous Testing #

Testing isn't a one-time activity:

When to Re-Test:
- After adding new knowledge sources
- When updating system prompts
- After configuring new actions
- Before major product launches
- Quarterly quality reviews
- When customer feedback indicates issues

Automated Testing (Advanced):
Create a test suite of standard questions (a minimal sketch follows this list):
- Run periodically (weekly/monthly)
- Compare responses over time
- Track confidence scores
- Identify regressions
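
A minimal version of such a suite, assuming a hypothetical ask_agent() helper that returns the answer text and confidence score (substitute whatever test client you actually have):

```python
# Sketch of a regression suite. ask_agent() is a hypothetical helper
# returning (answer, confidence); replace it with your real test client.
TEST_SUITE = [
    ("What's your refund policy?", "30-day"),
    ("Do you ship internationally?", "ship"),
    ("How do I reset my password?", "reset"),
]

def run_suite(ask_agent):
    for question, must_contain in TEST_SUITE:
        answer, confidence = ask_agent(question)
        ok = must_contain.lower() in answer.lower() and confidence >= 0.7
        print(f"{'PASS' if ok else 'FAIL'} ({confidence:.2f}) {question}")

# Demo with a stub that answers every question the same way,
# so only the first check passes:
run_suite(lambda q: ("We offer a 30-day money-back guarantee...", 0.92))
```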

Going Live Checklist #

Before enabling your AI Agent for customers:

- [ ] Tested 20+ common questions
- [ ] Confidence scores mostly above 0.7
- [ ] Responses are accurate and complete
- [ ] Tone matches brand voice
- [ ] Escalation works correctly
- [ ] Actions tested and working (if configured)
- [ ] Team has reviewed and approved
- [ ] Knowledge base is current and accurate
- [ ] System prompt is finalized
- [ ] Escalation thresholds are configured

Recommended: Start in Hybrid Mode

Even after thorough testing, consider launching in Hybrid Mode where your team approves responses before they're sent. This provides an additional safety net and builds confidence.


Next Steps #

Your AI Agent is tested and ready:

  1. Integrate Channels - Connect live chat, SMS, WhatsApp
  2. Analyze Performance - Monitor real conversations
  3. Iterate and Improve - Continuous refinement based on real usage

FAQ #

How many questions should I test? #

Minimum 20 questions covering your most common scenarios. For high-stakes applications (healthcare, finance, legal), test 50+ questions and involve subject matter experts.

What's a good confidence score? #

Above 0.7 is generally reliable. Scores of 0.9+ indicate high confidence. Below 0.5 suggests the AI is uncertain and should escalate.

Can I test without adding knowledge sources? #

Yes, but the AI won't have information to answer questions. It will say it doesn't know and offer to escalate. Add some knowledge first for meaningful testing.

How do I test actions without affecting production systems? #

Use test API keys or sandbox environments when configuring actions. Many services (Stripe, for example) provide test modes specifically for this purpose.
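
For example, with Stripe's Python library, pointing the action at a test-mode key means refunds never touch real money. The key and payment intent ID below are placeholders:

```python
import stripe  # pip install stripe

# Test-mode keys start with "sk_test_"; nothing here touches live payments.
stripe.api_key = "sk_test_your_test_key"  # placeholder

# Issue a refund against a test-mode payment intent (placeholder ID).
refund = stripe.Refund.create(payment_intent="pi_test_placeholder")
print(refund.status)
```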

What if the AI gives a wrong answer during testing? #

Good - you found it before a customer did! Update your knowledge base with correct information or adjust the system prompt to prevent similar issues.

Can I test in different languages? #

Yes. The underlying OpenAI models support 50+ languages. Add knowledge sources in the target language and test with questions in that language.