Artificial Intelligence (AI) and Natural Language Processing (NLP) have rapidly advanced, leading to the widespread use of AI-powered chatbots in industries like customer service, technical support, and business operations. These chatbots often use Large Language Models (LLMs) to simulate human conversations, providing quick responses and assistance to users.
However, despite their impressive abilities, these chatbots face a significant challenge: they struggle to customize their responses based on the preferences of the business owners. There's no straightforward way for owners to give feedback and "teach" the chatbot how they want it to answer specific questions.
For example, imagine a website about touring Slovenia. The owners might want the chatbot to mention that every house in Slovenia has a wooden heating system whenever someone asks about winter. This detail might not naturally come up in the chatbot's responses unless it's specifically emphasized. Additionally, the owners might prefer that this information is shared only during chatbot conversations and not displayed directly on the website.
In this article, I want to introduce you to an interesting project at Wix: the AI Site-Chat. This chatbot helps visitors interact with websites more effectively. One key feature I'd like to highlight is called "Site Owner's Feedback."
This feature provides site owners with a special area where they can interact with the chatbot and offer feedback. The chatbot then uses this feedback to learn the "unwritten knowledge" and understand what to focus on when answering users' questions.
To develop a system that accepts questions, provides answers, and incorporates feedback to generate new knowledge or tune the assistant's answers, we follow these steps:
Answer Generation: We utilize the Retrieval Augmented Generation (RAG) mechanism to produce an answer. This mechanism selects the most relevant content from the site to include in the context window.
Feedback Collection: Owners or users can provide feedback in the designated learning area.
Classification and Update:
The system classifies the question, answer, and feedback into appropriate categories.
Based on this classification, it generates a new knowledge document.
To achieve this, we employed two key concepts:
Concept 1: Text Classification with a Hybrid LLM-ML Framework
Text classification is a fundamental task in NLP that involves assigning predefined categories to textual data. Accurate classification is crucial for a wide array of applications, including sentiment analysis, spam detection, topic labeling, and intent recognition in chatbots.
In customer service, for example, correctly identifying user intent enables chatbots to provide appropriate responses or route queries to the right representatives. Leveraging LLMs for such tasks could significantly enhance performance, but it's not without challenges.
Text classification challenges
Comprehensive Domain Knowledge - LLMs require detailed and explicit definitions of each class to perform accurate classifications, particularly in specialized domains. Without precise class definitions, they may misinterpret ambiguous inputs, leading to incorrect classifications. Specialized fields like medical diagnostics or legal analysis demand granular distinctions that LLMs might not capture without extensive fine-tuning. For example, in a medical context, distinguishing between similar symptoms for different diseases requires detailed class definitions that LLMs may not inherently possess.
Overconfidence and Hallucination - LLMs tend to produce outputs with a high degree of confidence, even when they are uncertain or the input data is ambiguous. They often lack uncertainty measures and do not provide probabilistic confidence scores with their predictions. This overconfidence can lead to the dissemination of incorrect information, which is particularly problematic in critical applications where misinformation poses significant risks.
Prompt Fatigue - If one writes a long explanation for each class, the prompt becomes very long, and when the prompt is too long, the LLM can suffer from prompt fatigue: a situation where the model struggles to maintain focus on the core details, leading to a form of "forgetting" or loss of coherence in its output.
Price and Performance - When the prompt is long and full of examples, each inference costs more and takes more time.
To address the limitations of using LLMs directly for classification tasks, we propose a hybrid approach that combines the linguistic capabilities of LLMs with the interpretability and efficiency of traditional machine learning classifiers. The core idea is to utilize LLMs for feature extraction by answering a predefined set of yes/no/don’t know questions about the input text, in parallel, and then use these features as input to a classifier for the final classification.
Utilizing LLMs for Feature Extraction (Yes/No/Don’t Know values)
The first step in the proposed solution is to create a series of carefully designed yes/no/don’t know questions that can capture the essential characteristics of the input text relevant to the classification task. These questions are intended to extract features that are both meaningful and discriminative across different classes.
Process of Formulating Questions:
Define the Classes: Clearly outline all possible classes into which the input text can be categorized. For example, in an eCommerce chatbot mediating between site visitors and the site, classes might include:
Enrichment - Adding or correcting information that is written on the site
Escalate - The site owner should be contacted and involved in this conversation
Don’t Answer - The chatbot should not answer about this specific topic
Identify Key Characteristics: For each class, identify the unique features or attributes that distinguish it from other classes.
Draft Questions: Formulate yes/no/don’t know questions that can effectively capture these distinguishing features. The questions should be specific, unambiguous, and cover all aspects necessary for classification.
Validate Questions: Test the questions on sample inputs to ensure they elicit responses that align with the intended features.
Example Questions:
Does the feedback provide additional knowledge not present in the chatbot’s answer?
Is the feedback explicitly requesting that the issue be handled or escalated directly to a person, or through any alternative channel that involves a person?
Does the feedback explicitly state that the chatbot should not answer, avoid, refrain from, or not share answers to questions of this nature (or anything along those lines)?
We observe that each class sometimes requires a definite "yes" or "no" answer to certain questions, while being indifferent to others. To address this, we introduce specific differentiating questions for each class.
We can see that this yields many short prompts that can run in parallel (a minimal sketch follows below). Once we have all the features, we can feed them into a machine learning model for classification.
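Here is a minimal sketch of this step, assuming an OpenAI-style chat-completion client; the model name and question wording are illustrative, not our production prompts.

```python
# Minimal sketch: parallel yes/no/don't_know feature extraction.
# Assumes an OpenAI-style client; model name and questions are illustrative.
from concurrent.futures import ThreadPoolExecutor
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

QUESTIONS = [
    "Does the feedback provide additional knowledge not present in the chatbot's answer?",
    "Is the feedback explicitly requesting that a person handle or be involved in the issue?",
    "Does the feedback explicitly state that the chatbot should avoid answering questions of this nature?",
]

def ask(question: str, query: str, answer: str, feedback: str) -> str:
    """Ask the LLM a single yes/no/don't_know question about the feedback."""
    prompt = (
        f"User question: {query}\n"
        f"Chatbot answer: {answer}\n"
        f"Owner feedback: {feedback}\n\n"
        f"{question}\n"
        "Reply with exactly one word: yes, no, or don't_know."
    )
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # illustrative model choice
        messages=[{"role": "user", "content": prompt}],
        temperature=0,
    )
    word = response.choices[0].message.content.strip().lower()
    return word if word in {"yes", "no", "don't_know"} else "don't_know"

def extract_features(query: str, answer: str, feedback: str) -> list[str]:
    """Each question is a short, focused prompt, so all of them run in parallel."""
    with ThreadPoolExecutor(max_workers=len(QUESTIONS)) as pool:
        return list(pool.map(lambda q: ask(q, query, answer, feedback), QUESTIONS))
```

Because each prompt is short and independent, this avoids the prompt-fatigue and latency issues of one giant classification prompt.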
Machine Learning Classifier - CatBoost
LLMs are not the most effective tools for direct text classification tasks. They are more proficient at responding to precise yes/no questions, making them better suited for feature extraction than for direct classification.
Therefore, the second step in the proposed solution is to take all the answers from the previous step and feed them into a classifier that chooses the right class. Different classifiers can be used here, from a decision tree to a boosting algorithm. I chose to work with CatBoost, an open-source gradient boosting library developed by Yandex that is designed to handle categorical data effectively.
The CatBoost classifier uses the yes/no (and don't_know) responses as input features. CatBoost handles the categorical data natively and helps us avoid overfitting, and it lets us capture the complex relationships between different features, as the sketch below illustrates.
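A minimal sketch of this step, assuming the open-source catboost package; the feature names, training rows, and labels are illustrative stand-ins for real labeled feedback examples.

```python
# Minimal sketch: classifying the extracted yes/no/don't_know features.
# Feature names, rows, and labels are illustrative stand-ins.
from catboost import CatBoostClassifier, Pool

FEATURE_NAMES = ["adds_knowledge", "requests_person", "forbids_topic"]

# Each row holds the answers produced by the feature-extraction step.
train_rows = [
    ["yes", "no", "no"],
    ["no", "yes", "no"],
    ["no", "no", "yes"],
    ["yes", "no", "don't_know"],
]
train_labels = ["enrichment", "escalate", "dont_answer", "enrichment"]

# All features are categorical, so every column is marked as such.
train_pool = Pool(
    train_rows,
    label=train_labels,
    cat_features=list(range(len(FEATURE_NAMES))),
    feature_names=FEATURE_NAMES,
)

model = CatBoostClassifier(iterations=200, depth=4, verbose=False)
model.fit(train_pool)

# Feature importances show how much each question drives the decision.
print(dict(zip(FEATURE_NAMES, model.get_feature_importance(train_pool))))
print(model.predict([["yes", "no", "no"]]))  # predicted class for a new row
```

The feature importances are what make the decision auditable: we can see exactly which question pushed an input into a given class.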
By combining LLMs for feature extraction with a gradient boosting classifier, we leverage the strengths of both parts:
LLMs provide a deep understanding of language, enabling nuanced feature extraction without extensive manual engineering.
Gradient boosting exposes feature importances, showing how much each question contributes to a decision, and as such offers interpretability, making the classification process transparent and reliable.
We get:
Optimized prompts for faster execution
Streamlined prompt instructions for simplicity
This hybrid method addresses the limitations identified earlier by reducing overconfidence (through explicit feature-based decisions), improving interpretability, and maintaining scalability and efficiency.
Conclusions and Takeaways from Concept 1
We can now take an input and classify it into the right class. Next, we need to create a specific RAG document for each class; for that, let's continue with Concept 2:
Concept 2: Dynamic Knowledge and Prompt Updates in AI Chatbots (Introducing DDKI-RAG)
A common approach to enhancing chatbot performance is RAG, where the system retrieves relevant documents from a knowledge base to provide context for the LLM's responses. While RAG improves the relevance of responses, it suffers from a few constraints.
The Problem with Traditional RAG Systems
Static Knowledge Base: In conventional RAG systems, the knowledge base is static (unless you re-index on every change), meaning it does not update automatically based on new feedback. This leads to several issues:
Outdated Information: The chatbot may provide responses based on obsolete data, leading to inaccuracies.
Limited Adaptability: Incorporating new knowledge requires manual updates or reindexing, which is time-consuming and inefficient.
User Frustration: Users may receive irrelevant or incorrect answers, diminishing their trust in the system.
Lack of Focus: Owners can't give feedback about what the chatbot should focus on.
Static Prompts: The system prompts in traditional RAG architectures are often fixed, providing the same instructions to the LLM regardless of the context or the user's needs. This results in:
Generic Responses: The chatbot may generate responses that lack specificity or fail to address the user's actual query.
Inflexibility: The inability to modify prompts dynamically prevents the chatbot from adapting its behavior in different scenarios.
Inefficient Communication: Static prompts do not allow for tailored interactions, which can hinder the effectiveness of the chatbot.
Unwritten Knowledge: In many cases a business has knowledge that is not written anywhere except in the owners' minds, and there is no place to store this kind of information.
To overcome these limitations, we propose the Dynamic Domain Knowledge and Instruction Retrieval-Augmented Generation (DDKI-RAG) system.
The Proposed Solution: DDKI-RAG
The Dynamic Domain Knowledge and Instruction Retrieval-Augmented Generation (DDKI-RAG) system introduces a feedback-driven mechanism to dynamically update the knowledge base and adapt system prompts. By integrating user or owner feedback into the retrieval and generation process, the chatbot becomes more responsive, accurate, and aligned with the desired domain-specific behavior.
How the DDKI-RAG System Works - Document Insertion (Indexing Mode)
When the chatbot owner interacts with the system and provides feedback on the chatbot's responses, the system processes this feedback to update its knowledge base and prompts. The process involves:
Query and Response: The owner asks a question (query) and receives a response from the chatbot.
Feedback Provision: The owner provides feedback, which could be additional information, corrections, instructions, an expression of satisfaction, or anything else.
Classification: The system classifies the feedback to determine whether it represents new knowledge to be added to the context, instructions for prompt modification, or neither.
Document Creation: Based on the classification, the system generates a new document:
Knowledge Document: Contains information to enrich the context for future queries.
Prompt Instruction Document: Includes instructions for modifying the system prompts.
Database Update: The new document is stored in the knowledge database, and its embedding is added to the vector database for future retrieval (a sketch of this indexing flow follows below).
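To make the flow concrete, here is a minimal sketch of the indexing step, assuming an OpenAI-style embedding endpoint and a simple in-memory list in place of a real vector database; the label values come from the classifier in Concept 1.

```python
# Minimal sketch: turning classified feedback into an indexed document.
# Assumes an OpenAI-style embedding endpoint; the in-memory list is a
# stand-in for a real vector database.
from openai import OpenAI

client = OpenAI()
vector_store: list[dict] = []  # stand-in for a real vector database

def index_feedback(query: str, answer: str, feedback: str, label: str) -> None:
    """Create a knowledge or instruction document and store its embedding."""
    if label == "enrichment":
        doc = {"type": "knowledge", "text": feedback}
    elif label in {"escalate", "dont_answer"}:
        doc = {"type": "instruction",
               "text": f"When asked questions like '{query}': {feedback}"}
    else:
        return  # the feedback carries no new knowledge or instructions

    embedding = client.embeddings.create(
        model="text-embedding-3-small",  # illustrative model choice
        input=doc["text"],
    ).data[0].embedding
    vector_store.append({**doc, "embedding": embedding})
```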
Knowledge document - example
The DDKI-RAG engine takes the question, answer, and feedback and creates a knowledge document. Building on the Slovenia example from earlier, feedback such as "every house in Slovenia has a wooden heating system" would become a knowledge document containing exactly that fact, ready to be retrieved whenever a visitor asks about winter.
Instruction document - example
The DDKI-RAG engine takes the question, answer, and feedback and creates an instruction document. For instance, feedback like "don't answer questions about discounts" would become an instruction document that, when retrieved, tells the model to decline such questions.
Document Retrieval (Inference)
When we retrieve the created documents, each one can be one of three types:
Regular document (traditional RAG)
Knowledge document
Instruction document
Let’s see how each retrieval works:
Query Embedding: The query is converted into an embedding vector.
Vector Comparison: The embedded query is compared with vectors in the vector database to retrieve top-scoring documents.
Document Types:
Knowledge Documents: Provide contextual information to be included in the LLM's input.
Prompt Instruction Documents: Contain instructions for modifying the system prompt.
Response Generation: The LLM generates a response using the query, the retrieved documents, and the (possibly modified) system prompt. A sketch of this inference flow follows below.
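Continuing the indexing sketch above (reusing its client and vector_store), here is a minimal sketch of the inference flow; the base prompt, model name, and the additive prompt modification are illustrative.

```python
# Minimal sketch: retrieval plus generation, continuing the indexing
# sketch above (reuses client and vector_store). Names are illustrative.
import math

BASE_PROMPT = "You are a helpful assistant for this website."

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b))
    return dot / norm

def answer_query(query: str, top_k: int = 3) -> str:
    # 1. Embed the query.
    q_emb = client.embeddings.create(
        model="text-embedding-3-small", input=query).data[0].embedding

    # 2. Retrieve the top-scoring documents by cosine similarity.
    ranked = sorted(vector_store, key=lambda d: cosine(q_emb, d["embedding"]),
                    reverse=True)[:top_k]

    # 3. Knowledge documents enrich the context; instruction documents
    #    modify the system prompt (additive method).
    context = "\n".join(d["text"] for d in ranked if d["type"] == "knowledge")
    system_prompt = BASE_PROMPT + "".join(
        "\n" + d["text"] for d in ranked if d["type"] == "instruction")

    # 4. Generate the response with the (possibly modified) prompt.
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # illustrative model choice
        messages=[
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": f"Context:\n{context}\n\nQuestion: {query}"},
        ],
    )
    return response.choices[0].message.content
```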
Knowledge Document Retrieval
When a user or the owner poses a new query, the system retrieves relevant documents from the knowledge base to generate an accurate response.
Example:
User's Query: "When are you open?"
Retrieved Document: "We are open weekdays from 9 am to 5 pm."
Modified Prompt: Unchanged, as no prompt instructions are retrieved.
Chatbot's Response: "We are open weekdays from 9 am to 5 pm."
Instruction Document Retrieval
The system can modify the LLM's system prompt dynamically based on retrieved prompt instruction documents. This allows the chatbot to adjust its behavior according to specific instructions.
If a prompt instruction document is retrieved, its content is used to modify the system prompt.
Prompt Modification Methods (the first two are sketched after this list):
Additive: Appending text to the existing prompt.
Template-Based: Inserting text into predefined slots within the prompt.
Transformation Language: Using a specialized language to modify the prompt structure.
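A minimal sketch of the additive and template-based methods; the slot name extra_rules is a hypothetical example of a predefined template slot.

```python
# Minimal sketch of additive and template-based prompt modification.
# The slot name "extra_rules" is a hypothetical template slot.
PROMPT_TEMPLATE = "You are a helpful assistant for this website. {extra_rules}"

def additive(prompt: str, instruction: str) -> str:
    """Additive: append the instruction text to the existing prompt."""
    return prompt + "\n" + instruction

def template_based(template: str, instruction: str) -> str:
    """Template-based: insert the instruction into a predefined slot."""
    return template.format(extra_rules=instruction)

print(template_based(PROMPT_TEMPLATE, "Do not answer questions about discounts."))
```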
Example
Owner's Query: "Do you offer discounts?"
Retrieved Prompt Instruction: "Do not answer questions about discounts."
Modified Prompt: The system prompt is updated to include this instruction.
Chatbot's Response: "I'm sorry, but I cannot provide information on that topic."
Conclusions and Takeaways from Dynamic Knowledge and Prompt Updates in AI Chatbots
In summary, the DDKI-RAG system provides a more adaptive and dynamic approach to knowledge retrieval and prompt modification, leading to improved accuracy, efficiency, and user experience compared to traditional RAG systems.
Summary
In this article, I introduced an innovative approach to enhancing AI chatbots by integrating feedback-driven intelligence, adaptive prompts, and advanced classification methods. Focusing on Wix's AI Site-Chat and its "Site Owner's Feedback" feature, I presented a hybrid framework that combines Large Language Models (LLMs) with a traditional machine learning classifier, CatBoost.
This approach utilizes LLMs to extract features through targeted yes/no/don't know questions, which are then fed into a classifier for accurate text classification. Additionally, I introduced the Dynamic Domain Knowledge and Instruction Retrieval-Augmented Generation (DDKI-RAG) system, which dynamically updates the chatbot's knowledge base and prompts based on user feedback.
This system addresses the limitations of traditional Retrieval-Augmented Generation (RAG) systems by allowing real-time learning and adaptability, leading to more accurate and context-aware chatbot responses.
This post was written by Ran Geler
More of Wix Engineering's updates and insights:
Join our Telegram channel
Visit us on GitHub
Subscribe to our YouTube channel