Human-like Conversational AI

Combining Semantic Search with Large Language Models for Enriched Interactions

Embarking on a journey towards human-like Conversational AI represents a significant milestone in the realm of artificial intelligence. Imagine engaging with a virtual assistant not just to retrieve information but to truly converse and connect. Picture this:

User: "What's the best Italian restaurant nearby?"

Response from a traditional Conversational AI bot: "There's a place called Bella Italia on Main Street."

Response from a LLM-enabled Conversational AI bot: "Are you in the mood for classic pasta dishes or perhaps wood-fired pizza? There's a cozy spot called Trattoria Da Vinci known for its authentic cuisine and warm atmosphere, located just a few blocks away. Shall I provide directions?"

This scenario highlights the transformative potential of combining Semantic Search with Large Language Models (LLMs) in Conversational AI. By infusing contextual understanding and natural language generation capabilities, interactions become not just informative but deeply enriching and personalized. Let us take another example, this time from the world of companion robots.

Consider a user conversing with their robotic companion:

User: "Tell me about dinosaurs!"

Response from a Traditional Companion Robot: "Dinosaurs lived millions of years ago. They were big animals that are now extinct."

Response from a LLM-enabled Companion Robot: "Sure thing! Did you know that dinosaurs roamed the Earth millions of years ago, ruling the land, sea, and sky? For instance, the mighty Tyrannosaurus Rex was a fearsome predator, while the gentle Brachiosaurus towered above the treetops, munching on leaves. How about we embark on a virtual journey to explore the Jurassic period together? We can encounter different dinosaur species, learn about their habitats, and even witness epic battles between predators and prey. Ready for an adventure?"

There is a world of difference between the two responses: the latter is not only informative but also immersive. In this blog, we will look at how Miko integrates Semantic Search with Large Language Models to achieve seamless, human-like robotic conversations.

Traditional Process Flow from Query to Response in Conversational AI

As highlighted in our previous blog post (TBD: Add a link), the conversation between a human and a robot begins with the user entering or speaking a query. This query undergoes semantic parsing, which involves preprocessing and extracting embeddings or features that capture the semantic meaning of the query using advanced techniques like BERT.


Next, the system performs intent inference through similarity search or classifier models: the content of the user's query is compared against a database of known intents or patterns to identify the user's underlying purpose or goal. For example, if the user asks about the weather forecast, the system recognizes the intent as seeking weather information.
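The similarity-search flavor of intent inference can be sketched as follows. This is a toy illustration: the bag-of-words embedding and the intent labels (`weather_forecast`, `find_restaurant`) are stand-ins, not Miko's actual models or taxonomy; a production system would use a trained encoder such as BERT.

```python
import math
import re
from collections import Counter

def embed(text):
    """Toy term-frequency embedding; real systems use models like BERT."""
    return Counter(re.findall(r"\w+", text.lower()))

def cosine(a, b):
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# Known intents mapped to example phrasings (hypothetical labels)
INTENTS = {
    "weather_forecast": "what is the weather forecast today",
    "find_restaurant": "find a good restaurant near me",
}

def infer_intent(query):
    """Return the known intent whose example phrasing is most similar to the query."""
    return max(INTENTS, key=lambda name: cosine(embed(query), embed(INTENTS[name])))
```

In practice each intent is represented by many example embeddings (or a classifier head), but the core idea is the same: the query is mapped into the same vector space as the known intents and matched by similarity.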

Finally, the robot formulates a response, drawing from a repository of pre-written responses crafted in advance to cover a range of common queries. These responses may be retrieved from a database or a third-party API. The response is then delivered to the user.

Integrating LLMs into Conversational AI

To achieve human-like and immersive interactions, Large Language Models or LLMs can be employed in the response formulation phase of the above process flow. While formulating the response based on the inferred intent, the system can utilize an LLM to dynamically generate responses tailored to the context of the user query and the identified intent, deviating from the conventional approach of relying solely on pre-written responses. However, to seamlessly integrate LLMs, the conversational AI process flow requires specific enhancements, as outlined below.

Step 1: Create a contextual knowledge base (DB)

The process of training an LLM is time-consuming and data-intensive. LLMs like ChatGPT are trained on data up to a specific cutoff date, and their knowledge extends only until that point. As a result, they may not be aware of developments or information that occurred after the cutoff. Therefore, it is necessary to augment the knowledge of LLMs with relevant, up-to-date contextual information based on the user query.

A second point to be considered is the appropriateness of the LLM response for specific applications. LLMs are trained on extensive knowledge bases, enabling them to provide a wide range of responses. However, in certain applications like a child companion bot, it is crucial not only to ensure faster responses but also to guarantee age-appropriate interactions. Restricting the LLM to operate within a specific context can significantly enhance its performance, both in terms of speed and relevance, by eliminating inappropriate responses and tailoring its output to the intended purpose.

To create a rich, contextual knowledge base, Wikipedia pages and relevant documents are chunked into passages of 200-500 words, rather than sentences or phrases. The chunking strategy depends on the use case, accuracy, latency, and other considerations. These passage chunks are vectorized (similar to sentence embeddings) through an embedding model, and an index of these vectors is created to enable fast search (e.g., ANNOY and FAISS (add link to blog 3)). The original passages and document sections are stored in a database.
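A minimal sketch of the chunking step, assuming a simple word-count strategy that breaks on sentence boundaries (the 200/500-word targets come from the text above; the sentence-splitting regex is an illustrative simplification):

```python
import re

def chunk_passages(text, target_words=200, max_words=500):
    """Split a document into passage chunks of roughly target_words,
    breaking on sentence boundaries and never exceeding max_words."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current, count = [], [], 0
    for sent in sentences:
        words = sent.split()
        # Flush the current chunk if adding this sentence would overshoot
        if current and (count + len(words) > max_words or count >= target_words):
            chunks.append(" ".join(current))
            current, count = [], 0
        current.extend(words)
        count += len(words)
    if current:
        chunks.append(" ".join(current))
    return chunks
```

Each resulting chunk would then be passed through the embedding model and added to the vector index, with the raw passage stored in the database keyed by its vector id.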

Step 2: Retrieve Relevant Context Based on the Query (Retriever)

In this step, the user query is vectorized at inference time using the same embedding model used for indexing, and an ANN (Approximate Nearest Neighbor) (add link to blog 3) search is performed to retrieve the top K relevant vectors from the contextual knowledge base. The mapping of vector indices to actual text content can be kept in a separate database or in the same vector database; the top K vectors are matched to their respective passages, which are then fed to a re-ranker model.

The re-ranker re-orders the top K passages using a combined lexical and semantic re-ranking approach, improving the precision and relevance of the retrieved context.
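The two-stage retrieve-then-re-rank pattern can be sketched as below. The toy term-frequency embedding, the exhaustive similarity scan (standing in for a true ANN index like FAISS or ANNOY), and the equal 0.5/0.5 weighting of lexical and semantic scores are all illustrative assumptions, not Miko's actual configuration.

```python
import math
import re
from collections import Counter

def embed(text):
    """Toy term-frequency embedding; a real retriever uses a trained encoder."""
    return Counter(re.findall(r"\w+", text.lower()))

def cosine(a, b):
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query, passages, k=3):
    """Stage 1: retrieve the top-K passages by embedding similarity
    (a real system would query an ANN index instead of scanning)."""
    q = embed(query)
    return sorted(passages, key=lambda p: cosine(q, embed(p)), reverse=True)[:k]

def rerank(query, candidates):
    """Stage 2: combined lexical + semantic re-ranking of the candidates."""
    q_terms = set(re.findall(r"\w+", query.lower()))
    def score(p):
        lexical = len(q_terms & set(re.findall(r"\w+", p.lower()))) / max(len(q_terms), 1)
        semantic = cosine(embed(query), embed(p))
        return 0.5 * lexical + 0.5 * semantic
    return sorted(candidates, key=score, reverse=True)
```

The re-ranker is worth its extra latency because stage 1 optimizes for recall over a large corpus, while stage 2 can afford a more expensive, precision-oriented scoring of just K candidates.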

Step 3: Engineer and Augment the Prompt and Feed to the LLM (Prompt Engineering)

The prompt, i.e., the user query augmented with the retrieved context, is fed to the LLM. This step also enforces strict rules and guardrails to minimize inappropriate responses and LLM-induced hallucinations.
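Prompt augmentation amounts to assembling the retrieved passages and guardrail instructions around the user query. The template below is purely illustrative, not Miko's actual prompt; the `persona` default is an assumption:

```python
def build_prompt(query, passages, persona="a friendly, child-safe companion"):
    """Augment the user query with retrieved context and guardrail
    instructions before sending it to the LLM (illustrative template)."""
    context = "\n".join(f"- {p}" for p in passages)
    return (
        f"You are {persona}. Answer ONLY using the context below. "
        "If the answer is not in the context, say you don't know. "
        "Keep the response age-appropriate.\n\n"
        f"Context:\n{context}\n\n"
        f"User question: {query}\nAnswer:"
    )
```

Pinning the model to the supplied context ("answer ONLY using...") is the main lever for reducing hallucinations, while the persona and age-appropriateness instructions address the application-specific guardrails discussed in Step 1.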

Step 4: Generate Response using the LLM based on the Augmented Context (Generator)

A large language model enhanced by the additional contextual information is used to generate accurate and contextually relevant responses.

Step 5: Perform Profanity and Safety Checks (Moderation Layer)

The response produced by the LLM undergoes validation and sanitization through a moderation layer before being conveyed to the user. This layer implements rules and checks to uphold child safety standards, including screening for profanity, gender bias, racial bias, and political views. To ensure real-time responses, caching techniques are used.
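A minimal sketch of such a moderation gate, combining a safety check with verdict caching. The blocklist approach and the fallback message are illustrative stand-ins: a production moderation layer uses trained classifiers for profanity, bias, and political content, not a word list.

```python
from functools import lru_cache

# Illustrative blocklist; real systems use trained safety classifiers.
BLOCKED_TERMS = frozenset({"badword", "anotherbadword"})
FALLBACK = "Let's talk about something else! What would you like to explore?"

@lru_cache(maxsize=4096)  # cache verdicts for repeated responses (real-time requirement)
def is_safe(response):
    words = set(response.lower().split())
    return words.isdisjoint(BLOCKED_TERMS)

def moderate(response):
    """Return the response if it passes safety checks, else a safe fallback."""
    return response if is_safe(response) else FALLBACK
```

Caching the verdict rather than re-running the full battery of checks on every repeated response is one way to keep the moderation layer off the latency-critical path.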

Step 6: Periodically Update the Knowledge Base (Data cycle)

To keep contextual information current for accurate and relevant responses, the knowledge base is periodically updated: new sources of information are scanned, passages retrieved and vectorized, and the resulting indices stored for future retrieval.
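The refresh cycle can be sketched as a store that drops stale passages and ingests fresh ones. The class, its method names, and the age-based eviction policy are all assumptions for illustration; a real system would back this with a vector index (e.g., FAISS) and a document database.

```python
import time

class KnowledgeBase:
    """Minimal sketch of an updatable passage store (illustrative only)."""
    def __init__(self):
        self.passages = []  # list of (passage_text, ingested_at) pairs

    def ingest(self, passages):
        now = time.time()
        self.passages.extend((p, now) for p in passages)

    def refresh(self, fetch_latest, max_age_seconds):
        """Drop stale passages, then ingest fresh ones from a source callable."""
        cutoff = time.time() - max_age_seconds
        self.passages = [(p, t) for p, t in self.passages if t >= cutoff]
        self.ingest(fetch_latest())
```

In a real deployment `fetch_latest` would crawl the configured sources, and each ingested passage would be chunked, vectorized, and added to the ANN index as described in Step 1.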

The chart below summarizes the above steps. This pipeline is referred to as Retrieval-Augmented Generation, or RAG.


Enriching Conversations: Human-like Interactions with LLMs in AI

The integration of Large Language Models (LLMs) into conversational AI systems opens up vast possibilities for enhancing interactions between humans and machines. By leveraging LLMs, we can create more dynamic, contextually aware responses that better cater to the needs and preferences of users. However, it is crucial to recognize that LLMs have a knowledge cutoff point, necessitating ongoing efforts to augment their knowledge with relevant and recent information. With careful customization based on the application at hand, such as child companion bots, we can ensure that LLM-powered conversational AI systems provide meaningful, engaging experiences that continue to evolve and improve over time. As we continue to refine and innovate in this field, the potential for LLMs to transform conversational AI into truly human-like interactions is within reach. Look out for upcoming blogs where we'll delve deeper into each of these steps!

