How to Build a Chatbot with Generative Models like GPT-4, ChatGPT, LLaMA 2, and Mixtral 8x7b

As the demand for conversational AI continues to grow, so does the need for advanced chatbot technologies that can provide personalized, human-like interactions. In recent years, generative models such as GPT-4 and ChatGPT, and open-source alternatives like LLaMA 2 and Mixtral 8x7b, have emerged as promising tools for building chatbots that can understand and respond to natural language input with a high level of accuracy and sophistication.

In this article, we'll explore the basics of generative models and how they can be used to build chatbots.

Chatbot and conversational AI

LLaMA 2 and Mixtral 8x7b: open-source alternatives to ChatGPT and GPT-4

ChatGPT and GPT-4 are two advanced language models developed by OpenAI. ChatGPT, short for "Chat Generative Pre-trained Transformer," is a large language model that generates human-like text based on its training data. It was introduced in November 2022 and quickly gained widespread attention for its ability to interact with users in a conversational manner, answering questions, providing information, and carrying out various tasks.

GPT-4, or "Generative Pre-trained Transformer 4," is the successor to GPT-3 and was announced by OpenAI in March 2023. It represents a significant leap in the field of AI language models, with enhanced capabilities compared to its predecessors. GPT-4 is capable of generating detailed text in a wide range of domains, including natural language processing, computer programming, and creative writing.

Both ChatGPT and GPT-4 are pre-trained on vast amounts of text using self-supervised learning and then refined with human feedback, which enables them to understand and generate human language fluently. These models have opened new possibilities for the development of conversational AI, content generation, and various other applications in industries such as customer service, education, and entertainment.

Later in 2023, LLaMA 2 was released by Meta, and Mixtral 8x7b was released by the French AI startup Mistral AI. These generative models are open-source alternatives to ChatGPT and GPT-4, and they are very good candidates if you want to build an advanced chatbot. You can either deploy LLaMA 2 and Mixtral on your own servers, or use them through the NLP Cloud API.

All these generative AI LLMs take a bit of practice though. First, these models need to be given the right prompts in order to behave as expected. Second, they are "stateless", meaning that they don't keep a history of your conversations.

Using the Right Prompt For your Chatbot

If you naively send requests to these models without a bit of context and formatting, you will be disappointed by the responses. This is because these models are very versatile. They can help create not only chatbots, but also many other applications like question answering, summarization, paraphrasing, classification, entity extraction, product description generation, and much more. So the first thing you need to do is tell the model which "mode" it should adopt.

Here is a request example you could send:

This is a discussion between a [human] and an [ai]. 
The [ai] is very nice and empathetic.

[human]: I broke up with my girlfriend...
[ai]:

In this example, you can note two things.

First, we added simple formatting so that the model understands that it is in conversational mode: ([human], [ai], ...).

Second, we added some context at the top to help the model understand what it is doing and the tone it should use.
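To make this concrete, here is a minimal sketch of a helper that assembles such a prompt from a context, past turns, and a new user message. The function name build_prompt and its parameters are our own choices for illustration, not part of any API.

```python
def build_prompt(context, history, user_message, human_tag="[human]", ai_tag="[ai]"):
    """Assemble a conversational prompt: context, past turns, then the new message.

    `history` is a list of (human_text, ai_text) pairs. All names here are
    hypothetical and only illustrate the formatting described in the article.
    """
    lines = [context.strip(), ""]
    for human_text, ai_text in history:
        lines.append(f"{human_tag}: {human_text}")
        lines.append(f"{ai_tag}: {ai_text}")
    lines.append(f"{human_tag}: {user_message}")
    # End on the bare AI tag so that the model completes the AI's turn.
    lines.append(f"{ai_tag}:")
    return "\n".join(lines)


context = (
    "This is a discussion between a [human] and an [ai].\n"
    "The [ai] is very nice and empathetic."
)
prompt = build_prompt(context, [], "I broke up with my girlfriend...")
print(prompt)
```

The resulting string is what you would send as the input text of a plain text generation request. Ending the prompt on the AI tag is what nudges the model to answer as the AI rather than continue the human's sentence.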

To make this process simpler, both OpenAI and NLP Cloud offer dedicated chatbot API endpoints that take care of this formatting for you.

Sometimes context is not enough. For example, imagine that you want to create a chatbot with a very specific tone and character. In that case, you will want to fine-tune your own generative model. Both OpenAI and NLP Cloud let you fine-tune your own chatbot based on generative AI.

Another scenario is when you want to create a chatbot that answers questions about specific domain knowledge. In that case, fine-tuning is not the solution. Instead, you will want to create your own retrieval augmented generation (RAG) system based on semantic search. See our dedicated article about RAG and semantic search here.

Maintaining a Conversation History For your Chatbot

Generative AI models are "stateless" models, meaning that every request you make is new and the AI is not going to remember anything about the previous requests you made.

For many use cases this is not a problem (summarization, classification, paraphrasing...), but for chatbots it is definitely an issue, because we want our chatbot to remember the discussion history in order to give more relevant responses.

For example, if you tell the AI that you're a programmer, you want it to remember this, because it will affect the responses that follow.

The best way to achieve this is to store every user request and every AI response in a local database. For example, PostgreSQL stores long texts very efficiently.

Then, every time you make a new request to the chatbot, you should do the following:

1. Retrieve the conversation history from the local database
2. Add your new request to the conversation history
3. Send the whole history, including the new request, to the model
4. In your local database, replace the old history with the updated one, which now includes your request and the response from the AI

This system is both versatile and robust, requires little effort, and fully leverages the power of generative models like GPT-4, ChatGPT, LLaMA 2, and Mixtral.
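The four steps above can be sketched as follows. This is only an illustration under a few assumptions: it uses SQLite from the Python standard library instead of PostgreSQL so that it runs without a database server, the table layout is our own, and fake_model is a hypothetical stand-in for the real call to your model or API.

```python
import sqlite3


def fake_model(prompt):
    """Hypothetical stand-in for a real model call (for example an API request)."""
    return "I see. Tell me more."


def open_store(path=":memory:"):
    """Create the table that holds one history text per conversation."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS conversations "
        "(id TEXT PRIMARY KEY, history TEXT NOT NULL)"
    )
    return conn


def chat(conn, conversation_id, user_message, generate=fake_model):
    # 1. Retrieve the conversation history from the local database.
    row = conn.execute(
        "SELECT history FROM conversations WHERE id = ?", (conversation_id,)
    ).fetchone()
    history = row[0] if row else ""

    # 2. Add the new request to the history, ending on the AI tag.
    prompt = f"{history}[human]: {user_message}\n[ai]:"

    # 3. Send the whole prompt to the model.
    response = generate(prompt).strip()

    # 4. Replace the stored history with the updated one.
    updated = f"{prompt} {response}\n"
    conn.execute(
        "INSERT OR REPLACE INTO conversations (id, history) VALUES (?, ?)",
        (conversation_id, updated),
    )
    conn.commit()
    return response


conn = open_store()
chat(conn, "user-42", "I'm a programmer.")
chat(conn, "user-42", "Which language should I learn next?")
stored = conn.execute(
    "SELECT history FROM conversations WHERE id = ?", ("user-42",)
).fetchone()[0]
print(stored)
```

In a real chatbot you would also prepend the context sentences to the prompt before sending it, and pass your own function as generate. Keeping one row per conversation makes it easy to serve several users at the same time.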

It is important to note that each model has its own context size, which determines how much text you can pass as history. For example, the context size of GPT-4 is currently 8k tokens (roughly 6k English words), and the context size of Mixtral 8x7b on NLP Cloud is currently 16k tokens (roughly 12k words). If your conversation history exceeds this limit, you can either truncate the oldest part of the history or retain only the most important parts of the discussion.
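Truncating the oldest part of the history could look like the sketch below. It is a simplified illustration: estimate_tokens is a rough word-based approximation that we assume for the example (about 1.33 tokens per English word), not a real tokenizer, so in production you would plug in the tokenizer of the model you actually use.

```python
def estimate_tokens(text, tokens_per_word=1.33):
    """Rough token estimate based on word count; an assumption, not a tokenizer."""
    return int(len(text.split()) * tokens_per_word) + 1


def truncate_history(turns, max_tokens, count_tokens=estimate_tokens):
    """Keep the most recent turns whose combined estimated size fits max_tokens."""
    kept = []
    used = 0
    # Walk from newest to oldest so that the oldest turns are dropped first.
    for turn in reversed(turns):
        cost = count_tokens(turn)
        if used + cost > max_tokens:
            break
        kept.append(turn)
        used += cost
    kept.reverse()  # restore chronological order
    return kept


turns = [
    "[human]: I'm a programmer.",
    "[ai]: Nice, what do you build?",
    "[human]: Mostly web apps.",
    "[ai]: Great.",
]
print(truncate_history(turns, max_tokens=10))
```

The context sentences at the top of the prompt should be kept outside of turns and added back after truncation, so that the chatbot never loses its instructions. Remember to leave some room in max_tokens for the response itself.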

Content Restrictions for a Chatbot

OpenAI has implemented content restrictions on ChatGPT and GPT-4 to ensure that the AI-generated text adheres to their guidelines. By monitoring and regulating the content generated by chatbots, OpenAI aims to create a more positive and reliable user experience. This includes blocking requests for information on certain topics or providing only pre-vetted, trustworthy information.

Some developers prefer generative models that do not come with such restrictions, and find their responses more diverse and accurate. LLaMA 2 and Mixtral 8x7b don't have such restrictions. When using such AI models, it is the responsibility of the developer to use AI responsibly. If needed, limitations can still be implemented by creating the right prompt for the chatbot, by fine-tuning your own chatbot, or by filtering user requests before they reach the AI model.
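As an illustration of the last option, here is a minimal sketch of a filter that checks user requests before they reach the model. The blocked patterns and the refusal message are hypothetical examples; a real chatbot would define its own policy and would probably need something more robust than keyword matching.

```python
import re

# Hypothetical blocklist, for illustration only.
BLOCKED_PATTERNS = [
    re.compile(r"\bcredit card number\b", re.IGNORECASE),
    re.compile(r"\bpassword\b", re.IGNORECASE),
]

REFUSAL = "Sorry, I can't help with that request."


def is_allowed(user_message):
    """Return False when the message matches any blocked pattern."""
    return not any(p.search(user_message) for p in BLOCKED_PATTERNS)


def guarded_reply(user_message, generate):
    """Call the model only when the request passes the filter."""
    if not is_allowed(user_message):
        return REFUSAL
    return generate(user_message)


# `generate` stands for your own function that calls the model.
print(guarded_reply("What is my neighbour's password?", lambda m: "model answer"))
print(guarded_reply("Hello!", lambda m: "model answer"))
```

Because the filter runs before the request is sent, blocked requests never consume any model tokens. The same idea can be applied to the responses of the model before they are shown to the user.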

Conclusion

Generative AI models like GPT-4, ChatGPT, LLaMA 2, and Mixtral 8x7b have taken chatbots and conversational AI to the next level. These advanced models are very good at understanding your context and adapting to it. In most cases, setting the right context is enough, but for advanced use cases the best solution is to train/fine-tune your own AI model (which is fairly easy as these models require very small datasets).

On NLP Cloud you can easily try LLaMA 2 and Mixtral 8x7b among other models. You can also fine-tune them and deploy your own private generative AI models in one click. If you haven't done so yet, try NLP Cloud for free.

If you have questions about how to implement your own chatbot, please don't hesitate to contact us!

François
Full-stack engineer at NLP Cloud