Effectively using GPT-J and GPT-Neo, the GPT-3 open-source alternatives, with few-shot learning

GPT-J and GPT-Neo, the open-source alternatives to GPT-3, are among the best NLP models as of this writing. But using them effectively can take practice. Few-shot learning is an NLP technique that works very well with these models.

GPT-J and GPT-Neo

GPT-Neo and GPT-J are both open-source NLP models, created by EleutherAI (a collective of researchers working to open source AI).

GPT-J has 6 billion parameters, which makes it the most advanced open-source NLP model as of this writing. It is a direct alternative to OpenAI's proprietary GPT-3 Curie.

These models are very versatile. They can be used for almost any NLP use case: text generation, sentiment analysis, classification, machine translation, and much more (see below). However, using them effectively sometimes takes practice, and their response time (latency) might be longer than that of more standard NLP models.

GPT-J and GPT-Neo are both available on the NLP Cloud API. Below, we're showing examples obtained with the GPT-J endpoint of NLP Cloud on GPU, using the Python client. If you want to copy and paste the examples, don't forget to add your own API token. To install the Python client, first run pip install nlpcloud.

Few-Shot Learning

Few-shot learning is about helping a machine learning model make predictions thanks to only a couple of examples. No need to train a new model here: models like GPT-J and GPT-Neo are so big that they can easily adapt to many contexts without being re-trained.

Giving the model only a few examples can dramatically increase its accuracy.

In NLP, the idea is to pass these examples along with your text input. See the examples below!

Also note that, if few-shot learning is not enough, you can fine-tune GPT-J on NLP Cloud so the model is perfectly tailored to your use case.

You can easily test few-shot learning on the NLP Cloud playground.

Sentiment Analysis with GPT-J

import nlpcloud
client = nlpcloud.Client("gpt-j", "your_token", gpu=True)
generation = client.generation("""Message: Support has been terrible for 2 weeks...
            Sentiment: Negative
            ###
            Message: I love your API, it is simple and so fast!
            Sentiment: Positive
            ###
            Message: GPT-J has been released 2 months ago.
            Sentiment: Neutral
            ###
            Message: The reactivity of your team has been amazing, thanks!
            Sentiment:""",
            length_no_input=True,
            end_sequence="\n###",
            remove_input=True)
print(generation["generated_text"])

Output:

Positive

As you can see, giving three properly formatted examples first leads GPT-J to understand that we want to perform sentiment analysis, and its result is correct.

### is an arbitrary delimiter that helps GPT-J tell the different sections apart. We could just as well use something else, like --- or simply a new line. We then set end_sequence="\n###", an NLP Cloud parameter that tells GPT-J to stop generating content after a new line followed by ###.
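If you reuse this pattern a lot, you can assemble the few-shot prompt programmatically. Here is a minimal sketch in plain Python (no API call involved; the labels and delimiter are the ones from the example above):

```python
def build_few_shot_prompt(examples, new_input,
                          input_label="Message", output_label="Sentiment",
                          delimiter="###"):
    """Join labeled examples with a delimiter, then append the new input
    with an open output label for the model to complete."""
    blocks = [f"{input_label}: {text}\n{output_label}: {label}"
              for text, label in examples]
    blocks.append(f"{input_label}: {new_input}\n{output_label}:")
    return f"\n{delimiter}\n".join(blocks)

examples = [
    ("Support has been terrible for 2 weeks...", "Negative"),
    ("I love your API, it is simple and so fast!", "Positive"),
]
prompt = build_few_shot_prompt(
    examples, "The reactivity of your team has been amazing, thanks!")
```

The resulting string can be passed directly as the input of the generation call, with end_sequence="\n###" as above.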

HTML code generation with GPT-J

import nlpcloud
client = nlpcloud.Client("gpt-j", "your_token", gpu=True)
generation = client.generation("""description: a red button that says stop
    code: <button style=color:white; background-color:red;>Stop</button>
    ###
    description: a blue box that contains yellow circles with red borders
    code: <div style=background-color: blue; padding: 20px;><div style=background-color: yellow; border: 5px solid red; border-radius: 50%; padding: 20px; width: 100px; height: 100px;>
    ###
    description: a Headline saying Welcome to AI
    code:""",
    max_length=500,
    length_no_input=True,
    end_sequence="\n###",
    remove_input=True)
print(generation["generated_text"])

Output:

<h1 style=color: white;>Welcome to AI</h1>

Code generation with GPT-J really is amazing, partly because GPT-J has been trained on huge code bases.

SQL code generation with GPT-J

Test on the playground

import nlpcloud
client = nlpcloud.Client("gpt-j", "your_token", gpu=True)
generation = client.generation("""Question: Fetch the companies that have less than five people in it.
            Answer: SELECT COMPANY, COUNT(EMPLOYEE_ID) FROM Employee GROUP BY COMPANY HAVING COUNT(EMPLOYEE_ID) < 5;
            ###
            Question: Show all companies along with the number of employees in each department
            Answer: SELECT COMPANY, COUNT(COMPANY) FROM Employee GROUP BY COMPANY;
            ###
            Question: Show the last record of the Employee table
            Answer: SELECT * FROM Employee ORDER BY LAST_NAME DESC LIMIT 1;
            ###
            Question: Fetch the three max employees from the Employee table;
            Answer:""",
    max_length=100,
    length_no_input=True,
    end_sequence="\n###",
    remove_input=True)
print(generation["generated_text"])

Output:

SELECT * FROM Employee ORDER BY ID DESC LIMIT 3;

Automatic SQL generation works very well with GPT-J, especially due to the declarative nature of SQL, and the fact that SQL is quite a limited language with relatively few possibilities (compared to most programming languages).
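Generated SQL should still not be trusted blindly. One simple way to sanity-check a generated query is to run it against a throwaway in-memory SQLite database first. A sketch (the Employee schema below is just an assumption matching the examples):

```python
import sqlite3

def check_sql(query):
    """Run a generated query against a throwaway in-memory SQLite
    database; return (True, rows) on success or (False, error)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Employee "
                 "(EMPLOYEE_ID INTEGER, LAST_NAME TEXT, COMPANY TEXT)")
    conn.executemany("INSERT INTO Employee VALUES (?, ?, ?)",
                     [(1, "Doe", "Acme"), (2, "Roe", "Acme"), (3, "Poe", "Globex")])
    try:
        return True, conn.execute(query).fetchall()
    except sqlite3.Error as exc:
        return False, str(exc)
    finally:
        conn.close()

ok, rows = check_sql("SELECT * FROM Employee ORDER BY EMPLOYEE_ID DESC LIMIT 3;")
```

This catches syntax errors and references to non-existent columns before the query reaches a real database.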

Entity Extraction (NER) with GPT-J

import nlpcloud
client = nlpcloud.Client("gpt-j", "your_token", gpu=True)
generation = client.generation("""My name is Julien and I work for NLP Cloud as a Chief Technical Officer.
            Position: Chief Technical Officer
            Company: NLP Cloud
            ###
            Hi, I am a marketing assistant at Microsoft.
            Position: marketing assistant
            Company: Microsoft
            ###
            John was the CEO of AquaFun until 2020.
            Position: CEO
            Company: AquaFun
            ###
            I have been a Go developer for Google for 3 years, but now I think about quitting.""",
    length_no_input=True,
    end_sequence="\n###",
    remove_input=True)
print(generation["generated_text"])

Output:

Position: Go developer
Company: Google

It is really impressive how GPT-J performs entity extraction without any re-training. Usually, extracting new types of entities (like name, position, country, etc.) requires a whole new process of annotation, training, and deployment... Here, it's completely seamless.
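The structured output also makes post-processing easy. A small sketch that parses the Key: value lines GPT-J returns into a dictionary (real outputs may need extra cleanup):

```python
def parse_entities(generated_text):
    """Turn 'Position: Go developer\nCompany: Google' style output
    into a dictionary, skipping malformed lines."""
    entities = {}
    for line in generated_text.splitlines():
        if ":" not in line:
            continue
        key, _, value = line.partition(":")
        entities[key.strip()] = value.strip()
    return entities

parsed = parse_entities("Position: Go developer\nCompany: Google")
```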

Question Answering With GPT-J

import nlpcloud
client = nlpcloud.Client("gpt-j", "your_token", gpu=True)
generation = client.generation("""Context: NLP Cloud was founded in 2021 when the team realized there was no easy way to reliably leverage NLP in production.
            Question: When was NLP Cloud founded?
            Answer: 2021
            ###
            Context: NLP Cloud developed their API by mid-2020 and they added many pre-trained open-source models since then.
            Question: What did NLP Cloud develop?
            Answer: API
            ###
            Context: The main challenge with GPT-J is memory consumption. Using a GPU plan is recommended.
            Question: Which plan is recommended for GPT-J?
            Answer:""",
    length_no_input=True,
    end_sequence="\n###",
    remove_input=True)
print(generation["generated_text"])

Output:

GPU plan

Question answering works very well. It can also be achieved with other dedicated NLP models, but maybe not with the same level of accuracy.

Tutorial Creation With GPT-J

import nlpcloud
client = nlpcloud.Client("gpt-j", "your_token", gpu=True)
generation = client.generation("""Here is a tutorial about how to make a cake.
        
            1. Take some flour.
            2. Take some sugar.""",
    max_length=500)
print(generation["generated_text"])

Output:

Here a tutorial how to make a cake.
        
                    1. Take some flour.
                    2. Take some sugar.
                    3. Take some butter.
                    4. Take some eggs.
                    5. Take some water.
                    6. Take some baking powder.
                    7. Take some vanilla.
                    8. Mix all together.
                    9. Bake in a pan.
                    10. Enjoy.
                    
Well, that's it. You can make this for your birthday or a party or you can even make it for your kids. They will love this.

As you can see, it's quite impressive how GPT-J automatically follows your initial formatting, and the generated content is very good too. You might even bake a proper cake out of this (we haven't tried yet, though).

Grammar and Spelling Correction with GPT-J

Test on the playground

import nlpcloud
client = nlpcloud.Client("gpt-j", "your_token", gpu=True)
generation = client.generation("""I love goin to the beach.
            Correction: I love going to the beach.
            ###
            Let me hav it!
            Correction: Let me have it!
            ###
            It have too many drawbacks.
            Correction: It has too many drawbacks.
            ###
            I do not wan to go
            Correction:""",
    length_no_input=True,
    end_sequence="\n###",
    remove_input=True)
print(generation["generated_text"])

Output:

I do not want to go.

Spelling and grammar correction works as expected. If you need the exact location of the mistake in the sentence, though, you might want to use a dedicated model.
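As a rough substitute for such a dedicated model, you can diff the original sentence against the corrected one with Python's standard library to locate the changed spans:

```python
import difflib

def locate_corrections(original, corrected):
    """Return (original_fragment, corrected_fragment) pairs for each
    word span that the correction changed."""
    orig_words, corr_words = original.split(), corrected.split()
    matcher = difflib.SequenceMatcher(None, orig_words, corr_words)
    return [(" ".join(orig_words[i1:i2]), " ".join(corr_words[j1:j2]))
            for tag, i1, i2, j1, j2 in matcher.get_opcodes()
            if tag != "equal"]

changes = locate_corrections("I do not wan to go", "I do not want to go.")
```

This works at word level; character-level diffs are possible with the same API if you need finer granularity.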

Machine Translation with GPT-J

import nlpcloud
client = nlpcloud.Client("gpt-j", "your_token", gpu=True)
generation = client.generation("""Hugging Face a révolutionné le NLP.
            Translation: Hugging Face revolutionized NLP.
            ###
            Cela est incroyable!
            Translation: This is unbelievable!
            ###
            Désolé je ne peux pas.
            Translation: Sorry but I cannot.
            ###
            NLP Cloud permet de deployer le NLP en production facilement.
            Translation:""",
    length_no_input=True,
    end_sequence="\n###",
    remove_input=True)
print(generation["generated_text"])

Output:

NLP Cloud makes it easy to deploy NLP to production.

Machine translation usually requires dedicated models (often one per language pair). Here, all languages are handled out of the box by GPT-J, which is quite impressive.

Tweet Generation with GPT-J

import nlpcloud
client = nlpcloud.Client("gpt-j", "your_token", gpu=True)
generation = client.generation("""keyword: markets
            tweet: Take feedback from nature and markets, not from people
            ###
            keyword: children
            tweet: Maybe we die so we can come back as children.
            ###
            keyword: startups
            tweet: Startups should not worry about how to put out fires, they should worry about how to start them.
            ###
            keyword: NLP
            tweet:""",
    max_length=200,
    length_no_input=True,
    end_sequence="\n###",
    remove_input=True)
print(generation["generated_text"])

Output:

People want a way to get the benefits of NLP without paying for it.

This is a fun and easy way to generate short tweets that follow the style of your examples.

Chatbot and Conversational AI with GPT-J

Test on the playground

import nlpcloud
client = nlpcloud.Client("gpt-j", "your_token", gpu=True)
generation = client.generation("""This is a discussion between a [human] and a [robot]. 
The [robot] is very nice and empathetic.

[human]: Hello nice to meet you.
[robot]: Nice to meet you too.
###
[human]: How is it going today?
[robot]: Not so bad, thank you! How about you?
###
[human]: I am ok, but I am a bit sad...
[robot]: Oh? Why that?
###
[human]: I broke up with my girlfriend...
[robot]: """,
    length_no_input=True,
    end_sequence="\n###",
    remove_input=True)
print(generation["generated_text"])

Output:

Oh? How did that happen?

As you can see, GPT-J properly understands that you are in a conversational mode. And the very powerful thing is that, if you change the tone in your context, the responses from the model will follow the same tone (sarcasm, anger, curiosity...).
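In practice, you keep the conversation going by appending each new exchange to the prompt before the next call. A minimal history helper (a sketch, no API call; the header mirrors the example above):

```python
HEADER = ("This is a discussion between a [human] and a [robot].\n"
          "The [robot] is very nice and empathetic.\n\n")

def build_chat_prompt(history, user_message):
    """history is a list of (human, robot) turns; the prompt ends with
    an open [robot]: line for the model to complete."""
    turns = [f"[human]: {h}\n[robot]: {r}" for h, r in history]
    turns.append(f"[human]: {user_message}\n[robot]:")
    return HEADER + "\n###\n".join(turns)

history = [("Hello nice to meet you.", "Nice to meet you too.")]
prompt = build_chat_prompt(history, "How is it going today?")
```

After each response, append the (user_message, model_response) pair to history so the model keeps the full context in memory.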

Intent Classification with GPT-J

Test on the playground

import nlpcloud
client = nlpcloud.Client("gpt-j", "your_token", gpu=True)
generation = client.generation("""I want to start coding tomorrow because it seems to be so fun!
            Intent: start coding
            ###
            Show me the last pictures you have please.
            Intent: show pictures
            ###
            Search all these files as fast as possible.
            Intent: search files
            ###
            Can you please teach me Chinese next week?
            Intent:""",
    length_no_input=True,
    end_sequence="\n###",
    remove_input=True)
print(generation["generated_text"])

Output:

learn chinese

It is quite impressive how GPT-J detects the intent behind your sentence, and it works very well even for more complex sentences. You can also ask it to format the intent differently if you want. For example, you could automatically generate a JavaScript function name like "learnChinese".
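Turning such an intent into a JavaScript-style function name is then a simple string transformation. For example (a plain Python sketch, independent of the API):

```python
def intent_to_function_name(intent):
    """Convert a space-separated intent such as 'learn chinese' into a
    camelCase identifier such as 'learnChinese'."""
    words = intent.strip().split()
    return words[0].lower() + "".join(w.capitalize() for w in words[1:])

name = intent_to_function_name("learn chinese")
```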

Paraphrasing with GPT-J

Test on the playground

import nlpcloud
client = nlpcloud.Client("gpt-j", "your_token", gpu=True)
generation = client.generation("""Paraphrase the [Original] section and put the result in the [Paraphrase] section. Follow these rules:
- Use synonyms and new words
- Start sentences with different words
- Alter the structure of sentences

[Original]: A US nuclear submarine hit an "unknown object" while submerged in waters in the Asia-Pacific region, injuring a number of sailors, US officials say. It was not clear what caused the incident on Saturday, they said. The submarine remained "fully operational". Unnamed officials told US media the collision happened in international waters in the South China Sea, and that 11 sailors had been injured. The incident happened amid rising tensions in the region. The US Navy said the extent of the damage was still being assessed and that the submarine's nuclear propulsion plant and spaces had not been affected. The statement did not give details about where the incident took place or the number of people hurt, saying only that the injuries were not "life threatening".
[Paraphrase]: Several sailors were injured during the crash of a US submarine. US officials say they don't know what caused the US submarine to hit this unknown object, and the submarine was not harmed. The incident happened in the South China Sea. Tensions are rising in this region. Hopefully, no sailor was severely injured. The nuclear propulsion of the submarine is still working according to the US Navy. No more details were disclosed about the incident.
###
[Original]: The world is warming because of fossil fuel emissions caused by humans. Extreme weather events linked to climate change - including heatwaves, floods and forest fires - are intensifying. The past decade was the warmest on record, and governments agree urgent collective action is needed. For this conference, 200 countries are being asked for their plans to cut emissions by 2030. They all agreed in 2015 to make changes to keep global warming "well below" 2C above pre-industrial levels - and to try aim for 1.5C - so that we avoid a climate catastrophe. This is what's known as the Paris Agreement, and it means countries have to keep making bigger emissions cuts until reaching net zero in 2050.
[Paraphrase]: Humans are the cause of global warming and the consequence is that heatwaves, floods, and forest fires are going to happen more often. More specifically, the last 10 years were the warmest in history, so governments want to take collective actions. In 2015, 200 countries agreed to keep global warming below 2C, even 1.5C in order to avoid chaos, and for this conference they are now asked to decrease their emissions by 2030. These countries will need to make more efforts than initially anticipated.
###
[Original]: In The Sopranos, the mob is besieged as much by inner infidelity as it is by the federal government. Early in the series, the greatest threat to Tony's Family is his own biological family. One of his closest associates turns witness for the FBI, his mother colludes with his uncle to contract a hit on Tony, and his kids click through Websites that track the federal crackdown in Tony's gangland.
[Paraphrase]: In the first season of The Sopranos, Tony Soprano’s mobster activities are more threatened by members of his biological family than by agents of the federal government. This familial betrayal is multi-pronged. Tony’s closest friend and associate is an FBI informant, his mother and uncle are conspiring to have him killed, and his children are surfing the Web for information about his activities.
###
[Original]: French President Emmanuel Macron has said he hopes the diplomatic tension with Algeria will ease soon. "My wish is that we can calm things down because I think it is better to talk to one another, and to make progress," Macron told France Inter radio in an interview. President Macron said he had "cordial" relations with his Algerian counterpart. Algeria recalled its ambassador to Paris on Saturday and closed its airspace to French military planes a day later after the French president made comments about the northern Africa country. President Macron was quoted as saying the former French colony was ruled by a "political-military system" with an official history that was based not on truth, but on hatred of France. The diplomatic spat came days after France cut the number of visas it issues for citizens of Algeria and other North African countries.
[Paraphrase]:""",
    length_no_input=True,
    end_sequence="\n###",
    remove_input=True,
    top_p=0.75,
    temperature=0.8,
    min_length=50,
    max_length=200)
print(generation["generated_text"])

Output:

French President Emmanuel Macron hopes the diplomatic tension with Algeria will calm down. He wants to make progress in the relations with the former French colony. President Macron says he has a good relationship with his Algerian counterpart and he is glad that they have a cordial relationship. The president was quoted as saying that Algeria was ruled by a "political-military system" with an official history that was based not on truth, but on hatred of France. France cut the number of visas it issues for citizens of Algeria and other North African countries.

GPT-J rewrote our paragraph while keeping its meaning, which is what paraphrasing is about. You could encourage GPT-J to return more original paraphrases by passing different examples in the input, and by playing with API parameters like temperature, top_p, and repetition penalty.

Zero-shot text classification with GPT-J

import nlpcloud
client = nlpcloud.Client("gpt-j", "your_token", gpu=True)
generation = client.generation("""Message: When the spaceship landed on Mars, all of humanity was excited
        Topic: space
        ###
        Message: I love playing tennis and golf. I'm practicing twice a week.
        Topic: sport
        ###
        Message: Managing a team of sales people is a tough but rewarding job.
        Topic: business
        ###
        Message: I am trying to cook chicken with tomatoes.
        Topic:""",
    length_no_input=True,
    end_sequence="\n###",
    remove_input=True)
print(generation["generated_text"])

Output:

food

Here is an easy and powerful way to categorize a piece of text without having to declare the candidate categories in advance, which is why this is often referred to as "zero-shot" classification.

Keyword and Keyphrase Extraction with GPT-J

Test on the playground

import nlpcloud
client = nlpcloud.Client("gpt-j", "your_token", gpu=True)
generation = client.generation("""Information Retrieval (IR) is the process of obtaining resources relevant to the information need. For instance, a search query on a web search engine can be an information need. The search engine can return web pages that represent relevant resources.
        Keywords: information, search, resources
        ###
        David Robinson has been in Arizona for the last three months searching for his 24-year-old son, Daniel Robinson, who went missing after leaving a work site in the desert in his Jeep Renegade on June 23. 
        Keywords: searching, missing, desert
        ###
        I believe that using a document about a topic that the readers know quite a bit about helps you understand if the resulting keyphrases are of quality.
        Keywords: document, understand, keyphrases
        ###
        Since transformer models have a token limit, you might run into some errors when inputting large documents. In that case, you could consider splitting up your document into paragraphs and mean pooling (taking the average of) the resulting vectors.
        Keywords:""",
    length_no_input=True,
    end_sequence="\n###",
    remove_input=True)
print(generation["generated_text"])

Output:

paragraphs, transformer, input, errors

Keyword extraction is about getting the main ideas from a piece of text. This is an interesting NLP subfield that GPT-J can handle very well. See below for keyphrase extraction (same thing but with multiple words).

import nlpcloud
client = nlpcloud.Client("gpt-j", "your_token", gpu=True)
generation = client.generation("""Information Retrieval (IR) is the process of obtaining resources relevant to the information need. For instance, a search query on a web search engine can be an information need. The search engine can return web pages that represent relevant resources.
        Keywords: information retrieval, search query, relevant resources
        ###
        David Robinson has been in Arizona for the last three months searching for his 24-year-old son, Daniel Robinson, who went missing after leaving a work site in the desert in his Jeep Renegade on June 23. 
        Keywords: searching son, missing after work, desert
        ###
        I believe that using a document about a topic that the readers know quite a bit about helps you understand if the resulting keyphrases are of quality.
        Keywords: document, help understand, resulting keyphrases
        ###
        Since transformer models have a token limit, you might run into some errors when inputting large documents. In that case, you could consider splitting up your document into paragraphs and mean pooling (taking the average of) the resulting vectors.
        Keywords:""",
    length_no_input=True,
    end_sequence="\n###",
    remove_input=True)
print(generation["generated_text"])

Output:

large documents, paragraph, mean pooling

Same example as above, except that this time we don't want to extract single words but multi-word phrases (called keyphrases).
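Since GPT-J returns the keywords or keyphrases as a comma-separated string, a tiny parser turns them into a list for further processing (a sketch; real outputs may need extra cleanup):

```python
def parse_keyphrases(generated_text):
    """Split GPT-J's comma-separated answer into a clean list,
    dropping empty fragments and surrounding whitespace."""
    return [p.strip() for p in generated_text.split(",") if p.strip()]

phrases = parse_keyphrases("large documents, paragraph, mean pooling")
```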

Generate Text From Keywords

Test on the playground

import nlpcloud
client = nlpcloud.Client("gpt-j", "your_token", gpu=True)
generation = client.generation("""Generate a sentence out of keywords.

        Keywords: train, travel, mountains
        Sentence: I took the train for a great travel in the mountains.
        ###
        Keywords: programming, go, concurrency
        Sentence: I like go programming because it makes concurrency very easy.
        ###
        Keywords: house, countryside, dollars
        Sentence: I recently purchased a house in the countryside for 100 thousand dollars.
        ###
        Keywords: garden, tomatoes, summer
        Sentence:""",
    length_no_input=True,
    end_sequence="\n###",
    remove_input=True)
print(generation["generated_text"])

Output:

The tomatoes in my garden are very tasty this summer.

It is possible to ask GPT-J to generate a piece of text containing specific keywords. Here we're only generating one sentence, but we could generate a whole paragraph or even more if needed.
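There is no hard guarantee that GPT-J will use every keyword, so a quick containment check can catch misses (a sketch; stemming or fuzzy matching would make it more robust):

```python
def missing_keywords(keywords, sentence):
    """Return the keywords that do not appear in the generated sentence
    (case-insensitive, exact substring match only)."""
    lowered = sentence.lower()
    return [k for k in keywords if k.lower() not in lowered]

missing = missing_keywords(
    ["garden", "tomatoes", "summer"],
    "The tomatoes in my garden are very tasty this summer.",
)
```

If any keyword is missing, you can simply re-run the generation until all of them appear.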

Blog Post Generation

Test on the playground

import nlpcloud
client = nlpcloud.Client("gpt-j", "your_token", gpu=True)
generation = client.generation("""[Instruction]: generate a blog article about a text classification API
[Blog article]:
<h1>Text Classification API</h1>
<h2>What is Text Classification?</h2>
<p>Text classification is the process of categorizing a block of text based on one or several labels.
Let's say you have the following block of text:
Perseverance is just getting started, and already has provided some of the most iconic visuals in space exploration history. It reinforces the remarkable level of engineering and precision that is required to build and fly a vehicle to the Red Planet.
Let's also say that you also have the following labels: space, science, and food.
Now the question is: which ones of these labels apply best to this block of text? Answer is space and science of course.</p>
<h2>Text Classification with Hugging Face Transformers.</h2>
<p>Hugging Face transformers is an amazing library that has been recently released. It is based on either PyTorch or TensorFlow, depending on the model you're using. Transformers have clearly helped deep learning NLP make great progress in terms of accuracy. However this accuracy improvement comes at a cost: transformers are extremely demanding in terms of resources.</p>
###
[Instruction]: generate a blog article about a chatbot API
[Blog article]:
<h1>Chatbot API</h1>
<h2>What are Chatbots?</h2>
<p>Conversational AI is a central sub-field of Natural Language Processing that makes it possible for a human to have a conversation with a machine. Everytime the human says or asks something to the AI, the whole conversation history is sent too, so the AI can have the context in memory and make relevant responses. Modern chabots leverage conversational AI and can do more than simply having a conversation. For example they can detect customer intents, search documents, understand the customer tone and adapt their own tone (anger, joy, sarcasm...).</p>
<h2>Building an Advanced Chatbot</h2>
<p>In order to make the most of GPT-J, it is crucial to have in mind the so-called few-shot learning technique: by giving only a couple of examples to the AI, it is possible to dramatically improve the relevancy of the results, without even training a dedicated AI.
Sometimes, few-shot learning is not enough (for example if your chatbot relies on very specific content, bound to your company only). In that case, the best solution is to fine-tune (train) GPT-J with your own data.</p>
###
[Instruction]: generate a blog article about a summarization API
[Blog article]:
<h1>Summarization API</h1>
<h2>What is Summarization?</h2>
<p>Text summarization simply is the process of summarizing a block of text in order to make it shorter. Let's say you have the following block of text:
The tower is 324 metres (1,063 ft) tall, about the same height as an 81-storey building, and the tallest structure in Paris. Its base is square, measuring 125 metres (410 ft) on each side. During its construction, the Eiffel Tower surpassed the Washington Monument to become the tallest man-made structure in the world, a title it held for 41 years until the Chrysler Building in New York City was finished in 1930. It was the first structure to reach a height of 300 metres. Due to the addition of a broadcasting aerial at the top of the tower in 1957, it is now taller than the Chrysler Building by 5.2 metres (17 ft). Excluding transmitters, the Eiffel Tower is the second tallest free-standing structure in France after the Millau Viaduct.
This technical description is quite long and maybe not all these details are necessary for a common reader to grasp the general idea. So we now want to leverage machine learning in order to automatically summarize this piece of text.</p>
<h2>Summarization with Hugging Face Transformers.</h2>
<p>Hugging Face transformers is an amazing library that has been recently released. It is based on either PyTorch or TensorFlow, depending on the model you're using. Transformers have clearly helped deep learning NLP make great progress in terms of accuracy. However this accuracy improvement comes at a cost: transformers are extremely demanding in terms of resources.</p>
###
[Instruction]: generate a blog article about a text generation API
[Blog article]:""",
    max_length=500,
    length_no_input=True,
    end_sequence="\n###",
    remove_input=True)
print(generation["generated_text"])

Output:

<h1>Text Generation API</h1>
<h2>What is Text Generation?</h2>
<p>Text generation is the process of generating text from a block of text. It is not simply the inverse of summarization: a text generator will take a short piece of text as an input and generate a longer one.</p>
<p>For example you have the following text:</p>
Ahead of the opening of the meeting,
<p>In this case the goal of the API is to automatically generate the following:</p>
Ahead of the opening of the meeting, Prime Minister Justin Trudeau spoke to reporters at the Montreal airport where he was travelling to speak at the conference.
<p>So basically we want to leverage the output of the API as the input for the text generator, and we will then be able to generate some sentences based on the input we already have.</p>
<h2>Text generation with GPT-2</h2>
<p>We will leverage GPT-2 in order to have a starting point for the text generator. In this case, we will use the API as the input for the text generator, and we will use the output of the API as the input for GPT-2.
However, we want to fine-tune GPT-2 to generate more complex sentences and to have better results.</p>

Isn't it impressive? This generated blog article is short, but you can generate much longer ones. The structure of the generated blog post depends heavily on the structure you used in your few-shot examples. To get more complex structures and more relevant content, fine-tuning GPT-J is the key.

Conclusion

As you can see, few-shot learning is a great technique that helps GPT-J and GPT-Neo achieve amazing things! The key is to pass a well-formatted context along with your request.

Even for simple text generation, it is recommended to pass as much context as possible, in order to help the model.

We hope you found this useful! If you have questions about how to make the most of these models, please don't hesitate to ask us.

Julien Salinas
CTO at NLPCloud.io