Effectively using GPT-J and GPT-Neo, the GPT-3 open-source alternatives, with few-shot learning

GPT-J and GPT-Neo, the open-source alternatives to GPT-3, are among the best NLP models as of this writing. But using them effectively can take practice. Few-shot learning is an NLP technique that works very well with these models.

GPT-J and GPT-Neo

GPT-Neo and GPT-J are both open-source NLP models created by EleutherAI (a collective of researchers working to open-source AI).

GPT-J has 6 billion parameters, which makes it the most advanced open-source NLP model as of this writing. This is a direct alternative to OpenAI's proprietary GPT-3 Curie.

These models are very versatile. They can be used for almost any NLP use case: text generation, sentiment analysis, classification, machine translation, and much more (see below). However, using them effectively sometimes takes practice. Their response time (latency) can also be longer than that of more standard NLP models.

GPT-J and GPT-Neo are both available through the NLP Cloud API. Below, we show examples obtained with the GPT-J endpoint of NLP Cloud on GPU, using the Python client. If you want to copy and paste the examples, don't forget to use your own API token. To install the Python client, first run pip install nlpcloud.
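For reference, here is the minimal setup that every example below relies on (your_token is a placeholder for your own API token, and gpu=True sends requests to the GPU-backed endpoint):

# Install the client first: pip install nlpcloud
import nlpcloud

# "gpt-j" selects the model, "your_token" is your NLP Cloud API token,
# and gpu=True runs the requests on a GPU plan.
client = nlpcloud.Client("gpt-j", "your_token", gpu=True)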

Few-Shot Learning

Few-shot learning is about helping a machine learning model make accurate predictions with only a handful of examples. There is no need to train a new model here: models like GPT-J and GPT-Neo are so big that they can easily adapt to many contexts without being re-trained.

Giving the model only a few examples dramatically increases its accuracy.

In NLP, the idea is to pass these examples along with your text input. See the examples below!

Also note that, if few-shot learning is not enough, you can fine-tune GPT-J on NLP Cloud so the model is perfectly tailored to your use case.

You can easily test few-shot learning on the NLP Cloud playground.
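To make the pattern concrete before diving into the examples, here is a small, hypothetical helper (build_prompt is our own illustration, not part of the NLP Cloud client) that assembles labeled examples and a new input into a single few-shot prompt, using ### as a delimiter like all the examples below:

# Hypothetical helper: turn labeled examples plus a new input into a few-shot prompt.
def build_prompt(examples, new_input, input_key="Message", output_key="Sentiment"):
    blocks = [f"{input_key}: {text}\n{output_key}: {label}" for text, label in examples]
    # The last block is left without a label so the model completes it.
    blocks.append(f"{input_key}: {new_input}\n{output_key}:")
    return "\n###\n".join(blocks)

prompt = build_prompt(
    [("Support has been terrible for 2 weeks...", "Negative"),
     ("I love your API, it is simple and so fast!", "Positive")],
    "The reactivity of your team has been amazing, thanks!",
)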

Sentiment Analysis with GPT-J

import nlpcloud
client = nlpcloud.Client("gpt-j", "your_token", gpu=True)
generation = client.generation("""Message: Support has been terrible for 2 weeks...
            Sentiment: Negative
            ###
            Message: I love your API, it is simple and so fast!
            Sentiment: Positive
            ###
            Message: GPT-J was released 2 months ago.
            Sentiment: Neutral
            ###
            Message: The reactivity of your team has been amazing, thanks!
            Sentiment:""",
            length_no_input=True,
            end_sequence="\n###",
            remove_input=True)
print(generation["generated_text"])

Output:

Positive

As you can see, giving 3 properly formatted examples first leads GPT-J to understand that we want to perform sentiment analysis, and its result is good.

### is an arbitrary delimiter that helps GPT-J tell the different sections apart. We could just as easily use something else, like --- or simply a new line. We then set end_sequence="\n###", an NLP Cloud parameter that tells GPT-J to stop generating content after a new line followed by ###.
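For example, here is the same request with --- as the delimiter instead of ### (a sketch assuming the same client as above; the only things that change are the delimiter in the prompt and the end_sequence value):

generation = client.generation("""Message: Support has been terrible for 2 weeks...
            Sentiment: Negative
            ---
            Message: I love your API, it is simple and so fast!
            Sentiment: Positive
            ---
            Message: The reactivity of your team has been amazing, thanks!
            Sentiment:""",
            length_no_input=True,
            # Stop generating after a new line followed by "---".
            end_sequence="\n---",
            remove_input=True)
print(generation["generated_text"])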

HTML Code Generation with GPT-J

import nlpcloud
client = nlpcloud.Client("gpt-j", "your_token", gpu=True)
generation = client.generation("""description: a red button that says stop
    code: <button style=color:white; background-color:red;>Stop</button>
    ###
    description: a blue box that contains yellow circles with red borders
    code: <div style=background-color: blue; padding: 20px;><div style=background-color: yellow; border: 5px solid red; border-radius: 50%; padding: 20px; width: 100px; height: 100px;>
    ###
    description: a Headline saying Welcome to AI
    code:""",
    max_length=500,
    length_no_input=True,
    end_sequence="\n###",
    remove_input=True)
print(generation["generated_text"])

Output:

<h1 style=color: white;>Welcome to AI</h1>

Code generation with GPT-J really is amazing. This is partly thanks to the fact that GPT-J has been trained on huge code bases.

SQL Code Generation with GPT-J

import nlpcloud
client = nlpcloud.Client("gpt-j", "your_token", gpu=True)
generation = client.generation("""Question: Fetch the companies that have less than five people in it.
            Answer: SELECT COMPANY, COUNT(EMPLOYEE_ID) FROM Employee GROUP BY COMPANY HAVING COUNT(EMPLOYEE_ID) < 5;
            ###
            Question: Show all companies along with the number of employees in each department
            Answer: SELECT COMPANY, COUNT(COMPANY) FROM Employee GROUP BY COMPANY;
            ###
            Question: Show the last record of the Employee table
            Answer: SELECT * FROM Employee ORDER BY LAST_NAME DESC LIMIT 1;
            ###
            Question: Fetch the three max employees from the Employee table;
            Answer:""",
    max_length=100,
    length_no_input=True,
    end_sequence="\n###",
    remove_input=True)
print(generation["generated_text"])

Output:

SELECT * FROM Employee ORDER BY ID DESC LIMIT 3;

Automatic SQL generation works very well with GPT-J, especially due to the declarative nature of SQL, and the fact that SQL is quite a limited language with relatively few possibilities (compared to most programming languages).

Entity Extraction (NER) with GPT-J

import nlpcloud
client = nlpcloud.Client("gpt-j", "your_token", gpu=True)
generation = client.generation("""My name is Julien and I work for NLP Cloud as a Chief Technical Officer.
            Position: Chief Technical Officer
            Company: NLP Cloud
            ###
            Hi, I am a marketing assistant at Microsoft.
            Position: marketing assistant
            Company: Microsoft
            ###
            John was the CEO of AquaFun until 2020.
            Position: CEO
            Company: AquaFun
            ###
            I have been a Go developer for Google for 3 years, but now I think about quitting.""",
    length_no_input=True,
    end_sequence="\n###",
    remove_input=True)
print(generation["generated_text"])

Output:

Position: Go developer
Company: Google

It is really impressive how GPT-J handles entity extraction without any re-training needed! Usually, extracting new types of entities (like name, position, country, etc.) requires a whole new process of annotation, training, and deployment... Here, it's completely seamless.

Question Answering With GPT-J

import nlpcloud
client = nlpcloud.Client("gpt-j", "your_token", gpu=True)
generation = client.generation("""Context: NLP Cloud was founded in 2021 when the team realized there was no easy way to reliably leverage NLP in production.
            Question: When was NLP Cloud founded?
            Answer: 2021
            ###
            Context: NLP Cloud developed their API by mid-2020 and they added many pre-trained open-source models since then.
            Question: What did NLP Cloud develop?
            Answer: API
            ###
            Context: The main challenge with GPT-J is memory consumption. Using a GPU plan is recommended.
            Question: Which plan is recommended for GPT-J?
            Answer:""",
    length_no_input=True,
    end_sequence="\n###",
    remove_input=True)
print(generation["generated_text"])

Output:

GPU plan

Question answering works very well. It can also be achieved with other dedicated NLP models, but maybe not with the same level of accuracy.

Tutorial Creation With GPT-J

import nlpcloud
client = nlpcloud.Client("gpt-j", "your_token", gpu=True)
generation = client.generation("""Here is a tutorial about how to make a cake.
        
            1. Take some flour.
            2. Take some sugar.""",
    max_length=500)
print(generation["generated_text"])

Output:

Here is a tutorial how to make a cake.
        
                    1. Take some flour.
                    2. Take some sugar.
                    3. Take some butter.
                    4. Take some eggs.
                    5. Take some water.
                    6. Take some baking powder.
                    7. Take some vanilla.
                    8. Mix all together.
                    9. Bake in a pan.
                    10. Enjoy.
                    
Well, that's it. You can make this for your birthday or a party or you can even make it for your kids. They will love this.

As you can see, it's quite impressive how GPT-J automatically follows your initial formatting, and the generated content is very good too. You might even make a proper cake out of this (we haven't tried yet though).

Grammar and Spelling Correction with GPT-J

import nlpcloud
client = nlpcloud.Client("gpt-j", "your_token", gpu=True)
generation = client.generation("""I love goin to the beach.
            Correction: I love going to the beach.
            ###
            Let me hav it!
            Correction: Let me have it!
            ###
            It have too many drawbacks.
            Correction: It has too many drawbacks.
            ###
            I do not wan to go
            Correction:""",
    length_no_input=True,
    end_sequence="\n###",
    remove_input=True)
print(generation["generated_text"])

Output:

I do not want to go.

Spelling and grammar corrections work as expected. If you want to be more specific about the location of the mistake in the sentence, you might want to use a dedicated model though.

Machine Translation with GPT-J

import nlpcloud
client = nlpcloud.Client("gpt-j", "your_token", gpu=True)
generation = client.generation("""Hugging Face a révolutionné le NLP.
            Translation: Hugging Face revolutionized NLP.
            ###
            Cela est incroyable!
            Translation: This is unbelievable!
            ###
            Désolé je ne peux pas.
            Translation: Sorry but I cannot.
            ###
            NLP Cloud permet de deployer le NLP en production facilement.
            Translation""",
    length_no_input=True,
    end_sequence="\n###",
    remove_input=True)
print(generation["generated_text"])

Output:

NLP Cloud makes it easy to deploy NLP to production.

Machine translation usually requires dedicated models (often one per language). Here, all languages are handled out of the box by GPT-J, which is quite impressive.

Tweet Generation with GPT-J

import nlpcloud
client = nlpcloud.Client("gpt-j", "your_token", gpu=True)
generation = client.generation("""keyword: markets
            tweet: Take feedback from nature and markets, not from people
            ###
            keyword: children
            tweet: Maybe we die so we can come back as children.
            ###
            keyword: startups
            tweet: Startups should not worry about how to put out fires, they should worry about how to start them.
            ###
            keyword: NLP
            tweet:""",
    max_length=200,
    length_no_input=True,
    end_sequence="\n###",
    remove_input=True)
print(generation["generated_text"])

Output:

People want a way to get the benefits of NLP without paying for it.

Here is a fun and easy way to generate short tweets based on a keyword.

Chatbot and Conversational AI with GPT-J

import nlpcloud
client = nlpcloud.Client("gpt-j", "your_token", gpu=True)
generation = client.generation("""This is a discussion between a [human] and a [robot]. 
The [robot] is very nice and empathetic.

[human]: Hello nice to meet you.
[robot]: Nice to meet you too.
###
[human]: How is it going today?
[robot]: Not so bad, thank you! How about you?
###
[human]: I am ok, but I am a bit sad...
[robot]: Oh? Why that?
###
[human]: I broke up with my girlfriend...
[robot]: """,
    length_no_input=True,
    end_sequence="\n###",
    remove_input=True)
print(generation["generated_text"])

Output:

Oh? How did that happen?

As you can see, GPT-J properly understands that you are in a conversational mode. And the very powerful thing is that, if you change the tone in your context, the responses from the model will follow the same tone (sarcasm, anger, curiosity...).

Intent Classification with GPT-J

import nlpcloud
client = nlpcloud.Client("gpt-j", "your_token", gpu=True)
generation = client.generation("""I want to start coding tomorrow because it seems to be so fun!
            Intent: start coding
            ###
            Show me the last pictures you have please.
            Intent: show pictures
            ###
            Search all these files as fast as possible.
            Intent: search files
            ###
            Can you please teach me Chinese next week?
            Intent:""",
    length_no_input=True,
    end_sequence="\n###",
    remove_input=True)
print(generation["generated_text"])

Output:

learn chinese

It is quite impressive how GPT-J can detect the intent behind your sentence, and it works very well even for more complex sentences. You can also ask it to format the intent differently if you want. For example, you could automatically generate a JavaScript function name like "learnChinese", as shown below.
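Here is a hedged sketch of that idea (the camelCase labels below are our own formatting choice, not an API feature): by writing the few-shot intents as JavaScript-style function names, GPT-J tends to answer in the same format.

import nlpcloud
client = nlpcloud.Client("gpt-j", "your_token", gpu=True)
generation = client.generation("""I want to start coding tomorrow because it seems to be so fun!
            Intent: startCoding
            ###
            Show me the last pictures you have please.
            Intent: showPictures
            ###
            Search all these files as fast as possible.
            Intent: searchFiles
            ###
            Can you please teach me Chinese next week?
            Intent:""",
    length_no_input=True,
    end_sequence="\n###",
    remove_input=True)
# Expected to return something like "learnChinese" (not guaranteed).
print(generation["generated_text"])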

Paraphrasing with GPT-J

import nlpcloud
client = nlpcloud.Client("gpt-j", "your_token", gpu=True)
generation = client.generation("""[Original]: Algeria recalled its ambassador to Paris on Saturday and closed its airspace to French military planes a day later after the French president made comments about the northern Africa country. 
        [Paraphrase]: Last Saturday, the Algerian government recalled its ambassador and stopped accepting French military airplanes in its airspace. It happened one day after the French president made comments about Algeria.
        ###
        [Original]: President Macron was quoted as saying the former French colony was ruled by a "political-military system" with an official history that was based not on truth, but on hatred of France.
        [Paraphrase]: Emmanuel Macron said that the former colony was lying and angry at France. He also said that the country was ruled by a "political-military system".
        ###
        [Original]: The diplomatic spat came days after France cut the number of visas it issues for citizens of Algeria and other North African countries.
        [Paraphrase]: Diplomatic issues started appearing when France decided to stop granting visas to Algerian people and other North African people.
        ###
        [Original]: After a war lasting 20 years, following the decision taken first by President Trump and then by President Biden to withdraw American troops, Kabul, the capital of Afghanistan, fell within a few hours to the Taliban, without resistance.
        [Paraphrase]: """,
    length_no_input=True,
    end_sequence="\n###",
    remove_input=True,
    min_length=0,
    max_length=50)
print(generation["generated_text"])

Output:

French President Emmanuel Macron hopes the diplomatic tension with Algeria will calm down. He wants to make progress in the relations with the former French colony. President Macron says he has a good relationship with his Algerian counterpart and he is glad that they have a cordial relationship. The president was quoted as saying that Algeria was ruled by a "political-military system" with an official history that was based not on truth, but on hatred of France. France cut the number of visas it issues for citizens of Algeria and other North African countries.

GPT-J made modifications to our paragraph while keeping the main sense, which is what paraphrasing is about. You could also encourage GPT-J to return more original paraphrases by passing different examples in the input and by playing with API parameters like temperature, top_p, and repetition penalty, as sketched below.
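For instance, here is a hedged sketch of a more adventurous configuration (the parameter names follow the NLP Cloud generation endpoint at the time of writing, and the values are only a starting point, not a recommendation):

import nlpcloud
client = nlpcloud.Client("gpt-j", "your_token", gpu=True)
generation = client.generation("""[Original]: The diplomatic spat came days after France cut the number of visas it issues for citizens of Algeria and other North African countries.
        [Paraphrase]: Diplomatic issues started appearing when France decided to stop granting visas to Algerian people and other North African people.
        ###
        [Original]: After a war lasting 20 years, following the decision taken first by President Trump and then by President Biden to withdraw American troops, Kabul, the capital of Afghanistan, fell within a few hours to the Taliban, without resistance.
        [Paraphrase]: """,
    length_no_input=True,
    end_sequence="\n###",
    remove_input=True,
    min_length=0,
    max_length=50,
    # Higher temperature and top_p make the sampling more creative,
    # while repetition_penalty discourages copying the input verbatim.
    temperature=1.0,
    top_p=0.95,
    repetition_penalty=1.2)
print(generation["generated_text"])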

Zero-Shot Text Classification with GPT-J

import nlpcloud
client = nlpcloud.Client("gpt-j", "your_token", gpu=True)
generation = client.generation("""When the spaceship landed on Mars, the whole humanity was excited
        Topic: space
        ###
        Message: I love playing tennis and golf. I'm practicing twice a week.
        Topic: sport
        ###
        Message: Managing a team of sales people is a tough but rewarding job.
        Topic: business
        ###
        Message: I am trying to cook chicken with tomatoes.
        Topic:""",
    length_no_input=True,
    end_sequence="\n###",
    remove_input=True)
print(generation["generated_text"])

Output:

food

Here is an easy and powerful way to categorize a piece of text thanks to the so-called "zero-shot learning" technique, without having to declare categories in advance.

Keyword and Keyphrase Extraction with GPT-J

import nlpcloud
client = nlpcloud.Client("gpt-j", "your_token", gpu=True)
generation = client.generation("""Information Retrieval (IR) is the process of obtaining resources relevant to the information need. For instance, a search query on a web search engine can be an information need. The search engine can return web pages that represent relevant resources.
        Keywords: information, search, resources
        ###
        David Robinson has been in Arizona for the last three months searching for his 24-year-old son, Daniel Robinson, who went missing after leaving a work site in the desert in his Jeep Renegade on June 23. 
        Keywords: searching, missing, desert
        ###
        I believe that using a document about a topic that the readers know quite a bit about helps you understand if the resulting keyphrases are of quality.
        Keywords: document, understand, keyphrases
        ###
        Since transformer models have a token limit, you might run into some errors when inputting large documents. In that case, you could consider splitting up your document into paragraphs and mean pooling (taking the average of) the resulting vectors.
        Keywords:""",
    length_no_input=True,
    end_sequence="\n###",
    remove_input=True)
print(generation["generated_text"])

Output:

paragraphs, transformer, input, errors

Keyword extraction is about getting the main ideas from a piece of text. This is an interesting NLP subfield that GPT-J can handle very well. See below for keyphrase extraction (same thing but with multiple words).

import nlpcloud
client = nlpcloud.Client("gpt-j", "your_token", gpu=True)
generation = client.generation("""Information Retrieval (IR) is the process of obtaining resources relevant to the information need. For instance, a search query on a web search engine can be an information need. The search engine can return web pages that represent relevant resources.
        Keywords: information retrieval, search query, relevant resources
        ###
        David Robinson has been in Arizona for the last three months searching for his 24-year-old son, Daniel Robinson, who went missing after leaving a work site in the desert in his Jeep Renegade on June 23. 
        Keywords: searching son, missing after work, desert
        ###
        I believe that using a document about a topic that the readers know quite a bit about helps you understand if the resulting keyphrases are of quality.
        Keywords: document, help understand, resulting keyphrases
        ###
        Since transformer models have a token limit, you might run into some errors when inputting large documents. In that case, you could consider splitting up your document into paragraphs and mean pooling (taking the average of) the resulting vectors.
        Keywords:""",
    length_no_input=True,
    end_sequence="\n###",
    remove_input=True)
print(generation["generated_text"])

Output:

large documents, paragraph, mean pooling

Same example as above, except that this time we don't want to extract single words but multi-word phrases (called keyphrases).

Product Description and Ad Generation

import nlpcloud
client = nlpcloud.Client("gpt-j", "your_token", gpu=True)
generation = client.generation("""Generate a product description out of keywords.

        Keywords: shoes, women, $59
        Sentence: Beautiful shoes for women at the price of $59.
        ###
        Keywords: trousers, men, $69
        Sentence: Modern trousers for men, for $69 only.
        ###
        Keywords: gloves, winter, $19
        Sentence: Amazingly hot gloves for cold winters, at $19.
        ###
        Keywords: t-shirt, men, $39
        Sentence:""",
    length_no_input=True,
    end_sequence="\n###",
    remove_input=True)
print(generation["generated_text"])

Output:

Extraordinary t-shirt for men, for $39 only.

It is possible to ask GPT-J to generate a product description or an ad containing specific keywords. Here we're only generating a simple sentence, but we could easily generate a whole paragraph if needed (see the sketch below).
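Here is a sketch of that (the longer example descriptions below are our own illustrative data): by making the few-shot examples full paragraphs and raising max_length, GPT-J returns a paragraph rather than a single sentence.

import nlpcloud
client = nlpcloud.Client("gpt-j", "your_token", gpu=True)
generation = client.generation("""Keywords: shoes, women, $59
        Description: Beautiful shoes for women at the price of $59. Made of soft leather, they stay comfortable all day long and match both casual and formal outfits.
        ###
        Keywords: gloves, winter, $19
        Description: Amazingly hot gloves for cold winters, at $19. The fleece lining keeps your hands warm well below zero, and the grip pads let you use your phone without taking them off.
        ###
        Keywords: t-shirt, men, $39
        Description:""",
    max_length=150,
    length_no_input=True,
    end_sequence="\n###",
    remove_input=True)
print(generation["generated_text"])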

Blog Post Generation

import nlpcloud
client = nlpcloud.Client("gpt-j", "your_token", gpu=True)
generation = client.generation("""[Title]: 3 Tips to Increase the Effectiveness of Online Learning
[Blog article]: <h1>3 Tips to Increase the Effectiveness of Online Learning</h1>
<p>The hurdles associated with online learning correlate with the teacher’s inability to build a personal relationship with their students and to monitor their productivity during class.</p>
<h2>1. Creative and Effective Approach</h2>
<p>Each aspect of online teaching, from curriculum, theory, and practice, to administration and technology, should be formulated in a way that promotes productivity and the effectiveness of online learning.</p>
<h2>2. Utilize Multimedia Tools in Lectures</h2>
<p>In the 21st century, networking is crucial in every sphere of life. In most cases, a simple and functional interface is preferred for eLearning to create ease for the students as well as the teacher.</p>
<h2>3. Respond to Regular Feedback</h2>
<p>Collecting student feedback can help identify which methods increase the effectiveness of online learning, and which ones need improvement. An effective learning environment is a continuous work in progress.</p>
###
[Title]: 4 Tips for Teachers Shifting to Teaching Online 
[Blog article]: <h1>4 Tips for Teachers Shifting to Teaching Online </h1>
<p>An educator with experience in distance learning shares what he’s learned: Keep it simple, and build in as much contact as possible.</p>
<h2>1. Simplicity Is Key</h2>
<p>Every teacher knows what it’s like to explain new instructions to their students. It usually starts with a whole group walk-through, followed by an endless stream of questions from students to clarify next steps.</p>
<h2>2. Establish a Digital Home Base</h2>
<p>In the spirit of simplicity, it’s vital to have a digital home base for your students. This can be a district-provided learning management system like Canvas or Google Classrooms, or it can be a self-created class website. I recommend Google Sites as a simple, easy-to-set-up platform.</p>
<h2>3. Prioritize Longer, Student-Driven Assignments</h2>
<p>Efficiency is key when designing distance learning experiences. Planning is going to take more time and require a high level of attention to detail. You will not be able to correct mistakes on the fly or suddenly pivot when kids are disengaged.</p>
<h2>4. Individual Touchpoints Are Game-Changers</h2>
<p>You can create these touchpoints through any medium you like: emails, video messages, phone calls, messages through your learning management system, comments on shared documents, etc.</p>
###
[Title]: 3 Best Free Image Converter Software Programs
[Blog article]: <h1>3 Best Free Image Converter Software Programs</h1>
<p>Best free image converters for JPG, BMP, PSD, TIF, GIF, RAW, and more</p>
<h2>1. XnConvert</h2>
<p>XnConvert is the Swiss Army knife of image converters. With XnView, you can convert any of around 500 image formats to your choice of around 80 others. If you have a rare image format that you can't open, XnView can probably convert it.</p>
<h2>2. CoolUtils Online Image Converter</h2>
<p>The CoolUtils Online Image Converter is just that—an image converter that exists completely online, no download required. Unlike some online image converters, this one does the image converting for you in real-time—no waiting on an email link.</p>
<h2>3. FileZigZag</h2>
<p>FileZigZag is another online image converter service that will convert most common graphics formats. Just upload the original image, choose the desired output, and then wait for the download link to appear on the page.</p>
###
[Title]: 4 Best Tips to Find a Girlfriend
[Blog article]: """,
    max_length=500,
    length_no_input=True,
    end_sequence="\n###",
    remove_input=True)
print(generation["generated_text"])

Output:

<h1>Text Generation API</h1>
<h2>What is Text Generation?</h2>
<p>Text generation is the process of generating text from a block of text. It is not simply the inverse of summarization: a text generator will take a short piece of text as an input and generate a longer one.</p>
<p>For example you have the following text:</p>
Ahead of the opening of the meeting,
<p>In this case the goal of the API is to automatically generate the following:</p>
Ahead of the opening of the meeting, Prime Minister Justin Trudeau spoke to reporters at the Montreal airport where he was travelling to speak at the conference.
<p>So basically we want to leverage the output of the API as the input for the text generator, and we will then be able to generate some sentences based on the input we already have.</p>
<h2>Text generation with GPT-2</h2>
<p>We will leverage GPT-2 in order to have a starting point for the text generator. In this case, we will use the API as the input for the text generator, and we will use the output of the API as the input for GPT-2.
However, we want to fine-tune GPT-2 to generate more complex sentences and to have better results.</p>

Isn't it impressive? This generated blog article is short, but you can generate much longer articles. The structure of the generated blog post really depends on the structure you used in your few-shot examples. To get more complex structures and more relevant content, fine-tuning GPT-J is key.

Conclusion

As you can see, few-shot learning is a great technique that helps GPT-J and GPT-Neo achieve amazing things! The key is to pass a well-structured context before making your request.

Even for simple text generation, it is recommended to pass as much context as possible, in order to help the model.

We hope you found this useful! If you have questions about how to make the most of these models, please don't hesitate to ask us.

Julien Salinas
CTO at NLPCloud.io