Effectively using GPT-J and GPT-Neo, the GPT-3 open-source alternatives, with few-shot learning

GPT-J and GPT-Neo, the open-source alternatives to GPT-3, are among the best NLP models as of this writing. But using them effectively can take practice. Few-shot learning is an NLP technique that works very well with these models.

GPT-J and GPT-Neo

GPT-Neo and GPT-J are both open-source NLP models created by EleutherAI (a collective of researchers working to open-source AI).

GPT-J has 6 billion parameters, which makes it the most advanced open-source NLP model as of this writing, and a direct alternative to OpenAI's proprietary GPT-3 Curie.

These models are very versatile. They can be used for almost any NLP use case: text generation, sentiment analysis, classification, machine translation, and much more (see below). However, using them effectively sometimes takes practice, and their response time (latency) may be longer than that of more standard NLP models.

GPT-J and GPT-Neo are both available on the NLP Cloud API. Below, we show examples obtained using the GPT-J endpoint of NLP Cloud with cURL (on the command line). If you want to copy and paste the examples, please don't forget to add your own API token. Also, you might have to replace new lines with \n for cURL to work.
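If you prefer to call the API programmatically, here is a minimal Python sketch using the requests library, equivalent to the cURL calls below. The endpoint and parameters are the ones used throughout this article; the generate() helper name is ours:

import requests

API_URL = "https://api.nlpcloud.io/v1/gpt-j/generation"
TOKEN = "your_token"  # replace with your own NLP Cloud API token

def generate(text, **params):
    # Send the prompt plus any generation parameters to the GPT-J endpoint.
    # requests encodes the JSON for us, so multi-line prompts need no \n escaping.
    response = requests.post(
        API_URL,
        json={"text": text, **params},
        headers={"Authorization": "Token " + TOKEN},
    )
    response.raise_for_status()
    return response.json()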

Few-Shot Learning

Few-shot learning is about helping a machine learning model make predictions with only a handful of examples. There is no need to train a new model here: models like GPT-J and GPT-Neo are so big that they can easily adapt to many contexts without being re-trained.

Giving the model just a few examples dramatically increases its accuracy.

In NLP, the idea is to pass these examples along with your text input. See the examples below!
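To make this concrete, here is a small Python sketch (a hypothetical build_prompt() helper, not part of the API) that assembles a few-shot prompt from a list of solved examples plus the new input, using the same ###-delimited format as the requests below:

def build_prompt(examples, new_input):
    # Each example is an already-solved input/output pair; the new input
    # comes last so the model completes it in the same format.
    blocks = [f"{inp}\n{out}" for inp, out in examples]
    blocks.append(new_input)
    return "\n###\n".join(blocks)

# Example: two solved sentiment pairs, then the message to classify
prompt = build_prompt(
    [("Message: Support has been terrible for 2 weeks...", "Sentiment: Negative"),
     ("Message: I love your API, it is simple and so fast!", "Sentiment: Positive")],
    "Message: The reactivity of your team has been amazing, thanks!\nSentiment:",
)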

Sentiment Analysis with GPT-J

curl "https://api.nlpcloud.io/v1/gpt-j/generation" -H "Authorization: Token {your_token}" -X POST -d '{
    "text":"Message: Support has been terrible for 2 weeks...
            Sentiment: Negative
            ###
            Message: I love your API, it is simple and so fast!
            Sentiment: Positive
            ###
            Message: GPT-J has been released 2 months ago.
            Sentiment: Neutral
            ###
            Message: The reactivity of your team has been amazing, thanks!
            Sentiment:",
    "length_no_input": true,
    "end_sequence": "###",
    "remove_input": true
}'

Output:

{"generated_text":"Positive
                ###"}

As you can see, giving three properly formatted examples first leads GPT-J to understand that we want to perform sentiment analysis, and its result is good.

### is an arbitrary delimiter that helps GPT-J tell the different sections apart. We could just as well use something else, like --- or simply a new line. We then set "end_sequence": "###", an NLP Cloud parameter that tells GPT-J to stop generating content after the next ###.
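Note that the generated_text returned by the API still ends with the delimiter. Stripping it is a one-liner; a minimal Python sketch, assuming generated_text holds that field from the JSON response:

# Remove the end-of-sequence delimiter and surrounding whitespace
sentiment = generated_text.replace("###", "").strip()  # e.g. "Positive"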

HTML code generation with GPT-J

curl "https://api.nlpcloud.io/v1/gpt-j/generation" -H "Authorization: Token your_token" -X POST -d '{
    "text":"description: a red button that says stop
            code: <button style=color:white; background-color:red;>Stop</button>
            ###
            description: a blue box that contains yellow circles with red borders
            code: <div style=background-color: blue; padding: 20px;><div style=background-color: yellow; border: 5px solid red; border-radius: 50%; padding: 20px; width: 100px; height: 100px;>
            ###
            description: a Headline saying Welcome to AI
            code:",
    "max_length": 500,
    "length_no_input": true,
    "end_sequence": "###",
    "remove_input": true
}'

Output:

{"generated_text": "<h1 style=color: white;>Welcome to AI</h1>
                    ###"}

Code generation with GPT-J really is amazing, partly because GPT-J was trained on huge code bases.

SQL code generation with GPT-J

curl "https://api.nlpcloud.io/v1/gpt-j/generation" -H "Authorization: Token your_token" -X POST -d '{
    "text":"Question: Fetch the companies that have less than five people in it.
            Answer: SELECT COMPANY, COUNT(EMPLOYEE_ID) FROM Employee GROUP BY COMPANY HAVING COUNT(EMPLOYEE_ID) < 5;
            ###
            Question: Show all companies along with the number of employees in each department
            Answer: SELECT COMPANY, COUNT(COMPANY) FROM Employee GROUP BY COMPANY;
            ###
            Question: Show the last record of the Employee table
            Answer: SELECT * FROM Employee ORDER BY LAST_NAME DESC LIMIT 1;
            ###
            Question: Fetch the three max employees from the Employee table;
            Answer:",
    "max_length": 100,
    "length_no_input": true,
    "end_sequence": "###",
    "remove_input": true
}'

Output:

{"generated_text":"SELECT * FROM Employee ORDER BY ID DESC LIMIT 3;
                    ###"}

Automatic SQL generation works very well with GPT-J, especially thanks to the declarative nature of SQL, and the fact that SQL is a fairly constrained language with relatively few constructs (compared to most programming languages).

Entity Extraction (NER) with GPT-J

curl "https://api.nlpcloud.io/v1/gpt-j/generation" -H "Authorization: Token your_token" -X POST -d '{
    "text":"My name is Julien and I work for NLP Cloud as a Chief Technical Officer.
            Position: Chief Technical Officer
            Company: NLP Cloud
            ###
            Hi, I am a marketing assistant at Microsoft.
            Position: marketing assistant
            Company: Microsoft
            ###
            John was the CEO of AquaFun until 2020.
            Position: CEO
            Company: AquaFun
            ###
            I have been a Go developer for Google for 3 years, but now I think about quitting.",
    "length_no_input": true,
    "end_sequence": "###",
    "remove_input": true
}'

Output:

{"generated_text":"Position: Go developer
                    Company: Google
                    ###"}

It is really impressive how GPT-J handles entity extraction without even needing any re-training. Usually, extracting new types of entities (like name, position, country, etc.) requires a whole new process of annotation, training, and deployment. Here, it's completely seamless.
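Because GPT-J sticks to the "Position:" / "Company:" format we demonstrated, turning the raw output into structured data is a trivial parsing step. A minimal Python sketch, assuming generated_text holds the output above:

def parse_entities(generated_text):
    # Parse "Key: value" lines from the model output into a dict.
    entities = {}
    for line in generated_text.replace("###", "").splitlines():
        key, sep, value = line.partition(":")
        if sep and value.strip():
            entities[key.strip()] = value.strip()
    return entities

# parse_entities("Position: Go developer\nCompany: Google")
# -> {"Position": "Go developer", "Company": "Google"}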

Question Answering With GPT-J

curl "https://api.nlpcloud.io/v1/gpt-j/generation" -H "Authorization: Token your_token" -X POST -d '{
    "text":"Context: NLP Cloud was founded in 2021 when the team realized there was no easy way to reliably leverage NLP in production.
            Question: When was NLP Cloud founded?
            Answer: 2021
            ###
            Context: NLP Cloud developed their API by mid-2020 and they added many pre-trained open-source models since then.
            Question: What did NLP Cloud develop?
            Answer: API
            ###
            Context: The main challenge with GPT-J is memory consumption. Using a GPU plan is recommended.
            Question: Which plan is recommended for GPT-J?
            Answer:",
    "length_no_input": true,
    "end_sequence": "###",
    "remove_input": true
}'

Output:

{"generated_text":"GPU plan
                    ###"}

Question answering works very well. It can also be achieved with other, dedicated NLP models, though perhaps not with the same level of accuracy.

Tutorial Creation With GPT-J

curl "https://api.nlpcloud.io/v1/gpt-j/generation" -H "Authorization: Token your_token" -X POST -d '{
    "text":"Here a tutorial how to make a cake.
        
            1. Take some flour.
            2. Take some sugar.",
    "max_length": 500
}'

Output:

{"generated_text":"Here a tutorial how to make a cake.
        
                    1. Take some flour.
                    2. Take some sugar.
                    3. Take some butter.
                    4. Take some eggs.
                    5. Take some water.
                    6. Take some baking powder.
                    7. Take some vanilla.
                    8. Mix all together.
                    9. Bake in a pan.
                    10. Enjoy.
                    
                    Well, that's it. You can make this for your birthday or a party or you can even make it for your kids. They will love this."}

As you can see, it's quite impressive how GPT-J automatically follows your initial formatting, and the generated content is very good too. You might even make a proper cake out of this (we haven't tried yet, though).

Grammar and Spelling Correction with GPT-J

curl "https://api.nlpcloud.io/v1/gpt-j/generation" -H "Authorization: Token your_token" -X POST -d '{
    "text":"I love goin to the beach.
            Correction: I love going to the beach.
            ###
            Let me hav it!
            Correction: Let me have it!
            ###
            It have too many drawbacks.
            Correction: It has too many drawbacks.
            ###
            I do not wan to go
            Correction:",
    "length_no_input": true,
    "end_sequence": "###",
    "remove_input": true
}'

Output:

{"generated_text":"I do not want to go.
                    ###"}

Spelling and grammar correction works as expected. If you want to pinpoint the exact location of a mistake in the sentence, though, you might want to use a dedicated model.

Machine Translation with GPT-J

curl "https://api.nlpcloud.io/v1/gpt-j/generation" -H "Authorization: Token your_token" -X POST -d '{
    "text":"Hugging Face a révolutionné le NLP.
            Translation: Hugging Face revolutionized NLP.
            ###
            Cela est incroyable!
            Translation: This is unbelievable!
            ###
            Désolé je ne peux pas.
            Translation: Sorry but I cannot.
            ###
            NLP Cloud permet de déployer le NLP en production facilement.
            Translation:",
    "length_no_input": true,
    "end_sequence": "###",
    "remove_input": true
}'

Output:

{"generated_text":"NLP Cloud makes it easy to deploy NLP to production.
                    ###"}

Machine translation usually requires dedicated models (often one per language pair). Here, all languages are handled out of the box by GPT-J, which is quite impressive.

Tweet Generation with GPT-J

curl "https://api.nlpcloud.io/v1/gpt-j/generation" -H "Authorization: Token your_token" -X POST -d '{
    "text":"keyword: markets
            tweet: Take feedback from nature and markets, not from people
            ###
            keyword: children
            tweet: Maybe we die so we can come back as children.
            ###
            keyword: startups
            tweet: Startups should not worry about how to put out fires, they should worry about how to start them.
            ###
            keyword: NLP
            tweet:",
    "max_length": 200,
    "length_no_input": true,
    "end_sequence": "###",
    "remove_input": true
}'

Output:

{"generated_text":"People want a way to get the benefits of NLP without paying for it.
                    ###"}

This is a fun and easy way to generate short tweets that follow a given context.

Conversational chatbot with GPT-J

curl "https://api.nlpcloud.io/v1/gpt-j/generation" -H "Authorization: Token your_token" -X POST -d '{
    "text":"Hello nice to meet you.
            Nice to meet you too.
            ###
            How is it going today?
            Not so bad, thank you! How about you?
            ###
            I am ok, but I am a bit sad...
            Oh? Why that?
            ###
            I broke up with my girlfriend...",
    "length_no_input": true,
    "end_sequence": "###",
    "remove_input": true
}'

Output:

{"generated_text":"Oh? How did that happen?
                    ###"}

As you can see, GPT-J properly understands that you are in a conversational mode. And the very powerful thing is that if you change the tone of your context, the model's responses will follow the same tone (sarcasm, anger, curiosity...).
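In practice, a chatbot built this way simply keeps appending turns to the prompt, separated by the ### delimiter, so the model always sees the whole conversation. A rough Python sketch, reusing the hypothetical generate() helper from the beginning of this article:

history = [
    "Hello nice to meet you.\nNice to meet you too.",
    "How is it going today?\nNot so bad, thank you! How about you?",
]

def chat(user_message):
    # Build the full conversation as a ###-delimited prompt and let GPT-J
    # complete the last, unanswered turn.
    prompt = "\n###\n".join(history + [user_message])
    result = generate(prompt, length_no_input=True,
                      end_sequence="###", remove_input=True)
    reply = result["generated_text"].replace("###", "").strip()
    history.append(f"{user_message}\n{reply}")  # keep context for the next turn
    return reply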

Semantic Similarity

curl "https://api.nlpcloud.io/v1/gpt-j/generation" -H "Authorization: Token your_token" -X POST -d '{
    "text":"Sentence 1: I want to go to the beach.\nSentence 2: I would love to go to the beach.
            Sentence 2: I would love to go to the beach.
            Similar? Yes
            ###
            Sentence 1: I started coding when I was 2 years old.
            Sentence 2: I coded my first program when I was 2.
            Similar? Yes
            ###
            Sentence 1: My girlfriend loves me so much!
            Sentence 2: My girlfriend hates me...
            Similar? No
            ###
            Sentence 1: NLP is an amazing field, even if sometimes quite complex.
            Sentence 2: NLP is a great field, even if quite complex.
            Similar?",
    "length_no_input": true,
    "end_sequence": "###",
    "remove_input": true
}'

Output:

{"generated_text":"Yes
                    ###"}

Semantic similarity is a very powerful NLP feature, and GPT-J is very good at detecting it. You can try more complex examples and see that GPT-J often successfully detects whether two sentences have the same meaning.

Intent Classification

curl "https://api.nlpcloud.io/v1/gpt-j/generation" -H "Authorization: Token your_token" -X POST -d '{
    "text":"I want to start coding tomorrow because it seems to be so fun!
            Intent: start coding
            ###
            Show me the last pictures you have please.
            Intent: show pictures
            ###
            Search all these files as fast as possible.
            Intent: search files
            ###
            Can you please teach me Chinese next week?
            Intent:",
    "length_no_input": true,
    "end_sequence": "###",
    "remove_input": true
}'

Output:

{"generated_text":"learn chinese
                    ###"}

It is quite impressive how GPT-J detects the intent behind your sentence, and it works very well even for more complex sentences. You can also ask it to format the intent differently: for example, you could automatically generate a JavaScript function name like "learnChinese", as shown in the sketch below.
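For instance, here is a trivial Python sketch that turns an intent like "learn chinese" into a JavaScript-style function name:

def to_function_name(intent):
    # Convert a space-separated intent into a camelCase identifier.
    words = intent.strip().split()
    return words[0].lower() + "".join(w.capitalize() for w in words[1:])

# to_function_name("learn chinese") -> "learnChinese"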

Paraphrasing

curl "https://api.nlpcloud.io/v1/gpt-j/generation" -H "Authorization: Token your_token" -X POST -d '{
    "text":"Sentence: I would like to go with you to the cinema but my mother does not want me to...
            Paraphrase: I want to go with you to the cinema but my mother disagrees.
            ###
            Sentence: NLP is an amazing machine learning field, but it is also extremely complex.
            Paraphrase: NLP is a great machine learning area despite being very complex.
            ###
            Sentence: It all started last year when I purchased my first computer. I fell in love with programming!
            Paraphrase: I fell in love with programming last year.
            ###
            Sentence: I think I caught a cold yesterday, because I am not feeling too good right now...
            Paraphrase: ",
    "length_no_input": true,
    "end_sequence": "###",
    "remove_input": true
}'

Output:

{"generated_text":"I think I caught a cold yesterday, because I'm not feeling well right now...
                    ###"}

GPT-J made a slight modification to our sentence while keeping its meaning, which is what paraphrasing is about. You can encourage GPT-J to return more original paraphrases by passing different examples in the input.

Conclusion

As you can see, few-shot learning is a great technique that helps GPT-J and GPT-Neo tackle almost any NLP task! The key is to pass the right context before making your request.

Even for simple text generation, it is recommended to pass as much context as possible, in order to help the model.

We hope you found this useful! If you have any questions about how to make the most of these models, please don't hesitate to ask us.

Julien Salinas
CTO at NLPCloud.io