Keyword/Keyphrase Extraction API, based on GPT-J

What is Keyword/Keyphrase Extraction and Why Use GPT-J?

Keyword extraction consists of extracting one or several important words from a piece of text. These words should capture the core ideas of the text.

For example, imagine you have the following content:

Information Retrieval (IR) is the process of obtaining resources relevant to the information need. For instance, a search query on a web search engine can be an information need. The search engine can return web pages that represent relevant resources.

The important keywords in this example could be information, resources, search.

If keywords are too simple, you might want to extract keyphrases: a combination of several words. For example, in the above content, important keyphrases could be information retrieval, relevant resources, search query, search engine.

Performing keyword and keyphrase extraction is harder than it sounds. It takes an advanced AI model to understand the core ideas of a piece of text.

GPT-J is one of the most advanced open-source NLP models as of this writing, and a strong open-source alternative to GPT-3. The model is so large that it can adapt to many situations, and its output often reads as if a human had written it. For advanced use cases, it is possible to fine-tune GPT-J (train it with your own data), which is a great way to perform keyword extraction tailored to your use case or industry.

Why Use Keyword/Keyphrase Extraction?

Keyword and keyphrase extraction is a great way to quickly get a good grasp of a piece of text, and potentially to categorize the text for later use. Here are a few examples:

Social Media Analysis

Tons of ideas are shared on social media, and you might want to understand the main ideas behind this chaos. Keyword/keyphrase extraction lets you do this quickly.

Customer Feedback

Asking for customer feedback is a great practice, but it takes a lot of time to properly analyze the results. Keyword/keyphrase extraction makes this qualitative analysis much easier.

Competition Monitoring

Do you want to monitor your competitors' brands? You can easily do it by retrieving their content and extracting the most important ideas.

SEO

Finding the right keywords for your positioning can be tricky. One strategy is to analyze your competitors' websites and understand which keywords they are targeting.

Keyword/Keyphrase Extraction with GPT-J

To make the most of GPT-J, it is crucial to keep in mind the so-called few-shot learning technique: by giving only a couple of examples to the AI in the prompt, it is possible to dramatically improve the relevance of the results, without even training a dedicated model.
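To make the idea concrete, here is a minimal sketch of how a few-shot prompt for keyword extraction might be assembled in Python. The example texts, the "Content:" and "Keywords:" labels, the "###" separator, and the helper names build_few_shot_prompt and parse_keywords are all illustrative assumptions, not a format required by GPT-J.

```python
# Hand-written demonstrations: (content, expected keywords) pairs.
# These examples are illustrative assumptions, not taken from a real dataset.
EXAMPLES = [
    (
        "Information Retrieval (IR) is the process of obtaining resources "
        "relevant to the information need.",
        ["information retrieval", "relevant resources"],
    ),
    (
        "A search query on a web search engine can be an information need.",
        ["search query", "search engine"],
    ),
]

SEPARATOR = "###"  # assumed delimiter between demonstrations


def build_few_shot_prompt(examples, new_content):
    """Concatenate the demonstrations, then leave the last answer open."""
    blocks = [
        f"Content: {content}\nKeywords: {', '.join(keywords)}"
        for content, keywords in examples
    ]
    blocks.append(f"Content: {new_content}\nKeywords:")
    return f"\n{SEPARATOR}\n".join(blocks)


def parse_keywords(completion):
    """Keep the text generated before the next separator and split on commas."""
    answer = completion.split(SEPARATOR)[0]
    return [kw.strip() for kw in answer.split(",") if kw.strip()]


prompt = build_few_shot_prompt(
    EXAMPLES,
    "The search engine can return web pages that represent relevant resources.",
)
print(prompt)
```

The resulting prompt would be sent to a text generation model, which is expected to continue the text after the last "Keywords:" label. parse_keywords then turns the raw completion into a clean list, ignoring anything generated after the next separator.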

Sometimes, few-shot learning is not enough (for example if your extraction relies on very specific content, bound to your use case or your industry). In that case, the best solution is to fine-tune (train) GPT-J with your own data.
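As a rough sketch of what "your own data" could look like, the snippet below serializes a few labeled examples to JSON Lines, one record per line. The field names text and keywords, the sample records, and the helper to_jsonl are hypothetical. The exact format expected by a given fine-tuning tool may differ, so check its documentation first.

```python
import json

# Hypothetical labeled examples from a specific industry (here: e-commerce).
records = [
    {
        "text": "The parcel arrived two weeks late and the box was damaged.",
        "keywords": ["late delivery", "damaged box"],
    },
    {
        "text": "Checkout failed twice when I tried to pay with my credit card.",
        "keywords": ["checkout failure", "credit card payment"],
    },
]


def to_jsonl(rows):
    """Serialize each record as one JSON object per line (JSON Lines)."""
    return "\n".join(json.dumps(row, ensure_ascii=False) for row in rows)


dataset = to_jsonl(records)
print(dataset)
```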

Building an inference API for keyword/keyphrase extraction based on GPT-J is a necessary step as soon as you want to use extraction in production. But building such an API is hard. First, you need to code the API (the easy part), but you also need to build a highly available, fast, and scalable infrastructure to serve your models under the hood (the hardest part). This is especially hard for machine learning models, as they consume a lot of resources (memory, disk space, CPU, GPU...).

Such an API is interesting because it is completely decoupled from the rest of your stack (microservice architecture), so you can easily scale it independently, and you can access it using any programming language. Most machine learning frameworks are developed in Python, but you will likely want to access them from other languages like JavaScript, Go, Ruby...
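Because the model sits behind an HTTP API, a client only needs to send a JSON request. Here is a minimal sketch using Python's standard library. The URL, the token, the "Token" authorization scheme, and the shape of the JSON body are placeholders chosen for illustration, not the contract of any specific service.

```python
import json
import urllib.request

# Placeholder values: replace with the real endpoint and token of your API.
API_URL = "https://api.example.com/v1/keyword-extraction"  # hypothetical URL
API_TOKEN = "your-api-token"  # hypothetical credential


def build_request(text):
    """Build (but do not send) a JSON POST request for the given text."""
    payload = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=payload,
        headers={
            "Authorization": f"Token {API_TOKEN}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def extract_keywords(text):
    """Send the request and decode the JSON response (requires a live API)."""
    with urllib.request.urlopen(build_request(text), timeout=30) as response:
        return json.loads(response.read().decode("utf-8"))


request = build_request("Information Retrieval (IR) is the process of obtaining resources.")
print(request.get_method(), request.full_url)
```

The same request can be reproduced with any HTTP client in JavaScript, Go, or Ruby, which is what makes the decoupled microservice approach convenient.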

NLP Cloud's Keyword/Keyphrase Extraction API

NLP Cloud offers a keyword/keyphrase extraction API based on GPT-J that lets you perform extraction out of the box, with breathtaking results. If the base GPT-J model is not enough, you can also fine-tune/train GPT-J on NLP Cloud and automatically deploy the new model to production with only one click.

For more details, see our documentation about text generation with GPT-J. Also see our few-shot learning example dedicated to keyword/keyphrase extraction and easily test GPT-J extraction on our playground.