Generative pre-trained transformer
Generative pre-trained transformers (GPT) are a type of large language model (LLM)[1][2][3] and a prominent framework for generative artificial intelligence.[4][5] They are artificial neural networks that are used in natural language processing tasks.[6] GPTs are based on the transformer architecture, pre-trained on large data sets of unlabelled text, and able to generate novel human-like content.[2][3] As of 2023, most LLMs have these characteristics[7] and are sometimes referred to broadly as GPTs.[8]
The first GPT was introduced in 2018 by OpenAI.[9] OpenAI has released highly influential GPT foundation models that have been sequentially numbered, comprising its "GPT-n" series.[10] Each of these was significantly more capable than its predecessor, due to increased size (number of trainable parameters) and training. The most recent of these, GPT-4, was released in March 2023.[11] Such models have been the basis for OpenAI's more task-specific GPT systems, including models fine-tuned for instruction following, which in turn power the ChatGPT chatbot service.[1]
The term "GPT" is also used in the names and descriptions of such models developed by others. For example, other GPT foundation models include a series of models created by EleutherAI,[12] and seven models created by Cerebras in 2023.[13] Also, companies in different industries have developed task-specific GPTs in their respective fields, such as Salesforce's "EinsteinGPT" (for CRM)[14] and Bloomberg's "BloombergGPT" (for finance).[15]
History
Initial developments
Generative pretraining (GP) was a long-established concept in machine learning applications.[16][17][18] It was originally used as a form of semi-supervised learning, in which a model is first trained on an unlabelled dataset (the pretraining step) by learning to generate datapoints in that dataset, and is then trained on a labelled dataset to perform classification.[19]
While the unnormalized linear transformer dates back to 1992,[20][21][22] the modern transformer architecture was not available until 2017, when researchers at Google published it in the paper "Attention Is All You Need".[23] That development led to the emergence of large language models such as BERT in 2018,[24] which was a pre-trained transformer (PT) but was not designed to be generative (BERT was an "encoder-only" model).[25] Also around that time, in 2018, OpenAI published its article entitled "Improving Language Understanding by Generative Pre-Training", in which it introduced the first generative pre-trained transformer (GPT) system ("GPT-1").[26]
Prior to transformer-based architectures, the best-performing neural NLP (natural language processing) models commonly employed supervised learning from large amounts of manually labelled data. The reliance on supervised learning limited their use on datasets that were not well annotated, and also made it prohibitively expensive and time-consuming to train extremely large language models.[26]
The semi-supervised approach OpenAI employed to make a large-scale generative system, which it was the first to do with a transformer model, involved two stages: an unsupervised generative "pretraining" stage that sets initial parameters using a language modeling objective, and a supervised discriminative "fine-tuning" stage that adapts these parameters to a target task.[26]
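The cited paper does not include code, but the two-stage recipe it describes can be illustrated with a minimal, self-contained PyTorch sketch. Everything below is hypothetical (the model size, data, and names are invented for illustration) and omits details of the actual GPT architecture, such as byte-pair tokenization and the decoder-only block design; it only shows the pattern of generative pretraining on unlabelled text followed by discriminative fine-tuning on a labelled task.

```python
import torch
import torch.nn as nn

class TinyGPT(nn.Module):
    """Toy causally masked transformer with two heads sharing one backbone:
    a language-modeling head for pretraining and a classification head for
    fine-tuning. All sizes are arbitrary illustrative choices."""
    def __init__(self, vocab_size=1000, d_model=64, n_layers=2, n_classes=2):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.blocks = nn.TransformerEncoder(layer, num_layers=n_layers)
        self.lm_head = nn.Linear(d_model, vocab_size)   # generative pretraining head
        self.cls_head = nn.Linear(d_model, n_classes)   # discriminative fine-tuning head

    def forward(self, tokens):
        # Causal mask: each position may only attend to earlier positions.
        mask = nn.Transformer.generate_square_subsequent_mask(tokens.size(1))
        return self.blocks(self.embed(tokens), mask=mask)

model = TinyGPT()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

# Stage 1 - unsupervised generative "pretraining" on unlabelled text:
# predict token t+1 from tokens 1..t (the language modeling objective).
unlabelled = torch.randint(0, 1000, (8, 32))          # stand-in for a text corpus
hidden = model(unlabelled[:, :-1])
lm_loss = nn.functional.cross_entropy(
    model.lm_head(hidden).flatten(0, 1), unlabelled[:, 1:].flatten())
lm_loss.backward()
opt.step()
opt.zero_grad()

# Stage 2 - supervised discriminative "fine-tuning" on a labelled target task:
# the pretrained parameters are adapted, here to a toy two-class problem.
labelled = torch.randint(0, 1000, (8, 32))            # stand-in for labelled examples
labels = torch.randint(0, 2, (8,))
cls_loss = nn.functional.cross_entropy(
    model.cls_head(model(labelled).mean(dim=1)), labels)
cls_loss.backward()
opt.step()
opt.zero_grad()
```

In the actual GPT models, the pretraining stage accounts for the overwhelming majority of the computation, while the fine-tuning stage adapts the same pretrained parameters to a downstream task with comparatively little additional training.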
Later developments
Regarding more recent GPT foundation models, OpenAI published its first versions of GPT-3 in July 2020. There were three models, with 1B, 6.7B, and 175B parameters, named babbage, curie, and davinci respectively (giving the initials B, C, and D).
In July 2021, OpenAI published Codex, a task-specific GPT model targeted for programming applications. This was developed by fine-tuning a 12B parameter version of GPT-3 (different from previous GPT-3 models) using code from GitHub.[27]
In March 2022, OpenAI published two versions of GPT-3 that were fine-tuned for instruction following (instruction-tuned), named davinci-instruct-beta (175B) and text-davinci-001,[28] and then started beta testing code-davinci-002.[29] text-davinci-002 was instruction-tuned from code-davinci-002. text-davinci-003 and ChatGPT were both released in November 2022, and both build upon text-davinci-002 via reinforcement learning from human feedback (RLHF). text-davinci-003 is trained for following instructions (like its predecessors), whereas ChatGPT is further trained for conversational interaction with a human user.[30][31]
OpenAI's most recent GPT foundation model, GPT-4, was released on March 14, 2023. It can be accessed directly by users via a premium version of ChatGPT, and is available to developers for incorporation into other products and services via OpenAI's API. Other producers of GPT foundation models include EleutherAI (with a series of models starting in March 2021)[12] and Cerebras (with seven models released in March 2023).[13]
Brand issues
OpenAI, which created the first generative pre-trained transformer (GPT) in 2018, has recently asserted that "GPT" should be regarded as a brand of OpenAI.[73] In April 2023, OpenAI revised the brand guidelines in its terms of service to indicate that other businesses using its API to run their artificial intelligence (AI) services would no longer be able to include "GPT" in such names or branding.[74] In May 2023, OpenAI engaged a brand management service to notify its API customers of this policy, although these notifications stopped short of making overt legal claims (such as allegations of trademark infringement or demands to cease and desist).[73] As of November 2023, OpenAI still prohibits its API licensees from naming their own products with "GPT",[75] but it has begun enabling its ChatGPT Plus subscribers to make "custom versions of ChatGPT" that are being called GPTs on the OpenAI site.[76] OpenAI's terms of service state that its subscribers may use "GPT" in the names of these, although this is "discouraged".[75]
Relatedly, OpenAI has applied to the United States Patent and Trademark Office (USPTO) to seek domestic trademark registration for the term "GPT" in the field of AI.[73] OpenAI sought to expedite handling of its application, but the USPTO declined that request in April 2023.[77] In May 2023, the USPTO responded to the application with a determination that "GPT" was both descriptive and generic.[78] As of November 2023, OpenAI continues to pursue its argument through the available processes. Regardless, failure to obtain a registered U.S. trademark does not preclude some level of common-law trademark rights in the U.S.,[79] and/or trademark rights in other countries.[80]
For any given type or scope of trademark protection in the U.S., OpenAI would need to establish that the term is actually "distinctive" to its specific offerings, in addition to being a broader technical term for the kind of technology. Some media reports suggested that OpenAI may be able to obtain trademark registration based indirectly on the fame of its GPT-based chatbot product, ChatGPT,[77][81] for which OpenAI has separately sought protection (and which it has sought to enforce more strongly).[82] Other reports have indicated that registration for the bare term "GPT" seems unlikely to be granted,[73][83] as it is used frequently as a common term to refer simply to AI systems that involve generative pre-trained transformers.[3][84][85][86] In any event, to whatever extent exclusive rights in the term may arise in the U.S., others would need to avoid using it for similar products or services in ways likely to cause confusion.[83][87] If such rights ever became broad enough to implicate other well-established uses in the field, the trademark doctrine of descriptive fair use could still permit continued non-brand-related usage.[88]