Generative model

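In statistical classification, the two main approaches are the generative approach and the discriminative approach. They construct classifiers in different ways and differ in the degree of statistical modelling involved. Terminology is inconsistent,^[a] but three major types can be distinguished, following Jebara (2004): a generative model, which is a statistical model of the joint probability distribution $P(X,Y)$ of an observable variable X and a target variable Y; a discriminative model, which is a model of the conditional probability $P(Y\mid X=x)$ of the target Y given an observation x; and classifiers computed without a probability model, which are also loosely called "discriminative".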

This article is about generative models in the context of statistical classification. For generative models of Markov decision processes, see Markov decision process § Simulator models. For Generative Modelling Language (GML) in computer graphics and generative computer programming, see Generative Modelling Language. For Generative Artificial Intelligence (Generative A.I.) models/systems, see Generative artificial intelligence.

The distinction between these last two classes is not consistently made.^[4] Jebara (2004) refers to the three classes as generative learning, conditional learning, and discriminative learning, whereas Ng & Jordan (2002) distinguish only two classes: generative classifiers (joint distribution) and discriminative classifiers (conditional distribution or no distribution), without separating the latter two.^[5] Analogously, a classifier based on a generative model is a generative classifier, while a classifier based on a discriminative model is a discriminative classifier, though the latter term also refers to classifiers that are not based on a model.

Standard examples of each, all of which are linear classifiers, are:

In application to classification, one wishes to go from an observation x to a label y (or a probability distribution on labels). One can compute this directly, without using a probability distribution (distribution-free classifier); one can estimate the probability of a label given an observation, $P(Y|X=x)$ (discriminative model), and base classification on that; or one can estimate the joint distribution $P(X,Y)$ (generative model), compute the conditional probability $P(Y|X=x)$ from it, and then base classification on that. These approaches are increasingly indirect but increasingly probabilistic, allowing more domain knowledge and probability theory to be applied. In practice, different approaches are used depending on the particular problem, and hybrids can combine the strengths of multiple approaches.
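The generative route described above (estimate the joint distribution, derive the conditional, then classify) can be sketched in a few lines of Python. This is a minimal illustration, not a definitive implementation: the joint table and the helper names `conditional_from_joint` and `classify` are hypothetical, and the probabilities are made up.

```python
# Hypothetical joint distribution P(X, Y) over a binary observation X
# and a binary label Y; the four numbers are made up for illustration.
joint = {
    (0, 0): 0.30, (0, 1): 0.10,
    (1, 0): 0.15, (1, 1): 0.45,
}  # keys are (x, y)


def conditional_from_joint(joint, x):
    """Return P(Y | X = x) as a dict, via P(x, y) / sum over y' of P(x, y')."""
    marginal_x = sum(p for (xi, _), p in joint.items() if xi == x)
    return {y: p / marginal_x for (xi, y), p in joint.items() if xi == x}


def classify(joint, x):
    """Pick the label with the highest conditional probability."""
    posterior = conditional_from_joint(joint, x)
    return max(posterior, key=posterior.get)


for x in (0, 1):
    print(x, conditional_from_joint(joint, x), classify(joint, x))
```

A discriminative model would instead estimate the output of `conditional_from_joint` directly, and a distribution-free classifier would produce only the output of `classify`.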

An alternative division defines these symmetrically:

a generative model is a model of the conditional probability of the observable X, given a target y, symbolically, $P(X\mid Y=y)$;^[2]

a discriminative model is a model of the conditional probability of the target Y, given an observation x, symbolically, $P(Y\mid X=x)$.^[3]
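The two conditional probabilities in these definitions are linked by Bayes' rule. As a short worked identity (the class prior $P(Y=y)$ is an extra ingredient, not named in the definitions above, that must also be estimated):

$$
P(Y = y \mid X = x)
  = \frac{P(X = x \mid Y = y)\, P(Y = y)}
         {\sum_{y'} P(X = x \mid Y = y')\, P(Y = y')}
$$

So a model of $P(X\mid Y=y)$ together with a prior over labels yields the quantity that a discriminative model estimates directly.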

Contrast with discriminative classifiers

A generative algorithm models how the data was generated in order to categorize a signal. It asks the question: based on my generation assumptions, which category is most likely to generate this signal? A discriminative algorithm does not care how the data was generated; it simply categorizes a given signal. Discriminative algorithms therefore try to learn $p(y|x)$ directly from the data and use it to classify. Generative algorithms, by contrast, try to learn $p(x,y)$, which can later be transformed into $p(y|x)$ to classify the data. One advantage of generative algorithms is that $p(x,y)$ can be used to generate new data similar to existing data. On the other hand, it has been proved that some discriminative algorithms give better performance than some generative algorithms in classification tasks.^[6]
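The contrast can be made concrete with a small sketch in plain Python. Everything here is an assumption chosen for illustration: the toy one-dimensional data, the choice of one Gaussian per class for the generative model, logistic regression trained by gradient descent for the discriminative model, and all function names.

```python
import math
import random

random.seed(0)

# Toy 1-D data (an assumption for illustration): class 0 is centred at -2,
# class 1 at +2, both with unit variance.
data = [(random.gauss(-2.0, 1.0), 0) for _ in range(200)] + \
       [(random.gauss(2.0, 1.0), 1) for _ in range(200)]


# Generative: model p(x, y) = p(y) * p(x | y) with one Gaussian per class.
def fit_generative(data):
    params = {}
    for label in (0, 1):
        xs = [x for x, y in data if y == label]
        mean = sum(xs) / len(xs)
        var = sum((x - mean) ** 2 for x in xs) / len(xs)
        params[label] = (len(xs) / len(data), mean, var)  # prior, mean, variance
    return params


def joint_density(params, x, label):
    prior, mean, var = params[label]
    return prior * math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)


def generative_predict(params, x):
    # argmax over y of p(x, y) equals argmax of p(y | x): p(x) does not depend on y.
    return max(params, key=lambda label: joint_density(params, x, label))


def generative_sample(params):
    # Draw y from the prior, then x from p(x | y): a new synthetic (x, y) pair.
    label = 0 if random.random() < params[0][0] else 1
    _, mean, var = params[label]
    return random.gauss(mean, math.sqrt(var)), label


# Discriminative: model p(y = 1 | x) = sigmoid(w * x + b) directly.
def fit_discriminative(data, lr=0.1, epochs=200):
    w, b = 0.0, 0.0
    for _ in range(epochs):
        grad_w = grad_b = 0.0
        for x, y in data:
            err = 1.0 / (1.0 + math.exp(-(w * x + b))) - y
            grad_w += err * x
            grad_b += err
        w -= lr * grad_w / len(data)
        b -= lr * grad_b / len(data)
    return w, b


def discriminative_predict(w, b, x):
    return 1 if w * x + b > 0 else 0


params = fit_generative(data)
w, b = fit_discriminative(data)
print(generative_predict(params, 1.5), discriminative_predict(w, b, 1.5))
print(generative_sample(params))  # only the generative model can produce new data
```

Both models can label a new point, but only the generative one has a `generative_sample` counterpart, because the discriminative model never describes how x itself is distributed.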

Although discriminative models do not need to model the distribution of the observed variables, they generally cannot express complex relationships between the observed and target variables. Nor do they necessarily perform better than generative models at classification and regression tasks. The two classes are seen as complementary or as different views of the same procedure.^[7]

Deep generative models

With the rise of deep learning, a new family of methods, called deep generative models (DGMs),^[8]^[9] has emerged from the combination of generative models and deep neural networks. An increase in the scale of the neural networks is typically accompanied by an increase in the scale of the training data, both of which are required for good performance.^[10]

Popular DGMs include variational autoencoders (VAEs), generative adversarial networks (GANs), and auto-regressive models. Recently, there has been a trend to build very large deep generative models.^[8] For example, GPT-3 and its precursor GPT-2^[11] are auto-regressive neural language models that contain billions of parameters; BigGAN^[12] and VQ-VAE,^[13] which are used for image generation, can have hundreds of millions of parameters; and Jukebox is a very large generative model for musical audio that contains billions of parameters.^[14]
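As a brief illustration of what "auto-regressive" means for such models: they factor the joint probability of a sequence with the chain rule and learn each conditional term. The notation $x_1,\dots,x_T$ for the elements of a sequence (for example, text tokens) is an assumption introduced for this sketch:

$$
p(x_1,\dots,x_T) = \prod_{t=1}^{T} p(x_t \mid x_1,\dots,x_{t-1})
$$

Sampling then proceeds one element at a time, each drawn conditioned on the elements already generated.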

Types