History and background

The average-case performance of algorithms has been studied since modern notions of computational efficiency were developed in the 1950s. Much of this initial work focused on problems for which worst-case polynomial-time algorithms were already known.^[3] In 1973, Donald Knuth^[4] published Volume 3 of The Art of Computer Programming, which extensively surveys the average-case performance of algorithms for problems solvable in worst-case polynomial time, such as sorting and median-finding.

An efficient algorithm for $NP$-complete problems is generally characterized as one which runs in polynomial time on all inputs; this is equivalent to requiring efficient worst-case complexity. However, an algorithm which is inefficient on a "small" number of inputs may still be efficient on "most" inputs that occur in practice. Thus, it is desirable to study the properties of algorithms whose average-case complexity may differ from their worst-case complexity, and to find methods to relate the two.

The fundamental notions of average-case complexity were developed by Leonid Levin in 1986, when he published a one-page paper^[5] defining average-case complexity and completeness while giving an example of a complete problem for $distNP$, the average-case analogue of $NP$.

Definitions

Efficient average-case complexity

The first task is to define precisely what it means for an algorithm to be efficient "on average". An initial attempt might define an efficient average-case algorithm as one which runs in expected polynomial time over all possible inputs. Such a definition has various shortcomings; in particular, it is not robust to changes in the computational model. For example, suppose algorithm $A$ runs in time $t_A(x)$ on input $x$ and algorithm $B$ runs in time $t_A(x)^2$ on input $x$; that is, $B$ is quadratically slower than $A$. Intuitively, any definition of average-case efficiency should capture the idea that $A$ is efficient on average if and only if $B$ is efficient on average. Suppose, however, that the inputs are drawn uniformly at random from the strings of length $n$, and that $A$ runs in time $n^2$ on all inputs except the string $1^n$, on which $A$ takes time $2^n$. Then the expected running time of $A$ is $n^2(1 - 2^{-n}) + 2^{-n} \cdot 2^n < n^2 + 1$, which is polynomial, whereas the expected running time of $B$ is at least $2^{-n} \cdot 2^{2n} = 2^n$, which is exponential.^[3]
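The arithmetic behind this example can be checked numerically. The following is a minimal sketch, assuming the running times are exactly $n^2$ and $2^n$ steps as in the example; the function name `expected_times` is hypothetical and introduced only for illustration.

```python
from fractions import Fraction


def expected_times(n):
    """Exact expected running times of A and B on uniformly random n-bit inputs.

    Assumes t_A(x) = n^2 on every input except 1^n, where t_A = 2^n,
    and t_B(x) = t_A(x)^2, as in the example in the text.
    """
    total = 2 ** n                  # number of n-bit strings
    typical = n ** 2                # t_A on an ordinary input
    bad = 2 ** n                    # t_A on the single slow input 1^n
    exp_a = Fraction((total - 1) * typical + bad, total)
    exp_b = Fraction((total - 1) * typical ** 2 + bad ** 2, total)
    return exp_a, exp_b


if __name__ == "__main__":
    for n in (8, 16, 32, 64):
        exp_a, exp_b = expected_times(n)
        print(n, float(exp_a), float(exp_b))
```

For every $n$ the first value stays below $n^2 + 1$, while the second is at least $2^n$, so squaring the running time moves the algorithm from "expected polynomial" to "expected exponential".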

To create a more robust definition of average-case efficiency, it makes sense to allow an algorithm $A$ to run longer than polynomial time on some inputs, provided that the fraction of inputs on which $A$ requires larger and larger running time becomes smaller and smaller. This intuition is captured by the definition of average polynomial running time, which balances the polynomial trade-off between running time and fraction of inputs.
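A commonly stated form of this condition is sketched below. The symbols are assumptions introduced for illustration: $D_n$ denotes the input distribution on strings of length $n$, $p$ is a polynomial, $\varepsilon > 0$ is a constant, and $t_A(x)$ is the running time of $A$ on input $x$.

```latex
\[
  \exists\ \text{polynomial } p,\ \exists\ \varepsilon > 0
  \ \text{ such that }\ \forall n,\ \forall t > 0:
  \qquad
  \Pr_{x \sim D_n}\bigl[\, t_A(x) \ge t \,\bigr] \;\le\; \frac{p(n)}{t^{\varepsilon}}
\]
```

Under this form, the algorithm $A$ from the example above would qualify with $p(n) = n^2$ and $\varepsilon = 1$, and $B$ would qualify with the same $p$ and $\varepsilon = 1/2$, since $t_A(x)^2 \ge t$ exactly when $t_A(x) \ge t^{1/2}$. The condition is therefore unaffected by a polynomial slowdown, unlike the expected-running-time definition.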

Applications

Sorting algorithms

As mentioned above, much early work relating to average-case complexity focused on problems for which polynomial-time algorithms already existed, such as sorting. For example, many sorting algorithms that use randomness, such as Quicksort, have a worst-case running time of $O(n^2)$ but an average-case running time of $O(n \log n)$, where $n$ is the length of the input to be sorted.^[2]
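The gap between the two bounds can be observed by counting comparisons. The sketch below is illustrative only and makes assumptions not stated in the text: it uses a Quicksort variant that always takes the first element as pivot (so that the worst case is easy to trigger with already-sorted input), it counts one comparison per non-pivot element in each partition step, and the names `quicksort_comparisons` and `average_comparisons` are hypothetical.

```python
import random


def quicksort_comparisons(items):
    """Count comparisons made by a first-element-pivot Quicksort on `items`."""
    comparisons = 0
    stack = [list(items)]           # explicit stack avoids deep recursion
    while stack:
        part = stack.pop()
        if len(part) <= 1:
            continue
        pivot, rest = part[0], part[1:]
        # Counted as one comparison per remaining element, as in a standard partition.
        comparisons += len(rest)
        stack.append([x for x in rest if x < pivot])
        stack.append([x for x in rest if x >= pivot])
    return comparisons


def average_comparisons(n, trials=200, seed=0):
    """Average comparison count over random permutations of 0..n-1."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        perm = list(range(n))
        rng.shuffle(perm)
        total += quicksort_comparisons(perm)
    return total / trials


if __name__ == "__main__":
    n = 200
    print("sorted input:", quicksort_comparisons(range(n)))
    print("random average:", average_comparisons(n))
```

On already-sorted input of length $n$ this variant performs exactly $n(n-1)/2$ comparisons, whereas the average over random permutations grows on the order of $n \log n$ and is far smaller for the same $n$.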

Cryptography

For most problems, average-case complexity analysis is undertaken to find efficient algorithms for a problem that is considered difficult in the worst case. In cryptographic applications, however, the opposite is true: the worst-case complexity is irrelevant; we instead want a guarantee that every algorithm which "breaks" the cryptographic scheme is inefficient on average.^[11]

Thus, all secure cryptographic schemes rely on the existence of one-way functions.^[3] Although the existence of one-way functions is still an open problem, many candidate one-way functions are based on hard problems such as integer factorization or computing the discrete logarithm. Note that it is not desirable for the candidate function to be $NP$-complete, since this would only guarantee that there is likely no efficient algorithm for solving the problem in the worst case; what is actually wanted is a guarantee that no efficient algorithm can solve the problem over random inputs (i.e. the average case). In fact, both the integer factorization and discrete logarithm problems are in $NP \cap coNP$, and are therefore not believed to be $NP$-complete.^[7] The fact that all of cryptography is predicated on the existence of average-case intractable problems in $NP$ is one of the primary motivations for studying average-case complexity.
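The asymmetry that a candidate one-way function is meant to provide can be illustrated with modular exponentiation, whose inverse is the discrete logarithm. The sketch below is a toy under stated assumptions: the prime `P = 101` and generator `G = 2` are far too small to be secure, the helper names are hypothetical, and exhaustive search is only the most naive inversion method, so the example shows the shape of the problem and not evidence of hardness.

```python
P = 101   # toy prime modulus (assumption for illustration; not secure)
G = 2     # 2 generates the multiplicative group modulo 101


def forward(x):
    """Easy direction: compute G^x mod P with fast modular exponentiation."""
    return pow(G, x, P)


def brute_force_dlog(y):
    """Hard direction (naively): find x in [0, P-2] with G^x = y (mod P).

    Exhaustive search takes up to P - 1 multiplications, which is exponential
    in the bit length of P, whereas `forward` needs only polynomially many.
    """
    value = 1
    for x in range(P - 1):
        if value == y:
            return x
        value = (value * G) % P
    return None


if __name__ == "__main__":
    secret = 57
    y = forward(secret)
    print(y, brute_force_dlog(y))
```

For a scheme built on such a function, what matters is that inversion be infeasible for a randomly chosen exponent, which is an average-case requirement and not a worst-case one.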

Other results

In 1990, Impagliazzo and Levin showed that if there is an efficient average-case algorithm for a $distNP$-complete problem under the uniform distribution, then there is an efficient average-case algorithm for every problem in $NP$ under any polynomial-time samplable distribution.^[12] Applying this theory to natural distributional problems remains an outstanding open question.^[3]

In 1992, Ben-David et al. showed that if all languages in $distNP$ have good-on-average decision algorithms, they also have good-on-average search algorithms. Further, they showed that this conclusion holds under a weaker assumption: if every language in $NP$ is easy on average for decision algorithms with respect to the uniform distribution, then it is also easy on average for search algorithms with respect to the uniform distribution.^[13] Thus, cryptographic one-way functions can exist only if there are $distNP$ problems over the uniform distribution that are hard on average for decision algorithms.

In 1993, Feigenbaum and Fortnow showed that it is not possible to prove, under non-adaptive random reductions, that the existence of a good-on-average algorithm for a $distNP$-complete problem under the uniform distribution implies the existence of worst-case efficient algorithms for all problems in $NP$.^[14] In 2003, Bogdanov and Trevisan generalized this result to arbitrary non-adaptive reductions.^[15] These results suggest that it is unlikely that average-case complexity and worst-case complexity can be related via reductions.^[3]

See also

Probabilistic analysis of algorithms

NP-complete problems

Worst-case complexity

Amortized analysis

Best, worst and average case

Further reading

The literature of average case complexity includes the following work:

Franco, John (1986), "On the probabilistic performance of algorithms for the satisfiability problem", Information Processing Letters, 23 (2): 103–106, doi:10.1016/0020-0190(86)90051-7.

Levin, Leonid (1986), "Average case complete problems", SIAM Journal on Computing, 15 (1): 285–286, doi:10.1137/0215020.

Flajolet, Philippe; Vitter, J. S. (August 1987), Average-case analysis of algorithms and data structures, Tech. Report, Institut National de Recherche en Informatique et en Automatique, B.P. 105-78153 Le Chesnay Cedex France.

Gurevich, Yuri; Shelah, Saharon (1987), "Expected computation time for Hamiltonian path problem", SIAM Journal on Computing, 16 (3): 486–502, CiteSeerX 10.1.1.359.8982, doi:10.1137/0216034.

Ben-David, Shai; Chor, Benny; Goldreich, Oded; Luby, Michael (1989), "On the theory of average case complexity", Proc. 21st Annual Symposium on Theory of Computing, Association for Computing Machinery, pp. 204–216.

Gurevich, Yuri (1991), "Average case completeness", Journal of Computer and System Sciences, 42 (3): 346–398, doi:10.1016/0022-0000(91)90007-R, hdl:2027.42/29307. See also 1989 draft.

Selman, B.; Mitchell, D.; Levesque, H. (1992), "Hard and easy distributions of SAT problems", Proc. 10th National Conference on Artificial Intelligence, pp. 459–465.

Schuler, Rainer; Yamakami, Tomoyuki (1992), "Structural average case complexity", Proc. Foundations of Software Technology and Theoretical Computer Science, Lecture Notes in Computer Science, vol. 652, Springer-Verlag, pp. 128–139.

Reischuk, Rüdiger; Schindelhauer, Christian (1993), "Precise average case complexity", Proc. 10th Annual Symposium on Theoretical Aspects of Computer Science, pp. 650–661.

Venkatesan, R.; Rajagopalan, S. (1992), "Average case intractability of matrix and Diophantine problems", Proc. 24th Annual Symposium on Theory of Computing, Association for Computing Machinery, pp. 632–642.

Cox, Jim; Ericson, Lars; Mishra, Bud (1995), The average case complexity of multilevel syllogistic (PDF), Technical Report TR1995-711, New York University Computer Science Department.

Impagliazzo, Russell (April 17, 1995), A personal view of average-case complexity, University of California, San Diego.

Paul E. Black, "Θ", in Dictionary of Algorithms and Data Structures [online], Paul E. Black, ed., U.S. National Institute of Standards and Technology. 17 December 2004. Retrieved Feb. 20/09.

Christos Papadimitriou (1994). Computational Complexity. Addison-Wesley.