
Parallel computing

Parallel computing is a type of computation in which many calculations or processes are carried out simultaneously.[1] Large problems can often be divided into smaller ones, which can then be solved at the same time. There are several different forms of parallel computing: bit-level, instruction-level, data, and task parallelism. Parallelism has long been employed in high-performance computing, but has gained broader interest due to the physical constraints preventing frequency scaling.[2] As power consumption (and consequently heat generation) by computers has become a concern in recent years,[3] parallel computing has become the dominant paradigm in computer architecture, mainly in the form of multi-core processors.[4]

"Parallelization" redirects here. For parallelization of manifolds, see Parallelization (mathematics).

In computer science, parallelism and concurrency are two different things: a parallel program uses multiple CPU cores, each core performing a task independently. Concurrency, on the other hand, enables a program to deal with multiple tasks even on a single CPU core; the core switches between tasks (i.e. threads) without necessarily completing each one before moving on. A program can exhibit parallelism, concurrency, both, or neither.[5]
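As a minimal illustration of this distinction, the following Python sketch (the function name `count_down` and the iteration counts are arbitrary choices, not drawn from the article) runs the same CPU-bound task concurrently with threads, which on CPython take turns on one core, and in parallel with separate processes, which can occupy several cores at once.

```python
# Minimal sketch: concurrency with threads vs. parallelism with processes.
import threading
import multiprocessing


def count_down(n: int) -> None:
    """A CPU-bound toy task: decrement a counter to zero."""
    while n > 0:
        n -= 1


def run_concurrently() -> None:
    # Two threads share one interpreter; on CPython the global interpreter
    # lock means they take turns on a core: concurrent, but not parallel.
    threads = [threading.Thread(target=count_down, args=(5_000_000,)) for _ in range(2)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()


def run_in_parallel() -> None:
    # Two processes each get their own interpreter and can run on
    # different cores at the same time: true parallelism.
    procs = [multiprocessing.Process(target=count_down, args=(5_000_000,)) for _ in range(2)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()


if __name__ == "__main__":
    run_concurrently()
    run_in_parallel()
```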


Parallel computers can be roughly classified according to the level at which the hardware supports parallelism, with multi-core and multi-processor computers having multiple processing elements within a single machine, while clusters, MPPs, and grids use multiple computers to work on the same task. Specialized parallel computer architectures are sometimes used alongside traditional processors, for accelerating specific tasks.


In some cases parallelism is transparent to the programmer, such as in bit-level or instruction-level parallelism, but explicitly parallel algorithms, particularly those that use concurrency, are more difficult to write than sequential ones,[6] because concurrency introduces several new classes of potential software bugs, of which race conditions are the most common. Communication and synchronization between the different subtasks are typically some of the greatest obstacles to getting optimal parallel program performance.
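A minimal Python sketch of the most common such bug, a race condition, is shown below; the shared counter and iteration counts are arbitrary, and the unsynchronized version may lose updates on any given run.

```python
# Illustrative sketch of a race condition on a shared counter, and the
# synchronized version that removes it.
import threading

counter = 0
lock = threading.Lock()


def unsafe_increment(times: int) -> None:
    global counter
    for _ in range(times):
        # Read-modify-write is not atomic: two threads can read the same
        # old value, and one of the two updates is then lost.
        counter += 1


def safe_increment(times: int) -> None:
    global counter
    for _ in range(times):
        with lock:  # synchronization serializes the update
            counter += 1


def run(worker) -> int:
    global counter
    counter = 0
    threads = [threading.Thread(target=worker, args=(100_000,)) for _ in range(4)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter


if __name__ == "__main__":
    print("without lock:", run(unsafe_increment))  # may be less than 400000
    print("with lock:   ", run(safe_increment))    # always 400000
```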


A theoretical upper bound on the speed-up of a single program as a result of parallelization is given by Amdahl's law, which states that the speed-up is limited by the fraction of time for which the parallelization can be utilised:

$$ S_\text{latency}(s) = \frac{1}{(1 - p) + \frac{p}{s}} $$

where:

- S_latency is the potential speedup in latency of the execution of the whole task;
- s is the speedup in latency of the execution of the parallelizable part of the task;
- p is the proportion of the execution time of the whole task that the parallelizable part occupied before parallelization.
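A small worked example, using only the definitions above, evaluates the formula for a program whose parallelizable fraction is 90%; the chosen values of p and s are arbitrary.

```python
# Worked example of Amdahl's law: S_latency(s) = 1 / ((1 - p) + p / s).
def amdahl_speedup(p: float, s: float) -> float:
    """Overall speedup when a fraction p of the work is sped up by a factor s."""
    return 1.0 / ((1.0 - p) + p / s)


if __name__ == "__main__":
    # If 90% of a program parallelizes perfectly, even a very large number of
    # processors cannot push the overall speedup past 10x.
    for s in (2, 4, 16, 1024):
        print(f"p=0.90, s={s:5d} -> overall speedup {amdahl_speedup(0.90, s):.2f}")
```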

Hardware

Memory and communication

Main memory in a parallel computer is either shared memory (shared between all processing elements in a single address space) or distributed memory (in which each processing element has its own local address space).[37] Distributed memory refers to the fact that the memory is logically distributed, but often implies that it is physically distributed as well. Distributed shared memory and memory virtualization combine the two approaches, where the processing element has its own local memory and access to the memory on non-local processors. Accesses to local memory are typically faster than accesses to non-local memory. On supercomputers, a distributed shared memory space can be implemented using a programming model such as PGAS. This model allows processes on one compute node to transparently access the remote memory of another compute node. All compute nodes are also connected to an external shared memory system via a high-speed interconnect such as InfiniBand; this external shared memory system, known as a burst buffer, is typically built from arrays of non-volatile memory physically distributed across multiple I/O nodes.
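As a rough sketch only, and not a description of any particular supercomputer or PGAS implementation, the two memory models can be imitated in Python: a shared-memory style in which workers update one shared counter, and a message-passing style, closer to how distributed-memory clusters are programmed, in which each worker owns its data and sends results explicitly. The worker functions and values are arbitrary.

```python
# Sketch of the two memory models: shared address space vs. message passing.
import multiprocessing as mp


def shared_memory_worker(total, lock, value: int) -> None:
    # All workers update the same shared counter (shared address space).
    with lock:
        total.value += value


def message_passing_worker(conn, value: int) -> None:
    # Each worker keeps its own local result and sends it as a message.
    conn.send(value * value)
    conn.close()


if __name__ == "__main__":
    # Shared-memory style.
    total = mp.Value("i", 0)
    lock = mp.Lock()
    procs = [mp.Process(target=shared_memory_worker, args=(total, lock, v)) for v in range(4)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
    print("shared-memory sum:", total.value)

    # Message-passing (distributed-memory) style.
    pipes = [mp.Pipe() for _ in range(4)]
    procs = [mp.Process(target=message_passing_worker, args=(child, v))
             for v, (parent, child) in enumerate(pipes)]
    for p in procs:
        p.start()
    results = [parent.recv() for parent, child in pipes]
    for p in procs:
        p.join()
    print("message-passing results:", results)
```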

As parallel computers become larger and faster, we are now able to solve problems that had previously taken too long to run. Fields as varied as bioinformatics (for protein folding and sequence analysis) and economics have taken advantage of parallel computing. Common types of problems in parallel computing applications include:[61]

- Dense linear algebra
- Sparse linear algebra
- Spectral methods (such as Cooley–Tukey fast Fourier transform)
- N-body problems (such as Barnes–Hut simulation)
- Structured grid problems (such as Lattice Boltzmann methods)
- Unstructured grid problems (such as found in finite element analysis)
- Monte Carlo method (see the sketch after this list)
- Combinational logic (such as brute-force cryptographic techniques)
- Graph traversal (such as sorting algorithms)
- Dynamic programming
- Branch and bound methods
- Graphical models (such as detecting hidden Markov models and constructing Bayesian networks)
- HBJ model, a concise message-passing model[62]
- Finite-state machine simulation
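As an illustrative sketch of one entry in the list above, the Monte Carlo method is naturally parallel: independent random trials can be distributed across a pool of worker processes and the partial counts combined afterwards. The worker function and trial counts here are arbitrary choices for demonstration.

```python
# Sketch: estimating pi with a parallel Monte Carlo method. Each worker runs
# independent random trials, so no communication is needed until the partial
# counts are combined at the end.
import random
from multiprocessing import Pool


def hits_in_unit_circle(args) -> int:
    """Count random points in the unit square that fall inside the quarter circle."""
    trials, seed = args
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:
            hits += 1
    return hits


if __name__ == "__main__":
    workers, trials_per_worker = 4, 250_000
    with Pool(workers) as pool:
        partial = pool.map(hits_in_unit_circle,
                           [(trials_per_worker, seed) for seed in range(workers)])
    pi_estimate = 4.0 * sum(partial) / (workers * trials_per_worker)
    print("pi estimate:", pi_estimate)
```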

In the early 1970s, at the MIT Computer Science and Artificial Intelligence Laboratory, Marvin Minsky and Seymour Papert started developing the Society of Mind theory, which views the biological brain as a massively parallel computer. In 1986, Minsky published The Society of Mind, which claims that "mind is formed from many little agents, each mindless by itself".[75] The theory attempts to explain how what we call intelligence could be a product of the interaction of non-intelligent parts. Minsky says that the biggest source of ideas about the theory came from his work in trying to create a machine that uses a robotic arm, a video camera, and a computer to build with children's blocks.[76]


Similar models (which also view the biological brain as a massively parallel computer, i.e., the brain is made up of a constellation of independent or semi-independent agents) were also described by:

- Thomas R. Blakeslee,[77]
- Michael S. Gazzaniga,[78][79]
- Robert E. Ornstein,[80]
- Ernest Hilgard,[81][82]
- Michio Kaku,[83]
- George Ivanovich Gurdjieff,[84]
- Neurocluster Brain Model.[85]

Rodriguez, C.; Villagra, M.; Baran, B. (29 August 2008). "Asynchronous team algorithms for Boolean Satisfiability". Bio-Inspired Models of Network, Information and Computing Systems, 2007. Bionetics 2007. 2nd: 66–69. doi:10.1109/BIMNICS.2007.4610083. S2CID 15185219.

Sechin, A. "Parallel Computing in Photogrammetry". GIM International, No. 1, 2016, pp. 21–23.

Lawrence Livermore National Laboratory: Introduction to Parallel Computing

Designing and Building Parallel Programs, by Ian Foster

Internet Parallel Computing Archive