Persistent data structure

In computing, a persistent data structure (or non-ephemeral data structure) is a data structure that always preserves the previous version of itself when it is modified. Such data structures are effectively immutable, as their operations do not (visibly) update the structure in place, but instead always yield a new updated structure. The term was introduced in a 1986 article by Driscoll, Sarnak, Sleator, and Tarjan.^[1]

Not to be confused with Persistent storage.

A data structure is partially persistent if all versions can be accessed but only the newest version can be modified. The data structure is fully persistent if every version can be both accessed and modified. If there is also a meld or merge operation that can create a new version from two previous versions, the data structure is called confluently persistent. Structures that are not persistent are called ephemeral.^[2]

These types of data structures are particularly common in logic and functional programming,^[2] as languages in those paradigms discourage (or fully forbid) the use of mutable data.

Partial versus full persistence

In the partial persistence model, a programmer may query any previous version of a data structure, but may only update the latest version. This implies a linear ordering among the versions of the data structure.^[3] In the fully persistent model, both updates and queries are allowed on any version of the data structure. In some cases the performance characteristics of querying or updating older versions of a data structure may be allowed to degrade, as is true with the rope data structure.^[4] A data structure is further called confluently persistent if, in addition to being fully persistent, two versions of the same data structure can be combined to form a new version that is still fully persistent.^[5]
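The difference between the two models can be illustrated with a minimal sketch of an immutable singly linked list in Python. The names `Node`, `push`, and `items` are hypothetical and chosen only for this illustration. Because no cell is ever modified, an update may be applied to any version, including an old one, which is the behaviour required for full persistence.

```python
from dataclasses import dataclass
from typing import Iterator, Optional


@dataclass(frozen=True)
class Node:
    """One immutable cell of a singly linked list (hypothetical helper)."""
    value: int
    rest: Optional["Node"] = None


def push(version: Optional[Node], value: int) -> Node:
    """Return a new version with `value` in front; `version` is left intact."""
    return Node(value, version)


def items(version: Optional[Node]) -> Iterator[int]:
    """Iterate over the values of one version, front to back."""
    while version is not None:
        yield version.value
        version = version.rest


v0 = None          # the empty list
v1 = push(v0, 1)   # [1]
v2 = push(v1, 2)   # [2, 1]
v3 = push(v1, 3)   # [3, 1]: an update applied to the older version v1
```

Here `v2` and `v3` both remain usable and share the cell of `v1`. A partially persistent structure would permit the query on `v1` but would allow updates to `v2` only.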

A partially persistent data structure is one in which the user may query any version of the structure but may update only the latest version.

An ephemeral data structure can be converted to a partially persistent data structure using a few techniques.

One such technique uses a randomized version of the van Emde Boas tree, built with dynamic perfect hashing. This data structure is created as follows:

A stratified tree with m elements is implemented using dynamic perfect hashing.

The tree is pruned by dividing the m elements into buckets of size log(log n) such that the elements of bucket 1 are smaller than the elements of bucket 2, and so on.

The maximal element in each bucket is stored in the stratified tree, and each bucket is stored in the structure as an unordered linked list.

The size of this data structure is bounded by the number of elements stored in the structure, that is, O(m). The insertion of a new maximal element takes constant O(1) expected and amortized time. Finally, a query to find an element takes O(log(log n)) worst-case time.^[6]

Techniques for preserving previous versions

Copy-on-write

One method for creating a persistent data structure is to store the data in a platform-provided ephemeral data structure, such as an array, and to copy that entire structure using copy-on-write semantics on every update. This technique is inefficient because the entire backing data structure must be copied for each write, leading to worst-case O(n·m) performance for m modifications of an array of size n.
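A minimal sketch of this approach in Python, assuming a tuple as the backing array, might look as follows. The function name `cow_set` is hypothetical. Each update copies all n elements, which is the source of the O(n·m) cost for m modifications.

```python
from typing import Tuple


def cow_set(version: Tuple[int, ...], index: int, value: int) -> Tuple[int, ...]:
    """Return a new version with one element replaced.

    The whole backing array is copied, so each call costs O(n).
    """
    copy = list(version)   # O(n) copy of the entire backing array
    copy[index] = value    # the write goes to the copy only
    return tuple(copy)     # the old version is never touched


v0 = (0, 0, 0, 0)
v1 = cow_set(v0, 1, 7)
v2 = cow_set(v1, 3, 9)
```

All three versions stay accessible after the updates, but no memory is shared between them.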

Fat node

The fat node method records all changes made to node fields in the nodes themselves, without erasing old values of the fields. This requires that nodes be allowed to become arbitrarily “fat”. In other words, each fat node contains the same information and pointer fields as an ephemeral node, along with space for an arbitrary number of extra field values. Each extra field value has an associated field name and a version stamp that indicates the version in which the named field was changed to have the specified value. In addition, each fat node has its own version stamp, indicating the version in which the node was created. The only purpose of giving nodes version stamps is to ensure that each node contains only one value per field name per version. In order to navigate through the structure, each original field value in a node has a version stamp of zero.
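The description above can be sketched in Python as follows. This is an illustration under stated assumptions rather than a reference implementation: the class `FatNode` and its methods `write` and `read` are hypothetical names, versions are assumed to be integers, and writes are assumed to arrive with non-decreasing version stamps, as in the partial persistence model. Reading a field at a given version uses a binary search for the latest stamp that does not exceed that version.

```python
import bisect
from typing import Any, Dict, List


class FatNode:
    """A node that keeps every value ever written to each of its fields."""

    def __init__(self, created: int, **fields: Any) -> None:
        self.created = created  # version stamp of the node itself
        # Per field: parallel lists of ascending version stamps and values.
        self._stamps: Dict[str, List[int]] = {}
        self._values: Dict[str, List[Any]] = {}
        for name, value in fields.items():
            self._stamps[name] = [0]      # original values carry stamp zero
            self._values[name] = [value]

    def write(self, name: str, value: Any, version: int) -> None:
        """Record a new value for `name` without erasing older values."""
        stamps = self._stamps.setdefault(name, [])
        values = self._values.setdefault(name, [])
        if stamps and version < stamps[-1]:
            # Assumption of this sketch: only the newest version is updated.
            raise ValueError("writes must use non-decreasing versions")
        if stamps and stamps[-1] == version:
            values[-1] = value            # keep one value per field per version
        else:
            stamps.append(version)
            values.append(value)

    def read(self, name: str, version: int) -> Any:
        """Return the value `name` had in `version`."""
        stamps = self._stamps[name]
        i = bisect.bisect_right(stamps, version) - 1  # latest stamp <= version
        if i < 0:
            raise KeyError(f"{name} has no value in version {version}")
        return self._values[name][i]


node = FatNode(created=0, label="a")
node.write("label", "b", version=2)
node.write("label", "c", version=5)
```

With these writes, `node.read("label", 1)` yields the original value, `node.read("label", 4)` yields the value written in version 2, and `node.read("label", 5)` yields the value written in version 5.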

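Updates to a linked structure, viewed as a graph of vertices with stored data and outgoing edges, can be expressed with the following operations:

CREATE-NODE(): Creates a new vertex with no incoming or outgoing edges.

CHANGE-EDGE( $v$ , $i$ , $u$ ): Changes the $i$-th edge of $v$ to point to $u$.

CHANGE-LABEL( $v$ , $x$ ): Changes the value of the data stored at $v$ to $x$.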

Applications of persistent data structures

Next element search or point location

One useful problem that can be solved efficiently using persistence is the next element search. Assume that there are $n$ non-intersecting line segments parallel to the x-axis. The goal is to build a data structure that can be queried with a point $p$ and returns the segment above $p$ (if any). The problem is solved first with the naïve method and then with the persistent data structure method.
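To make the query concrete, the following Python sketch answers it with a brute-force linear scan. This is an assumption made for illustration only: it is a baseline that costs O(n) per query, and it is not necessarily the naïve method referred to above, nor the persistent solution. The segment representation `(x_left, x_right, y)` and the function name `segment_above` are hypothetical.

```python
from typing import List, Optional, Tuple

# A horizontal segment as (x_left, x_right, y); hypothetical representation.
Segment = Tuple[float, float, float]


def segment_above(segments: List[Segment], px: float, py: float) -> Optional[Segment]:
    """Return the lowest segment strictly above the point (px, py), or None."""
    best: Optional[Segment] = None
    for seg in segments:
        x_left, x_right, y = seg
        # The segment must span the point's x-coordinate and lie above it.
        if x_left <= px <= x_right and y > py:
            if best is None or y < best[2]:
                best = seg
    return best


segments = [(0.0, 4.0, 1.0), (2.0, 6.0, 3.0), (5.0, 9.0, 2.0)]
```

For example, `segment_above(segments, 3.0, 1.5)` skips the segment at height 1, which lies below the point, and returns the segment at height 3.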

Usage in programming languages

Haskell

Haskell is a pure functional language and therefore does not allow for mutation. As a result, all data structures in the language are persistent, since functional semantics make it impossible not to preserve the previous state of a data structure.^[15] This is because any change to a data structure that would render previous versions of a data structure invalid would violate referential transparency.

In its standard library, Haskell has efficient persistent implementations for linked lists,^[16] Maps (implemented as size balanced trees),^[17] and Sets,^[18] among others.^[19]

Clojure

Like many programming languages in the Lisp family, Clojure contains an implementation of a linked list, but unlike other dialects, its implementation enforces persistence instead of being persistent by convention.^[20] Clojure also has efficient implementations of persistent vectors, maps, and sets based on persistent hash array mapped tries. These data structures implement the mandatory read-only parts of the Java collections framework.^[21]

The designers of the Clojure language advocate the use of persistent data structures over mutable data structures because they have value semantics, which makes them freely shareable between threads with cheap aliases, easy to fabricate, and language independent.^[22]

These data structures form the basis of Clojure's support for parallel computing, since they allow for easy retries of operations to sidestep data races and for atomic compare-and-swap semantics.^[23]

Elm

The Elm programming language is purely functional like Haskell, which makes all of its data structures persistent by necessity. It contains persistent implementations of linked lists as well as persistent arrays, dictionaries, and sets.^[24]

Elm uses a custom virtual DOM implementation that takes advantage of the persistent nature of Elm data. In 2016 the developers of Elm reported that this virtual DOM allows the Elm language to render HTML faster than the popular JavaScript frameworks React, Ember, and Angular.^[25]

Java

The Java programming language is not particularly functional. Despite this, the core JDK package java.util.concurrent includes CopyOnWriteArrayList and CopyOnWriteArraySet which are persistent structures, implemented using copy-on-write techniques. The usual concurrent map implementation in Java, ConcurrentHashMap, is not persistent, however. Fully persistent collections are available in third-party libraries,^[26] or other JVM languages.

Garbage collection

Because persistent data structures are often implemented in such a way that successive versions of a data structure share underlying memory,^[38] ergonomic use of such data structures generally requires some form of automatic garbage collection, such as reference counting or mark and sweep.^[39] On some platforms where persistent data structures are used, it is possible to forgo garbage collection; doing so can lead to memory leaks, but in some cases it can improve the overall performance of an application.^[40]

Copy-on-write

Navigational database

Persistent data

Retroactive data structures

Purely functional data structure

Lightweight Java implementation of Persistent Red-Black Trees

Efficient persistent structures in C#