Unicode

Unicode, formally The Unicode Standard,^[note 1] is a text encoding standard, maintained by the Unicode Consortium, that is designed to support the use of text written in all of the world's major writing systems. Version 15.1 of the standard^[A] defines 149,813 characters^[3] and 161 scripts used in various ordinary, literary, academic, and technical contexts.

Alias(es): Universal Coded Character Set (UCS), ISO/IEC 10646

Language(s): See list of scripts

Standard: Unicode Standard

Preceded by: ISO/IEC 8859, various others

Many common characters, including numerals, punctuation, and other symbols, are unified within the standard and are not treated as specific to any given writing system. Unicode encodes thousands of emoji, whose continued development the Consortium conducts as part of the standard.^[4] Moreover, the widespread adoption of Unicode was in large part responsible for the initial popularization of emoji outside of Japan. Unicode can ultimately encode more than 1.1 million characters: its code space spans U+0000 to U+10FFFF, a total of 1,114,112 code points.

Unicode has largely supplanted the earlier landscape of myriad incompatible character sets, each used in different locales and on different computer architectures. Unicode is used to encode the vast majority of text on the Internet, including most web pages, and Unicode support has become a common consideration in contemporary software development.

The Unicode character repertoire is synchronized with ISO/IEC 10646; the two are code-for-code identical. However, The Unicode Standard is more than a repertoire within which characters are assigned. To aid developers and designers, the standard also provides charts and reference data, as well as annexes that explain concepts germane to various scripts and give guidance for their implementation. Topics covered by these annexes include character normalization, character composition and decomposition, collation, and directionality.^[5]
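Normalization, one of the annex topics named above, can be illustrated with a short sketch. It uses Python's standard `unicodedata` module; the helper name `same_text` is hypothetical and chosen only for this example, and the choice of NFC as the default form is an assumption rather than a recommendation from the standard.

```python
import unicodedata

# Two ways to write the same visible text "é".
composed = "\u00e9"     # LATIN SMALL LETTER E WITH ACUTE, one code point
decomposed = "e\u0301"  # "e" followed by COMBINING ACUTE ACCENT

def same_text(a, b, form="NFC"):
    """Compare two strings after bringing both to one normalization form."""
    return unicodedata.normalize(form, a) == unicodedata.normalize(form, b)

# The raw strings differ code point by code point...
print(composed == decomposed)
# ...but compare equal once both are normalized.
print(same_text(composed, decomposed))
# NFD splits the precomposed letter into base letter plus combining mark.
print(unicodedata.normalize("NFD", composed) == decomposed)
```

Comparing text only after normalizing both sides is a common way to avoid treating canonically equivalent sequences as different strings.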

Unicode text is processed and stored as binary data using one of several encodings, which define how to translate the standard's abstract character codes into sequences of bytes. The Unicode Standard itself defines three encodings: UTF-8, UTF-16, and UTF-32, though several others exist. Of these, UTF-8 is the most widely used by a large margin, in part due to its backward compatibility with ASCII.
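As a rough illustration of how the three encodings turn the same code points into different byte sequences, the following sketch counts the bytes each one needs. The function name `encoded_lengths` and the sample string are assumptions made for this example; the codec names are those of Python's standard library.

```python
def encoded_lengths(text):
    """Return the number of bytes needed to store text in each encoding."""
    # The "-be" codecs fix the byte order, so Python does not prepend
    # a byte order mark that would inflate the counts.
    return {
        "UTF-8": len(text.encode("utf-8")),
        "UTF-16": len(text.encode("utf-16-be")),
        "UTF-32": len(text.encode("utf-32-be")),
    }

# U+0041 (A), U+20AC (euro sign), U+1F600 (an emoji outside the BMP)
sample = "A\u20ac\U0001F600"
print(encoded_lengths(sample))

# Pure ASCII text yields identical bytes under ASCII and UTF-8.
ascii_compatible = "plain text".encode("ascii") == "plain text".encode("utf-8")
print(ascii_compatible)
```

UTF-8 and UTF-16 use a variable number of bytes per code point, while UTF-32 always uses four, which is why the totals differ for mixed text.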

Private Use Area: U+E000–U+F8FF (6400 characters),

Supplementary Private Use Area-A: U+F0000–U+FFFFD (65534 characters),

Supplementary Private Use Area-B: U+100000–U+10FFFD (65534 characters).
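The character counts quoted for the three private use ranges follow directly from their endpoints. The sketch below recomputes them; the names `PRIVATE_USE_RANGES`, `range_size`, and `is_private_use` are hypothetical helpers introduced only for illustration.

```python
# Endpoints as listed above (inclusive on both ends).
PRIVATE_USE_RANGES = {
    "Private Use Area": (0xE000, 0xF8FF),
    "Supplementary Private Use Area-A": (0xF0000, 0xFFFFD),
    "Supplementary Private Use Area-B": (0x100000, 0x10FFFD),
}

def range_size(first, last):
    """Number of code points in an inclusive range."""
    return last - first + 1

def is_private_use(code_point):
    """True if the code point falls in any of the three ranges."""
    return any(first <= code_point <= last
               for first, last in PRIVATE_USE_RANGES.values())

for name, (first, last) in PRIVATE_USE_RANGES.items():
    print(f"{name}: U+{first:04X}..U+{last:04X} = {range_size(first, last)}")
```

Each supplementary area stops at xFFFD rather than xFFFF, which is why it holds 65,534 code points rather than a full 65,536.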

most of the C0 control codes,

the permanently unassigned code points D800–DFFF,

FFFE or FFFF.
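The categories listed above can be expressed as a small predicate over code points. This is only a sketch under stated assumptions: the function name `is_excluded` is hypothetical, "most of the C0 control codes" is read here as every code point below U+0020 except tab, line feed, and carriage return, and "FFFE or FFFF" is read as those two code points only. Neither reading is confirmed by the text.

```python
# Assumption: these three C0 controls are the ones that remain allowed.
ALLOWED_C0 = (0x09, 0x0A, 0x0D)  # tab, line feed, carriage return

def is_excluded(code_point):
    """Return True if the code point falls in one of the listed categories."""
    # Most of the C0 control codes (U+0000..U+001F).
    if code_point < 0x20 and code_point not in ALLOWED_C0:
        return True
    # The permanently unassigned code points D800..DFFF.
    if 0xD800 <= code_point <= 0xDFFF:
        return True
    # FFFE or FFFF.
    return code_point in (0xFFFE, 0xFFFF)

for cp in (0x00, 0x0A, 0x41, 0xD800, 0xFFFF):
    print(f"U+{cp:04X}: excluded={is_excluded(cp)}")
```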

U+034F ͏ COMBINING GRAPHEME JOINER: Does not join graphemes.^[116]

U+2118 ℘ SCRIPT CAPITAL P: This is a small letter. The capital is U+1D4AB 𝒫 MATHEMATICAL SCRIPT CAPITAL P.^[117]

U+A015 ꀕ YI SYLLABLE WU: This is not a Yi syllable, but a Yi iteration mark.

U+FE18 ︘ PRESENTATION FORM FOR VERTICAL RIGHT WHITE LENTICULAR BRAKCET: "bracket" is misspelled in the name. (Spelling errors are resolved by using Unicode alias names.)^[118]
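The names above can be inspected with Python's standard `unicodedata` module. The sketch below assumes a Python version (3.3 or later) whose `unicodedata.lookup` accepts formal name aliases, and it assumes that the correctly spelled name is registered as an alias for U+FE18; the variable names are hypothetical.

```python
import unicodedata

# The character whose formal name contains the misspelling.
misspelled_char = "\ufe18"

# The formal name is immutable, so it still carries the typo.
formal_name = unicodedata.name(misspelled_char)
print(formal_name)

# Looking the character up by the correctly spelled alias
# is expected to resolve to the same code point.
corrected = "PRESENTATION FORM FOR VERTICAL RIGHT WHITE LENTICULAR BRACKET"
via_alias = unicodedata.lookup(corrected)
print(f"U+{ord(via_alias):04X}")

# The other names in the list, as reported by the character database.
for ch in ("\u034f", "\u2118", "\ua015"):
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}")
```

Because formal character names never change once published, an alias is the only mechanism for offering a corrected spelling.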

Comparison of Unicode encodings

International Components for Unicode (ICU), now as ICU-TC a part of Unicode

List of binary codes

List of Unicode characters

List of XML and HTML character entity references

Lotus Multi-Byte Character Set (LMBCS), a parallel development with similar intentions

Open-source Unicode typefaces

Religious and political symbols in Unicode

Standards related to Unicode

Unicode symbols

Universal Coded Character Set

Haralambous, Yannis; Dürst, Martin (2019). "Unicode from a Linguistic Point of View". In Haralambous, Yannis (ed.). Proceedings of Graphemics in the 21st Century, Brest 2018. Brest: Fluxus Editions. pp. 167–183. doi:10.36824/2018-graf-hara1. ISBN 978-2-9570549-1-6.

Unicode, Inc.

Unicode Technical Site

Alan Wood's Unicode Resources: contains lists of word processors with Unicode capability; fonts and characters are grouped by type; characters are presented in lists, not grids.

Unicode at Curlie

Unicode BMP Fallback Font: displays the Unicode 6.1 value of any character in a document, including in the Private Use Area, rather than the glyph itself.

The World's Writing Systems: all 294 known writing systems with their Unicode status (131 not yet encoded as of 2023)