PDF

Portable Document Format (PDF), standardized as ISO 32000, is a file format developed by Adobe in 1992 to present documents, including text formatting and images, in a manner independent of application software, hardware, and operating systems.^[2]^[3] Based on the PostScript language, each PDF file encapsulates a complete description of a fixed-layout flat document, including the text, fonts, vector graphics, raster images and other information needed to display it. PDF has its roots in "The Camelot Project" initiated by Adobe co-founder John Warnock in 1991.^[4] PDF was standardized as ISO 32000 in 2008.^[5] The latest edition, ISO 32000-2:2020, was published in December 2020.


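Filename extension: .pdf
Internet media type: application/pdf,^[1] application/x-pdf, application/x-bzpdf, application/x-gzpdf
Type code: PDF ^[1] (including a single trailing space)
Uniform Type Identifier (UTI): com.adobe.pdf
Magic number: %PDF
Developed by: Adobe Inc. (1991–2008), ISO (2008–)
Initial release: June 15, 1993
Latest release: 2.0
Extended to: PDF/A, PDF/E, PDF/UA, PDF/VT, PDF/X
Standard: ISO 32000-2
Open format?: Yes
Website: iso.org/standard/75839.html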

PDF files may contain a variety of content besides flat text and graphics, including logical structuring elements, interactive elements such as annotations and form fields, layers, rich media (including video content), three-dimensional objects using U3D or PRC, and various other data formats. The PDF specification also provides for encryption and digital signatures, file attachments, and metadata to enable workflows requiring these features.

The basic types of content in a PDF are:

Typeset text stored as content streams (i.e., not encoded in plain text);

Vector graphics for illustrations and designs that consist of shapes and lines;

Raster graphics for photographs and other types of images; and

Other multimedia objects.

A PDF file is organized using ASCII characters, except for certain elements that may have binary content. The file starts with a header containing a magic number (as a readable string) and the version of the format, for example %PDF-1.7. The format is a subset of a COS ("Carousel" Object Structure) format.^[23] A COS tree file consists primarily of objects, of which there are nine types:^[16]

Boolean values, representing true or false

Real numbers

Integers

Strings, enclosed within parentheses ((...)) or represented as hexadecimal within single angle brackets (<...>). Strings may contain 8-bit characters.

Names, starting with a forward slash (/)

Arrays, ordered collections of objects enclosed within square brackets ([...])

Dictionaries, collections of objects indexed by names enclosed within double angle brackets (<<...>>)

Streams, usually containing large amounts of optionally compressed binary data, preceded by a dictionary and enclosed between the stream and endstream keywords.

The null object
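As a rough illustration of the header described above, the following Python sketch reads the magic number and version from the first bytes of a file. The function name parse_pdf_header is hypothetical, and the sketch assumes a strict header of the form %PDF-x.y at byte 0 with single-digit version parts; real readers may be more lenient.

```python
import re

# Magic number "%PDF-" followed by a major.minor version, e.g. "%PDF-1.7"
HEADER_RE = re.compile(rb"%PDF-(\d)\.(\d)")


def parse_pdf_header(data: bytes):
    """Return (major, minor) if data starts with a PDF header, else None."""
    match = HEADER_RE.match(data)  # match() anchors at the first byte
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


if __name__ == "__main__":
    print(parse_pdf_header(b"%PDF-1.7\n1 0 obj\n"))
    print(parse_pdf_header(b"not a pdf"))
```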

Comments using 8-bit characters prefixed with the percent sign (%) may be inserted.

Objects may be either direct (embedded in another object) or indirect. Indirect objects are numbered with an object number and a generation number and defined between the obj and endobj keywords if residing in the document root. Beginning with PDF version 1.5, indirect objects (except other streams) may also be located in special streams known as object streams (marked /Type /ObjStm). This technique enables non-stream objects to have standard stream filters applied to them, reduces the size of files that have large numbers of small indirect objects and is especially useful for Tagged PDF. Object streams do not support specifying an object's generation number (other than 0).
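The obj/endobj framing of indirect objects can be sketched with a regular expression. This is a deliberately naive illustration, not a conforming parser: the name find_indirect_objects is hypothetical, and the pattern would be confused by stream data that happens to contain the text "endobj" and does not look inside object streams.

```python
import re

# "<object number> <generation number> obj ... endobj"
INDIRECT_OBJ_RE = re.compile(rb"(\d+)\s+(\d+)\s+obj\b(.*?)\bendobj", re.DOTALL)


def find_indirect_objects(data: bytes) -> dict:
    """Map (object number, generation number) to the raw object body."""
    return {
        (int(m.group(1)), int(m.group(2))): m.group(3).strip()
        for m in INDIRECT_OBJ_RE.finditer(data)
    }


if __name__ == "__main__":
    sample = (
        b"1 0 obj\n<< /Type /Catalog /Pages 2 0 R >>\nendobj\n"
        b"2 0 obj\n(Hello)\nendobj\n"
    )
    for key, body in find_indirect_objects(sample).items():
        print(key, body)
```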

An index table, also called the cross-reference table, is located near the end of the file and gives the byte offset of each indirect object from the start of the file.^[24] This design allows for efficient random access to the objects in the file, and also allows for small changes to be made without rewriting the entire file (incremental update). Before PDF version 1.5, the table would always be in a special ASCII format, be marked with the xref keyword, and follow the main body composed of indirect objects. Version 1.5 introduced optional cross-reference streams, which have the form of a standard stream object, possibly with filters applied. Such a stream may be used instead of the ASCII cross-reference table and contains the offsets and other information in binary format. The format is flexible in that it allows for integer width specification (using the /W array), so that for example, a document not exceeding 64 KiB in size may dedicate only 2 bytes for object offsets.
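Both forms of the cross-reference data can be sketched in Python. The first function reads a classic ASCII table (subsection headers of "first count", then one "offset generation n|f" entry per object). The second splits already-decoded cross-reference stream data into rows using the /W widths. Function names and sample data are hypothetical, and the sketch ignores details such as zero-width fields taking default values.

```python
def parse_xref_table(section: bytes) -> dict:
    """Map object number -> (byte offset, generation) for in-use entries."""
    tokens = section.split()
    if not tokens or tokens[0] != b"xref":
        raise ValueError("section does not start with the xref keyword")
    offsets = {}
    i = 1
    # Each subsection starts with "<first object number> <entry count>"
    while i + 1 < len(tokens) and tokens[i].isdigit():
        first, count = int(tokens[i]), int(tokens[i + 1])
        i += 2
        for obj_num in range(first, first + count):
            offset, generation, kind = tokens[i:i + 3]
            i += 3
            if kind == b"n":  # "n" marks an in-use entry, "f" a free one
                offsets[obj_num] = (int(offset), int(generation))
    return offsets


def decode_xref_stream_rows(data: bytes, widths: list) -> list:
    """Split decoded stream data into rows of big-endian integers per /W."""
    row_len = sum(widths)
    if row_len == 0:
        raise ValueError("/W widths must not all be zero")
    rows = []
    for start in range(0, len(data) - row_len + 1, row_len):
        row, pos = [], start
        for width in widths:
            row.append(int.from_bytes(data[pos:pos + width], "big"))
            pos += width
        rows.append(tuple(row))
    return rows


if __name__ == "__main__":
    table = (
        b"xref\n0 3\n"
        b"0000000000 65535 f \n"
        b"0000000017 00000 n \n"
        b"0000000081 00000 n \n"
        b"trailer\n<< /Size 3 >>\n"
    )
    print(parse_xref_table(table))
    # /W [1 2 1]: 1-byte type, 2-byte offset (at most 65535), 1-byte generation
    print(decode_xref_stream_rows(bytes([1, 0, 17, 0, 1, 0, 81, 0]), [1, 2, 1]))
```

With a 2-byte offset field the largest representable offset is 65535, which is why such a layout only suits files under 64 KiB.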

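At the end of a PDF file is a footer containing the startxref keyword, followed by the byte offset of the cross-reference table (which starts with the xref keyword) or of the cross-reference stream object, followed by the %%EOF end-of-file marker.

If a cross-reference stream is not being used, the footer is preceded by the trailer keyword followed by a dictionary containing information that would otherwise be contained in the cross-reference stream object's dictionary: a reference to the root object of the tree structure, also known as the catalog (/Root); the count of indirect objects in the cross-reference table (/Size); and other optional information.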
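A reader can locate the cross-reference data by scanning backwards from the end of the file. The Python sketch below assumes the conventional footer layout (the startxref keyword, a byte offset, then the %%EOF marker) and, when a trailer keyword is present, pulls the /Size and /Root entries out of the trailer dictionary with simple regular expressions. The function name read_footer and the 1024-byte search window are assumptions made for illustration.

```python
import re


def read_footer(data: bytes) -> dict:
    """Extract the startxref offset and, if present, /Size and /Root."""
    tail = data[-1024:]  # assumed search window near the end of the file
    pos = tail.rfind(b"startxref")
    if pos == -1:
        raise ValueError("no startxref keyword near the end of the file")
    tokens = tail[pos + len(b"startxref"):].split()
    if len(tokens) < 2 or tokens[1] != b"%%EOF":
        raise ValueError("expected '<offset> %%EOF' after startxref")
    footer = {"startxref": int(tokens[0])}

    # The trailer dictionary only exists when no cross-reference stream is used
    trailer_pos = tail.rfind(b"trailer", 0, pos)
    if trailer_pos != -1:
        trailer = tail[trailer_pos:pos]
        size = re.search(rb"/Size\s+(\d+)", trailer)
        root = re.search(rb"/Root\s+(\d+)\s+(\d+)\s+R", trailer)
        if size:
            footer["size"] = int(size.group(1))
        if root:
            footer["root"] = (int(root.group(1)), int(root.group(2)))
    return footer


if __name__ == "__main__":
    sample = b"trailer\n<< /Size 3 /Root 1 0 R >>\nstartxref\n116\n%%EOF\n"
    print(read_footer(sample))
```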

Within each page, there are one or more content streams that describe the text, vector graphics and images being drawn on the page. The content stream is stack-based, similar to PostScript.^[25]
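The stack-based (postfix) nature of a content stream means operands come first and the operator that consumes them comes last. The Python sketch below illustrates this under strong simplifying assumptions: tokens are separated by whitespace, and only numbers and names appear as operands (no strings, arrays, or inline images). The function name parse_content_stream is hypothetical.

```python
import re

NUMBER_RE = re.compile(r"[+-]?(?:\d+\.?\d*|\.\d+)")


def parse_content_stream(stream: bytes) -> list:
    """Return a list of (operator, operands) pairs from a simple stream."""
    operations, operands = [], []
    for raw in stream.split():
        token = raw.decode("latin-1")
        if NUMBER_RE.fullmatch(token):
            operands.append(float(token))   # numeric operand: push
        elif token.startswith("/"):
            operands.append(token)          # name operand: push
        else:
            # Anything else is treated as an operator that consumes the stack
            operations.append((token, operands))
            operands = []
    return operations


if __name__ == "__main__":
    # Save state, translate, draw and fill a rectangle, restore state
    stream = b"q 1 0 0 1 72 720 cm 0 0 100 50 re f Q"
    for operator, args in parse_content_stream(stream):
        print(operator, args)
```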

PDF files come in two layouts: non-linearized (not "optimized") and linearized ("optimized"). Non-linearized PDF files can be smaller than their linearized counterparts, though they are slower to access because portions of the data required to assemble pages of the document are scattered throughout the PDF file. Linearized PDF files (also called "optimized" or "web optimized" PDF files) are constructed in a manner that enables them to be read in a Web browser plugin without waiting for the entire file to download, since all objects required for the first page to display are optimally organized at the start of the file.^[26] PDF files may be optimized using Adobe Acrobat software or QPDF.
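A quick heuristic for telling the two layouts apart is to look for a /Linearized key near the start of the file, where a linearized file is expected to carry its linearization parameters. The Python sketch below is only a heuristic under that assumption: the name looks_linearized and the 1024-byte window are illustrative choices, and the check does not validate that the file is correctly linearized.

```python
def looks_linearized(data: bytes) -> bool:
    """Heuristic: a /Linearized key appears in the first 1024 bytes."""
    return b"/Linearized" in data[:1024]


if __name__ == "__main__":
    linear = b"%PDF-1.7\n1 0 obj\n<< /Linearized 1 /L 5000 >>\nendobj\n"
    plain = b"%PDF-1.7\n1 0 obj\n<< /Type /Catalog >>\nendobj\n"
    print(looks_linearized(linear), looks_linearized(plain))
```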

Page dimensions are not limited by the format itself. However, Adobe Acrobat imposes a limit of 15 million by 15 million inches, or 225 trillion in² (145,161 km²).^[2]
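The figures in that limit can be checked with a few lines of Python. The only assumption is the exact definition of the inch as 0.0254 m; exact fractions are used to avoid floating-point rounding.

```python
from fractions import Fraction

SIDE_INCHES = 15_000_000
METRES_PER_INCH = Fraction(254, 10_000)  # 1 inch = 0.0254 m exactly

area_sq_inches = SIDE_INCHES ** 2
side_km = SIDE_INCHES * METRES_PER_INCH / 1000
area_sq_km = side_km ** 2

print(area_sq_inches)  # 225000000000000, i.e. 225 trillion square inches
print(side_km)         # 381
print(area_sq_km)      # 145161
```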

The graphics state, a set of parameters that drawing operators in a content stream read and modify, includes:

The current transformation matrix (CTM), which determines the coordinate system

The clipping path

The color space

The alpha constant, which is a key component of transparency

Black point compensation control (introduced in PDF 2.0)
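To make the role of the current transformation matrix concrete, the Python sketch below represents a matrix by six numbers (a, b, c, d, e, f), maps a point with x' = a·x + c·y + e and y' = b·x + d·y + f, and concatenates two matrices. The function names are hypothetical, and the row-vector convention used here is stated as an assumption about how PDF writes its matrices.

```python
def apply(matrix, x, y):
    """Transform the point (x, y) by a six-number matrix (a, b, c, d, e, f)."""
    a, b, c, d, e, f = matrix
    return (a * x + c * y + e, b * x + d * y + f)


def concat(first, second):
    """Return the matrix equivalent to applying `first`, then `second`."""
    a1, b1, c1, d1, e1, f1 = first
    a2, b2, c2, d2, e2, f2 = second
    return (
        a1 * a2 + b1 * c2,
        a1 * b2 + b1 * d2,
        c1 * a2 + d1 * c2,
        c1 * b2 + d1 * d2,
        e1 * a2 + f1 * c2 + e2,
        e1 * b2 + f1 * d2 + f2,
    )


if __name__ == "__main__":
    scale = (2, 0, 0, 2, 0, 0)          # double both axes
    translate = (1, 0, 0, 1, 72, 720)   # move the origin by (72, 720)
    combined = concat(scale, translate)
    print(combined)
    print(apply(combined, 10, 10))
```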

Additional features

Logical structure and accessibility

A "tagged" PDF (see clause 14.8 in ISO 32000) includes document structure and semantics information to enable reliable text extraction and accessibility. Technically speaking, tagged PDF is a stylized use of the format that builds on the logical structure framework introduced in PDF 1.3. Tagged PDF defines a set of standard structure types and attributes that allow page content (text, graphics, and images) to be extracted and reused for other purposes.^[30]

Tagged PDF is not required in situations where a PDF file is intended only for print. Since the feature is optional, and since the rules for Tagged PDF were relatively vague in ISO 32000-1, support for tagged PDF among consuming devices, including assistive technology (AT), is uneven as of 2021.^[31] ISO 32000-2, however, includes an improved discussion of tagged PDF which is anticipated to facilitate further adoption.

An ISO-standardized subset of PDF specifically targeted at accessibility, PDF/UA, was first published in 2012.

Optional Content Groups (layers)

With the introduction of PDF version 1.5 (2003) came the concept of Layers. Layers, more formally known as Optional Content Groups (OCGs), refer to sections of content in a PDF document that can be selectively viewed or hidden by document authors or viewers. This capability is useful in CAD drawings, layered artwork, maps, multi-language documents, etc.

The mechanism consists of an Optional Content Properties Dictionary added to the document root. This dictionary contains an array of Optional Content Groups (OCGs), each describing a set of information and each of which may be individually displayed or suppressed, plus a set of Optional Content Configuration Dictionaries, which give the status (Displayed or Suppressed) of the given OCGs.
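The relationship between the groups and a configuration dictionary can be sketched with plain Python dictionaries standing in for PDF dictionaries. The key names used here (OCGs, D for the default configuration, BaseState, ON, OFF) follow my reading of ISO 32000 and should be treated as assumptions; the layer names and the function layer_states are hypothetical, and the "Unchanged" base state is not handled.

```python
def layer_states(oc_properties: dict) -> dict:
    """Resolve each group's visibility under the default configuration."""
    config = oc_properties["D"]
    # Assumed default: groups are displayed unless BaseState says otherwise
    default_on = config.get("BaseState", "ON") != "OFF"
    states = {name: default_on for name in oc_properties["OCGs"]}
    for name in config.get("ON", []):
        states[name] = True    # explicitly displayed
    for name in config.get("OFF", []):
        states[name] = False   # explicitly suppressed
    return states


if __name__ == "__main__":
    # Hypothetical multi-language document with a shared grid layer
    oc_properties = {
        "OCGs": ["English", "German", "Grid"],
        "D": {"BaseState": "ON", "OFF": ["German"]},
    }
    print(layer_states(oc_properties))
```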

Encryption and signatures

A PDF file may be encrypted for security, in which case a password is needed to view or edit the contents. PDF 2.0 specifies 256-bit AES as the standard encryption for PDF 2.0 files. The PDF Reference also defines ways that third parties can define their own encryption systems for PDF.

PDF files may be digitally signed, to provide secure authentication; complete details on implementing digital signatures in PDF are provided in ISO 32000-2.

PDF files may also contain embedded DRM restrictions that provide further controls that limit copying, editing, or printing. These restrictions depend on the reader software to obey them, so the security they provide is limited.

The standard security provided by PDF uses two different passwords with two different effects. The user password encrypts the file and prevents it from being opened. The owner password specifies operations that should be restricted even when the document is decrypted, such as modifying, printing, or copying text and graphics out of the document, or adding or modifying text notes and AcroForm fields; it does not encrypt the file, and instead relies on client software to respect these restrictions. An owner password can easily be removed by software, including some free online services.^[32] Thus, the use restrictions that a document author places on a PDF document are not secure and cannot be assured once the file is distributed; Adobe Acrobat displays this warning when such restrictions are applied while creating or editing PDF files.
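In the standard security handler, these restrictions are stored as bit flags in a signed 32-bit integer (the /P entry of the encryption dictionary). The Python sketch below decodes four of them. The bit positions (numbered from 1 at the low-order end: 3 print, 4 modify, 5 copy, 6 annotate) reflect my reading of ISO 32000 and are an assumption to verify against the specification; the function name is hypothetical. Because the flags are advisory, a reader that ignores them faces no technical barrier.

```python
# Assumed bit positions, counted from 1 at the low-order end
PERMISSION_BITS = {"print": 3, "modify": 4, "copy": 5, "annotate": 6}


def decode_permissions(p: int) -> dict:
    """Decode selected permission flags from a signed 32-bit /P value."""
    unsigned = p & 0xFFFFFFFF  # reinterpret two's complement as unsigned
    return {
        name: bool(unsigned & (1 << (bit - 1)))
        for name, bit in PERMISSION_BITS.items()
    }


if __name__ == "__main__":
    print(decode_permissions(-1))   # every bit set: nothing restricted
    print(decode_permissions(-5))   # bit 3 cleared: printing restricted
    print(decode_permissions(-64))  # bits 1 to 6 cleared: all four restricted
```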

Even without removing the password, most freeware or open source PDF readers ignore the permission "protections" and allow the user to print or make copies of excerpts of the text as if the document were not limited by password protection.^[33]^[34]^[35]

Beginning with PDF 1.5, Usage rights (UR) signatures are used to enable additional interactive features that are not available by default in a particular PDF viewer application. The signature is used to validate that the permissions have been granted by a bona fide granting authority. For example, it can be used to allow a user to save the document along with modified form or annotation data, to import and export form data, or to apply a digital signature to an existing signature form field.^[36]

Licensing

Anyone may create applications that can read and write PDF files without having to pay royalties to Adobe Systems; Adobe holds patents to PDF, but licenses them for royalty-free use in developing software complying with its PDF specification.^[59]


