Filename extension
A filename extension, file name extension or file extension is a suffix to the name of a computer file (for example, .txt, .docx, .md). The extension indicates a characteristic of the file contents or its intended use. A filename extension is typically delimited from the rest of the filename with a full stop (period), but in some systems[1] it is separated with spaces.
Some file systems implement filename extensions as a feature of the file system itself and may limit the length and format of the extension, while others treat filename extensions as part of the filename without special distinction.
Operating system and file system support
The Multics file system stores the file name as a single string, not split into base name and extension components, allowing the "." to be just another character allowed in file names. It allows for variable-length filenames, permitting more than one dot, and hence multiple suffixes, as well as no dot, and hence no suffix. Some components of Multics, and applications running on it, use suffixes to indicate file types, but not all files are required to have a suffix — for example, executables and ordinary text files usually have no suffixes in their names.
File systems for UNIX-like operating systems also store the file name as a single string, with "." as just another character in the file name. A file with more than one suffix is sometimes said to have more than one extension, although terminology varies in this regard, and most authors define extension in a way that does not allow more than one in the same file name. More than one extension usually represents nested transformations, such as files.tar.gz (the .tar indicates that the file is a tar archive of one or more files, and the .gz indicates that the tar archive file is compressed with gzip). Programs transforming or creating files may add the appropriate extension to names inferred from input file names (unless explicitly given an output file name), but programs reading files usually ignore the information; it is mostly intended for the human user.
It is more common, especially in binary files, for the file to contain internal or external metadata describing its contents.
This model generally requires the full filename to be provided in commands, whereas the metadata approach often allows the extension to be omitted.
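The nested suffixes described above for UNIX-like systems can be inspected with Python's standard pathlib module. This is only an illustrative sketch: the helper name describe_suffixes is hypothetical, and the file system itself attaches no meaning to the dots.

```python
from pathlib import PurePosixPath


def describe_suffixes(name: str) -> list[str]:
    """Return every dot-delimited suffix of a file name, in order of appearance."""
    # PurePosixPath only parses the string; no file needs to exist on disk.
    return PurePosixPath(name).suffixes


# The last suffix names the outermost transformation (gzip applied to a tar archive).
print(describe_suffixes("files.tar.gz"))  # ['.tar', '.gz']
# A name with no dot has no suffix at all.
print(describe_suffixes("README"))        # []
```

Reading the list from right to left gives the order in which the transformations would be undone: decompress with gzip first, then unpack the tar archive.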
In DOS and 16-bit Windows, file names have a maximum of 8 characters, a period, and an extension of up to three letters. The FAT file system for DOS and Windows stores file names as an 8-character name and a three-character extension. The period character is not stored.
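The 8.3 layout can be sketched as a fixed 11-byte field in which the period is implied rather than stored. The function below is a simplified illustration, not the actual FAT directory-entry code: the name to_fat_83 is hypothetical, and upper-casing, space padding and an ASCII-only character set are assumptions of the sketch.

```python
def to_fat_83(filename: str) -> bytes:
    """Pack a DOS-style name into 11 bytes: 8 for the name, 3 for the extension."""
    base, _, ext = filename.upper().partition(".")
    if not 1 <= len(base) <= 8 or len(ext) > 3 or "." in ext:
        raise ValueError(f"{filename!r} does not fit the 8.3 format")
    # Both parts are padded with spaces; the period itself is not stored.
    return base.ljust(8).encode("ascii") + ext.ljust(3).encode("ascii")


print(to_fat_83("README.TXT"))  # b'README  TXT'

try:
    to_fat_83("index.html")     # a four-letter extension cannot be represented
except ValueError as err:
    print(err)
```

Because the field has a fixed width, a name such as index.html has no valid representation in this scheme and has to be shortened.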
The High Performance File System (HPFS), used in Microsoft and IBM's OS/2, stores the file name as a single string, with the "." character as just another character in the file name. The convention of using suffixes continued, even though HPFS supports extended attributes for files, allowing a file's type to be stored in the file as an extended attribute.
NTFS, the native file system of Microsoft's Windows NT, and the later ReFS also store the file name as a single string; again, the convention of using suffixes to simulate extensions continued, for compatibility with existing versions of Windows. In Windows NT 3.5, a variant of the FAT file system called VFAT appeared; it supports longer file names, with the file name being treated as a single string.
Windows 95, with VFAT, introduced support for long file names, and removed the 8.3 name/extension split in file names from non-NT Windows.
The classic Mac OS disposed of filename-based extension metadata entirely; it used, instead, a distinct file type code to identify the file format. Additionally, a creator code was specified to determine which application would be launched when the file's icon was double-clicked.[2] macOS, however, uses filename suffixes as a consequence of being derived from the UNIX-like NeXTSTEP operating system, in addition to using type and creator codes.
In Commodore systems, a file can have only one of four extensions: PRG, SEQ, USR, or REL. However, these are used to separate data types used by a program and are irrelevant for identifying file contents.
With the advent of graphical user interfaces, the issue of file management and interface behavior arose. Microsoft Windows allowed multiple applications to be associated with a given extension, and different actions were available for selecting the required application, such as a context menu offering a choice between viewing, editing or printing the file. The assumption was still that any extension represented a single file type; there was an unambiguous mapping between extension and icon.
When the Internet age first arrived, those using Windows systems that were still restricted to 8.3 filename formats had to create web pages with names ending in .HTM, while those using Macintosh or UNIX computers could use the recommended .html filename extension. This also became a problem for programmers experimenting with the Java programming language, since it requires the four-letter suffix .java for source code files and the five-letter suffix .class for Java compiler object code output files.[3]
Security issues
By default, File Explorer, the file browser provided with Microsoft Windows, does not display filename extensions. Malicious users have tried to spread computer viruses and computer worms by using file names formed like LOVE-LETTER-FOR-YOU.TXT.vbs. The hope is that this will appear as LOVE-LETTER-FOR-YOU.TXT, a harmless text file, without alerting the user to the fact that it is a harmful computer program, in this case written in VBScript. ReactOS Explorer, by contrast, displays filename extensions by default.
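The effect of hiding the final extension can be illustrated with a short sketch. The helper names and the SCRIPT_EXTENSIONS set are hypothetical and chosen only for this example; the set is not an authoritative list of dangerous extensions, and real file browsers apply more elaborate rules when deciding what to hide.

```python
from pathlib import PureWindowsPath

# Hypothetical sample of executable or script extensions, for illustration only.
SCRIPT_EXTENSIONS = {".vbs", ".js", ".exe", ".bat", ".cmd", ".scr"}


def displayed_name(filename: str) -> str:
    """Name as it would appear if only the final extension were hidden."""
    return PureWindowsPath(filename).stem


def looks_disguised(filename: str) -> bool:
    """True if an executable extension is preceded by another extension."""
    suffixes = [s.lower() for s in PureWindowsPath(filename).suffixes]
    return len(suffixes) >= 2 and suffixes[-1] in SCRIPT_EXTENSIONS


name = "LOVE-LETTER-FOR-YOU.TXT.vbs"
print(displayed_name(name))   # LOVE-LETTER-FOR-YOU.TXT
print(looks_disguised(name))  # True
```

A check like this is only a heuristic: a legitimate name such as report.final.exe would also be flagged, so it can support, but not replace, showing the full file name to the user.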
Later Windows versions (starting with Windows XP Service Pack 2 and Windows Server 2003) included customizable lists of filename extensions that should be considered "dangerous" in certain "zones" of operation, such as when downloaded from the web or received as an e-mail attachment. Modern antivirus software systems also help to defend users against such attempted attacks where possible.
Some viruses take advantage of the similarity between the ".com" top-level domain and the ".COM" filename extension by emailing malicious, executable command-file attachments under names superficially similar to URLs (e.g., "myparty.yahoo.com"), with the effect that unaware users click on email-embedded links that they think lead to websites but actually download and execute the malicious attachments.
There have been instances of malware crafted to exploit vulnerabilities in some Windows applications which could cause a stack-based buffer overflow when opening a file with an overly long, unhandled filename extension.
The filename extension is just a marker and the content of the file does not have to match it.[9] This can be used to disguise malicious content. When trying to identify a file for security reasons, it is therefore considered dangerous to rely on the extension alone and a proper analysis of the content of the file is preferred. For example, on UNIX-like systems, it is not uncommon to find files with no extensions at all, as the file command and similar tools are meant to be used instead; they read the file's header to determine its content.
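Content-based identification of this kind can be sketched by comparing the first bytes of a file against a table of well-known signatures. This is a minimal illustration, not how the file command is actually implemented: the names MAGIC_NUMBERS, sniff and sniff_file are hypothetical, and the table holds only a handful of signatures.

```python
from typing import Optional

# A few widely documented leading byte signatures ("magic numbers").
MAGIC_NUMBERS = {
    b"\x1f\x8b": "gzip",
    b"%PDF-": "pdf",
    b"\x89PNG\r\n\x1a\n": "png",
    b"PK\x03\x04": "zip",
}


def sniff(header: bytes) -> Optional[str]:
    """Guess a format from the leading bytes, ignoring the file name entirely."""
    for magic, kind in MAGIC_NUMBERS.items():
        if header.startswith(magic):
            return kind
    return None  # unknown: the extension alone should not be trusted either


def sniff_file(path: str) -> Optional[str]:
    """Read only the first bytes of a file on disk and classify them."""
    with open(path, "rb") as handle:
        return sniff(handle.read(16))


print(sniff(b"%PDF-1.7\n"))        # pdf
print(sniff(b"plain text here"))   # None
```

A file named invoice.txt whose header matches the zip signature would be reported as zip here, which is the kind of mismatch between extension and content that the paragraph above warns about.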