Data and information visualization
Data and information visualization (data viz/vis or info viz/vis)[2] is the practice of designing and creating easy-to-communicate and easy-to-understand graphic or visual representations of a large amount[3] of complex quantitative and qualitative data and information with the help of static, dynamic or interactive visual items. Typically based on data and information collected from a certain domain of expertise, these visualizations are intended for a broader audience to help them visually explore and discover, quickly understand, interpret and gain important insights into otherwise difficult-to-identify structures, relationships, correlations, local and global patterns, trends, variations, constancy, clusters, outliers and unusual groupings within data (exploratory visualization).[4][5][6] When intended for the general public (mass communication) to convey a concise version of known, specific information in a clear and engaging manner (presentational or explanatory visualization),[4] it is typically called information graphics.
Data visualization is concerned with visually presenting sets of primarily quantitative raw data in a schematic form. The visual formats used in data visualization include tables, charts and graphs (e.g. pie charts, bar charts, line charts, area charts, cone charts, pyramid charts, donut charts, histograms, spectrograms, cohort charts, waterfall charts, funnel charts, bullet graphs, etc.), diagrams, plots (e.g. scatter plots, distribution plots, box-and-whisker plots), geospatial maps (such as proportional symbol maps, choropleth maps, isopleth maps and heat maps), figures, correlation matrices, percentage gauges, etc., which sometimes can be combined in a dashboard.
Information visualization, on the other hand, deals with multiple, large-scale and complicated datasets which contain quantitative (numerical) data as well as qualitative (non-numerical, i.e. verbal or graphical) and primarily abstract information. Its goal is to add value to raw data, improve the viewers' comprehension, reinforce their cognition and help them derive insights and make decisions as they navigate and interact with the computer-supported graphical display. Visual tools used in information visualization include maps (such as tree maps), animations, infographics, Sankey diagrams, flow charts, network diagrams, semantic networks, entity-relationship diagrams, Venn diagrams, timelines, mind maps, etc.
Emerging technologies like virtual, augmented and mixed reality have the potential to make information visualization more immersive, intuitive, interactive and easily manipulable and thus enhance the user's visual perception and cognition.[7] In data and information visualization, the goal is to graphically present and explore abstract, non-physical and non-spatial data collected from databases, information systems, file systems, documents, business and financial data, etc. (presentational and exploratory visualization) which is different from the field of scientific visualization, where the goal is to render realistic images based on physical and spatial scientific data to confirm or reject hypotheses (confirmatory visualization).[8]
Effective data visualization is properly sourced, contextualized, simple and uncluttered. The underlying data is accurate and up-to-date to make sure that insights are reliable. Graphical items are well-chosen for the given datasets and aesthetically appealing, with shapes, colors and other visual elements used deliberately in a meaningful and non-distracting manner. The visuals are accompanied by supporting texts (labels and titles). These verbal and graphical components complement each other to ensure clear, quick and memorable understanding. Effective information visualization is aware of the needs and concerns and the level of expertise of the target audience, deliberately guiding them to the intended conclusion.[9][3] Such effective visualization can be used not only to convey specialized, complex, big data-driven ideas to a wider, non-technical audience in a visually appealing, engaging and accessible manner, but also to support domain experts and executives in making decisions, monitoring performance, generating new ideas and stimulating research.[9][4] In addition, data scientists, data analysts and data mining specialists use data visualization to check the quality of data, find errors, unusual gaps and missing values in data, clean data, explore the structures and features of data and assess outputs of data-driven models.[4] In business, data and information visualization can constitute a part of data storytelling, where they are paired with a coherent narrative structure or storyline to contextualize the analyzed data and communicate the insights gained from analyzing the data clearly and memorably, with the goal of persuading the audience to make a decision or take an action in order to create business value.[3][10] This can be contrasted with the field of statistical graphics, where complex statistical data are communicated graphically in an accurate and precise manner among researchers and analysts with statistical expertise to help them perform exploratory data analysis or to convey the results of such analyses, and where visual appeal, drawing attention to a certain issue and storytelling are not as important.[11]
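The data-quality checks mentioned above (finding missing values and unusual observations) can be illustrated with a small sketch. It assumes a single numeric column and uses the 1.5 × IQR fences that a box-and-whisker plot conventionally draws; the function name `quality_report` and the sample readings are hypothetical and not taken from any cited source.

```python
import math
import statistics


def quality_report(values):
    """Count missing entries and flag box-plot outliers in one numeric column."""
    # Treat both None and NaN as missing values.
    present = [
        v for v in values
        if v is not None and not (isinstance(v, float) and math.isnan(v))
    ]
    missing = len(values) - len(present)

    # Quartiles of the observed values (inclusive interpolation method).
    q1, _median, q3 = statistics.quantiles(present, n=4, method="inclusive")
    iqr = q3 - q1

    # Conventional box-plot fences: 1.5 * IQR beyond the quartiles.
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    outliers = [v for v in present if v < low or v > high]
    return {"missing": missing, "fences": (low, high), "outliers": outliers}


# Hypothetical sensor readings with two gaps and one suspicious value.
readings = [12.0, 13.5, None, 12.8, 14.1, 95.0, 13.2, float("nan"), 12.5]
report = quality_report(readings)
print(report)
```

In practice the same fences would be drawn as whiskers on a box plot, so that the flagged values appear as isolated points rather than as entries in a dictionary.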
The field of data and information visualization is interdisciplinary, incorporating principles found in the disciplines of descriptive statistics (as early as the 18th century),[12] visual communication, graphic design, cognitive science and, more recently, interactive computer graphics and human-computer interaction.[13] Since effective visualization requires design skills, statistical skills and computing skills, authors such as Gershon and Page argue that it is both an art and a science.[14] The neighboring field of visual analytics marries statistical data analysis, data and information visualization and human analytical reasoning through interactive visual interfaces to help human users reach conclusions, gain actionable insights and make informed decisions, tasks which are otherwise difficult for computers to do.
Research into how people read and misread various types of visualizations is helping to determine what types and features of visualizations are most understandable and effective in conveying information.[15][16] On the other hand, unintentionally poor or intentionally misleading and deceptive visualizations (misinformative visualization) can function as powerful tools which disseminate misinformation, manipulate public perception and divert public opinion toward a certain agenda.[17] Thus data visualization literacy has become an important component of data and information literacy in the information age akin to the roles played by textual, mathematical and visual literacy in the past.[18]
Data visualization involves specific terminology, some of which is derived from statistics. For example, author Stephen Few defines two types of data, which are used in combination to support a meaningful analysis or visualization:
The distinction between quantitative and categorical variables is important because the two types require different methods of visualization.
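As a rough illustration of why the variable type matters, the sketch below classifies columns as quantitative or categorical and looks up a commonly used display for each combination. The lookup table is a hypothetical rule of thumb assembled from chart types named earlier in this article; it is not Few's own taxonomy, and the names `variable_kind` and `suggest_chart` are assumptions made for the example.

```python
from numbers import Number


def variable_kind(values):
    """Classify a column as 'quantitative' or 'categorical'."""
    # bool is a subclass of int in Python, so exclude it explicitly.
    numeric = all(
        isinstance(v, Number) and not isinstance(v, bool) for v in values
    )
    return "quantitative" if numeric else "categorical"


# Hypothetical rule-of-thumb mapping from variable kinds to a display.
SUGGESTED_CHART = {
    ("quantitative",): "histogram",
    ("categorical",): "bar chart of counts",
    ("quantitative", "quantitative"): "scatter plot",
    ("categorical", "quantitative"): "box-and-whisker plot per category",
    ("categorical", "categorical"): "table of counts",
}


def suggest_chart(*columns):
    """Suggest a display for one or two columns based on their kinds."""
    # Sort the kinds so the lookup does not depend on argument order.
    kinds = tuple(sorted(variable_kind(c) for c in columns))
    return SUGGESTED_CHART[kinds]


regions = ["North", "South", "East", "South"]
sales = [120.5, 98.0, 143.2, 101.7]
print(suggest_chart(sales))           # one quantitative variable
print(suggest_chart(regions, sales))  # categorical against quantitative
```

A real tool would weigh more factors (number of categories, audience, purpose), so the table should be read as a starting point rather than a rule.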
Two primary types of information displays are tables and graphs.
Eppler and Lengler have developed the "Periodic Table of Visualization Methods," an interactive chart displaying various data visualization methods. It includes six types of data visualization methods: data, information, concept, strategy, metaphor and compound.[52] In "Visualization Analysis and Design", Tamara Munzner writes "Computer-based visualization systems provide visual representations of datasets designed to help people carry out tasks more effectively." Munzner argues that visualization "is suitable when there is a need to augment human capabilities rather than replace people with computational decision-making methods."[53]
Interactive data visualization enables direct actions on a graphical plot to change elements and link between multiple plots.[56]
Interactive data visualization has been a pursuit of statisticians since the late 1960s. Examples of the developments can be found in the American Statistical Association video lending library.[57]
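The idea of acting on one plot and having the change carry over to linked plots can be sketched without any graphics library. In the minimal model below, every plot is a view over the same records, and a selection made in one view is stored once and read by all others. The class name `LinkedViews`, its methods and the sample records are hypothetical; a real toolkit would attach this logic to mouse events and redraw the plots.

```python
from dataclasses import dataclass, field


@dataclass
class LinkedViews:
    """Several plots sharing one selection over the same records."""
    records: list
    selected: set = field(default_factory=set)

    def brush(self, key, low, high):
        """Select records whose `key` value lies in [low, high]."""
        self.selected = {
            i for i, r in enumerate(self.records) if low <= r[key] <= high
        }
        return self.selected

    def view(self, x, y):
        """Return (x, y, highlighted) per record, as a scatter plot would draw it."""
        return [
            (r[x], r[y], i in self.selected)
            for i, r in enumerate(self.records)
        ]


# Hypothetical records shown in two linked scatter plots.
cars = LinkedViews([
    {"weight": 2200, "mpg": 34, "hp": 70},
    {"weight": 3100, "mpg": 22, "hp": 150},
    {"weight": 3600, "mpg": 18, "hp": 190},
    {"weight": 2600, "mpg": 29, "hp": 95},
])

# Brushing a weight range in the first plot ...
cars.brush("weight", 3000, 4000)

# ... highlights the same records in a second plot of horsepower against mpg.
for point in cars.view("hp", "mpg"):
    print(point)
```

Keeping the selection as a set of record indices, rather than as coordinates in one plot, is what allows any number of views with different axes to stay in step.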
Common interactions include:
There are different approaches to the scope of data visualization. One common focus is on information presentation, as in Friedman (2008). Friendly (2008) presumes two main parts of data visualization: statistical graphics and thematic cartography.[58] In this line, the article "Data Visualization: Modern Approaches" (2007) gives an overview of seven subjects of data visualization:[59]
All these subjects are closely related to graphic design and information representation.
On the other hand, from a computer science perspective, Frits H. Post in 2002 categorized the field into sub-fields:[26][60]
In the Harvard Business Review, Scott Berinato developed a framework for approaching data visualisation.[61] To start thinking visually, users must consider two questions: 1) what they have and 2) what they are doing. The first step is identifying the data to be visualised: it may be data-driven, like profit over the past ten years, or a conceptual idea, like how a specific organisation is structured. Once this question is answered, one can focus on whether the aim is to communicate information (declarative visualisation) or to figure something out (exploratory visualisation). Berinato combines these questions to give four types of visual communication, each with its own goals.[61]
These four types of visual communication are as follows:
Data and information visualization insights are being applied in areas such as:[19]
Notable academic and industry laboratories in the field are:
Conferences in this field, ranked by significance in data visualization research,[63] are:
For further examples, see: Category:Computer graphics organizations
There are numerous tools available for data visualization, each with its own strengths and applications. Some of the most widely used tools include:
These tools vary in their complexity, cost, and the level of customization they offer, catering to different needs from simple charting to complex interactive visualizations.