
Content analysis

Content analysis is the study of documents and communication artifacts, which might be texts of various formats, pictures, audio or video. Social scientists use content analysis to examine patterns in communication in a replicable and systematic manner.[1] One of the key advantages of using content analysis to analyse social phenomena is its non-invasive nature, in contrast to simulating social experiences or collecting survey answers.

Practices and philosophies of content analysis vary between academic disciplines. They all involve systematic reading or observation of texts or artifacts which are assigned labels (sometimes called codes) to indicate the presence of interesting, meaningful pieces of content.[2][3] By systematically labeling the content of a set of texts, researchers can analyse patterns of content quantitatively using statistical methods, or use qualitative methods to analyse meanings of content within texts.


Computers are increasingly used in content analysis to automate the labeling (or coding) of documents. Simple computational techniques can provide descriptive data such as word frequencies and document lengths. Machine learning classifiers can greatly increase the number of texts that can be labeled, but the scientific utility of doing so is a matter of debate. Further, numerous computer-aided text analysis (CATA) programs are available that analyze text for predetermined linguistic, semantic, and psychological characteristics.[4]
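As a concrete illustration of the simple computational techniques mentioned above, the following Python sketch computes word frequencies and document lengths for a small, invented corpus; it is a minimal example, not the output of any particular CATA program.

```python
# Minimal sketch (invented corpus) of simple descriptive measures:
# word frequencies and document lengths, using only the standard library.
from collections import Counter
import re

documents = [
    "The council approved the budget after a long debate.",
    "Voters debate the new budget proposal.",
]

token_pattern = re.compile(r"[a-z']+")

for i, doc in enumerate(documents):
    tokens = token_pattern.findall(doc.lower())  # crude tokenization
    freqs = Counter(tokens)                      # word frequencies
    print(f"Document {i}: {len(tokens)} tokens; top words: {freqs.most_common(3)}")
```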

Qualitative and quantitative content analysis

Quantitative content analysis highlights frequency counts and statistical analysis of these coded frequencies.[7] Additionally, quantitative content analysis starts from a framed hypothesis, with coding decided on before the analysis begins. These coding categories are strictly relevant to the researcher's hypothesis. Quantitative analysis also takes a deductive approach.[8] Examples of content-analytical variables and constructs can be found in the open-access database DOCA, which compiles, systematizes, and evaluates relevant content-analytical variables from communication and political science research areas and topics.
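To make this workflow concrete, the sketch below tallies hypothetical coded frequencies for predefined categories and compares two outlets with a chi-square test of independence; the categories, counts, and outlets are invented for illustration, and the choice of test is an assumption rather than a prescribed step. It assumes SciPy is available.

```python
# Hedged sketch of a quantitative step: frequencies of predefined coding
# categories are compared across two hypothetical outlets with a chi-square test.
from scipy.stats import chi2_contingency

categories = ["economy", "crime", "environment"]  # predefined, hypothesis-driven codes
observed = [
    [40, 25, 10],   # outlet A: number of articles coded into each category
    [22, 30, 18],   # outlet B
]

chi2, p, dof, expected = chi2_contingency(observed)
print("categories:", categories)
print(f"chi-square = {chi2:.2f}, df = {dof}, p = {p:.3f}")
```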


Siegfried Kracauer provides a critique of quantitative analysis, asserting that it oversimplifies complex communications in order to be more reliable. Qualitative analysis, by contrast, deals with the intricacies of latent interpretations, whereas quantitative analysis focuses on manifest meanings. He also acknowledges an "overlap" of qualitative and quantitative content analysis.[7] Patterns are examined more closely in qualitative analysis, and the course of the research may change based on the latent meanings the researcher finds. It is inductive and begins with open research questions, as opposed to a hypothesis.[8]

Codebooks

The data collection instrument used in content analysis is the codebook or coding scheme. In qualitative content analysis the codebook is constructed and improved during coding, while in quantitative content analysis the codebook needs to be developed and pretested for reliability and validity before coding.[4] The codebook includes detailed instructions for human coders, as well as clear definitions of the concepts or variables to be coded and the values assigned to them.
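As a rough sketch of what such a codebook entry might look like in machine-readable form, the following Python snippet defines one hypothetical variable with its definition, coder instructions, and allowed values, plus a small check that coded values are permitted; it does not reflect any specific published codebook.

```python
# Rough sketch of a machine-readable codebook entry; the variable "tone",
# its instructions, and its values are hypothetical illustrations.
codebook = {
    "tone": {
        "definition": "Overall evaluative tone of the article toward the policy.",
        "instructions": "Code the dominant tone of the full article, not only the headline.",
        "values": {1: "negative", 2: "neutral", 3: "positive", 9: "not applicable"},
    },
}

def is_valid(variable: str, value: int) -> bool:
    """Return True if the coded value is allowed by the codebook."""
    return value in codebook[variable]["values"]

assert is_valid("tone", 2)       # "neutral" is an allowed value
assert not is_valid("tone", 5)   # 5 is not defined in the codebook
```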


According to current standards of good scientific practice, each content analysis study should provide its codebook in the appendix or as supplementary material so that reproducibility of the study is ensured. On the Open Science Framework (OSF) server of the Center for Open Science, many codebooks from content analysis studies are freely available via a search for "codebook".


Furthermore, the Database of Variables for Content Analysis (DOCA) provides an open access archive of pretested variables and established codebooks for content analyses.[9] Measures from the archive can be adopted in future studies to ensure the use of high-quality and comparable instruments. DOCA covers, among others, measures for the content analysis of fictional media and entertainment (e.g., measures for sexualization in video games[10]), of user-generated media content (e.g., measures for online hate speech[11]), and of news media and journalism (e.g., measures for stock photo use in press reporting on child sexual abuse,[12] and measures of personalization in election campaign coverage[13]).

Computational tools

With the rise of common computing facilities like PCs, computer-based methods of analysis are growing in popularity.[14][15][16] Answers to open-ended questions, newspaper articles, political party manifestos, medical records or systematic observations in experiments can all be subject to systematic analysis of textual data.


When the contents of communication are available as machine-readable text, the input can be analyzed for frequencies and coded into categories in order to build up inferences.
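The following sketch illustrates one simple version of this frequency-and-category coding, using a hypothetical keyword dictionary; real CATA programs rely on far more elaborate dictionaries and rules.

```python
# Minimal sketch of dictionary-based coding: machine-readable texts are checked
# against hypothetical category keyword lists and category frequencies are tallied.
from collections import Counter

category_keywords = {
    "economy": {"budget", "tax", "inflation"},
    "health": {"hospital", "vaccine", "clinic"},
}

texts = [
    "Parliament passed the budget despite inflation worries.",
    "The new clinic will offer free vaccine appointments.",
]

counts = Counter()
for text in texts:
    tokens = set(text.lower().replace(".", "").split())
    for category, keywords in category_keywords.items():
        if tokens & keywords:        # at least one category keyword appears
            counts[category] += 1    # code the text into this category

print(counts)  # Counter({'economy': 1, 'health': 1})
```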


Computer-assisted analysis can help with large, electronic data sets by reducing the time required and eliminating the need for multiple human coders to establish inter-coder reliability. However, human coders can still be employed for content analysis, as they are often better able to pick out nuanced and latent meanings in text. A study found that human coders were able to evaluate a broader range and make inferences based on latent meanings.[17]
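For the inter-coder reliability mentioned above, a commonly used index is Cohen's kappa (cited later in this article in connection with intercoder agreement); the sketch below computes it from scratch for two hypothetical coders' label sequences. Established statistical packages offer equivalent, more robust implementations.

```python
# Hedged sketch: Cohen's kappa for two hypothetical coders' labels.
from collections import Counter

coder_a = ["pos", "neg", "neu", "pos", "neg", "pos"]
coder_b = ["pos", "neg", "pos", "pos", "neg", "neu"]

def cohens_kappa(a, b):
    n = len(a)
    observed = sum(x == y for x, y in zip(a, b)) / n                # observed agreement
    pa, pb = Counter(a), Counter(b)
    expected = sum(pa[c] * pb[c] for c in set(a) | set(b)) / n**2   # chance agreement
    return (observed - expected) / (1 - expected)

print(f"Cohen's kappa = {cohens_kappa(coder_a, coder_b):.2f}")  # about 0.45 here
```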

Holsti groups fifteen uses of content analysis into three basic categories:[28]

1. make inferences about the antecedents of a communication
2. describe and make inferences about characteristics of a communication
3. make inferences about the effects of a communication


He also places these uses into the context of the basic communication paradigm.


The fifteen uses of content analysis can be tabulated in terms of their general purpose, the element of the communication paradigm to which they apply, and the general question they are intended to answer.


As a counterpoint, there are limits to the scope of use for the procedures that characterize content analysis. In particular, if access to the goal of analysis can be obtained by direct means without material interference, then direct measurement techniques yield better data.[30] Content analysis attempts to quantifiably describe communications whose features are primarily categorical (usually limited to a nominal or ordinal scale) via selected conceptual units (the unitization), which are assigned values (the categorization) for enumeration while intercoder reliability is monitored. If, instead, the target quantity is manifestly already directly measurable (typically on an interval or ratio scale), and especially if it is a continuous physical quantity, then such targets usually are not listed among those needing the "subjective" selections and formulations of content analysis.[31][32][33][34][35][36][20][37]

For example (from mixed research and clinical application), as medical images communicate diagnostic features to physicians, neuroimaging's stroke (infarct) volume scale called ASPECTS is unitized as 10 qualitatively delineated (unequal) brain regions in the middle cerebral artery territory, which it categorizes as being at least partly versus not at all infarcted in order to enumerate the latter, with published series often assessing intercoder reliability by Cohen's kappa. These operations (unitization, categorization, enumeration, and intercoder reliability) impose the uncredited form of content analysis onto an estimation of infarct extent, which instead is easily and more accurately measured as a volume directly on the images.[38][39] ("Accuracy ... is the highest form of reliability."[40]) The concomitant clinical assessment, however, by the National Institutes of Health Stroke Scale (NIHSS) or the modified Rankin Scale (mRS), retains the necessary form of content analysis. Recognizing potential limits of content analysis across the contents of language and images alike, Klaus Krippendorff affirms that "comprehen[sion] ... may ... not conform at all to the process of classification and/or counting by which most content analyses proceed,"[41] suggesting that content analysis might materially distort a message.

See also

Donald Wayne Foster

Hermeneutics

Text mining

The Polish Peasant in Europe and America

Transition words

Video content analysis

Grounded theory

Further reading

Graneheim, Ulla Hällgren; Lundman, Berit (2004). "Qualitative content analysis in nursing research: concepts, procedures and measures to achieve trustworthiness". Nurse Education Today. 24 (2): 105–112. doi:10.1016/j.nedt.2003.10.001. PMID 14769454. S2CID 17354453.

Budge, Ian, ed. (2001). Mapping Policy Preferences. Estimates for Parties, Electors and Governments 1945-1998. Oxford, UK: Oxford University Press.

Krippendorff, Klaus; Bock, Mary Angela, eds. (2008). The Content Analysis Reader. Thousand Oaks, CA: Sage.

Neuendorf, Kimberly (2017). The Content Analysis Guidebook (2nd ed.). Thousand Oaks, CA: Sage.

Roberts, Carl, ed. (1997). Text Analysis for the Social Sciences: Methods for Drawing Inferences from Texts and Transcripts. Mahwah, NJ: Lawrence Erlbaum.

Wimmer, Roger; Dominick, Joseph (2005). Mass Media Research: An Introduction (8th ed.). Belmont, CA: Wadsworth.