Association rule learning

Association rule learning is a rule-based machine learning method for discovering interesting relations between variables in large databases. It is intended to identify strong rules discovered in databases using some measures of interestingness.^[1] In any given transaction with a variety of items, association rules are meant to discover the rules that determine how or why certain items are connected.

Based on the concept of strong rules, Rakesh Agrawal, Tomasz Imieliński and Arun Swami^[2] introduced association rules for discovering regularities between products in large-scale transaction data recorded by point-of-sale (POS) systems in supermarkets. For example, the rule $\{\mathrm {onions,potatoes} \}\Rightarrow \{\mathrm {burger} \}$ found in the sales data of a supermarket would indicate that if a customer buys onions and potatoes together, they are likely to also buy hamburger meat. Such information can be used as the basis for decisions about marketing activities such as, e.g., promotional pricing or product placements.

In addition to the above example from market basket analysis, association rules are employed today in many application areas including Web usage mining, intrusion detection, continuous production, and bioinformatics. In contrast with sequence mining, association rule learning typically does not consider the order of items either within a transaction or across transactions.

The association rule algorithm itself consists of various parameters that can make it difficult for those without some expertise in data mining to execute, with many rules that are arduous to understand.^[3]

All-confidence

[17]

Collective strength

[18]

Leverage

[19]

History[edit]

The concept of association rules was popularized particularly due to the 1993 article of Agrawal et al.,^[2] which has acquired more than 23,790 citations according to Google Scholar, as of April 2021, and is thus one of the most cited papers in the Data Mining field. However, what is now called "association rules" is introduced already in the 1966 paper^[22] on GUHA, a general data mining method developed by Petr Hájek et al.^[23]

An early (circa 1989) use of minimum support and confidence to find all association rules is the Feature Based Modeling framework, which found all rules with $\mathrm {supp} (X)$ and $\mathrm {conf} (X\Rightarrow Y)$ greater than user defined constraints.^[24]

Statistically sound associations[edit]

One limitation of the standard approach to discovering associations is that by searching massive numbers of possible associations to look for collections of items that appear to be associated, there is a large risk of finding many spurious associations. These are collections of items that co-occur with unexpected frequency in the data, but only do so by chance. For example, suppose we are considering a collection of 10,000 items and looking for rules containing two items in the left-hand-side and 1 item in the right-hand-side. There are approximately 1,000,000,000,000 such rules. If we apply a statistical test for independence with a significance level of 0.05 it means there is only a 5% chance of accepting a rule if there is no association. If we assume there are no associations, we should nonetheless expect to find 50,000,000,000 rules. Statistically sound association discovery^[25]^[26] controls this risk, in most cases reducing the risk of finding any spurious associations to a user-specified significance level.

Other types of association rule mining[edit]

Multi-Relation Association Rules (MRAR): These are association rules where each item may have several relations. These relations indicate indirect relationships between the entities. Consider the following MRAR where the first item consists of three relations live in, nearby and humid: “Those who live in a place which is nearby a city with humid climate type and also are younger than 20 $\implies$ their health condition is good”. Such association rules can be extracted from RDBMS data or semantic web data.^[37]

Contrast set learning is a form of associative learning. Contrast set learners use rules that differ meaningfully in their distribution across subsets.^[38]^[39]

Weighted class learning is another form of associative learning where weights may be assigned to classes to give focus to a particular issue of concern for the consumer of the data mining results.

High-order pattern discovery facilitates the capture of high-order (polythetic) patterns or event associations that are intrinsic to complex real-world data. ^[40]

K-optimal pattern discovery provides an alternative to the standard approach to association rule learning which requires that each pattern appear frequently in the data.

Approximate Frequent Itemset mining is a relaxed version of Frequent Itemset mining that allows some of the items in some of the rows to be 0.^[41]

Generalized Association Rules hierarchical taxonomy (concept hierarchy)

Quantitative Association Rules categorical and quantitative data

Interval Data Association Rules e.g. partition the age into 5-year-increment ranged

Sequential pattern mining discovers subsequences that are common to more than minsup (minimum support threshold) sequences in a sequence database, where minsup is set by the user. A sequence is an ordered list of transactions.^[42]

Subspace Clustering, a specific type of clustering high-dimensional data, is in many variants also based on the downward-closure property for specific clustering models.^[43]

Warmr, shipped as part of the ACE data mining suite, allows association rule learning for first order relational rules.^[44]

Sequence mining

Production system (computer science)

Learning classifier system

Rule-based machine learning

Archived 2017-02-19 at the Wayback Machine by M. Hahsler