Recommender system
A recommender system, or a recommendation system (sometimes replacing "system" with terms such as "platform", "engine", or "algorithm"), is a subclass of information filtering system that provides suggestions for items that are most pertinent to a particular user.[1][2] Recommender systems are particularly useful when an individual needs to choose an item from a potentially overwhelming number of items that a service may offer.[1][3]
Typically, the suggestions refer to various decision-making processes, such as what product to purchase, what music to listen to, or what online news to read.[1] Recommender systems are used in a variety of areas, with commonly recognised examples taking the form of playlist generators for video and music services, product recommenders for online stores, and content recommenders for social media platforms and the open web.[4][5] These systems can operate using a single type of input, such as music, or multiple inputs within and across platforms, such as news, books, and search queries. There are also popular recommender systems for specific topics such as restaurants and online dating. Recommender systems have also been developed to explore research articles and experts,[6] collaborators,[7] and financial services.[8]
Recommender systems usually make use of collaborative filtering, content-based filtering (also known as the personality-based approach), or both, as well as other approaches such as knowledge-based systems. Collaborative filtering approaches build a model from a user's past behavior (items previously purchased or selected and/or numerical ratings given to those items) as well as similar decisions made by other users. This model is then used to predict items (or ratings for items) that the user may have an interest in.[9] Content-based filtering approaches utilize a series of discrete, pre-tagged characteristics of an item in order to recommend additional items with similar properties.[10]
The differences between collaborative and content-based filtering can be demonstrated by comparing two early music recommender systems: Last.fm, which recommends tracks by comparing a user's listening habits with those of other users with similar tastes (collaborative filtering), and Pandora Radio, which recommends music whose properties are similar to a song or artist supplied as a seed (content-based filtering).
Each type of system has its strengths and weaknesses. In the above example, Last.fm requires a large amount of information about a user to make accurate recommendations. This is an example of the cold start problem, and is common in collaborative filtering systems.[12][13][14][15][16][17] Pandora, by contrast, needs very little information to start, but it is far more limited in scope (for example, it can only make recommendations that are similar to the original seed).
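The contrast can be sketched on toy data. The following Python snippet is a minimal illustration, not the method used by either service; the rating matrix, item tags, and similarity measures are assumptions chosen for brevity. It scores items for a target user once via collaborative filtering (using ratings of similar users) and once via content-based filtering (tag overlap with the user's liked items).

```python
# A minimal sketch contrasting collaborative and content-based filtering on
# toy data. The matrix, tags, and similarity measures are illustrative
# assumptions, not the method of any particular service.
import numpy as np

# Rows = users, columns = items; 0 means "not yet rated".
ratings = np.array([
    [5, 4, 0, 1],
    [4, 5, 1, 0],
    [1, 0, 5, 4],
], dtype=float)

def cosine(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-9)

# --- Collaborative filtering: predict user 0's rating for item 2
# from users with similar rating histories.
target, item = 0, 2
sims = np.array([cosine(ratings[target], ratings[u])
                 for u in range(len(ratings)) if u != target])
others = np.array([ratings[u, item] for u in range(len(ratings)) if u != target])
cf_prediction = (sims @ others) / (sims.sum() + 1e-9)

# --- Content-based filtering: score items by how well their tags match a
# profile built from the items the user already rated highly.
item_tags = np.array([      # columns: rock, pop, jazz
    [1, 1, 0],
    [1, 0, 0],
    [0, 0, 1],
    [0, 1, 1],
], dtype=float)
liked = ratings[target] >= 4                  # items user 0 rated 4 or higher
profile = item_tags[liked].mean(axis=0)       # average tag vector of liked items
cb_scores = item_tags @ profile               # higher = more similar to profile

print("CF prediction for item 2:", round(cf_prediction, 2))
print("Content-based scores:", np.round(cb_scores, 2))
```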
Recommender systems are a useful alternative to search algorithms since they help users discover items they might not have found otherwise. Of note, recommender systems are often implemented using search engines indexing non-traditional data.
Recommender systems have been the focus of several granted patents.[18][19][20][21][22]
History[edit]
Elaine Rich created the first recommender system, called Grundy, in 1979.[23][24] She looked for a way to recommend books to users that they might like. Her idea was to create a system that asks users specific questions and classifies them into classes of preferences, or "stereotypes", depending on their answers. Depending on their stereotype membership, users would then get recommendations for books they might like.
Another early recommender system, called a "digital bookshelf", was described in a 1990 technical report by Jussi Karlgren at Columbia University,[25] and implemented at scale and worked through in technical reports and publications from 1994 onwards by Jussi Karlgren, then at SICS,[26][27] and by research groups led by Pattie Maes at MIT,[28] Will Hill at Bellcore,[29] and Paul Resnick, also at MIT,[30][3] whose work with GroupLens was awarded the 2010 ACM Software Systems Award.
Montaner provided the first overview of recommender systems from an intelligent agent perspective.[31] Adomavicius provided a new, alternative overview of recommender systems.[32] Herlocker provided an additional overview of evaluation techniques for recommender systems,[33] and Beel et al. discussed the problems of offline evaluations.[34] Beel et al. have also provided literature surveys on available research-paper recommender systems and existing challenges.[35][36]
Technologies[edit]
Session-based recommender systems[edit]
These recommender systems use the interactions of a user within a session[55] to generate recommendations. Session-based recommender systems are used at YouTube[56] and Amazon.[57] They are particularly useful when the history of a user (such as past clicks or purchases) is not available or not relevant in the current user session. Domains where session-based recommendations are particularly relevant include video, e-commerce, travel, music, and more. Most instances of session-based recommender systems rely on the sequence of recent interactions within a session without requiring any additional details (historical, demographic) about the user. Techniques for session-based recommendations are mainly based on generative sequential models such as recurrent neural networks,[55][58] Transformers,[59] and other deep-learning-based approaches.[60][61]
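As a rough illustration of the approach, the sketch below (assuming PyTorch) uses a GRU to turn the sequence of item IDs clicked in the current session into scores over candidate next items. The model size, item vocabulary, and example session are hypothetical, and the model is untrained; it is a sketch of the general technique, not the system of any cited service.

```python
# A minimal sketch of a session-based recommender: a GRU predicts the next
# item from the sequence of item IDs in the current session. Hyperparameters,
# vocabulary size, and the example session are illustrative assumptions.
import torch
import torch.nn as nn

class SessionGRU(nn.Module):
    def __init__(self, num_items, emb_dim=32, hidden_dim=64):
        super().__init__()
        self.embed = nn.Embedding(num_items, emb_dim)
        self.gru = nn.GRU(emb_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, num_items)  # scores over all items

    def forward(self, session):              # session: (batch, seq_len) item IDs
        x = self.embed(session)
        _, h = self.gru(x)                   # h: (1, batch, hidden_dim)
        return self.out(h.squeeze(0))        # (batch, num_items) next-item scores

model = SessionGRU(num_items=1000)
session = torch.tensor([[12, 7, 503, 42]])   # item IDs clicked so far this session
scores = model(session)
top_k = torch.topk(scores, k=5, dim=-1).indices
print("Recommended next items:", top_k.tolist())
```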
Reinforcement learning for recommender systems[edit]
The recommendation problem can be seen as a special instance of a reinforcement learning problem whereby the user is the environment upon which the agent, the recommendation system, acts in order to receive a reward, for instance a click or engagement by the user.[56][62][63] One aspect of reinforcement learning that is of particular use in the area of recommender systems is the fact that the models or policies can be learned by providing a reward to the recommendation agent. This is in contrast to traditional learning techniques, which rely on less flexible supervised learning approaches; reinforcement learning recommendation techniques potentially allow models to be trained that can be optimized directly on metrics of engagement and user interest.[64]
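A heavily simplified sketch of this reward-driven view is shown below, using an epsilon-greedy bandit rather than a full reinforcement-learning formulation; the items, simulated click probabilities, and update rule are illustrative assumptions and do not represent the methods of the cited works.

```python
# A minimal sketch of reward-driven recommendation: an epsilon-greedy agent
# treats each recommendation as an action and a click as the reward, updating
# its value estimates online. Click probabilities are simulated; the setup is
# a simplified stand-in for full reinforcement-learning formulations.
import random

num_items = 5
true_click_prob = [0.05, 0.10, 0.30, 0.02, 0.15]   # unknown to the agent
value = [0.0] * num_items      # estimated reward (click rate) per item
count = [0] * num_items
epsilon = 0.1

for step in range(10_000):
    # Explore occasionally; otherwise recommend the item with the best estimate.
    if random.random() < epsilon:
        item = random.randrange(num_items)
    else:
        item = max(range(num_items), key=lambda i: value[i])

    reward = 1.0 if random.random() < true_click_prob[item] else 0.0  # simulated click
    count[item] += 1
    value[item] += (reward - value[item]) / count[item]   # incremental mean update

print("Estimated click rates:", [round(v, 3) for v in value])
```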
Multi-criteria recommender systems[edit]
Multi-criteria recommender systems (MCRS) can be defined as recommender systems that incorporate preference information upon multiple criteria. Instead of developing recommendation techniques based on a single criterion value (the overall preference of user u for the item i), these systems try to predict a rating for unexplored items of u by exploiting preference information on multiple criteria that affect this overall preference value. Several researchers approach MCRS as a multi-criteria decision making (MCDM) problem, and apply MCDM methods and techniques to implement MCRS systems.[65] See this chapter[66] for an extended introduction.
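One way to realise this is an aggregation-function approach: learn how per-criterion ratings combine into the overall rating, then apply that function to items whose per-criterion ratings are known or predicted but whose overall rating is not. The sketch below illustrates this with a least-squares fit on hypothetical hotel criteria (rooms, location, service); the data and the linear form of the aggregation are assumptions made for brevity.

```python
# A minimal sketch of an aggregation-function approach to multi-criteria
# recommendation: fit how a user's overall rating depends on per-criterion
# ratings, then apply that function to an unexplored item. The data and the
# linear aggregation are illustrative assumptions.
import numpy as np

# Observed items: per-criterion ratings and the user's overall rating.
criteria = np.array([    # columns: rooms, location, service
    [5, 3, 4],
    [2, 5, 3],
    [4, 4, 5],
    [1, 2, 2],
], dtype=float)
overall = np.array([4.0, 3.0, 5.0, 1.5])

# Fit a linear aggregation function: overall ≈ criteria @ w (least squares).
w, *_ = np.linalg.lstsq(criteria, overall, rcond=None)

# Per-criterion ratings for an unexplored item (produced upstream by any
# single-criterion prediction technique).
new_item = np.array([4, 5, 3], dtype=float)
print("Predicted overall rating:", round(float(new_item @ w), 2))
```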
Risk-aware recommender systems[edit]
The majority of existing approaches to recommender systems focus on recommending the most relevant content to users using contextual information, yet do not take into account the risk of disturbing the user with unwanted notifications. It is important to consider the risk of upsetting the user by pushing recommendations in certain circumstances, for instance during a professional meeting, early in the morning, or late at night. Therefore, the performance of the recommender system depends in part on the degree to which it has incorporated this risk into the recommendation process. One option for managing this issue is DRARS, a system which models context-aware recommendation as a bandit problem. This system combines a content-based technique and a contextual bandit algorithm.[67]
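As a rough sketch of the bandit framing (not the actual DRARS algorithm), the snippet below learns context-dependent values for "push a recommendation" versus "stay silent", with simulated rewards that penalise pushing in risky contexts such as a meeting or late at night. The contexts, reward values, and update rule are illustrative assumptions.

```python
# A minimal sketch of a risk-aware, context-dependent policy: recommendations
# are only pushed when the estimated value of pushing in the current context
# exceeds that of staying silent. Contexts, rewards, and the value table are
# illustrative assumptions, not the DRARS algorithm.
import random
from collections import defaultdict

contexts = ["meeting", "late_night", "daytime"]
actions = ["push_recommendation", "stay_silent"]

value = defaultdict(float)    # (context, action) -> estimated reward
count = defaultdict(int)
epsilon = 0.1

def simulated_reward(context, action):
    if action == "stay_silent":
        return 0.0
    # Pushing is rewarded during the day, penalised in risky contexts.
    return 0.5 if context == "daytime" else -1.0

for step in range(5_000):
    context = random.choice(contexts)
    if random.random() < epsilon:
        action = random.choice(actions)
    else:
        action = max(actions, key=lambda a: value[(context, a)])
    reward = simulated_reward(context, action)
    count[(context, action)] += 1
    value[(context, action)] += (reward - value[(context, action)]) / count[(context, action)]

for c in contexts:
    best = max(actions, key=lambda a: value[(c, a)])
    print(f"{c}: {best}")
```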
Evaluation[edit]
Performance measures[edit]
Evaluation is important in assessing the effectiveness of recommendation algorithms. To measure the effectiveness of recommender systems, and compare different approaches, three types of evaluations are available: user studies, online evaluations (A/B tests), and offline evaluations.[34]
Commonly used metrics are the mean squared error and root mean squared error, the latter having been used in the Netflix Prize. Information retrieval metrics, such as precision and recall or DCG, are useful to assess the quality of a recommendation method. Diversity, novelty, and coverage are also considered important aspects in evaluation.[76] However, many of the classic evaluation measures are highly criticized.[77]
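For concreteness, the sketch below computes two of these metrics, RMSE over held-out ratings and precision/recall for a top-5 list, on small made-up data; all numbers are illustrative assumptions.

```python
# A minimal sketch of common offline evaluation metrics on toy data:
# RMSE on predicted ratings, and precision/recall for a top-k recommendation
# list. All numbers are illustrative assumptions.
import math

# Rating prediction: (predicted, actual) pairs held out from the dataset.
pairs = [(4.2, 5.0), (3.1, 3.0), (2.5, 1.0), (4.8, 4.0)]
rmse = math.sqrt(sum((p - a) ** 2 for p, a in pairs) / len(pairs))

# Top-k ranking: items the system recommended vs. items the user actually liked.
recommended = ["a", "b", "c", "d", "e"]        # top-5 list
relevant = {"b", "d", "f", "g"}                # ground-truth relevant items
hits = sum(1 for item in recommended if item in relevant)
precision_at_5 = hits / len(recommended)
recall_at_5 = hits / len(relevant)

print(f"RMSE: {rmse:.3f}  P@5: {precision_at_5:.2f}  R@5: {recall_at_5:.2f}")
```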
Evaluating the performance of a recommendation algorithm on a fixed test dataset will always be extremely challenging, as it is impossible to accurately predict the reactions of real users to the recommendations. Hence any metric that computes the effectiveness of an algorithm on offline data will be imprecise.
User studies are rather small scale. A few dozen or hundreds of users are presented with recommendations created by different recommendation approaches, and then the users judge which recommendations are best.
In A/B tests, recommendations are typically shown to thousands of users of a real product, with users randomly assigned to one of at least two different recommendation approaches. Effectiveness is measured with implicit measures such as conversion rate or click-through rate.
Offline evaluations are based on historic data, e.g. a dataset that contains information about how users previously rated movies.[78]
The effectiveness of recommendation approaches is then measured based on how well a recommendation approach can predict the users' ratings in the dataset. While a rating is an explicit expression of whether a user liked a movie, such information is not available in all domains. For instance, in the domain of citation recommender systems, users typically do not rate a citation or recommended article. In such cases, offline evaluations may use implicit measures of effectiveness. For instance, a recommender system may be considered effective if it is able to recommend as many as possible of the articles contained in a research article's reference list. However, this kind of offline evaluation is viewed critically by many researchers.[79][80][81][34] For instance, it has been shown that results of offline evaluations have low correlation with results from user studies or A/B tests.[81][82] A dataset popular for offline evaluation has been shown to contain duplicate data and thus to lead to wrong conclusions in the evaluation of algorithms.[83] Often, results of so-called offline evaluations do not correlate with actually assessed user satisfaction.[84] This is probably because offline training is highly biased toward the highly reachable items, and offline testing data is highly influenced by the outputs of the online recommendation module.[79][85] Researchers have concluded that the results of offline evaluations should be viewed critically.[86]
Beyond accuracy[edit]
Typically, research on recommender systems is concerned with finding the most accurate recommendation algorithms. However, a number of other factors are also important.