User Tools

Site Tools


itemset_mining

Itemset Mining

Description

Frequent itemset mining is a popular pattern mining task. It is used to find values that frequently appear together in data. Frequent itemset mining is used to analyze data that has the form of a table described using binary attributes.

An example

A typical example of such data is a transaction database, which lists the items (products) purchased by one or more customers. For example, the table below shows a small transaction database containing four transactions. The first transaction indicates that a customer has purchased some items pasta, lemon, bread, and orange together.

Transaction items appearing in the transaction
T1{pasta, lemon, bread, orange}
T2{pasta, lemon}
T3{pasta, orange, cake}
T4{pasta, lemon, orange, cake}

Each transaction has a name called its transaction identifier. For example, the transaction identifiers of the four transactions depicted above are T1, T2, T3 and T4, respectively.

The goal of frequent itemset mining is to find all the frequent itemsets. A frequent itemset is a set of values that appears many times together. To find frequent itemsets, a user must have a transaction database (as the one presented above), and must also choose a value for a parameter called minsup , which means minimum support threshold.

The output of frequent itemset mining is all sets of values (items) that appear at least minsup times in the database.

For example, for the above database and minsup = 2, the following frequent itemsets are found:

{lemon}, {pasta}, {orange}, {cake}, {lemon, pasta}, {lemon, orange}, {pasta, orange}, {pasta, cake}, {orange, cake}, {lemon, pasta, orange}

For example, the itemset {pasta, orange} is said to be a frequent itemset because it appears in at least two transactions. In fact, it appears in the transactions T1, T3 and T4.

Applications

There are many applications. A few examples are:

  • Finding the products purchased together by several customers in a store (each transaction indicates a set of products purchased by a customer).
  • Finding the courses that students often select together in a university (each course is an item, and the course selection of each student is a transaction).
  • Finding the words that frequently appear together in the same sentence of a text (each sentence is a transaction and each word is an item).

Algorithms

Numerous algorithms have been designed for frequent itemset mining. But most of them are inspired by one of the following classic algorithms: Apriori, Eclat, LCM and FPGrowth.

Survey papers

Here are a few survey papers that gives an overview of itemset mining.

  • Fournier-Viger, P., Lin, J. C.-W., Vo, B, Chi, T.T., Zhang, J., Le, H. B. (2017). A Survey of Itemset Mining. WIREs Data Mining and Knowledge Discovery, Wiley, e1207 doi: 10.1002/widm.1207, 18 pages.
  • Luna, J. M., Fournier-Viger, P., Ventura, S. (2019). Frequent Itemset Mining: a 25 Years Review. WIREs Data Mining and Knowledge Discovery, Wiley, 9(6):e1329.

Key papers

  • R. Agrawal and R. Srikant. Fast algorithms for mining association rules in large databases. Research Report RJ 9839, IBM Almaden Research Center, San Jose, California, June 1994. The problem of frequent itemset mining was first introduced by Agrawal & Srkikant in 1993. That paper also presented the famous Apriori algorithm. Note that in this paper “frequent itemsets” are called “large itemsets”. Because this is not a very good name, researchers are nowadays using the term “frequent itemsets”.
  • .Han, J, Pei, J, Ying, Y, Mao, R. Mining frequent patterns without candidate generation: a frequent-pattern tree approach. Data Min. Knowl. Discov., 2004, 8(1):53–87. This paper introduced the popular FP-Growth algorithm for frequent itemset mining, which is very efficient and has been extended in numerous papers. It proposed the concept of pattern-growth algorithm to avoid generating candidates that do not exist in a database.
  • Zaki, MJ. Scalable Algorithms for Association Mining. IEEE Trans. Knowl. Data Eng., 2000, 12(3):372–390, 2000. This paper presented the Eclat algorithm, a fast algorithm that is also easy to implement. It has been also extended in many papers.
  • Uno, T, Kiyomi, M, Arimura, H. LCM ver. 2: Efficient mining algorithms for frequent/closed/maximal itemsets. Proc. ICDM’04 Workshop on Frequent Itemset Mining Implementations, CEUR, 2004. This paper introduced the LCM algorithm, which has won some competitions for the most efficient itemset mining algorithm at FIMI 2004/2005. It introduced some interesting ideas, and there has been several versions of LCM.

Tutorial videos

Software and datasets

To apply frequent itemset mining, the SPMF software provides open-source efficient implementations of many algorithms and variations. The SPMF software can be downloaded from the website: http://www.philippe-fournier-viger.com/spmf/ .

To install the software, you may follow the instructions on the download page of that website. Then, you may check the documentation page which provides examples of how to run various algorithms such as Apriori, Eclat, LCM and FPGrowth for frequent itemset mining. Besides, you may check the datasetspage of that website provides several benchmark datasets for testing the algorithms and comparing their performance.

  • Periodic pattern mining: This task can be viewed as an extension of itemset mining where transactions are sequentially ordered (e.g. by time). Then the goal is to find patterns that are not only frequent but also repeating regularly over time. For example, a periodic pattern may be that a person buys wine with cheese more or less every week.
itemset_mining.txt · Last modified: 2021/06/29 22:42 by philfv