Itemset Mining

Description

Frequent itemset mining is a popular pattern mining task. It is used to find values that frequently appear together in data. Frequent itemset mining is used to analyze data that has the form of a table described using binary attributes.

An example

A typical example of such data is a transaction database, which lists the items (products) purchased by one or more customers. For example, the table below shows a small transaction database containing four transactions. The first transaction indicates that a customer purchased the items pasta, lemon, bread, and orange together.

Transaction    Items appearing in the transaction
T1             {pasta, lemon, bread, orange}
T2             {pasta, lemon}
T3             {pasta, orange, cake}
T4             {pasta, lemon, orange, cake}

Each transaction has a name called its transaction identifier. For example, the transaction identifiers of the four transactions depicted above are T1, T2, T3 and T4, respectively.
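
To make the idea of a table described using binary attributes concrete, here is a minimal Python sketch that stores the example database and derives one 0/1 column per item. The names database, items and binary_table are chosen for illustration only and do not come from any particular library.

```python
# Transaction database from the example above, keyed by transaction identifier.
database = {
    "T1": {"pasta", "lemon", "bread", "orange"},
    "T2": {"pasta", "lemon"},
    "T3": {"pasta", "orange", "cake"},
    "T4": {"pasta", "lemon", "orange", "cake"},
}

# One binary attribute per distinct item, in alphabetical order.
items = sorted(set().union(*database.values()))

# binary_table[tid][i] is 1 if the transaction contains items[i], else 0.
binary_table = {
    tid: [1 if item in transaction else 0 for item in items]
    for tid, transaction in database.items()
}

print("  ", items)
for tid, row in binary_table.items():
    print(tid, row)
```

Each row of the resulting table corresponds to one transaction, and each column answers the yes/no question of whether a given item appears in it.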

The goal of frequent itemset mining is to find all the frequent itemsets. A frequent itemset is a set of values that appear together in many transactions. To find frequent itemsets, a user must provide a transaction database (such as the one presented above) and choose a value for a parameter called minsup, the minimum support threshold.

The output of frequent itemset mining is all sets of values (items) that appear at least minsup times in the database.

For example, for the above database and minsup = 2, the following frequent itemsets are found:

{lemon}, {pasta}, {orange}, {cake}, {lemon, pasta}, {lemon, orange}, {pasta, orange}, {pasta, cake}, {orange, cake}, {lemon, pasta, orange}, {pasta, orange, cake}

For example, the itemset {pasta, orange} is said to be a frequent itemset because it appears in at least two transactions. In fact, it appears in the transactions T1, T3 and T4.
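
The definition above can be checked directly with a small brute-force sketch in Python: count in how many transactions each candidate itemset appears (its support) and keep the candidates whose support is at least minsup. The function names support and frequent_itemsets_bruteforce are illustrative assumptions. Enumerating every subset of items is only practical for tiny examples such as this one.

```python
from itertools import combinations

# Transaction database from the example above.
database = {
    "T1": {"pasta", "lemon", "bread", "orange"},
    "T2": {"pasta", "lemon"},
    "T3": {"pasta", "orange", "cake"},
    "T4": {"pasta", "lemon", "orange", "cake"},
}
minsup = 2  # minimum support threshold, as an absolute number of transactions


def support(itemset, database):
    """Number of transactions that contain every item of the itemset."""
    return sum(1 for transaction in database.values() if itemset <= transaction)


def frequent_itemsets_bruteforce(database, minsup):
    """Test every non-empty combination of items and keep the frequent ones."""
    items = sorted(set().union(*database.values()))
    result = {}
    for size in range(1, len(items) + 1):
        for candidate in combinations(items, size):
            count = support(set(candidate), database)
            if count >= minsup:
                result[candidate] = count
    return result


frequent = frequent_itemsets_bruteforce(database, minsup)
for itemset, count in frequent.items():
    print(set(itemset), "support =", count)
```

For instance, support({"pasta", "orange"}, database) counts the transactions T1, T3 and T4, while {bread} is contained only in T1 and is therefore discarded for minsup = 2.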

Applications

There are many applications. A few examples are:

Algorithms

Numerous algorithms have been designed for frequent itemset mining, but most of them are inspired by one of the following classic algorithms: Apriori, Eclat, LCM and FPGrowth.
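
As a rough illustration of the level-wise idea behind Apriori, the Python sketch below builds candidates of size k+1 from the frequent itemsets of size k, and discards any candidate that has an infrequent subset before counting it. This is a simplified teaching version written for this page, not the SPMF implementation, and the function name apriori and its parameters are assumptions made for the example.

```python
from itertools import combinations


def apriori(transactions, minsup):
    """Level-wise search: frequent k-itemsets are joined into (k+1)-candidates."""
    transactions = [frozenset(t) for t in transactions]

    def count(candidates):
        # One pass over the database per level; keep only frequent candidates.
        counts = {c: 0 for c in candidates}
        for t in transactions:
            for c in candidates:
                if c <= t:
                    counts[c] += 1
        return {c: n for c, n in counts.items() if n >= minsup}

    # Level 1: itemsets containing a single item.
    level = count({frozenset([item]) for t in transactions for item in t})
    frequent = dict(level)
    k = 1
    while level:
        # Join step: unions of two frequent k-itemsets that contain k+1 items.
        candidates = {a | b for a in level for b in level if len(a | b) == k + 1}
        # Prune step: every k-subset of a candidate must itself be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in level for s in combinations(c, k))
        }
        level = count(candidates)
        frequent.update(level)
        k += 1
    return frequent


transactions = [
    {"pasta", "lemon", "bread", "orange"},
    {"pasta", "lemon"},
    {"pasta", "orange", "cake"},
    {"pasta", "lemon", "orange", "cake"},
]
result = apriori(transactions, minsup=2)
for itemset, support in result.items():
    print(set(itemset), "support =", support)
```

The prune step relies on the fact that every subset of a frequent itemset is also frequent, so a candidate such as {pasta, lemon, cake} can be skipped without scanning the database because {lemon, cake} is not frequent.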

Survey papers

Here are a few survey papers that give an overview of itemset mining.

Key papers

Tutorial videos

Software and datasets

To apply frequent itemset mining, the SPMF software provides efficient open-source implementations of many algorithms and variations. The SPMF software can be downloaded from the website: http://www.philippe-fournier-viger.com/spmf/

To install the software, you may follow the instructions on the download page of that website. Then, you may check the documentation page, which provides examples of how to run various algorithms such as Apriori, Eclat, LCM and FPGrowth for frequent itemset mining. You may also check the datasets page of that website, which provides several benchmark datasets for testing the algorithms and comparing their performance.