User Tools

Site Tools


sequential_pattern_mining

Sequential Pattern Mining

Description

Sequential pattern mining is a popular pattern mining task. It is used to find subsequences that appear frequently appear in a set of sequences. Sequential pattern mining is used to analyze data that is represented as sequences of symbols.

An Example

The input data for sequential pattern mining is a sequence database. It is a set of sequences. Each sequence is an ordered list of itemsets. And an itemset is a set of symbols. For example, consider the sequence database below, which represents shopping data. This database contains four sequences (rows) called S1, S2, S3 and S4.

Sequence items
S1(a), (a b c), (a c), (d), (c f)
S2(a d), ( c ) , (b c), (a e)
S3(e f), (a b), (d f), ( c ), (b)
S4(e), (g), (a f), ( c ), (b), ( c )

The first sequence indicates that a customer has purchased some item (a). Then the customer has bought some items a, b and c together. Then the customer, has bought items a and c together. Then, he bought item d, and finally the customer bought items c and f together.

The goal of sequential pattern mining is to find all the frequent sequential patterns. A frequent sequential pattern is a subsequence that appears in many input sequences. To find frequent sequential patterns, a user must have a sequence database (as the one presented above), and must also choose a value for a parameter called minsup , which means minimum support threshold.

The output of frequent sequential pattern mining is all subsequences that appear at least minsup sequences of the database.

For example, for the above database and minsup = 2, the three following subsequences are frequent sequential patterns:

(b c) (a) because it appears in at least two sequences
(f) (b) because it appears in at least two sequences
(f) (b) ( c ) because it appears in at least two sequences

There are also other frequent sequential patterns, but here only a few are listed for brevity.

Applications

There are many applications. A few examples are:

  • Find the frequent sequences of purchases made by customers in a store (each sequence is a list of purchased items by a customer).
  • Find the frequent sequence of locations visited by tourists in a city (each sequence is a list of locations visited by a tourist).
  • Find the frequent sequences of words that appear in a book (each sentence is a sequence of words)

Algorithms

Numerous algorithms have been designed for sequential pattern mining. But most of them are inspired by one of the following classic algorithms: SPAM, SPADE, PrefixSpan, GSP, CM-SPAM and CM-SPADE.

  • GSP: R. Agrawal, and R. Srikant, Mining sequential patterns, ICDE 1995, pp. 3–14, 1995.
  • SPAM: Ayres, J. Flannick, J. Gehrke, and T. Yiu, Sequential pattern mining using a bitmap representation, KDD 2002, pp. 429–435, 2002.
  • SPADE: M. J. Zaki, SPADE: An efficient algorithm for mining frequent sequences, Machine learning, vol. 42(1-2), pp. 31–60, 2001.
  • PrefixSpan: J. Pei, et al. Mining sequential patterns by pattern-growth: The prefixspan approach, IEEE Transactions on knowledge and data engineering, vol. 16(11), pp. 1424–1440, 2004.
  • CM-SPAM and CM-SPADE: P. Fournier-Viger, A. Gomariz, M. Campos, and R. Thomas, Fast Vertical Mining of Sequential Patterns Using Co-occurrence Information, PAKDD 2014, pp. 40–52, 2014.

Survey papers

Here are a few survey papers that gives an overview of itemset mining.

  • Fournier-Viger, P., Lin, J. C.-W., Kiran, R. U., Koh, Y. S., Thomas, R. (2017). A Survey of Sequential Pattern Mining. Data Science and Pattern Recognition (DSPR), vol. 1(1), pp. 54-77.
  • Gan, W., Lin, J. C. W., Fournier-Viger, P., Chao, H.-C., Yu, P. S. (2019) A Survey of Parallel Sequential Pattern Mining. ACM Transactions on Knowledge Discovery from Data (TKDD), 13(3): 25:1-25:34.

Tutorial videos

Software and datasets

To apply frequent itemset mining, the SPMF software provides open-source efficient implementations of many algorithms and variations. The SPMF software can be downloaded from the website: http://www.philippe-fournier-viger.com/spmf/ .

To install the software, you may follow the instructions on the download page of that website. Then, you may check the documentation page which provides examples of how to run various algorithms such as GSP, PrefixSpan, CM-SPAM, CM-SPADE and others for sequential pattern mining. Besides, you may check the datasets page of that website provides several benchmark datasets for testing the algorithms and comparing their performance.

sequential_pattern_mining.txt · Last modified: 2022/02/11 07:33 by philfv