====== Sequential Pattern Mining ======

===== Description =====

**Sequential pattern mining** is a popular [[:pattern_mining|pattern mining]] task. It is used to find subsequences that appear frequently appear in a set of sequences. **Sequential pattern mining** is used to analyze data that is represented as sequences of symbols.
===== An Example =====

The input data for sequential pattern mining is a **sequence database. **It is a set of sequences. Each **sequence **is an ordered list of** itemsets. **And an **itemset** is a set of symbols. For example, consider the sequence database below, which represents shopping data. This database contains four sequences (rows) called S1, S2, S3 and S4.

|**Sequence** |**items ** |
|S1|(a), (a b c), (a c), (d), (c f)|
|S2|(a d), ( c ) , (b c), (a e)|
|S3|(e f), (a b), (d f), ( c ), (b)|
|S4|(e), (g), (a f), ( c ), (b), ( c )|

The first sequence indicates that a customer has purchased some item (a). Then the customer has bought some items a, b and c together. Then the customer, has bought items a and c together. Then, he bought item d, and finally the customer bought items c and f together.

The goal of sequential pattern mining is to find all the **frequent sequential patterns. A frequent sequential pattern **is a subsequence that appears in many input sequences. To find frequent sequential patterns, a user must have a **sequence database** (as the one presented above), and must also choose a value for a parameter called** //minsup//**  , which means **minimum support threshold.**

The **output of frequent sequential pattern mining **is all subsequences that appear at least //minsup// sequences of the database.

For example, for the above database and //minsup// = 2, the three following subsequences are frequent sequential patterns:

(b c) (a) because it appears in at least two sequences\\
(f) (b)  because it appears in at least two sequences\\
(f) (b) ( c ) because it appears in at least two sequences

There are also other frequent sequential patterns, but here only a few are listed for brevity.

===== Applications =====

There are many applications. A few examples are:

  * Find the frequent sequences of purchases made by customers in a store (each sequence is a list of purchased items by a customer).
  * Find the frequent sequence of locations visited  by tourists in a city (each sequence is a list of locations visited by a tourist).
  * Find the frequent sequences of words that appear in a book (each sentence is a sequence of words)

===== Algorithms =====

Numerous algorithms have been designed for sequential pattern mining. But most of them are inspired by one of the following classic algorithms: **SPAM**, **SPADE**, **PrefixSpan**, **GSP**, **CM-SPAM **and **CM-SPADE**.

   * **GSP**: R. Agrawal, and R. Srikant, Mining sequential patterns, ICDE 1995, pp. 3–14, 1995.
  * **SPAM**: Ayres, J. Flannick, J. Gehrke, and T. Yiu, Sequential pattern mining using a bitmap representation, KDD 2002, pp. 429–435, 2002.
  * **SPADE**: M. J. Zaki, SPADE: An efficient algorithm for mining frequent sequences, Machine learning, vol. 42(1-2), pp. 31–60, 2001.
  * **PrefixSpan:**  J. Pei, et al. Mining sequential patterns by pattern-growth: The prefixspan approach, IEEE Transactions on knowledge and data engineering, vol. 16(11), pp. 1424–1440, 2004.
  * **CM-SPAM and CM-SPADE**: P. Fournier-Viger, A. Gomariz, M. Campos, and R. Thomas, Fast Vertical Mining of Sequential Patterns Using Co-occurrence Information, PAKDD 2014, pp. 40–52, 2014. 
===== Survey papers =====

Here are a few survey papers that gives an overview of itemset mining.

  * Fournier-Viger, P., Lin, J. C.-W., Kiran, R. U., Koh, Y. S., Thomas, R. (2017). [[http://www.philippe-fournier-viger.com/dspr-paper5.pdf|A Survey of Sequential Pattern Mining]]. Data Science and Pattern Recognition (DSPR), vol. 1(1), pp. 54-77.
  * Gan, W., Lin, J. C. W., Fournier-Viger, P., Chao, H.-C., Yu, P. S. (2019) **A Survey of Parallel Sequential Pattern Mining**. ACM Transactions on Knowledge Discovery from Data (TKDD), 13(3): 25:1-25:34.

===== Tutorial videos =====

  * …

===== Software and datasets =====

To apply** frequent itemset mining**, the SPMF software provides open-source efficient implementations of many algorithms and variations. The **SPMF software**  can be downloaded from the website: [[http://www.philippe-fournier-viger.com/spmf/ |http://www.philippe-fournier-viger.com/spmf/ ]].

To install the software, you may follow the instructions on the **download page **of that website. Then, you may check the **documentation page **which provides examples of how to run various algorithms such as GSP, PrefixSpan, CM-SPAM, CM-SPADE and others for sequential pattern mining. Besides, you may check the **datasets ****page **of that website provides several benchmark datasets for testing the algorithms and comparing their performance.