User Tools

Site Tools


episode_mining

Episode Mining

Introduction

The goal of episode mining is to find some patterns that are appearing in a sequence of events or symbols. The traditional problem called frequent episode mining consists of finding subsequences that appear many times in a sequence of events.

An example

Applications

Key papers

Key papers about frequent episode mining:

  • Mannila et al. (1995) Discovering frequent episodes in sequences.
    Introduced the problem of frequent episode mining and the two first algorithms, called WINEPI and MINEPI. While WINEPI calculates the frequency of episodes using a window, MINEPI uses the concept of minimal occurrences.
  • Huang et a. (2008) Efficient mining of frequent episodes from complex sequences
    This paper observed that window-based algorithms for episode mining such as MINEPI and WINEPI sometimes produce some strange results. o address this issue, a novel measure is proposed to count the support (number of occurrences) of an episode called the head support or head frequency. Two algorithms are designed to find frequent episodes using the head frequency: EMMA (a depth-first search algorithm) and MINEPI+ (a modified version of MINEPI)
  • Fournier-Viger et al. (2019) TKE: Mining Top-K Frequent Episodes
    This paper makes the observation that it is difficult for users to set the minsup parameter for frequent episode mining. As a result users often have to spend considerable time fine tuning the parameters and may find too many or too few episodes. As a solution this paper redefine the task of episode mining as top-k frequent episode mining, where the user can directly choose the number of patterns to be discovered (which is more intuitive). The TKE algorithm is defined to efficiently find the top-k most frequent episodes.
  • Zhou et al. (2010) Mining closed episodes from event sequences efficiently
    This paper designed an algorithm to find a summary of all frequent episodes, called the closed episodes. The goal is to reduce the number of episodes presented to the user. An algorithm called Clo_episode is proposed but only serial episodes are considered.
  • Tati and Cule (2011) Mining closed episodes with simultaneous events
    This is an improvement of the paper of Zhou et al. (201), where closed episodes are found for the general case where events may bemay be simultaneous.
  • Laxman et al.(2007) Discovering frequent episodes and learning Hidden Markov Models: A formal connection
    Laxman proposes a novel method to count the occurrences of frequent episodes called the non overlapping frequency. The idea is to only count the non overlapping occurrences of episodes.
  • Laxman et al. (2007) A Fast Algorithm For Finding Frequent Episodes In Event Streams
    Laxman presents algorithms to find episodes in a data stream.
  • Ao et al. (2018) Large-scale Frequent Episode Mining from Complex Event Sequences with Hierarchies LA-FEMH
    This paper presents a big data algorithm for mining episodes, named LA-FEMH.

Key papers about episode rule mining:

  • Mannila et al. (1995) Discovering frequent episodes in sequences.
    Besides, introducing the problem of frequent episode mining, Mannila et al. also proposed the concept of episode rules, which are generated by a simple post-processing algorithm, applied after finding the frequent episodes
  • Oualid et al. (2021) Mining Episode Rules from Event Sequences Under Non-overlapping Frequency
    The authors designed an algorithm called NONEPI for mining episode rules using the concept of non overlapping frequency.
  • Fournier-Viger et al (2021) Mining Partially-Ordered Episode Rules in an Event Sequence
    This paper defines are more general type of episode rules called partially-ordered episode rules. These rules loosen the ordering constraint between events. As a result, a partially-ordered episode rule can summarize multiple traditional episode rules. An algorithm called POERM was designed.
  • Fahed et al. (2018) DEER: Distant and Essential Episode Rules for early prediction DEER
    This paper presented an algorithm named DEER to find episode rules that can predict distant events rather than events that appear closely after other events. The paper does not handle simultaneous events.

Key papers about high utility episode mining:

  • Wu et al. (2013) Mining high utility episodes in complex event sequences
    Wu et al. generalized the problem of frequent episode mining to include quantities and weights. The resulting problem is called high utility episode mining. For instance, it can be used to find sequences of purchases made by customers that yield a high profit. To find the pattern, the US-SPAN algorithm was proposed.
  • Fournier-Viger et al. (2019) HUE-Span: Fast High Utility Episode Mining
    This paper makes the important observation that the previous algorithm for high utility episode mining US-SPAN can underestimate the utilitiy of episodes by not taking into account all timestamps of minimal occurrences for utility calculations. This problem is addressed by redefining the way utility is calculated and a new algorithm called HUE-SPAN is proposed.
  • S. Rathore, S. Dawar, V. Goyal, and D. Patel (2016) Top-k high utility episode mining from a complex event sequence
    This paper proposed the problem of top-k high utility episode mining.

Tutorial videos

Software and datasets

To apply episode mining, the SPMF software provides open-source efficient implementations of many algorithms and variations such as TKE, EMMA, MINEPI+, MINEPI, POERM etc. These algorithms can be used to find periodic patterns in a single sequence or multiple sequences. The SPMF software can be downloaded from the website: http://www.philippe-fournier-viger.com/spmf/ .

To install the software, you may follow the instructions on the download page of that website. Then, you may check the documentation page which provides examples of how to run various algorithms such as PFPM and MPFPS for periodic pattern mining. Besides, you may check the datasetspage of that website provides several benchmark datasets for testing the algorithms and comparing their performance.

episode_mining.txt · Last modified: 2021/08/25 23:31 by philfv