The Data Mining Forum
This forum is about data mining, data science and big data: algorithms, source code, datasets, implementations, optimizations, etc. You are welcome to post calls for papers, data mining job ads, links to source code of data mining algorithms or anything else related to data mining. The forum is hosted by P. Fournier-Viger. No registration is required to use this forum!
Find sequences and periods in events
Date: April 26, 2017 01:16AM
Hello forum members,
I discovered the SPMF library and it looks great! Thanks for all the effort.
There are a lot of available algorithms, and my knowledge is very limited.
I hope to find some directions and help.
The situation is as follows:
The data is a series of sensor events originating from a Smart Home system.
The structure is very simple: (timestamp, source_device, state).
State is not important in this context, but mentioned for completeness.
"2017-04-20 13:35" "Light_bath" on
"2017-04-20 13:40" "Light_bath" off
"2017-04-20 13:41" "Motion_corridor" on
"2017-04-20 13:42" "Light_2" on
The data is in a SQL database, and I am interested in finding:
1. Sequences of sensor events, e.g., Motion_corridor is always followed by Light_2.
2. Periods of these sequences, e.g., (Motion_corridor, Light_2) every 2 hours or once a day.
My problem is that many of the examples in the algorithm descriptions use a different (transaction) format, and I am not sure how to apply them to this situation.
I would appreciate any help or directions.
Re: Find sequences and periods in events
Date: April 29, 2017 06:32AM
Glad you like the software. Yes, to apply the algorithms, it would be necessary to do some pre-processing to convert your data to an appropriate format that can be used by the algorithms.
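As a rough illustration of such pre-processing, here is a minimal Python sketch. It assumes (these are illustrative choices, not requirements of SPMF) that the rows have already been exported from SQL as (timestamp, device, state) tuples, that each calendar day becomes one sequence, that the state column is dropped, and that events sharing a timestamp form one itemset. The output uses SPMF's sequence database text format, where items are positive integers, -1 ends an itemset and -2 ends a sequence. The function name `to_spmf_sequences` is hypothetical.

```python
from collections import defaultdict
from datetime import datetime


def to_spmf_sequences(events):
    """Turn (timestamp, device, state) rows into SPMF sequence lines.

    Hypothetical choices for this sketch: one sequence per calendar day, the
    state column is ignored, and events with the same timestamp share an itemset.
    Returns the lines and the device -> integer mapping needed to decode results.
    """
    item_ids = {}  # device name -> positive integer, in order of first appearance
    days = defaultdict(lambda: defaultdict(set))  # day -> timestamp -> item ids
    # Sorting the raw strings works because "YYYY-MM-DD HH:MM" is zero-padded.
    for ts, device, _state in sorted(events, key=lambda row: row[0]):
        moment = datetime.strptime(ts, "%Y-%m-%d %H:%M")
        item = item_ids.setdefault(device, len(item_ids) + 1)
        days[moment.date()][moment].add(item)

    lines = []
    for day in sorted(days):
        itemsets = []
        for moment in sorted(days[day]):
            # Items of one itemset in ascending order, followed by -1.
            items = " ".join(str(i) for i in sorted(days[day][moment]))
            itemsets.append(items + " -1")
        lines.append(" ".join(itemsets) + " -2")  # -2 closes the sequence
    return lines, item_ids


sample_events = [
    ("2017-04-20 13:35", "Light_bath", "on"),
    ("2017-04-20 13:40", "Light_bath", "off"),
    ("2017-04-20 13:41", "Motion_corridor", "on"),
    ("2017-04-20 13:42", "Light_2", "on"),
]
lines, item_ids = to_spmf_sequences(sample_events)
print("\n".join(lines))  # 1 -1 1 -1 2 -1 3 -1 -2
print(item_ids)
```

The lines could then be written to a text file, one sequence per line, and the returned mapping kept to translate the integer items in the results back to device names.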
If you have several sequences, then you could see this as a problem of sequential pattern mining or sequential rule mining, in my opinion.
With sequential pattern mining algorithms such as CM-SPAM, you could find sequences of events that are common to several of your sequences. For example, you might find something like:
(Motion_corridor, Light_1), (Motion_bedroom), (Motion_kitchen), ...
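To make the notion of a sequential pattern concrete, the following Python sketch counts in how many sequences a given candidate pattern appears (its support). It is not CM-SPAM: CM-SPAM efficiently enumerates all patterns above a minimum support, whereas this only checks one candidate by brute force. The function names and the example days are made up for illustration.

```python
def contains(sequence, pattern):
    """True if `pattern` (a list of itemsets) is a subsequence of `sequence`.

    Each pattern itemset must be a subset of a sequence itemset that comes
    strictly after the one matched for the previous pattern itemset; matching
    each at the earliest possible position is enough to decide containment.
    """
    pos = 0
    for itemset in pattern:
        while pos < len(sequence) and not set(itemset) <= set(sequence[pos]):
            pos += 1
        if pos == len(sequence):
            return False
        pos += 1  # the next pattern itemset must match a later itemset
    return True


def support(database, pattern):
    """Number of sequences of `database` that contain `pattern`."""
    return sum(1 for sequence in database if contains(sequence, pattern))


days = [
    [{"Motion_corridor", "Light_1"}, {"Motion_bedroom"}, {"Motion_kitchen"}],
    [{"Motion_corridor", "Light_1"}, {"Motion_kitchen"}, {"Motion_bedroom"}, {"Motion_kitchen"}],
    [{"Motion_bedroom"}, {"Motion_corridor"}],
]
pattern = [{"Motion_corridor", "Light_1"}, {"Motion_bedroom"}, {"Motion_kitchen"}]
print(support(days, pattern))  # 2
```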
Then, if you use sequential rule mining algorithms such as RuleGrowth, you could find patterns such as:
(Motion_bedroom, Light_1) --> (Motion_kitchen) with a probability (confidence) of 60%. This is interesting, as it indicates the probability that some events will be followed by another event.
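A simplified sketch of what such a 60% could mean, assuming that a sequence supports a rule X --> Y when all items of X occur before all items of Y (the order inside X and inside Y is ignored), and that the confidence is the fraction of sequences containing all of X in which the rule holds. These semantics are an assumption for illustration; RuleGrowth discovers all such rules efficiently rather than scoring one rule at a time, and the data below is invented.

```python
def rule_holds(sequence, antecedent, consequent):
    """True if every item of `antecedent` occurs before every item of `consequent`."""
    seen = set()
    for i, itemset in enumerate(sequence):
        seen |= set(itemset)
        if antecedent <= seen:
            # The earliest point where X is complete leaves the most room for Y.
            remaining = set().union(*sequence[i + 1:])
            return consequent <= remaining
    return False


def confidence(database, antecedent, consequent):
    """Fraction of sequences containing all of X in which X --> Y holds."""
    antecedent, consequent = set(antecedent), set(consequent)
    with_x = [seq for seq in database if antecedent <= set().union(*seq)]
    if not with_x:
        return 0.0
    holds = sum(1 for seq in with_x if rule_holds(seq, antecedent, consequent))
    return holds / len(with_x)


rule_days = [
    [{"Motion_bedroom"}, {"Light_1"}, {"Motion_kitchen"}],
    [{"Motion_bedroom", "Light_1"}, {"Motion_kitchen"}],
    [{"Motion_kitchen"}, {"Motion_bedroom"}, {"Light_1"}],
    [{"Light_1"}, {"Motion_bedroom"}, {"Motion_kitchen"}, {"Motion_corridor"}],
    [{"Motion_bedroom"}, {"Light_1"}, {"Motion_corridor"}],
]
print(confidence(rule_days, {"Motion_bedroom", "Light_1"}, {"Motion_kitchen"}))  # 0.6
```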
Lastly, I think you could also look at periodic pattern mining, but these algorithms are applied to a single sequence only. So, for a long sequence of events, you could find patterns that repeat approximately every X events.
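To give a feel for the idea, the sketch below assumes the log has already been turned into one long list of transactions (for example, events of the same minute grouped together). It measures the gaps, in number of transactions, between consecutive occurrences of one itemset and treats the itemset as roughly periodic if no gap exceeds a chosen bound. The function names and thresholds (`max_gap`, `min_occurrences`) are hypothetical; the periodic pattern mining algorithms in SPMF define their own measures and search all itemsets instead of checking a single one.

```python
def occurrence_positions(transactions, itemset):
    """Indices (0-based) of the transactions that contain every item of `itemset`."""
    wanted = set(itemset)
    return [i for i, transaction in enumerate(transactions) if wanted <= set(transaction)]


def gaps_between(positions):
    """Distances, in number of transactions, between consecutive occurrences."""
    return [b - a for a, b in zip(positions, positions[1:])]


def is_roughly_periodic(transactions, itemset, max_gap, min_occurrences=2):
    """True if `itemset` occurs at least `min_occurrences` times (and at least
    twice) and never goes more than `max_gap` transactions without reappearing."""
    positions = occurrence_positions(transactions, itemset)
    if len(positions) < max(min_occurrences, 2):
        return False
    return max(gaps_between(positions)) <= max_gap


log = [
    {"Motion_corridor", "Light_2"},
    {"Light_bath"},
    {"Motion_bedroom"},
    {"Motion_corridor", "Light_2"},
    {"Light_bath"},
    {"Motion_kitchen"},
    {"Motion_corridor", "Light_2"},
    {"Light_bath"},
    {"Motion_corridor", "Light_2"},
]
pair = {"Motion_corridor", "Light_2"}
print(gaps_between(occurrence_positions(log, pair)))  # [3, 3, 2]
print(is_roughly_periodic(log, pair, max_gap=3))      # True
```

For periods in time rather than in events (e.g., "every 2 hours"), the same gaps could be computed on the timestamps of the matching transactions instead of their indices.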
In the documentation, there is an example for each algorithm. My recommendation would be to run some tests with the examples to see what kind of results you can get; this should help you choose an algorithm.
Re: Find sequences and periods in events
Date: May 14, 2017 11:29PM
To add to what Philippe Fournier-Viger already mentioned:
In my own PhD project, I work with many datasets from smart home environments and other types of logged human-behavior events. From my experience with such datasets, a couple of things are particularly important.
First, when each sequence spans a long period of time (e.g., a day), it is necessary to put time constraints on the duration of a pattern. Otherwise, many patterns are mined that span the whole day (i.e., they start with something happening at 9:00 and end with something happening at 23:58). Such patterns are undesirable because, with such long time gaps, it is unlikely that the 9:00 event and the 23:58 event still have anything to do with each other.
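One way to express such a constraint is to accept a pattern occurrence only if its first and last events lie within a maximum time span. The Python sketch below checks this for a single timestamped sequence; the function `occurs_within` and the 30-minute window are illustrative assumptions, not a specific algorithm's constraint or an SPMF option.

```python
from datetime import datetime, timedelta


def occurs_within(events, pattern, max_span):
    """True if the items of `pattern` occur in order within `max_span`.

    `events` is a time-ordered list of (datetime, item) pairs and `pattern` a
    non-empty list of items. For each event matching the first item, later items
    are matched as early as possible, which yields the shortest occurrence
    starting there; the scan stops once the time window is exceeded.
    """
    for start, (t0, first) in enumerate(events):
        if first != pattern[0]:
            continue
        k = 1  # number of pattern items matched so far
        for t, item in events[start + 1:]:
            if k == len(pattern) or t - t0 > max_span:
                break
            if item == pattern[k]:
                k += 1
        if k == len(pattern):
            return True
    return False


day = [
    (datetime(2017, 4, 20, 9, 0), "Motion_bedroom"),
    (datetime(2017, 4, 20, 9, 5), "Motion_kitchen"),
    (datetime(2017, 4, 20, 23, 58), "Motion_corridor"),
]
window = timedelta(minutes=30)
print(occurs_within(day, ["Motion_bedroom", "Motion_kitchen"], window))   # True
print(occurs_within(day, ["Motion_bedroom", "Motion_corridor"], window))  # False
```

Without the window (or with one as wide as a day), the 9:00 to 23:58 pattern would be accepted, which is exactly the kind of result described above.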
Secondly, I noticed that human behavior contains many routines that are not strictly sequential. A likely morning routine, for example, would be to first wake up ("bedroom motion" in your log), then have breakfast (kitchen motion) and take a shower (bathroom motion) in any order, and finally leave the house for work (hallway motion). In sequential pattern mining, one would need two separate patterns for the two possible orderings of breakfast and shower. However, lowering the support threshold enough to mine both of these patterns might result in a plethora of patterns being found. Recently, I have developed a technique to mine more expressive patterns, which can contain arbitrarily ordered blocks, loops, and choices alongside sequential parts.
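The sketch below only illustrates the arbitrarily ordered block idea, not the ProM technique itself (which also handles loops and choices). Assuming a pattern is written as a list of blocks, where items inside a block may occur in any order but a whole block must come after the previous one, a single block pattern covers both breakfast/shower orderings, whereas each strictly sequential pattern matches only one of them. The function name and the example days are hypothetical.

```python
def matches_blocks(sequence, blocks):
    """True if `sequence` (a list of items) contains the ordered `blocks`.

    `blocks` is a list of sets. Items of one block may appear in any order, but
    all of them must come after every item matched for the previous block.
    Taking the earliest occurrence of each item keeps each block's end as early
    as possible, so this greedy scan does not miss valid matches.
    """
    pos = 0  # first index the current block is allowed to use
    for block in blocks:
        block_end = pos - 1
        for item in block:
            try:
                idx = sequence.index(item, pos)
            except ValueError:
                return False
            block_end = max(block_end, idx)
        pos = block_end + 1
    return True


days = [
    ["Motion_bedroom", "Motion_kitchen", "Motion_bathroom", "Motion_hallway"],
    ["Motion_bedroom", "Motion_bathroom", "Motion_kitchen", "Motion_hallway"],
    ["Motion_bedroom", "Motion_kitchen", "Motion_hallway", "Motion_bathroom"],
]
routine = [{"Motion_bedroom"}, {"Motion_kitchen", "Motion_bathroom"}, {"Motion_hallway"}]
breakfast_first = [{"Motion_bedroom"}, {"Motion_kitchen"}, {"Motion_bathroom"}, {"Motion_hallway"}]
shower_first = [{"Motion_bedroom"}, {"Motion_bathroom"}, {"Motion_kitchen"}, {"Motion_hallway"}]

print(sum(matches_blocks(day, routine) for day in days))          # 2
print(sum(matches_blocks(day, breakfast_first) for day in days))  # 1
print(sum(matches_blocks(day, shower_first) for day in days))     # 1
```

Here the block pattern reaches a support of 2, while each of the two sequential orderings only reaches 1, which is the support-splitting effect described above.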
The technique is currently implemented in the process mining tool ProM, and I hope that the algorithm will soon also be included in SPMF.