7 days ago

webmasterphilfv

Great. If you want to contribute some code back to SPMF after you add some other measures, that would be great, provided the code is well implemented!
Best regards,

7 days ago

webmasterphilfv

Hi, yes, in theory, all the sequence prediction models provided in SPMF are updated incrementally by inserting sequences one after the other, and you can ask for a prediction for a new sequence at any time.
So I think that it meets your requirement.
Best regards,

9 days ago

webmasterphilfv

Hi Munira,
Thanks. I am happy that the library is useful. The goal of this library is to help other researchers by sharing implementations ;-)
Most algorithms for sequential rule mining in the library use the support and confidence measures to evaluate the interestingness of rules.
However, the CMDeo algorithm will indicate not only the support and confidence but also the lift of rules in

9 days ago

webmasterphilfv

Dear Fran,
I see. This feature would be possible, but it may not be so easy to add. A tricky point is that a pattern may appear multiple times in the same sequence. For example, the pattern (A,(A) appears at least 5 times in the sequence (A,B,C)(A,,(A,B,D)(A,C). Thus, if I understand correctly, you would like the program to indicate each occurrence of the pattern in that sequence?

12 days ago

webmasterphilfv

Hi,
Thanks for your message. It took me a little while to answer because I have been quite busy this week. Also, I have modified the code a little bit to try to provide a feature that is close to what you asked for, but it is maybe not exactly what you want.
Let me explain. There is a hidden feature in SPMF where you can specify the names of the items. To do that, if you download the new version

12 days ago

webmasterphilfv

They are offered in the SPMF library.

12 days ago

webmasterphilfv

This is not a bug. It is because the algorithm needs too much memory and runs out of free memory on your computer. Actually, sequential pattern mining is a hard problem. If you want to make the algorithm faster and decrease its memory usage:
1) You can increase the minsup threshold
2) You can use some additional constraints. For example, if you use algorithms such as CM-SPAM, you can

14 days ago

webmasterphilfv

Yes, there are many big data algorithms for association rule mining.

14 days ago

webmasterphilfv

Hi Maya,
For GSpan, there is a good description in the data mining book by Mohammed Zaki.
For the other algorithms, I don't know. I think you can probably find some PPT (PowerPoint) files if you search on Google with "filetype:ppt" followed by the name of the algorithm.
I agree that it would be useful to have more examples of these algorithms.
Best regards,

19 days ago

webmasterphilfv

Hello all,
Recently, I found that some researchers from India at the CVR college have plagiarized some of my papers on sequential pattern mining.
The researcher is S Venkata Suryanaryana, who is a professor at the CVR college. Another author of the paper is Kalli S N Prasad of the GVIT college of engineering.
If you are interested in that story, you can click on the above link to know

30 days ago

webmasterphilfv

You are welcome. I am glad that the library is useful. The reason why I have started this library is to share code to help other researchers so that everyone can avoid always programming the same algorithms over and over again. :-)
Best regards,
Philippe

30 days ago

webmasterphilfv

Hi,
There are many different ways of splitting a time series to obtain multiple sequences.
1) A simple way is to split the time series into some segments having the same lengths. For example, each sequence could be a day, a week or a month of data.
In SPMF, there is a tool to split a time series into several segments, and a tool to convert a time series to a sequence. However, these tools
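As an illustration of the fixed-length splitting described in point 1), here is a minimal sketch in Java. It is not the SPMF tool: the class name `TimeSeriesSplitter` and the method `splitIntoSegments` are hypothetical names chosen for this example, and the sketch assumes the time series is already loaded as an array of numbers.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper (not part of SPMF): cuts a time series into
// consecutive segments of a fixed length.
final class TimeSeriesSplitter {

    // Returns the segments in order. If the series length is not a multiple
    // of segmentLength, the last segment is shorter than the others.
    static List<double[]> splitIntoSegments(double[] series, int segmentLength) {
        if (segmentLength <= 0) {
            throw new IllegalArgumentException("segmentLength must be positive");
        }
        List<double[]> segments = new ArrayList<>();
        for (int start = 0; start < series.length; start += segmentLength) {
            int end = Math.min(start + segmentLength, series.length);
            segments.add(Arrays.copyOfRange(series, start, end));
        }
        return segments;
    }

    public static void main(String[] args) {
        // Example: 7 data points split into segments of 3 points each.
        double[] series = {1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0};
        for (double[] segment : splitIntoSegments(series, 3)) {
            System.out.println(Arrays.toString(segment));
        }
    }
}
```

With daily data, a segment length of 7 would give one sequence per week. Whether a shorter last segment should be kept or dropped depends on the application.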

4 weeks ago

webmasterphilfv

Hi,
If you want to do a bit of programming, you could modify the code for reading the input file so that it can read your input format instead of the default format. This should not be difficult if you know a little bit about Java programming. You need to find the class that contains the algorithm that you are interested in and then find the code for reading the file and then modify it.
Oth

4 weeks ago

webmasterphilfv

Hi,
I am not sure about the details of pSpade, but in SPMF you can run Spade using several threads. If you are using the source code, you can run the example:
MainTestSPADE_AGP_Parallelized_FatBitMap_saveToFile
It is supposed to run SPADE using multiple threads. However, it was not developed by me but by A. Gomariz. Thus, if you have questions about the implementation, I will not be able

5 weeks ago

webmasterphilfv

Hi all,
I have quickly made a list of data mining conferences with upcoming calls for papers in 2018. This list is not exhaustive. You can add more conferences by posting below.
Deadline in February 2018
MLDM 2018 (International Conference on Machine Learning and Data Mining)
Location: Newark, New Jersey, USA
Date: July 14-19, 2018
Website: http://www.mldm.de/mldm2018.php
Notif

5 weeks ago

webmasterphilfv

I have recently found that two researchers named Divvela SRINIVASA Rao and Kilaru Gowthami from the Lakireddy Balireddy College of Engineering (LBRCE) have plagiarized my paper and the paper of someone else. I wrote a blog post about this topic at the link below.
http://data-mining.philippe-fournier-viger.com/plagiarism-divvela-srinivasa-rao-lakireddy-balireddy-college-engineering-lbrce/

5 weeks ago

webmasterphilfv

Hi,
On the download page of the SPMF website, there are instructions about how to install the source code. Then, in the documentation on the website, there is an example for each algorithm. The example for EIHI is Example 52:
http://www.philippe-fournier-viger.com/spmf/EIHI.php
To run the example:
If you are using the source code version of SPMF, launch the file "MainTestEIHI.jav

6 weeks ago

webmasterphilfv

Hello,
Here is the paper in PDF:
http://www.philippe-fournier-viger.com/2016_PHM_Periodic_High_Utility_itemsets.pdf
Here is the powerpoint:
http://www.philippe-fournier-viger.com/PHM_Periodic_High_utility_itemsets.pdf
And you can get the source code in my SPMF library:
http://www.philippe-fournier-viger.com/spmf/
After you download the SPMF library, you can also read the examp

7 weeks ago

webmasterphilfv

Yes, this field is a little old. But there are a lot of open problems that have not been solved yet. You can send me an e-mail and we can discuss that (philfv8 AT yahoo DOT com).
For universities, I think you should rather look for professors working in that field and then contact them to ask if they have positions for master's students, PhD students, or others.

7 weeks ago

webmasterphilfv

There are many techniques suitable for big data. For example, there is a conference every year called IEEE Big Data with hundreds of papers about applying data mining to big data. Besides, there are other conferences and journals with many papers about data mining in big data. Making a list of all the models that can be applied to big data would be too long. I think in general many data mining techn

7 weeks ago

webmasterphilfv

Yes, there are many papers that use a sliding window in pattern mining.
For example, one of them is my TRuleGrowth paper about sequential rule mining with a sliding window: http://www.philippe-fournier-viger.com/spmf/TKDE2015_sequential_rules.pdf
And you can get the source code in SPMF.
But there are many other papers. For example, many papers about mining patterns in a stream will use

8 weeks ago

webmasterphilfv

You can get the FPGrowth source code in Java in the SPMF library:
http://www.philippe-fournier-viger.com/spmf/

2 months ago

webmasterphilfv

Hello,
Thanks for using SPMF.
The traditional algorithm for generating association rules from itemsets is implemented in SPMF. It is called AlgoAgrawalFaster94 in the source code, and is based on the paper of Agrawal for the Apriori algorithm. However, this code is designed to be applied to frequent itemsets instead of high utility itemsets.
One could think that it is easy to apply the

2 months ago

webmasterphilfv

Hello, thanks for the feedback. Maybe you can send your input file to my e-mail: philfv8 AT yahoo DOT com and I will investigate the problem. Also, are you using the graphical interface or the source code of SPMF?
Best regards,
Philippe

2 months ago

webmasterphilfv

The input format is explained in the documentation on the website and there is an example input file for each algorithm.
If you think that the result is wrong, you may send me the file to my e-mail : philfv8 AT yahoo DOT COM and let me know the parameters that you use and why you think that the result is wrong.

3 months ago

webmasterphilfv

Sorry for the delay in answering your questions. I am currently travelling to attend an international conference and have been busy this week. My answers are below:
>I noticed that there is lift in SPMF's CMDeo algorithm but not ERMiner. Is there any reason for this?
Yes, the reason is that calculating the lift requires additional information that is not required for calculating the confide

3 months ago

webmasterphilfv

FPGrowth is much faster than Apriori. You should use FPGrowth.
You can find the implementation in SPMF.
Best regards,

4 months ago

webmasterphilfv

> Re: A database has four transactions with min support=60% and min confidence=80%.if it is given in percentage,then what will be the min support count?
support count = 60% * 4 transactions = 2.4, which is rounded up to 3 transactions
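The same calculation can be sketched in Java. This is only an illustration: the class name `MinSupportCount` and its method are hypothetical and not taken from SPMF, and the sketch assumes that the threshold is given as a whole-number percentage. Integer arithmetic is used to round up, so that floating-point error cannot push an exact result up to the next integer.

```java
// Hypothetical helper (not part of SPMF): converts a minimum support given
// as a percentage into a minimum number of transactions.
final class MinSupportCount {

    // Rounds up: an itemset must appear in at least this many transactions
    // for its relative support to reach minSupPercent.
    static int minSupportCount(int minSupPercent, int transactionCount) {
        return (minSupPercent * transactionCount + 99) / 100;
    }

    public static void main(String[] args) {
        // 60% of 4 transactions = 2.4, rounded up to 3.
        System.out.println(minSupportCount(60, 4)); // prints 3
    }
}
```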

4 months ago

webmasterphilfv

Hello,
In general, clustering is an unsupervised type of data mining technique. This means that you don't need training and testing data.
You can just apply some clustering algorithms on some data to find clusters directly. Then how to evaluate these clusters? There are several ways:
1) you could ask some experts to look at your clusters visually to see if they make sense
2) you could us

4 months ago

webmasterphilfv

Hello,
In general, in itemset mining, there is no order between the items in an itemset. Thus, {a,b,c} and {a,c,b} are the same itemset.
The PFPM algorithm is thus designed with the assumption that there is no order in an itemset.
You could change that. But that would require some programming. If you want to do that, you should read the code or make sure that you understand the algorith
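To illustrate the point that the order of items does not matter in an itemset, here is a small sketch in Java. It is unrelated to the PFPM source code: the class name `ItemsetEquality` and the method `sameItemset` are hypothetical names for this example. It compares two collections of items as sets, so {a,b,c} and {a,c,b} are considered equal, whereas comparing them as lists (where order matters) gives a different answer.

```java
import java.util.Collection;
import java.util.HashSet;
import java.util.List;

// Hypothetical example (not from SPMF): itemsets compared without order.
final class ItemsetEquality {

    // Two itemsets are the same if they contain the same items,
    // regardless of the order in which the items are written.
    static boolean sameItemset(Collection<String> x, Collection<String> y) {
        return new HashSet<>(x).equals(new HashSet<>(y));
    }

    public static void main(String[] args) {
        List<String> first = List.of("a", "b", "c");
        List<String> second = List.of("a", "c", "b");

        // As itemsets (sets), they are equal.
        System.out.println(sameItemset(first, second)); // prints true

        // As ordered lists, they are different.
        System.out.println(first.equals(second)); // prints false
    }
}
```

An algorithm that must distinguish {a,b,c} from {a,c,b} would need an ordered representation such as a list, which is the kind of change that requires modifying the code.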

