The Data Mining Forum
This forum is about data mining, data science and big data: algorithms, source code, datasets, implementations, optimizations, etc. You are welcome to post calls for papers, data mining job ads, links to source code of data mining algorithms, or anything else related to data mining. The forum is hosted by P. Fournier-Viger. No registration is required to use this forum!
About GoKrimp, its performance and the SignTest
Posted by: vrodriguezf
Date: January 10, 2019 01:50AM

Hi,

I want to use GoKrimp to compress my sequence database with a set of meaningful patterns. This is an extract of the input file I am using, in case anyone wants to reproduce my case study:
link to the input file

I am currently running GoKrimp for the first time with this input file, and it is taking a while to execute (it has not finished yet). After reading the tips & performance information in the documentation of the algorithm, I have two questions:

1. The documentation says: "GoKrimp is very efficient. It output one pattern in each step so you can terminate the algorithm at anytime." Is there a way to monitor how many patterns have been output during the execution of the algorithm?

2. Regarding the SignTest used in GoKrimp, the documentation says "If you have a very long sequence instead of a database of many sequences, you should split the long sequences into a set of short sequences.". This is indeed my case. I have a sequence database composed of 1134 sequences, with an average length of 1466.60. I am wondering what a good sequence length is for this algorithm, and what the negative consequences of splitting the database are in terms of pattern consistency.
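
To make "splitting" concrete, here is a minimal Java sketch of one possible approach: cutting each long sequence into consecutive, non-overlapping chunks of at most maxLength items. The class SequenceSplitter, the method split and the parameter maxLength are hypothetical names used for illustration only; they are not part of SPMF, and the thread does not say which chunk length suits GoKrimp. One consequence that follows directly from this kind of split is that an occurrence of a pattern that straddles a chunk boundary no longer lies inside a single sequence.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper for illustration; not part of SPMF. */
public class SequenceSplitter {

    /**
     * Splits one sequence into consecutive, non-overlapping chunks
     * of at most maxLength items. The last chunk may be shorter.
     */
    public static <T> List<List<T>> split(List<T> sequence, int maxLength) {
        if (maxLength <= 0) {
            throw new IllegalArgumentException("maxLength must be positive");
        }
        List<List<T>> chunks = new ArrayList<>();
        for (int start = 0; start < sequence.size(); start += maxLength) {
            int end = Math.min(start + maxLength, sequence.size());
            // Copy the sub-list so each chunk is independent of the input.
            chunks.add(new ArrayList<>(sequence.subList(start, end)));
        }
        return chunks;
    }

    public static void main(String[] args) {
        // Toy sequence of item identifiers; real data would come from the input file.
        List<Integer> longSequence = Arrays.asList(1, 2, 3, 4, 5, 6, 7);
        // maxLength = 3 is an arbitrary value chosen only for this demo.
        for (List<Integer> chunk : split(longSequence, 3)) {
            System.out.println(chunk);
        }
    }
}
```

Each chunk would then be written as its own line of the sequence database.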

Best!



Edited 2 time(s). Last edit at 01/10/2019 01:51AM by vrodriguezf.

Re: About GoKrimp, its performance and the SignTest
Date: January 10, 2019 07:00AM

Hi again,

For 1), the feature is not implemented in SPMF, but it should be easy to add. Basically, it would require adding a counter, an if statement and a System.out.println() statement, I think (I did not look at the code). If you need that feature, I think I could do it quite quickly because it should be simple. If you tell me that you want it, I will try to do it.
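
As a rough illustration of the counter, if statement and System.out.println() idea, a minimal standalone sketch could look like the following. The class PatternProgress, the method patternFound and the reportEvery parameter are hypothetical names, not existing SPMF code, and the loop in main is only a stand-in for the real GoKrimp loop, which was not checked.

```java
/** Hypothetical progress reporter for illustration; not part of SPMF. */
public class PatternProgress {

    private final int reportEvery; // print a line every N patterns
    private int count = 0;         // number of patterns output so far

    public PatternProgress(int reportEvery) {
        if (reportEvery <= 0) {
            throw new IllegalArgumentException("reportEvery must be positive");
        }
        this.reportEvery = reportEvery;
    }

    /**
     * To be called once each time the algorithm outputs a pattern.
     * Returns true if a progress line was printed for this call.
     */
    public boolean patternFound() {
        count++;
        if (count % reportEvery == 0) {
            System.out.println("Patterns output so far: " + count);
            return true;
        }
        return false;
    }

    public int getCount() {
        return count;
    }

    public static void main(String[] args) {
        PatternProgress progress = new PatternProgress(10);
        // Stand-in for the mining loop: assume one pattern per step.
        for (int step = 0; step < 25; step++) {
            progress.patternFound();
        }
        System.out.println("Total patterns: " + progress.getCount());
    }
}
```

In the real algorithm, the call to patternFound() would go at the point where a pattern is added to the output.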

For 2), it would be best to ask the first author of the paper who provided the code to SPMF.

Best regards,

Philippe



Edited 1 time(s). Last edit at 01/10/2019 07:01AM by webmasterphilfv.

Re: About GoKrimp, its performance and the SignTest
Posted by: vrodriguezf
Date: January 10, 2019 07:03AM

Hi Philippe,

Many thanks for your help. I am not in a rush with this analysis so take your time to add that small logging functionality. Regarding point 2), I will contact the author of the paper.

Best!



Edited 1 time(s). Last edit at 01/10/2019 07:04AM by vrodriguezf.

Re: About GoKrimp, its performance and the SignTest
Posted by: Rémi Adon
Date: March 28, 2021 07:46AM

Hi there,

Any update concerning the sign test?

I am implementing a version of GoKrimp, and my v0 has no sign test (this test was not described in an early version of the paper).

I am trying to get a grasp of how this test impacts robustness on different datasets. Also, if users need to understand the inner workings of the algorithm to use it, that's definitely a pain point.

Cheers,
Rémi

Re: About GoKrimp, its performance and the SignTest
Date: April 09, 2021 05:31PM

Hi, just curious: did you manage to finish your implementation and get feedback from the authors?

Best regards,

Philippe



This forum is powered by Phorum and provided by P. Fournier-Viger (© 2012).