I want to use GoKrimp to compress my sequence database with a set of meaningful patterns. This is an extract of the input file I am using, in case anyone wants to reproduce my case study:
link to the input file
I am currently running GoKrimp for the first time with this input file, and it is taking a while to execute (it has not finished yet). After reading the tips & performance information in the documentation of the algorithm, I have two questions:
1. The documentation says: "GoKrimp is very efficient. It output one pattern in each step so you can terminate the algorithm at anytime." Is there a way to monitor how many patterns have been output during the execution of the algorithm?
2. Regarding the SignTest used in GoKrimp, the documentation says: "If you have a very long sequence instead of a database of many sequences, you should split the long sequences into a set of short sequences." This is indeed my case. My sequence database contains 1134 sequences, with an average length of 1466.60. What is a good sequence length for this algorithm, and what are the negative consequences of splitting the database in terms of pattern consistency?
Edited 2 time(s). Last edit at 01/10/2019 01:51AM by vrodriguezf.
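To make question 2 concrete, here is a minimal sketch of the kind of splitting the documentation seems to suggest, assuming each sequence is already loaded as a plain list of integer item IDs. The function names and the `max_len` parameter are hypothetical and not part of GoKrimp or its documentation, and the chunk length used below is only a placeholder, not a recommended value.

```python
from typing import List, Sequence


def split_sequence(seq: Sequence[int], max_len: int) -> List[List[int]]:
    """Cut one sequence into consecutive, non-overlapping chunks.

    Every chunk has at most max_len items; only the last one may be shorter.
    max_len is a hypothetical parameter chosen by the user.
    """
    if max_len <= 0:
        raise ValueError("max_len must be a positive integer")
    return [list(seq[i:i + max_len]) for i in range(0, len(seq), max_len)]


def split_database(db: Sequence[Sequence[int]], max_len: int) -> List[List[int]]:
    """Apply split_sequence to every sequence and flatten the result."""
    return [chunk for seq in db for chunk in split_sequence(seq, max_len)]


if __name__ == "__main__":
    # Toy database of two sequences (item IDs are made up for illustration).
    toy_db = [[1, 2, 3, 4, 5], [6, 7]]
    for chunk in split_database(toy_db, max_len=2):
        print(chunk)
```

With non-overlapping chunks like these, any pattern occurrence that straddles a cut point ends up divided between two short sequences, which is the kind of consistency loss I am asking about.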