Today

webmasterphilfv

Do you just want to count how often each word appears? This would not be difficult and does not require any advanced algorithm.
But maybe you want to find some sequences of words that appear in many sentences? If that is the case, maybe you could extract the sequential patterns. For example, you may check this simple tutorial about how to extract sequential patterns from a Sherlock Holmes book usi
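
For the simple case of counting how often each word appears, a minimal sketch in plain Java could look like the following. This is only an illustration and not part of SPMF: the class name WordCount, the method countWords and the tokenization rule (lowercase, split on anything that is not a letter, digit or apostrophe) are all assumptions.

```java
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper, not part of SPMF: counts how often each word appears.
final class WordCount {

    // Lowercases the text, splits it on every run of characters that are not
    // letters, digits or apostrophes, and counts the remaining tokens.
    static Map<String, Integer> countWords(String text) {
        Map<String, Integer> counts = new TreeMap<>();
        String[] tokens = text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{Nd}']+");
        for (String token : tokens) {
            if (!token.isEmpty()) {          // split may yield a leading empty token
                counts.merge(token, 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        String text = "The dog barked. The cat ran, and the dog followed.";
        // Prints one "word<TAB>count" pair per line, in alphabetical order.
        countWords(text).forEach((word, n) -> System.out.println(word + "\t" + n));
    }
}
```

A TreeMap is used only so that the output is sorted by word; a HashMap would work equally well if the order does not matter.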

Forum: The Data Mining / Big Data Forum

Today

webmasterphilfv

Hi,
I have tried and noticed this issue. Thanks for reporting it.
A quick solution is the following:
1) In the file
ca.pfv.spmf.algorithms.sequenceprediction.ipredict.predictor.CPT.CPT.CPTPredictor.Java
replace the line:
if(size <= minSize) {
with:
if(size < minSize) {
2) In the file MainTestCPT.java
replace:
recursiveDividerMin:1
with:
recursiveDivide

Forum: The Data Mining / Big Data Forum

2 days ago

webmasterphilfv

Hi,
Thanks for your interest. I would like to add it but I don't have the code and it does not depend on me since I am not the corresponding author of that paper. Please send an e-mail to Wenshen Gan or Prof. Jerry Chun-Wei Lin (the corresponding author) to ask for the source code.
Best regards,
Philippe

Forum: The Data Mining / Big Data Forum

3 days ago

webmasterphilfv

Glad it works.
I will update the documentation soon to make this clearer ;-)

Forum: The Data Mining / Big Data Forum

4 days ago

webmasterphilfv

Thanks a lot for the information.
Yes, in the code, there is an assumption that the data and label files must be in the same folder. Thus, if you use the following command, I think it will work:
java -jar spmf.jar run GoKrimp /home/victor/SPMF/test_goKrimp.dat output.txt test_goKrimp.lab
(here: I have removed the path for the label file because it is assumed that they are in the same folder

Forum: The Data Mining / Big Data Forum

4 days ago

webmasterphilfv

Ok. Great. I will add your name when I update the website.
I tried:
java -jar spmf.jar run GoKrimp test_goKrimp.dat output.txt test_goKrimp.lab
and I got the file output.txt as follows:
support vector machin #SUP: 1922.0710148279322
real world #SUP: 598.4753133154009
machin learn #SUP: 514.3586664227769
state art #SUP: 412.9730013575172
high dimension #SUP: 362.7776787300827

Forum: The Data Mining / Big Data Forum

4 days ago

webmasterphilfv

Hi,
I see. There was indeed a bug in the command line/user interface. It was not working when no label file was provided. I have fixed it and uploaded a new .jar file and .zip file. You can download it again:
http://www.philippe-fournier-viger.com/spmf/spmf.jar
and it will work. If I type:
java -jar spmf.jar run GoKrimp test_goKrimp.dat output.txt
Then the result is:
519 323 2 #SU

Forum: The Data Mining / Big Data Forum

4 days ago

webmasterphilfv

Hi,
I have checked and it does not work with jmlr_*.dat and jmlr*.lab
But GoKrimp is working correctly with test_goKrimp.dat and test_goKrimp.lab.
I think that there is some error in the file format of jmlr*.dat and jmlr*.lab because they contain the item "0". You can ignore these files and just look at "test_goKrimp.dat"/"test_goKrimp.lab" as an example to se

Forum: The Data Mining / Big Data Forum

5 days ago

webmasterphilfv

PhD research fellow in Sequence-Driven Analytics and Prediction
The Department of Computing, Mathematics and Physics at Western Norway University of Applied Sciences has a vacancy for a research fellow (PhD position) in Sequence-Driven Analytics and Prediction for a period of 4 years.
The PhD research fellow will be part of the PhD programme in Computer Science: Software Engineering, Sensor N

Forum: The Data Mining / Big Data Forum

6 days ago

webmasterphilfv

11. Re: GoKrimp ERROR MESSAGE = java.lang.StringIndexOutOfBoundsException: String index out of range: 0

You are welcome. Thanks for using it and glad it is useful ;-)

Forum: The Data Mining / Big Data Forum

6 days ago

webmasterphilfv

Hi again,
For 1), the feature is not implemented in SPMF, but it should be easy to add. Basically, it would require adding a counter, an if statement and a System.out.println() statement, I think (I did not look at the code). If you need that feature, I think I could do it quite quickly. If you tell me that you want it, I will try to do it.
For 2), it wou

Forum: The Data Mining / Big Data Forum

6 days ago

webmasterphilfv

Hi,
That is great. I am not sure. In SPMF, I have implemented many algorithms, but the GoKrimp/SeqKrimp algorithms were not implemented by me. I think they work quite differently from other algorithms, but I have never read the details of the paper for those algorithms.
If you have questions, you may contact the main author of the paper (Hoang Thanh Lam). He is quite friendly and I guess t

Forum: The Data Mining / Big Data Forum

7 days ago

webmasterphilfv

Hello all,
In case you have not noticed, I released a new version of SPMF at the beginning of the week. SPMF v.2.36 includes 10 new algorithms:
- the CHUI-Miner(Max) algorithm for discovering maximal high utility itemsets in a transaction database with utility information
- the NegFIN and dFIN algorithms for frequent itemset mining (by Nader Aryabarzan et al.)
- the HUIF-PSO, HUIF-GA

Forum: The Data Mining / Big Data Forum

7 days ago

webmasterphilfv

15. Re: GoKrimp ERROR MESSAGE = java.lang.StringIndexOutOfBoundsException: String index out of range: 0

Hi,
I have checked. The issue is that there are two empty lines at the end of the file "inputSPMF.txt". Normally, this should not cause a problem, but there was a bug in the code that made it crash whenever the file contained an empty line.
For now, you can just delete the empty lines and it will work. The bug will be fixed in the next release of SPMF. ;-)
Thanks for reporting it.
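
If deleting the empty lines by hand is inconvenient, the workaround can be scripted. The following is a minimal sketch in plain Java and is not part of SPMF: the class name BlankLineCleaner and the method dropBlankLines are hypothetical, and the sketch assumes that empty or whitespace-only lines carry no meaning in the input file and that the file is encoded in UTF-8.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper, not part of SPMF: copies a text file without its blank lines.
final class BlankLineCleaner {

    // Keeps every line that contains at least one non-whitespace character.
    static List<String> dropBlankLines(List<String> lines) {
        return lines.stream()
                .filter(line -> !line.trim().isEmpty())
                .collect(Collectors.toList());
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: java BlankLineCleaner <input file> <output file>");
            return;
        }
        Path input = Paths.get(args[0]);
        Path output = Paths.get(args[1]);
        // Assumption: the input file is UTF-8 text with one sequence per line.
        List<String> cleaned = dropBlankLines(Files.readAllLines(input, StandardCharsets.UTF_8));
        Files.write(output, cleaned, StandardCharsets.UTF_8);
    }
}
```

The cleaned copy is written to a separate output file so that the original input file stays untouched.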

Forum: The Data Mining / Big Data Forum

7 days ago

webmasterphilfv

Thanks for the reference to the paper. It is an interesting paper. Yes, that is what I meant by different counting methods for the occurrences. For some algorithms, such as SPAM, I think it may not be easy. But for PrefixSpan, I think it would be possible but would require quite some work. Basically, PrefixSpan only keeps the first occurrence in each sequence. To modify PrefixSpan one would h

Forum: The Data Mining / Big Data Forum

8 days ago

webmasterphilfv

Hi,
Currently, the number of occurrences in each sequence is not considered. If we consider the number of occurrences in each sequence, the problem becomes more complex and there could be multiple ways of counting depending, for example, on whether we allow occurrences to overlap, or keep only the minimal or maximal occurrences... It can get a bit complicated. For example, if you have a pattern ABC app
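
To make the difference between counting methods concrete, here is a small sketch in plain Java. It is not how SPMF counts and is not taken from any particular paper: the class OccurrenceCounting and its methods are hypothetical, every item is assumed to be a single character, and each itemset is assumed to contain one item. It contrasts the usual support (at most one count per sequence) with two possible ways of counting occurrences inside a sequence.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration, not SPMF code: three ways to count a pattern.
final class OccurrenceCounting {

    // Usual support: the number of sequences containing the pattern at least once.
    static int support(List<String> database, String pattern) {
        int count = 0;
        for (String sequence : database) {
            if (countNonOverlapping(sequence, pattern) > 0) {
                count++;
            }
        }
        return count;
    }

    // One possible definition of non-overlapping occurrences: scan left to
    // right, match the pattern items in order (gaps allowed), and restart
    // from the first pattern item each time a full match is completed.
    static int countNonOverlapping(String sequence, String pattern) {
        if (pattern.isEmpty()) {
            return 0;
        }
        int count = 0;
        int next = 0; // index of the pattern item we are waiting for
        for (int i = 0; i < sequence.length(); i++) {
            if (sequence.charAt(i) == pattern.charAt(next)) {
                next++;
                if (next == pattern.length()) {
                    count++;
                    next = 0;
                }
            }
        }
        return count;
    }

    // Overlaps allowed: the number of distinct sets of positions at which the
    // pattern appears as a subsequence, computed by dynamic programming.
    static long countAllEmbeddings(String sequence, String pattern) {
        long[] ways = new long[pattern.length() + 1];
        ways[0] = 1; // the empty prefix can be matched in exactly one way
        for (int i = 0; i < sequence.length(); i++) {
            // Go backwards so that each position of the sequence is used once.
            for (int j = pattern.length(); j >= 1; j--) {
                if (sequence.charAt(i) == pattern.charAt(j - 1)) {
                    ways[j] += ways[j - 1];
                }
            }
        }
        return ways[pattern.length()];
    }

    public static void main(String[] args) {
        List<String> database = Arrays.asList("ABCABC", "ACB", "AXBXC");
        System.out.println("support: " + support(database, "ABC"));
        System.out.println("non-overlapping in ABCABC: " + countNonOverlapping("ABCABC", "ABC"));
        System.out.println("all embeddings in ABCABC: " + countAllEmbeddings("ABCABC", "ABC"));
    }
}
```

For the pattern ABC, the toy database has a support of 2, while the single sequence ABCABC contains 2 non-overlapping occurrences and 4 occurrences if overlaps are allowed, which shows how much the result depends on the chosen definition.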

Forum: The Data Mining / Big Data Forum

10 days ago

webmasterphilfv

Thanks. ;-) Glad my surveys are useful.
Philippe

Forum: The Data Mining / Big Data Forum

10 days ago

webmasterphilfv

Hi,
I see. It means that the search space is perhaps too big. There are a few solutions:
- You can use the optional parameter "maximum antecedent size". If you set it to a small value, it will greatly reduce the search space and the algorithm will consume less memory.
- You can also use the optional parameter to set some required items in the consequent of rules. This will also reduce the

Forum: The Data Mining / Big Data Forum

24 days ago

webmasterphilfv

Hi,
Welcome to this forum and the SPMF software.
Yes, different algorithms will have different performance. PSP is a somewhat old algorithm. You could definitely try some newer algorithms offered in SPMF.
But I would like to give you some advice:
- You do not have that many sequences but the sequences are indeed very long. If the sequences are also very similar, it is possible that th

Forum: The Data Mining / Big Data Forum

28 days ago

webmasterphilfv

Hi JDB,
Yes, it could be done. The information is already stored in the data structure, so it would only be necessary to add a little bit of code to save it to the file and add a new parameter to activate that. Then, I would have to update the documentation.
If you need it, I can do it for the next version of SPMF next week.
Best regards,
Philippe

Forum: The Data Mining / Big Data Forum

4 weeks ago

webmasterphilfv

Hi,
That is great. Writing a survey is a good thing to do. If you write the survey well, the paper can be cited by many people.
I think that the most important thing when writing a survey is to make sure that you understand the topic well. Your survey should provide a summary of the existing papers. Thus, you need to read many papers, to understand them, and then to extract what is important

Forum: The Data Mining / Big Data Forum

5 weeks ago

webmasterphilfv

You may want to check this:
"Comment spam detection by sequence mining"
R Kant, SH Sengamedu, KS Kumar (2012)
They have applied sequential pattern mining.

Forum: The Data Mining / Big Data Forum

6 weeks ago

webmasterphilfv

List has been updated again with a few more conferences.

Forum: The Data Mining / Big Data Forum

6 weeks ago

webmasterphilfv

The diagram looks quite nice. It seems like a good idea for path visualization. But I also don't know how to generate such visualization. If you find something good please share it. Maybe other people know?

Forum: The Data Mining / Big Data Forum

6 weeks ago

webmasterphilfv

You can also check Kaggle. It has a lot of data.

Forum: The Data Mining / Big Data Forum

6 weeks ago

webmasterphilfv

Hello,
Ruhallah Ahmadian Wrote:
-------------------------------------------------------
> I'm looking for these algorithm, are there these
> in SPMF ?
> Apriori with Transaction reduction
No.
> Apriori with Partitioning
No.
> Apriori with Sampling
No.
> Apriori with Dynamic itemset counting
No.
> Vertical Data Format
Yes, several algorithm

Forum: The Data Mining / Big Data Forum

6 weeks ago

webmasterphilfv

tianikowa Wrote:
-------------------------------------------------------
> Thank you for your attention, Yes, I'll definitely
> be looking for spmf.
> But just one question of "get the initial
> population P with the proposed problem-specific
> initialize strategy" :
>
> What is the concept of the following sentence?
>
> "the child individuals

Forum: The Data Mining / Big Data Forum

6 weeks ago

webmasterphilfv

Hi,
Thanks for your interest in the software. The main reason why that algorithm is not implemented is that my time is limited and there are hundreds of new algorithms every year. Thus, I have to choose carefully which algorithms I will implement. I usually pick some algorithm that I am interested in. But often, some people will also contribute some code to the software. Then, I don't need t

Forum: The Data Mining / Big Data Forum

6 weeks ago

webmasterphilfv

Hi,
I am not the author of this paper so I do not have examples for these algorithms, and I do not have time to read that paper, understand it and write an example for that. But if you are interested, I can tell you that in the next version of SPMF, we will release the code for another paper:
Wei Song, Chaomin Huang. Mining High Utility Itemsets Using Bio-Inspired Algorithms: A Diverse Optim

Forum: The Data Mining / Big Data Forum
