The Data Mining Forum
This forum is about data mining, data science and big data: algorithms, source code, datasets, implementations, optimizations, etc. You are welcome to post calls for papers, data mining job ads, links to source code of data mining algorithms or anything else related to data mining. The forum is hosted by P. Fournier-Viger. No registration is required to use this forum!

Current Page: 1 of 55
Results 1 - 30 of 1648
Today
webmasterphilfv
Do you just want to count how often each word appears? This would not be difficult and does not require any advanced algorithm. But maybe you want to find some sequences of words that appear in many sentences? If that is the case, maybe you could extract the sequential patterns. For example, you may check this simple tutorial about how to extract sequential patterns from a Sherlock Holmes book usi
Forum: The Data Mining / Big Data Forum
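For the simple case mentioned in the post above (counting how often each word appears), a minimal Java sketch could look like the following. It is only an illustration and not part of SPMF: the class name WordFrequency and the tokenization rule (lowercase, split on anything that is not a letter) are assumptions.

```java
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper, not part of SPMF.
class WordFrequency {

    /**
     * Counts how often each word appears in the text.
     * Assumption: words are lowercased and separated by any non-letter character.
     */
    static Map<String, Integer> count(String text) {
        Map<String, Integer> counts = new TreeMap<>(); // sorted by word for stable output
        for (String token : text.toLowerCase(Locale.ROOT).split("[^\\p{L}]+")) {
            if (!token.isEmpty()) {                     // split may yield a leading empty token
                counts.merge(token, 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        // prints {cat=2, dog=1, ran=1, saw=1, the=3}
        System.out.println(count("The dog saw the cat. The cat ran."));
    }
}
```

Finding sequences of words that appear in many sentences is a different problem (sequential pattern mining) and is not covered by this sketch.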
Today
webmasterphilfv
Hi, I have tried and noticed this issue. Thanks for reporting it. A quick solution is the following: 1) In the file ca.pfv.spmf.algorithms.sequenceprediction.ipredict.predictor.CPT.CPT.CPTPredictor.java replace the line: if(size <= minSize) { with: if(size < minSize) { 2) In the file MainTestCPT.java replace: recursiveDividerMin:1 by: recursiveDivide
Forum: The Data Mining / Big Data Forum
2 days ago
webmasterphilfv
Hi, Thanks for your interest. I would like to add it but I don't have the code and it does not depend on me since I am not the corresponding author of that paper. Please send an e-mail to Wensheng Gan or Prof. Jerry Chun-Wei Lin (the corresponding author) to ask for the source code. Best regards, Philippe
Forum: The Data Mining / Big Data Forum
2 days ago
webmasterphilfv
What errors?
Forum: The Data Mining / Big Data Forum
3 days ago
webmasterphilfv
Glad it works. I will update the documentation soon to make this more clear ;-)
Forum: The Data Mining / Big Data Forum
4 days ago
webmasterphilfv
Thanks a lot for the information. Yes, in the code, there is an assumption that the data and label files must be in the same folder. Thus, if you use the following command, I think it will work: java -jar spmf.jar run GoKrimp /home/victor/SPMF/test_goKrimp.dat output.txt test_goKrimp.lab (here: I have removed the path for the label file because it is assumed that they are in the same folder
Forum: The Data Mining / Big Data Forum
4 days ago
webmasterphilfv
Ok. Great. I will add your name when I update the website. When I try: java -jar spmf.jar run GoKrimp test_goKrimp.dat output.txt test_goKrimp.lab I get the file output.txt as follows: support vector machin #SUP: 1922.0710148279322 real world #SUP: 598.4753133154009 machin learn #SUP: 514.3586664227769 state art #SUP: 412.9730013575172 high dimension #SUP: 362.7776787300827
Forum: The Data Mining / Big Data Forum
4 days ago
webmasterphilfv
Hi, I see. There was indeed a bug in the command line/user interface. It was not working when no label file was provided. I have fixed it and uploaded a new .jar file and .zip file. You can download it again: http://www.philippe-fournier-viger.com/spmf/spmf.jar and it will work. If I type: java -jar spmf.jar run GoKrimp test_goKrimp.dat output.txt Then the result is: 519 323 2 #SU
Forum: The Data Mining / Big Data Forum
4 days ago
webmasterphilfv
Hi, I have checked and it does not work with jmlr_*.dat and jmlr*.lab. But GoKrimp is working correctly with test_goKrimp.dat and test_goKrimp.lab. I think that there is some error in the file format of jmlr*.dat and jmlr*.lab because they contain the item "0". You can ignore these files and just look at "test_goKrimp.dat"/ "test_goKrimp.lab" as example to se
Forum: The Data Mining / Big Data Forum
5 days ago
webmasterphilfv
PhD research fellow in Sequence-Driven Analytics and Prediction The Department of Computing, Mathematics and Physics at Western Norway University of Applied Sciences has a vacancy for a research fellow (PhD position) in Sequence-Driven Analytics and Prediction for a period of 4 years. The PhD research fellow will be part of the PhD programme in Computer Science: Software Engineering, Sensor N
Forum: The Data Mining / Big Data Forum
6 days ago
webmasterphilfv
You are welcome. Thanks for using it and glad it is useful ;-)
Forum: The Data Mining / Big Data Forum
6 days ago
webmasterphilfv
Hi again, For 1), the feature is not implemented in SPMF. But this should be easy to add. Basically, it would require adding a counter, an if statement and a System.out.println() statement, I think (I did not look at the code). If you need that feature, I think I could do it quite quickly because I think it would be simple. If you tell me that you want it, I will try to do it. For 2), it wou
Forum: The Data Mining / Big Data Forum
6 days ago
webmasterphilfv
Hi, That is great. I am not sure. In SPMF, I have implemented many algorithms but the GoKrimp/SeqKrimp algorithms have not been implemented by me. I think they work quite differently from other algorithms but I have never read the details of the paper for those algorithms. If you have questions, you may contact the main author of the paper (Hoang Thanh Lam). He is quite friendly and I guess t
Forum: The Data Mining / Big Data Forum
7 days ago
webmasterphilfv
Hello all, In case you have not noticed, I released a new version of SPMF at the beginning of the week. SPMF v.2.36 includes 10 new algorithms: - the CHUI-Miner(Max) algorithm for discovering maximal high utility itemsets in a transaction database with utility information - the NegFIN and dFIN algorithms for frequent itemset mining (by Nader Aryabarzan et al.) - the HUIF-PSO, HUIF-GA
Forum: The Data Mining / Big Data Forum
7 days ago
webmasterphilfv
Hi, I have checked. The issue is that there are two empty lines at the end of the file "inputSPMF.txt". Normally, this should not cause a problem, but there was a bug in the code such that it would crash if the file contained an empty line. For now, you can just delete the empty lines and it will work. The bug will be fixed in the next release of SPMF. ;-) Thanks for reporting it
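If deleting the empty lines by hand is inconvenient, the workaround can be sketched in a few lines of Java. This is an illustration only and not part of SPMF: the class name BlankLineStripper and the output file name inputSPMF_clean.txt are assumptions, and the sketch removes every blank line in the file, not only the trailing ones.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper, not part of SPMF.
class BlankLineStripper {

    /** Returns the lines that contain something other than whitespace, in order. */
    static List<String> withoutBlankLines(List<String> lines) {
        return lines.stream()
                .filter(line -> !line.trim().isEmpty())
                .collect(Collectors.toList());
    }

    public static void main(String[] args) throws IOException {
        // Default file names are assumptions; pass your own as arguments.
        Path input = Paths.get(args.length > 0 ? args[0] : "inputSPMF.txt");
        Path output = Paths.get(args.length > 1 ? args[1] : "inputSPMF_clean.txt");

        List<String> cleaned = withoutBlankLines(
                Files.readAllLines(input, StandardCharsets.UTF_8));
        Files.write(output, cleaned, StandardCharsets.UTF_8);
    }
}
```

The cleaned copy is written to a separate file so that the original input is left untouched.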
Forum: The Data Mining / Big Data Forum
7 days ago
webmasterphilfv
Thanks for the reference to the paper. It is an interesting paper. Yes, that is what I had in mind when I mentioned different counting methods for the occurrences. For some algorithms, such as SPAM, I think it may not be easy. But for PrefixSpan, I think it would be possible but would require quite some work. Basically, PrefixSpan only keeps the first occurrence in each sequence. To modify PrefixSpan one would h
Forum: The Data Mining / Big Data Forum
8 days ago
webmasterphilfv
Hi, Currently, the number of occurrences in each sequence is not considered. If we consider the number of occurrences in each sequence, the problem becomes more complex and there could be multiple ways of counting depending, for example, on whether we allow occurrences to overlap, or keep only the minimal or maximal occurrences... It can get a bit complicated. For example, if you have a pattern ABC app
Forum: The Data Mining / Big Data Forum
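To make the point about counting methods in the post above concrete, here is a small Java sketch that compares three ways of counting a pattern such as ABC. It is a simplified illustration and not SPMF code: sequences are plain strings in which each character is one item (real sequence databases use itemsets), and the class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration, not SPMF code. Each character is treated as one item.
class OccurrenceCounting {

    /** True if the pattern appears in the sequence in order, with gaps allowed. */
    static boolean contains(String sequence, String pattern) {
        int next = 0; // index of the next pattern item to match
        for (int i = 0; i < sequence.length() && next < pattern.length(); i++) {
            if (sequence.charAt(i) == pattern.charAt(next)) next++;
        }
        return next == pattern.length();
    }

    /** Support: each sequence counts at most once, however often the pattern occurs in it. */
    static int support(List<String> database, String pattern) {
        int count = 0;
        for (String sequence : database) {
            if (contains(sequence, pattern)) count++;
        }
        return count;
    }

    /** Greedy count of occurrences where each one ends before the next one starts. */
    static int countDisjoint(String sequence, String pattern) {
        if (pattern.isEmpty()) return 0;
        int count = 0, next = 0;
        for (int i = 0; i < sequence.length(); i++) {
            if (sequence.charAt(i) == pattern.charAt(next)) {
                next++;
                if (next == pattern.length()) { count++; next = 0; }
            }
        }
        return count;
    }

    /** Counts every way of picking positions that spell the pattern (overlaps allowed). */
    static long countAllEmbeddings(String sequence, String pattern) {
        long[] ways = new long[pattern.length() + 1];
        ways[0] = 1; // one way to match the empty prefix
        for (int i = 0; i < sequence.length(); i++) {
            // go backwards so that position i is used at most once per embedding
            for (int j = pattern.length(); j >= 1; j--) {
                if (sequence.charAt(i) == pattern.charAt(j - 1)) ways[j] += ways[j - 1];
            }
        }
        return ways[pattern.length()];
    }

    public static void main(String[] args) {
        List<String> database = Arrays.asList("AABBCC", "ABCABC", "ACB");
        System.out.println(support(database, "ABC"));            // 2
        System.out.println(countDisjoint("AABBCC", "ABC"));      // 1
        System.out.println(countDisjoint("ABCABC", "ABC"));      // 2
        System.out.println(countAllEmbeddings("AABBCC", "ABC")); // 8
    }
}
```

The same sequence AABBCC gives 1 occurrence under one definition and 8 under another, which is why the definition has to be fixed before an algorithm can be adapted.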
10 days ago
webmasterphilfv
Thanks. ;-) Glad my surveys are useful. Philippe
Forum: The Data Mining / Big Data Forum
10 days ago
webmasterphilfv
Hi, I see. It means that the search space is perhaps too big. There are a few solutions: - You can use the optional parameter "maximum antecedent size". If you set it to a small value, it will greatly reduce the search space and the algorithm will consume less memory. - You can also use the optional parameter to set some required items in the consequent of rules. This will also reduce the
Forum: The Data Mining / Big Data Forum
24 days ago
webmasterphilfv
Hi, Welcome to this forum and the SPMF software. Yes, different algorithms will have different performance. PSP is a somewhat old algorithm. You could definitely try some newer algorithms offered in SPMF. But I would like to give you some advice: - You do not have that many sequences but the sequences are indeed very long. If the sequences are also very similar, it is possible that th
Forum: The Data Mining / Big Data Forum
28 days ago
webmasterphilfv
Hi JDB, Yes, it could be done. The information is already stored in the data structure, so it would only be necessary to add a little bit of code to save it to the file and add a new parameter to activate that. Then, I would have to update the documentation. If you need it, I can do it for the next version of SPMF next week. Best regards, Philippe
Forum: The Data Mining / Big Data Forum
4 weeks ago
webmasterphilfv
Hi, That is great. Writing a survey is a good thing to do. If you write the survey well, the paper can be cited by many people. I think that the most important thing when writing a survey is to make sure that you understand the topic well. Your survey should provide a summary of the existing papers. Thus, you need to read many papers, to understand them, and then to extract what is important
Forum: The Data Mining / Big Data Forum
5 weeks ago
webmasterphilfv
You may want to check this: "Comment spam detection by sequence mining" R Kant, SH Sengamedu, KS Kumar (2012) They have applied sequential pattern mining.
Forum: The Data Mining / Big Data Forum
6 weeks ago
webmasterphilfv
List has been updated again with a few more conferences.
Forum: The Data Mining / Big Data Forum
6 weeks ago
webmasterphilfv
The diagram looks quite nice. It seems like a good idea for path visualization. But I also don't know how to generate such a visualization. If you find something good, please share it. Maybe other people know?
Forum: The Data Mining / Big Data Forum
6 weeks ago
webmasterphilfv
You can also check Kaggle. It has a lot of data.
Forum: The Data Mining / Big Data Forum
6 weeks ago
webmasterphilfv
Hello,
Ruhallah Ahmadian Wrote:
-------------------------------------------------------
> I'm looking for these algorithm, are there these
> in SPMF ?
> Apriori with Transaction reduction
No.
> Apriori with Partitioning
No.
> Apriori with Sampling
No.
> Apriori with Dynamic itemset counting
No.
> Vertical Data Format
Yes, several algorithm
Forum: The Data Mining / Big Data Forum
6 weeks ago
webmasterphilfv
tianikowa Wrote:
-------------------------------------------------------
> Thank you for your attention, Yes, I'll definitely
> be looking for spmf.
> But just one question of "get the initial
> population P with the proposed problem-specific
> initialize strategy" :
>
> What is the concept of the following sentence?
>
> "the child individuals
Forum: The Data Mining / Big Data Forum
6 weeks ago
webmasterphilfv
Hi, Thanks for your interest in the software. The main reason why that algorithm is not implemented is that my time is limited and there are hundreds of new algorithms every year. Thus, I have to carefully choose which algorithms I will implement. I usually pick some algorithm that I am interested in. But often, some people will also contribute some code to the software. Then, I don't need t
Forum: The Data Mining / Big Data Forum
6 weeks ago
webmasterphilfv
Hi, I am not the author of this paper so I do not have examples for these algorithms, and I do not have time to read that paper, understand it and write an example for that. But if you are interested, I can tell you that in the next version of SPMF, we will release the code for another paper: Wei Song, Chaomin Huang. Mining High Utility Itemsets Using Bio-Inspired Algorithms: A Diverse Optim
Forum: The Data Mining / Big Data Forum

This forum is powered by Phorum and provided by P. Fournier-Viger (© 2012).