The Data Mining Forum
This forum is about data mining, data science and big data: algorithms, source code, datasets, implementations, optimizations, etc. You are welcome to post calls for papers, data mining job ads, links to source code of data mining algorithms or anything else related to data mining. The forum is hosted by P. Fournier-Viger. No registration is required to use this forum!

21 days ago
webmasterphilfv
Hello, Thanks for reading our papers ;-) It is a little bit late, so I will answer the easy questions first, and answer other questions maybe tomorrow. Philippe > after I read the article of EFIM and I'm lost at > certain page. > correct me if my understanding is not correct. > The FHM(you created) is able to > accelerate(improve) the performance of MUI-MINER, >
Forum: The Data Mining / Big Data Forum
28 days ago
webmasterphilfv
Hello all, As many of you know, I have a data mining blog that talks about various topics related to data mining and research: http://data-mining.philippe-fournier-viger.com/ Recently, I have tried to update the blog once every week. I have actually prepared weekly blog posts already until the end of August. If you are interested, you can have a look ;-) Philippe
Forum: The Data Mining / Big Data Forum
28 days ago
webmasterphilfv
The most likely reason is that you need to adjust the parameters. It is possible that in your training data, there is no sequence that matches the sequence that you want to predict. Thus, no prediction is given. If you adjust the parameters of the prediction model, I think you will get some prediction ;-) By the way, sorry for the delay in answering. I have been a bit busy during the last few da
Forum: The Data Mining / Big Data Forum
28 days ago
webmasterphilfv
Hello,
Pradivana wrote:
> Hi, i did some research using spmf sequence pattern library and i would like to know is there anyway to show the accuracy from spmf library? especially for Markov, CPT and TDAG algorithm
>
> I've read the documentation and i think i miss something, please help, and i'm sorry f
Forum: The Data Mining / Big Data Forum
4 weeks ago
webmasterphilfv
Hi, Thanks for using SPMF! I checked the paper just now, and I think that it is not explained clearly what Checking_and_removing_item() is supposed to do. Thus, it would be hard to implement this function. If you want to implement it, I think that you should contact the authors of the paper to ask for more details about what this function is supposed to do. But with the current paper, I think we don
Forum: The Data Mining / Big Data Forum
5 weeks ago
webmasterphilfv
Hello, In algorithms like HUI-Miner, the items are sorted according to a total order. What is a total order? It means that there is some order between the items. For example, it could be the alphabetical order. According to the alphabetical order, an item "a" must be processed before an item "b", and "b" must be processed before an item "c". The al
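The idea of a total order can be illustrated with a small sketch. This is not HUI-Miner code; the function name `sort_by_total_order` and the optional `order` parameter are hypothetical and only show that any fixed ranking of the items, alphabetical or otherwise, decides which item is processed first.

```python
def sort_by_total_order(itemset, order=None):
    """Return the items of an itemset sorted by a total order.

    If no explicit order is given, the alphabetical order is used.
    `order` is a hypothetical parameter: a list giving the rank of each item.
    """
    if order is None:
        return sorted(itemset)
    rank = {item: position for position, item in enumerate(order)}
    return sorted(itemset, key=rank.__getitem__)


# Alphabetical order: "a" is processed before "b", and "b" before "c".
print(sort_by_total_order({"c", "a", "b"}))  # ['a', 'b', 'c']
```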
Forum: The Data Mining / Big Data Forum
5 weeks ago
webmasterphilfv
Hello, Sure, if you want to discuss this in the forum, you can share details. Is it improving the performance by a great amount? If so, your improvement could be integrated in the SPMF library and you could become a contributor. Best regards, Philippe
Forum: The Data Mining / Big Data Forum
5 weeks ago
webmasterphilfv
Hi, If you want to use sequential patterns to generate rules, then you could check the RuleGen algorithm from SPMF, which can do that. But it would need to be modified because this algorithm does not consider timestamps. I mean you could maybe draw inspiration from that... Another algorithm in SPMF that finds rules with a window constraint is TRuleGrowth. But it does not consider t
Forum: The Data Mining / Big Data Forum
5 weeks ago
webmasterphilfv
Hi, In HUI-Miner, you should only combine two itemsets if they are identical except for one item. Thus, you should not combine the itemset d,f,g with the itemset f,g,b because they have two different items (d and b). This is one part of the problem. Best regards,
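A minimal sketch of this check follows. It rests on an assumption that is not spelled out in the post: itemsets are stored as tuples in processing order, and two itemsets are combined only when they share every item except the last one. The names `can_combine` and `combine` are hypothetical and are not taken from SPMF.

```python
def can_combine(px, py):
    """True if both itemsets share the same prefix and differ only in the last item."""
    return (
        len(px) == len(py)
        and len(px) > 0
        and px[:-1] == py[:-1]
        and px[-1] != py[-1]
    )


def combine(px, py):
    """Join two itemsets with a common prefix into a single larger itemset."""
    if not can_combine(px, py):
        raise ValueError("itemsets do not share a common prefix")
    return px + (py[-1],)


# The pair from the post is rejected: the prefixes (d, f) and (f, g) differ.
print(can_combine(("d", "f", "g"), ("f", "g", "b")))  # False
# Two itemsets with the common prefix (f, g) can be joined.
print(combine(("f", "g", "b"), ("f", "g", "d")))  # ('f', 'g', 'b', 'd')
```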
Forum: The Data Mining / Big Data Forum
6 weeks ago
webmasterphilfv
Hello, If you clicked on "GENERATE_DATASET.bat" to generate the dataset, it should generate a sequence database, where each line is a sequence. But I have not used that IBM generator for a long time, so I do not remember how it works. There are some database generators that are perhaps easier to use in my SPMF library. Best regards,
Forum: The Data Mining / Big Data Forum
7 weeks ago
webmasterphilfv
In frequent subgraph mining, you typically have edges and vertices that have names. For example, you could have a graph about a water molecule, and you would have two nodes that have the same label "Hydrogen" and one node with the label "Oxygen":
Hydrogen ---- Oxygen ---- Hydrogen
Now, when you check if two graphs are isomorphic, yes, you need to check that the structure
Forum: The Data Mining / Big Data Forum
7 weeks ago
webmasterphilfv
Those are mostly the same thing. In simple words, two graphs are isomorphic if we map the edges and vertices of one graph to the other and they are equivalent. Subgraph isomorphism checking is the same thing. But since you add the word "subgraph" it means that you are comparing subgraphs of a graph to check if these subgraphs are equivalent. Yes, the idea of graph isomorphism is
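To make the mapping idea concrete, here is a brute-force sketch for very small graphs. It is only an illustration under stated assumptions: vertices carry labels, edges are undirected and unlabeled, and every possible vertex mapping is tried, which is far too slow for real subgraph mining. The function name `are_isomorphic` and the data layout are hypothetical.

```python
from itertools import permutations


def are_isomorphic(labels_a, edges_a, labels_b, edges_b):
    """Check whether two small vertex-labeled undirected graphs are isomorphic.

    labels_x: list of vertex labels, where the index is the vertex id.
    edges_x: iterable of (u, v) pairs of vertex ids.
    """
    set_a = {frozenset(edge) for edge in edges_a}
    set_b = {frozenset(edge) for edge in edges_b}
    if len(labels_a) != len(labels_b) or len(set_a) != len(set_b):
        return False
    n = len(labels_a)
    for mapping in permutations(range(n)):
        # The mapping must send each vertex to a vertex with the same label.
        if any(labels_a[v] != labels_b[mapping[v]] for v in range(n)):
            continue
        # The mapping must also send the edges of one graph onto the other.
        mapped = {frozenset(mapping[v] for v in edge) for edge in set_a}
        if mapped == set_b:
            return True
    return False


# Water molecule written with two different vertex numberings.
water_1 = (["Hydrogen", "Oxygen", "Hydrogen"], [(0, 1), (1, 2)])
water_2 = (["Oxygen", "Hydrogen", "Hydrogen"], [(0, 1), (0, 2)])
print(are_isomorphic(*water_1, *water_2))  # True
```

A chain Hydrogen ---- Hydrogen ---- Oxygen has the same labels and the same number of edges as the water molecule, but no label-preserving mapping matches the edges, so the check returns False for that pair.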
Forum: The Data Mining / Big Data Forum
7 weeks ago
webmasterphilfv
Hello, Yes, I call this a class sequential rule. There is an algorithm in SPMF for this: the TopSeqClassRules algorithm for mining the top-k class sequential rules. It will let you select {i} to find the k most frequent sequential rules of the form X --> {i}. This algorithm is similar to RuleGrowth but modified to do that. Best, Philippe
Forum: The Data Mining / Big Data Forum
8 weeks ago
webmasterphilfv
Hello, Sorry for the delay in answering. I saw your e-mail but actually was too busy in the last few days. I will provide some answer/opinion/suggestion below. How to represent the data is always a good question because depending on how you represent the data, you may obtain different results using a data mining algorithm. A possibility could be that each sequence represents a sequence o
Forum: The Data Mining / Big Data Forum
8 weeks ago
webmasterphilfv
Hi Victor, I see. There is no such implementation in SPMF that does exactly that. It could be done, I think, but it would require some programming to modify the algorithm and it can be more or less complicated. If one modifies it, then one would need to check that the algorithm remains correct, and sometimes combining two ideas results in an algorithm that cannot find all the patter
Forum: The Data Mining / Big Data Forum
8 weeks ago
webmasterphilfv
Hi, 1) the likely reason is that the input format is not correct. At the end of each itemset, there should be a -1 to separate. For example, the first sequence should be in this format: <10> 42 45 -1 <11> 31 42 45 -1 <20> 18 23 31 42 45 -1 <36> 48 -1 -2 It is the same for the other sequences. 2) Yes, if you have a pattern: <0> 1 2 <1> 3
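The format described in point 1 can be checked with a small parser. This is a sketch, not SPMF code: the function name `parse_timed_sequence` is hypothetical, and it assumes that items are integers, that each itemset starts with a `<timestamp>` token and ends with -1, and that the sequence ends with -2, as in the example line above.

```python
import re


def parse_timed_sequence(line):
    """Parse one sequence line into a list of (timestamp, items) pairs."""
    itemsets = []
    timestamp, items = None, []
    for token in line.split():
        match = re.fullmatch(r"<(\d+)>", token)
        if match:
            timestamp = int(match.group(1))
        elif token == "-1":
            # -1 closes the current itemset.
            if timestamp is None:
                raise ValueError("itemset without a timestamp")
            itemsets.append((timestamp, items))
            timestamp, items = None, []
        elif token == "-2":
            # -2 closes the sequence; nothing may be left open.
            if items or timestamp is not None:
                raise ValueError("missing -1 before -2")
            return itemsets
        else:
            items.append(int(token))
    raise ValueError("missing -2 at the end of the sequence")


line = "<10> 42 45 -1 <11> 31 42 45 -1 <20> 18 23 31 42 45 -1 <36> 48 -1 -2"
print(parse_timed_sequence(line))
```

A line that omits the -1 separator before -2 raises a `ValueError`, which is the kind of input problem described in the post.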
Forum: The Data Mining / Big Data Forum
8 weeks ago
webmasterphilfv
Great. I will fix the error in the documentation. Thanks for reporting it.
Forum: The Data Mining / Big Data Forum
2 months ago
webmasterphilfv
Hi Victor, I will answer your question below.
> So how large this large should be set to make it exact not approximate algorithm? Does it depends on data size?
The problem with the maximal time interval constraint is that if you apply this constraint when doing closed sequential pattern mining, you may miss some patterns. If you don't care about missing a few patterns
Forum: The Data Mining / Big Data Forum
2 months ago
webmasterphilfv
Dear Victor, > My current way of defining itemset is that for a > sequence of events that a customer took in the > history, an itemset is the events happened in the > same day. So in the final frequent sequence, I am > able to know this sequence covers how many days. > But I also want to reserve the original order of > events in an itemset. So what if I reserve the &
Forum: The Data Mining / Big Data Forum
2 months ago
webmasterphilfv
Dear Victor, I have now added the feature. It is available in the new version of SPMF (2.33) Best regards, Philippe
Forum: The Data Mining / Big Data Forum
2 months ago
webmasterphilfv
Hello all, This is to let you know that we are looking for papers for the third and fourth issues of the DSPR journal (Data Science and Pattern Recognition). See the website below for information: http://dspr.ikelab.net/ Best regards, Philippe
Forum: The Data Mining / Big Data Forum
2 months ago
webmasterphilfv
Dear Sandra, Very happy to hear that it works well. :-) Best regards,
Forum: The Data Mining / Big Data Forum
2 months ago
webmasterphilfv
Hi Victor, Yes, in sequential pattern mining the order in itemsets should not be important. However, for practical purposes, all itemsets should be sorted according to some order in your input file, as explained in the documentation: QuoteNote that it is assumed that items are sorted according to a total order in each itemset and that no item appears twice in the same itemset. That orde
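The two requirements quoted from the documentation (items sorted inside each itemset, no item appearing twice) can be checked before running an algorithm. The sketch below assumes that items are integers and that the total order is the increasing numeric order; the function name `is_valid_itemset` is hypothetical.

```python
def is_valid_itemset(items):
    """True if the items are strictly increasing.

    Strictly increasing means both sorted and free of duplicates.
    """
    return all(a < b for a, b in zip(items, items[1:]))


print(is_valid_itemset([31, 42, 45]))  # True
print(is_valid_itemset([42, 31, 45]))  # False: not sorted
print(is_valid_itemset([42, 42, 45]))  # False: item 42 appears twice
```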
Forum: The Data Mining / Big Data Forum
2 months ago
webmasterphilfv
Hello, Yes, there exist a few algorithms for mining closed sequential patterns with gap constraints. In SPMF, you would need to use the Fournier08-closed+time algorithm to get that. That algorithm is actually designed to work with timestamps but if you set all the timestamps to 0, it should do what you want. This is the example from the documentation: http://www.philippe-fournier-viger.
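Setting all the timestamps to 0 can be done with a short conversion script. The sketch below makes two assumptions: the input is a sequence line without timestamps, where -1 ends an itemset and -2 ends the sequence, and the timestamped format puts a `<timestamp>` token at the start of each itemset. The function name `add_zero_timestamps` is hypothetical.

```python
def add_zero_timestamps(line):
    """Prefix every itemset of a sequence line with the timestamp <0>."""
    output = []
    start_of_itemset = True
    for token in line.split():
        if token == "-2":
            # End of the sequence: copy the marker and stop.
            output.append(token)
            break
        if start_of_itemset:
            output.append("<0>")
            start_of_itemset = False
        output.append(token)
        if token == "-1":
            # The next token starts a new itemset.
            start_of_itemset = True
    return " ".join(output)


print(add_zero_timestamps("1 2 -1 3 -1 -2"))  # <0> 1 2 -1 <0> 3 -1 -2
```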
Forum: The Data Mining / Big Data Forum
2 months ago
webmasterphilfv
In theory, results should be the same. What parameters have you used? What dataset? If you send me the data, I can check it. My e-mail is philfv8 AT yahoo.com By the way, note that if the input format is incorrect, it is possible that the algorithms would not generate the correct result because of this.
Forum: The Data Mining / Big Data Forum
2 months ago
webmasterphilfv
Hi Victor, I think that it can be possible. I will check how to do it tomorrow. I am currently attending PAKDD 2018. But I should have a bit of time tomorrow to see if I can implement that feature easily ;-) Best, Philippe
Forum: The Data Mining / Big Data Forum
2 months ago
webmasterphilfv
Hello all, A new survey on parallel sequential pattern mining has been published on arXiv by my collaborators and me: Gan, W., Lin, J. C.-W., Fournier-Viger, P., Chao, H.-C., Tseng, V. S., Yu, P. A Survey of Parallel Sequential Pattern Mining. https://arxiv.org/pdf/1805.10515.pdf Best,
Forum: The Data Mining / Big Data Forum
2 months ago
webmasterphilfv
Hello all, A new survey on high utility pattern mining has been published on arXiv by my collaborators and me: Gan, W., Lin, J. C.-W., Fournier-Viger, P., Chao, H.-C., Tseng, V. S., Yu, P. A Survey of Utility-Oriented Pattern Mining. https://arxiv.org/pdf/1805.10511.pdf If you want to know more about high utility pattern mining, you can read it. It is quite comprehensive as it surveys mo
Forum: The Data Mining / Big Data Forum

This forum is powered by Phorum and provided by P. Fournier-Viger (© 2012).