The Data Mining Forum
This forum is about data mining, data science and big data: algorithms, source code, datasets, implementations, optimizations, etc. You are welcome to post calls for papers, data mining job ads, links to source code of data mining algorithms, or anything else related to data mining. The forum is hosted by P. Fournier-Viger. No registration is required to use this forum!

2 days ago
webmasterphilfv
> Re: A database has four transactions with min support = 60% and min confidence = 80%. If it is given as a percentage, then what will be the min support count? support count = 60% × 4 transactions = 2.4, which becomes 3 transactions (because we round up)
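The rounding described above can be sketched in a few lines of Java. This is only an illustration: the class name `MinSupport` and the method `toCount` are made up for this example and are not part of SPMF.

```java
final class MinSupport {
    /**
     * Converts a relative minimum support (e.g. 0.6 for 60%) into an
     * absolute transaction count by taking the ceiling, so that no pattern
     * below the threshold is accepted.
     */
    static int toCount(double relativeMinSup, int transactionCount) {
        return (int) Math.ceil(relativeMinSup * transactionCount);
    }

    public static void main(String[] args) {
        // 60% of 4 transactions = 2.4, rounded up to 3
        System.out.println(MinSupport.toCount(0.6, 4));
    }
}
```

Because the product is computed in floating point, a value that should be an exact integer may come out slightly above it for some inputs, so it is worth testing the thresholds you actually use.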
Forum: The Data Mining / Big Data Forum
4 days ago
webmasterphilfv
Hello, In general, clustering is an unsupervised data mining technique. This means that you don't need training and testing data. You can just apply a clustering algorithm to your data to find clusters directly. Then how do you evaluate these clusters? There are several ways: 1) you could ask some experts to look at your clusters visually to see if they make sense 2) you could us
Forum: The Data Mining / Big Data Forum
6 days ago
webmasterphilfv
Hello, In general, in itemset mining, there is no order between the items in an itemset. Thus, {a,b,c} and {a,c,b} are the same itemset. The PFPM algorithm is thus designed with the assumption that there is no order in an itemset. You could change that, but it would require some programming. If you want to do that, you should read the code or make sure that you understand the algorith
Forum: The Data Mining / Big Data Forum
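To illustrate the point in the post above that {a,b,c} and {a,c,b} are the same itemset, here is a minimal Java sketch. The class and method names are hypothetical and do not come from the PFPM code; the sketch only shows that comparing itemsets as sets ignores the order of the items.

```java
import java.util.HashSet;
import java.util.List;

final class ItemsetEquality {
    /** Returns true when both lists contain the same items, in any order. */
    static boolean sameItemset(List<String> first, List<String> second) {
        return new HashSet<>(first).equals(new HashSet<>(second));
    }

    public static void main(String[] args) {
        System.out.println(sameItemset(List.of("a", "b", "c"), List.of("a", "c", "b")));
    }
}
```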
7 days ago
webmasterphilfv
In my opinion, it makes more sense to round up to 2 (take the ceiling of the number) because we do not want to accept something below the minimum support. If you round to 1 then you will accept patterns that do not satisfy the minimum support. This would not be good. So this is how it is implemented in the SPMF software. In other data mining software, it may be implemented in some other ways.
Forum: The Data Mining / Big Data Forum
13 days ago
webmasterphilfv
Hello, On the dataset page of the website, you should only use the datasets provided in the subsection "Datasets for Sequential Pattern Mining / Sequential Rule Mining / Sequence Prediction" for the sequential rule mining algorithms. The Chess and Mushroom datasets are in the category for itemset mining and association rule mining, which does not have the -1 and -2 as you have observ
Forum: The Data Mining / Big Data Forum
14 days ago
webmasterphilfv
I am not so sure about how to run it. Maybe you can send an e-mail to Prof. Zhihong Deng to ask about how to run his code. By the way, I converted his code of PrePost to Java. It is available in SPMF. You can also try that version if you want to use Java. Best regards,
Forum: The Data Mining / Big Data Forum
15 days ago
webmasterphilfv
Hello, I see. You are confusing the TWU measure and the utility measure. Property 4 is for the TWU measure. It is not for the utility measure. Thus, the fact that itemset {5} has a utility that is lower than minutil is not sufficient to apply Property 4. If you want to apply Property 4, the TWU of {5} must be lower than minutil (it is not the utility that must be lower than minu
Forum: The Data Mining / Big Data Forum
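The difference between the two measures discussed in the post above can be made concrete with a small Java sketch. Everything here is an assumption made for illustration: the toy database, the class name `TwuVersusUtility`, and a threshold of minutil = 15. It uses the usual definitions: the utility of an itemset sums only the utilities of its own items, while the TWU sums the utilities of the whole transactions that contain it.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;

final class TwuVersusUtility {
    /** Utility: sum of the itemset's own item utilities in every transaction containing it. */
    static int utility(Set<Integer> itemset, List<Map<Integer, Integer>> db) {
        int total = 0;
        for (Map<Integer, Integer> transaction : db) {
            if (transaction.keySet().containsAll(itemset)) {
                for (int item : itemset) {
                    total += transaction.get(item);
                }
            }
        }
        return total;
    }

    /** TWU: sum of the full transaction utilities of every transaction containing the itemset. */
    static int twu(Set<Integer> itemset, List<Map<Integer, Integer>> db) {
        int total = 0;
        for (Map<Integer, Integer> transaction : db) {
            if (transaction.keySet().containsAll(itemset)) {
                for (int itemUtility : transaction.values()) {
                    total += itemUtility;
                }
            }
        }
        return total;
    }

    /** Hypothetical toy database: each map is one transaction (item -> utility of that item). */
    static List<Map<Integer, Integer>> toyDatabase() {
        return List.of(
                Map.of(1, 5, 2, 10, 5, 1),
                Map.of(3, 8, 5, 2),
                Map.of(1, 5, 4, 6));
    }
}
```

With this toy data, the utility of {5} is 3, which is below the assumed minutil of 15, but its TWU is 26, so TWU-based pruning would not discard {5}. That is consistent with its superset {1,2,5} having a utility of 16, which is above the threshold.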
16 days ago
webmasterphilfv
You could look at the MOA software for mining data streams. It has a stream dataset generator and maybe there are some datasets on the website.
Forum: The Data Mining / Big Data Forum
16 days ago
webmasterphilfv
Hello, Yes, it is possible that the itemset {5} is not a high utility itemset but that the itemset {1,2,3,4,5} is a high utility itemset. Actually, in general, if you have two itemsets A and B such that A is a subset of B, the utility of A can be lower than, equal to, or greater than the utility of B. In other words, if A = {5} and B = {1,2,3,4,5}, the utility of B can be greater, lower or eq
Forum: The Data Mining / Big Data Forum
18 days ago
webmasterphilfv
Thanks for sharing. All these things that can be done with deep learning are quite impressive. It is a very interesting field of research ;-)
Forum: The Artificial Intelligence Forum
24 days ago
webmasterphilfv
Hello, I am not sure about this exact case. But you could have a look at the topic of "change detection / concept drift" in data streams. In data stream mining, there are various pattern mining algorithms that attempt to detect whether there is some significant change between two time windows. Maybe you could find some ideas by looking at this topic. Best regards, Phi
Forum: The Data Mining / Big Data Forum
29 days ago
webmasterphilfv
You are welcome. Best regards, Philippe
Forum: The Data Mining / Big Data Forum
30 days ago
webmasterphilfv
Hello, - Current version of GoKrimp does not work for a sequence of itemsets but for list of items. Does it mean that 1 -1 1 2 3 -1 1 3 -1 4 -1 3 6 -1 -2 ISNT supported, but 1 -1 123 -1 13 -1 4 -1 36 -1 -2 IS OK?? I will explain what this means. A sequence can be viewed as a sequence of events. For example, a sequence 1 -1 2 3 -1 4 -1 -2 means that the event 1 appeared, was followed by e
Forum: The Data Mining / Big Data Forum
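To make the sequence notation in the post above easier to follow, here is a minimal Java sketch that reads one line in that notation, where -1 closes an itemset and -2 closes the sequence. The class name is hypothetical, this is not SPMF's own parser, and it assumes a well-formed line of integers separated by whitespace.

```java
import java.util.ArrayList;
import java.util.List;

final class SpmfSequenceParser {
    /** Parses a line such as "1 -1 2 3 -1 4 -1 -2" into a list of itemsets. */
    static List<List<Integer>> parse(String line) {
        List<List<Integer>> sequence = new ArrayList<>();
        List<Integer> currentItemset = new ArrayList<>();
        for (String token : line.trim().split("\\s+")) {
            int value = Integer.parseInt(token);
            if (value == -2) {
                break; // end of the sequence
            } else if (value == -1) {
                sequence.add(currentItemset); // end of the current itemset
                currentItemset = new ArrayList<>();
            } else {
                currentItemset.add(value); // a regular item
            }
        }
        return sequence;
    }

    public static void main(String[] args) {
        // Three itemsets: {1}, then {2, 3}, then {4}
        System.out.println(parse("1 -1 2 3 -1 4 -1 -2"));
    }
}
```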
30 days ago
webmasterphilfv
Yes, the reason is that the transaction utility value in your example file is wrong. The file that you gave to me is:
1 2 3 4:0.215:0.385 0.077 0.385 0.077
2 4:0.077:0.077 0.077
1 4:0.215:0.385 0.077
1 2 4 5:0.144:0.385 0.077 0.077 0.077
1 2 3 4 5:0.187:0.385 0.077 0.385 0.077 0.077
2 3 5:0.167:0.077 0.385 0.077
Consider the first transaction: 1 2 3 4:0.215:0.385 0.077 0.385
Forum: The Data Mining / Big Data Forum
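A quick way to catch this kind of problem before running an algorithm is to check each line of the input file. The Java sketch below assumes the layout "items:transaction utility:item utilities" and assumes that the middle field is meant to equal the sum of the item utilities. The class name, method names and tolerance parameter are hypothetical.

```java
final class TransactionUtilityCheck {
    /** Sums the item utilities found in the third field of the line. */
    static double sumOfItemUtilities(String line) {
        String[] fields = line.split(":");
        double sum = 0.0;
        for (String itemUtility : fields[2].trim().split("\\s+")) {
            sum += Double.parseDouble(itemUtility);
        }
        return sum;
    }

    /** True when the declared transaction utility matches the sum, within the given tolerance. */
    static boolean isConsistent(String line, double tolerance) {
        double declared = Double.parseDouble(line.split(":")[1].trim());
        return Math.abs(declared - sumOfItemUtilities(line)) <= tolerance;
    }

    public static void main(String[] args) {
        String firstTransaction = "1 2 3 4:0.215:0.385 0.077 0.385 0.077";
        // The item utilities sum to about 0.924, not the declared 0.215
        System.out.println(isConsistent(firstTransaction, 1e-6));
    }
}
```

A tolerance is used instead of an exact comparison because the utilities are floating-point values.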
4 weeks ago
webmasterphilfv
Hi, The length constraint has not been implemented for these algorithms. But it has been implemented for the CM-SPAM algorithm, which takes the same input and produces the same output as GSP/Spade/Spam. So the easiest solution would be to use CM-SPAM, which already has these features and should be faster than those algorithms. Best, Philippe
Forum: The Data Mining / Big Data Forum
4 weeks ago
webmasterphilfv
Yes, the name is not clearly mentioned in the documentation but there is an example: http://www.philippe-fournier-viger.com/spmf/HirateYamana.php You will find all the details about the input format and how to use that algorithm there!
Forum: The Data Mining / Big Data Forum
5 weeks ago
webmasterphilfv
Hello, About this code: private int compareItems(String item1, String item2) { // ... } There is a reason for using: int compare = (int)( mapItemToTWU.get(item1) - mapItemToTWU.get(item2)); It is that if we sort the items in ascending order of TWU, the algorithm will be faster than if you just sort in alphabetical order. So in your function, you are sorting by al
Forum: The Data Mining / Big Data Forum
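As a rough sketch of the ordering described in the post above, the following Java code sorts items by ascending TWU. The map type `Map<String, Long>`, the class name, and the alphabetical tie-break are assumptions for illustration; only the name `mapItemToTWU` comes from the quoted code. It uses `Long.compare` rather than casting a difference to `int`, which avoids truncation or overflow if the TWU values are large.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

final class TwuOrdering {
    /** Returns the items sorted by ascending TWU; ties are broken by item name. */
    static List<String> sortByAscendingTwu(Map<String, Long> mapItemToTWU) {
        List<String> items = new ArrayList<>(mapItemToTWU.keySet());
        items.sort((item1, item2) -> {
            int compare = Long.compare(mapItemToTWU.get(item1), mapItemToTWU.get(item2));
            // Assumed tie-break: fall back to alphabetical order when the TWU values are equal
            return compare != 0 ? compare : item1.compareTo(item2);
        });
        return items;
    }

    public static void main(String[] args) {
        // Hypothetical TWU values
        System.out.println(sortByAscendingTwu(Map.of("a", 30L, "b", 10L, "c", 20L)));
    }
}
```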
5 weeks ago
webmasterphilfv
You can download a simple and fast Java implementation of K-Means from the SPMF open source data mining library: http://www.philippe-fournier-viger.com/spmf/
Forum: The Data Mining / Big Data Forum
5 weeks ago
webmasterphilfv
Hi Tarun, You may have discovered a bug! Could you please send me your input file, with the algorithm name and the parameters that you have used, to my e-mail? philfv8 AT yahoo DOT com Then, I will try to see what the problem is. It seems like a bug. Then, I will fix it. Best, Philippe
Forum: The Data Mining / Big Data Forum
6 weeks ago
webmasterphilfv
You can get the code here: http://www.philippe-fournier-viger.com/spmf/
Forum: The Data Mining / Big Data Forum
6 weeks ago
webmasterphilfv
You are welcome. Glad that it is faster than CHUI-Miner. By the way, there is also an algorithm called CLS-Miner which is an improvement of CHUI-Miner. It is not in SPMF but if you contact the main author maybe he can give you the code. Best, Philippe
Forum: The Data Mining / Big Data Forum
6 weeks ago
webmasterphilfv
Hi, For the EFIM algorithm, there is a detailed example in the journal paper: Zida, S., Fournier-Viger, P., Lin, J. C.-W., Wu, C.-W., Tseng, V.-S. (2017). EFIM: A Fast and Memory Efficient Algorithm for High-Utility Itemset Mining. Knowledge and Information Systems (KAIS), Springer, 51(2), 595-625 http://philippe-fournier-viger.com/EFIM_JOURNAL_VERSION%20KAIS%202016.pdf For the EFI
Forum: The Data Mining / Big Data Forum
7 weeks ago
webmasterphilfv
To calculate the accuracy, what you need to do is write a loop where you try to predict the class of multiple records (instances). You need to count how many records are correctly predicted. Then you divide the number of correctly predicted records by the total number of records used for making predictions. So you just need to write this in Java to get the accuracy.
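The loop described above could look like the following Java sketch. The class name, the generic record type and the `classifier` function are placeholders for whatever model and data representation you actually use.

```java
import java.util.List;
import java.util.function.Function;

final class AccuracyEvaluation {
    /** Returns the fraction of records whose predicted class equals the actual class. */
    static <R> double accuracy(List<R> records, List<String> actualClasses,
                               Function<R, String> classifier) {
        if (records.isEmpty() || records.size() != actualClasses.size()) {
            throw new IllegalArgumentException("Need one actual class per record, and at least one record");
        }
        int correct = 0;
        for (int i = 0; i < records.size(); i++) {
            String predicted = classifier.apply(records.get(i));
            if (predicted.equals(actualClasses.get(i))) {
                correct++;
            }
        }
        // Cast before dividing so that the result is not truncated by integer division
        return (double) correct / records.size();
    }
}
```

For a fair estimate, the records used here should not be the ones the model was trained on.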
Forum: The Data Mining / Big Data Forum
7 weeks ago
webmasterphilfv
You mean how complicated? It can be quite complicated since data mining software typically offers many algorithms and features. For example, I am the founder of the SPMF data mining software. It took a few years of work to develop that software. But if someone has more programmers, it could be done faster. But in real life, you don't need to implement all algorithms. For example, if you wa
Forum: The Data Mining / Big Data Forum
7 weeks ago
webmasterphilfv
Hi, Glad you like the tool. In terms of file format, the format is explained in the documentation for each algorithm. Moreover, the ARFF format, which is used by some other data mining software is also supported for some algorithms. However, in general, there is a strict file format that must be used. So if you want to use some algorithm, you likely need to transform your data to the prop
Forum: The Data Mining / Big Data Forum
7 weeks ago
webmasterphilfv
Hi, It really depends. There are some good reasons to say that the lift is better than the confidence. But there are also some cases where the lift is not the best measure and some other measures could be used. In fact, each measure has some cases where it works well and some cases where it does not work well. If you want to know more about this you can read section 6.7 of this chapter:
Forum: The Data Mining / Big Data Forum
7 weeks ago
webmasterphilfv
Hello, I am not sure if it does not work. Maybe you can send your whole dataset to my e-mail : philfv8 AT yahoo.com Because, in the above example, I cannot see anything wrong. Actually, the pattern 9 -1 9 -1 9 92 -1 #SUP: 34848 does not seem to appear in the sequences that you have shown to me. By the way, to more easily check the results, you can set the parameter "Show sequences i
Forum: The Data Mining / Big Data Forum
7 weeks ago
webmasterphilfv
Hello, I have tried with the above file and minutil = 0.1 and I get the following results:
1 #UTIL: 1.54
1 2 #UTIL: 1.3859999
1 2 3 #UTIL: 1.694
1 2 3 4 #UTIL: 1.8479998
1 2 3 4 5 #UTIL: 1.0009998
1 2 3 5 #UTIL: 0.9239999
1 2 4 #UTIL: 1.6169999
1 2 4 5 #UTIL: 1.2319999
1 2 5 #UTIL: 1.078
1 3 #UTIL: 1.54
1 3 4 #UTIL: 1.694
1 3 4 5 #UTIL: 0.924
1 3 5 #UTIL: 0.847
1 4 #UTIL: 1.8479
Forum: The Data Mining / Big Data Forum
8 weeks ago
webmasterphilfv
Hello, Thanks for the comment. I have fixed the code for you. I have added a new version of FHM called FHM(float), which can process datasets with float values. You can download the code from the SPMF website. If you are using the command line or user interface, the algorithm is called "FHM(float)". If you want to look at the code, it is located in the package: ca.pfv.spmf.
Forum: The Data Mining / Big Data Forum

This forum is powered by Phorum and provided by P. Fournier-Viger (© 2012).