
2 days ago

webmasterphilfv

> Re: A database has four transactions with min support=60% and min confidence=80%. If it is given as a percentage, then what will be the min support count?
support count = 60% * 4 transactions = 2.4, which is rounded up to 3 transactions

Forum: The Data Mining / Big Data Forum


4 days ago

webmasterphilfv

Hello,
In general, clustering is an unsupervised type of data mining technique. This means that you don't need training and testing data.
You can simply apply a clustering algorithm to your data to find clusters directly. How, then, can you evaluate these clusters? There are several ways:
1) you could ask some experts to look at your clusters visually to see if they make sense
2) you could us

Forum: The Data Mining / Big Data Forum


6 days ago

webmasterphilfv

Hello,
In general, in itemset mining, there is no order between the items in an itemset. Thus, {a,b,c} and {a,c,b} are the same itemset.
The PFPM algorithm is thus designed with the assumption that there is no order in an itemset.
You could change that. But that would require some programming. If you want to do that, you should read the code or make sure that you understand the algorith

Forum: The Data Mining / Big Data Forum


7 days ago

webmasterphilfv

In my opinion, it makes more sense to round up to 2 (take the ceiling of the number) because we do not want to accept something below the minimum support. If you round to 1 then you will accept patterns that do not satisfy the minimum support. This would not be good.
So this is how it is implemented in the SPMF software. In other data mining software, it may be implemented in some other ways.
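The rounding rule described above could be sketched in Java as follows. This is only an illustration and not the actual SPMF source code: the class and method names are hypothetical, and the minimum support is assumed to be given as a ratio between 0 and 1.

```java
final class MinSupportExample {

    /**
     * Converts a relative minimum support (a ratio in [0, 1]) into an
     * absolute support count. The ceiling is taken so that no pattern
     * below the requested threshold is accepted.
     */
    static int minSupportCount(double minSupportRatio, int transactionCount) {
        return (int) Math.ceil(minSupportRatio * transactionCount);
    }

    public static void main(String[] args) {
        // 60% of 4 transactions = 2.4, rounded up to 3
        System.out.println(minSupportCount(0.6, 4));
        // 50% of 4 transactions = 2.0, which stays 2
        System.out.println(minSupportCount(0.5, 4));
    }
}
```

If the value 2.4 were rounded down to 2 instead, a pattern appearing in only 2 of the 4 transactions (50%) would be accepted although it is below the 60% threshold.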

Forum: The Data Mining / Big Data Forum


13 days ago

webmasterphilfv

5. Re: datasets

Hello,
On the dataset page of the website, you should only use the datasets provided in the subsection "Datasets for Sequential Pattern Mining / Sequential Rule Mining / Sequence Prediction" for the sequential rule mining algorithms.
The Chess and Mushroom datasets are in the category for itemset mining and association rule mining, which does not have the -1 and -2 as you have observ

Forum: The Data Mining / Big Data Forum


14 days ago

webmasterphilfv

I am not so sure about how to run it. Maybe you can send an e-mail to Prof. Zhihong Deng to ask about how to run his code.
By the way, I converted his code of PrePost to Java. It is available in SPMF. You can also try that version if you want to use Java.
Best regards,

Forum: The Data Mining / Big Data Forum


15 days ago

webmasterphilfv

Hello,
I see. You are confusing the TWU measure and the utility measure.
Property 4 is for the TWU measure. It is not for the utility measure.
Thus, the fact that itemset {5} has a utility that is lower than minutil is not sufficient to apply Property 4.
If you want to apply Property 4, the TWU of {5} must be lower than minutil (it is not the utility that must be lower than minu
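To make the difference between the two measures concrete, here is a minimal Java sketch using the standard definitions: the utility of an itemset sums only the utilities of its own items in the transactions that contain it, while the TWU sums the whole transaction utility of those transactions. The tiny database, the class name and the method names are hypothetical, and utilities are assumed to be non-negative integers.

```java
import java.util.Arrays;
import java.util.List;

final class TwuVersusUtility {

    /** Hypothetical transaction: item ids with the utility of each item. */
    static final class Transaction {
        final int[] items;
        final int[] utilities;

        Transaction(int[] items, int[] utilities) {
            this.items = items;
            this.utilities = utilities;
        }

        /** Transaction utility (TU): the sum of all item utilities. */
        int transactionUtility() {
            int sum = 0;
            for (int u : utilities) sum += u;
            return sum;
        }

        /** Utility of the itemset here, or -1 if it is not fully contained. */
        int utilityOf(int[] itemset) {
            int sum = 0;
            for (int wanted : itemset) {
                int index = -1;
                for (int i = 0; i < items.length; i++) {
                    if (items[i] == wanted) { index = i; break; }
                }
                if (index < 0) return -1;
                sum += utilities[index];
            }
            return sum;
        }
    }

    /** Utility: sum of the itemset's own utilities over containing transactions. */
    static int utility(List<Transaction> db, int[] itemset) {
        int total = 0;
        for (Transaction t : db) {
            int u = t.utilityOf(itemset);
            if (u >= 0) total += u;
        }
        return total;
    }

    /** TWU: sum of the full transaction utilities over containing transactions. */
    static int twu(List<Transaction> db, int[] itemset) {
        int total = 0;
        for (Transaction t : db) {
            if (t.utilityOf(itemset) >= 0) total += t.transactionUtility();
        }
        return total;
    }

    public static void main(String[] args) {
        List<Transaction> db = Arrays.asList(
            new Transaction(new int[] {1, 5}, new int[] {10, 1}),
            new Transaction(new int[] {2, 5}, new int[] {8, 2}));
        int[] five = {5};
        System.out.println("utility({5}) = " + utility(db, five)); // 1 + 2 = 3
        System.out.println("TWU({5}) = " + twu(db, five));         // 11 + 10 = 21
    }
}
```

With a hypothetical minutil of 5, the itemset {5} is not a high utility itemset (3 < 5), but its TWU is 21, so a TWU-based pruning property cannot be used to discard {5} and its supersets.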

Forum: The Data Mining / Big Data Forum


16 days ago

webmasterphilfv

You could look at the MOA software for mining data streams. It has a stream dataset generator and maybe there are some datasets on the website.

Forum: The Data Mining / Big Data Forum


16 days ago

webmasterphilfv

Hello,
Yes, it is possible that the itemset {5} is not a high utility itemset but that the itemset {1,2,3,4,5} is a high utility itemset.
Actually, in general, if you have two itemsets A and B such that A is a subset of B, the utility of A can be lower than, equal to, or greater than the utility of B. In other words, if A = {5} and B = {1,2,3,4,5}, the utility of B can be greater, lower or eq

Forum: The Data Mining / Big Data Forum


18 days ago

webmasterphilfv

Thanks for sharing. All these things that can be done with deep learning are quite impressive. It is a very interesting field of research ;-)

Forum: The Artificial Intelligence Forum


24 days ago

webmasterphilfv

Hello,
I am not exactly sure for this case. But you could have a look at the topic of "change detection / concept drift" in data streams. In data stream mining, there are various pattern mining algorithms that attempt to detect whether there is a significant change between two time windows. Maybe you could find some ideas by looking at this topic.
Best regards,
Phi

Forum: The Data Mining / Big Data Forum


29 days ago

webmasterphilfv

You are welcome.
Best regards,
Philippe

Forum: The Data Mining / Big Data Forum


30 days ago

webmasterphilfv

Hello,
- Current version of GoKrimp does not work for a sequence of itemsets but for list of items.
Does it mean that 1 -1 1 2 3 -1 1 3 -1 4 -1 3 6 -1 -2 ISNT supported, but 1 -1 123 -1 13 -1 4 -1 36 -1 -2 IS OK??
I will explain what this means. A sequence can be viewed as a sequence of events. For example, a sequence 1 -1 2 3 -1 4 -1 -2 means that the event 1 appeared, was followed by e
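As a small illustration of this notation, the following Java sketch reads one sequence line, assuming that -1 closes an itemset and -2 closes the sequence. The class and method names are hypothetical and the line is assumed to be well formed; this is not the parser used in SPMF.

```java
import java.util.ArrayList;
import java.util.List;

final class SequenceFormatExample {

    /** Splits a line such as "1 -1 2 3 -1 4 -1 -2" into its itemsets. */
    static List<List<Integer>> parseSequence(String line) {
        List<List<Integer>> itemsets = new ArrayList<>();
        List<Integer> current = new ArrayList<>();
        for (String token : line.trim().split("\\s+")) {
            int value = Integer.parseInt(token);
            if (value == -2) {
                break;                       // end of the sequence
            } else if (value == -1) {
                itemsets.add(current);       // end of the current itemset
                current = new ArrayList<>();
            } else {
                current.add(value);          // a regular item
            }
        }
        return itemsets;
    }

    /** True if every itemset holds exactly one item (a plain list of items). */
    static boolean isListOfItems(List<List<Integer>> sequence) {
        for (List<Integer> itemset : sequence) {
            if (itemset.size() != 1) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        List<List<Integer>> sequence = parseSequence("1 -1 2 3 -1 4 -1 -2");
        System.out.println(sequence);                // [[1], [2, 3], [4]]
        System.out.println(isListOfItems(sequence)); // false: {2, 3} has two items
    }
}
```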

Forum: The Data Mining / Big Data Forum


30 days ago

webmasterphilfv

Yes, the reason is that the transaction utility value in your example file is wrong.
The file that you gave to me is:
1 2 3 4:0.215:0.385 0.077 0.385 0.077
2 4:0.077:0.077 0.077
1 4:0.215:0.385 0.077
1 2 4 5:0.144:0.385 0.077 0.077 0.077
1 2 3 4 5:0.187:0.385 0.077 0.385 0.077 0.077
2 3 5:0.167:0.077 0.385 0.077
Consider the first transaction:
1 2 3 4:0.215:0.385 0.077 0.385
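A quick way to detect this kind of problem is to recompute the transaction utility of each line. The Java sketch below assumes the usual convention that a line has the form "items:transaction utility:item utilities" and that the transaction utility should equal the sum of the item utilities. The class name, the method names and the tolerance are hypothetical.

```java
final class TransactionUtilityCheck {

    /** Sums the item utilities found in the third field of the line. */
    static double sumOfItemUtilities(String line) {
        String[] parts = line.trim().split(":");
        double sum = 0;
        for (String u : parts[2].trim().split("\\s+")) {
            sum += Double.parseDouble(u);
        }
        return sum;
    }

    /** Reads the transaction utility declared in the second field. */
    static double declaredTransactionUtility(String line) {
        return Double.parseDouble(line.trim().split(":")[1].trim());
    }

    /** True if the declared value matches the recomputed sum within the tolerance. */
    static boolean isConsistent(String line, double tolerance) {
        double difference = sumOfItemUtilities(line) - declaredTransactionUtility(line);
        return Math.abs(difference) <= tolerance;
    }

    public static void main(String[] args) {
        String first = "1 2 3 4:0.215:0.385 0.077 0.385 0.077";
        // The item utilities sum to about 0.924, but the line declares 0.215
        System.out.println(sumOfItemUtilities(first));
        System.out.println(isConsistent(first, 1e-6)); // false
    }
}
```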

Forum: The Data Mining / Big Data Forum


4 weeks ago

webmasterphilfv

Hi,
The length constraint has not been implemented for these algorithms. But it has been implemented for the CM-SPAM algorithm, which takes the same input and produces the same output as GSP/Spade/Spam. So the easiest solution would be to use CM-SPAM, which already has these features and should be faster than those algorithms.
Best,
Philippe

Forum: The Data Mining / Big Data Forum


4 weeks ago

webmasterphilfv

Yes, the name is not clearly mentioned in the documentation but there is an example:
http://www.philippe-fournier-viger.com/spmf/HirateYamana.php
You will find all the details about the input format and how to use that algorithm there!

Forum: The Data Mining / Big Data Forum


5 weeks ago

webmasterphilfv

Hello,
About this code:
private int compareItems(String item1, String item2) {
// ...
}
There is a reason for using:
int compare = (int)( mapItemToTWU.get(item1) - mapItemToTWU.get(item2));
It is that if we sort the items by ascending order of TWU, the algorithm will be faster than if you just sort by alphabetical order.
So in your function, you are sorting by al
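For illustration, a comparison by ascending TWU could be sketched as below. This is not the original SPMF code: the map is passed as a parameter and assumed to be a Map<String, Long>, the class name is hypothetical, and Long.compare is used in place of casting the difference to int, with the item names as a tie-break.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class TwuOrderExample {

    /** Orders items by ascending TWU; equal TWU values fall back to the item name. */
    static int compareItems(String item1, String item2, Map<String, Long> mapItemToTWU) {
        int compare = Long.compare(mapItemToTWU.get(item1), mapItemToTWU.get(item2));
        return (compare != 0) ? compare : item1.compareTo(item2);
    }

    public static void main(String[] args) {
        // Hypothetical TWU values, for illustration only
        Map<String, Long> twu = new HashMap<>();
        twu.put("a", 30L);
        twu.put("b", 10L);
        twu.put("c", 30L);

        List<String> items = new ArrayList<>(Arrays.asList("a", "b", "c"));
        items.sort((x, y) -> compareItems(x, y, twu));
        System.out.println(items); // [b, a, c]
    }
}
```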

Forum: The Data Mining / Big Data Forum


5 weeks ago

webmasterphilfv

You can download a simple and fast Java implementation of K-Means from the SPMF open source data mining library:
http://www.philippe-fournier-viger.com/spmf/

Forum: The Data Mining / Big Data Forum


5 weeks ago

webmasterphilfv

Hi Tarun,
You may have discovered a bug!
Could you please send your input file, along with the name of the algorithm and the parameters that you have used, to my e-mail? philfv8 AT yahoo DOT com
Then, I will try to see what the problem is. It seems like a bug. If so, I will fix it.
Best,
Philippe

Forum: The Data Mining / Big Data Forum


6 weeks ago

webmasterphilfv

You can get the code here:
http://www.philippe-fournier-viger.com/spmf/

Forum: The Data Mining / Big Data Forum


6 weeks ago

webmasterphilfv

You are welcome. Glad that it is faster than CHUI-Miner. By the way, there is also an algorithm called CLS-Miner which is an improvement of CHUI-Miner. It is not in SPMF but if you contact the main author maybe he can give you the code.
Best,
Philippe

Forum: The Data Mining / Big Data Forum


6 weeks ago

webmasterphilfv

Hi,
For the EFIM algorithm, there is a detailed example in the journal paper:
Zida, S., Fournier-Viger, P., Lin, J. C.-W., Wu, C.-W., Tseng, V.-S. (2017). EFIM: A Fast and Memory Efficient Algorithm for High-Utility Itemset Mining. Knowledge and Information Systems (KAIS), Springer, 51(2), 595-625
http://philippe-fournier-viger.com/EFIM_JOURNAL_VERSION%20KAIS%202016.pdf
For the EFI

Forum: The Data Mining / Big Data Forum


7 weeks ago

webmasterphilfv

To calculate the accuracy, what you need to do is do a loop where you try to predict the class of multiple records (instances).
You need to count how many records are correctly predicted.
Then you divide the number of correctly predicted records by the total number of records used for making predictions.
So you just need to write this in Java to get the accuracy.
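For example, the steps above could be written as the following Java sketch. The class name, the method name and the sample labels are hypothetical, and the predictions are assumed to have already been collected in a list in the same order as the true classes.

```java
import java.util.Arrays;
import java.util.List;

final class AccuracyExample {

    /** Accuracy = number of correctly predicted records / total number of records. */
    static double accuracy(List<String> predicted, List<String> actual) {
        if (predicted.size() != actual.size() || actual.isEmpty()) {
            throw new IllegalArgumentException("Lists must be non-empty and of equal size");
        }
        int correct = 0;
        for (int i = 0; i < actual.size(); i++) {
            if (predicted.get(i).equals(actual.get(i))) {
                correct++;
            }
        }
        return (double) correct / actual.size();
    }

    public static void main(String[] args) {
        List<String> predicted = Arrays.asList("yes", "no", "yes", "yes");
        List<String> actual = Arrays.asList("yes", "no", "no", "yes");
        System.out.println(accuracy(predicted, actual)); // 3 correct out of 4 = 0.75
    }
}
```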

Forum: The Data Mining / Big Data Forum


7 weeks ago

webmasterphilfv

You mean how complicated?
It can be quite complicated since data mining software typically offers many algorithms and features. For example, I am the founder of the SPMF data mining software. It took a few years of work to develop that software. But if someone has more programmers, it could be done faster.
But in real life, you don't need to implement all algorithms. For example, if you wa

Forum: The Data Mining / Big Data Forum


7 weeks ago

webmasterphilfv

Hi,
Glad you like the tool.
The file format is explained in the documentation of each algorithm. Moreover, the ARFF format, which is used by some other data mining software, is also supported for some algorithms.
However, in general, there is a strict file format that must be used. So if you want to use some algorithm, you likely need to transform your data to the prop

Forum: The Data Mining / Big Data Forum


7 weeks ago

webmasterphilfv

Hi,
It really depends. There are some good reasons to say that the lift is better than the confidence. But there are also some cases where the lift is not the best measure and some other measure could be used. In fact, each measure has some cases where it works well and some cases where it does not. If you want to know more about this you can read section 6.7 of this chapter:

Forum: The Data Mining / Big Data Forum


7 weeks ago

webmasterphilfv

Hello,
I am not sure that it does not work. Maybe you can send your whole dataset to my e-mail: philfv8 AT yahoo.com
Because, in the above example, I cannot see anything wrong. Actually, the pattern 9 -1 9 -1 9 92 -1 #SUP: 34848 does not seem to appear in the sequences that you have shown to me.
By the way, to more easily check the results, you can set the parameter "Show sequences i

Forum: The Data Mining / Big Data Forum


7 weeks ago

webmasterphilfv

Hello,
I have tried with the above file and minutil = 0.1 and I get the following results:
1 #UTIL: 1.54
1 2 #UTIL: 1.3859999
1 2 3 #UTIL: 1.694
1 2 3 4 #UTIL: 1.8479998
1 2 3 4 5 #UTIL: 1.0009998
1 2 3 5 #UTIL: 0.9239999
1 2 4 #UTIL: 1.6169999
1 2 4 5 #UTIL: 1.2319999
1 2 5 #UTIL: 1.078
1 3 #UTIL: 1.54
1 3 4 #UTIL: 1.694
1 3 4 5 #UTIL: 0.924
1 3 5 #UTIL: 0.847
1 4 #UTIL: 1.8479

Forum: The Data Mining / Big Data Forum


8 weeks ago

webmasterphilfv

Hello,
Thanks for the comment. I have fixed the code for you. I have added a new version of FHM called FHM(float), which can process datasets with float values. You can download the code from the SPMF website.
If you are using the command line or user interface, the algorithm is called "FHM(float)".
If you want to look at the code, it is located in the package:
ca.pfv.spmf.

Forum: The Data Mining / Big Data Forum
