
15 days ago

webmasterphilfv

Hello,
Thanks for using SPMF.
The traditional algorithm for generating association rules from itemsets is implemented in SPMF. It is called AlgoAgrawalFaster94 in the source code and is based on the paper by Agrawal describing the Apriori algorithm. However, this code is designed to be applied to frequent itemsets rather than high utility itemsets.
One could think that it is easy to apply the
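A minimal Java sketch of the basic idea behind rule generation from one frequent itemset I (hypothetical class and method names; this is not the code of AlgoAgrawalFaster94, which avoids checking every subset, while this sketch simply enumerates them): each non-empty proper subset X of I gives a candidate rule X -> I \ X with confidence sup(I) / sup(X). Note that it needs the support of every subset of I, which a frequent itemset miner provides because every subset of a frequent itemset is also frequent.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class RuleGenSketch {

    /** A rule antecedent -> consequent with its confidence. */
    public static final class Rule {
        public final Set<Integer> antecedent;
        public final Set<Integer> consequent;
        public final double confidence;

        Rule(Set<Integer> antecedent, Set<Integer> consequent, double confidence) {
            this.antecedent = antecedent;
            this.consequent = consequent;
            this.confidence = confidence;
        }

        @Override
        public String toString() {
            return antecedent + " -> " + consequent + " (conf " + confidence + ")";
        }
    }

    /**
     * Naive enumeration of the rules X -> (I \ X) of a frequent itemset I.
     * Precondition: supports contains the support count of I and of every non-empty subset of I.
     */
    public static List<Rule> generateRules(Set<Integer> itemset, Map<Set<Integer>, Integer> supports, double minConf) {
        List<Integer> items = new ArrayList<>(itemset);
        int n = items.size();
        int supI = supports.get(itemset);
        List<Rule> rules = new ArrayList<>();
        // every mask except 0 (empty set) and 2^n - 1 (I itself) selects a proper subset as antecedent
        for (int mask = 1; mask < (1 << n) - 1; mask++) {
            Set<Integer> antecedent = new HashSet<>();
            Set<Integer> consequent = new HashSet<>();
            for (int i = 0; i < n; i++) {
                if ((mask & (1 << i)) != 0) antecedent.add(items.get(i));
                else consequent.add(items.get(i));
            }
            double conf = (double) supI / supports.get(antecedent);
            if (conf >= minConf) rules.add(new Rule(antecedent, consequent, conf));
        }
        return rules;
    }

    public static void main(String[] args) {
        // hypothetical support counts
        Map<Set<Integer>, Integer> supports = new HashMap<>();
        supports.put(Set.of(1), 3);
        supports.put(Set.of(2), 2);
        supports.put(Set.of(1, 2), 2);
        // {2} -> {1} has confidence 2/2 = 1.0; {1} -> {2} has 2/3 and is rejected at minConf = 0.8
        System.out.println(generateRules(Set.of(1, 2), supports, 0.8));
    }
}
```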

Forum: The Data Mining / Big Data Forum

22 days ago

webmasterphilfv

Hello, thanks for the feedback. Maybe you can send your input file to my e-mail: philfv8 AT yahoo DOT com and I will investigate the problem. Also, are you using the graphical interface or the source code of SPMF?
Best regards,
Philippe

Forum: The Data Mining / Big Data Forum

22 days ago

webmasterphilfv

The input format is explained in the documentation on the website, and there is an example input file for each algorithm.
If you think that the result is wrong, you may send the file to my e-mail: philfv8 AT yahoo DOT com and let me know the parameters that you used and why you think that the result is wrong.

Forum: The Data Mining / Big Data Forum

4 weeks ago

webmasterphilfv

Sorry for the delay in answering your questions. I am currently travelling to attend an international conference and have been busy this week. My answers are below:
>I noticed that there is lift in SPMF's CMDeo algorithm but not ERMiner. Is there any reason for this?
Yes, the reason is that calculating the lift requires additional information that is not required for calculating the confide
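To illustrate with the standard definitions (a sketch with hypothetical names, not the code of CMDeo or ERMiner): the confidence of a rule X -> Y only needs the support counts of X ∪ Y and of X, while the lift also needs the support count of the consequent Y and the number of transactions |D|.

```java
public class RuleMeasures {

    /** conf(X -> Y) = sup(X ∪ Y) / sup(X), using absolute support counts. */
    public static double confidence(int supXY, int supX) {
        return (double) supXY / supX;
    }

    /** lift(X -> Y) = conf(X -> Y) / (sup(Y) / |D|): needs sup(Y) and |D| in addition. */
    public static double lift(int supXY, int supX, int supY, int dbSize) {
        return confidence(supXY, supX) / ((double) supY / dbSize);
    }

    public static void main(String[] args) {
        // hypothetical counts: |D| = 5, sup(X ∪ Y) = 2, sup(X) = 3, sup(Y) = 4
        System.out.println(confidence(2, 3));   // about 0.667
        System.out.println(lift(2, 3, 4, 5));   // about 0.833, i.e. below 1
    }
}
```

A lift below 1 means that X and Y appear together less often than expected if they were independent.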

Forum: The Data Mining / Big Data Forum

7 weeks ago

webmasterphilfv

FP-Growth is much faster than Apriori, so you should use FP-Growth.
You can find an implementation of it in SPMF.
Best regards,

Forum: The Data Mining / Big Data Forum

8 weeks ago

webmasterphilfv

> Re: A database has four transactions, with min support = 60% and min confidence = 80%. If it is given as a percentage, then what will be the min support count?
support count = 60% * 4 transactions = 2.4 transactions, which we round up to 3 transactions
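The same conversion as a small Java sketch (hypothetical class and method names): multiply the relative minimum support by the number of transactions and take the ceiling.

```java
public class MinSupportCount {

    /** Converts a relative minimum support (e.g. 0.6 for 60%) into a transaction count, rounding up. */
    public static int minSupportCount(double minsupRatio, int databaseSize) {
        return (int) Math.ceil(minsupRatio * databaseSize);
    }

    public static void main(String[] args) {
        System.out.println(minSupportCount(0.6, 4)); // 0.6 * 4 = 2.4, rounded up to 3
    }
}
```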

Forum: The Data Mining / Big Data Forum

2 months ago

webmasterphilfv

Hello,
In general, clustering is an unsupervised type of data mining technique. This means that you don't need training and testing data.
You can directly apply a clustering algorithm to your data to find clusters. Then, how can these clusters be evaluated? There are several ways:
1) you could ask some experts to look at your clusters visually to see if they make sense
2) you could us

Forum: The Data Mining / Big Data Forum

2 months ago

webmasterphilfv

Hello,
In general, in itemset mining, there is no order between the items in an itemset. Thus, {a,b,c} and {a,c,b} are the same itemset.
The PFPM algorithm is thus designed with the assumption that there is no order in an itemset.
You could change that, but it would require some programming. If you want to do that, you should read the code or make sure that you understand the algorith
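A small Java illustration (hypothetical names; not taken from the PFPM code): when itemsets are stored as sets, {a,b,c} and {a,c,b} compare as equal. A TreeSet is used here so that the items are also kept in one canonical sorted order, a common way to represent unordered itemsets.

```java
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

public class ItemsetOrder {

    /** Builds an itemset: the set ignores the order in which items are given. */
    public static Set<String> itemset(String... items) {
        return new TreeSet<>(List.of(items));
    }

    public static void main(String[] args) {
        Set<String> abc = itemset("a", "b", "c");
        Set<String> acb = itemset("a", "c", "b");
        System.out.println(abc.equals(acb)); // true: same itemset
        System.out.println(acb);             // [a, b, c]: canonical sorted order
    }
}
```

By contrast, `List.of("a", "b", "c").equals(List.of("a", "c", "b"))` is false, which is the kind of representation an order-sensitive variant would have to work with.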

Forum: The Data Mining / Big Data Forum

2 months ago

webmasterphilfv

In my opinion, it makes more sense to round up to 2 (take the ceiling of the number) because we do not want to accept something below the minimum support. If you round down to 1, you will accept patterns that do not satisfy the minimum support, which would not be good.
This is how it is implemented in the SPMF software. Other data mining software may implement it differently.
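A Java sketch comparing the two roundings with hypothetical numbers (5 transactions and minsup = 0.3, so minsup * 5 = 1.5; class and method names are also hypothetical): with the ceiling, a pattern appearing in 1 transaction (relative support 0.2) is rejected, while rounding down would accept it even though 0.2 < 0.3.

```java
public class RoundingCheck {

    /** Frequent iff count >= ceil(minsup * n), i.e. rounding up. */
    public static boolean frequentCeil(int count, double minsup, int n) {
        return count >= (int) Math.ceil(minsup * n);
    }

    /** Rounding down instead, shown only for comparison. */
    public static boolean frequentFloor(int count, double minsup, int n) {
        return count >= (int) Math.floor(minsup * n);
    }

    /** Reference check on the relative support: count / n >= minsup. */
    public static boolean frequentRelative(int count, double minsup, int n) {
        return (double) count / n >= minsup;
    }

    public static void main(String[] args) {
        for (int count = 0; count <= 3; count++) {
            System.out.printf("count=%d ceil=%b floor=%b relative=%b%n", count,
                    frequentCeil(count, 0.3, 5), frequentFloor(count, 0.3, 5), frequentRelative(count, 0.3, 5));
        }
    }
}
```

For every count, the ceiling version agrees with the relative-support check; the floor version disagrees for count = 1.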

Forum: The Data Mining / Big Data Forum

2 months ago

webmasterphilfv

Re: datasets

Hello,
On the dataset page of the website, you should only use the datasets provided in the subsection "Datasets for Sequential Pattern Mining / Sequential Rule Mining / Sequence Prediction" for the sequential rule mining algorithms.
The Chess and Mushroom datasets belong to the itemset mining and association rule mining category, which does not use the -1 and -2 separators, as you have observ
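For reference, a Java sketch of how one data line of a sequence database in this format can be read (hypothetical class name; it does not handle anything other than data lines): positive integers are items, -1 marks the end of an itemset and -2 marks the end of the sequence. A line from an itemset dataset such as Chess or Mushroom is only a list of items without -1 or -2, so it does not describe a sequence of itemsets.

```java
import java.util.ArrayList;
import java.util.List;

public class SequenceLineParser {

    /** Parses a line such as "1 -1 2 3 -1 4 -1 -2" into a list of itemsets. */
    public static List<List<Integer>> parse(String line) {
        List<List<Integer>> sequence = new ArrayList<>();
        List<Integer> itemset = new ArrayList<>();
        for (String token : line.trim().split("\\s+")) {
            int value = Integer.parseInt(token);
            if (value == -1) {          // end of the current itemset
                sequence.add(itemset);
                itemset = new ArrayList<>();
            } else if (value == -2) {   // end of the sequence
                break;
            } else {                    // an item of the current itemset
                itemset.add(value);
            }
        }
        return sequence;
    }

    public static void main(String[] args) {
        System.out.println(parse("1 -1 2 3 -1 4 -1 -2")); // [[1], [2, 3], [4]]
    }
}
```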

Forum: The Data Mining / Big Data Forum

2 months ago

webmasterphilfv

I am not sure how to run it. Maybe you can send an e-mail to Prof. Zhihong Deng to ask how to run his code.
By the way, I converted his PrePost code to Java. It is available in SPMF, so you can also try that version if you want to use Java.
Best regards,

Forum: The Data Mining / Big Data Forum

2 months ago

webmasterphilfv

Hello,
I see. You are confusing the TWU measure and the utility measure.
Property 4 is about the TWU measure, not the utility measure.
Thus, the fact that itemset {5} has a utility lower than minutil is not sufficient to apply Property 4.
If you want to apply Property 4, the TWU of {5} must be lower than minutil (it is not the utility that must be lower than minu
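A Java sketch with a hypothetical two-transaction database (each transaction maps an item to its utility; class and method names are hypothetical) shows the difference: u({5}) only adds the utilities of item 5, while TWU({5}) adds the whole transaction utility of every transaction containing 5. Since TWU(X) >= u(X) always holds, an itemset can have a utility below minutil while its TWU is still at least minutil, in which case TWU-based pruning cannot be applied.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;

public class TwuVsUtility {

    // hypothetical database: each transaction maps an item to its utility in that transaction
    public static final List<Map<Integer, Integer>> DB = List.of(
            Map.of(1, 5, 2, 6, 5, 1),
            Map.of(3, 4, 4, 8, 5, 2));

    /** u(X): for each transaction containing X, add the utilities of the items of X only. */
    public static int utility(Set<Integer> x, List<Map<Integer, Integer>> db) {
        int total = 0;
        for (Map<Integer, Integer> t : db) {
            if (t.keySet().containsAll(x)) {
                for (int item : x) total += t.get(item);
            }
        }
        return total;
    }

    /** TWU(X): for each transaction containing X, add its transaction utility (all of its items). */
    public static int twu(Set<Integer> x, List<Map<Integer, Integer>> db) {
        int total = 0;
        for (Map<Integer, Integer> t : db) {
            if (t.keySet().containsAll(x)) {
                for (int u : t.values()) total += u;
            }
        }
        return total;
    }

    public static void main(String[] args) {
        int minutil = 10;
        Set<Integer> five = Set.of(5);
        System.out.println("u({5}) = " + utility(five, DB)); // 1 + 2 = 3
        System.out.println("TWU({5}) = " + twu(five, DB));   // 12 + 14 = 26
        // TWU pruning requires TWU({5}) < minutil, which is false here
        System.out.println("prune by TWU? " + (twu(five, DB) < minutil)); // false
    }
}
```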

Forum: The Data Mining / Big Data Forum

2 months ago

webmasterphilfv

You could look at the MOA software for mining data streams. It has a stream dataset generator, and there may be some datasets on its website.

Forum: The Data Mining / Big Data Forum

2 months ago

webmasterphilfv

Hello,
Yes, it is possible that the itemset {5} is not a high utility itemset but that the itemset {1,2,3,4,5} is a high utility itemset.
In general, if you have two itemsets A and B such that A is a subset of B, the utility of A can be lower than, equal to, or greater than the utility of B. In other words, if A = {5} and B = {1,2,3,4,5}, the utility of B can be greater, lower or eq
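A Java sketch with a hypothetical three-transaction database (hypothetical names) showing both directions: {5} has a lower utility than its superset {1,2,3,4,5}, while {1} has a higher utility than its superset {1,2}. This happens because the utility of a superset sums more items per transaction but only over the transactions that contain all of its items.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;

public class UtilityNotMonotone {

    // hypothetical database: item -> utility of that item in the transaction
    public static final List<Map<Integer, Integer>> DB = List.of(
            Map.of(1, 5, 2, 5, 3, 5, 4, 5, 5, 1),
            Map.of(5, 1),
            Map.of(1, 20));

    /** u(X): sum of the utilities of X's items over the transactions containing all of X. */
    public static int utility(Set<Integer> x, List<Map<Integer, Integer>> db) {
        return db.stream()
                .filter(t -> t.keySet().containsAll(x))
                .mapToInt(t -> x.stream().mapToInt(t::get).sum())
                .sum();
    }

    public static void main(String[] args) {
        System.out.println(utility(Set.of(5), DB));             // 2
        System.out.println(utility(Set.of(1, 2, 3, 4, 5), DB)); // 21: the superset is higher
        System.out.println(utility(Set.of(1), DB));             // 25
        System.out.println(utility(Set.of(1, 2), DB));          // 10: the superset is lower
    }
}
```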

Forum: The Data Mining / Big Data Forum

2 months ago

webmasterphilfv

Thanks for sharing. All these things that can be done with deep learning are quite impressive. It is a very interesting field of research ;-)

Forum: The Artificial Intelligence Forum

2 months ago

webmasterphilfv

Hello,
I am not sure about this specific case. But you could have a look at the topic of "change detection / concept drift" in data streams. In data stream mining, there are various pattern mining algorithms that attempt to detect whether there is a significant change between two time windows. You may find some ideas by looking at this topic.
Best regards,
Phi

Forum: The Data Mining / Big Data Forum

2 months ago

webmasterphilfv

You are welcome.
Best regards,
Philippe

Forum: The Data Mining / Big Data Forum

2 months ago

webmasterphilfv

Hello,
> - The current version of GoKrimp does not work for a sequence of itemsets but for a list of items.
> Does it mean that 1 -1 1 2 3 -1 1 3 -1 4 -1 3 6 -1 -2 ISN'T supported, but 1 -1 123 -1 13 -1 4 -1 36 -1 -2 IS OK??
I will explain what this means. A sequence can be viewed as a sequence of events. For example, the sequence 1 -1 2 3 -1 4 -1 -2 means that event 1 appeared, was followed by e
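A Java sketch (a hypothetical helper, not part of GoKrimp) that checks whether a line in this format is a list of items, that is, whether every itemset between -1 separators contains exactly one item. Note that this parser reads "123" as the single item 123, not as the three items 1, 2 and 3.

```java
public class ItemListCheck {

    /** True if every itemset of the sequence (terminated by -1, sequence ended by -2) has exactly one item. */
    public static boolean isListOfItems(String line) {
        int itemsInCurrentItemset = 0;
        for (String token : line.trim().split("\\s+")) {
            int value = Integer.parseInt(token);
            if (value == -2) break;                 // end of the sequence
            if (value == -1) {                      // end of an itemset
                if (itemsInCurrentItemset != 1) return false;
                itemsInCurrentItemset = 0;
            } else {
                itemsInCurrentItemset++;            // one more item in the current itemset
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(isListOfItems("1 -1 1 2 3 -1 1 3 -1 4 -1 3 6 -1 -2")); // false: itemset {1, 2, 3}
        System.out.println(isListOfItems("1 -1 123 -1 13 -1 4 -1 36 -1 -2"));     // true
    }
}
```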

Forum: The Data Mining / Big Data Forum

2 months ago

webmasterphilfv

Yes, the reason is that the transaction utility values in your example file are wrong.
The file that you gave me is:
1 2 3 4:0.215:0.385 0.077 0.385 0.077
2 4:0.077:0.077 0.077
1 4:0.215:0.385 0.077
1 2 4 5:0.144:0.385 0.077 0.077 0.077
1 2 3 4 5:0.187:0.385 0.077 0.385 0.077 0.077
2 3 5:0.167:0.077 0.385 0.077
Consider the first transaction:
1 2 3 4:0.215:0.385 0.077 0.385
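A Java sketch (hypothetical helper) of the consistency check involved: in each line items:TU:utilities, the transaction utility TU should equal the sum of the item utilities. For the first transaction of this file, 0.385 + 0.077 + 0.385 + 0.077 = 0.924, not 0.215. Because the values are decimals, the comparison uses a small tolerance.

```java
public class TransactionUtilityCheck {

    /** Checks that in "items:TU:utilities" the declared TU equals the sum of the item utilities. */
    public static boolean hasValidTransactionUtility(String line, double tolerance) {
        String[] parts = line.trim().split(":");
        double declaredTu = Double.parseDouble(parts[1]);
        double sum = 0;
        for (String u : parts[2].trim().split("\\s+")) {
            sum += Double.parseDouble(u);
        }
        return Math.abs(sum - declaredTu) <= tolerance;
    }

    public static void main(String[] args) {
        System.out.println(hasValidTransactionUtility("1 2 3 4:0.215:0.385 0.077 0.385 0.077", 1e-9)); // false
        System.out.println(hasValidTransactionUtility("2 4:0.154:0.077 0.077", 1e-9));                 // true
    }
}
```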

Forum: The Data Mining / Big Data Forum

2 months ago

webmasterphilfv

Hi,
The length constraint has not been implemented for these algorithms. But it has been implemented for the CM-SPAM algorithm, which takes the same input and produces the same output as GSP/SPADE/SPAM. So the easiest solution would be to use CM-SPAM, which already has these features and should be faster than those algorithms.
Best,
Philippe

Forum: The Data Mining / Big Data Forum

2 months ago

webmasterphilfv

Yes, the name is not clearly mentioned in the documentation, but there is an example:
http://www.philippe-fournier-viger.com/spmf/HirateYamana.php
You will find all the details about the input format and how to use that algorithm there!

Forum: The Data Mining / Big Data Forum

3 months ago

webmasterphilfv

Hello,
About this code:
private int compareItems(String item1, String item2) {
// ...
}
There is a reason for using:
int compare = (int)( mapItemToTWU.get(item1) - mapItemToTWU.get(item2));
It is that if we sort the items in ascending order of TWU, the algorithm will be faster than if we just sort them in alphabetical order.
So in your function, you are sorting by al
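A Java sketch of ascending-TWU ordering for String items (hypothetical names; it assumes the TWU values are stored as Long in mapItemToTWU). Comparator.comparingLong compares with Long.compare, which avoids casting a difference of two long values to int (that cast can overflow for very large TWU values), and ties are broken alphabetically so that the order is total.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;

public class TwuOrder {

    /** Ascending TWU order, ties broken alphabetically. */
    public static Comparator<String> byAscendingTwu(Map<String, Long> mapItemToTWU) {
        return Comparator.<String>comparingLong(mapItemToTWU::get)
                .thenComparing(Comparator.naturalOrder());
    }

    public static void main(String[] args) {
        // hypothetical TWU values
        Map<String, Long> mapItemToTWU = Map.of("a", 30L, "b", 10L, "c", 20L, "d", 10L);
        List<String> items = new ArrayList<>(mapItemToTWU.keySet());
        items.sort(byAscendingTwu(mapItemToTWU));
        System.out.println(items); // [b, d, c, a]
    }
}
```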

Forum: The Data Mining / Big Data Forum

3 months ago

webmasterphilfv

You can download a simple and fast Java implementation of K-Means from the SPMF open source data mining library:
http://www.philippe-fournier-viger.com/spmf/

Forum: The Data Mining / Big Data Forum

3 months ago

webmasterphilfv

Hi Tarun,
You may have discovered a bug!
Could you please send your input file, the name of the algorithm and the parameters that you have used to my e-mail? philfv8 AT yahoo DOT com
Then, I will try to find the problem. It seems like a bug, so I will fix it.
Best,
Philippe

Forum: The Data Mining / Big Data Forum

3 months ago

webmasterphilfv

You can get the code here:
http://www.philippe-fournier-viger.com/spmf/

Forum: The Data Mining / Big Data Forum

3 months ago

webmasterphilfv

You are welcome. I am glad that it is faster than CHUI-Miner. By the way, there is also an algorithm called CLS-Miner, which is an improvement of CHUI-Miner. It is not in SPMF, but if you contact the main author, maybe he can give you the code.
Best,
Philippe

Forum: The Data Mining / Big Data Forum

3 months ago

webmasterphilfv

Hi,
For the EFIM algorithm, there is a detailed example in the journal paper:
Zida, S., Fournier-Viger, P., Lin, J. C.-W., Wu, C.-W., Tseng, V.-S. (2017). EFIM: A Fast and Memory Efficient Algorithm for High-Utility Itemset Mining. Knowledge and Information Systems (KAIS), Springer, 51(2), 595-625.
http://philippe-fournier-viger.com/EFIM_JOURNAL_VERSION%20KAIS%202016.pdf
For the EFI

Forum: The Data Mining / Big Data Forum

3 months ago

webmasterphilfv

To calculate the accuracy, you need to write a loop in which you try to predict the class of multiple records (instances).
You need to count how many records are correctly predicted.
Then, you divide the number of correctly predicted records by the total number of records used for making predictions.
So you just need to write this in Java to get the accuracy.
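These steps as a Java sketch (hypothetical names; the classifier is represented as any function from a record to a predicted class label, and the records and labels in main are made up):

```java
import java.util.List;
import java.util.function.Function;

public class AccuracySketch {

    /** Accuracy = correctly predicted records / total number of records. */
    public static <R> double accuracy(List<R> records, List<String> trueLabels, Function<R, String> classifier) {
        if (records.isEmpty() || records.size() != trueLabels.size()) {
            throw new IllegalArgumentException("need at least one record and one true label per record");
        }
        int correct = 0;
        for (int i = 0; i < records.size(); i++) {
            // predict the class of record i and compare it with its true class
            if (classifier.apply(records.get(i)).equals(trueLabels.get(i))) {
                correct++;
            }
        }
        return (double) correct / records.size();
    }

    public static void main(String[] args) {
        List<Integer> records = List.of(1, 5, 7, 2);
        List<String> labels = List.of("low", "high", "high", "high");
        Function<Integer, String> classifier = x -> x >= 5 ? "high" : "low";
        System.out.println(accuracy(records, labels, classifier)); // 3 correct out of 4 = 0.75
    }
}
```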

Forum: The Data Mining / Big Data Forum

3 months ago

webmasterphilfv

You mean how complicated it is?
It can be quite complicated, since data mining software typically offers many algorithms and features. For example, I am the founder of the SPMF data mining software, and it took a few years of work to develop it. With more programmers, it could be done faster.
But in real life, you don't need to implement all algorithms. For example, if you wa

Forum: The Data Mining / Big Data Forum
