
Results 1 - 30 of 1436

Yesterday

webmasterphilfv

Hello,
I have tried with the above file and minutil = 0.1 and I get the following results:
1 #UTIL: 1.54
1 2 #UTIL: 1.3859999
1 2 3 #UTIL: 1.694
1 2 3 4 #UTIL: 1.8479998
1 2 3 4 5 #UTIL: 1.0009998
1 2 3 5 #UTIL: 0.9239999
1 2 4 #UTIL: 1.6169999
1 2 4 5 #UTIL: 1.2319999
1 2 5 #UTIL: 1.078
1 3 #UTIL: 1.54
1 3 4 #UTIL: 1.694
1 3 4 5 #UTIL: 0.924
1 3 5 #UTIL: 0.847
1 4 #UTIL: 1.8479
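By the way, values such as 1.3859999 (instead of 1.386) are not a bug in FHM(float); they are the normal rounding behavior of binary floating-point arithmetic when many values are summed. A tiny illustration in plain Python (not SPMF code; the helper `close` is just for this example):

```python
# Binary floating-point cannot represent most decimal fractions exactly,
# so accumulating per-transaction utilities introduces tiny rounding errors.
total = 0.1 + 0.2          # mathematically 0.3
print(total)               # 0.30000000000000004

# Compare utilities with a tolerance instead of exact equality:
def close(a, b, eps=1e-6):
    return abs(a - b) < eps

print(close(total, 0.3))   # True
```

So when comparing utility values in the output, it is best to allow a small tolerance.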

Forum: The Data Mining / Big Data Forum

2 days ago

webmasterphilfv

Hello,
Thanks for the comment. I have fixed the code for you. I have added a new version of FHM called FHM(float), which can process datasets with float values. You can download the code from the SPMF website.
If you are using the command line or user interface, the algorithm is called "FHM(float)".
If you want to look at the code, it is located in the package:
ca.pfv.spmf.

Forum: The Data Mining / Big Data Forum

5 days ago

webmasterphilfv

Hi Greg,
I think that what you want to do is "frequent pattern mining". It is a classical problem in data mining, where you have many lines, each containing some items (numbers), and the goal is to find the sets of items that appear in many lines.
To do that, you can try the FPGrowth algorithm offered in SPMF.
It will let you specify a minimum support threshold as a parameter. This indica
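To make the idea of a minimum support threshold concrete, here is a naive sketch (not FPGrowth itself, which is much more efficient; the toy database is made up) that keeps only the itemsets appearing in at least `minsup` lines:

```python
from itertools import combinations

# Toy database: each line is a set of items (numbers), as in the SPMF format.
database = [{1, 2, 5}, {2, 4}, {2, 3}, {1, 2, 4}, {1, 3},
            {2, 3}, {1, 3}, {1, 2, 3, 5}, {1, 2, 3}]

def support(itemset, database):
    """Number of lines that contain every item of `itemset`."""
    return sum(1 for line in database if itemset <= line)

minsup = 2  # keep itemsets appearing in at least 2 lines
items = sorted(set().union(*database))
frequent = {}
for size in (1, 2):
    for combo in combinations(items, size):
        s = support(set(combo), database)
        if s >= minsup:
            frequent[combo] = s

print(frequent[(1, 2)])  # {1, 2} appears in 4 lines
```

FPGrowth avoids this brute-force enumeration, but the minimum support parameter plays exactly the same role.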

Forum: The Data Mining / Big Data Forum

9 days ago

webmasterphilfv

You can get the ID3 source code here:
http://www.philippe-fournier-viger.com/spmf/index.php

Forum: The Data Mining / Big Data Forum

10 days ago

webmasterphilfv

You could modify EFIM to mine correlated high utility itemsets, similar to this:
Fournier-Viger, P., Lin, C. W., Dinh, T., Le, H. B. (2016). Mining Correlated High-Utility Itemsets Using the Bond Measure. Proc. 11 th International Conference on Hybrid Artificial Intelligence Systems (HAIS 2016), Springer LNAI, pp.53-65
You could modify EFIM to mine minimal high utility itemsets, similar to
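For readers unfamiliar with the bond measure from that paper: bond(X) is the number of transactions containing all items of X divided by the number of transactions containing at least one of them. A tiny illustration with made-up data (not the EFIM code):

```python
def bond(itemset, database):
    """bond(X) = |transactions containing all of X| / |transactions containing any of X|."""
    conjunctive = sum(1 for t in database if itemset <= t)
    disjunctive = sum(1 for t in database if itemset & t)
    return conjunctive / disjunctive if disjunctive else 0.0

database = [{1, 2, 3}, {1, 2}, {1, 4}, {2, 3}]
print(bond({1, 2}, database))  # 2 transactions contain both, 4 contain at least one -> 0.5
```

A bond close to 1 means the items almost always appear together, which is how the measure captures correlation.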

Forum: The Data Mining / Big Data Forum

15 days ago

webmasterphilfv

Yes, in high utility itemset mining, that answer is correct.

Forum: The Data Mining / Big Data Forum

17 days ago

webmasterphilfv

A first thing that you could do is to normalize these values, since the minimum and maximum are different for each attribute (some range from 0 to 10 and others from 0 to 100). So rescaling all these attributes to values between 0 and 1 (or 0 and 100) would be a good first step.
Then, after that it depends on your goal and how you will evaluate the result.
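For the normalization step, a minimal min-max rescaling sketch (the example values are made up):

```python
def min_max_normalize(values, new_min=0.0, new_max=1.0):
    """Rescale a list of numbers linearly into [new_min, new_max]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [new_min for _ in values]  # constant attribute: map everything to new_min
    return [new_min + (v - lo) * (new_max - new_min) / (hi - lo) for v in values]

scores_0_to_10 = [0, 5, 10]
scores_0_to_100 = [0, 50, 100]
print(min_max_normalize(scores_0_to_10))   # [0.0, 0.5, 1.0]
print(min_max_normalize(scores_0_to_100))  # [0.0, 0.5, 1.0]
```

After this, attributes with different original ranges become directly comparable.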

Forum: The Data Mining / Big Data Forum

17 days ago

webmasterphilfv

So you have 8 numerical scores given by 8 agencies, and each of these scores is computed based on different criteria. Some of these criteria may be more important than others for some users of your system. Let's say that you design a system that computes the "best" website based on these numerical scores; then how will you validate that your system has truly shown you the best websites?

Forum: The Data Mining / Big Data Forum

17 days ago

webmasterphilfv

Still, "the best website" is something very subjective. The best website for me may not be the best website for you. Typically, when you use a search engine like Google, the "best website" is the one that answers your search query, which represents what you are looking for at this exact moment. Answering your query can be done by analyzing the keywords in your sear

Forum: The Data Mining / Big Data Forum

20 days ago

webmasterphilfv

There are many kinds of patterns that you can find in a database. It would be too long to list all of them. For example:
- frequent itemsets
- closed frequent itemsets
- maximal frequent itemsets
- rare itemsets
- perfectly rare itemsets
- periodic itemsets
- association rules
- negative association rules
- generator itemsets
etc.
Making a list of all kinds of patterns that y

Forum: The Data Mining / Big Data Forum

22 days ago

webmasterphilfv

I do not know any algorithm that uses the average support, but it would certainly be possible to do that.
Actually, there are a lot of possibilities for research. You can either create new measures, new optimizations or new algorithms. Or you can combine two topics to create a new topic. For example, you can combine:
high utility sequential rule mining + negative pattern mining = negative high utility sequential rule mining

Forum: The Data Mining / Big Data Forum

22 days ago

webmasterphilfv

Hello,
There are a lot of patterns in a database. To select the patterns, we need to use some measures to decide whether the patterns are interesting or not.
In high utility itemset mining, the interestingness measure is the utility. The assumption is that if an itemset has a high utility (makes a lot of money), then it is interesting for the user.
Now, you could always combine several m

Forum: The Data Mining / Big Data Forum

22 days ago

webmasterphilfv

Besides the support, there exist many other measures such as the lift, leverage, and bond to assess whether an itemset or pattern is interesting or not.
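For example, lift and leverage can both be computed directly from relative supports. A small sketch with made-up data (not SPMF code):

```python
def rel_support(itemset, database):
    """Fraction of transactions containing every item of `itemset`."""
    return sum(1 for t in database if itemset <= t) / len(database)

def lift(x, y, database):
    # lift > 1: X and Y occur together more often than expected under independence
    return rel_support(x | y, database) / (rel_support(x, database) * rel_support(y, database))

def leverage(x, y, database):
    # leverage > 0: same idea, but as an absolute difference of probabilities
    return rel_support(x | y, database) - rel_support(x, database) * rel_support(y, database)

database = [{1, 2}, {1, 2}, {1, 3}, {4}]
print(lift({1}, {2}, database))      # (2/4) / (3/4 * 2/4) = 1.333...
print(leverage({1}, {2}, database))  # 2/4 - 3/4 * 2/4 = 0.125
```

Each measure captures a different notion of "interesting", which is why it can be useful to combine several of them.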

Forum: The Data Mining / Big Data Forum

23 days ago

webmasterphilfv

Hi
> I understand now. To be precise, we need to have an internal utility and external utility table to calculate the overall profit value of an item.
Yes. Exactly.
> 1) What is the purpose of the rule or what we can achieve by calculating the rules.
In general, the purpose of discovering patterns in databases is to find some patterns that can be useful to understand the data

Forum: The Data Mining / Big Data Forum

23 days ago

webmasterphilfv

I think you are confusing some definitions in the paper. Table 2 does not give the utility of items.
In high utility pattern mining, there are three concepts that you should not confuse:
- the internal utility (or purchase quantity - the number of units of an item that were purchased in a transaction)
- the external utility (or unit profit - the amount of profit generated by the sale of one unit of an item)

Forum: The Data Mining / Big Data Forum

24 days ago

webmasterphilfv

In high utility itemset mining, there are usually two types of numeric values: purchase quantities and unit profit values. For example, a customer can buy 3 apples, and each apple yields a $5 profit.
In your example database, you have a single table. So I assume that each item is associated with some amount of money, for example. So your transaction:
T1: A (10) B(20) C(10) D (15)
means th
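Under that assumption, the utility of an itemset in T1 is just the sum of the values of its items. A minimal sketch using the transaction above:

```python
# T1: A (10) B(20) C(10) D (15), taken from the example above.
t1 = {"A": 10, "B": 20, "C": 10, "D": 15}

def itemset_utility(itemset, transaction):
    """Sum of the utilities of the items of `itemset` in one transaction,
    or None if the transaction does not contain every item of the itemset."""
    if not set(itemset) <= transaction.keys():
        return None
    return sum(transaction[i] for i in itemset)

print(itemset_utility({"A", "B"}, t1))  # 10 + 20 = 30
```

The utility of the itemset in the whole database would then be this sum, added up over every transaction containing the itemset.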

Forum: The Data Mining / Big Data Forum

27 days ago

webmasterphilfv

Hello,
You can get it in the SPMF library (link above). It offers EFIM and other good algorithms for utility mining such as UPGrowth, HUI-Miner, FHM, IHUP, USpan, etc.
Best regards,
Philippe

Forum: The Data Mining / Big Data Forum

27 days ago

webmasterphilfv

Hi Stephen,
Glad to know that SPMF is included in a bundle to produce something useful, and glad you like it. Currently, in SPMF, there is no option for exporting results to CSV. So I guess that the best solution would be to write a small program, in your favorite programming language, that reads the output file of SPMF and generates a CSV file in the format that you prefer.
But we n
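To give an idea, such a converter only takes a few lines. Here is a rough Python sketch that turns SPMF-style output lines (e.g. the `#UTIL:` lines shown elsewhere on this forum, or `#SUP:` lines) into CSV rows; the exact keyword after `#` depends on the algorithm:

```python
import csv
import io

def spmf_to_csv(lines, out):
    """Convert SPMF output lines 'item item ... #KEY: value' into CSV rows (pattern, value)."""
    writer = csv.writer(out)
    writer.writerow(["pattern", "value"])
    for line in lines:
        if "#" not in line:
            continue  # skip lines that are not pattern lines
        pattern, _, rest = line.partition("#")
        _, _, value = rest.partition(":")
        writer.writerow([pattern.strip(), value.strip()])

buffer = io.StringIO()
spmf_to_csv(["1 2 3 #UTIL: 1.694", "1 4 #SUP: 3"], buffer)
print(buffer.getvalue())
```

In practice you would read the lines from the SPMF output file and write the CSV to a real file instead of an in-memory buffer.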

Forum: The Data Mining / Big Data Forum

4 weeks ago

webmasterphilfv

Hello,
The dataset above is in the SPMF format, which is used by the SPMF library and defined as follows. It is a text file where each line represents a sequence from a sequence database. Each item in a sequence is a positive integer, and items from the same itemset within a sequence are separated by a single space. Note that it is assu
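If I recall the format correctly, itemsets within a sequence are additionally separated by -1 and each sequence ends with -2 (worth double-checking against the SPMF documentation). Under that assumption, a small parser sketch:

```python
def parse_spmf_sequence(line):
    """Parse one SPMF sequence line: itemsets separated by -1, sequence terminated by -2."""
    sequence, itemset = [], []
    for token in line.split():
        value = int(token)
        if value == -2:        # end of the sequence
            break
        elif value == -1:      # end of the current itemset
            sequence.append(itemset)
            itemset = []
        else:
            itemset.append(value)
    return sequence

print(parse_spmf_sequence("1 2 -1 3 -1 1 4 -1 -2"))  # [[1, 2], [3], [1, 4]]
```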

Forum: The Data Mining / Big Data Forum

5 weeks ago

webmasterphilfv

Hi all,
This is to let you know that a new book will be published next year about high utility pattern mining, by Springer. I am the main editor of that book and we are currently looking for chapter proposals. It means that if you would like to write a chapter in the book, you may submit a proposal before the deadline (1st October). More details below:
Philippe
========================

Forum: The Data Mining / Big Data Forum

5 weeks ago

webmasterphilfv

Hello all,
The DSPR (Data Science and Pattern Recognition - http://dspr.ikelab.net/ ) journal is looking for papers for the next issue, to be published next month.
I am one of the editors-in-chief. If you are interested in submitting, you may contact me directly ( philfv8 AT yahoo.com ).
The review time is very fast. Also, for those in India, if you need some certificate of publications, we s

Forum: The Data Mining / Big Data Forum

6 weeks ago

webmasterphilfv

Here are a few possible topics:
- incremental sequential rule mining
- negative sequential rule mining
- ...
The most important thing is to find a topic that you like and are interested in. Even if I suggest some topics, you should actually find something that you like by reading recent conference/journal papers on data mining.

Forum: The Data Mining / Big Data Forum

6 weeks ago

webmasterphilfv

Hello,
It depends on how you set the parameters of CPT. If we want to predict strictly what appears after <A,B,C>, then, yes, we don't have training data to make any predictions. However, in the CPT and CPT+ models, there are some strategies to deal with noise, which remove items that would otherwise prevent a prediction, making the models more noise tolerant.
In the CPT model, the strategy is called

Forum: The Data Mining / Big Data Forum

6 weeks ago

webmasterphilfv

Dear Hiro,
Interesting. These results are as I thought they would be for AprioriTID (faster, but uses more memory than Apriori).
I am happy to know that you are satisfied with the results.
Best regards,
Philippe

Forum: The Data Mining / Big Data Forum

7 weeks ago

webmasterphilfv

Dear Hiro,
Glad that it works well on your data.
In general, I would expect AprioriTID to be faster than the regular Apriori. Whether AprioriTID is faster than Apriori depends mostly on whether your data is sparse or dense. Let me explain this with an example.
Let's say that we have a pattern X. To calculate the support of X, the Apriori algorithm will scan the database and compare ea
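The two counting strategies can be sketched as follows (a plain illustration with a toy database, not SPMF's actual implementation): Apriori recounts the support of X by scanning every transaction, while AprioriTID intersects the transaction-ID sets of X's items, which it keeps in memory.

```python
# Toy database: transaction id -> set of items.
database = {0: {1, 2, 3}, 1: {1, 2}, 2: {2, 3}, 3: {1, 3}}

def support_by_scan(pattern, database):
    """Apriori-style: scan every transaction and test containment."""
    return sum(1 for items in database.values() if pattern <= items)

def support_by_tidsets(pattern, database):
    """AprioriTID-style: intersect the tidsets of the pattern's items (kept in memory)."""
    tidsets = {}
    for tid, items in database.items():
        for item in items:
            tidsets.setdefault(item, set()).add(tid)
    tids = set(database)
    for item in pattern:
        tids &= tidsets[item]
    return len(tids)

print(support_by_scan({1, 2}, database))     # 2
print(support_by_tidsets({1, 2}, database))  # 2
```

On dense data the tidsets stay useful and intersections are fast, but on sparse data the memory spent keeping all the tidsets may not pay off.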

Forum: The Data Mining / Big Data Forum

7 weeks ago

webmasterphilfv

Dear Hiro,
I see. This is great. I have thus implemented the feature for you today, and uploaded a new version of the SPMF software on the website. If you download it again from the download page, you will get the new version.
It includes two new algorithms: AprioriRare_TID and AprioriInverse_TID
For both of them, I have added the option of showing the transaction identifiers.
Note th

Forum: The Data Mining / Big Data Forum

7 weeks ago

webmasterphilfv

I have recently written a blog post on my blog to explain the Apriori algorithm in a simple way.
New post: Introduction to the Apriori algorithm (with Java code)
You may want to check it out!

Forum: The Data Mining / Big Data Forum

7 weeks ago

webmasterphilfv

Hello,
Actually, there are two main types of Apriori-like algorithms: those based on Apriori, which scan the database to calculate the support of patterns, and those based on AprioriTID, which keep the transaction identifiers in memory to avoid scanning the database.
CORI is an AprioriTID-based algorithm. Since it is based on AprioriTID, it keeps the transaction IDs of each pattern in memory

Forum: The Data Mining / Big Data Forum