The Data Mining Forum
This forum is about data mining, data science and big data: algorithms, source code, datasets, implementations, optimizations, etc. You are welcome to post calls for papers, data mining job ads, links to the source code of data mining algorithms, or anything else related to data mining. The forum is hosted by P. Fournier-Viger. No registration is required to use this forum!

Current Page: 1 of 48
Results 1 - 30 of 1436
Yesterday
webmasterphilfv
Hello, I have tried with the above file and minutil = 0.1 and I get the following results:
1 #UTIL: 1.54
1 2 #UTIL: 1.3859999
1 2 3 #UTIL: 1.694
1 2 3 4 #UTIL: 1.8479998
1 2 3 4 5 #UTIL: 1.0009998
1 2 3 5 #UTIL: 0.9239999
1 2 4 #UTIL: 1.6169999
1 2 4 5 #UTIL: 1.2319999
1 2 5 #UTIL: 1.078
1 3 #UTIL: 1.54
1 3 4 #UTIL: 1.694
1 3 4 5 #UTIL: 0.924
1 3 5 #UTIL: 0.847
1 4 #UTIL: 1.8479
Forum: The Data Mining / Big Data Forum
2 days ago
webmasterphilfv
Hello, Thanks for the comment. I have fixed the code for you. I have added a new version of FHM called FHM(float), which can process datasets with float values. You can download the code from the SPMF website. If you are using the command line or the user interface, the algorithm is called "FHM(float)". If you want to look at the code, it is located in the package: ca.pfv.spmf.
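For readers who want to script this, here is a minimal sketch that launches SPMF's command-line interface to run FHM(float); the jar location, the file names and the 0.1 minimum utility threshold are placeholder assumptions, not values taken from the post.

    import java.io.IOException;

    public class RunFHMFloat {
        public static void main(String[] args) throws IOException, InterruptedException {
            // Launch SPMF's command-line interface. "FHM(float)" is the algorithm name
            // mentioned above; spmf.jar, the file names and the 0.1 threshold are
            // placeholders for illustration.
            ProcessBuilder pb = new ProcessBuilder(
                    "java", "-jar", "spmf.jar", "run", "FHM(float)",
                    "input.txt", "output.txt", "0.1");
            pb.inheritIO(); // forward SPMF's console output
            int exitCode = pb.start().waitFor();
            System.out.println("SPMF finished with exit code " + exitCode);
        }
    }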
Forum: The Data Mining / Big Data Forum
5 days ago
webmasterphilfv
Hi Greg, I think that what you want to do is "frequent pattern mining". It is a classical problem in data mining, where you have many lines that contain items (numbers), and we try to find the sets of items that appear in many lines. To do that, you can try the FPGrowth algorithm offered in SPMF. It will let you specify a minimum support threshold as a parameter. This indica
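As a rough sketch of how this could be done through SPMF's Java API (the package and class names below follow SPMF's usual example files, and the file names and the 40% threshold are placeholder assumptions):

    import ca.pfv.spmf.algorithms.frequentpatterns.fpgrowth.AlgoFPGrowth;

    public class FindFrequentItemsets {
        public static void main(String[] args) throws Exception {
            String input = "transactions.txt";       // one line = one transaction, SPMF format
            String output = "frequent_itemsets.txt"; // itemsets with their support
            double minsup = 0.4;                     // keep itemsets appearing in at least 40% of the lines

            AlgoFPGrowth fpgrowth = new AlgoFPGrowth();
            fpgrowth.runAlgorithm(input, output, minsup);
            fpgrowth.printStats();                   // runtime, memory usage and pattern count
        }
    }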
Forum: The Data Mining / Big Data Forum
9 days ago
webmasterphilfv
You can get the ID3 source code here: http://www.philippe-fournier-viger.com/spmf/index.php
Forum: The Data Mining / Big Data Forum
10 days ago
webmasterphilfv
You could modify EFIM to mine correlated high utility itemsets, similar to this: Fournier-Viger, P., Lin, C. W., Dinh, T., Le, H. B. (2016). Mining Correlated High-Utility Itemsets Using the Bond Measure. Proc. 11th International Conference on Hybrid Artificial Intelligence Systems (HAIS 2016), Springer LNAI, pp. 53-65. You could modify EFIM to mine minimal high utility itemsets, similar to
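As a reminder of what the bond measure captures, here is a toy sketch (the tid-sets are made up): the bond of an itemset is the number of transactions containing all of its items divided by the number of transactions containing at least one of them.

    import java.util.HashSet;
    import java.util.List;
    import java.util.Set;

    public class BondMeasure {
        // bond(X) = |transactions containing every item of X|
        //           / |transactions containing at least one item of X|
        static double bond(List<Set<Integer>> tidSetsOfItems) {
            Set<Integer> conjunctive = new HashSet<>(tidSetsOfItems.get(0));
            Set<Integer> disjunctive = new HashSet<>(tidSetsOfItems.get(0));
            for (Set<Integer> tids : tidSetsOfItems.subList(1, tidSetsOfItems.size())) {
                conjunctive.retainAll(tids); // intersection of tid-sets
                disjunctive.addAll(tids);    // union of tid-sets
            }
            return disjunctive.isEmpty() ? 0.0
                    : (double) conjunctive.size() / disjunctive.size();
        }

        public static void main(String[] args) {
            // item A appears in transactions {1,2,3}, item B in {2,3,4}
            System.out.println(bond(List.of(Set.of(1, 2, 3), Set.of(2, 3, 4)))); // 2 / 4 = 0.5
        }
    }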
Forum: The Data Mining / Big Data Forum
15 days ago
webmasterphilfv
Did someone try it?
Forum: The Data Mining / Big Data Forum
15 days ago
webmasterphilfv
Yes, in high utility itemset mining, that answer is correct.
Forum: The Data Mining / Big Data Forum
17 days ago
webmasterphilfv
A first thing that you could do is to normalize these values, since the minimum and maximum are different for each attribute (some range from 0 to 10 and others from 0 to 100). So rescaling all these attributes to values between 0 and 100, or between 0 and 1, would be a good first step. Then, after that, it depends on your goal and how you will evaluate the result.
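A minimal sketch of the min-max rescaling suggested above (the ranges and values are made up):

    public class MinMaxNormalization {
        // Rescale a value from its attribute's own [min, max] range to [0, 1].
        static double normalize(double value, double min, double max) {
            if (max == min) return 0.0; // constant attribute: nothing to rescale
            return (value - min) / (max - min);
        }

        public static void main(String[] args) {
            // A score out of 10 and a score out of 100 become directly comparable.
            System.out.println(normalize(7, 0, 10));   // 0.7
            System.out.println(normalize(70, 0, 100)); // 0.7
        }
    }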
Forum: The Data Mining / Big Data Forum
17 days ago
webmasterphilfv
So you have 8 numerical scores given by 8 agencies, and each of these scores is computed based on different criteria. Some of these criteria may be more important than others for some users of your system. Let's say that you design a system that computes the "best" website based on these numerical scores; then how will you validate that your system has truly shown you the best websites?
Forum: The Data Mining / Big Data Forum
17 days ago
webmasterphilfv
Still, "the best website" is something very subjective. The best website for me may not be the "best website for you". Typically, when you use a search engine like Google, the "best website" is the one that answer your search query, which represents what you are looking for at this exact moment. Answering your query can be done by analyzing the keywords in your sear
Forum: The Data Mining / Big Data Forum
17 days ago
webmasterphilfv
What do you mean by "best"?
Forum: The Data Mining / Big Data Forum
20 days ago
webmasterphilfv
There are many kinds of patterns that you can find in a database. It would be too long to list all of them. For example:
- frequent itemsets
- closed frequent itemsets
- maximal frequent itemsets
- rare itemsets
- perfectly rare itemsets
- periodic itemsets
- association rules
- negative association rules
- generator itemsets
.... etc.
Making a list of all kinds of patterns that y
Forum: The Data Mining / Big Data Forum
22 days ago
webmasterphilfv
I do not know any algorithm that uses the average support, but it would be possible to do that. Actually, there are a lot of possibilities for research. You can either create new measures, new optimizations or new algorithms. Or you can combine two topics to create a new topic. For example, you can combine: high utility sequential rule mining + negative pattern mining = negative high utility seque
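One possible reading of an "average support" measure, sketched below as an assumption rather than as an existing algorithm, is to take the mean of the supports of the individual items of an itemset:

    import java.util.List;
    import java.util.Set;

    public class AverageSupport {
        // Assumed definition: the mean of the supports of the individual items
        // of an itemset (not a measure taken from an existing algorithm).
        static double averageSupport(Set<Integer> itemset, List<Set<Integer>> database) {
            double sum = 0;
            for (int item : itemset) {
                long support = database.stream().filter(t -> t.contains(item)).count();
                sum += support;
            }
            return sum / itemset.size();
        }

        public static void main(String[] args) {
            List<Set<Integer>> db = List.of(Set.of(1, 2), Set.of(1, 3), Set.of(1, 2, 3));
            System.out.println(averageSupport(Set.of(1, 2), db)); // (3 + 2) / 2 = 2.5
        }
    }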
Forum: The Data Mining / Big Data Forum
22 days ago
webmasterphilfv
Hello, There are a lot of patterns in a database. To select the patterns, we need to use some measures to decide whether the patterns are interesting or not. In high utility itemset mining, the interestingness measure is the utility. The assumption is that if an itemset has a high utility (makes a lot of money), then it is interesting for the user. Now, you could always combine several m
Forum: The Data Mining / Big Data Forum
22 days ago
webmasterphilfv
Besides the support, there exist many other measures such as the lift, leverage, bond, etc. to assess whether an itemset or pattern is interesting or not.
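For reference, here is a toy sketch of two of these measures computed from relative supports (the support values are made up):

    public class RuleMeasures {
        // Relative supports are fractions of the database (values in [0, 1]).
        // lift(X -> Y)     = sup(X u Y) / (sup(X) * sup(Y))
        // leverage(X -> Y) = sup(X u Y) - sup(X) * sup(Y)
        static double lift(double supXY, double supX, double supY) {
            return supXY / (supX * supY);
        }

        static double leverage(double supXY, double supX, double supY) {
            return supXY - supX * supY;
        }

        public static void main(String[] args) {
            // Made-up supports: X in 40% of transactions, Y in 50%, X u Y in 30%.
            System.out.println(lift(0.30, 0.40, 0.50));     // 1.5  -> positive correlation
            System.out.println(leverage(0.30, 0.40, 0.50)); // 0.10 -> above independence
        }
    }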
Forum: The Data Mining / Big Data Forum
23 days ago
webmasterphilfv
Hi > I understand now. To be precise, we need to have an internal utility and an external utility table to calculate the overall profit value of an item. Yes. Exactly. > 1) What is the purpose of the rule, or what can we achieve by calculating the rules? In general, the purpose of discovering patterns in databases is to find some patterns that can be useful to understand the data
Forum: The Data Mining / Big Data Forum
23 days ago
webmasterphilfv
I think you are confusing some definitions in the paper. Table 2 does not give the utility of items. In high utility pattern mining, there are three concepts that you should not confuse: - the internal utility (or purchase quantity: the number of units of an item that were purchased in a transaction) - the external utility (or unit profit: the amount of profit generated by the sale of one
Forum: The Data Mining / Big Data Forum
24 days ago
webmasterphilfv
In high utility itemset mining, there are usually two types of numeric values: purchase quantities and unit profit values. For example, a customer can buy 3 apples, and each apple yields $5 of profit. In your example database, you have a single table. So I assume that each item is associated with some amount of money, for example. So your transaction: T1: A (10) B(20) C(10) D (15) means th
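A small sketch of how the utility of an itemset in such a transaction could be computed under that single-table assumption (item names and values follow the example above):

    import java.util.Map;
    import java.util.Set;

    public class TransactionUtility {
        // With a single table like "T1: A (10) B(20) C(10) D (15)", each item already
        // carries one monetary value, so the utility of an itemset in a transaction
        // is simply the sum of the values of its items (quantity x unit profit would
        // be used instead if the two tables were given separately).
        static double utility(Set<String> itemset, Map<String, Double> transaction) {
            double total = 0;
            for (String item : itemset) {
                Double value = transaction.get(item);
                if (value == null) return 0; // itemset not fully contained in the transaction
                total += value;
            }
            return total;
        }

        public static void main(String[] args) {
            Map<String, Double> t1 = Map.of("A", 10.0, "B", 20.0, "C", 10.0, "D", 15.0);
            System.out.println(utility(Set.of("A", "B"), t1)); // 30.0
        }
    }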
Forum: The Data Mining / Big Data Forum
27 days ago
webmasterphilfv
Hello, You can get it in the SPMF library (link above). It offers EFIM and other good algorithms for utility mining such as UPGrowth, HUI-Miner, FHM, IHUP, USpan, etc. Best regards, Philippe
Forum: The Data Mining / Big Data Forum
27 days ago
webmasterphilfv
Hi Stephen, Glad to know that SPMF is included in a bundle to produce something useful, and glad you like it. Currently, in SPMF, there is no option for exporting results to CSV, so I guess that the best solution would be to write a small program using your favorite programming language to read the output file of SPMF and generate a CSV file from it in the format that you prefer. But we n
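As a starting point, here is a rough sketch of such a conversion program in Java; the file names and the "#SUP:" separator are assumptions, so adapt them to the output of the algorithm you actually ran (for example "#UTIL:" for utility mining):

    import java.io.BufferedReader;
    import java.io.FileReader;
    import java.io.PrintWriter;

    public class SpmfToCsv {
        public static void main(String[] args) throws Exception {
            // Turns lines such as "1 2 3 #SUP: 4" into "1 2 3,4".
            try (BufferedReader in = new BufferedReader(new FileReader("output.txt"));
                 PrintWriter out = new PrintWriter("output.csv")) {
                out.println("itemset,support");
                String line;
                while ((line = in.readLine()) != null) {
                    String[] parts = line.split(" #SUP: ");
                    if (parts.length == 2) {
                        out.println(parts[0].trim() + "," + parts[1].trim());
                    }
                }
            }
        }
    }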
Forum: The Data Mining / Big Data Forum
4 weeks ago
webmasterphilfv
Hello, The dataset above is in the SPMF format, which is used by the SPMF library and is defined as follows. It is a text file where each line represents a sequence from a sequence database. Each item from a sequence is a positive integer, and items from the same itemset within a sequence are separated by a single space. Note that it is assu
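A tiny example of writing a sequence database in this format from Java (the file name and contents are made up; in SPMF's sequence format, -1 marks the end of an itemset and -2 the end of a sequence):

    import java.io.PrintWriter;

    public class WriteSequenceDatabase {
        public static void main(String[] args) throws Exception {
            // Items in the same itemset are separated by spaces,
            // "-1" closes an itemset and "-2" closes the sequence.
            try (PrintWriter out = new PrintWriter("sequences.txt")) {
                out.println("1 2 -1 3 -1 4 -2"); // sequence <{1,2}, {3}, {4}>
                out.println("1 -1 3 4 -1 2 -2"); // sequence <{1}, {3,4}, {2}>
            }
        }
    }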
Forum: The Data Mining / Big Data Forum
5 weeks ago
webmasterphilfv
Hi all, This is to let you know that a new book about high utility pattern mining will be published next year by Springer. I am the main editor of that book, and we are currently looking for chapter proposals. This means that if you would like to write a chapter in the book, you may submit a proposal before the deadline (1st October). More details below: Philippe ========================
Forum: The Data Mining / Big Data Forum
5 weeks ago
webmasterphilfv
Hello all, The DSPR (Data Science and Pattern Recognition - http://dspr.ikelab.net/ ) journal is looking for papers for the next issue, to be published next month. I am one of the editors-in-chief. If you are interested in submitting, you may contact me directly ( philfv8 AT yahoo.com ). The review time is very fast. Also, for those in India, if you need a certificate of publication, we s
Forum: The Data Mining / Big Data Forum
6 weeks ago
webmasterphilfv
Here are a few possible topics: - incremental sequential rule mining - negative sequential rule mining - ... The most important thing is to find a topic that you like and are interested in. Even if I suggest some topics, you should actually find something that you like by reading recent conference/journal papers on data mining.
Forum: The Data Mining / Big Data Forum
6 weeks ago
webmasterphilfv
Hello, It depends on how you set the parameters of CPT. If we want to predict strictly what appears after <A,B,C>, then, yes, we don't have training data to make any predictions. However, in the CPT and CPT+ models, there are some strategies to deal with noise by removing items that prevent us from making a prediction, so the models are more noise tolerant. In the CPT model, the strategy is called
Forum: The Data Mining / Big Data Forum
6 weeks ago
webmasterphilfv
Dear Hiro, Interesting. These results are as I thought they would be for AprioriTID (faster, but using more memory than Apriori). I am happy to know that you are satisfied with the results. Best regards, Philippe
Forum: The Data Mining / Big Data Forum
7 weeks ago
webmasterphilfv
Dear Hiro, Glad that it works well on your data. In general, I would expect AprioriTID to be faster than the regular Apriori. Whether AprioriTID is faster than Apriori depends mostly on whether your data is sparse or dense. Let me explain this with an example. Let's say that we have a pattern X. To calculate the support of X, the Apriori algorithm will scan the database and compare ea
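A toy illustration of the two support-counting strategies being compared (the small database and tid-sets are made up):

    import java.util.HashSet;
    import java.util.List;
    import java.util.Set;

    public class SupportCounting {
        // Apriori-style: scan every transaction and test containment.
        static int supportByScanning(Set<Integer> pattern, List<Set<Integer>> database) {
            int count = 0;
            for (Set<Integer> transaction : database) {
                if (transaction.containsAll(pattern)) count++;
            }
            return count;
        }

        // AprioriTID-style: keep the tid-set of each item in memory and intersect them.
        static int supportByTidsets(Set<Integer> tidsOfA, Set<Integer> tidsOfB) {
            Set<Integer> intersection = new HashSet<>(tidsOfA);
            intersection.retainAll(tidsOfB);
            return intersection.size();
        }

        public static void main(String[] args) {
            List<Set<Integer>> db = List.of(Set.of(1, 2), Set.of(1, 2, 3), Set.of(2, 3));
            System.out.println(supportByScanning(Set.of(1, 2), db));            // 2
            System.out.println(supportByTidsets(Set.of(0, 1), Set.of(0, 1, 2))); // 2
        }
    }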
Forum: The Data Mining / Big Data Forum
7 weeks ago
webmasterphilfv
Dear Hiro, I see. This is great. I have thus implemented the feature for you today, and uploaded a new version of the SPMF software on the website. If you download it again from the download page, you will get the new version. It includes two new algorithms: AprioriRare_TID and AprioriInverse_TID For both of them, I have added the option of showing the transaction identifiers. Note th
Forum: The Data Mining / Big Data Forum
7 weeks ago
webmasterphilfv
I have recently written a blog post on my blog to explain the Apriori algorithm in a simple way. New post: Introduction to the Apriori algorithm (with Java code) You may want to check it out!
Forum: The Data Mining / Big Data Forum
7 weeks ago
webmasterphilfv
Hello, Actually, there are two main types of Apriori-based algorithms: those based on Apriori, which scan the database to calculate the support of patterns, and those based on AprioriTID, which keep the transaction identifiers in memory to avoid scanning the database. CORI is an AprioriTID-based algorithm. Since it is based on AprioriTID, it keeps the transaction IDs of each pattern in memo
Forum: The Data Mining / Big Data Forum

This forum is powered by Phorum and provided by P. Fournier-Viger (© 2012).