The Data Mining Forum
This forum is about data mining, data science and big data: algorithms, source code, datasets, implementations, optimizations, etc. You are welcome to post calls for papers, data mining job ads, links to source code of data mining algorithms, or anything else related to data mining. The forum is hosted by P. Fournier-Viger. No registration is required to use this forum!
HUIM Datasets
Posted by: Waheed
Date: June 08, 2021 04:40AM

The format of the datasets you provide for High Utility Itemset Mining (Connect, Mushroom, Chess, Foodmart, etc.) looks like this:
2 3 4 5:<45>: 5 6 7 8
and, in general:
<item1_id> <item2_id> ... <item_n_id>:<transaction utility>:<item1_utility> <item2_utility> ... <item_n_utility>


First, please correct me if I have misunderstood the datasets. Second, if my understanding is correct, how can we get the internal and external utilities of each item?

Thanks in advance.

Re: HUIM Datasets
Date: June 09, 2021 08:48AM

Hi,

Yes, the format is:

<item1_id> <item2_id> ... <item_n_id>:<transaction utility>:<item1_utility> <item2_utility> ... <item_n_utility>

So, yes, the internal and external utility values are not explicitly represented in these datasets. The file just indicates the utility values, which are the products of the internal and external utility values.
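
As a small illustration of this point, the Python sketch below (made-up quantities and unit profits; `item_utility` and `factor_pairs` are hypothetical helpers, not SPMF functions) computes each stored utility as internal utility × external utility, then lists the integer pairs that could have produced a stored value of 6. Because several pairs fit (for example 2 × 3 and 3 × 2), the file alone does not determine the original internal and external utilities.

```python
def item_utility(internal: int, external: int) -> int:
    """Utility stored in the file: internal utility (e.g. purchased quantity)
    multiplied by external utility (e.g. unit profit)."""
    return internal * external


def factor_pairs(utility: int) -> list[tuple[int, int]]:
    """Every positive (internal, external) integer pair whose product is `utility`.
    More than one pair means the original values cannot be recovered from the file."""
    return [(q, utility // q) for q in range(1, utility + 1) if utility % q == 0]


# Made-up transaction: item id -> (quantity, unit profit).
transaction = {1: (2, 3), 3: (1, 5), 4: (4, 2)}
utilities = {item: item_utility(q, p) for item, (q, p) in transaction.items()}
print(utilities)        # {1: 6, 3: 5, 4: 8}
print(factor_pairs(6))  # [(1, 6), (2, 3), (3, 2), (6, 1)]
```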

If you would like to have the internal and external utilities represented explicitly, you would need to look at the original databases that were used to prepare these datasets. For example, you could find the original Foodmart and Chainstore datasets, which explicitly contain the internal and external utility values. The original Foodmart dataset is a SQL database that was converted to the SPMF format.
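
If you do have such an original database, converting it to the SPMF format amounts to multiplying quantity by unit price for each item and summing per transaction. The Python sketch below assumes a hypothetical flat table of (transaction_id, item_id, quantity, unit_price) rows with integer values; it is not the actual Foodmart schema, nor the script that was used to prepare the SPMF datasets.

```python
from collections import defaultdict


def to_spmf_lines(rows):
    """Convert hypothetical (transaction_id, item_id, quantity, unit_price) rows
    into lines of the form  items:TU:utilities."""
    grouped = defaultdict(dict)
    for tid, item, quantity, unit_price in rows:
        # Accumulate in case the same item appears twice in one transaction.
        grouped[tid][item] = grouped[tid].get(item, 0) + quantity * unit_price
    lines = []
    for tid in sorted(grouped):
        items = sorted(grouped[tid])
        utils = [grouped[tid][i] for i in items]
        lines.append(
            " ".join(map(str, items))
            + ":" + str(sum(utils)) + ":"
            + " ".join(map(str, utils))
        )
    return lines


rows = [
    (1, 3, 2, 4),  # transaction 1: item 3, quantity 2, unit price 4
    (1, 1, 1, 5),
    (2, 2, 3, 1),
    (2, 3, 1, 4),
]
for line in to_spmf_lines(rows):
    print(line)
# 1 3:13:5 8
# 2 3:7:3 4
```

Note that the quantities and prices are discarded once multiplied, which is exactly why they cannot be read back from the resulting file.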

The reason why we chose this format with utility values in SPMF is that it is more general. In fact, HUIM algorithms can be applied to kinds of data other than shopping data where items have utility values, and it is not necessary to have a table of external utility values that are always fixed. Fixed external utilities are an assumption in most HUIM papers, but they are not needed: most HUIM algorithms also work if the external utility values are not static. So, with the SPMF format, any algorithm can work with static or dynamic external utility values. If we instead had a separate table of external utility values that are always static, then the algorithm could only be used with static external utility values! This is why the SPMF format is more general: there is no need to store static external utility values in a separate table.
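
To make the point about dynamic external utilities concrete, here is a minimal Python sketch (hypothetical helpers `parse_line` and `itemset_utility`) that computes the utility of an itemset directly from SPMF-formatted lines. In the made-up data, item 1 has a different unit price in each transaction, yet no external utility table is needed because each line already stores the per-item utilities.

```python
def parse_line(line: str) -> dict[int, int]:
    """Parse 'items:TU:utilities' into {item: utility}."""
    items_part, _tu, utils_part = line.strip().split(":")
    items = [int(x) for x in items_part.split()]
    utils = [int(x) for x in utils_part.split()]
    return dict(zip(items, utils))


def itemset_utility(lines: list[str], itemset: set[int]) -> int:
    """Sum, over transactions containing every item of `itemset`, of the stored
    utilities of those items. No external-utility table is consulted."""
    total = 0
    for line in lines:
        tx = parse_line(line)
        if itemset.issubset(tx):  # iterating a dict yields its keys (item ids)
            total += sum(tx[i] for i in itemset)
    return total


# Item 1 costs 5 in the first transaction and 3 in the second (a price change).
db = [
    "1 2:13:10 3",     # 2 x item 1 at 5, 1 x item 2 at 3
    "1 2 3:14:3 6 5",  # 1 x item 1 at 3, 2 x item 2 at 3, 1 x item 3 at 5
]
print(itemset_utility(db, {1, 2}))  # 10 + 3 + 3 + 6 = 22
```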

Hope this is clear.

Best regards



Edited 1 time(s). Last edit at 06/09/2021 08:50AM by webmasterphilfv.

Re: HUIM Datasets
Posted by: Waheed
Date: June 10, 2021 04:04AM

Thanks for your valuable response.

In the following paper, I found the concept of a cutoff utility, which is the product of minSup and the external utility. The authors also justify the external and internal utility values as follows: "The internal and external utilities of the items are synthetically generated by SPMF toolkit."

I have the following two concerns about this:
1) Where can I find the library that produces the external and internal utilities?
2) If we use randomly generated values for the external and internal utilities, what guarantees that the total transaction utility (TU) in the database will be the same as the TU computed from the external and internal utilities? Since we would be generating the EU and IU synthetically, the resulting TU would not match the TU that is already in the database.


Please clarify this point; it confuses me.

Thanks in advance.


https://link.springer.com/chapter/10.1007/978-3-030-16145-3_15

Re: HUIM Datasets
Date: June 11, 2021 07:52AM

Waheed Wrote:
-------------------------------------------------------
> Thanks for your valuable response.
>
> In the following paper, I found the concept of a
> cutoff utility, which is the product of minSup and
> the external utility. The authors also justify the
> external and internal utility values as follows:
> "The internal and external utilities of the items
> are synthetically generated by SPMF toolkit."
>
> I have the following two concerns about this:
> 1) Where can I find the library that produces the
> external and internal utilities?

The SPMF library is here: http://www.philippe-fournier-viger.com/spmf/

The tool for generating utility values in SPMF generates random utility values and does not explicitly generate external or internal utilities. It simply writes utility values produced by a random number generator directly into the file.

This is the documentation for that tool:
http://www.philippe-fournier-viger.com/spmf/Generating_synthetic_utility_values.php
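
For intuition only, the Python sketch below shows the general idea described above: draw a random utility for each item occurrence and set TU to the sum. It is not the SPMF tool's implementation; the function name `add_random_utilities`, the `max_utility` parameter and the uniform distribution are assumptions made for illustration. Refer to the documentation page above for the tool's actual parameters.

```python
import random


def add_random_utilities(transactions, max_utility=10, seed=None):
    """Hypothetical sketch: attach a random utility in [1, max_utility] to every
    item occurrence and compute TU as the sum of those utilities.
    No separate internal/external utilities are produced."""
    rng = random.Random(seed)  # seeded generator for reproducible output
    lines = []
    for tx in transactions:
        items = tx.split()
        utils = [rng.randint(1, max_utility) for _ in items]
        lines.append(f"{' '.join(items)}:{sum(utils)}:{' '.join(map(str, utils))}")
    return lines


for line in add_random_utilities(["1 3 4", "2 3 5", "1 2 3 5"], seed=42):
    print(line)
```

Because TU is computed from the generated values themselves, it is consistent with them by construction.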

> 2) If we use randomly generated values for the
> external and internal utilities, what guarantees
> that the total transaction utility (TU) in the
> database will be the same as the TU computed from
> the external and internal utilities? Since we would
> be generating the EU and IU synthetically, the
> resulting TU would not match the TU that is
> already in the database.

The tool in SPMF is used to add utility values to a database that does not have any utility values yet. Because there are no utility values, there is no conflict or problem with using this tool to add them. The tool generates utility values, and then, for each transaction, the TU value is simply the sum of the utility values in that transaction.
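
Since TU is defined as the sum of the item utilities, one way to confirm that a file is consistent is to recompute it for every line. Below is a minimal Python sketch; the `check_tu` name is hypothetical, and treating lines that start with `#`, `%` or `@` as comments or metadata is an assumption.

```python
def check_tu(lines):
    """Return (line_number, stated_tu, computed_tu) for every line whose
    transaction utility is not the sum of its item utilities."""
    problems = []
    for number, line in enumerate(lines, start=1):
        line = line.strip()
        if not line or line[0] in "#%@":  # skip blank and (assumed) comment/metadata lines
            continue
        items_part, tu_part, utils_part = line.split(":")
        utils = [int(u) for u in utils_part.split()]
        if len(utils) != len(items_part.split()):
            raise ValueError(f"line {number}: item/utility count mismatch")
        if int(tu_part) != sum(utils):
            problems.append((number, int(tu_part), sum(utils)))
    return problems


print(check_tu(["1 3 4:9:2 3 4", "2 5:10:4 5"]))  # [(2, 10, 9)]
```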

In my papers, I always use the datasets from the SPMF library. However, I am not the main author of the paper that you mention; I am a co-author who collaborated on the project. I am not sure about the details of how the experiments were carried out in that paper, and I do not remember them.

>
>
> Please clarify this point; it confuses me.
>
> Thanks in advance.
>
>
> https://link.springer.com/chapter/10.1007/978-3-030-16145-3_15

Hope that this is clearer.

Re: HUIM Datasets
Posted by: i201606
Date: June 13, 2021 08:08PM

Yes, it is clear. Thank you so much.



Edited 1 time(s). Last edit at 06/13/2021 08:15PM by i201606.



This forum is powered by Phorum and provided by P. Fournier-Viger (© 2012).