The Data Mining Forum                             open-source data mining software data science journal data mining conferences high utility mining book
This forum is about data mining, data science and big data: algorithms, source code, datasets, implementations, optimizations, etc. You are welcome to post call for papers, data mining job ads, link to source code of data mining algorithms or anything else related to data mining. The forum is hosted by P. Fournier-Viger. No registration is required to use this forum!.  
FHM for float utilities
Posted by: Mohammad S Alodadi
Date: September 23, 2017 11:36PM

Dear all,
I am trying to run the FHM algo from source code. However, my dataset contains utilities which are float data types while the algorithm is designed to work with integers only.
I tried to fix the datatypes inside the algorithm, however, it looks I have to fix so many variables and functions in several dependencies to deal with float data types.
is there any solution for dealing with utilities of the data type float.

Thanks

Options: ReplyQuote
Re: FHM for float utilities
Date: September 24, 2017 12:38AM

Hello,

Thanks for the comment. I have fixed the code for you. I have added a new version of FHM called FHM(float), which can process datasets with float values. You can download the code from the SPFM website.

If you are using the command line or user interface, the algorithm is called "FHM(float)".

If you want to look at the code, it is located in the package:
ca.pfv.spmf.algorithms.frequentpatterns.hui_miner_float

To run that algorithm from the code, there is an example called MainTestFHM_float.java

And there is an input file called "DB_UtilityFloat.txt" for testing:

3 5 1 2 4 6:30.2:1.2 3.0 5.0 10.0 6.0 5.0
3 5 2 4:20.2:3.2 3.0 8.0 6.0
3 1 4:8.3:1.1 5.1 2.1
3 5 1 7:27.0:6.0 6.0 10.0 5.0
3 5 2 7:11.0:2.0 3.0 4.0 2.0

And the output will look like this:
6 4 2 1 5 3 #UTIL: 30.2
4 2 #UTIL: 30.0
4 2 5 #UTIL: 36.0
4 2 5 3 #UTIL: 40.4
4 2 3 #UTIL: 34.4
2 5 #UTIL: 31.0
2 5 3 #UTIL: 37.4
1 5 3 #UTIL: 31.2

If you have any other suggestions, you can let me know. This one was actually not difficult to implement. Just took about 30 min to do it because I am familiar with the code ;-)

Best regards and thanks for using SPMF!

Philippe



Edited 4 time(s). Last edit at 09/24/2017 12:45AM by webmasterphilfv.

Options: ReplyQuote
Re: FHM for float utilities
Posted by: Mohammad S. Alodadi
Date: September 24, 2017 07:33AM

Thank you very much.
I will let you know if there is anything not working

Options: ReplyQuote
Re: FHM for float utilities
Posted by: Mohammad S Alodadi
Date: September 24, 2017 08:46PM

Hi,
Thanks again for the time you took to edit the algorithm
Your float edition was working great. except my data has the item utilities and transaction utilities all float. while your code accepts the twu as float and the item as int.
a sample data I have is:
1 2 3 4:0.215:0.385 0.077 0.385 0.077
2 4:0.077:0.077 0.077
1 4:0.215:0.385 0.077
1 2 4 5:0.144:0.385 0.077 0.077 0.077
1 2 3 4 5:0.187:0.385 0.077 0.385 0.077 0.077
2 3 5:0.167:0.077 0.385 0.077

Options: ReplyQuote
Re: FHM for float utilities
Date: September 25, 2017 01:34AM

Hello,

I have tried with the above file and minutil = 0.1 and I get the following results:

1 #UTIL: 1.54
1 2 #UTIL: 1.3859999
1 2 3 #UTIL: 1.694
1 2 3 4 #UTIL: 1.8479998
1 2 3 4 5 #UTIL: 1.0009998
1 2 3 5 #UTIL: 0.9239999
1 2 4 #UTIL: 1.6169999
1 2 4 5 #UTIL: 1.2319999
1 2 5 #UTIL: 1.078
1 3 #UTIL: 1.54
1 3 4 #UTIL: 1.694
1 3 4 5 #UTIL: 0.924
1 3 5 #UTIL: 0.847
1 4 #UTIL: 1.8479999
1 4 5 #UTIL: 1.078
1 5 #UTIL: 0.92399997
2 #UTIL: 0.385
2 3 #UTIL: 1.3859999
2 3 4 #UTIL: 1.078
2 3 4 5 #UTIL: 0.61599994
2 3 5 #UTIL: 1.078
2 4 #UTIL: 0.616
2 4 5 #UTIL: 0.462
2 5 #UTIL: 0.462
3 #UTIL: 1.155
3 4 #UTIL: 0.92399997
3 4 5 #UTIL: 0.539
3 5 #UTIL: 0.92399997
4 #UTIL: 0.385
4 5 #UTIL: 0.308
5 #UTIL: 0.231

What is not working?

Options: ReplyQuote
Re: FHM for float utilities
Posted by: Mohammad S Alodadi
Date: October 11, 2017 01:31PM

I got it
thank you so much for clarification

I have a question though
I have a data in which the items are strings:



C1507106 C0175730 C2607990 C0340030 C3204189 C2039751 C0745275 C0041281 C0441587 C2036645 C0336630 C0883304 C0085678 C0276477 C0023928:4.1964126516:0.0555555555556 0.0555555555556 0.0555555555556 0.111111111111 0.166666666667 0.0555555555556 0.0555555555556 0.0555555555556 0.0555555555556 0.0555555555556 0.0555555555556 0.0555555555556 0.0555555555556 0.0555555555556 0.0555555555556
C0752151 C0032326 C3842377 C2607990 C0267998 C0008034 C0236663 C0881863 C0030899 C0439688 C0179740 C0929214 C0585105:3.27592628616:0.111111111111 0.0555555555556 0.0555555555556 0.0555555555556 0.0555555555556 0.222222222222 0.0555555555556 0.0555555555556 0.111111111111 0.0555555555556 0.0555555555556 0.0555555555556 0.0555555555556
C0700148 C0600500 C0747635 C0004144 C2607990 C0032285 C1261077 C3828490 C0015967 C0020649 C0189859:2.66968471729:0.0769230769231 0.153846153846 0.0769230769231 0.0769230769231 0.0769230769231 0.0769230769231 0.0769230769231 0.0769230769231 0.153846153846 0.0769230769231 0.0769230769231
C0332620 C0039985 C0600500 C0004144 C2607990 C2071404 C0013687 C2072938 C0032285 C0702116 C0881863 C0344329 C0034065 C0031238 C2129412 C0239015 C0024117 C0746221 C1299393:3.82331490829:0.0869565217391 0.0434782608696 0.0434782608696 0.0434782608696 0.0434782608696 0.0434782608696 0.0434782608696 0.0434782608696 0.0434782608696 0.0869565217391 0.0434782608696 0.0869565217391 0.0434782608696 0.0869565217391 0.0434782608696 0.0434782608696 0.0434782608696 0.0434782608696 0.0434782608696
C0343401 C0919865 C2607990 C0243026 C0881863 C0063258 C0459830 C2109206:3.66158001107:0.111111111111 0.111111111111 0.111111111111 0.111111111111 0.111111111111 0.222222222222 0.111111111111 0.111111111111
C2698624 C0743066 C0042029 C3842377 C2607990 C2167162 C0746533 C0015967 C1860224 C0034186 C0332448 C1718474 C0623362:4.23550346462:0.0666666666667 0.0666666666667 0.0666666666667 0.0666666666667 0.0666666666667 0.0666666666667 0.0666666666667 0.133333333333 0.0666666666667 0.0666666666667 0.133333333333 0.0666666666667 0.0666666666667
C0010055 C0032326 C2607990 C0407260 C0647288 C0577703 C0179790 C2022149 C1282959 C1856942:3.50067360382:0.0714285714286 0.0714285714286 0.0714285714286 0.0714285714286 0.0714285714286 0.0714285714286 0.357142857143 0.0714285714286 0.0714285714286 0.0714285714286
C0476273 C0013687 C1522614 C0004144 C2607990 C0546330 C0008034 C0742344 C0546334 C0881863 C0746053 C0009274 C0179790 C0034063 C0940007 C0702116 C0746221:3.51733230266:0.0454545454545 0.0454545454545 0.0454545454545 0.0454545454545 0.0454545454545 0.0909090909091 0.0454545454545 0.136363636364 0.0454545454545 0.0454545454545 0.0454545454545 0.0454545454545 0.0454545454545 0.0909090909091 0.0454545454545 0.0909090909091 0.0454545454545
C2698624 C0700148 C0175730 C2607990 C0444301 C3828490 C0020542 C0007438 C1290339 C0179790 C0018802 C2939313:3.58853979017:0.125 0.0625 0.125 0.0625 0.0625 0.0625 0.125 0.0625 0.0625 0.0625 0.125 0.0625
C0032227 C0032326 C0004144 C0929194 C2607990 C0015252 C0746175 C0008034 C0240860 C0702116 C0189661 C0746053 C2073710 C1857790 C0185792 C2071407 C0856747 C0747635 C0457200:3.54228818431:0.047619047619 0.047619047619 0.142857142857 0.047619047619 0.047619047619 0.047619047619 0.047619047619 0.047619047619 0.047619047619 0.047619047619 0.047619047619 0.047619047619 0.047619047619 0.047619047619 0.047619047619 0.047619047619 0.047619047619 0.047619047619 0.047619047619


I tried to fix the code to accept the items as string instead of Integer.
I had some trouble when it comes for comparing items for sort when the code was assuming int not strings.
I had to chage the code of the compareItems to become :

private int compareItems(String item1, String item2) {
		//int compare = (int)( mapItemToTWU.get(item1) - mapItemToTWU.get(item2));
		// if the same, use the lexical order otherwise use the TWU
		//return (compare == 0)? item1 - item2 :  compare;
		if (item1 == null) {
			return -1;
		}
		if (item2 == null) {
			return 1;
		}
		if (item1.equals( item2 )) {
			return 0;
		}
		return item1.compareTo(item2);
	}

considering that I change all instances of items to string, I am not sure if this code is correct but it works for me and it creates results.

the problem is the results are unrealistically huge (High-utility itemsets count: 1235696)
I feel there is something wrong with my modifications that cause this drastic results.

can you please help

Options: ReplyQuote
Re: FHM for float utilities
Date: October 11, 2017 09:29PM

Hello,

About this code:

private int compareItems(String item1, String item2) {
// ...
	}

T there is a reason for using:

int compare = (int)( mapItemToTWU.get(item1) - mapItemToTWU.get(item2));

It is that if we sort the items by ascending order according of TWU, the algorithm will be faster than if you just sort by alphabetical order.

So in your function, your are sorting by alphabetical order (what we call lexicographical order, actually). This should be slower than if you use the ascending order of TWU. So it would be better if your function would look like this:

private int compareItems(String item1, String item2) {
		int compare = (int)( mapItemToTWU.get(item1) - mapItemToTWU.get(item2));
		// if the same, use the lexical order otherwise use the TWU
		return (compare == 0)? item1.compareTo(item2);
	}


Note that i did not compile the code. But this should be the main idea.

About the number of patterns it depends on your data and how you set the minutil threshold. If you set the threshold higher you will get less patterns. And if you set it very low, you could get billions of patterns. It really depends on your data and how you set the parameter.

Best regards,

Options: ReplyQuote
Re: FHM for float utilities
Posted by: Mohammad S Alodadi
Date: October 20, 2017 01:30PM

webmasterphilfv Wrote:
-------------------------------------------------------
> Hello,
>
> I have tried with the above file and minutil = 0.1
> and I get the following results:
>
> 1 #UTIL: 1.54
> 1 2 #UTIL: 1.3859999
> 1 2 3 #UTIL: 1.694
> 1 2 3 4 #UTIL: 1.8479998
> 1 2 3 4 5 #UTIL: 1.0009998
> 1 2 3 5 #UTIL: 0.9239999
> 1 2 4 #UTIL: 1.6169999
> 1 2 4 5 #UTIL: 1.2319999
> 1 2 5 #UTIL: 1.078
> 1 3 #UTIL: 1.54
> 1 3 4 #UTIL: 1.694
> 1 3 4 5 #UTIL: 0.924
> 1 3 5 #UTIL: 0.847
> 1 4 #UTIL: 1.8479999
> 1 4 5 #UTIL: 1.078
> 1 5 #UTIL: 0.92399997
> 2 #UTIL: 0.385
> 2 3 #UTIL: 1.3859999
> 2 3 4 #UTIL: 1.078
> 2 3 4 5 #UTIL: 0.61599994
> 2 3 5 #UTIL: 1.078
> 2 4 #UTIL: 0.616
> 2 4 5 #UTIL: 0.462
> 2 5 #UTIL: 0.462
> 3 #UTIL: 1.155
> 3 4 #UTIL: 0.92399997
> 3 4 5 #UTIL: 0.539
> 3 5 #UTIL: 0.92399997
> 4 #UTIL: 0.385
> 4 5 #UTIL: 0.308
> 5 #UTIL: 0.231
>
> What is not working?

as can be seen in your results some frequent items have utility > 1.0.
However, if I set my min_utiility =1.0 or 1.1 it does not generate any itemsets or candidates.
can you help?

Options: ReplyQuote
Re: FHM for float utilities
Date: October 20, 2017 11:25PM

Yes, the reason is that the transaction utility value in your example file is wrong.

The file that you gave to me is:

1 2 3 4:0.215:0.385 0.077 0.385 0.077
2 4:0.077:0.077 0.077
1 4:0.215:0.385 0.077
1 2 4 5:0.144:0.385 0.077 0.077 0.077
1 2 3 4 5:0.187:0.385 0.077 0.385 0.077 0.077
2 3 5:0.167:0.077 0.385 0.077

Consider the first transaction:

1 2 3 4:0.215:0.385 0.077 0.385 0.077

The following value (in bold) should be the sum of the utility values.

1 2 3 4:0.215:0.385 0.077 0.385 0.077

That means, that you should replace

0.215 by the sum 0.385 + 0.077 + 0.385 + 0.077

Thus, this transaction should be correctly written as:

1 2 3 4:0.924:0.385 0.077 0.385 0.077

The other transactions should be fixed in the same way.

If you fix all the transactions like that, then it should work. The reason is that the value in bold is used by the algorithm to reduce the search space. If the value is incorrect and too small, then the algorithm can show no result.

Options: ReplyQuote
Re: FHM for float utilities
Posted by: Mohammad S. Alodadi
Date: October 21, 2017 02:31AM

Thanks for the response.
Please correct me if I am wrong. But i thought the FHM paper said the transaction utility is the some of products of items profit and quantity which means p(i1)*q(i1)+p(i2)*q(i2)......
In your answer you didn't mention the profit of items.
Thanks for your response in advance

Options: ReplyQuote
Re: FHM for float utilities
Posted by: Mohammad S Alodadi
Date: October 22, 2017 01:50AM

I read the paper and I was wrong about my previous reply.
Thanks for pointing the answer in your post.

Options: ReplyQuote
Re: FHM for float utilities
Date: October 22, 2017 02:33AM

You are welcome.

Best regards,

Philippe

Options: ReplyQuote
Re: FHM for float utilities
Posted by: Mohammad S Alodadi
Date: November 04, 2017 01:08AM

I have a question regarding FHM results.
is it possible to have an itemset that one of its items' utility is not >=minutil.
the reason I asked is that I have these high utility frequent itemsets:
1 2 3 4 #UTIL: 228.80042749643326
1 2 3 4 5 #UTIL: 228.80735443532467

while the second itemset has very small utility difference than the utility of the first itemset. which means that the item number 5 has no big effect. especially, I looked in the entire result set and I could not find item 5 alone along with its utility which means that item 5 has utility <minuntility.

is there an explanation or something that I am missing here.

thanks

Options: ReplyQuote
Re: FHM for float utilities
Date: November 04, 2017 08:41AM

Hello,

Yes, it is possible that the itemset {5} is not a high utility itemsets but that the itemset {1,2,3,4,5} is a high utility itemset.

Actually, in general, if you have two itemsets A and B such that A is a subset of B, the utility of A can be lower than, equal or greater than the utility of B. In other words, if A = {5} and B = {1,2,3,4,5}, the utility of B can be greater, lower or equal to the utility of B.

But why? This is a good question. And the answer is a little bit counter-intuitive. The reason is the following:

Normally, if you add more items to an itemset, you could expect that it will cost more money because there is more items. For example, you could expect that {1,2,3,4,5} may yield more profit (utility) because it contains more item than {1,2,3,4}. This is true in some cases.

But, if we add more items to an itemset, the frequency of that itemset can be the same or lower. For example, if you have an itemset {1,2,3,4}, it is possible that the itemset {1,2,3,4,5} appears less often than {1,2,3,4}. Thus, this could decrease the profit yield by {1,2,3,4,5} compared to {1,2,3,4}.

Thus, to summarize, if we add more items, the profit may increase because we add more items, but it may decrease because the larger itemset may be less frequent. Thus, in general, the utility of an itemset A that is a subset of an itemset B can be greater, lower or equal to the utility of B.

In your example, the reason may be that item 5 has a very low utility, and adding it increase the utility, but it also decrease the frequency. These two factors work against each other and as result the utility of 1,2,3,4 is almost the same as 1,2,3,4,5 in that case. And because the item 5 has a low utility, it is not a high utility itemset when it is sold just by itself.



Edited 2 time(s). Last edit at 11/04/2017 08:43AM by webmasterphilfv.

Options: ReplyQuote
Re: FHM for float utilities
Posted by: Mohammad S Alodadi
Date: November 05, 2017 12:00AM

Thanks for the clear explanation.
Quote
webmasterphilfv
Quote
Mohammad S. Alodadi
is it possible to have an itemset that one of its items' utility is not >=minutil.
the reason I asked is that I have these high utility frequent itemsets:
1 2 3 4 #UTIL: 228.80042749643326
1 2 3 4 5 #UTIL: 228.80735443532467
Yes, it is possible that the itemset {5} is not a high utility itemsets but that the itemset {1,2,3,4,5} is a high utility itemset.
please, correct me if I am wrong.
property 3 in the paper stated that:
Property 3 (pruning). Let X be an itemset. If TW U(X) < minutil, then the
itemset X is a low-utility itemset as well as all its supersets. Proof. This directly
follows from Property 1 and Property 2.

which I am still confused why that the itemset {5} which is < minutil have its superset {1,2,3,4,5} >= minutil.

Thanks for your fast responds

Options: ReplyQuote
Re: FHM for float utilities
Date: November 05, 2017 05:17AM

Hello,

I see. You are confusing the TWU measure and the utility measure.

The property 4 is for the TWU measure. It is not for the utility measure.

Thus, the fact that itemset {5} has a utility that is lower than minutil is not sufficient to apply Property 4.

If you want to apply Property 4, the TWU of {5} must be lower than minutil (it is not the utility that must be lower than minutil).

Actually, the TWU is an upper-bound on the utility. This means, that for any itemset X, we have that TWU(X) >= Utility(X). Thus, even if the itemset {5} has a utility lower than minutil, as in your example, it can still have a TWU that is higher than minutil.

Thus, this explains why property 4 is not applied in your example. utility({5}) < minutil but TWU({5}) is greater or equal to minutil. In that case property 4 cannot be applied, and the supersets of {5} can be high utility itemsets even if {5} is not a high utility itemset.


Hope that this is clearer.



Edited 2 time(s). Last edit at 11/05/2017 05:18AM by webmasterphilfv.

Options: ReplyQuote


Your Name: 
Your Email: 
Subject: 
Spam prevention:
Please, enter the code that you see below in the input field. This is for blocking bots that try to post this form automatically.
 **     **   *******   ********  **     **  **    ** 
 **     **  **     **     **      **   **   **   **  
 **     **  **     **     **       ** **    **  **   
 **     **   ********     **        ***     *****    
 **     **         **     **       ** **    **  **   
 **     **  **     **     **      **   **   **   **  
  *******    *******      **     **     **  **    ** 
This forum is powered by Phorum and provided by P. Fournier-Viger (© 2012).
Terms of use.