The Data Mining Forum                             open-source data mining software data science journal data mining conferences high utility mining book
This forum is about data mining, data science and big data: algorithms, source code, datasets, implementations, optimizations, etc. You are welcome to post call for papers, data mining job ads, link to source code of data mining algorithms or anything else related to data mining. The forum is hosted by P. Fournier-Viger. No registration is required to use this forum!.  
FHM for float utilities
Posted by: Mohammad S Alodadi
Date: September 23, 2017 11:36PM

Dear all,
I am trying to run the FHM algo from source code. However, my dataset contains utilities which are float data types while the algorithm is designed to work with integers only.
I tried to fix the datatypes inside the algorithm, however, it looks I have to fix so many variables and functions in several dependencies to deal with float data types.
is there any solution for dealing with utilities of the data type float.

Thanks

Options: ReplyQuote
Re: FHM for float utilities
Date: September 24, 2017 12:38AM

Hello,

Thanks for the comment. I have fixed the code for you. I have added a new version of FHM called FHM(float), which can process datasets with float values. You can download the code from the SPFM website.

If you are using the command line or user interface, the algorithm is called "FHM(float)".

If you want to look at the code, it is located in the package:
ca.pfv.spmf.algorithms.frequentpatterns.hui_miner_float

To run that algorithm from the code, there is an example called MainTestFHM_float.java

And there is an input file called "DB_UtilityFloat.txt" for testing:

3 5 1 2 4 6:30.2:1.2 3.0 5.0 10.0 6.0 5.0
3 5 2 4:20.2:3.2 3.0 8.0 6.0
3 1 4:8.3:1.1 5.1 2.1
3 5 1 7:27.0:6.0 6.0 10.0 5.0
3 5 2 7:11.0:2.0 3.0 4.0 2.0

And the output will look like this:
6 4 2 1 5 3 #UTIL: 30.2
4 2 #UTIL: 30.0
4 2 5 #UTIL: 36.0
4 2 5 3 #UTIL: 40.4
4 2 3 #UTIL: 34.4
2 5 #UTIL: 31.0
2 5 3 #UTIL: 37.4
1 5 3 #UTIL: 31.2

If you have any other suggestions, you can let me know. This one was actually not difficult to implement. Just took about 30 min to do it because I am familiar with the code ;-)

Best regards and thanks for using SPMF!

Philippe



Edited 4 time(s). Last edit at 09/24/2017 12:45AM by webmasterphilfv.

Options: ReplyQuote
Re: FHM for float utilities
Posted by: Mohammad S. Alodadi
Date: September 24, 2017 07:33AM

Thank you very much.
I will let you know if there is anything not working

Options: ReplyQuote
Re: FHM for float utilities
Posted by: Mohammad S Alodadi
Date: September 24, 2017 08:46PM

Hi,
Thanks again for the time you took to edit the algorithm
Your float edition was working great. except my data has the item utilities and transaction utilities all float. while your code accepts the twu as float and the item as int.
a sample data I have is:
1 2 3 4:0.215:0.385 0.077 0.385 0.077
2 4:0.077:0.077 0.077
1 4:0.215:0.385 0.077
1 2 4 5:0.144:0.385 0.077 0.077 0.077
1 2 3 4 5:0.187:0.385 0.077 0.385 0.077 0.077
2 3 5:0.167:0.077 0.385 0.077

Options: ReplyQuote
Re: FHM for float utilities
Date: September 25, 2017 01:34AM

Hello,

I have tried with the above file and minutil = 0.1 and I get the following results:

1 #UTIL: 1.54
1 2 #UTIL: 1.3859999
1 2 3 #UTIL: 1.694
1 2 3 4 #UTIL: 1.8479998
1 2 3 4 5 #UTIL: 1.0009998
1 2 3 5 #UTIL: 0.9239999
1 2 4 #UTIL: 1.6169999
1 2 4 5 #UTIL: 1.2319999
1 2 5 #UTIL: 1.078
1 3 #UTIL: 1.54
1 3 4 #UTIL: 1.694
1 3 4 5 #UTIL: 0.924
1 3 5 #UTIL: 0.847
1 4 #UTIL: 1.8479999
1 4 5 #UTIL: 1.078
1 5 #UTIL: 0.92399997
2 #UTIL: 0.385
2 3 #UTIL: 1.3859999
2 3 4 #UTIL: 1.078
2 3 4 5 #UTIL: 0.61599994
2 3 5 #UTIL: 1.078
2 4 #UTIL: 0.616
2 4 5 #UTIL: 0.462
2 5 #UTIL: 0.462
3 #UTIL: 1.155
3 4 #UTIL: 0.92399997
3 4 5 #UTIL: 0.539
3 5 #UTIL: 0.92399997
4 #UTIL: 0.385
4 5 #UTIL: 0.308
5 #UTIL: 0.231

What is not working?

Options: ReplyQuote
Re: FHM for float utilities
Posted by: Mohammad S Alodadi
Date: October 11, 2017 01:31PM

I got it
thank you so much for clarification

I have a question though
I have a data in which the items are strings:



C1507106 C0175730 C2607990 C0340030 C3204189 C2039751 C0745275 C0041281 C0441587 C2036645 C0336630 C0883304 C0085678 C0276477 C0023928:4.1964126516:0.0555555555556 0.0555555555556 0.0555555555556 0.111111111111 0.166666666667 0.0555555555556 0.0555555555556 0.0555555555556 0.0555555555556 0.0555555555556 0.0555555555556 0.0555555555556 0.0555555555556 0.0555555555556 0.0555555555556
C0752151 C0032326 C3842377 C2607990 C0267998 C0008034 C0236663 C0881863 C0030899 C0439688 C0179740 C0929214 C0585105:3.27592628616:0.111111111111 0.0555555555556 0.0555555555556 0.0555555555556 0.0555555555556 0.222222222222 0.0555555555556 0.0555555555556 0.111111111111 0.0555555555556 0.0555555555556 0.0555555555556 0.0555555555556
C0700148 C0600500 C0747635 C0004144 C2607990 C0032285 C1261077 C3828490 C0015967 C0020649 C0189859:2.66968471729:0.0769230769231 0.153846153846 0.0769230769231 0.0769230769231 0.0769230769231 0.0769230769231 0.0769230769231 0.0769230769231 0.153846153846 0.0769230769231 0.0769230769231
C0332620 C0039985 C0600500 C0004144 C2607990 C2071404 C0013687 C2072938 C0032285 C0702116 C0881863 C0344329 C0034065 C0031238 C2129412 C0239015 C0024117 C0746221 C1299393:3.82331490829:0.0869565217391 0.0434782608696 0.0434782608696 0.0434782608696 0.0434782608696 0.0434782608696 0.0434782608696 0.0434782608696 0.0434782608696 0.0869565217391 0.0434782608696 0.0869565217391 0.0434782608696 0.0869565217391 0.0434782608696 0.0434782608696 0.0434782608696 0.0434782608696 0.0434782608696
C0343401 C0919865 C2607990 C0243026 C0881863 C0063258 C0459830 C2109206:3.66158001107:0.111111111111 0.111111111111 0.111111111111 0.111111111111 0.111111111111 0.222222222222 0.111111111111 0.111111111111
C2698624 C0743066 C0042029 C3842377 C2607990 C2167162 C0746533 C0015967 C1860224 C0034186 C0332448 C1718474 C0623362:4.23550346462:0.0666666666667 0.0666666666667 0.0666666666667 0.0666666666667 0.0666666666667 0.0666666666667 0.0666666666667 0.133333333333 0.0666666666667 0.0666666666667 0.133333333333 0.0666666666667 0.0666666666667
C0010055 C0032326 C2607990 C0407260 C0647288 C0577703 C0179790 C2022149 C1282959 C1856942:3.50067360382:0.0714285714286 0.0714285714286 0.0714285714286 0.0714285714286 0.0714285714286 0.0714285714286 0.357142857143 0.0714285714286 0.0714285714286 0.0714285714286
C0476273 C0013687 C1522614 C0004144 C2607990 C0546330 C0008034 C0742344 C0546334 C0881863 C0746053 C0009274 C0179790 C0034063 C0940007 C0702116 C0746221:3.51733230266:0.0454545454545 0.0454545454545 0.0454545454545 0.0454545454545 0.0454545454545 0.0909090909091 0.0454545454545 0.136363636364 0.0454545454545 0.0454545454545 0.0454545454545 0.0454545454545 0.0454545454545 0.0909090909091 0.0454545454545 0.0909090909091 0.0454545454545
C2698624 C0700148 C0175730 C2607990 C0444301 C3828490 C0020542 C0007438 C1290339 C0179790 C0018802 C2939313:3.58853979017:0.125 0.0625 0.125 0.0625 0.0625 0.0625 0.125 0.0625 0.0625 0.0625 0.125 0.0625
C0032227 C0032326 C0004144 C0929194 C2607990 C0015252 C0746175 C0008034 C0240860 C0702116 C0189661 C0746053 C2073710 C1857790 C0185792 C2071407 C0856747 C0747635 C0457200:3.54228818431:0.047619047619 0.047619047619 0.142857142857 0.047619047619 0.047619047619 0.047619047619 0.047619047619 0.047619047619 0.047619047619 0.047619047619 0.047619047619 0.047619047619 0.047619047619 0.047619047619 0.047619047619 0.047619047619 0.047619047619 0.047619047619 0.047619047619


I tried to fix the code to accept the items as string instead of Integer.
I had some trouble when it comes for comparing items for sort when the code was assuming int not strings.
I had to chage the code of the compareItems to become :

private int compareItems(String item1, String item2) {
		//int compare = (int)( mapItemToTWU.get(item1) - mapItemToTWU.get(item2));
		// if the same, use the lexical order otherwise use the TWU
		//return (compare == 0)? item1 - item2 :  compare;
		if (item1 == null) {
			return -1;
		}
		if (item2 == null) {
			return 1;
		}
		if (item1.equals( item2 )) {
			return 0;
		}
		return item1.compareTo(item2);
	}

considering that I change all instances of items to string, I am not sure if this code is correct but it works for me and it creates results.

the problem is the results are unrealistically huge (High-utility itemsets count: 1235696)
I feel there is something wrong with my modifications that cause this drastic results.

can you please help

Options: ReplyQuote
Re: FHM for float utilities
Date: October 11, 2017 09:29PM

Hello,

About this code:

private int compareItems(String item1, String item2) {
// ...
	}

T there is a reason for using:

int compare = (int)( mapItemToTWU.get(item1) - mapItemToTWU.get(item2));

It is that if we sort the items by ascending order according of TWU, the algorithm will be faster than if you just sort by alphabetical order.

So in your function, your are sorting by alphabetical order (what we call lexicographical order, actually). This should be slower than if you use the ascending order of TWU. So it would be better if your function would look like this:

private int compareItems(String item1, String item2) {
		int compare = (int)( mapItemToTWU.get(item1) - mapItemToTWU.get(item2));
		// if the same, use the lexical order otherwise use the TWU
		return (compare == 0)? item1.compareTo(item2);
	}


Note that i did not compile the code. But this should be the main idea.

About the number of patterns it depends on your data and how you set the minutil threshold. If you set the threshold higher you will get less patterns. And if you set it very low, you could get billions of patterns. It really depends on your data and how you set the parameter.

Best regards,

Options: ReplyQuote


Your Name: 
Your Email: 
Subject: 
Spam prevention:
Please, enter the code that you see below in the input field. This is for blocking bots that try to post this form automatically.
 ********  **    **  **    **  **    **  **     ** 
 **    **  **   **   ***   **  ***   **  **     ** 
     **    **  **    ****  **  ****  **  **     ** 
    **     *****     ** ** **  ** ** **  ********* 
   **      **  **    **  ****  **  ****  **     ** 
   **      **   **   **   ***  **   ***  **     ** 
   **      **    **  **    **  **    **  **     ** 
This forum is powered by Phorum and provided by P. Fournier-Viger (© 2012).
Terms of use.