The Data Mining Forum
This forum is about data mining, data science and big data: algorithms, source code, datasets, implementations, optimizations, etc. You are welcome to post calls for papers, data mining job ads, links to source code of data mining algorithms or anything else related to data mining. The forum is hosted by P. Fournier-Viger. No registration is required to use this forum!
sort
Posted by: setya
Date: July 04, 2012 02:03AM

I would like to ask how to sort the results found by the FP-Growth algorithm by confidence, from the greatest to the least, in spmfGUIv079.

I really need help.

Regards

Re: sort
Date: July 04, 2012 04:22AM

Hi Setya,

Welcome to the forum.

I assume that you are using the MainTestAllAssociationRules_FPGrowth_version test file.

To sort the result, you could add this code to the class RulesAgrawal.java in the package ca.pfv.spmf.associationrules.agrawal_FPGrowth_version:

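// Requires: import java.util.Collections; import java.util.Comparator;
public void sortByConfidence(){
		Collections.sort(rules, new Comparator<RuleAgrawal>() {
			public int compare(RuleAgrawal r1, RuleAgrawal r2) {
				// Descending order: the rule with the higher confidence comes first.
				return Double.compare(r2.getConfidence(), r1.getConfidence());
			}
		});
	}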

Then, you can call this method in the test file to sort the result by confidence:

....
		// STEP 2: Generating all rules from the set of frequent itemsets (based on Agrawal & Srikant, 94)
		double  minconf = 0.60;
		AlgoAgrawalFaster94_FPGrowth_version algoAgrawal = new AlgoAgrawalFaster94_FPGrowth_version(minconf);
		RulesAgrawal rules = algoAgrawal.runAlgorithm(patterns);
		
		rules.sortByConfidence();  
		
		rules.printRules(database.size());
...
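
If you want to try the idea outside of SPMF first, here is a minimal self-contained sketch. RuleSketch and SortByConfidenceSketch are hypothetical names made up for this example (they are not SPMF classes); RuleSketch only stands in for RuleAgrawal. The comparison uses Double.compare with the arguments swapped to get a descending order.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

// Hypothetical stand-in for RuleAgrawal: only the confidence matters here.
class RuleSketch {
    final String text;
    final double confidence;

    RuleSketch(String text, double confidence) {
        this.text = text;
        this.confidence = confidence;
    }

    double getConfidence() {
        return confidence;
    }
}

class SortByConfidenceSketch {
    // Returns a sorted copy of the rules, greatest confidence first.
    static List<RuleSketch> sortedByConfidence(List<RuleSketch> rules) {
        List<RuleSketch> copy = new ArrayList<>(rules);
        Collections.sort(copy, new Comparator<RuleSketch>() {
            @Override
            public int compare(RuleSketch r1, RuleSketch r2) {
                // Swapping r1 and r2 gives a descending order.
                return Double.compare(r2.getConfidence(), r1.getConfidence());
            }
        });
        return copy;
    }

    public static void main(String[] args) {
        List<RuleSketch> rules = new ArrayList<>();
        rules.add(new RuleSketch("1 ==> 2", 0.60));
        rules.add(new RuleSketch("2 ==> 3", 1.00));
        rules.add(new RuleSketch("1 ==> 3", 0.75));
        for (RuleSketch r : sortedByConfidence(rules)) {
            System.out.println(r.text + " conf: " + r.getConfidence());
        }
    }
}
```

Double.compare is used instead of subtracting the two confidences and casting to int, because a cast can truncate a very small difference to zero.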



Hope this helps,

Philippe



Edited 1 time(s). Last edit at 07/04/2012 04:23AM by webmasterphilfv.

Re: sort
Posted by: setya
Date: July 11, 2012 01:59AM

Thanks for the guide.

But I use the class AlgoAgrawalFaster94_FPGrowth_version_saveToFile, contained in the package ca.pfv.spmf.associationrules.agrawal_FPGrowth_version_saveToFile.

I run MainWindow, and this is where it goes wrong.

The tutorial that you provided cannot be applied here.

Once again, I ask for directions.

Thank you.

Regards

Re: sort
Date: July 11, 2012 07:52PM

Hi,

The solution that I suggested was for the version of the algorithm that keeps the result in memory. With this version, my solution was to sort the rules in memory. This is the easiest way to solve the problem, and the best way if the rules found can fit into memory.

But I understand that you are using the version that saves to a file. Modifying this version to sort by confidence would be more complicated. The problem is that the file itself would need to be sorted, instead of sorting the rules in memory. There are two ways that this could be done.

The first way is to write the results to the file as is done now, and then sort the file. But if you assume that the file cannot fit into memory, you would need to use an external sorting algorithm (an algorithm specifically designed for sorting a large file on the hard drive). For example, one such algorithm is described in section 8.5 of Shaffer's book: http://people.cs.vt.edu/~shaffer/Book/JAVA3e20120605.pdf
But it would be a complicated solution.
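
To give a rough idea of what sorting the file afterwards could look like, here is a minimal sketch of a basic external merge sort: sort small chunks in memory, write each chunk to a temporary file, then merge the sorted chunks. It is not necessarily the exact variant described in the book. ExternalSortSketch and its methods are hypothetical names, and the sketch assumes that every line of the result file ends with "#CONF: " followed by the confidence. Please check the actual format written by your version of SPMF and adapt confidenceOf() accordingly.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

class ExternalSortSketch {
    // ASSUMPTION: each line ends with "#CONF: <number>". Adapt to the real file format.
    static double confidenceOf(String line) {
        int i = line.lastIndexOf("#CONF:");
        return Double.parseDouble(line.substring(i + "#CONF:".length()).trim());
    }

    // Orders lines by confidence, greatest first.
    static final Comparator<String> BY_CONF_DESC =
            (a, b) -> Double.compare(confidenceOf(b), confidenceOf(a));

    // The current head line of one sorted run, with the reader it came from.
    static class Head {
        final String line;
        final BufferedReader reader;
        Head(String line, BufferedReader reader) { this.line = line; this.reader = reader; }
    }

    // Sorts 'input' into 'output', holding at most maxLines lines in memory at once.
    static void sortFile(Path input, Path output, int maxLines) throws IOException {
        merge(writeSortedRuns(input, maxLines), output);
    }

    // Phase 1: cut the input into chunks, sort each chunk, save it as a temporary run file.
    static List<Path> writeSortedRuns(Path input, int maxLines) throws IOException {
        List<Path> runs = new ArrayList<>();
        List<String> chunk = new ArrayList<>();
        try (BufferedReader in = Files.newBufferedReader(input, StandardCharsets.UTF_8)) {
            String line;
            while ((line = in.readLine()) != null) {
                chunk.add(line);
                if (chunk.size() == maxLines) {
                    runs.add(writeRun(chunk));
                    chunk.clear();
                }
            }
        }
        if (!chunk.isEmpty()) {
            runs.add(writeRun(chunk));
        }
        return runs;
    }

    static Path writeRun(List<String> chunk) throws IOException {
        Collections.sort(chunk, BY_CONF_DESC);
        Path run = Files.createTempFile("run", ".txt");
        Files.write(run, chunk, StandardCharsets.UTF_8);
        return run;
    }

    // Phase 2: merge all runs, always writing the head line with the greatest confidence.
    static void merge(List<Path> runs, Path output) throws IOException {
        PriorityQueue<Head> heap =
                new PriorityQueue<>((x, y) -> BY_CONF_DESC.compare(x.line, y.line));
        List<BufferedReader> readers = new ArrayList<>();
        try (BufferedWriter out = Files.newBufferedWriter(output, StandardCharsets.UTF_8)) {
            for (Path run : runs) {
                BufferedReader r = Files.newBufferedReader(run, StandardCharsets.UTF_8);
                readers.add(r);
                String first = r.readLine();
                if (first != null) heap.add(new Head(first, r));
            }
            while (!heap.isEmpty()) {
                Head h = heap.poll();
                out.write(h.line);
                out.newLine();
                String next = h.reader.readLine();
                if (next != null) heap.add(new Head(next, h.reader));
            }
        } finally {
            for (BufferedReader r : readers) r.close();
            for (Path run : runs) Files.deleteIfExists(run);
        }
    }
}
```

The maxLines parameter controls how many rules are held in memory at once. It is only a stand-in for a real memory budget, and the sketch opens all run files at the same time during the merge, so a very large number of runs would need several merge passes.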

The second approach is that every time a rule is written to the file, the program would scan the file to insert the rule at the right position. The problem with this approach is that it would be time-consuming to scan the file for each insertion, especially if we assume that the file is so large that it cannot fit into memory. An optimization would be to use an index with a binary search to make it more efficient. But it would still be complicated to implement, and some other issues would need to be addressed.
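
To illustrate only the binary search part of that idea, here is a small sketch that keeps an index of confidences sorted in descending order and finds where a new rule would go. SortedInsertSketch is a hypothetical name, and the index is kept in memory for simplicity. The hard part, inserting a record in the middle of a file on disk, is not addressed here.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

class SortedInsertSketch {
    // The index is kept in descending order of confidence.
    static final Comparator<Double> DESC = Collections.reverseOrder();

    // Returns the position where 'conf' must be inserted to keep the index sorted.
    static int insertionPoint(List<Double> index, double conf) {
        int pos = Collections.binarySearch(index, conf, DESC);
        // When the key is absent, binarySearch returns -(insertion point) - 1.
        return pos >= 0 ? pos : -(pos + 1);
    }

    // Inserts 'conf' into the index at the position found by binary search.
    static void insert(List<Double> index, double conf) {
        index.add(insertionPoint(index, conf), conf);
    }

    public static void main(String[] args) {
        List<Double> index = new ArrayList<>();
        for (double c : new double[] {0.60, 1.00, 0.75, 0.90}) {
            insert(index, c);
        }
        System.out.println(index);
    }
}
```

Finding the position takes a logarithmic number of comparisons, but shifting the records that follow it still costs time proportional to their number. This is one of the issues that would remain with a file.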

There may also be some other solutions to improve performance. But in my opinion, if you can assume that the result will fit into memory, I would recommend the simple approach that I described previously, using the version of the algorithm that keeps the result in memory. This solution is very simple and would work fine unless you run out of memory because there are too many rules.

Best,

Philippe



Edited 2 time(s). Last edit at 07/11/2012 07:55PM by webmasterphilfv.



This forum is powered by Phorum and provided by P. Fournier-Viger (© 2012).