The Data Mining Forum
This forum is about data mining, data science and big data: algorithms, source code, datasets, implementations, optimizations, etc. You are welcome to post calls for papers, data mining job ads, links to source code of data mining algorithms, or anything else related to data mining. The forum is hosted by P. Fournier-Viger. No registration is required to use this forum!
high dimensional data with frequent itemsets
Posted by: david@gmail.com
Date: November 07, 2018 09:19AM

Hi,

Is itemset mining beneficial on high dimensional data (like gene datasets)?
For example, can bioinformatics datasets be used in data mining (association rule mining)?




kind regards
David

Re: high dimensional data with frequent itemsets
Date: November 08, 2018 05:22AM

Hi David,

If you have high dimensional data (many attributes), itemset mining can still be applied because each itemset generally involves only a few attributes. Thus, even with many attributes, frequent itemset mining will only find the small sets of values that often appear together, each using a subset of the attributes.

Now, if you have high dimensional data, a problem could be that the database is very dense, in other words that transactions are very similar to each other. If that is the case, the search space can become very large and algorithms may take a lot of time to check all the possibilities. For example, if there is a frequent itemset with 10 items, then all of its 2^10 - 1 = 1023 non-empty subsets (counting the itemset itself) are also frequent. Thus, the more attributes you have, the more likely transactions are to be similar, and the more itemsets there could be... and thus the search space can be very large.
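To make the subset counting concrete, here is a small Python sketch. It is only an illustration: the toy transactions and the helper names (support, nonempty_subsets) are made up for this example and do not come from any particular library.

```python
from itertools import combinations


def support(itemset, transactions):
    """Count the transactions that contain every item of `itemset`."""
    wanted = set(itemset)
    return sum(1 for t in transactions if wanted <= t)


def nonempty_subsets(itemset):
    """Yield every non-empty subset of `itemset` as a frozenset."""
    items = sorted(itemset)
    for size in range(1, len(items) + 1):
        for combo in combinations(items, size):
            yield frozenset(combo)


# Hypothetical toy database: a dense one, where transactions look alike.
transactions = [
    {"a", "b", "c", "d"},
    {"a", "b", "c", "d"},
    {"a", "b", "c"},
    {"b", "d"},
]

big = frozenset({"a", "b", "c", "d"})
subsets = list(nonempty_subsets(big))
n_subsets = len(subsets)  # 2**4 - 1 = 15

# Every subset appears in at least as many transactions as the full itemset,
# so if `big` is frequent, all of its subsets are frequent too.
subsets_at_least_as_frequent = all(
    support(s, transactions) >= support(big, transactions) for s in subsets
)

# The same count for a 10-item itemset: 2**10 - 1 = 1023.
n_subsets_of_ten = len(list(nonempty_subsets(range(10))))
```

The number of subsets doubles with each extra item, which is why one long frequent itemset in a dense database already implies a large number of frequent itemsets.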

There are, however, some solutions to deal with a large search space. One is to set the algorithm's threshold parameter(s), such as the minimum support, to greater values. Another is to use constraints, such as not finding itemsets having more than 4 items. Both reduce the number of itemsets to consider, which makes an algorithm much more efficient and lets it run even on high dimensional data.
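As a rough illustration of how these two settings shrink the output, here is a minimal level-wise (Apriori-style) miner in Python. It is a sketch under assumptions, not an optimized implementation: the function name frequent_itemsets, its parameters min_support (an absolute count) and max_len, and the toy data are all invented for this example.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support, max_len=None):
    """Level-wise search. Returns {itemset (frozenset): support count}.

    min_support is an absolute transaction count; max_len, if given,
    is the largest itemset size that will be explored.
    """
    transactions = [frozenset(t) for t in transactions]
    items = sorted({i for t in transactions for i in t})
    result = {}
    current = [frozenset([i]) for i in items]  # candidates of size 1
    k = 1
    while current and (max_len is None or k <= max_len):
        # Count each candidate and keep those meeting the threshold.
        counts = {c: sum(1 for t in transactions if c <= t) for c in current}
        frequent = {c: n for c, n in counts.items() if n >= min_support}
        result.update(frequent)
        # Build size k+1 candidates whose size-k subsets are all frequent.
        candidates = set()
        for a, b in combinations(list(frequent), 2):
            union = a | b
            if len(union) == k + 1 and all(
                frozenset(s) in frequent for s in combinations(union, k)
            ):
                candidates.add(union)
        current = list(candidates)
        k += 1
    return result


# Hypothetical toy database.
toy = [
    {"a", "b", "c", "d"},
    {"a", "b", "c", "d"},
    {"a", "b", "c"},
    {"b", "d"},
]

unconstrained = frequent_itemsets(toy, min_support=2)           # 15 itemsets
short_only = frequent_itemsets(toy, min_support=2, max_len=2)   # 10 itemsets
stricter = frequent_itemsets(toy, min_support=3)                # 9 itemsets
```

On this toy data, raising the minimum support from 2 to 3 cuts the result from 15 to 9 itemsets, and limiting the length to 2 items cuts it to 10. On real dense data the effect is usually far larger, because the length limit stops the search before the larger levels are ever generated.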

Beyond that, it also depends on what you are doing. There are different algorithms for finding different kinds of patterns. Depending on your application, one type of pattern may be more useful than others and allow you to discover more interesting knowledge from your data.

Hope this helps.

Regards,
Philippe

Re: high dimensional data with frequent itemsets
Posted by: david
Date: November 12, 2018 07:09AM

Thanks so much,


It is really helpful.
I just wonder whether frequent itemsets generated from bioinformatics datasets are beneficial.

For example, in gene datasets, what knowledge can we get from the resulting frequent itemsets?

Kind regards
David



This forum is powered by Phorum and provided by P. Fournier-Viger (© 2012).