The Data Mining Forum
This forum is about data mining, data science and big data: algorithms, source code, datasets, implementations, optimizations, etc. You are welcome to post calls for papers, data mining job ads, links to source code of data mining algorithms or anything else related to data mining. The forum is hosted by P. Fournier-Viger. No registration is required to use this forum!
Re: big data
Date: November 15, 2018 05:05PM
About big data, some researchers say that the five Vs of big data are what is important: Volume, Velocity, Variety, etc.
But besides that, I would like to point out that some problems are easy even if you have big data, and some problems are difficult even if you don't have a lot of data. Actually, the difficulty of a computing problem is sometimes influenced more by the parameters of the problem than by the data size. This is the case, for example, in itemset mining, where lowering the "minsup" parameter can increase the difficulty of the problem exponentially, while some algorithms scale linearly as the data size increases.
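To make the point about "minsup" concrete, here is a minimal brute-force sketch in Python. It is only an illustration, not an efficient algorithm: the toy database and the function name count_frequent_itemsets are made up for this example, and minsup is taken as an absolute number of transactions.

```python
from itertools import combinations

def count_frequent_itemsets(transactions, minsup):
    """Count itemsets contained in at least `minsup` transactions (brute force)."""
    items = sorted({item for t in transactions for item in t})
    count = 0
    for size in range(1, len(items) + 1):
        found_any = False
        for candidate in combinations(items, size):
            # Support = number of transactions containing every item of the candidate.
            support = sum(1 for t in transactions if set(candidate) <= t)
            if support >= minsup:
                count += 1
                found_any = True
        if not found_any:
            # If no itemset of this size is frequent, no larger one can be.
            break
    return count

# Hypothetical toy database: 6 transactions over 5 items.
transactions = [
    {"a", "b", "c"},
    {"a", "b", "d"},
    {"a", "c", "d"},
    {"b", "c", "e"},
    {"a", "b", "c", "d"},
    {"a", "b", "c", "e"},
]

# The database stays the same; only the parameter changes.
counts = {m: count_frequent_itemsets(transactions, m) for m in (5, 4, 3, 2, 1)}
for minsup, n in counts.items():
    print(f"minsup={minsup}: {n} frequent itemsets")
```

The data is identical in every run, yet the number of patterns to find (and so the work to do) grows as minsup goes down.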
There are also many interesting problems related to big data, such as stream data mining, where the data is potentially infinite, or mining complex data such as social graphs.
Re: big data
Date: November 15, 2018 11:49PM
It is really helpful.
I totally agree, and that was my question: apart from the five Vs, I am looking for other challenges related to big data to consider when using frequent itemset mining.
So please let me know which interesting big data challenges can be solved using frequent itemset mining.
Re: big data
Date: November 16, 2018 03:32AM
Specifically for itemset mining, there are several challenges related to big data:
- You can design parallel algorithms that run on big data architectures like Hadoop, Spark, etc. A few exist already, but perhaps they can be improved, or you can design algorithms for other pattern mining problems or for variations of the itemset mining problem.
- You can design algorithms for mining itemsets in data streams. Some also exist already, but you can work on a variation of the itemset mining problem, for example, or make something more efficient. There are various possibilities.
- You can work on a topic related to big data, such as preserving privacy when doing itemset mining. Privacy-preserving data mining is relevant for big data because the data is potentially distributed and processed on multiple servers, so we want to protect privacy. If you are interested in this, you can check our PSPF software for privacy-preserving pattern mining. It is not for big data, but it can help you see the idea behind this type of problem.
Those are the ideas that come to my mind.
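To give a rough idea of the data stream setting mentioned above, here is a small Python sketch that keeps exact support counts over a sliding window of the most recent transactions. It is an assumption-laden illustration, not a published stream mining algorithm: the class name SlidingWindowItemsetCounter and its parameters are made up, itemsets are limited to a small maximum size, and real stream algorithms usually rely on approximation to bound memory.

```python
from collections import Counter, deque
from itertools import combinations

class SlidingWindowItemsetCounter:
    """Exact support counts of small itemsets over the last `window` transactions."""

    def __init__(self, window, max_size=2):
        self.window = window        # number of recent transactions to keep
        self.max_size = max_size    # largest itemset size that is counted
        self.recent = deque()       # transactions currently inside the window
        self.support = Counter()    # itemset (sorted tuple) -> support in window

    def _itemsets(self, transaction):
        items = sorted(set(transaction))
        for size in range(1, self.max_size + 1):
            yield from combinations(items, size)

    def add(self, transaction):
        """Process one new transaction and expire the oldest one if needed."""
        self.recent.append(frozenset(transaction))
        self.support.update(self._itemsets(transaction))
        if len(self.recent) > self.window:
            expired = self.recent.popleft()
            self.support.subtract(self._itemsets(expired))

    def frequent(self, minsup):
        """Return the itemsets whose support in the current window is >= minsup."""
        return {s: c for s, c in self.support.items() if c >= minsup}

# Hypothetical stream of four transactions, with a window of three.
counter = SlidingWindowItemsetCounter(window=3)
for t in [{"a", "b"}, {"a", "c"}, {"a", "b"}, {"b", "c"}]:
    counter.add(t)
result = counter.frequent(minsup=2)
print(result)
```

The stream itself is never stored, only the current window and its counts, which is the basic constraint that stream mining algorithms have to respect.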