The Data Mining Forum
This forum is about data mining, data science and big data: algorithms, source code, datasets, implementations, optimizations, etc. You are welcome to post calls for papers, data mining job ads, links to source code of data mining algorithms, or anything else related to data mining. The forum is hosted by P. Fournier-Viger. No registration is required to use this forum!
[SEARCH] Challenges of Data Preparation with Big Data
Posted by: Julius
Date: April 10, 2017 03:54AM

Hi, can anyone recommend good literature on data preprocessing with Big Data, especially on the challenges that can arise when preprocessing Big Data?

Re: [SEARCH] Challenges of Data Preparation with Big Data
Date: April 10, 2017 08:12AM

I did a quick search and did not find papers that specifically discuss only the challenges of preprocessing. But some papers, such as the following one, discuss some challenges of data processing for big data:

http://ir.lib.uwo.ca/cgi/viewcontent.cgi?article=1095&context=electricalpub

 Regardless of the paradigm used to develop the
algorithms, an important determinant of the success of
supervised ML approaches is the pre-processing of the data.
This step is often critical in order to obtaining reliable and
meaningful results. Data cleaning, normalization, feature
extraction and selection [24] are all essential in order to
obtain an appropriate training set. This poses a massive
challenge in the light of Big Data as the preprocessing of
massive amounts of tuples is often not possible.

I think that this somewhat makes sense, although it is not very detailed. Maybe you can find more information in other surveys about big data. Also, if you look at specific topics such as "feature extraction", it may be easier to find papers.
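To make the quoted challenge a bit more concrete, here is a minimal sketch of normalization when the tuples do not fit in memory. It is only an illustration and is not taken from the paper: it assumes the data arrives in chunks and uses Welford's online algorithm to get the mean and standard deviation in one pass. The function names `streaming_mean_std` and `standardize` are hypothetical.

```python
import math
from typing import Iterable, List, Tuple


def streaming_mean_std(chunks: Iterable[List[float]]) -> Tuple[float, float]:
    """One pass over chunked data with O(1) memory (Welford's algorithm)."""
    count = 0
    mean = 0.0
    m2 = 0.0  # running sum of squared deviations from the current mean
    for chunk in chunks:
        for x in chunk:
            count += 1
            delta = x - mean
            mean += delta / count
            m2 += delta * (x - mean)
    if count == 0:
        raise ValueError("no data")
    # Population standard deviation (divide by count, not count - 1).
    return mean, math.sqrt(m2 / count)


def standardize(chunk: List[float], mean: float, std: float) -> List[float]:
    """Z-score one chunk using statistics computed over the whole stream."""
    if std == 0.0:
        return [0.0 for _ in chunk]
    return [(x - mean) / std for x in chunk]


# Assumed toy input: two chunks standing in for a file read piece by piece.
chunks = [[1.0, 2.0], [3.0, 4.0, 5.0]]
mean, std = streaming_mean_std(chunks)
normalized = [standardize(c, mean, std) for c in chunks]
```

Note that normalization needs two passes here (one to get the statistics, one to apply them), which is part of why preprocessing becomes costly on very large datasets.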



Edited 5 time(s). Last edit at 04/10/2017 08:16AM by webmasterphilfv.

Re: [SEARCH] Challenges of Data Preparation with Big Data
Posted by: Julius
Date: April 12, 2017 12:34AM

Is it better to look at topics like "Challenges of Big Data Mining", or to use other search keywords that combine data preparation with "Data Mining Challenges"?

Re: [SEARCH] Challenges of Data Preparation with Big Data
Date: April 12, 2017 05:19PM

If I were you, I would search for specific types of data processing techniques, with keywords such as:
"dimensionality reduction big data" or "dimensionality reduction map reduce" or "dimensionality reduction spark"...
"feature selection big data" or "feature selection map reduce", etc.
...
You could also search for other keywords for other preprocessing techniques...
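To give an idea of what a "feature selection map reduce" technique can look like, here is a rough sketch in plain Python. It is not from any particular paper: it assumes the data is split into non-empty partitions of numeric rows, and it uses a simple variance threshold as the selection criterion. The names `map_partition`, `merge` and `select_features` are hypothetical.

```python
from functools import reduce
from typing import List, Tuple

Stats = Tuple[int, List[float], List[float]]  # (row count, sums, sums of squares)


def map_partition(rows: List[List[float]]) -> Stats:
    """Map step: per-feature partial sums for one non-empty partition."""
    n_features = len(rows[0])
    sums = [0.0] * n_features
    squares = [0.0] * n_features
    for row in rows:
        for j, value in enumerate(row):
            sums[j] += value
            squares[j] += value * value
    return len(rows), sums, squares


def merge(a: Stats, b: Stats) -> Stats:
    """Reduce step: combine the partial statistics of two partitions."""
    n_a, sums_a, sq_a = a
    n_b, sums_b, sq_b = b
    return (
        n_a + n_b,
        [x + y for x, y in zip(sums_a, sums_b)],
        [x + y for x, y in zip(sq_a, sq_b)],
    )


def select_features(partitions: List[List[List[float]]], min_variance: float) -> List[int]:
    """Return indices of features whose population variance exceeds min_variance."""
    n, sums, squares = reduce(merge, map(map_partition, partitions))
    # E[x^2] - E[x]^2: simple to merge, but can lose precision on real data.
    variances = [q / n - (s / n) ** 2 for s, q in zip(sums, squares)]
    return [j for j, var in enumerate(variances) if var > min_variance]


# Assumed toy data: two partitions, three features (the middle one is constant).
partitions = [
    [[1.0, 5.0, 0.0], [2.0, 5.0, 10.0]],
    [[3.0, 5.0, 0.0], [4.0, 5.0, 10.0]],
]
kept = select_features(partitions, min_variance=0.5)
```

The point of this design is that each partition can be processed independently and only small summaries are merged, which is the kind of idea those keywords should lead to.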

Hopefully, some papers have been published and you can read about the techniques that they have developed. Sometimes they will also mention some challenges in the future work section.



This forum is powered by Phorum and provided by P. Fournier-Viger (© 2012).