I did a quick search and did not find any papers that discuss only the challenges of pre-processing. But there are some papers, such as the following one, that discuss some challenges of data processing for big data:
http://ir.lib.uwo.ca/cgi/viewcontent.cgi?article=1095&context=electricalpub
"Regardless of the paradigm used to develop the algorithms, an important determinant of the success of supervised ML approaches is the pre-processing of the data. This step is often critical in order to obtaining reliable and meaningful results. Data cleaning, normalization, feature extraction and selection [24] are all essential in order to obtain an appropriate training set. This poses a massive challenge in the light of Big Data as the preprocessing of massive amounts of tuples is often not possible."
I think this makes sense, although it is not very detailed. You may find more information in other surveys about big data. Also, if you look at specific topics such as "feature extraction", it may be easier to find papers.
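To make the normalization part of that quote a bit more concrete, here is a small sketch of my own (it is not from the paper). It assumes the data arrives in chunks that are too large to hold in memory all at once, so the mean and standard deviation are computed in one streaming pass with Welford's online algorithm, and the values are standardized in a second pass. The function names and the toy chunks are only for illustration.

```python
import math
from typing import Iterable, Iterator, List, Tuple


def running_mean_std(chunks: Iterable[List[float]]) -> Tuple[float, float]:
    """Compute mean and population standard deviation in a single pass.

    Uses Welford's online update, so only three numbers are kept in
    memory regardless of how many values are streamed through.
    """
    n = 0
    mean = 0.0
    m2 = 0.0  # running sum of squared deviations from the mean
    for chunk in chunks:
        for x in chunk:
            n += 1
            delta = x - mean
            mean += delta / n
            m2 += delta * (x - mean)
    std = math.sqrt(m2 / n) if n else 0.0
    return mean, std


def standardize(chunks: Iterable[List[float]], mean: float, std: float) -> Iterator[List[float]]:
    """Second pass: yield each chunk rescaled to zero mean and unit variance."""
    for chunk in chunks:
        # Guard against a constant feature (std == 0).
        yield [(x - mean) / std if std else 0.0 for x in chunk]


def make_chunks() -> List[List[float]]:
    # Hypothetical toy data standing in for chunks read from disk.
    return [[1.0, 2.0], [3.0, 4.0], [5.0]]


mean, std = running_mean_std(make_chunks())
scaled = list(standardize(make_chunks(), mean, std))
```

Even this simple step needs two full passes over the data, which gives an idea of why pre-processing becomes expensive when the number of tuples is very large.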
Edited 5 time(s). Last edit at 04/10/2017 08:16AM by webmasterphilfv.