The Data Mining Forum
This forum is about data mining, data science and big data: algorithms, source code, datasets, implementations, optimizations, etc. You are welcome to post calls for papers, data mining job ads, links to source code of data mining algorithms, or anything else related to data mining. The forum is hosted by P. Fournier-Viger. No registration is required to use this forum!
Software for Data Selection
Date: October 05, 2020 05:55AM
I am facing a data selection issue.
I have access to a large database on SAP BW with a large number of variables (i.e. hundreds of columns). I have very little documentation for this database. I want to go through each variable, ignore the empty ones, and identify the useful ones.
So far I have done this very inefficiently: I load my data into Power BI Desktop and check each variable one at a time. This way I’m sure I’m not making mistakes, but it takes too much time. I really need a first cleaning pass.
I’m looking for software that could help me select the interesting variables from a large dataset. At a minimum it should detect empty variables, and ideally it should also identify duplicates and correlations.
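To make the requirement concrete, the checks described above could look roughly like the following Python sketch. It is only an illustration under assumptions: it supposes the data has first been exported from SAP BW into a pandas DataFrame (for example through a CSV file) with unique column names, and the function name `screen_columns` and the 0.95 correlation threshold are hypothetical choices, not part of any existing tool.

```python
import pandas as pd


def screen_columns(df: pd.DataFrame, corr_threshold: float = 0.95) -> dict:
    """Flag empty, constant, duplicated, and highly correlated columns.

    Hypothetical helper: assumes unique column names and data that fits in memory.
    """
    # Columns in which every value is missing.
    empty = [c for c in df.columns if df[c].isna().all()]

    # Columns with exactly one distinct non-missing value.
    constant = [
        c for c in df.columns
        if c not in empty and df[c].nunique(dropna=True) == 1
    ]

    # Columns whose content exactly repeats an earlier column.
    # Pairwise comparison is quadratic in the number of columns,
    # which stays manageable for a few hundred columns.
    cols = [c for c in df.columns if c not in empty]
    duplicates = []
    for i, c in enumerate(cols):
        if any(df[c].equals(df[prev]) for prev in cols[:i]):
            duplicates.append(c)

    # Pairs of remaining numeric columns with a strong Pearson correlation.
    skip = set(constant) | set(duplicates)
    candidates = [c for c in cols if c not in skip]
    corr = df[candidates].select_dtypes(include="number").corr().abs()
    correlated = [
        (a, b, round(float(corr.loc[a, b]), 3))
        for i, a in enumerate(corr.columns)
        for b in corr.columns[i + 1:]
        if corr.loc[a, b] >= corr_threshold
    ]

    return {
        "empty": empty,
        "constant": constant,
        "duplicates": duplicates,
        "correlated": correlated,
    }


if __name__ == "__main__":
    # Tiny made-up table, only to show how the helper is called.
    sample = pd.DataFrame({
        "a": [1.0, 2.0, 3.0, 4.0],
        "a_copy": [1.0, 2.0, 3.0, 4.0],
        "empty": [None, None, None, None],
    })
    print(screen_columns(sample))
```

The report only lists candidates. Whether a constant or correlated column is actually useless still has to be judged by hand, but on a far shorter list than the full set of columns.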
I’d really like to know your tips and good practices for this kind of data selection!