The Data Mining Forum
This forum is about data mining, data science and big data: algorithms, source code, datasets, implementations, optimizations, etc. You are welcome to post calls for papers, data mining job ads, links to source code of data mining algorithms, or anything else related to data mining. The forum is hosted by P. Fournier-Viger. No registration is required to use this forum!
Statistical data: concrete case, tests to perform
Posted by: Gilgamesh
Date: August 17, 2021 02:50PM

Hello everyone,

I haven't done statistics or econometrics in a long time, and I never did much of it in the first place. I would like advice on methods and tests to apply to two Excel files that I have.

Each Excel file contains thousands of rows and dozens of columns. Each row describes one case, i.e. one physical paper file.

The first Excel file contains data on closed cases (closed during 2015-2020), and the second contains data on cases that are not yet closed.
Many columns are qualitative (categorical) variables: the nature of the file (fiscal, HR, ...), the department that manages it, the department that created it, the name of its creator, and so on. Others are quantitative, such as the number of project managers on the file, the number of type A employees involved, the number of type B employees, and the number of pages in the file. Finally, there are date variables: the case start date, the case closing date (for closed files), and sometimes intermediate event dates.

I would like to run an analysis that "makes the data talk" about processing time and what influences it. To do this, I work on the two files separately.

For the first file (closed cases), I compute the processing time as closing date minus start date, which gives the processing time in days.
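If the dates are exported from Excel as ISO text (YYYY-MM-DD), the subtraction can be sketched in Python as below; in Excel itself, subtracting two date cells (e.g. =C2-B2) also gives the difference in days. The case ids and dates are hypothetical placeholders, not taken from the real files.

```python
from datetime import date

def processing_days(start: str, end: str) -> int:
    """Number of days between two ISO dates (YYYY-MM-DD)."""
    # date.fromisoformat parses strings such as "2015-01-10"
    return (date.fromisoformat(end) - date.fromisoformat(start)).days

# Hypothetical rows: (case id, start date, closing date)
closed_cases = [
    ("A1", "2015-01-10", "2015-03-01"),
    ("A2", "2016-05-02", "2016-05-30"),
]
durations = {cid: processing_days(s, e) for cid, s, e in closed_cases}
print(durations)
```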
From these durations, I would like to identify the main factors that influence the average processing time of a case.
My question: should I use a linear regression model? And how can I mix qualitative and quantitative variables in it (if that is possible)?
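A common way to mix both kinds of variables in a linear regression is dummy (one-hot) coding: each qualitative variable becomes one 0/1 column per category, minus one reference category, which is dropped because otherwise the dummies plus the intercept are perfectly collinear. The sketch below assumes hypothetical data (the "other" nature and all numbers are invented) generated exactly from days = 30 + 2·pages + 15·HR + 40·other with "fiscal" as reference, so ordinary least squares should recover those coefficients. It uses only the standard library to stay self-contained.

```python
def dummy_code(values, reference):
    """Return (levels, rows): one 0/1 column per category except the reference."""
    levels = sorted(set(values) - {reference})
    return levels, [[1.0 if v == lvl else 0.0 for lvl in levels] for v in values]

def solve(a, b):
    """Solve the square system a x = b (Gaussian elimination, partial pivoting)."""
    n = len(a)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def ols(X, y):
    """Least-squares coefficients from the normal equations (X'X) b = X'y."""
    k = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(k)]
    return solve(xtx, xty)

# Hypothetical closed cases: nature (qualitative), pages (quantitative), days
nature = ["fiscal", "HR", "HR", "other", "fiscal", "other", "HR", "fiscal"]
pages = [10, 20, 5, 40, 30, 8, 12, 25]
days = [50, 85, 55, 150, 90, 86, 69, 80]

levels, dummies = dummy_code(nature, reference="fiscal")
# Each design row: intercept, pages, then one dummy per non-reference nature
X = [[1.0, float(p)] + d for p, d in zip(pages, dummies)]
coef = ols(X, days)
for name, b in zip(["intercept", "pages"] + levels, coef):
    print(f"{name}: {b:.2f}")
```

Read the HR coefficient as "extra days for an HR case compared with a fiscal case that has the same number of pages". In practice, `pandas.get_dummies(..., drop_first=True)` or a statsmodels formula such as `days ~ pages + C(nature)` do this coding for you and also report standard errors and p-values, which this sketch omits.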

I would then like to repeat the same analysis separately for each nature of file (fiscal, HR, ...). The goal is to identify the main factors that determine the processing time of each specific type of file.
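Before fitting one model per nature, a per-group summary shows how many cases each subset has (a regression on a handful of cases is unreliable) and how the durations differ. The median is printed next to the mean because processing times are often right-skewed by a few very long cases. The (nature, days) pairs below are hypothetical.

```python
from collections import defaultdict
from statistics import mean, median

# Hypothetical (nature, processing days) pairs for closed cases
cases = [("fiscal", 50), ("HR", 85), ("HR", 55), ("other", 150),
         ("fiscal", 90), ("other", 86), ("HR", 69), ("fiscal", 80)]

# Split the durations by nature of file
by_nature = defaultdict(list)
for nature, days in cases:
    by_nature[nature].append(days)

# (count, mean, median) per nature
summary = {n: (len(d), mean(d), median(d)) for n, d in sorted(by_nature.items())}
for n, (count, avg, med) in summary.items():
    print(f"{n}: n={count}, mean={avg:.1f}, median={med}")
```

The same regression can then be fitted on each subset `by_nature[...]`, dropping the nature variable itself since it is constant within a subset.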

What other analyses or questions could I pursue to reveal interesting information about the processing time of cases?
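One common question is whether the mean processing time differs between categories, for example between managing departments. A one-way ANOVA answers it by comparing between-group variance with within-group variance. The sketch below computes only the F statistic; the department names and durations are hypothetical.

```python
def one_way_anova_f(groups):
    """F statistic of a one-way ANOVA (between-group vs within-group variance)."""
    values = [v for g in groups for v in g]
    n, k = len(values), len(groups)
    grand_mean = sum(values) / n
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((v - m) ** 2 for g, m in zip(groups, means) for v in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))

# Hypothetical processing times (days), grouped by managing department
by_department = {
    "Dept A": [50, 62, 58, 71],
    "Dept B": [85, 90, 77],
    "Dept C": [120, 98, 110, 105],
}
f_stat = one_way_anova_f(list(by_department.values()))
df_between = len(by_department) - 1
df_within = sum(map(len, by_department.values())) - len(by_department)
print(f"F = {f_stat:.2f} with df = ({df_between}, {df_within})")
```

A large F suggests the departments differ by more than the spread inside each department. To get a p-value, `scipy.stats.f_oneway(*groups)` returns both the statistic and the p-value; since ANOVA assumes roughly normal data with similar variances, the rank-based `scipy.stats.kruskal` may be a safer choice for skewed durations.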

The second Excel file contains unfinished cases, some of which are old files that have been lying around. What tests or questions could I use to compare them with the results from the first Excel file?
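Open cases are "right-censored": we only know that each one has lasted at least its current age. Analysing the closed cases alone can bias processing times downward, because the longest cases are the ones still open. Survival analysis handles both files together. The sketch below is a Kaplan-Meier estimate of the probability that a case is still open after t days. The extraction date 2021-08-11 and all durations are hypothetical.

```python
from datetime import date

def case_age(start, as_of):
    """Days an open case has been running as of the extraction date."""
    return (date.fromisoformat(as_of) - date.fromisoformat(start)).days

def kaplan_meier(durations, closed):
    """Kaplan-Meier estimate of P(case still open after t days).

    durations: processing days for closed cases, current age for open cases
    closed: True if the case is closed (event observed), False if still open
    Returns (t, estimate) pairs at each t where at least one case closed.
    """
    data = sorted(zip(durations, closed))
    at_risk, estimate, curve, i = len(data), 1.0, [], 0
    while i < len(data):
        t, events, leaving = data[i][0], 0, 0
        while i < len(data) and data[i][0] == t:  # all cases tied at time t
            events += 1 if data[i][1] else 0
            leaving += 1
            i += 1
        if events:
            estimate *= 1 - events / at_risk
            curve.append((t, estimate))
        at_risk -= leaving  # closed and censored cases both leave the risk set
    return curve

# Hypothetical data: three closed cases and two open ones
closed_days = [10, 20, 30]
extraction_date = "2021-08-11"  # hypothetical date the second file was exported
open_ages = [case_age("2021-07-22", extraction_date),
             case_age("2021-07-02", extraction_date)]

durations = closed_days + open_ages
flags = [True] * len(closed_days) + [False] * len(open_ages)
curve = kaplan_meier(durations, flags)
print(curve)
```

The estimated median processing time is the first t at which the estimate drops to 0.5 or below. For real use, the `lifelines` library provides `KaplanMeierFitter`, and its `CoxPHFitter` is the censoring-aware counterpart of the regression question above.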

I hope my English is at least understandable :)

Thanks in advance! ;)



This forum is powered by Phorum and provided by P. Fournier-Viger (© 2012).