The Data Mining Forum
This forum is about data mining, data science and big data: algorithms, source code, datasets, implementations, optimizations, etc. You are welcome to post calls for papers, data mining job ads, links to source code of data mining algorithms or anything else related to data mining. The forum is hosted by P. Fournier-Viger. No registration is required to use this forum!
Distribution of list of items -statistical tests-
Date: October 27, 2017 07:52AM
Given a list of items L collected in the time intervals t = 1,2,...,10, I want to test whether the sublist of items in the time intervals t = 1,2,...,5 comes from a different distribution than the sublist of items in the time intervals t = 6,7,...,10.
Is there a standard way to test this hypothesis? I think we can assume that the number of items in each time interval follows a Poisson distribution, with rate lambda1 for t = 1,...,5 and lambda2 for t = 6,...,10, and then test the null hypothesis H_0: lambda1 = lambda2 against H_a: lambda1 ≠ lambda2 using standard tests for Poisson means.
Has anyone more familiar with the sequential pattern mining literature seen this test before?
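The Poisson comparison proposed above can be sketched with the classical conditional test for two Poisson rates: under H_0, given the total n = x1 + x2 of items observed in both halves, the count x1 from the first half follows Binomial(n, T1 / (T1 + T2)), where T1 and T2 are the exposures (here, 5 intervals each). The following is a minimal sketch under that assumption; the function name `poisson_rate_test` and the example counts are hypothetical, and the test only compares how many items arrive per interval, not which items they are.

```python
from math import exp, lgamma, log


def poisson_rate_test(x1: int, t1: float, x2: int, t2: float) -> float:
    """Two-sided exact conditional test of H0: lambda1 == lambda2.

    x1 and x2 are total item counts observed over exposures t1 and t2
    (e.g. the number of time intervals in each half). Under H0, and
    conditional on n = x1 + x2, x1 ~ Binomial(n, t1 / (t1 + t2)).
    """
    if x1 < 0 or x2 < 0 or t1 <= 0 or t2 <= 0:
        raise ValueError("counts must be >= 0 and exposures > 0")
    n = x1 + x2
    p0 = t1 / (t1 + t2)

    def binom_pmf(k: int) -> float:
        # Computed in log space so that large n does not overflow.
        log_coef = lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
        return exp(log_coef + k * log(p0) + (n - k) * log(1 - p0))

    pmf = [binom_pmf(k) for k in range(n + 1)]
    observed = pmf[x1]
    # Two-sided p-value: total probability of every outcome that is no more
    # likely than the observed one (a small relative tolerance absorbs
    # floating-point ties between mirror-image outcomes).
    p_value = sum(p for p in pmf if p <= observed * (1 + 1e-7))
    return min(1.0, p_value)


if __name__ == "__main__":
    # Hypothetical item counts for intervals t = 1..10 (illustrative only).
    counts = [12, 9, 11, 10, 13, 20, 18, 22, 19, 21]
    p = poisson_rate_test(sum(counts[:5]), 5, sum(counts[5:]), 5)
    print(f"p-value = {p:.4g}")
```

The test treats the per-interval counts as independent Poisson draws; if the counts are overdispersed or autocorrelated (common in real event streams), the p-values it reports will tend to be too small.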
Re: Distribution of list of items -statistical tests-
Date: October 27, 2017 08:52AM
I am not sure about this exact case. But you could have a look at the topic of "change detection" / "concept drift" in data streams. In data stream mining, there are various pattern mining algorithms that attempt to detect whether there is a significant change between two time windows. You may find some ideas by looking at this topic.
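Following the change-detection suggestion, one simple way to compare two fixed windows is a permutation test on a distance between their item-frequency distributions. The sketch below assumes the items within the two windows are exchangeable under H_0 (it ignores ordering and temporal correlation) and uses total variation distance as the statistic; both choices, and the function names, are illustrative and not taken from any particular stream-mining algorithm. Unlike the Poisson rate comparison proposed in the question, it checks whether the mix of items changes, not only how many arrive.

```python
import random
from collections import Counter
from typing import Hashable, List, Sequence, Tuple


def total_variation(a: Sequence[Hashable], b: Sequence[Hashable]) -> float:
    """Total variation distance between the item-frequency distributions of a and b."""
    ca, cb = Counter(a), Counter(b)
    return 0.5 * sum(abs(ca[i] / len(a) - cb[i] / len(b)) for i in set(ca) | set(cb))


def window_change_test(
    window1: Sequence[Hashable],
    window2: Sequence[Hashable],
    n_permutations: int = 999,
    seed: int = 0,
) -> Tuple[float, float]:
    """Permutation test of H0: both windows draw items from the same distribution.

    Returns (observed_distance, p_value).
    """
    if not window1 or not window2:
        raise ValueError("both windows must be non-empty")
    rng = random.Random(seed)
    observed = total_variation(window1, window2)
    pooled: List[Hashable] = list(window1) + list(window2)
    n1 = len(window1)
    extreme = 0
    for _ in range(n_permutations):
        # Under H0 the window labels are exchangeable, so reshuffle them.
        rng.shuffle(pooled)
        if total_variation(pooled[:n1], pooled[n1:]) >= observed - 1e-12:
            extreme += 1
    # Add-one correction: the observed split counts as one permutation,
    # so the p-value is never exactly zero.
    p_value = (extreme + 1) / (n_permutations + 1)
    return observed, p_value


if __name__ == "__main__":
    # Hypothetical item sequences for the two halves (illustrative only).
    first_half = ["a", "b", "a", "c", "a", "b", "a", "a", "c", "b"]
    second_half = ["c", "c", "b", "c", "a", "c", "c", "b", "c", "c"]
    distance, p = window_change_test(first_half, second_half)
    print(f"TVD = {distance:.3f}, p-value = {p:.3f}")
```

This fixed two-window version corresponds to the t = 1,...,5 versus t = 6,...,10 split in the question. Stream change detectors often apply a similar comparison repeatedly over sliding or adaptive windows, in which case the many tests performed need to be accounted for.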