The Data Mining Forum                             open-source data mining software data science journal data mining conferences high utility mining workshop
This forum is about data mining, data science and big data: algorithms, source code, datasets, implementations, optimizations, etc. You are welcome to post call for papers, data mining job ads, link to source code of data mining algorithms or anything else related to data mining. The forum is hosted by P. Fournier-Viger. No registration is required to use this forum!.  
confusion in EFIM algorithm and hope someone can give me more explanation
Posted by: zhaozhao
Date: July 24, 2018 07:38AM

after I read the article of EFIM and I'm lost at certain page.
correct me if my understanding is not correct.
The FHM(you created) is able to accelerate(improve) the performance of MUI-MINER, and I feel quite clear to the explanations of power-point, however. when it comes to the EFIM, I found confusion on the several areas.

on the PowerPoint of page 11 of EFIM. the transaction(T4) contains {B,E,F}according to my understanding and based on the context of the articles, I think it should be {D,E,F}(change B to D), right?

the total order you explained in the PowerPoint is not the same as "http://forum.ai-directory.com/read.php?5,5303", I wonder if there is video of other thing to explain the essential ideal of EFIM

I'm also lost in the page of 20 of the PowerPoint as the for the definition of 4.10, in the second sentence. the local utility is z w.r.t? what does it means?  and the definition of 3.3 is for what purpose?

is there a easy(concrete) example to understanding the definition 4.11(sub-tree utility) ?

can we have more example(concrete) to explain how the primary(itemset) and secondary(itemset) generated?

what does the page 26 suppose to tell audience? what happen to the data U[a], U[c], and U[d] when reading the transaction of T1, is there is connection to the page25?



sorry to ask so many question. although FHM is able for my work to data mining task. I think that, to apply the latest algorithm created by your team to real work would be more valuable and prove the importance of the EFIM.

Options: ReplyQuote
Re: confusion in EFIM algorithm and hope someone can give me more explanation
Date: July 24, 2018 09:04AM

Hello,

Thanks for reading our papers ;-) It is a little bit late, so I will answer the easy questions first, and answer other questions maybe tomorrow.

Philippe

> after I read the article of EFIM and I'm lost at
> certain page.
> correct me if my understanding is not correct.
> The FHM(you created) is able to
> accelerate(improve) the performance of MUI-MINER,
> and I feel quite clear to the explanations of
> power-point, however. when it comes to the EFIM, I
> found confusion on the several areas.

Yes, the powerpoint that I made for FHM is better than the one for EFIM. I totally agree. EFIM is a bit more complicated and I did not make a very detailed powerpoint for it. However, it is explained with a lot of details in the journal paper. But I understand that it is a bit formal and maybe not so easy to understand.

> I wonder if there is video of other thing to explain
> the essential ideal of EFIM

Yes, it is a good idea. There is no video yet but maybe I can record one if I find some time. This week, I am a little bit busy though so I am not sure if I would have time. I am currently also finalizing a book on high utility mining which will contain a survey of high utility itemset mining that is quite easy to read. But it does not focus on EFIM.


> on the PowerPoint of page 11 of EFIM. the
> transaction(T4) contains {B,E,F}according to my
> understanding and based on the context of the
> articles, I think it should be {D,E,F}(change B to
> D), right?

Yes, that B should be a D ;-) That is correct.

> the total order you explained in the PowerPoint is
> not the same as
> "http://forum.ai-directory.com/read.php?5,5303",

Yes, the total order may be different in diferent examples, but normally there always need to be a total order. The total order can for example be the alphabetical order : A < B < C < D < E < ...
Or it can be some other orders. But it is important that there is an order. The reason is that if you don't follow some order to create patterns, you may create the same pattern more than once! And we don't want to do that, of course.

In the example in the PPT, I think I just used the lexicographical order to have some simple example. But in the implementation of EFIM and in the journal paper, we are using the ascending order of TWU values. What does it means? It means that we calculate the TWU of each item, and then we sort the items according to these values. If we use that order, the algorithm is faster than if we use the alphabetical order. You may want to know why? Actually, there is a good reason. If you look at the picture of the search space in Figure 1 of the journal paper:

http://philippe-fournier-viger.com/EFIM_JOURNAL_VERSION%20KAIS%202016.pdf

You will notice that the tree representing the search space is not balanced. In other words, the first branch of the tree (on the left) contains many itemsets, while the last branch (on the right) contains only the itemset {D}. Thus, depending how you select the total order on items, different items will be in the bigger branches of the tree and the algorithm will be able to prune more or less itemsets from the search space. So this is just a simple explanation, but this is why the total order on items is important.

> I'm also lost in the page of 20 of the PowerPoint
> as the for the definition of 4.10, in the second
> sentence. the local utility is z w.r.t? what does
> it means?  and the definition of 3.3 is for
> what purpose?

First, I am not sure if you know that but "w.r.t" means "with respect to". It is an abbreviation commonly used in mathematics and sometimes in computer science.

So I will give you the main idea about why we need these definitions.

As I think you know the utility is not monotonic nor antimonotonic. It means that if you have an itemset like ABC, we don't know if its supersets like ABCD will have a higher utility or a lower utility or the same utility.

To solve this problem, in the different algorithms like FHM and EFIM, we use upper-bounds on the utility that are anti-monotonic to reduce the search space.

In the definition 4.10, which is called the "local utility" we are defining a new upper-bound. When we apply this upper-bound it is always with respect to an itemset ALPHA and an item Z. For example, let's say that the algorithm is currently considering the itemset ABCD. We will say that ALPHA = ABC and Z = D. Thus, if we put them together it is ABCD.

Now we want to decide: do we need to explore the extensions of ABCD such as ABCDE or ABCDEF? To decide if we need to explore, we can use the local utility upper-bound defined with respect to ALPHA U Z (definition 4.10). We calculate this upper bound. And if it is lower than minutil, we do not need to consider the supersets of ALPHA U Z . For example, if you consider ALPHA = ABC and Z = D, and we calculate that the local utility of ABC w.r.t D is 50 but minutil = 60, we know that any extension of ABCD will not be a high utility itemset. So we can reduce the search space.

I tried to tell you why we use that definition.

Now what about definition 3.3? The purpose is that definition is used in definition 4.10 ;-) We need it to be able to introduce 4.10. Actually the definition 3.3 of remaining utility is the same as in HUI-Miner and FHM I think.

I will answer the other questions tomorrow:

>
> is there a easy(concrete) example to understanding
> the definition 4.11(sub-tree utility) ?
>
> can we have more example(concrete) to explain how
> the primary(itemset) and secondary(itemset)
> generated?
>
> what does the page 26 suppose to tell audience?
> what happen to the data U, U, and U when reading
> the transaction of T1, is there is connection to
> the page25?
>
>
>
> sorry to ask so many question. although FHM is
> able for my work to data mining task. I think
> that, to apply the latest algorithm created by
> your team to real work would be more valuable and
> prove the importance of the EFIM.

Options: ReplyQuote
Re: confusion in EFIM algorithm and hope someone can give me more explanation
Posted by: zhaozhao
Date: July 24, 2018 07:48PM

thanks Philippe. i will wait for following explanation.

Options: ReplyQuote


This forum is powered by Phorum and provided by P. Fournier-Viger (© 2012).
Terms of use.