What is the meaning of bond in the CORI algorithm?

Posted by: **xiaowei**

Date: May 24, 2020 09:37AM

Hi everyone. I'd like to use the CORI algorithm and have read the related reference, but it isn't clear to me. Could you explain the algorithm and the bond measure further? Thanks a lot.

Posted by: **webmasterphilfv**

Date: May 24, 2020 04:29PM

Hi Xiaowei,

CORI is an algorithm for finding sets of values that do not appear together very frequently but are highly correlated.

The SPMF documentation of CORI is here:

http://www.philippe-fournier-viger.com/spmf/CORI.php

It contains an example.

Also, here is the link to the paper:

http://www.philippe-fournier-viger.com/spmf/cori.pdf

The idea of the bond measure is the following.

Let's say that you have a set of items {a,b,c,d}.

The BOND is the number of transactions (records) in the database that contain a, b, c and d together, divided by the number of transactions that contain at least one item from {a,b,c,d}.

If the BOND is equal to 1, it means that a, b, c and d always appear together. In other words, if a transaction contains a, then it also contains b, c and d. If it contains b, it also contains a, c and d. The same holds for c and for d.

If the BOND is equal to 0, it means that a, b, c and d never all appear together in the same transaction.

So in general, a BOND closer to 1 means a higher correlation, while a BOND closer to zero means a lower correlation.
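
To make the definition concrete, here is a minimal Python sketch of the bond computation as described above. It is only an illustration and is not taken from SPMF: the function name `bond` and the toy database are assumptions invented for this example.

```python
def bond(itemset, transactions):
    """Bond = (# transactions containing ALL items) / (# containing AT LEAST ONE)."""
    items = set(itemset)
    n_all = sum(1 for t in transactions if items <= set(t))  # every item present
    n_any = sum(1 for t in transactions if items & set(t))   # at least one present
    return n_all / n_any if n_any else 0.0


# Hypothetical toy database, invented for illustration.
transactions = [
    {"a", "b", "c", "d"},
    {"a", "b", "c", "d"},
    {"a", "e"},
    {"e"},
]

print(bond({"a", "b", "c", "d"}, transactions))  # 2 of 3 transactions -> about 0.667
print(bond({"b", "c", "d"}, transactions))       # always together -> 1.0
print(bond({"b", "e"}, transactions))            # never together -> 0.0
```

In this toy data, {a,b,c,d} appears together in 2 transactions, while 3 transactions contain at least one of its items, so the bond is 2/3.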

That is the main idea.

If something else is not clear, you can ask.

Best regards,

Posted by: **xiaowei**

Date: May 24, 2020 06:06PM

Thank you very much. Now I have a better understanding of bond.

But I have another question. Which threshold values for support and bond should I use to get efficient, good-quality results?

Kind regards,

Posted by: **webmasterphilfv**

Date: May 24, 2020 06:15PM

Hi again,

Glad it helps.

How to choose the threshold values depends on your data.

For some datasets, there could be no patterns with a support of 0.1, while for other datasets, there could be millions of patterns having a support of 0.99. So there is not really a way of knowing what a good support value is for your data without testing it.

You could first apply an algorithm like FPGrowth to find the frequent itemsets using a high minimum support like 0.99 and slowly decrease that value until you find some patterns. This would tell you the highest support in your data. For example, it could be 0.7.
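
As a rough sketch of that search, the Python code below lowers the minimum support in 1% steps until at least one itemset is found. Note that it uses a brute-force enumeration as a stand-in for FPGrowth, which is only practical on tiny data. The function names and the toy database are assumptions made up for this example and are not part of SPMF.

```python
from itertools import combinations


def support(itemset, transactions):
    """Fraction of transactions that contain every item of `itemset`."""
    return sum(1 for t in transactions if itemset <= t) / len(transactions)


def frequent_itemsets(transactions, minsup):
    """Brute-force stand-in for FPGrowth; only suitable for tiny toy data."""
    items = sorted(set().union(*transactions))
    found = {}
    for size in range(1, len(items) + 1):
        for combo in combinations(items, size):
            sup = support(frozenset(combo), transactions)
            if sup >= minsup:
                found[frozenset(combo)] = sup
    return found


def highest_support_threshold(transactions, start_pct=99):
    """Lower minsup from start_pct% in 1% steps until some itemset is frequent."""
    for pct in range(start_pct, 0, -1):
        minsup = pct / 100
        patterns = frequent_itemsets(transactions, minsup)
        if patterns:
            return minsup, patterns
    return None, {}


# Hypothetical toy database, invented for illustration.
transactions = [{"a", "b"}, {"a", "c"}, {"a", "b", "c"}, {"b"}, {"d"}]

minsup, patterns = highest_support_threshold(transactions)
print(minsup)                                        # 0.6
print(sorted("".join(sorted(p)) for p in patterns))  # ['a', 'b']
```

Here the search stops at 0.6 because the items a and b each appear in 3 of the 5 transactions, and nothing is more frequent than that.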

Then for CORI, you would want to set the maxsup parameter to a value that is less than 0.7 because the goal is to find rare patterns. After that, it depends on how rare you want the patterns to be. If you set, for example, maxsup = 0.2, then the patterns cannot appear in more than 20% of the records. But if you set maxsup = 0.01, then patterns cannot appear in more than 1% of the records. At some point, you do not want to set maxsup too low either, because patterns that appear in just a few transactions may not be significant.

Then for the bond, a higher bond value is better. If you set minbond too low, you will find uncorrelated patterns, and if you set it too high, you will find nothing. Here again, it depends on your data. It is possible that in your data there exist no patterns with a bond greater than 0.5, or that there are millions of patterns with a bond of 0.9. So you need to test it.

So generally, you can start with some strict parameter settings like minbond = 0.9 and then decrease the parameter until you find some patterns. If you find too many patterns you can set the parameters more strictly.

Increasing minbond will reduce the number of patterns.

Decreasing minbond will increase the number of patterns.

Decreasing maxsup will reduce the number of patterns.

Increasing maxsup will increase the number of patterns.
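
The effect of the two parameters can be seen on a small example. The Python sketch below is a brute-force illustration, not the CORI algorithm itself and not the SPMF API. The function names and the toy database are hypothetical, the parameter names `maxsup` and `minbond` follow the naming used in this thread, and single items are skipped as a simplifying assumption because a single item always has a bond of 1.

```python
from itertools import combinations


def support_and_bond(itemset, transactions):
    """Return (support, bond) of `itemset` in the list of transactions."""
    n_all = sum(1 for t in transactions if itemset <= t)  # contain every item
    n_any = sum(1 for t in transactions if itemset & t)   # contain at least one
    return n_all / len(transactions), (n_all / n_any if n_any else 0.0)


def rare_correlated_itemsets(transactions, maxsup, minbond):
    """Brute force: keep itemsets with 0 < support <= maxsup and bond >= minbond."""
    items = sorted(set().union(*transactions))
    result = {}
    # Assumption: start at size 2, since a single item always has bond 1.
    for size in range(2, len(items) + 1):
        for combo in combinations(items, size):
            itemset = frozenset(combo)
            sup, bnd = support_and_bond(itemset, transactions)
            if 0 < sup <= maxsup and bnd >= minbond:
                result[itemset] = (sup, bnd)
    return result


# Hypothetical toy database of 10 transactions, invented for illustration.
transactions = [
    {"a", "b"}, {"a", "b"}, {"a", "c"}, {"c", "d"}, {"c", "d"},
    {"c"}, {"e", "f"}, {"a"}, {"c"}, {"a", "c"},
]

# Start strict, then relax minbond: the number of patterns grows.
for minbond in (0.9, 0.3, 0.2):
    found = rare_correlated_itemsets(transactions, maxsup=0.2, minbond=minbond)
    print("maxsup=0.2 minbond=%.1f ->" % minbond, len(found), "patterns")

# Decreasing maxsup shrinks the result again.
found = rare_correlated_itemsets(transactions, maxsup=0.1, minbond=0.2)
print("maxsup=0.1 minbond=0.2 ->", len(found), "patterns")
```

On this toy data, the pattern counts are 1, 3 and 4 as minbond is relaxed from 0.9 to 0.2, and the count drops back to 1 when maxsup is lowered to 0.1. The only pattern with a bond of 1 is {e,f}, which appears in a single transaction, so it also illustrates why a very low maxsup can return patterns that may not be significant.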

Edited 2 time(s). Last edit at 05/24/2020 06:18PM by webmasterphilfv.

Posted by: **xiaowei**

Date: May 25, 2020 07:46AM

Thanks a lot.

Kind regards,
