This forum is about

What are best sampling methods for top-k Frequent Subgraph Mining

Posted by:
**
maya
**

Date: April 05, 2020 11:26AM

Hello,

I hope that you are all ok,

Would you please let me know what are best sampling methods for top-k Frequent Subgraph Mining?

Best,

Maya

I hope that you are all ok,

Would you please let me know what are best sampling methods for top-k Frequent Subgraph Mining?

Best,

Maya

Posted by:
**
webmasterphilfv
**

Date: April 05, 2020 04:28PM

Hi,

I think that there are a few ways of doing samplings depending on what you want to do.

1) You could do sampling on the input graphs to reduce the size of input database. For example, an input database has 300 graphs, but you decide to do sampling and use only 200 of those.

2) You could do sampling to reduce the search space during the search for frequent subgraphs. Instead of checking all subgraphs in the search space, you will skip some of them. But then, the result would become approximate.

3) Or among the patterns that you have found, you may sample them to get a small set that is representative of all the patterns.

So I think first, you should think about what is your goal about using sampling. Then, if it is clear, you can compare the sampling techniques for either of these goals.

I dont remember the details of the different sampling techniques so I cannot help you much. But I think some of them may be covered in some data mining textbooks or otherwise in some research papers.

I assume that using sampling, a new approach would be a success if it can considerably reduce the runtime but at the same time give you some result that is quite accurate. And perhaps that you also want to control that error, or guarantee the maximum error if you can...

By the way, you can start from the source code of TKG in SPMF and perhaps modify it for your project. It does the Top-k sugraph mining.

Hope this gives you some ideas.

Best regards,

Philippe

I think that there are a few ways of doing samplings depending on what you want to do.

1) You could do sampling on the input graphs to reduce the size of input database. For example, an input database has 300 graphs, but you decide to do sampling and use only 200 of those.

2) You could do sampling to reduce the search space during the search for frequent subgraphs. Instead of checking all subgraphs in the search space, you will skip some of them. But then, the result would become approximate.

3) Or among the patterns that you have found, you may sample them to get a small set that is representative of all the patterns.

So I think first, you should think about what is your goal about using sampling. Then, if it is clear, you can compare the sampling techniques for either of these goals.

I dont remember the details of the different sampling techniques so I cannot help you much. But I think some of them may be covered in some data mining textbooks or otherwise in some research papers.

I assume that using sampling, a new approach would be a success if it can considerably reduce the runtime but at the same time give you some result that is quite accurate. And perhaps that you also want to control that error, or guarantee the maximum error if you can...

By the way, you can start from the source code of TKG in SPMF and perhaps modify it for your project. It does the Top-k sugraph mining.

Hope this gives you some ideas.

Best regards,

Philippe

Posted by:
**
maya
**

Date: April 06, 2020 01:55AM

Hi,

Thank you Dr. Philippe for your reply and help.

Actually I am intersted in the 2nd point wich you mentioned in your reply:

2) You could do sampling to reduce the search space during the search for frequent subgraphs. Instead of checking all subgraphs in the search space, you will skip some of them. But then, the result would become approximate.

If you remember the best sampling methods, I would be thanksfull.

Best Regards,

Maya

Thank you Dr. Philippe for your reply and help.

Actually I am intersted in the 2nd point wich you mentioned in your reply:

2) You could do sampling to reduce the search space during the search for frequent subgraphs. Instead of checking all subgraphs in the search space, you will skip some of them. But then, the result would become approximate.

If you remember the best sampling methods, I would be thanksfull.

Best Regards,

Maya

Posted by:
**
webmasterphilfv
**

Date: April 08, 2020 05:30PM

Hi Maya, that is a good topic.

Ideally, you may want to avoid looking as as many subgraphs as possible while ensuring that most of the frequent subgraphs will still be found. You could eventually calculate the accuracy of your method.

To avoid looking at all subgraphs, you would need to try to focus on the subgraphs that are the most likely to be frequent I think. But maybe you can think about other criteria. I think you should look at papers that have used sampling in pattern mining to get a more clear idea of how to use sampling in pattern mining. Try to find a paper that did it and see how they did it. It will give you some inspiration.

Best regards,

Ideally, you may want to avoid looking as as many subgraphs as possible while ensuring that most of the frequent subgraphs will still be found. You could eventually calculate the accuracy of your method.

To avoid looking at all subgraphs, you would need to try to focus on the subgraphs that are the most likely to be frequent I think. But maybe you can think about other criteria. I think you should look at papers that have used sampling in pattern mining to get a more clear idea of how to use sampling in pattern mining. Try to find a paper that did it and see how they did it. It will give you some inspiration.

Best regards,

Posted by:
**
maya
**

Date: April 09, 2020 05:50AM

Thanks a lot for your reply.