In addition to the two popular small graph datasets for frequent subgraph mining, namely the Chemical and Compound datasets, I have uploaded nine new large graph datasets at the following link:
These datasets include seven bio- and chemo-informatics datasets and two social network datasets. They are in the DIMACS format, the default format used by the gSpan algorithm.
If you need the details of the graph format, you can read Philippe's post at this link.
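As a rough illustration of loading such a file, here is a minimal Python sketch. It assumes the line-oriented layout commonly used by gSpan implementations: `t # <graph id>` starts a graph, `v <vertex id> <label>` declares a vertex, and `e <source> <target> <label>` declares an edge. That layout is an assumption here, not something stated in this post, so please check it against the post linked above. The names `Graph` and `parse_graphs` are hypothetical and only serve the example.

```python
from collections import namedtuple

# Hypothetical container for one graph: id, {vertex id: label}, [(src, dst, label)].
Graph = namedtuple("Graph", ["gid", "vertices", "edges"])


def parse_graphs(lines):
    """Parse an iterable of text lines into a list of Graph tuples.

    Assumed layout (not confirmed by this post):
        t # <graph id>
        v <vertex id> <vertex label>
        e <source id> <target id> <edge label>
    Labels are kept as strings; blank and unrecognized lines are skipped.
    """
    graphs = []
    current = None
    for raw in lines:
        parts = raw.split()
        if not parts:
            continue
        tag = parts[0]
        if tag == "t":
            # Start a new graph; parts looks like ["t", "#", "<id>"].
            current = Graph(int(parts[2]), {}, [])
            graphs.append(current)
        elif tag == "v" and current is not None:
            current.vertices[int(parts[1])] = parts[2]
        elif tag == "e" and current is not None:
            current.edges.append((int(parts[1]), int(parts[2]), parts[3]))
    return graphs
```

To read a downloaded file, pass an open file handle, for example `with open(path) as f: graphs = parse_graphs(f)`, where `path` points to one of the dataset files.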
If you use these graph datasets in your papers or projects, please cite our paper as follows:
Dang Nguyen, Wei Luo, Tu Dinh Nguyen, Svetha Venkatesh, Dinh Phung (2018). Learning Graph Representation via Frequent Subgraphs. SDM 2018, San Diego, USA. SIAM, 306-314.