The Data Mining Forum
This forum is about data mining, data science and big data: algorithms, source code, datasets, implementations, optimizations, etc. You are welcome to post calls for papers, data mining job ads, links to source code of data mining algorithms, or anything else related to data mining. The forum is hosted by P. Fournier-Viger. No registration is required to use this forum!
Itemset mining in irregular data?
Posted by: Naalm
Date: November 01, 2018 07:25AM

I have been following the forum for a while. I wonder whether itemset mining can be applied to irregular data such as social network posts. Please guide me in exploring this idea.

Re: Itemset mining in irregular data?
Posted by: Rashid
Date: November 07, 2018 05:00PM

What does "irregular data" mean?


Re: Itemset mining in irregular data?
Date: November 08, 2018 06:37AM

Hi, thanks for following the forum. :-) I think you mean "unstructured data". For example, a text document or a tweet does not have a clear structure. In that case, yes, we could do some pattern mining.

For example, you can treat each sentence of a text as a sequence of symbols (items), and then apply sequential pattern mining to find subsequences of words that appear frequently in tweets or in a text document. In previous work, for example, I mined sequential patterns from books to analyze the writing styles of authors, in this paper:

Pokou J. M., Fournier-Viger, P., Moghrabi, C. (2016). Authorship Attribution Using Small Sets of Frequent Part-of-Speech Skip-grams. Proc. 29th Intern. Florida Artificial Intelligence Research Society Conference (FLAIRS 29), AAAI Press, pp. 86-91

What we call skip-grams in that paper are basically sequential patterns.
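
To make the idea of sequential patterns over sentences concrete, here is a minimal sketch. It is not the method of the paper above (which works on part-of-speech tags); it only counts ordered word pairs, with gaps allowed, and keeps those that occur in at least `min_support` sentences. The function name `frequent_subsequences` and the toy sentences are assumptions made up for illustration.

```python
from itertools import combinations

def frequent_subsequences(sequences, min_support, length=2):
    """Return order-preserving subsequences (gaps allowed) of the given
    length that appear in at least min_support sequences."""
    support = {}
    for seq in sequences:
        # combinations() keeps the original word order, so each tuple is a
        # subsequence; set() counts a pattern at most once per sequence.
        for pattern in set(combinations(seq, length)):
            support[pattern] = support.get(pattern, 0) + 1
    return {p: s for p, s in support.items() if s >= min_support}

# Toy data (hypothetical): each sentence is one sequence of words.
sentences = [
    "the cat sat on the mat".split(),
    "the dog sat on the rug".split(),
    "a cat sat down".split(),
]
patterns = frequent_subsequences(sentences, min_support=2)
for pattern, count in sorted(patterns.items()):
    print(pattern, count)
```

This brute-force enumeration is only practical for short sequences and short patterns; a dedicated sequential pattern mining algorithm would be needed for real text.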

But it is probably more interesting to look for patterns in a book than in short messages like tweets. A tweet is usually very short, and people may not write tweets very carefully, so in my opinion analyzing tweets is more challenging than analyzing a book.

Similarly, if we treat sentences as bags of words (words without order), then we can apply itemset mining. Each transaction is a sentence and each item is a word. This would find, for example, sets of words that are common to multiple sentences.
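
A rough sketch of the bag-of-words view, assuming whitespace tokenization and a naive enumeration of all small itemsets (a real miner such as Apriori or FP-Growth would prune the search instead). The helper `frequent_itemsets`, the `max_size` cap, and the example sentences are hypothetical.

```python
from collections import Counter
from itertools import combinations

def frequent_itemsets(transactions, min_support, max_size=2):
    """Count every itemset of up to max_size items and keep those that
    occur in at least min_support transactions."""
    counts = Counter()
    for transaction in transactions:
        items = sorted(set(transaction))  # a bag of words: no order, no duplicates
        for size in range(1, max_size + 1):
            counts.update(combinations(items, size))
    return {itemset: c for itemset, c in counts.items() if c >= min_support}

# Toy data (hypothetical): each sentence is a transaction, each word an item.
sentences = [
    "data mining finds patterns",
    "pattern mining finds frequent patterns",
    "frequent patterns in data",
]
transactions = [s.split() for s in sentences]
result = frequent_itemsets(transactions, min_support=2)
for itemset, count in sorted(result.items()):
    print(itemset, count)
```

Note that "pattern" and "patterns" are treated as different items here; in practice some stemming or stop-word removal would probably be applied first.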

I think there are various possibilities. I have just mentioned a few that come to mind.

Another possibility for social networks would be to analyze a matrix of "likes", such as on Facebook, where pages are items and each user is a transaction. Each transaction would thus indicate the pages that a user likes. Then we could use itemset mining to find correlations between sets of pages that people like together.
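
The "likes" matrix idea could be sketched as follows, assuming a small binary user-by-page matrix. The page names, the matrix values, and the helpers `matrix_to_transactions` and `co_liked_pairs` are invented for illustration, and only pairs of pages are considered.

```python
from collections import Counter
from itertools import combinations

# Hypothetical data: one row per user, one column per page (1 = user likes page).
pages = ["PageA", "PageB", "PageC", "PageD"]
likes = [
    [1, 1, 0, 1],
    [1, 1, 1, 0],
    [0, 1, 1, 0],
    [1, 1, 0, 0],
]

def matrix_to_transactions(matrix, item_names):
    """Turn each binary row into the set of item names whose flag is 1."""
    return [{name for name, flag in zip(item_names, row) if flag} for row in matrix]

def co_liked_pairs(transactions, min_support):
    """Return page pairs with their relative support (fraction of users
    liking both pages), keeping pairs at or above min_support."""
    counts = Counter()
    for transaction in transactions:
        counts.update(combinations(sorted(transaction), 2))
    n = len(transactions)
    return {pair: c / n for pair, c in counts.items() if c / n >= min_support}

transactions = matrix_to_transactions(likes, pages)
pairs = co_liked_pairs(transactions, min_support=0.5)
print(pairs)
```

High support alone does not show correlation, since a very popular page co-occurs with everything; a measure such as lift or confidence could be computed on top of these counts.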


This forum is powered by Phorum and provided by P. Fournier-Viger (© 2012).