The Data Mining Forum
This forum is about data mining, data science and big data: algorithms, source code, datasets, implementations, optimizations, etc. You are welcome to post calls for papers, data mining job ads, links to source code of data mining algorithms, or anything else related to data mining. The forum is hosted by P. Fournier-Viger. No registration is required to use this forum!
SPMF format for sentiment analysis
Posted by: ani
Date: June 16, 2018 03:17PM

Hi Prof Philippe Fournier-Viger,



I would appreciate your help (if possible) in converting my dataset to the SPMF format, so that running SPMF algorithms on it makes sense.

1- First, I have some concerns about how to map the user IDs of the Facebook users whose posts I have previously labelled. Should I anonymize them and map them to random unique numbers? What do you suggest?
2- What about the timestamp: do I need to keep that field? Does it make sense to map chronological timestamps to 1, 2, 3, 4, ... for each user ID?
3- As for the emotion tags, I am mapping them to integer values 1-7.


Many thanks



Edited 1 time(s). Last edit at 06/20/2018 01:24AM by webmasterphilfv.

Re: SPMF format for sentiment analysis
Date: June 17, 2018 11:17PM

Hello,

Sorry for the delay in answering. I saw your e-mail but was too busy over the last few days. I give some answers, opinions and suggestions below.


How to represent the data is always a good question because, depending on the representation you choose, a data mining algorithm may give you different results.

One possibility is that each sequence represents the sequence of emotions associated with the posts of one politician. You would then have several sequences, since you have several politicians. If you do it that way, you do not need to encode the user ID: the first sequence is the first user, the second sequence is the second user, etc.
I am not sure this is what makes the most sense, but it is the idea I had when reading your message.
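
To make this idea concrete, here is a minimal Python sketch of how labelled posts could be grouped into one emotion sequence per user. The record layout (user ID, timestamp, emotion code), the function name build_sequences and the sample data are assumptions made up for illustration; they are not part of SPMF or of your dataset.

```python
from collections import defaultdict

def build_sequences(posts):
    """Group (user_id, timestamp, emotion) records into one emotion sequence per user.

    The record layout is an assumption made for this sketch.
    """
    by_user = defaultdict(list)
    for user_id, timestamp, emotion in posts:
        by_user[user_id].append((timestamp, emotion))

    sequences = []
    # Dicts keep insertion order, so the first user seen becomes the first sequence.
    for user_id in by_user:
        # Sort each user's posts chronologically; ISO-style timestamps sort correctly as text.
        ordered = sorted(by_user[user_id], key=lambda post: post[0])
        sequences.append([emotion for _, emotion in ordered])
    return sequences

# Hypothetical sample data: emotion codes are integers in 1-7.
posts = [
    ("alice", "2018-06-01T10:00", 3),
    ("bob", "2018-06-01T11:00", 1),
    ("alice", "2018-06-02T09:00", 5),
    ("bob", "2018-05-30T08:00", 2),
]
sequences = build_sequences(posts)
print(sequences)  # [[3, 5], [2, 1]]
```

The user ID is only used for grouping and never appears in the output, so with this representation it does not need to be mapped to a number.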

> 2- What about time stamp, i need to keep that
> field? does it make sense if I map chronological
> time stamps to 1, 2,3,4 ....for each userID?

A sequence already preserves the sequential order of the posts.

Now, is it worth being more specific and also keeping the timestamps? Maybe not. In my software SPMF, only a few algorithms can actually use timestamps. Most of them only care about the sequential order.

Some SPMF algorithms that use timestamps, such as the Hirate-Yamana algorithm, are very strict about how time is handled. For example, the pattern (a time=1)(b time=2) is considered different from the pattern (a time=1)(b time=3) because the time difference between "a" and "b" is not the same in the two patterns. Thus, using timestamps may not always give you good results. So my suggestion is to first try without them, unless you really want to use the algorithms with timestamps.
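
The effect of this strictness can be illustrated with a small Python sketch. It only mirrors the reasoning above (patterns are compared by the time differences between their items) and is not how the Hirate-Yamana algorithm is implemented in SPMF; the function name relative_pattern is hypothetical.

```python
def relative_pattern(events):
    """Re-express (item, time) pairs with times measured from the first event."""
    start = events[0][1]
    return tuple((item, time - start) for item, time in events)

# Same items "a" then "b", but a gap of 1 versus a gap of 2.
p1 = relative_pattern([("a", 1), ("b", 2)])
p2 = relative_pattern([("a", 1), ("b", 3)])
# Same items and the same gap of 1, just starting later.
p3 = relative_pattern([("a", 5), ("b", 6)])

print(p1 == p2)  # False: the time difference between "a" and "b" differs
print(p1 == p3)  # True: the time difference is the same
```

With real posting times the gaps between posts vary a lot, so many occurrences that look alike by order alone would be counted as different patterns.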

> 3- as for the emotion tags I am mapping them to
> integer values 1-7.

This seems reasonable.


I think, yes, you could have sequences like this:

1 -1 2 -1 3 -1 -2

which means that a post with emotion 1 was followed by a post with emotion 2, which was followed by a post with emotion 3. Here -1 marks the end of an itemset (one post) and -2 marks the end of the sequence.
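
A sequence of emotion codes can be turned into a line of this format with a few lines of Python. This is a minimal sketch assuming one emotion per post; the function name to_spmf_line is made up for illustration and is not part of SPMF.

```python
def to_spmf_line(sequence):
    """Format a list of integer emotion codes as one line of an SPMF sequence database.

    Assumes each post carries exactly one emotion, so every itemset has one item.
    """
    parts = []
    for emotion in sequence:
        parts.append(str(emotion))
        parts.append("-1")  # end of the itemset (one post)
    parts.append("-2")      # end of the sequence (one user)
    return " ".join(parts)

line = to_spmf_line([1, 2, 3])
print(line)  # 1 -1 2 -1 3 -1 -2
```

Writing one such line per user to a text file gives an input file with one sequence per line.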

Or as you said, you can try to include timestamps.

By the way, not all algorithms in SPMF use the same input format. The format that I have described above is the one used by most sequential pattern mining algorithms. But you could also consider finding other types of patterns, such as periodic patterns in a single sequence. In that case, another format must be used. I suggest checking the various algorithms in the documentation to see which one best fits what you want to do.



Edited 1 time(s). Last edit at 06/20/2018 01:25AM by webmasterphilfv.



This forum is powered by Phorum and provided by P. Fournier-Viger (© 2012).