The Data Mining Forum
This forum is about data mining, data science and big data: algorithms, source code, datasets, implementations, optimizations, etc. You are welcome to post calls for papers, data mining job ads, links to source code of data mining algorithms or anything else related to data mining. The forum is hosted by P. Fournier-Viger.
No registration is required to use this forum!
Authorship Attribution Question
Date: October 24, 2019 02:00PM
Hi Philippe,
I have read your papers on authorship attribution, and I am interested in how you use the output of TKS to rank the most likely candidates.
It seems you are using a script or customised software to turn the output of TKS into percentages, discard the most common sequences and rank the most likely candidates. Is this script/program available, or is it proprietary?
I have fed the TKS output for 8 authors (K=50) into CART from Salford Systems (a decision tree), and it selected the most probable candidates in a ransom note/kidnapping project. This matches a stylometry analysis I did on the same case a few years ago.
But I would really like to confirm this result with your ranking procedure, if it is available.
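The ranking procedure itself is not described in this thread. Purely as a hypothetical sketch, and not the method from the papers, one could represent each candidate author and the disputed text by the relative frequencies of their top-k sequences, then rank candidates by the smallest sum of absolute frequency differences. The function names `profile_distance` and `rank_candidates`, the choice of distance, and the toy numbers are all assumptions made for illustration.

```python
from typing import Dict, List, Tuple

# Hypothetical profile: maps a sequence (e.g. a pattern mined by TKS,
# written as a tuple of tokens/tags) to its relative frequency in [0, 1].
Profile = Dict[Tuple[str, ...], float]


def profile_distance(a: Profile, b: Profile) -> float:
    """Assumed distance: sum of absolute frequency differences over the
    union of sequences; a sequence missing from a profile counts as 0."""
    keys = set(a) | set(b)
    return sum(abs(a.get(k, 0.0) - b.get(k, 0.0)) for k in keys)


def rank_candidates(authors: Dict[str, Profile],
                    disputed: Profile) -> List[Tuple[str, float]]:
    """Return (author, distance) pairs, closest (most likely) author first.
    Ties are broken alphabetically by author name."""
    scored = [(name, profile_distance(p, disputed)) for name, p in authors.items()]
    return sorted(scored, key=lambda pair: (pair[1], pair[0]))


if __name__ == "__main__":
    # Made-up profiles, for illustration only.
    authors = {
        "A": {("DT", "NN"): 0.40, ("PRP", "VBD"): 0.30},
        "B": {("DT", "NN"): 0.10, ("IN", "DT"): 0.50},
    }
    disputed = {("DT", "NN"): 0.35, ("PRP", "VBD"): 0.25}
    for name, dist in rank_candidates(authors, disputed):
        print(f"{name}: {dist:.2f}")
```

With these toy numbers, author A is ranked first (distance 0.10) and author B second (distance 1.00). Other distance or similarity measures could be substituted in `profile_distance` without changing the ranking logic.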
Also, am I computing the author percentages of the top sequences correctly?
percentage = SUPPORT / total number of sentences by the author
Or should the support be divided by ALL the sentences of all the authors?
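The two normalizations in question can be compared side by side. This sketch assumes that each sentence is given to TKS as one input sequence, so that the support of a pattern is the number of the author's sentences containing it. The function names and counts are hypothetical, and the thread does not say which normalization the papers use.

```python
from typing import Dict


def per_author_percentage(support: int, author_sentences: int) -> float:
    """Support divided by the number of sentences written by that author."""
    if author_sentences <= 0:
        raise ValueError("author_sentences must be positive")
    return 100.0 * support / author_sentences


def corpus_percentage(support: int, sentences_by_author: Dict[str, int]) -> float:
    """Support divided by the total number of sentences of all authors."""
    total = sum(sentences_by_author.values())
    if total <= 0:
        raise ValueError("total number of sentences must be positive")
    return 100.0 * support / total


if __name__ == "__main__":
    # Made-up counts, for illustration only.
    sentences = {"A": 200, "B": 800}
    support_in_a = 50  # sentences of author A that contain the sequence
    print(per_author_percentage(support_in_a, sentences["A"]))  # 25.0
    print(corpus_percentage(support_in_a, sentences))            # 5.0
```

One reason to prefer the per-author version is that it keeps authors with very different amounts of text comparable. The corpus-wide version also reflects how much each author wrote.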
I hope this is clear.
Many thanks,
Tom
Re: Authorship Attribution Question
Date: October 25, 2019 08:16PM
Dear Tom,
Thanks very much for your interest in these papers. I think authorship attribution is a very interesting topic. I worked on it for the project of a student named J.M. Pokou, together with another professor.
I would like to share the code of this project with you. It is all implemented in Java. However, I may no longer have it on my computer, and I would need to search for it, as the project dates from a few years ago. I think the best option would be to send an e-mail to J.M. Pokou at pokoujeanmarc AT GMAIL DOT COM, since he is the student who wrote the code for that project. He should be able to give you the code, and his datasets as well if you need them. He can also answer specific questions that you may have about his project. If you cannot reach him, please let me know.
Best regards,
Philippe