2 days ago

webmasterphilfv

Dear all,
Just to let you know that the PDFs of the articles from the UDM 2018 workshop on utility-driven mining are online at:
http://philippe-fournier-viger.com/utility_mining_workshop_2018/program.php
Best regards,
Philippe

2 days ago

webmasterphilfv

Hi,
Yes, I think you are right. There seems to be a bug in the implementation. Thanks for reporting it. I should release a new version of SPMF in about a week and a half because I will have a week of holiday. I will then fix the bug, and also add several new algorithms related to high utility itemset mining that some people have sent me recently.
By the way, I will also add your name to th

9 days ago

webmasterphilfv

Thanks Dang,
I see. An XML format. Personally, I am not very fond of XML-based formats. They waste a lot of space with all these tags. The text-based format of SPMF already takes a lot of space because it is a text file. An XML format could make the output files 10 times larger or more. Just my opinion. But I understand that it can be useful for interoperability with other software.
Is it w

9 days ago

webmasterphilfv

It looks like an interesting concept. I wish you good luck with your product.
Philippe

9 days ago

webmasterphilfv

IEEE Big Data 2018 Call for Workshop Papers & Posters
2018 IEEE International Conference on Big Data (BigData 2018)
http://cci.drexel.edu/bigdata/bigdata2018/index.html
Dec 10-13 2018, Seattle, WA, USA
The IEEE Big Data 2018 has received more than 600 full papers in the main conference and industry and government program. If you miss the submission deadline, there are still chances f

11 days ago

webmasterphilfv

Hi,
Thanks for using SPMF. I do not know what the PMML format is. But if you are comfortable with Java, you could modify the code that writes the rules to the file. This should not be hard. In SPMF, each algorithm is in a separate package. So you could first find the code of the algorithm that you want to modify and then change the code.
But what is PMML? Can you give me a link to a websit

13 days ago

webmasterphilfv

Hi,
There are many possible topics. You can choose to work on something more fundamental, like algorithm design, or something more applied, such as how to best solve a given applied problem.
A good way of choosing a research problem is to look at some recent papers and find something that you are interested in. Personally, I am quite interested in pattern mining problems and algorithm desi

23 days ago

webmasterphilfv

Hi all,
It is my pleasure to announce that my data mining blog is now also available in Chinese:
The data mining blog (Chinese).
Roughly every week, some articles will be translated into Chinese and posted on this Chinese version of the blog. I will not translate all the content of my English blog, but the most important posts will be translated. Besides, some guest authors may also write blog p

25 days ago

webmasterphilfv

You may export the data from the database to a text file in the proper format and then apply Apriori to the text file to obtain the result.
But it depends on your implementation of Apriori. If you are using the implementation from the SPMF software, then you should read the documentation to see which format is required as input.
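As a rough illustration of the export step, here is a minimal Python sketch. It assumes the target format is one transaction per line, with items written as integers separated by single spaces, sorted and without duplicates. The table name `purchases` and its columns `transaction_id` and `item_id` are hypothetical, so adapt them to your database and check the documentation of your Apriori implementation for the exact format.

```python
import sqlite3


def rows_to_lines(rows):
    """Group (transaction_id, item_id) rows into one text line per transaction."""
    transactions = {}
    for tid, item in rows:
        # A set removes duplicate items inside the same transaction.
        transactions.setdefault(tid, set()).add(int(item))
    # One line per transaction, items in ascending order, space-separated.
    return [" ".join(str(i) for i in sorted(transactions[tid]))
            for tid in sorted(transactions)]


def export(conn, out_path):
    """Write the transactions of a (hypothetical) 'purchases' table to a text file."""
    rows = conn.execute("SELECT transaction_id, item_id FROM purchases")
    with open(out_path, "w") as out:
        for line in rows_to_lines(rows):
            out.write(line + "\n")
```

For example, `export(sqlite3.connect("shop.db"), "transactions.txt")` would produce a file that can then be given as input to the algorithm, where `shop.db` is again just a placeholder name.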
Best

28 days ago

webmasterphilfv

If your data has time information then yes.

8 weeks ago

webmasterphilfv

Hello,
Thanks for reading our papers ;-) It is a little bit late, so I will answer the easy questions first, and answer other questions maybe tomorrow.
Philippe
> after I read the article of EFIM and I'm lost at
> certain page.
> correct me if my understanding is not correct.
> The FHM(you created) is able to
> accelerate(improve) the performance of MUI-MINER,
>

2 months ago

webmasterphilfv

Hello all,
As many of you know, I have a data mining blog that talks about various topics related to data mining and research:
http://data-mining.philippe-fournier-viger.com/
Recently, I have tried to update the blog once every week. I have already prepared weekly blog posts through the end of August.
If you are interested, you can have a look ;-)
Philippe

2 months ago

webmasterphilfv

The most likely reason is that you need to adjust the parameters. It is possible that in your training data, there is no sequence that matches the sequence that you want to predict. Thus, no prediction is given. If you adjust the parameters of the prediction model, I think you will get some predictions ;-)
By the way, sorry for the delay to answer. I have been a bit busy during the last few da

2 months ago

webmasterphilfv

Hello,
Pradivana Wrote:
-------------------------------------------------------
> Hi, i did some research using spmf sequence
> pattern library and i would like to know is there
> anyway to show the accuracy from spmf library?
> especially for Markov, CPT and TDAG algorithm
>
> I've read the documentation and i think i miss
> something, please help, and i'm sorry f

2 months ago

webmasterphilfv

Hi,
Thanks for using SPMF!
I checked the paper just now, and I think that it is not explained clearly what Checking_and_removing_item() is supposed to do. Thus, it would be hard to implement this function. If you want to implement it, I think that you should contact the authors of the paper to ask for more details about what this function is supposed to do. But with the current paper, I think we don

2 months ago

webmasterphilfv

Hello,
In algorithms like HUI-Miner, the items are sorted according to a total order.
What is a total order? It means that there is some order between the items. For example, it could be the alphabetical order. According to the alphabetical order, an item "a" must be processed before an item "b", and "b" must be processed before an item "c".
The al
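
To make the idea of a total order concrete, here is a small Python sketch. It is not the code of HUI-Miner. It only illustrates one common reason for fixing an order: if an itemset is only ever extended with items that come later in the order, each itemset is generated exactly once. The function name `enumerate_itemsets` is made up for this example, and the default order is the alphabetical order mentioned above.

```python
def enumerate_itemsets(items, key=None):
    """List every non-empty itemset once, following a total order on items.

    'key' defines the total order (default: natural/alphabetical order).
    """
    ordered = sorted(items, key=key)
    result = []

    def extend(prefix, start):
        # Only append items located after the last item of the prefix
        # in the total order, so no itemset is produced twice.
        for idx in range(start, len(ordered)):
            itemset = prefix + [ordered[idx]]
            result.append(itemset)
            extend(itemset, idx + 1)

    extend([], 0)
    return result
```

Calling `enumerate_itemsets(["c", "a", "b"])` processes "a" before "b" and "b" before "c", so the itemset {a, b} appears as `['a', 'b']` and never as `['b', 'a']`.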

2 months ago

webmasterphilfv

Hello,
Sure, if you want to discuss this in the forum, you can share details. Does it improve the performance significantly? If so, your improvement could be integrated into the SPMF library and you could become a contributor.
Best regards,
Philippe

2 months ago

webmasterphilfv

Hi,
If you want to use sequential patterns to generate rules, then you could check the RuleGen algorithm from SPMF, which can do that. But it would need to be modified because this algorithm does not consider timestamps. I mean you could maybe draw inspiration from that...
Another algorithm in SPMF that finds rules with a window constraint is TRuleGrowth. But it does not consider t

2 months ago

webmasterphilfv

Hi,
In HUI-Miner, you should only combine two itemsets if they are identical except for one item. Thus, you should not combine the itemset
d,f,g
with the itemset
f,g,b
because they have two different items (d and b).
This is one part of the problem.
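The check can be sketched in a few lines of Python. This is only an illustration, not the SPMF code. It assumes that each itemset is stored as a tuple of items sorted according to the total order used by the algorithm, and it reads "identical except for one item" as "same items in the same positions except for the last one". The function names `can_combine` and `combine` are made up for this example.

```python
def can_combine(a, b):
    """True if a and b have the same length, share every item except the
    last one (same prefix), and have different last items."""
    return (len(a) == len(b) and len(a) > 0
            and a[:-1] == b[:-1] and a[-1] != b[-1])


def combine(a, b):
    """Join two itemsets with a common prefix: (P, x) and (P, y) give (P, x, y)."""
    if not can_combine(a, b):
        raise ValueError("itemsets do not share the same prefix")
    return a + (b[-1],)
```

With this check, `can_combine(("d", "f", "g"), ("f", "g", "b"))` is `False`, while two itemsets such as `("d", "f", "g")` and `("d", "f", "h")` can be combined into `("d", "f", "g", "h")`.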
Best regards,

2 months ago

webmasterphilfv

Hello,
If you clicked on "GENERATE_DATASET.bat" to generate the dataset, it should generate a sequence database, where each line is a sequence.
But I have not used that IBM generator for a long time, so I do not remember how it works. There are some database generators in my SPMF library that are perhaps easier to use.
Best regards,

3 months ago

webmasterphilfv

In frequent subgraph mining, you typically have edges and vertices that have names. For example, you could have a graph about a water molecule, and you would have two nodes that have the same label "Hydrogen" and one node with the label "Oxygen":
Hydrogen ---- Oxygen ----- Hydrogen
Now, when you check if two graphs are isomorphic, yes, you need to check that the structure
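
For small graphs, such a check can be sketched by brute force in Python: try every mapping of the vertices of one graph to the vertices of the other, and keep a mapping only if it preserves both the vertex labels and the edges. This is only an illustration, not how frequent subgraph mining algorithms do it in practice, since trying all permutations is far too slow for large graphs. As a simplification, the edges are undirected and unlabeled here, and the function name `are_isomorphic` is made up for this example.

```python
from itertools import permutations


def are_isomorphic(labels_a, edges_a, labels_b, edges_b):
    """Brute-force isomorphism test for small vertex-labeled undirected graphs.

    labels_*: list of vertex labels, where the index is the vertex id.
    edges_*: iterable of (u, v) pairs of vertex ids.
    """
    norm_a = {frozenset(e) for e in edges_a}
    norm_b = {frozenset(e) for e in edges_b}
    if len(labels_a) != len(labels_b) or len(norm_a) != len(norm_b):
        return False
    n = len(labels_a)
    for perm in permutations(range(n)):
        # perm[v] is the vertex of graph B assigned to vertex v of graph A.
        if any(labels_a[v] != labels_b[perm[v]] for v in range(n)):
            continue  # the mapping does not preserve vertex labels
        if all(frozenset(perm[v] for v in e) in norm_b for e in norm_a):
            return True  # labels and edges are both preserved
    return False
```

With the water molecule, `are_isomorphic(["H", "O", "H"], [(0, 1), (1, 2)], ["O", "H", "H"], [(0, 1), (0, 2)])` is true because only the numbering of the vertices differs. A chain Hydrogen ---- Hydrogen ---- Oxygen has the same shape but is not isomorphic to it, because the labels do not match.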

3 months ago

webmasterphilfv

Those are mostly the same thing.
In simple words, two graphs are isomorphic if we can map the vertices and edges of one graph onto those of the other so that the two graphs are equivalent.
Subgraph isomorphism checking is the same thing. But since you add the word "subgraph", it means that you are comparing subgraphs of a graph to check if these subgraphs are equivalent.
Yes, the idea of graph isomorphism is

3 months ago

webmasterphilfv

Hello,
Yes, I call these class sequential rules. There is this algorithm in SPMF:
the TopSeqClassRules algorithm for mining the top-k class sequential rules
that does that.
It will let you select {i} to find the k most frequent sequential rules of the form X --> {i}.
This algorithm is similar to RuleGrowth but modified to do that.
Best
Philippe

3 months ago

webmasterphilfv

Hello,
Sorry for the delay in answering. I saw your e-mail but was too busy in the last few days. I will provide some answers, opinions and suggestions below.
How to represent the data is always a good question because, depending on how you represent the data, you may obtain different results using a data mining algorithm.
A possibility could be that each sequence represents a sequence o

3 months ago

webmasterphilfv

Hi Victor,
I see. There is no implementation in SPMF that does exactly that. It could be done, I think, but it would require some programming to modify the algorithm, and it can be more or less complicated. If one modifies it, then one would need to check that the algorithm remains correct, and sometimes combining two ideas results in an algorithm that cannot find all the patter

3 months ago

webmasterphilfv

Hi,
1) the likely reason is that the input format is not correct.
At the end of each itemset, there should be a -1 as a separator. For example, the first sequence should be in this format:
<10> 42 45 -1 <11> 31 42 45 -1 <20> 18 23 31 42 45 -1 <36> 48 -1 -2
It is the same for the other sequences.
2) Yes, if you have a pattern:
<0> 1 2 <1> 3
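
To check an input file against the format described in point 1, a small parser can help. The following Python sketch is only an illustration and is not part of SPMF. It assumes the layout shown above: a timestamp between < and >, then the items of the itemset, a -1 at the end of each itemset, and a -2 at the end of the sequence. The function name `parse_sequence` is made up for this example.

```python
def parse_sequence(line):
    """Parse one sequence line into a list of (timestamp, items) pairs.

    Raises ValueError if an itemset is not terminated by -1.
    """
    itemsets = []
    timestamp, items = None, []
    for token in line.split():
        if token == "-2":  # end of the sequence
            break
        if token == "-1":  # end of the current itemset
            itemsets.append((timestamp, items))
            timestamp, items = None, []
        elif token.startswith("<") and token.endswith(">"):
            if items:
                # A new timestamp started before the previous itemset was closed.
                raise ValueError("missing -1 before " + token)
            timestamp = int(token[1:-1])
        else:
            items.append(int(token))
    if items:
        raise ValueError("last itemset is not terminated by -1")
    return itemsets
```

If a -1 is missing, the function raises an error instead of silently merging two itemsets, which makes this kind of formatting problem easier to spot.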

3 months ago

webmasterphilfv

Great. I will fix the error in the documentation. Thanks for reporting it.

3 months ago

webmasterphilfv

Hi Victor,
I will answer your question below.
> So how large this large should be set to make it
> exact not approximate algorithm? Does it depends
> on data size?
The problem with the maximal time interval constraint is that if you apply this constraint when doing closed sequential pattern mining, you may miss some patterns. If you don't care about missing a few patterns

3 months ago

webmasterphilfv

Dear Victor,
> My current way of defining itemset is that for a
> sequence of events that a customer took in the
> history, an itemset is the events happened in the
> same day. So in the final frequent sequence, I am
> able to know this sequence covers how many days.
> But I also want to reserve the original order of
> events in an itemset. So what if I reserve the
>

