Hello and thanks for your response!
I think what still confuses me is the so-called escape mechanism proposed by the PPM algorithm. Here is the part of the paper I am talking about:
So, as I understand it, PPM also trains all Markov models up to the maximal order, and uses the highest order possible to make the prediction, only falling back to a lower order if a prediction cannot be made with the higher one. I have created a figure demonstrating the training phase of PPM for the sequence 'ABABAC' using hash tables. It only shows the tables relevant to each encoding step, leaving out all the other order-2 and order-1 tables that are not updated during that step:
It shows the training of an order-2 PPM model, incrementing the counts of the encountered symbols in the respective tables for each order and each observed context.
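The training step I describe above could be sketched roughly like this (a minimal sketch, not the SPMF code; I ignore the escape-symbol bookkeeping here and only maintain the raw symbol counts per context):

```python
from collections import defaultdict

def train_ppm(sequence, max_order=2):
    # tables[k] maps a context (tuple of k symbols) to a dict of symbol counts.
    # For every position we update one table per order, exactly as in my graphic.
    tables = {k: defaultdict(lambda: defaultdict(int)) for k in range(max_order + 1)}
    for i, symbol in enumerate(sequence):
        for k in range(max_order + 1):
            if i >= k:  # only once enough preceding symbols exist for this order
                context = tuple(sequence[i - k:i])
                tables[k][context][symbol] += 1
    return tables

tables = train_ppm("ABABAC", max_order=2)
print(dict(tables[1][("B",)]))  # counts of symbols observed after context B
```

After training on 'ABABAC', the order-1 table for context B holds a count of 2 for A, matching the second-to-last step in my graphic.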
Let's say that after training this model, one wants to predict the item following the sequence <B,B>. Since BB has not been observed in the training sequence, the highest entry in the order-2 table for the BB context (O2-BB) would be the 1 for the escape symbol; all others are still 0. The model therefore escapes order 2 and moves on to order 1. It now drops the first B and looks into the order-1 table for the B context (O1-B), where the symbol A has the highest count of 2 (updated in the second-to-last step shown in my graphic). It will therefore predict A as the next symbol.
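The escape-and-fall-back step, as I understand it, could be sketched like this (again a simplification: instead of reserving an explicit escape count, I simply treat an unseen context as an immediate escape to the next lower order):

```python
from collections import defaultdict

# Rebuild the count tables for 'ABABAC', same scheme as in the training sketch.
seq = "ABABAC"
tables = {k: defaultdict(lambda: defaultdict(int)) for k in range(3)}
for i, sym in enumerate(seq):
    for k in range(3):
        if i >= k:
            tables[k][tuple(seq[i - k:i])][sym] += 1

def predict(tables, context, max_order=2):
    # Start at the highest order and escape downwards while the
    # context has never been observed (no counts recorded for it).
    for k in range(min(max_order, len(context)), -1, -1):
        counts = tables[k].get(tuple(context[-k:]) if k else ())
        if counts:  # context was observed: predict its most frequent symbol
            return max(counts, key=counts.get), k
    return None, -1  # nothing learned at any order

print(predict(tables, ("B", "B")))  # BB is unseen at order 2, so it escapes to order 1
```

For the context <B,B> this returns the prediction A from the order-1 table, exactly the walk-through above.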
To me this sounds exactly like your description of how AKOM works, which is why I do not understand why AKOM has such a massively higher complexity and memory usage.
Maybe your implementation of PPM does not actually fall back to lower-order models when a symbol has not yet been observed in the larger context. If your implementation in SPMF only ever uses the maximal order specified, that would explain why it is so much smaller.
If PPM works the way I described here, do you see any difference between PPM and AKOM at all?
I hope my explanations and my graphics were understandable!