
I must tell you that your intuitions are not wrong.

Many language models do predict.

In this case, they either try to predict what the next word is (or character, or sub-character in the case of Chinese, Japanese, etc.; this is entirely the data scientist's decision), or what some "masked" words are.
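To make the "masked" objective concrete, here's a toy sketch of how a training example is built for it (the function name and the [MASK] token are just illustrative conventions, not from any particular library):

```python
import random

def make_masked_example(tokens, mask_token="[MASK]"):
    # Pick one position, hide its word, and keep the original
    # word as the prediction target. A masked language model is
    # trained to recover `target` from the surrounding context
    # on BOTH sides of the mask.
    i = random.randrange(len(tokens))
    target = tokens[i]
    masked = tokens[:i] + [mask_token] + tokens[i + 1:]
    return masked, i, target

sentence = ["the", "cat", "sat", "on", "the", "mat"]
masked, i, target = make_masked_example(sentence)
print(masked, "-> predict", repr(target), "at position", i)
```

The key difference from next-word prediction is that the model sees context to the left *and* right of the hidden word.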

    w_i becomes w_(i-1) at the next step of the sequence,
    where w_i is the last word generated
The models trained to predict the next word are the ones that make good generators.
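To make the w_i / w_(i-1) shuffle concrete, here's a minimal sketch of that generation loop. The "model" is just a hypothetical bigram lookup table standing in for a real language model; everything else (names, table contents) is made up for illustration:

```python
# Toy next-word "model": maps the previous word to a predicted
# next word. A real LM would output a probability distribution
# over the whole vocabulary instead.
bigram = {
    "the": "cat",
    "cat": "sat",
    "sat": "on",
    "on": "the",
}

def generate(first_word, n_steps):
    sequence = [first_word]
    for _ in range(n_steps):
        # The last word generated (w_i) becomes the input
        # (w_(i-1)) for predicting the next word.
        next_word = bigram.get(sequence[-1])
        if next_word is None:
            break
        sequence.append(next_word)
    return sequence

print(generate("the", 4))  # -> ['the', 'cat', 'sat', 'on', 'the']
```

Each prediction is fed back in as input, which is why models trained this way can keep generating text indefinitely.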
