You can also try out Stanza in spaCy --- Ines updated the spacy-stanfordnlp wrapper to use the new version pretty much immediately: https://github.com/explosion/spacy-stanza
A friend asked me about Stanza in a private DM; I'll paste the answer here, as I think others might find it helpful:
Q: Are Stanza models more accurate and consistent than spaCy, as this tweet claims?

A: Yeah, definitely. Our models are quite a bit behind state-of-the-art atm because we're still optimized for CPU. We're hoping to have a spacy-nightly up soon that builds on the new version of Thinc.
The main thing we want to do differently is to have shared encoding layers across the pipeline, with several components backproping to at least some shared layers. That took a fair bit of redesign, especially to make sure that people could customize it well.
We never released models built on wide and deep BiLSTM architectures because we see that as an unappealing speed/accuracy trade-off. It also makes the architecture hard to train on few examples, and it's very hyper-parameter intensive, which is bad for Prodigy.
Their experiments do undercount us a bit, especially since they didn't use pretrained vectors for our models, while they did use pretrained vectors for their own and Flair's models. We also perform really poorly on the CoNLL-03 task. I've never understood why --- I hate that dataset. I looked at it and it's like, these soccer match reports, and the dev and test sets don't correlate well. So I've never wanted to figure out why we do poorly on that data specifically.
As an example of what I mean by "undercounting", we can get to 78% on the GermEval data, while their table has us on 68%, and Flair and Stanza are on 85%. So we're still behind, but by less. The thing is, the difference between 85 and 78 is actually quite a lot -- probably more than most people would intuit.
I hope we can get back to them with some updates for specific figures, or perhaps some datasets can be shown as missing values for spaCy. Running experiments with a bunch of different software and making sure it's all 100% comparable is pretty tedious, and it won't add much information. The bottom line anyone should care about is, "Am I likely to see a difference in accuracy between Stanza and spaCy on my problem?" At the moment I think the answer is "yes". (Although spaCy's default models are still cheaper to run on large datasets.)
We're a bit behind the current research atm, and the improvements from that research are definitely real. We're looking forward to releasing new models, but in the meantime you can also use the Stanza models with very little change to your spaCy code, to see if they help on your problem.