You can also try out Stanza in spaCy --- Ines updated the spacy-stanfordnlp wrapper to use the new version pretty much immediately: https://github.com/explosion/spacy-stanza
A friend asked me about Stanza in a private DM; I'll paste the answer here, as I think others might find it helpful:
Q: Are Stanza models more accurate and consistent than spaCy, as this tweet claims?

A: Yeah, definitely. Our models are quite a bit behind state-of-the-art atm because we're still optimized for CPU. We're hoping to have a spacy-nightly up soon that builds on the new version of Thinc.
The main thing we want to do differently is to have shared encoding layers across the pipeline, with several components backproping to at least some shared layers. That took a fair bit of redesign, especially to make sure that people could customize it well.
We never released models built on wide and deep BiLSTM architectures because we see that as an unappealing speed/accuracy trade-off. It also makes the architecture hard to train on few examples, and it's very hyper-parameter intensive, which is bad for Prodigy.
Their experiments do undercount us a bit, especially since they didn't use pretrained vectors for our models, while they did use pretrained vectors for their own and Flair's models. We also perform really poorly on the CoNLL-03 task. I've never understood why --- I hate that dataset. I looked at it and it's like, these soccer match reports, and the dev and test sets don't correlate well. So I've never wanted to figure out why we do poorly on that data specifically.
As an example of what I mean by "undercounting", we can get to 78% on the GermEval data, while their table has us on 68%, and Flair and Stanza are on 85%. So we're still behind, but by less. The thing is, the difference between 85 and 78 is actually quite a lot -- probably more than most people would intuit.
I hope we can get back to them with some updates for specific figures, or perhaps some datasets can be shown as missing values for spaCy. Running experiments with a bunch of different software and making sure it's all 100% comparable is pretty tedious, and it won't add much information. The bottom line anyone should care about is, "Am I likely to see a difference in accuracy between Stanza and spaCy on my problem?" At the moment I think the answer is "yes". (Although spaCy's default models are still cheaper to run on large datasets.)
We're a bit behind the current research atm, and the improvements from that research are definitely real. We're looking forward to releasing new models, but in the meantime you can also use the Stanza models with very little change to your spaCy code, to see if they help on your problem.