When he says he's using a phaser, he means that updates happen in lockstep. Each update still runs across multiple cores, but no fiber moves on to the next update until all of the other fibers have finished the current one.
So it's not synchronous in the sense of running everything sequentially.
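
To make the lockstep idea concrete, here's a rough sketch using `java.util.concurrent.Phaser`. This is not the author's code: it uses plain threads rather than fibers, and the `LockstepDemo` class, the `run` method and the counters are made up for illustration. Each worker does its "update" for a step, then waits at the phaser until every other worker has finished that same step.

```java
import java.util.concurrent.Phaser;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicIntegerArray;

public class LockstepDemo {
    // Hypothetical helper: runs `workers` threads through `steps` updates in
    // lockstep. Returns how many times a worker got past the barrier before
    // every worker had finished the current step.
    static int run(int workers, int steps) {
        Phaser phaser = new Phaser(workers);              // one party per worker
        AtomicIntegerArray finished = new AtomicIntegerArray(steps);
        AtomicInteger violations = new AtomicInteger();
        Thread[] threads = new Thread[workers];

        for (int w = 0; w < workers; w++) {
            threads[w] = new Thread(() -> {
                for (int s = 0; s < steps; s++) {
                    finished.incrementAndGet(s);          // stand-in for the real update
                    phaser.arriveAndAwaitAdvance();       // block until all workers arrive
                    if (finished.get(s) != workers) {
                        violations.incrementAndGet();     // someone ran ahead
                    }
                }
            });
            threads[w].start();
        }

        for (Thread t : threads) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new RuntimeException(e);
            }
        }
        return violations.get();
    }

    public static void main(String[] args) {
        System.out.println("violations: " + run(8, 100));
    }
}
```

The updates inside one step still run in parallel on however many cores are available. The barrier only stops anyone from starting step N+1 while someone else is still in step N, so `run` should come back with zero violations.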
> When running the simulation synchronously, i.e. with a phaser, performance drops to about 8 cycles per second on my development machine.
> Performance – we are able to fully exploit the computing power of modern multi-core hardware.
So 25% faster with 8 cores counts as "fully exploit the computing power of modern multi-core hardware"? WTF?