We use a pretty standard tech stack of PyTorch + NCCL + MPI. We've used both OpenMPI and MPICH to varying degrees.
Kubeflow is interesting, but it solves a slightly different problem of scheduling/coordinating ML workflows on top of Kube. It doesn't get involved with how an ML job communicates within itself cross-node.
OP was probably referring to the MPIOperator, TFOperator, PyTorchOperator, ... They live under the Kubeflow org but can be deployed independently of Kubeflow itself. Several other projects use those operators to provide abstractions similar to the ones you mentioned in your blog post, e.g. gang scheduling, cross-node communication, ...
One difference is that these operators use the Kubernetes service interface for communication, generally exposing a headless service for each replica.
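For context, this maps cleanly onto PyTorch's standard `env://` rendezvous: the operator injects `MASTER_ADDR`/`MASTER_PORT`/`RANK`/`WORLD_SIZE` into each replica pod, with `MASTER_ADDR` being the DNS name of the rank-0 pod's headless service. A minimal sketch of how a worker might read those values (the service DNS name in the comment is a hypothetical example, and the fallbacks are just for local runs):

```python
import os

def rendezvous_config():
    """Collect the torch.distributed env:// rendezvous settings that the
    training operator injects into each replica pod."""
    return {
        # MASTER_ADDR resolves via the headless service for the rank-0 pod,
        # e.g. "myjob-master-0.myjob.default.svc.cluster.local" (hypothetical)
        "master_addr": os.environ.get("MASTER_ADDR", "localhost"),
        "master_port": int(os.environ.get("MASTER_PORT", "29500")),
        "rank": int(os.environ.get("RANK", "0")),
        "world_size": int(os.environ.get("WORLD_SIZE", "1")),
    }

# In a real worker these values feed the process-group init, e.g.:
# torch.distributed.init_process_group("nccl", init_method="env://")
```

Because rendezvous goes through ordinary cluster DNS, the job needs no MPI-style hostfile plumbing; the headless service per replica is what makes the pod addresses stable.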