I used to work in video, and if I remember correctly there are I, P and B frames. You need the I and P frames, but the B frames are optional. So if some metadata is left unencrypted, the server can tell which packets carry B frames and decide not to send them to slow clients. The actual video data is still encrypted.
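Roughly what I have in mind, as a minimal Python sketch. The `Packet` class, its `frame_type` field and the `select_packets` function are all made up for illustration and not taken from any real protocol. It also assumes the B frames are not used as references by other frames, which some newer codecs do allow.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class Packet:
    # Hypothetical cleartext metadata: "I", "P" or "B".
    frame_type: str
    # Encrypted frame data. The server never decrypts or inspects this.
    payload: bytes


def select_packets(packets: Iterable[Packet], client_is_slow: bool) -> List[Packet]:
    """Return the packets to forward, skipping B frames for slow clients."""
    if not client_is_slow:
        return list(packets)
    # Only the unencrypted frame type is looked at; the payload stays opaque.
    return [p for p in packets if p.frame_type != "B"]


stream = [
    Packet("I", b"\x8a\x01"),
    Packet("B", b"\x17\x02"),
    Packet("B", b"\x5c\x03"),
    Packet("P", b"\x9f\x04"),
]

kept = select_packets(stream, client_is_slow=True)
print([p.frame_type for p in kept])  # ['I', 'P']
```

The point is that the server makes the drop decision from the frame type alone and forwards the encrypted payloads untouched.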