Why is it cheating? We literally teach sports this way. Oftentimes you teach a sport by having people practice in scaled-down scenarios. I see no reason this should be different.
If the goal is to learn how to solve a Rubik's Cube when you've never seen a Rubik's Cube before, you have no idea what "halfway solved" even looks like.
This is precisely how RL worked for learning Atari games: the agent starts from the beginning of the game. You don't start with the game halfway solved and then claim the AI solved the end-to-end problem on its own.
The goal in these scenarios is for the machine to solve the problem with no prior information.
If you sat down to solve a problem you’ve never seen before, you wouldn’t even know what a valid “later state” looks like.