The output is a string and a lot of the input is strings, so it's not so unusual to treat HTML generation as an exercise in string templating. It also makes some optimizations easier, such as coalescing adjacent output strings. Getting the filtering/escaping right is essential, of course, and it's best if the tooling makes it impossible to output anything but valid HTML.
But it shouldn't be surprising that people turn to string handling for a task where most of the inputs are strings and all of the output is.
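To make that concrete, here is a minimal sketch of the string-templating approach with escaping applied at the point of interpolation. The function name `render_list` and the list markup are just illustrative choices of mine, and this only covers text content inside an element, not attribute values, URLs, or script contexts.

```python
from html import escape


def render_list(items):
    """Render a list of untrusted strings as an HTML <ul> (illustrative helper)."""
    # Collect fragments and join once at the end, rather than
    # building the result through repeated concatenation.
    parts = ["<ul>"]
    for item in items:
        # escape() replaces &, < and >, plus both quote characters by default.
        parts.append(f"<li>{escape(item)}</li>")
    parts.append("</ul>")
    return "".join(parts)


print(render_list(["a & b", "<script>alert(1)</script>"]))
```

Every value passes through `escape()` before it touches the output, so the only markup in the result is the markup the template itself wrote. That is still a long way from guaranteeing valid HTML in general, which is where structured builders or auto-escaping template engines tend to earn their keep.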