I’m working a lot with TTS (Text-to-Speech), and it’s also a total wild west - even worse than LLMs in some ways. The demos are always perfect, but once you generate hundreds of minutes you start seeing volume drift, pacing changes, random artifacts, and occasional mispronunciations that never show up in the curated clips.
The big difference from LLMs is that we don’t really have production-grade, standardized benchmarks for long-form TTS. We need things like volume-stability across segments, speech-rate consistency, and pronunciation accuracy over a hard corpus.
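To make the first two of those concrete, here’s a rough sketch of how they could be measured. It’s not a standard benchmark and not what I run in production; the function names are made up for illustration, and it assumes each segment is already decoded to float samples in [-1.0, 1.0] with a known word count and duration. Pronunciation accuracy isn’t covered, since that needs a reference corpus and some kind of recognizer or human review.

```python
import math
import statistics

def rms_dbfs(samples):
    """RMS level of one segment in dBFS; samples are floats in [-1.0, 1.0]."""
    if not samples:
        raise ValueError("empty segment")
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    return 20 * math.log10(max(rms, 1e-10))  # floor avoids log10(0) on pure silence

def volume_spread_db(segments):
    """Loudest minus quietest segment level in dB (0.0 means perfectly stable)."""
    levels = [rms_dbfs(seg) for seg in segments]
    return max(levels) - min(levels)

def speech_rate_cv(word_counts, durations_s):
    """Coefficient of variation of words-per-second across segments."""
    rates = [w / d for w, d in zip(word_counts, durations_s)]
    return statistics.pstdev(rates) / statistics.mean(rates)
```

A lower number is better for both: `volume_spread_db` catches drift between segments, and `speech_rate_cv` catches pacing changes. Plain RMS is only a crude stand-in for perceived loudness, so treat the thresholds you pick as something to tune by ear.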
Great question! With the paid plan, you get 70 credits per month, which equals 70 minutes of episodes. You can re-listen to any episodes you’ve already generated as many times as you want, create playlists, and share them with people who aren’t users.
Definitely the latter. From my experience with my kids, they love listening to the same episode multiple times, and it helps them absorb the material better. Also, it makes for a great bedtime routine.
hey, thanks for asking!
short answer - I combine an LLM with TTS models to generate and narrate each episode. But there’s a lot more happening behind the scenes to make sure everything is safe, age-appropriate, and sounds natural every time.
Thanks so much! A lot of parents use it in the car, at bedtime to wind down, or just for some screen-free quiet time. I really wanted to make it easy for them to whip up an episode anytime their kid asks a question.
https://wonderpods.app/ - create custom podcasts for kids