https://github.com/mlang/llm-tts
Strictly speaking, even music generation fits the usage pattern: text in, audio out.
llm-tts is far from complete, but it makes it relatively "easy" to try a few models in a uniform way.