The real question is whether it is legal. In theory it is possible with current tech. If I were making such a tool, I would need access to the ebook, then pass it through an LLM (possibly a 7B open-source one) to tag which character is saying what. Once I have the tagged dialogue, I could pass it through ElevenLabs or an open-source TTS and voila, you have an audiobook with different voices.
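To make the tagging step concrete, here is a rough sketch of what "tag which character is saying what" could look like. It is not a real tool: `guess_speaker` is a hypothetical regex stand-in for the LLM call, and the `VOICES` table uses made-up voice names rather than anything from ElevenLabs or another TTS.

```python
import re
from dataclasses import dataclass


@dataclass
class Segment:
    speaker: str  # "narrator", "unknown", or a character name
    text: str


# Straight or curly double-quoted spans.
QUOTE = re.compile(r'"([^"]+)"|“([^”]+)”')

# Matches 'asked Anna' or 'Tom said' directly after a quote.
ATTRIBUTION = re.compile(
    r"\s*(?:(?:said|asked|replied)\s+([A-Z][a-z]+)"
    r"|([A-Z][a-z]+)\s+(?:said|asked|replied)\b)"
)

# Hypothetical speaker-to-voice table; the voice names are placeholders.
VOICES = {"narrator": "voice_a", "Anna": "voice_b", "Tom": "voice_c"}


def guess_speaker(after_quote: str) -> str:
    """Stand-in for the LLM step: read the attribution right after a quote."""
    m = ATTRIBUTION.match(after_quote)
    return (m.group(1) or m.group(2)) if m else "unknown"


def tag_dialogue(text: str, guess=guess_speaker) -> list[Segment]:
    """Split text into narration and quoted speech, tagging each quote."""
    segments, pos = [], 0
    for m in QUOTE.finditer(text):
        narration = text[pos:m.start()].strip()
        if narration:
            segments.append(Segment("narrator", narration))
        quote = m.group(1) or m.group(2)
        segments.append(Segment(guess(text[m.end():]), quote))
        pos = m.end()
    tail = text[pos:].strip()
    if tail:
        segments.append(Segment("narrator", tail))
    return segments


def pick_voice(segment: Segment, voices=VOICES) -> str:
    """Fall back to the narrator voice for speakers without their own voice."""
    return voices.get(segment.speaker, voices["narrator"])
```

Each `Segment` would then go to the TTS with the voice from `pick_voice`. In a real version you would swap `guess_speaker` for a prompt to the model, since a regex like this misses anything other than a plain "X said" right after the quote.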
The real problem is that open-source TTS isn't as good, and I imagine that if you use a paid version you will either run into legal issues or find it too expensive. And can you sell your audiobook? Legal troubles again.
But if you just wanna do it while sailing the high seas, everything should be possible.
You can’t feed a whole book to an open-source LLM in one go. I want to learn about something convenient that can take a book and generate an audiobook in just a few minutes, rather than a process where you have to divide the book into small parts, feed them to an open-source LLM, produce audio from each part, and then merge everything together.
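For reference, the split-and-merge part of that process can at least be scripted rather than done by hand. A minimal sketch, assuming the TTS returns WAV bytes with the same sample rate, sample width and channel count for every chunk. `chunk_paragraphs`, `merge_wav` and `max_chars` are names made up for illustration, and the LLM and TTS calls themselves are left out.

```python
import io
import wave


def chunk_paragraphs(text: str, max_chars: int = 2000) -> list[str]:
    """Group paragraphs into chunks of at most max_chars characters.

    A single paragraph longer than max_chars becomes its own oversized chunk.
    """
    chunks, current = [], ""
    for para in text.split("\n\n"):
        para = para.strip()
        if not para:
            continue
        candidate = f"{current}\n\n{para}" if current else para
        if len(candidate) <= max_chars or not current:
            current = candidate
        else:
            chunks.append(current)
            current = para
    if current:
        chunks.append(current)
    return chunks


def merge_wav(parts: list[bytes]) -> bytes:
    """Concatenate WAV files (given as bytes) into one WAV file.

    Assumes every part uses the same audio format as the first one.
    """
    out = io.BytesIO()
    with wave.open(out, "wb") as writer:
        for i, data in enumerate(parts):
            with wave.open(io.BytesIO(data), "rb") as reader:
                if i == 0:
                    # Copy channels, sample width and frame rate from part one.
                    writer.setparams(reader.getparams())
                writer.writeframes(reader.readframes(reader.getnframes()))
    return out.getvalue()
```

The loop in between would be: for each chunk from `chunk_paragraphs`, get audio from whatever TTS you use, collect the results in order, then call `merge_wav` once at the end. It still won't finish a whole book in a few minutes, but the manual work goes away.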