One of my previous posts, "How to create full-context labels for your HTS system (update: not really worked)", describes my first attempt to generate training data for an HTS system from recordings and transcripts; unfortunately, it did not work as expected. During eNTERFACE’14, I learned that festvox includes a tool named EHMM that makes it easy to build the .utt files (and, in turn, the full-context labels).
To start, follow the steps at this link, which gives the full instructions for building a CLUSTERGEN voice: http://festvox.org/bsv/c3170.html.
You can stop as soon as you have obtained the .utt files and use them for your own purposes.
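The CLUSTERGEN build reads your transcripts from a prompt file, etc/txt.done.data, with one utterance per line in the form ( fileid "text" ), where fileid matches the name of the wav file. If your transcripts live somewhere else, a minimal Python sketch like the one below could generate that file. It assumes the transcripts are already in a dict keyed by wav basename; that layout and the helper names are hypothetical, and dropping embedded double quotes (rather than escaping them) is my own simplifying assumption.

```python
from pathlib import Path


def format_prompt(file_id: str, text: str) -> str:
    """Return one txt.done.data line of the form ( file_id "text" )."""
    # Assumption: embedded double quotes would break the quoted string,
    # so they are dropped here; runs of whitespace are collapsed to one space.
    clean = " ".join(text.replace('"', "").split())
    return f'( {file_id} "{clean}" )'


def write_txt_done_data(transcripts: dict, out_path: Path) -> None:
    """Write one prompt per line, sorted by file id (hypothetical helper)."""
    lines = [format_prompt(fid, txt) for fid, txt in sorted(transcripts.items())]
    out_path.parent.mkdir(parents=True, exist_ok=True)
    out_path.write_text("\n".join(lines) + "\n", encoding="utf-8")


if __name__ == "__main__":
    # Hypothetical transcripts for wav/utt_0001.wav and wav/utt_0002.wav.
    transcripts = {
        "utt_0002": "Hello   world.",
        "utt_0001": 'She said "yes".',
    }
    for fid, txt in sorted(transcripts.items()):
        print(format_prompt(fid, txt))
```

Inside your voice directory you would then call write_txt_done_data(transcripts, Path("etc/txt.done.data")) before running the build steps from the instructions.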
Last month (from June 9 to July 4), I had the chance to go to Spain to work on a project named ZureTTS (which means "Your Text-to-Speech" in Basque) during the eNTERFACE’14 workshop. This was very exciting, given the unique opportunity to work with graduate students and researchers in speech processing from all around Europe for 4 full working weeks (and the opportunity to live in Spain for a month as well).
Basically, ZureTTS is an online platform where anyone can obtain a speech synthesis engine that speaks with their own voice. The technology itself was not new; the project was an effort to integrate multiple modules and technologies into a complete and accessible system. Users go to the website, register for an account, record their voice reading 100 predefined sentences in a chosen language, and then start the adaptation process. The adaptation from the average (generic) voice to the personalized voice is queued and run sequentially in the background on the main server, and each job may take from 1 to 10 hours. When the adaptation has finished, a notification email is sent, and users can return to the website to synthesize speech in their own voice. There is also a web API for synthesis, so that mobile apps can plug into the system in the future.
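To make "queued and run sequentially" concrete, here is a minimal Python sketch of that pattern: a single worker thread takes adaptation jobs from a FIFO queue one at a time and sends a notification when each finishes. This is not the ZureTTS code; AdaptationJob, adapt_voice and send_notification are hypothetical stand-ins, and the real system ran on a server rather than in one process.

```python
import queue
import threading
from dataclasses import dataclass


@dataclass
class AdaptationJob:
    user_email: str
    recordings_dir: str


def adapt_voice(job: AdaptationJob) -> str:
    """Hypothetical stand-in for the slow adaptation step (1 to 10 hours)."""
    return f"{job.recordings_dir}/adapted_model"


def send_notification(email: str, model_path: str) -> None:
    """Hypothetical stand-in for the notification email."""
    print(f"mail to {email}: your voice is ready ({model_path})")


def worker(jobs: queue.Queue, completed: list) -> None:
    # Only one worker thread consumes the queue, so jobs run strictly
    # one after another in the order they were submitted (FIFO).
    while True:
        job = jobs.get()
        try:
            if job is None:  # sentinel value: stop the worker
                return
            model_path = adapt_voice(job)
            send_notification(job.user_email, model_path)
            completed.append(job.user_email)
        finally:
            jobs.task_done()


if __name__ == "__main__":
    jobs: queue.Queue = queue.Queue()
    done: list = []
    t = threading.Thread(target=worker, args=(jobs, done))
    t.start()
    jobs.put(AdaptationJob("alice@example.com", "/data/alice"))
    jobs.put(AdaptationJob("bob@example.com", "/data/bob"))
    jobs.put(None)
    t.join()
```

Running jobs one at a time keeps a long adaptation from competing with another for the same server; the trade-off is that a user who joins the queue late may wait several jobs' worth of hours.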
I am very happy that we were able to complete the working system described above in only 4 weeks. Of course, there are still many possible improvements, but the basic flow of the system is complete, and the system is also quite decoupled and extensible. You can try it out at http://aholab.ehu.es/zuretts (the server sometimes blocks you for 20 to 30 minutes; I am still not sure why).
And below is a photo of the awesome team and awesome new friends.
And below is a photo of Prof. Inma Hernaez, our second project leader and the one who took the photo above, and me.