One of my previous posts describes my first attempt to generate training data for an HTS system from recordings and transcripts: How to create full-context labels for your HTS system (update: not really worked). Unfortunately, it did not work as expected. During eNTERFACE’14, I learned that festvox includes a tool named EHMM that can help build the .utt files (and, in turn, the full-context labels) easily.
To start, follow the steps at the following link, which gives the full instructions for building a CLUSTERGEN voice: http://festvox.org/bsv/c3170.html.
You can stop once you have obtained the .utt files and use them for your own purposes.
In general, the necessary steps are listed below: