If you have used SailAlign (or HTK) to do forced alignment on a large corpus, you may have already encountered the error: ReadString: String too long. This error is actually thrown by HTK, and a quick search on the Internet returns the web page below.
http://www.ling.ohio-state.edu/~bromberg/htk_problems.html
The solution according to the page is:
Make changes to the pronunciation dictionary:
Replace all multiple spaces with single space;
Replace all tabs with single space;
Put a backslash (\) before every double quote (");
Put a backslash (\) before any dictionary entry beginning with a single quote (').
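The four dictionary changes above can be sketched as a small Python helper. This is only a rough sketch, not a tool from HTK or SailAlign: the function names are made up for illustration, and it assumes one dictionary entry per line with the word as the first token.

```python
import re


def normalize_dict_line(line: str) -> str:
    """Apply the four fixes listed above to a single dictionary line."""
    # Tabs and runs of spaces both collapse to one space.
    line = re.sub(r"[ \t]+", " ", line.strip())
    # Put a backslash before every double quote that is not already escaped.
    line = re.sub(r'(?<!\\)"', r'\\"', line)
    # Put a backslash before an entry that begins with a single quote.
    if line.startswith("'"):
        line = "\\" + line
    return line


def normalize_dict_text(text: str) -> str:
    """Normalize every non-empty line of a dictionary held in one string."""
    fixed = [normalize_dict_line(l) for l in text.splitlines() if l.strip()]
    return "\n".join(fixed) + "\n"
```

Running normalize_dict_line on an already fixed line leaves it unchanged, so it should be safe to apply the helper more than once to the same file.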
And this actually solves the problem, which is quite annoying since the error message “String too long” gives no clue about this solution. Moreover, you will also have to make the same changes to the transcript given to SailAlign to avoid seeing the same problem with HDecode.
I spent a lot of time checking the dictionary and reducing the length of the input data to get rid of the error, only to find out that those suspects were irrelevant. Fortunately I found the problem right in the transcript, and SailAlign now runs without a hitch.
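Since the error message does not say which line is at fault, a quick scan of the transcript or dictionary can save some of that time. The sketch below is my own illustration (the function name is hypothetical, and it only checks the rules listed earlier in this post): it reports lines containing tabs, multiple spaces, double quotes without a backslash, or words that start with a single quote without a backslash.

```python
import re


def find_problem_lines(lines):
    """Return (line_number, reason) pairs for lines that break the rules."""
    problems = []
    for number, line in enumerate(lines, start=1):
        text = line.rstrip("\n")
        if "\t" in text:
            problems.append((number, "tab"))
        if "  " in text:
            problems.append((number, "multiple spaces"))
        # A double quote with no backslash directly in front of it.
        if re.search(r'(?<!\\)"', text):
            problems.append((number, "unescaped double quote"))
        # A single quote at the start of the line or right after whitespace.
        if re.search(r"(?:^|\s)'", text):
            problems.append((number, "word starting with single quote"))
    return problems
```

For example, passing the lines of a transcript file read with open(path).readlines() returns a list that can be printed and worked through one entry at a time.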
An example like this would be good:
The error may be caused by a sentence like the one below:
LET 'EM PLAY
Solution:
Write like this in .mlf -> LET "'EM" PLAY
Write like this in .dic -> "'EM" DH EH M
Replace tabs (\t) with a single space, and collapse any run of multiple spaces into a single space.
My issue is solved using the above.
Thanking you,
Senjam Shantirani
Thanks for the example!