Name: Text-to-Speech Synthesis
Rating: 4.19 (3 reviews)
ISBN: 9780521899277

158 reviews, 13 followers

March 6, 2018

There are many Natural Language Processing books that cover text-to-speech synthesis. Similarly, there are many books that cover particular facets of speech synthesis (prosody in speech synthesis, synthesizers for a specific language). Yet Paul Taylor's Text-to-Speech Synthesis seems to be one of the few books dedicated solely to this topic.

The book is pretty comprehensive: it goes through text segmentation, part-of-speech analysis, signal processing, acoustic models of speech, and finally speech and prosody synthesis. The author focuses on "first generation" speech synthesis (formant synthesis) as well as concatenative and HMM synthesis. Articulatory synthesis is mentioned only in passing, and the book was published before the deep learning boom, so neural network synthesis is not covered either.

I was impressed by how focused the book is. The author goes into detail about the many engineering trade-offs one can make when the domain is restricted to speech synthesis alone. For example, full part-of-speech tagging is often unnecessary when the different readings of a word are homophones, and homographs can often be disambiguated on a case-by-case basis by simpler algorithms (e.g., bass/fish versus bass/guitar may be told apart by a simple classifier trained on the neighboring words).
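To make the bass/fish versus bass/guitar idea concrete, here is a minimal sketch of a classifier trained on neighboring words. It is not taken from the book: the choice of naive Bayes with add-one smoothing, the toy training sentences, the window size, and all function names (`context_words`, `train`, `classify`) are my own assumptions for illustration.

```python
import math
from collections import Counter, defaultdict

# Toy training data, invented for illustration (not from the book).
# Each item is (sentence containing "bass", sense label).
TRAINING = [
    ("he caught a huge bass in the lake", "fish"),
    ("the bass swam near the river bank", "fish"),
    ("grilled bass is served with lemon", "fish"),
    ("she plays bass in a jazz band", "guitar"),
    ("turn up the bass on the amplifier", "guitar"),
    ("the bass line drives the song", "guitar"),
]


def context_words(sentence, target="bass", window=3):
    """Return the words within `window` positions of the first target occurrence."""
    tokens = sentence.lower().split()
    if target not in tokens:
        return []
    i = tokens.index(target)
    return tokens[max(0, i - window):i] + tokens[i + 1:i + 1 + window]


def train(examples):
    """Count context words per sense (the statistics naive Bayes needs)."""
    word_counts = defaultdict(Counter)
    sense_counts = Counter()
    for sentence, sense in examples:
        sense_counts[sense] += 1
        word_counts[sense].update(context_words(sentence))
    vocab = {w for counts in word_counts.values() for w in counts}
    return word_counts, sense_counts, vocab


def classify(sentence, model):
    """Pick the sense with the highest log-probability given the context words."""
    word_counts, sense_counts, vocab = model
    total = sum(sense_counts.values())
    best_sense, best_score = None, -math.inf
    for sense in sorted(sense_counts):
        score = math.log(sense_counts[sense] / total)  # log prior
        denom = sum(word_counts[sense].values()) + len(vocab)
        for word in context_words(sentence):
            # Add-one smoothing so an unseen word does not zero out the score.
            score += math.log((word_counts[sense][word] + 1) / denom)
        if score > best_score:
            best_sense, best_score = sense, score
    return best_sense


model = train(TRAINING)
print(classify("he hooked a bass in the lake", model))
print(classify("she plays bass in the band", model))
```

In a real front end the predicted sense would select a pronunciation from the lexicon; the point of the sketch is only that a handful of context-word counts can stand in for a full tagger on a single troublesome word.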

The author claims to present a practical engineering book, but in fact Text-to-Speech Synthesis is very light on practicalities. Being spoiled by programming books, I expected source code, or at least large sections of pseudocode. Instead, the book is scant on details, and I doubt I could recreate even half of the described methods without further investigation.

The book is broad in scope but not very deep. Given that breadth, the author couldn't dwell on signal processing or go deep into NLP techniques, and some ideas couldn't be explained in detail. The author does a good job of explaining simple concepts, but I felt quite lost at times, for example in the chapters on acoustic models of speech.

One other pretty big annoyance was the persistent errors in the bibliography: wrong entries, missing entries, and telltale signs of LaTeX misuse, such as "[?]" where a bibliographical entry should be.

All in all, I was somewhat disappointed by Text-to-Speech Synthesis, but it is a fairly approachable introductory book on the topic of speech synthesis.

nlp-related owned-books
