To recognize speech, a Speech Recognition Engine requires two types of files. The first, called an Acoustic Model, is created by taking a very large number of transcribed speech recordings (called a Speech Corpus) and ‘compiling’ them into statistical representations of the sounds that make up each word. The second is either a Grammar or a Language Model. A Grammar is a relatively small file containing sets of predefined combinations of words. A Language Model is a much larger file containing the probabilities of certain sequences of words.
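The difference between a Grammar and a Language Model can be illustrated with a toy sketch. The example below is only an illustration under simplifying assumptions: the names `GRAMMAR`, `grammar_accepts`, `train_bigram_model` and the four training sentences are hypothetical, and real engines use their own file formats and far larger data. The Grammar is modelled as a fixed set of allowed word combinations, and the Language Model as unsmoothed two-word (bigram) probabilities estimated from text.

```python
from collections import Counter

# Toy Grammar (hypothetical): a small, explicit set of allowed word combinations.
GRAMMAR = {
    ("call", "home"),
    ("call", "office"),
    ("dial", "home"),
}


def grammar_accepts(words):
    """Return True only if the exact word sequence is predefined in the grammar."""
    return tuple(words) in GRAMMAR


def train_bigram_model(sentences):
    """Estimate P(second word | first word) from raw counts, with no smoothing."""
    pair_counts = Counter()
    first_counts = Counter()
    for sentence in sentences:
        words = sentence.split()
        for first, second in zip(words, words[1:]):
            pair_counts[(first, second)] += 1
            first_counts[first] += 1
    return {
        pair: count / first_counts[pair[0]]
        for pair, count in pair_counts.items()
    }


# Tiny stand-in for training text (hypothetical); a real model needs far more.
TRAINING_TEXT = [
    "call home now",
    "call home later",
    "call office now",
    "dial home now",
]

bigram_model = train_bigram_model(TRAINING_TEXT)

if __name__ == "__main__":
    print(grammar_accepts(["call", "home"]))         # sequence listed in the grammar
    print(grammar_accepts(["call", "home", "now"]))  # sequence not listed
    print(bigram_model[("call", "home")])            # probability of "home" after "call"
```

The Grammar either accepts or rejects a whole phrase, while the Language Model assigns a probability to each word given the word before it, which is why it needs much more data and produces a much larger file.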
Most Acoustic Models used by ‘Open Source’ Speech Recognition engines are ‘closed source’: their creators do not give you access to the speech audio (the ‘source’) used to create the Acoustic Model. Where access is given, there are usually licensing restrictions on the distribution of the ‘source’ (e.g. you can only use it for personal or research purposes).
The reason for this is that there is no free Speech Corpus that is both in a readily usable form and large enough to create good quality Acoustic Models for Speech Recognition Engines. Although there are a few small FOSS speech corpora that could be used to create Acoustic Models, the vast majority of corpora (especially the large corpora best suited to building good Acoustic Models) must be purchased under restrictive licenses.
As a result, Open Source projects that want to distribute their code freely must purchase restrictively licensed Speech Corpora, which limit distribution of the ‘source’ speech audio but allow the projects to distribute any Acoustic Models they create.
VoxForge will address this problem by providing all Acoustic Models and their ‘source’ (i.e. transcribed speech audio) under the GPL license, which requires that the distribution of derivative works include access to the source used to create those works.