While conventional speech applications limit what you can say in a given application (for example, to the names already in your address book), Vlingo imposes no such limit. Instead, Vlingo systems let you say anything and still be recognized properly. This provides some key benefits over current technologies:

  • Accuracy is greatly improved (and is better than humans on many tasks)
  • Any application can be speech-enabled
  • Because no grammars need to be defined, there is no large up-front effort and no speech expertise is required to speech-enable an application

This is accomplished through advanced adaptation techniques that learn what users are likely to say to different applications. Central to these techniques is a technology called Hierarchical Language Models (HLMs) that allows Vlingo to scale its learning algorithms to millions of vocabulary words and millions of users.

In order to achieve unprecedented accuracy at massive scales, Vlingo technology includes the following components:

Hierarchical Language Model Based Speech Recognition:

We have replaced constrained grammars and conventional statistical language models with very large vocabulary (millions of words) Hierarchical Language Models (HLMs). These HLMs use well-defined statistical models to predict which words users are likely to say and how words are grouped together (for example, "let's meet at ___" is likely to be followed by something like "1 pm" or the name of a place). While there are no hard constraints, the models take into account what this user and other users have spoken into a particular text box of a particular application, and therefore improve with usage. Unlike previous generations of statistical language models, the HLM technology being developed by Vlingo scales to tasks that require modeling millions of possible words (such as open web search, directory assistance, navigation, or other tasks where users are likely to use any of a very large number of words).
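The idea of predicting the next word from context that ranges from a specific text box up to all usage can be illustrated with a toy example. The sketch below is not Vlingo's HLM implementation, which is not described here. It is a minimal illustration that assumes simple bigram counts kept at three hypothetical levels (text box, application, global) and combined with fixed interpolation weights. The class name, function name, weights, and sample sentences are all invented for illustration.

```python
from collections import Counter, defaultdict


class BigramCounts:
    """Bigram counts observed at one level of the hierarchy (hypothetical)."""

    def __init__(self):
        self.following = defaultdict(Counter)

    def observe(self, sentence):
        # Count each (previous word, next word) pair in the sentence.
        words = sentence.lower().split()
        for prev, nxt in zip(words, words[1:]):
            self.following[prev][nxt] += 1

    def prob(self, prev, nxt):
        # Relative frequency of `nxt` after `prev`; 0.0 if `prev` is unseen.
        counts = self.following.get(prev)
        if not counts:
            return 0.0
        return counts[nxt] / sum(counts.values())


def hierarchical_prob(levels, weights, prev, nxt):
    """Interpolate estimates from the most specific to the most general level."""
    return sum(w * level.prob(prev, nxt) for level, w in zip(levels, weights))


# Three assumed levels: one text box, its application, and all usage.
field_lm, app_lm, global_lm = BigramCounts(), BigramCounts(), BigramCounts()
for s in ["meet at starbucks", "meet at starbucks"]:
    field_lm.observe(s)
for s in ["meet at noon", "meet at starbucks"]:
    app_lm.observe(s)
for s in ["look at this", "meet at noon"]:
    global_lm.observe(s)

levels = [field_lm, app_lm, global_lm]
weights = [0.6, 0.3, 0.1]  # assumed values; they sum to 1

p_place = hierarchical_prob(levels, weights, "at", "starbucks")
p_time = hierarchical_prob(levels, weights, "at", "noon")
p_other = hierarchical_prob(levels, weights, "at", "this")
print(p_place, p_time, p_other)
```

In this toy data the text box has mostly heard place names after "at", so a place name scores highest there, while words seen only in general usage still receive a small nonzero probability. That mirrors the "no hard constraints" property described above, although a production system would use far richer models and learned weights.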

Adaptation:

In order to achieve high accuracy, Vlingo makes use of significant amounts of automatic and continual adaptation. In addition to adapting the HLMs, the system adapts to many user and application attributes: it learns the speech patterns of individuals and groups of users, learns new words, learns which words are more likely to be spoken into a particular application or by a particular user, and learns pronunciations of words based on usage. Adaptation is applied to individual users (for example, the system learns over time that a particular user tends to ask for Mexican food) as well as across users (for example, a first-time user with a Southern accent benefits from other users with a Southern accent who have already spoken into the system). Unlike other speech recognition technologies that require intensive manual labor to tune recognition inputs, Vlingo adaptation is automated and comprehensive, leading to continual improvements for users. The adaptation process can be seen in the figure below.
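One way to picture adaptation that works both per user and across users is to blend a user's own word counts with counts pooled over everyone. The sketch below is an assumption-laden illustration, not Vlingo's adaptation algorithm. It uses a standard smoothing scheme in which the pooled distribution acts as a prior worth `prior_strength` pseudo-observations. The class name, parameter, and sample users are hypothetical.

```python
from collections import Counter


class AdaptiveWordModel:
    """Toy word-likelihood model that adapts per user and across users."""

    def __init__(self, prior_strength=2.0):
        # prior_strength must be > 0: the weight given to pooled usage.
        self.prior_strength = prior_strength
        self.population = Counter()
        self.per_user = {}

    def observe(self, user, words):
        # Every utterance updates both the pooled and the per-user counts.
        self.population.update(words)
        self.per_user.setdefault(user, Counter()).update(words)

    def prob(self, user, word):
        pop_total = sum(self.population.values())
        p_pop = self.population[word] / pop_total if pop_total else 0.0
        mine = self.per_user.get(user, Counter())
        n = sum(mine.values())
        # The user's own counts dominate as n grows; a new user (n == 0)
        # falls back entirely on what other users have said.
        return (mine[word] + self.prior_strength * p_pop) / (n + self.prior_strength)


model = AdaptiveWordModel(prior_strength=2.0)
model.observe("alice", ["mexican", "food"])
model.observe("alice", ["mexican", "food"])
model.observe("bob", ["italian", "food"])

p_alice = model.prob("alice", "mexican")  # frequent requester
p_carol = model.prob("carol", "mexican")  # first-time user
p_bob = model.prob("bob", "mexican")      # has only asked for Italian
print(p_alice, p_carol, p_bob)
```

Here the user who often asks for Mexican food gets the highest estimate for "mexican", and a first-time user inherits the pooled estimate. The same blending idea could in principle be applied to groups of users, such as those sharing an accent, by adding an intermediate level of counts.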


You can see examples of how our technology works on our Video Demos page.