Some folks use the term “automatic speech recognition”, ASR. I don’t like the separation between recognition and understanding, but that’s where the technology stands.
The term ASR encourages thinking about spoken language at a technical level in which purely inductive techniques are used to generate text from an audio signal (which is hopefully some recorded speech!).
As you may know, I am very interested in what many in ASR consider “downstream” natural language tasks. Nonetheless, I’ve been involved with speech since Carnegie Mellon in the eighties. While at Haley Systems, I hired one of the Sphinx fellows, who integrated Microsoft and IBM speech products with our natural language understanding software. Now I’m working on spoken-language understanding again…
Most common approaches to ASR these days involve deep learning, such as Baidu’s DeepSpeech. If your notion of deep learning means lots of matrix algebra more than necessarily neural networks, then KALDI is also in the running, but it dates to 2011. KALDI is an evolution from the hidden Markov model toolkit, HTK (once owned by Microsoft). Hidden Markov models (HMM) were the basis of most speech recognition systems dating back to the eighties or so, including Sphinx. All of these are open source and freely licensed.
As everyone knows, ASR performance has improved dramatically in the last 10 years. The primary metric for ASR performance is “word error rate” (WER). Most folks think of WER as the percentage of words incorrectly recognized, although it’s not that simple: it counts substituted, deleted, and inserted words relative to the number of words in the reference transcript. As a result, WER can be more than 1, that is, above 100% (e.g., if you come up with a sentence given only noise!). Here is a comparison published in 2011.
Today, Google, Amazon, Microsoft and others have WER under 10% in many cases. To get there, it takes some talent and thousands of hours of training data. Google is best, Alexa is close, and Microsoft lags a bit in 3rd place. (Click the graphic for the article summarizing Vocalize.io results.)
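To make the WER metric mentioned above concrete, here is a minimal sketch of how it is commonly computed: the word-level edit distance between a reference transcript and the recognizer’s hypothesis, divided by the number of reference words. The function name `word_error_rate` and the whitespace tokenization are assumptions for illustration; production scoring tools also normalize case, punctuation, and number formats, which this sketch does not.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the reference length.

    Illustrative sketch only: tokens are split on whitespace and
    compared exactly, with no text normalization.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        # The ratio is undefined for an empty reference; rejecting it
        # is a choice made for this sketch.
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # i deletions turn i reference words into nothing
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion: reference word missing
                curr[j - 1] + 1,     # insertion: extra hypothesis word
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    print(word_error_rate("the cat sat", "the bat sat"))
    print(word_error_rate("hello", "hello there my friend"))
```

The second call shows why WER is not simply a percentage of misrecognized words: three inserted words against a one-word reference give a rate well above 1.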
Continue reading “Simon Says”
Deep natural language understanding (NLU) is different from deep learning, as is deep reasoning. Deep learning facilitates deep NLP and will facilitate deeper reasoning, but it’s deep NLP for knowledge acquisition and question answering that seems most critical for general AI. If that’s the case, we might call such general AI “natural intelligence”.
Deep learning on its own delivers only the most shallow reasoning and embarrasses itself due to its lack of “common sense” (or any knowledge at all, for that matter!). DARPA, the Allen Institute, and deep learning experts have come to their senses about the limits of deep learning with regard to general AI.
General artificial intelligence requires all of it: deep natural language understanding, deep learning, and deep reasoning. The deep aspects are critical but no more so than knowledge (including “common sense”).
Continue reading “Natural Intelligence”
What is the part of speech of “subject” in the sentence:
- Are vitamins subject to sales tax in California?
Related questions might include:
- Does California subject vitamins to sales tax?
- Does California sales tax apply to vitamins?
- Does California tax vitamins?
Vitamins is the direct object of the verb in each of these sentences, so, perhaps you would think “subject” is a verb in the subject sentence…
Continue reading “Are vitamins subject to sales tax in California?”
The following is motivated by Section 6359 of the California Sales and Use Tax. It demonstrates how knowledge can be acquired from dictionary definitions:
Here, we’ve taken a definition from WordNet and prefixed it with the word followed by a colon and parsed it using the Linguist.
Continue reading “Dictionary Knowledge Acquisition”
In the spring of 2012, Vulcan engaged Automata for a knowledge acquisition (KA) experiment. This article provides background on the context of that experiment and what the results portend for artificial intelligence applications, especially in the area of education. Vulcan presented some of the award-winning work referenced here at an AI conference, including a demonstration of the electronic textbook discussed below. There is a video of that presentation here. The introductory remarks are interesting but not pertinent to this article.
Background on Vulcan’s Project Halo
From 2002 to 2004, Vulcan developed a Halo Pilot that could correctly answer between 30% and 50% of the questions on advanced placement (AP) tests in chemistry. The competing systems relied on sophisticated approaches to formal knowledge representation and expert knowledge engineering. Of three teams, Cycorp fared the worst and SRI fared the best in this competition. SRI’s system performed at the level of scoring a 3 on the AP exam, which corresponds to earning course credit at many universities. The consensus view at that time was that achieving a score of 4 on the AP exam was feasible with limited additional effort. However, the cost per page for this level of performance was roughly $10,000, which needed to be reduced significantly before Vulcan’s objective of a Digital Aristotle could be considered viable.
Continue reading “Background for our Semantic Technology 2013 presentation”