A lot of recent work has advanced the learning of increasingly context-sensitive distributed representations (i.e., so-called ‘embeddings’). In particular, DeepMind’s paper on “Contrastive Predictive Coding” (CPC) is especially interesting and advances the field on a number of fronts. For example, in wav2vec, Facebook AI Research (FAIR) uses CPC to obtain apparently superior acoustic modeling results to DeepSpeech’s connectionist temporal classification (CTC) approach. In the CPC paper, the following image is particularly striking, harkening back to the early notion of a Grandmother Cell.
This is an important paper in the development of neural reasoning capabilities, which should reduce the brittleness of purely symbolic approaches: Neural Logic Machines
The potential reasoning capabilities, such as multi-step inference in problem solving and theorem proving, are most interesting, but there are also important contemporary applications in machine learning and question answering. I’ll just provide a few highlights from the paper on the latter and some more points and references on the former below.
When I wrote Are Vitamins Subject to Sales Tax, I was addressing the process of translating knowledge expressed in formal documents, like laws, regulations, and contracts, into logic suitable for inference using the Linguist.
Recently, Luke Zettlemoyer, one of my favorite researchers working in natural language processing and reasoning, co-authored Entailment-driven Extracting and Editing for Conversational Machine Reading. This is a very nice turn towards knowledge extraction and inference that improves on superficial reasoning by recognizing textual entailment (RTE).
I recommend this paper, which relates to BERT, which is among my current favorites in deep learning for NL/QA. Here is an image from the paper, FYI:
Deep natural language understanding (NLU) is different from deep learning, as is deep reasoning. Deep learning facilitates deep NLP and will facilitate deeper reasoning, but it’s deep NLP for knowledge acquisition and question answering that seems most critical for general AI. If that’s the case, we might call such general AI “natural intelligence”.
Deep learning on its own delivers only the most shallow reasoning and embarrasses itself due to its lack of “common sense” (or any knowledge at all, for that matter!). DARPA, the Allen Institute, and deep learning experts have come to their senses about the limits of deep learning with regard to general AI.
General artificial intelligence requires all of it: deep natural language understanding, deep learning, and deep reasoning. The deep aspects are critical but no more so than knowledge (including “common sense”).
A nationwide physical therapy business is renowned for eliminating chronic pain. One unique aspect of this business is that it eliminates pain not by manipulation but by providing clients with expertly selected sequences of exercises that address problems in their functional anatomy. In effect, the business helps people fix themselves and teaches them to maintain their musculoskeletal function for a pain-free life.
The business has dozens of clinics with many more therapists. Its expertise comes primarily from its founder and a number of long-time employees who have learned through experience. We have been engaged to assist with several challenges on several occasions.
Recently, we “read” roughly ten thousand recipes from a cooking web site. The purpose of doing so was to produce a formal representation of those recipes for use in temporal reasoning by a robot.
Our task was to produce an ontology by reading the recipes, subject to conflicting goals. On the one hand, the ontology was to be accurate so that the robot could reason, plan, and answer questions robustly. On the other hand, the ontology was to be produced automatically (with minimal human effort).
In order to minimize human effort while still obtaining deep parses from which we produce ontology, we used more techniques from statistical natural language processing than we typically do in knowledge acquisition for deep QA, compliance, or policy automation. (Consider that NLP typically achieves less than 90% syntactic accuracy while such work demands near 100% semantic accuracy.)
In the effort, we refined some prior work on representing words as vectors and semi-supervised learning. In particular, we adapted semi-supervised, active learning similar to Stratos & Collins 2015 using enhancements to the canonical correlation analysis (CCA) of Dhillon et al 2015 to obtain accurate part of speech tagging, as conveyed in the following graphic from Stratos & Collins:
This provides some background relating to some work we did on part of speech tagging for a modest, domain-specific corpus. The path is from Hsu et al 2012, which discusses spectral methods based on singular value decomposition (SVD) as a better method for learning hidden Markov models (HMM) and the use of word vectors instead of clustering to improve aspects of NLP, such as part of speech tagging.
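To make the spectral idea a little more concrete, here is a minimal sketch of deriving word vectors by SVD. It is not the method of any of the papers cited here: it assumes a toy corpus, a symmetric one-word context window, and the common CCA-style scaling of each co-occurrence count by the inverse square root of the word and context totals. The function name spectral_word_vectors and its parameters are made up for illustration.

```python
import numpy as np

def spectral_word_vectors(sentences, dim=2, window=1):
    """Toy CCA-style word vectors from word/context co-occurrence counts.

    Hypothetical helper for illustration only; dim must not exceed the
    vocabulary size.
    """
    vocab = sorted({w for sent in sentences for w in sent})
    index = {w: i for i, w in enumerate(vocab)}

    # Count how often each word appears next to each context word.
    counts = np.zeros((len(vocab), len(vocab)))
    for sent in sentences:
        for i, w in enumerate(sent):
            lo, hi = max(0, i - window), min(len(sent), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    counts[index[w], index[sent[j]]] += 1.0

    # Scale each count by 1 / sqrt(total(word) * total(context)).
    # Totals are floored at 1 so isolated words do not divide by zero.
    word_totals = np.maximum(counts.sum(axis=1), 1.0)
    ctx_totals = np.maximum(counts.sum(axis=0), 1.0)
    scaled = counts / np.sqrt(np.outer(word_totals, ctx_totals))

    # The leading left singular vectors give one real-valued row per word.
    u, singular_values, _ = np.linalg.svd(scaled, full_matrices=False)
    return vocab, u[:, :dim], singular_values[:dim]

corpus = [["mix", "the", "flour"], ["bake", "the", "bread"], ["mix", "the", "dough"]]
vocab, vectors, singular_values = spectral_word_vectors(corpus, dim=2)
for word, vec in zip(vocab, vectors):
    print(word, np.round(vec, 3))
```

A real system would use a much larger corpus, richer contexts, and a sparse or randomized SVD, but the shape of the computation is the same: count, scale, decompose, and keep the top few dimensions as features for a tagger.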
The use of vector representations of words for machine learning in natural language is not all that new. A quarter century ago, Brown et al 1992 introduced hierarchical agglomerative clustering of words based on mutual information and the use of those clusters within hidden Markov language models. One notable difference versus today’s word vectors is that paths through the hierarchy of clusters to words at the leaves correspond to vectors of bits (i.e., Boolean features) rather than real-valued features.
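The correspondence between cluster paths and bit vectors can be sketched in a few lines. The hierarchy below is a hand-made toy, not the output of any clustering run, and the helper names bit_paths and prefix_features are hypothetical. Taking prefixes of the bit strings as features is one common way such clusters are used, shown here as an assumption rather than as what Brown et al did.

```python
def bit_paths(tree, prefix=""):
    """Map each leaf word of a binary cluster tree to its root-to-leaf bit string.

    A tree is either a word (str) or a pair (left_subtree, right_subtree).
    Going left appends "0" to the path; going right appends "1".
    """
    if isinstance(tree, str):
        return {tree: prefix}
    left, right = tree
    paths = bit_paths(left, prefix + "0")
    paths.update(bit_paths(right, prefix + "1"))
    return paths

def prefix_features(path, lengths=(1, 2)):
    """Boolean-style features: the first n bits of the path, for each n."""
    return [path[:n] for n in lengths if n <= len(path)]

# Hand-made toy hierarchy (an assumption for illustration only).
toy_tree = (("apple", "pear"), (("run", "walk"), "the"))

paths = bit_paths(toy_tree)
for word, path in sorted(paths.items()):
    print(word, path, prefix_features(path))
```

Words that share a long prefix sit close together in the hierarchy, so short prefixes act as coarse classes and longer prefixes as finer ones.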
Over the last 25 years we have developed two generations of AI systems for physical therapy. The first was before the emergence of the Internet, when Windows was 16 bits. There were no digital cameras, either. So, physical therapists would take Polaroid pictures (anterior, left lateral, and right lateral views) or simply eyeball a patient and enter their posture into the computer using a diagram like the following: