Last Thursday (14 February) the nonprofit research lab OpenAI released a new language model that can generate convincing passages of prose. So convincing, in fact, that the researchers have refrained from open-sourcing the code, hoping to forestall its potential weaponization as a means of producing fake news.
Though the impressive results are a remarkable leap beyond what existing language models have achieved, the technique involved isn’t exactly new. Instead, the breakthrough came mainly from feeding the algorithm ever more training data – a trick that has also been responsible for most of the other recent advances in teaching AI to read and write. “It’s quite surprising people in terms of what you can do with … more data and larger models,” says Percy Liang, a computer science professor at Stanford.
The text passages the model produces are good enough to pass as something written by a human. But this ability should not be confused with a genuine understanding of language – the ultimate goal of the AI subfield known as natural-language processing (NLP). (There is an analogue in computer vision: an algorithm can synthesize highly realistic images without any real visual comprehension.) In fact, getting machines to that level of understanding is a task that has largely eluded NLP researchers. Reaching that goal could take years, even decades, Liang estimates, and is likely to require techniques that do not yet exist.
Four different language philosophies currently guide the development of NLP techniques. Let’s start with the one used by OpenAI.
# 1. Distributional semantics
Language philosophy. Words derive meaning from how they are used. For example, the words “cat” and “dog” are related in meaning because they are used in more or less the same way: you can feed and pet a cat, and you can feed and pet a dog. You cannot, however, feed and pet an orange.
How it translates to NLP. Algorithms based on distributional semantics have been largely responsible for the recent breakthroughs in NLP. They use machine learning to process text, finding patterns by essentially counting how often and how closely words are used in relation to one another. The resulting models can then use those patterns to construct complete sentences or paragraphs, and they power things like autocomplete and other predictive text systems. In recent years, some researchers have also begun experimenting with the distributions of arbitrary character strings rather than words, so the models can more flexibly handle acronyms, punctuation, jargon, and other things that don’t appear in the dictionary, as well as languages without clear delimitation between words.
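To make the counting idea concrete, here is a minimal sketch – a toy illustration, not OpenAI’s model or any real library – that builds a co-occurrence vector for each word from a handful of example sentences and then compares those vectors, so “cat” and “dog” end up looking more similar to each other than to “orange.”

```python
# Toy sketch of distributional semantics: words that appear in similar
# contexts end up with similar co-occurrence count vectors.
from collections import Counter, defaultdict
from math import sqrt

corpus = [
    "you can feed a cat", "you can pet a cat",
    "you can feed a dog", "you can pet a dog",
    "you can peel an orange", "you can eat an orange",
]

WINDOW = 2  # how many neighbouring words count as "context"
vectors = defaultdict(Counter)
for sentence in corpus:
    words = sentence.split()
    for i, word in enumerate(words):
        for j in range(max(0, i - WINDOW), min(len(words), i + WINDOW + 1)):
            if j != i:
                vectors[word][words[j]] += 1

def cosine(a, b):
    """Similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

print(cosine(vectors["cat"], vectors["dog"]))     # high: used in the same contexts
print(cosine(vectors["cat"], vectors["orange"]))  # lower: used in different contexts
```

Real systems replace the raw counts with learned vector representations and far larger corpora, but the underlying signal is the same: which words keep each other company.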
Pros. These algorithms are flexible and scalable, because they can be applied to any context and can learn from unlabeled data.
Cons. The models they produce don’t actually understand the sentences they construct; at the end of the day, they are writing prose by way of word associations.
# 2. Frame semantics
Language philosophy. Language is used to describe actions and events, so sentences can be broken down into subjects, verbs, and modifiers – the who, what, where, and when.
How it translates to NLP. Algorithms based on frame semantics use a set of rules or lots of labeled training data to deconstruct sentences. This makes them particularly good at parsing simple commands – and therefore useful for chatbots or voice assistants. For example, if you asked Alexa to “find a restaurant with four stars for tomorrow,” such an algorithm would figure out how to execute the command by breaking it into the action (“find”), the what (“a restaurant with four stars”), and the when (“tomorrow”).
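Here is a toy sketch of that kind of slot filling, using a single hand-written pattern rather than any real voice-assistant pipeline; the pattern, the slot names, and the function are illustrative assumptions.

```python
# Toy frame-semantics parser: a hand-written rule splits a command
# into action / what / when slots.
import re

FRAME_PATTERN = re.compile(
    r"^(?P<action>find|search for|book)\s+"              # the action
    r"(?P<what>.+?)"                                      # the thing asked for
    r"(?:\s+for\s+(?P<when>today|tonight|tomorrow))?$",   # optional time slot
    re.IGNORECASE,
)

def parse_command(text):
    """Return the frame's slots, or None if the command doesn't fit the rule."""
    match = FRAME_PATTERN.match(text.strip())
    return match.groupdict() if match else None

print(parse_command("find a restaurant with four stars for tomorrow"))
# {'action': 'find', 'what': 'a restaurant with four stars', 'when': 'tomorrow'}
```

A command the rule doesn’t anticipate simply fails to parse – which is exactly the inflexibility noted in the cons below.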
Pros. Unlike distributional semantics algorithms, which don’t understand the text they learn from, frame semantics algorithms can distinguish the different pieces of information in a sentence. These can be used to answer questions such as “When is this event taking place?”
Cons. These algorithms can only handle very simple sentences and therefore fail to capture nuance. Because they require a lot of context-specific training, they are also not flexible.
# 3. Model-theoretic semantics
Language philosophy. Language is used to communicate human knowledge.
How it translates to NLP. Model-theoretic semantics is based on an old idea in AI: that all human knowledge can be encoded, or modeled, in a series of logical rules. So if you know that birds can fly, and eagles are birds, you can deduce that eagles can fly. This approach is no longer in vogue, because researchers soon realized there were too many exceptions to every rule (for example, penguins are birds but cannot fly). But algorithms based on model-theoretic semantics are still useful for extracting information from models of knowledge, such as databases. Like frame semantics algorithms, they parse sentences by deconstructing them into parts. But whereas frame semantics defines those parts as the who, what, where, and when, model-theoretic semantics defines them as the logical rules that encode knowledge. For example, consider the question “What is Europe’s largest city by population?” An algorithm of this kind would break it down into a series of self-contained questions: “Which cities are in Europe?” “What are the populations of those cities?” “Which population is the largest?” It would then be able to traverse the model of knowledge to give you your final answer.
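As a rough sketch of that decomposition – querying a tiny, hand-made knowledge model with approximate, purely illustrative figures, not any real database – the steps might look like this:

```python
# Toy model-theoretic query: knowledge lives in an explicit model (a small
# table with approximate, illustrative figures), and the question is answered
# by composing relational operations over it.
cities = [
    {"name": "Istanbul", "continent": "Europe", "population": 15_500_000},
    {"name": "Moscow",   "continent": "Europe", "population": 12_500_000},
    {"name": "London",   "continent": "Europe", "population":  9_000_000},
    {"name": "Cairo",    "continent": "Africa", "population": 10_000_000},
]

# "What is Europe's largest city by population?" broken into steps:
in_europe = [c for c in cities if c["continent"] == "Europe"]  # which cities are in Europe?
largest = max(in_europe, key=lambda c: c["population"])        # which population is the largest?
print(largest["name"])  # -> Istanbul, within this toy model
```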
Pros. These algorithms give machines the ability to answer complex and nuanced questions.
Cons. They require a knowledge model, which is time-consuming to build, and they are not flexible across different contexts.
# 4. Grounded semantics
Language philosophy. Language derives meaning from lived experience. In other words, people created language to achieve their goals, so it must be understood in the context of our goal-oriented world.
How it translates to NLP. This is the newest approach and the one Liang thinks holds the most promise. It tries to mimic how humans pick up language over the course of their lives: the machine starts with a blank state and learns to associate words with the correct meanings through conversation and interaction. In a simple example, if you wanted to teach a computer how to move objects in a virtual world, you would give it a command such as “Move the red block to the left” and then show it what you meant. Over time, the machine would learn to understand and execute commands without help.
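A minimal sketch of that learning-by-demonstration loop might look like the following; the agent class, action names, and voting scheme are invented for illustration, not taken from any published system.

```python
# Toy grounded-semantics agent: it starts with no vocabulary and associates
# each word in a command with the action that was demonstrated alongside it.
from collections import Counter, defaultdict

class BlankSlateAgent:
    def __init__(self):
        # word -> counts of actions demonstrated together with that word
        self.associations = defaultdict(Counter)

    def show(self, command, demonstrated_action):
        """Pair a command with a demonstration of what it means."""
        for word in command.lower().split():
            self.associations[word][demonstrated_action] += 1

    def act(self, command):
        """Vote across word associations to pick the most plausible action."""
        votes = Counter()
        for word in command.lower().split():
            votes.update(self.associations[word])
        return votes.most_common(1)[0][0] if votes else None

agent = BlankSlateAgent()
agent.show("move the red block to the left", "shift_red_block_left")
agent.show("move the blue block to the right", "shift_blue_block_right")
print(agent.act("move the red block to the left"))  # -> shift_red_block_left
```

After enough demonstrations, the shared words (“move,” “block”) stop discriminating between actions and the distinctive ones (“red,” “left”) carry the decision – a crude stand-in for how meaning gets grounded in interaction.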
Pros. In theory, these algorithms should be very flexible and come the closest to a genuine understanding of language.
Cons. Teaching them is very time-consuming – and not all words and phrases are as easy to illustrate as “Move the red block.”
Liang believes the field of NLP will see much more progress in the near term from exploiting existing techniques, particularly those based on distributional semantics. But over the longer term, he believes, they all have limits. “There is probably a qualitative gap between the way humans understand language and perceive the world and our current models,” he says. Closing that gap would probably require a new way of thinking, he adds, as well as much more time.
This originally appeared in our AI newsletter The Algorithm. To have it delivered directly to your inbox, sign up here for free.