Over the past 10 years, commercial AI has enjoyed what we at Amazon call the flywheel effect: customer interactions with AI systems generate data; with more data, machine learning algorithms perform better, which leads to better customer experiences; better customer experiences drive more usage and engagement, which in turn generate more data.
Those data are used to train machine learning systems in three chief ways. The first is supervised learning, in which the training data are hand-labeled (with, say, words’ parts of speech or the names of objects in an image) and the system learns to apply labels to unlabeled data. A variation of this is weakly supervised learning, which uses easily acquired but imprecise labels to enable machine learning at scale. If a website visitor performs a search, for instance, the links she clicks indicate which search results should have been at the top of the list; that kind of implicit information can be used to automatically label data.
Training with entirely unlabeled data is called unsupervised learning. There, the most common approach is to cluster data together according to structural features; the clusters themselves define classification categories. Finally, semi-supervised learning leverages a small amount of labeled training data to extract information from much larger stores of unlabeled training data.
In recent AI research, supervised learning has predominated. But today, commercial AI systems generate far more customer interactions than we could begin to label by hand. The only way to continue the torrid rate of improvement that commercial AI has delivered so far is to reorient ourselves toward semi-supervised, weakly supervised and unsupervised learning. Our systems need to learn how to improve themselves.
The most common approach to semi-supervised learning is self-training, in which a machine learning system trained on a smattering of labeled data itself applies labels to a much larger set of unlabeled data. Because machine learning systems are statistical, their outputs have associated confidence scores. The outputs of the system are sorted according to confidence score, and those that fall within the right confidence window are used to train the system further. The system, in other words, is retrained on data that it has labeled itself.
Typically, self-training works best with high-confidence training examples. But in some contexts, Amazon researchers have found that lower-confidence examples offer greater performance improvements, as they’re more likely to capture nuances that the system hasn’t already learned.
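The self-training loop can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a description of any production system: the one-dimensional nearest-centroid classifier, the function names, and the `low`/`high` confidence-window parameters are all hypothetical choices made for the example.

```python
from math import exp

def train_centroids(examples):
    """examples: list of (x, label) with label in {0, 1}; returns per-class means."""
    sums, counts = {0: 0.0, 1: 0.0}, {0: 0, 1: 0}
    for x, y in examples:
        sums[y] += x
        counts[y] += 1
    return {y: sums[y] / counts[y] for y in (0, 1)}

def predict_with_confidence(centroids, x):
    """Return (label, confidence); confidence lies in [0.5, 1]."""
    d0, d1 = abs(x - centroids[0]), abs(x - centroids[1])
    # Softmax over negative distances gives a crude class probability.
    p1 = 1.0 / (1.0 + exp(d1 - d0))
    return (1, p1) if p1 >= 0.5 else (0, 1.0 - p1)

def self_train(labeled, unlabeled, low=0.9, high=1.0, rounds=3):
    """Repeatedly label the pool and retrain on outputs inside [low, high]."""
    labeled, pool = list(labeled), list(unlabeled)
    for _ in range(rounds):
        centroids = train_centroids(labeled)
        keep, rest = [], []
        for x in pool:
            y, conf = predict_with_confidence(centroids, x)
            (keep if low <= conf <= high else rest).append((x, y))
        if not keep:
            break  # Nothing fell inside the window; stop early.
        labeled.extend(keep)          # The system trains on its own labels.
        pool = [x for x, _ in rest]   # Everything else stays unlabeled.
    return train_centroids(labeled), labeled, pool

seed = [(0.0, 0), (10.0, 1)]
pool = [1.0, 2.0, 8.0, 9.0, 5.1]
model, grown, leftover = self_train(seed, pool)
```

With the default window, the ambiguous point 5.1 is never added to the training set. Calling `self_train(seed, pool, low=0.5, high=0.6, rounds=1)` instead selects only that lower-confidence example, which corresponds to the alternative window described above.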
Another way to leverage small amounts of labeled data is to lump it together with unlabeled data and apply some kind of unsupervised clustering algorithm to the result. For instance, sentences can be automatically embedded in a high-dimensional space, where they’re grouped together according to how frequently their component words co-occur with other words. Then, algorithms can generalize the labels of the labeled sentences to the unlabeled sentences in the same clusters, dramatically expanding the number of training examples available to a natural-language-understanding system.
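As a rough sketch of that idea, the Python below clusters a mix of labeled and unlabeled points and then copies each cluster's majority label to its unlabeled members. It assumes the sentences have already been embedded (the two-dimensional tuples here are stand-ins for real sentence embeddings), and the tiny k-means routine, the function names, and the intent labels are hypothetical.

```python
from collections import Counter

def kmeans(points, k, iters=10):
    """Tiny k-means; seeds centroids with the first k points for determinism."""
    centroids = [points[i] for i in range(k)]
    assign = [0] * len(points)
    for _ in range(iters):
        # Assignment step: nearest centroid by squared Euclidean distance.
        assign = [
            min(range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assign) if a == c]
            if members:
                centroids[c] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return assign

def propagate_labels(embeddings, labels, k):
    """labels[i] is a string for labeled items and None for unlabeled ones."""
    assign = kmeans(embeddings, k)
    result = list(labels)
    for c in range(k):
        known = [labels[i] for i, a in enumerate(assign)
                 if a == c and labels[i] is not None]
        if not known:
            continue  # No labeled member: leave this cluster unlabeled.
        majority = Counter(known).most_common(1)[0][0]
        for i, a in enumerate(assign):
            if a == c and result[i] is None:
                result[i] = majority
    return result

embeddings = [(0.0, 0.0), (5.0, 5.0), (0.1, 0.2), (0.2, 0.1), (5.1, 4.9), (4.8, 5.2)]
labels = ["weather", "music", None, None, None, None]
expanded = propagate_labels(embeddings, labels, k=2)
```

Two labeled sentences become six training examples here. In practice the quality of the propagated labels depends heavily on how well the embedding space separates the categories.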
Companies that depend on machine learning for real-time data classification have an additional option for semi-supervised training: use labeled data to train a powerful but impractically slow neural network, then use that network to label training data for a leaner, more efficient real-time network. Amazon researchers are using this approach across a range of business units.
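The pattern can be illustrated without neural networks at all. In the hedged sketch below, a k-nearest-neighbor classifier plays the role of the slow but accurate model (it scans all labeled data on every query), and a single learned threshold plays the role of the lean real-time model. Both models, the function names, and the toy data are assumptions chosen only to make the example self-contained.

```python
def make_knn_teacher(labeled, k=3):
    """Slow model: sorts the whole labeled set on every prediction."""
    def teacher(x):
        nearest = sorted(labeled, key=lambda ex: abs(ex[0] - x))[:k]
        votes = sum(y for _, y in nearest)
        return 1 if votes * 2 > len(nearest) else 0
    return teacher

def fit_threshold_student(examples):
    """Fast model: pick the threshold t that minimizes errors of `x >= t -> 1`."""
    xs = sorted(x for x, _ in examples)
    candidates = [(a + b) / 2 for a, b in zip(xs, xs[1:])]

    def errors(t):
        return sum((x >= t) != bool(y) for x, y in examples)

    best = min(candidates, key=errors)
    return lambda x: 1 if x >= best else 0

# A small hand-labeled set trains the teacher.
labeled = [(1.0, 0), (2.0, 0), (3.0, 0), (7.0, 1), (8.0, 1), (9.0, 1)]
teacher = make_knn_teacher(labeled)

# The teacher labels a larger unlabeled pool; the student trains on the result.
unlabeled = [0.5, 1.5, 2.5, 3.5, 6.5, 7.5, 8.5, 9.5]
pseudo_labeled = [(x, teacher(x)) for x in unlabeled]
student = fit_threshold_student(pseudo_labeled)
```

The student never sees a hand-labeled example; everything it knows comes from the teacher's outputs, and each of its predictions costs one comparison.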
Frequently, AI companies can also use customer feedback to automatically label data. For instance, the numerical (star) ratings associated with product reviews on Amazon.com could provide automatic data labels for a weakly supervised machine learning system that tries to infer customer sentiment from linguistic cues.
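A minimal sketch of how star ratings might become weak labels follows. The rating cutoffs, the decision to skip three-star reviews, the bag-of-words scorer, and the toy reviews are all assumptions made for illustration; none of them comes from an actual Amazon system.

```python
from collections import Counter

def weak_label(stars):
    """Map a star rating to a noisy sentiment label (cutoffs are assumed)."""
    if stars >= 4:
        return "positive"
    if stars <= 2:
        return "negative"
    return None  # Three-star reviews are ambiguous, so they are skipped.

def build_training_set(reviews):
    """reviews: list of (text, stars); returns (text, label) pairs."""
    labeled = []
    for text, stars in reviews:
        label = weak_label(stars)
        if label is not None:
            labeled.append((text, label))
    return labeled

def word_polarity(training_set):
    """Score each word by (positive reviews containing it) minus (negative ones)."""
    score = Counter()
    for text, label in training_set:
        for word in set(text.lower().split()):
            score[word] += 1 if label == "positive" else -1
    return score

def predict(score, text):
    total = sum(score[word] for word in text.lower().split())
    return "positive" if total > 0 else "negative"

# Made-up toy reviews, for illustration only.
reviews = [
    ("great sound and great battery", 5),
    ("battery died quickly", 1),
    ("works fine", 3),
    ("terrible sound", 2),
    ("great value", 4),
]
training_set = build_training_set(reviews)
polarity = word_polarity(training_set)
```

No one labeled these reviews by hand; the labels are a by-product of ratings that customers supplied for their own reasons, which is what makes the approach scale.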
Customers of the Amazon Alexa voice service don’t typically rate Alexa’s responses to individual requests, but their interactions with Alexa do provide useful implicit signals. If Alexa’s initial response to a request is unsatisfying, the customer might cut the response off and rephrase the request. If the response to the rephrased request is allowed to play out, it’s a strong signal that the first request should have elicited the same response.
Alexa automatically analyzes a large number of such rephrased requests every month, learning how to rewrite the most common of them. That’s why, for instance, if you say to Alexa, “Play ‘Radioactive’ by Magic Dragons,” she’ll respond, “Playing ‘Radioactive’ by Imagine Dragons.”
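The kind of signal described above can be mined from interaction logs with very little machinery. The sketch below is only an illustration: the log format (customer ID, timestamp in seconds, request text, interruption flag), the 30-second window, and the minimum-count threshold are hypothetical, and Alexa's actual rewrite system is not described in this article.

```python
from collections import Counter

def mine_rewrites(log, max_gap_s=30, min_count=2):
    """log: (customer_id, timestamp_s, request, was_interrupted) tuples,
    in time order for each customer. Returns {original: rewrite}."""
    counts = Counter()
    last = {}  # customer_id -> (timestamp, request, was_interrupted)
    for customer, ts, request, interrupted in log:
        prev = last.get(customer)
        if (prev is not None
                and prev[2]                    # first response was cut off
                and not interrupted            # second response played out
                and ts - prev[0] <= max_gap_s  # rephrase came quickly
                and request != prev[1]):
            counts[(prev[1], request)] += 1
        last[customer] = (ts, request, interrupted)

    # Keep the most frequent rewrite for each original request.
    best = {}
    for (src, dst), n in counts.items():
        if n >= min_count and n > best.get(src, (None, 0))[1]:
            best[src] = (dst, n)
    return {src: dst for src, (dst, _) in best.items()}

log = [
    ("a", 0, "play radioactive by magic dragons", True),
    ("a", 10, "play radioactive by imagine dragons", False),
    ("b", 5, "play radioactive by magic dragons", True),
    ("b", 12, "play radioactive by imagine dragons", False),
    ("c", 0, "play radioactive by magic dragons", True),
    ("c", 300, "what's the weather", False),
]
rewrites = mine_rewrites(log)
```

Customer "c" contributes nothing because the follow-up request arrives too long after the interruption to count as a rephrase. The `min_count` threshold keeps one-off coincidences from turning into rewrite rules.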

Currently, Alexa’s rewrite procedures are general: anyone who requests music by Magic Dragons has the same likelihood of receiving music from Imagine Dragons instead. But the underlying technology could be adapted to provide customers with personalized query responses. It may be, for instance, that among the many, many customers requesting music by Magic Dragons, there are a few who are really trying to find the Magic Dragons, the former Wednesday-night house band at the Spread Eagle pub in Ipswich, England.
Amazon researchers are exploring a host of other techniques for unsupervised learning. These range from monitoring the ordinary operating parameters of cloud servers in order to recognize anomalies, to using the Amazon.com product hierarchy to draw connections between customers’ product searches, to bootstrapping natural-language-understanding systems in new languages. The bootstrapping works by automatically translating texts into a language with existing machine learning systems, using those systems to label the text, and automatically translating the labeled text back into the target language.
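To make the first of those techniques concrete, here is one very simple way to flag anomalies in a server metric without any labels: learn what "ordinary" looks like from history, then flag readings that fall far outside it. The z-score rule, the threshold of 3, and the sample CPU readings are assumptions for illustration; the article does not say which method Amazon researchers use.

```python
from statistics import mean, stdev

def find_anomalies(history, recent, z_threshold=3.0):
    """Flag readings more than z_threshold standard deviations from the
    historical mean. Returns (index, value) pairs from `recent`."""
    mu, sigma = mean(history), stdev(history)
    if sigma == 0:
        return []  # A constant history gives no scale to compare against.
    return [(i, x) for i, x in enumerate(recent)
            if abs(x - mu) / sigma > z_threshold]

# Made-up CPU utilization percentages, for illustration only.
history = [50, 52, 48, 51, 49, 50, 53, 47]
recent = [51, 49, 95, 50]
anomalies = find_anomalies(history, recent)
```

No reading was ever labeled "normal" or "anomalous"; the notion of normal is inferred entirely from the structure of the unlabeled history.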