Increasingly, to cut down on both training time and data collection, natural language processing researchers are turning to cross-lingual transfer learning, a technique that entails training an AI system in one language before retraining it in another. For instance, scientists at Amazon’s Alexa division recently employed it to adapt an English language model to German. And in a new paper (“Cross-lingual Transfer Learning for Japanese Named Entity Recognition”), scheduled to be presented at the upcoming conference of the North American chapter of the Association for Computational Linguistics in Minneapolis, they expanded the scope of their work to transfer an English-language model to Japanese.
“Transfer learning between European languages and Japanese has been little-explored because of the mismatch between character sets,” explained Alexa AI Natural Understanding Group researcher Judith Gaspers in a blog post. To solve this, she and colleagues devised a named-entity recognition system — a system trained to identify the names in utterances and to categorize these names (e.g., song names, sports team names, city names) automatically — that took as inputs both Japanese characters and their Roman-alphabet transliterations.
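To make the task concrete, the toy example below sketches what the system's inputs and outputs might look like; the utterance, tokenization, transliteration, and label names are invented for illustration and are not taken from the paper.

```python
# Hypothetical example of the named-entity recognition task described above.
utterance = ["東京", "の", "天気"]        # "weather in Tokyo"
romanized = ["toukyou", "no", "tenki"]    # Roman-alphabet transliteration of each word

# The model consumes both views of each word and assigns one entity label per word,
# here tagging the first word as a city name and the rest as non-entities.
labels = ["B-CityName", "O", "O"]
```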
As with most natural language systems, the inputs were in the form of embeddings (word embeddings and character embeddings) produced by a model trained to represent data as vectors, or strings of coordinates. The system first split each word into its component characters and then mapped both words and characters into a multidimensional space, such that words with similar meanings received embeddings close to each other.
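A minimal sketch of such embedding layers, written here in PyTorch, is shown below; the vocabulary sizes, embedding dimensions, and ids are illustrative assumptions rather than values from the paper.

```python
import torch
import torch.nn as nn

# Assumed vocabulary sizes and embedding dimensions, for illustration only.
WORD_VOCAB, CHAR_VOCAB = 50_000, 500
WORD_DIM, CHAR_DIM = 100, 25

word_embedding = nn.Embedding(WORD_VOCAB, WORD_DIM)  # one vector per word
char_embedding = nn.Embedding(CHAR_VOCAB, CHAR_DIM)  # one vector per character

# Words and characters are mapped to integer ids elsewhere; the layers simply look up
# the corresponding vectors, which training pushes close together for similar meanings.
word_ids = torch.tensor([[3, 17, 42]])   # a batch containing one three-word utterance
char_ids = torch.tensor([[5, 9, 9, 2]])  # the characters of a single word
word_vectors = word_embedding(word_ids)  # shape: (1, 3, 100)
char_vectors = char_embedding(char_ids)  # shape: (1, 4, 25)
```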
The characters of each word were embedded separately and then passed to a bidirectional long short-term memory (LSTM) AI model, which processed them both forward and backward so that each output reflected both the characters that preceded it and those that followed it. The output of the character-level bidirectional LSTM was then concatenated with the word-level embedding and passed to a second bidirectional LSTM, which processed all the words of the input utterance in sequence, enabling it to capture “information about each input word’s roots and affixes, intrinsic meaning, and context within the sentence,” according to Gaspers. Lastly, this representation was passed to a third network that performed the actual classification of named entities.
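A rough PyTorch sketch of this pipeline is below: a character-level bidirectional LSTM, concatenation of its output with the word embedding, a word-level bidirectional LSTM, and a per-word classifier. The layer sizes are assumptions, and a plain linear layer stands in for whatever classifier the paper actually uses.

```python
import torch
import torch.nn as nn

class NERTagger(nn.Module):
    """Sketch of the described architecture; dimensions and classifier are assumed."""

    def __init__(self, word_vocab, char_vocab, num_labels,
                 word_dim=100, char_dim=25, char_hidden=25, word_hidden=100):
        super().__init__()
        self.word_embedding = nn.Embedding(word_vocab, word_dim)
        self.char_embedding = nn.Embedding(char_vocab, char_dim)
        # First bidirectional LSTM: runs over the characters of each word.
        self.char_lstm = nn.LSTM(char_dim, char_hidden,
                                 bidirectional=True, batch_first=True)
        # Second bidirectional LSTM: runs over the words of the utterance, taking the
        # word embedding concatenated with the character-level summary as its input.
        self.word_lstm = nn.LSTM(word_dim + 2 * char_hidden, word_hidden,
                                 bidirectional=True, batch_first=True)
        # Third network: classifies each word's representation into entity labels.
        self.classifier = nn.Linear(2 * word_hidden, num_labels)

    def forward(self, word_ids, char_ids):
        # word_ids: (batch, num_words); char_ids: (batch, num_words, num_chars)
        batch, num_words, num_chars = char_ids.shape
        chars = self.char_embedding(char_ids.view(batch * num_words, num_chars))
        _, (char_final, _) = self.char_lstm(chars)
        # Concatenate the final forward and backward character states for each word.
        char_summary = torch.cat([char_final[0], char_final[1]], dim=-1)
        char_summary = char_summary.view(batch, num_words, -1)
        words = torch.cat([self.word_embedding(word_ids), char_summary], dim=-1)
        word_states, _ = self.word_lstm(words)
        return self.classifier(word_states)  # per-word entity label scores
```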
The systems were trained end to end, so that they learned to produce representations useful for named-entity recognition. In tests involving two public data sets, the transferred model that used Romanization of Japanese words achieved improvements of 5.9% and 7.4% in F1 score, a composite metric (the harmonic mean of precision and recall) that penalizes both false positives and false negatives.
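For reference, the F1 score can be computed from counts of true positives, false positives, and false negatives; the counts in the example below are made up.

```python
def f1_score(true_positives, false_positives, false_negatives):
    """F1 is the harmonic mean of precision and recall, so it penalizes both
    false positives (spurious entities) and false negatives (missed entities)."""
    precision = true_positives / (true_positives + false_positives)
    recall = true_positives / (true_positives + false_negatives)
    return 2 * precision * recall / (precision + recall)

# Example: 80 entities found correctly, 10 spurious, 20 missed -> F1 ≈ 0.84
print(f1_score(80, 10, 20))
```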
Furthermore, after experimenting with three different data sets (two public data sets and a proprietary one), the researchers discovered that feeding Japanese characters to one module of the English-language system (the representation module) while feeding Romanized characters to another (the character representation module) increased the F1 score further. The gains were largest on smaller data sets, but even on an in-house data set with 500,000 entries, transfer learning improved the F1 score by 0.6%, and the transfer-learned model outperformed a model trained from scratch on a million examples.
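That input split might look roughly like the toy usage below, which reuses the NERTagger sketch from above; the vocabularies, tokenization, and sizes are invented for illustration.

```python
import torch

# Toy vocabularies; a real system would build these from training data.
word_to_id = {"東京": 1, "の": 2, "天気": 3}
char_to_id = {c: i + 1 for i, c in enumerate("abcdefghijklmnopqrstuvwxyz")}

japanese_words = ["東京", "の", "天気"]    # fed to the word representation module
romanized = ["toukyou", "no", "tenki"]     # fed to the character representation module

# Pad each word's Romanized character ids to a common length (0 is the pad id).
max_chars = max(len(r) for r in romanized)
word_ids = torch.tensor([[word_to_id[w] for w in japanese_words]])
char_ids = torch.tensor([[[char_to_id[c] for c in r] + [0] * (max_chars - len(r))
                          for r in romanized]])

tagger = NERTagger(word_vocab=10, char_vocab=30, num_labels=5)  # toy sizes
scores = tagger(word_ids, char_ids)  # shape: (1, 3, 5), per-word entity label scores
```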
“Even at larger scales, transfer learning could still enable substantial reductions in data requirements,” said Gaspers.