Out-of-vocabulary word embedding learning based on reading comprehension mechanism

Zhongyu ZHUANG, Ziran LIANG, Yanghui RAO, Haoran XIE, Fu Lee WANG

Research output: Journal Publications › Journal Article (refereed) › peer-review

Abstract

Most natural language processing tasks currently use word embeddings to represent words. However, when out-of-vocabulary (OOV) words are encountered, the performance of downstream models that take word embeddings as input is often severely limited. To address this problem, recent methods infer the meaning of an OOV word mainly from two sources of information: its morphological structure and the contexts in which it appears. However, the very low frequency of OOV words usually makes them difficult for general word embedding models to learn during pre-training, and this same characteristic also leads to a scarcity of contexts. We therefore introduce the concept of “similar contexts”, grounded in the classical “distributional hypothesis” from linguistics and inspired by human reading comprehension mechanisms, to compensate for the insufficient contexts available to previous OOV word embedding learning methods. Experimental results show that our model achieves the highest relative scores on both intrinsic and extrinsic evaluation tasks, demonstrating the positive effect of the “similar contexts” introduced in our model on OOV word embedding learning.
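To make the core idea concrete, the sketch below illustrates one way to infer an OOV embedding from its few observed contexts augmented with retrieved “similar contexts”. This is not the authors' model, only a minimal, hypothetical illustration: it assumes toy pre-trained embeddings, represents each context by the mean of its in-vocabulary word vectors, retrieves the k most similar corpus contexts by cosine similarity, and averages everything into an OOV embedding. All names (vocab_emb, infer_oov_embedding) and all data are invented for illustration.

    import numpy as np

    # Toy pre-trained embeddings; in practice these would come from a model
    # such as word2vec or GloVe. All vectors here are illustrative.
    vocab_emb = {
        "river": np.array([0.9, 0.1, 0.0]),
        "bank":  np.array([0.7, 0.3, 0.2]),
        "water": np.array([0.8, 0.2, 0.1]),
        "money": np.array([0.1, 0.9, 0.3]),
    }

    def context_vector(context, emb):
        """Average the embeddings of the in-vocabulary words in one context."""
        vecs = [emb[w] for w in context if w in emb]
        return np.mean(vecs, axis=0) if vecs else None

    def cosine(a, b):
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8))

    def infer_oov_embedding(oov_contexts, corpus_contexts, emb, k=2):
        """Infer an OOV embedding from its own (scarce) contexts plus the
        k most similar contexts retrieved from a background corpus."""
        own = [v for v in (context_vector(c, emb) for c in oov_contexts)
               if v is not None]
        query = np.mean(own, axis=0)
        # Rank corpus contexts by cosine similarity to the OOV word's contexts.
        candidates = [v for v in (context_vector(c, emb) for c in corpus_contexts)
                      if v is not None]
        scored = sorted(((cosine(query, v), v) for v in candidates),
                        key=lambda x: x[0], reverse=True)
        similar = [v for _, v in scored[:k]]
        # Combine the scarce original contexts with the retrieved ones.
        return np.mean(own + similar, axis=0)

    # The OOV word appears only once; retrieved similar contexts enrich it.
    oov_contexts = [["water", "river"]]
    corpus_contexts = [["river", "bank", "water"], ["money", "bank"]]
    print(infer_oov_embedding(oov_contexts, corpus_contexts, vocab_emb))

The design choice worth noting is the retrieval step: because an OOV word's own contexts are scarce, ranking background-corpus contexts against them and folding the top-k into the average is one simple reading of how “similar contexts” can compensate for context scarcity.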
Original language: English
Article number: 100038
Journal: Natural Language Processing Journal
Volume: 5
Early online date: 31 Oct 2023
DOIs
Publication status: Published - Dec 2023

Keywords

  • Out-of-vocabulary words
  • Word embedding
  • Reading comprehension mechanism
