Intrinsic Evaluation of Word Vectors Fails to Predict Extrinsic Performance

Billy CHIU, Anna KORHONEN, Sampo PYYSALO

Research output: Book Chapters | Papers in Conference Proceedings > Conference paper (refereed) > Research > peer-review

Abstract

The quality of word representations is frequently assessed using correlation with human judgements of word similarity. Here, we question whether such intrinsic evaluation can predict the merits of the representations for downstream tasks. We study the correlation between results on ten word similarity benchmarks and tagger performance on three standard sequence labeling tasks using a variety of word vectors induced from an unannotated corpus of 3.8 billion words, and demonstrate that most intrinsic evaluations are poor predictors of downstream performance. We argue that this issue can be traced in part to a failure to distinguish specific similarity from relatedness in intrinsic evaluation datasets. We make our evaluation tools openly available to facilitate further study.
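As an illustration of the intrinsic evaluation the abstract refers to, the following is a minimal sketch, not the authors' released tooling. It assumes cosine similarity between vectors and Spearman rank correlation against human ratings, which is the usual setup for word similarity benchmarks; the abstract itself does not name the coefficient. All function names, vectors, and ratings below are hypothetical toy values.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def average_ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def spearman(x, y):
    """Spearman's rho: Pearson correlation computed on ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def intrinsic_score(vectors, benchmark):
    """Correlate model similarities with human ratings.

    Pairs with an out-of-vocabulary word are skipped (an assumption;
    other policies exist).
    """
    model, gold = [], []
    for w1, w2, rating in benchmark:
        if w1 in vectors and w2 in vectors:
            model.append(cosine(vectors[w1], vectors[w2]))
            gold.append(rating)
    return spearman(model, gold)


# Hypothetical toy data, not taken from any real benchmark or model.
toy_vectors = {
    "cat": [1.0, 0.0],
    "dog": [0.9, 0.1],
    "car": [0.0, 1.0],
    "bus": [0.2, 0.8],
}
toy_benchmark = [
    ("cat", "dog", 9.0),
    ("car", "bus", 8.0),
    ("cat", "car", 1.0),
    ("dog", "bus", 2.0),
]

rho = intrinsic_score(toy_vectors, toy_benchmark)
print(f"Spearman's rho on toy benchmark: {rho:.3f}")
```

To relate intrinsic and extrinsic results in the spirit of the study, one could compute such a score for each set of word vectors and then correlate those scores with the corresponding tagger performance across vector sets; the exact procedure used in the paper is not specified in this record.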
Original language: English
Title of host publication: Proceedings of the 1st Workshop on Evaluating Vector-Space Representations for NLP
Editors: Omer LEVY, Felix HILL, Anna KORHONEN, Roi REICHART, Yoav GOLDBERG, Antoine BORDES
Publisher: Association for Computational Linguistics (ACL)
Pages: 1-6
Number of pages: 7
ISBN (Electronic): 9781945626142
Publication status: Published - Aug 2016
Externally published: Yes
Event: The 54th Annual Meeting of the Association for Computational Linguistics: the 1st Workshop on Evaluating Vector-Space Representations for NLP - Berlin, Germany
Duration: 7 Aug 2016 to 12 Aug 2016
https://aclanthology.org/volumes/W16-25/

Conference

Conference: The 54th Annual Meeting of the Association for Computational Linguistics: the 1st Workshop on Evaluating Vector-Space Representations for NLP
Country/Territory: Germany
City: Berlin
Period: 7/08/16 to 12/08/16

Bibliographical note

This work has been supported by Medical Research Council grant MR/M013049/1.
