Automatic keyword extraction from documents using conditional random fields

Chengzhi ZHANG, Huilin WANG, Yao LIU, Dan WU, Yi LIAO, Bo WANG

Research output: Journal Publications > Journal Article (refereed)

124 Citations (Scopus)

Abstract

Keywords are a subset of the words or phrases in a document that describe its meaning. Many text mining applications can take advantage of them. Unfortunately, a large portion of documents still have no keywords assigned, and manual assignment of high-quality keywords is expensive, time-consuming, and error-prone. Therefore, many algorithms and systems that help people perform automatic keyword extraction have been proposed. The Conditional Random Fields (CRF) model is a state-of-the-art sequence labeling method that can use the features of documents more fully and effectively. Keyword extraction, in turn, can be treated as a string labeling task. In this paper, keyword extraction based on CRF is proposed and implemented. As far as we know, the use of the CRF model for keyword extraction has not been investigated previously. Experimental results show that the CRF model outperforms other machine learning methods, such as support vector machines and multiple linear regression, on the keyword extraction task.
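
The idea that keyword extraction can be treated as a labeling task can be illustrated with a minimal sketch. The abstract does not state the label set or the features used, so the B-KW/I-KW/O tag scheme, the feature names, and the helper functions `bio_labels` and `token_features` below are all assumptions made for illustration, not the authors' method. The code only prepares per-token labels and features in the shape a CRF toolkit would typically consume; it does not train a model.

```python
from typing import Dict, List


def bio_labels(tokens: List[str], keyphrases: List[List[str]]) -> List[str]:
    """Tag each token as B-KW, I-KW, or O given known keyphrases.

    The tag scheme is an assumption; the abstract does not specify one.
    """
    labels = ["O"] * len(tokens)
    lowered = [t.lower() for t in tokens]
    for phrase in keyphrases:
        target = [p.lower() for p in phrase]
        n = len(target)
        if n == 0:
            continue
        for start in range(len(tokens) - n + 1):
            span = slice(start, start + n)
            # Match case-insensitively and never overwrite an earlier match.
            if lowered[span] == target and all(l == "O" for l in labels[span]):
                labels[start] = "B-KW"
                for k in range(start + 1, start + n):
                    labels[k] = "I-KW"
    return labels


def token_features(tokens: List[str], i: int) -> Dict[str, object]:
    """Build a small, hypothetical feature dict for the token at index i."""
    word = tokens[i]
    return {
        "lower": word.lower(),
        "is_title": word.istitle(),
        # Position of the token in the document, scaled to [0, 1].
        "relative_position": round(i / max(len(tokens) - 1, 1), 2),
        "prev": tokens[i - 1].lower() if i > 0 else "<BOS>",
        "next": tokens[i + 1].lower() if i < len(tokens) - 1 else "<EOS>",
    }


tokens = "Conditional random fields can label keywords in documents".split()
keyphrases = [["conditional", "random", "fields"], ["keywords"]]
labels = bio_labels(tokens, keyphrases)
features = [token_features(tokens, i) for i in range(len(tokens))]

if __name__ == "__main__":
    for token, label in zip(tokens, labels):
        print(f"{token}\t{label}")
```

Pairs of `features` and `labels` built this way, one sequence per document or sentence, are the kind of training input a sequence labeler expects; at prediction time, contiguous B-KW/I-KW runs would be read back as extracted keywords.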
Original language: English
Pages (from-to): 1169-1180
Number of pages: 12
Journal: Journal of Computational Information Systems
Volume: 4
Issue number: 3
Publication status: Published - 1 Mar 2008
Externally published: Yes

Fingerprint

Labeling
Linear regression
Support vector machines
Learning systems

Cite this

ZHANG, C., WANG, H., LIU, Y., WU, D., LIAO, Y., & WANG, B. (2008). Automatic keyword extraction from documents using conditional random fields. Journal of Computational Information Systems, 4(3), 1169-1180.
@article{2bee698e26e14c63b3fe4e0141511614,
title = "Automatic keyword extraction from documents using conditional random fields",
author = "Chengzhi ZHANG and Huilin WANG and Yao LIU and Dan WU and Yi LIAO and Bo WANG",
year = "2008",
month = "3",
day = "1",
language = "English",
volume = "4",
pages = "1169--1180",
journal = "Journal of Computational Information Systems",
issn = "1553-9105",
publisher = "Binary Information Press",
number = "3",

}

TY  - JOUR
T1  - Automatic keyword extraction from documents using conditional random fields
AU  - ZHANG, Chengzhi
AU  - WANG, Huilin
AU  - LIU, Yao
AU  - WU, Dan
AU  - LIAO, Yi
AU  - WANG, Bo
PY  - 2008/3/1
Y1  - 2008/3/1
UR  - http://commons.ln.edu.hk/sw_master/7018
M3  - Journal Article (refereed)
VL  - 4
SP  - 1169
EP  - 1180
JO  - Journal of Computational Information Systems
JF  - Journal of Computational Information Systems
SN  - 1553-9105
IS  - 3
ER  - 