一种基于文本相似度矩阵运算的非结构化海量投诉数据分类算法

Translated title of the contribution: A text similarity matrix operation-based classification algorithm for large-scale unstructured complaint data

李青, 陈阳, 谢浩然, 蒙圣光

Research output: Journal Publications > Journal Article (refereed)

Abstract

With the rapid development of the Internet and information technology, the volume of unstructured data is growing exponentially, and the popularity of Web 2.0 online communities has accelerated this growth further. How to manage and organize massive unstructured data effectively, so that end users can access information easily, has therefore become an urgent research topic. This paper models the text of unstructured data and compares text similarity in order to study classification algorithms for large-scale unstructured data, and applies the resulting algorithm to a complaint data classification system at China Mobile. After the system was deployed, the efficiency of complaint data processing improved markedly, which supports the effectiveness of the proposed classification algorithm and system framework.
Original language: Chinese (Simplified)
Pages (from-to): 103-107
Number of pages: 5
Journal: 计算机工程与科学 = Computer Engineering & Science
Volume: 34
Issue number: 1
Early online date: 27 Apr 2012
Publication status: Published - 2012
Externally published: Yes

Keywords

  • text similarity
  • unstructured data
  • complaint data classification system
