Abstract
When an ordinal decision tree based on rank entropy selects an expanded attribute, it must compute the rank mutual information at every cut-point of every conditional attribute and compare these values to find the maximum (the attribute attaining the maximum becomes the expanded attribute), so the computational complexity is high. To address this problem, this paper divides cut-points into stable and unstable cut-points and establishes a mathematical model that proves theoretically that the maximum of rank mutual information is never attained at a stable cut-point, but only at an unstable one. Consequently, computing rank mutual information only requires traversing the unstable cut-points, with no need to evaluate the values at stable cut-points, which considerably improves the computational efficiency of decision tree construction. Numerical experiments verify this result.
When expanded attributes are selected for decision tree learning based on rank entropy, the rank mutual information must be computed at every cut-point of each continuous-valued attribute, and the expanded attribute is determined by comparing these values; the computational complexity is therefore high. To address this problem, cut-points are divided into stable and unstable cut-points, and a mathematical model is established in this paper to prove theoretically that the rank mutual information function achieves its maximum not at stable cut-points, but only at unstable cut-points. This means that the algorithm only needs to traverse the unstable cut-points instead of computing the values at the stable cut-points. Thus, the computational efficiency of building decision trees is greatly improved, which is confirmed by the numerical experimental results.
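The pruning idea in the abstract can be illustrated with a minimal sketch. The snippet below is not the paper's implementation: it assumes the ascending rank mutual information of Hu et al.'s rank-entropy framework (dominance sets of samples with values no greater than the reference sample), takes "unstable cut-point" to mean a midpoint between adjacent samples, in attribute order, whose class labels differ, and uses a small hand-made monotone dataset. All function names and data are illustrative.

```python
import math

def rank_mutual_info(a_vals, labels, cut):
    """Rank mutual information between the attribute binarized at `cut`
    and the ordinal labels, using ascending dominance sets
    [x_i]^<= = {x_j : x_j <= x_i} (after Hu et al.'s rank entropy)."""
    n = len(a_vals)
    b = [v > cut for v in a_vals]  # binarized attribute: which side of the cut
    total = 0.0
    for i in range(n):
        A = {j for j in range(n) if b[j] <= b[i]}             # dominance set on attribute
        D = {j for j in range(n) if labels[j] <= labels[i]}   # dominance set on labels
        total += math.log(len(A) * len(D) / (n * len(A & D)))
    return -total / n

def candidate_cuts(a_vals):
    """All candidate cut-points: midpoints between consecutive distinct values."""
    vs = sorted(set(a_vals))
    return [(u + v) / 2 for u, v in zip(vs, vs[1:])]

def unstable_cuts(a_vals, labels):
    """Unstable cut-points only: midpoints between adjacent samples
    (in attribute order) whose class labels differ."""
    pts = sorted(zip(a_vals, labels))
    return [(v1 + v2) / 2 for (v1, y1), (v2, y2) in zip(pts, pts[1:])
            if v1 < v2 and y1 != y2]

# toy monotone data: attribute values and ordinal class labels
a = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [0, 0, 1, 1, 2, 2]

# scanning only the unstable cut-points finds the same maximizing cut
best_all = max(candidate_cuts(a), key=lambda c: rank_mutual_info(a, y, c))
best_uns = max(unstable_cuts(a, y), key=lambda c: rank_mutual_info(a, y, c))
print(best_all, best_uns)
```

On this toy dataset only two of the five candidate cut-points are unstable, so the restricted scan does less than half the work while, as the paper's theorem states, still finding the cut that maximizes rank mutual information.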
| Translated title of the contribution | Improved ordinal decision tree algorithm based on rank entropy |
|---|---|
| Original language | Chinese (Simplified) |
| Pages (from-to) | 134-140 |
| Number of pages | 7 |
| Journal | 模式识别与人工智能 = Pattern Recognition and Artificial Intelligence |
| Volume | 27 |
| Issue number | 2 |
| Publication status | Published - Feb 2014 |
| Externally published | Yes |
Bibliographical note
Funding: National Natural Science Foundation of China (61170040); Natural Science Foundation of Hebei Province (F2013201110, F2013201220)

Keywords
- Ordinal classification
- Ordinal decision tree
- Unstable cut-point