LS-BiLLMs: Label supervised bi-directional large language models for token- and sequence-level information extraction

  • Zongxi LI*
  • Xianming LI
  • Jing LI
  • Haoran XIE
  • Fu Lee WANG
  • Qing LI

*Corresponding author for this work

Research output: Journal Publications › Journal Article (refereed) › peer-review

Abstract

Large Language Models (LLMs) have achieved remarkable generative capabilities but often underperform in sequence- and token-level classification tasks due to the causal masking constraint in decoder-only architectures. This unidirectional attention prevents tokens from accessing bidirectional context, limiting representation learning for discriminative prediction. We propose Label-Supervised Bi-directional Large Language Models (LS-BiLLMs), a lightweight adaptation method that (1) employs direct label supervision to align latent representations with task-specific labels and (2) removes the causal mask to enable bidirectional information flow. Implemented with LoRA-based fine-tuning, LS-BiLLMs efficiently adapt compact open-weight LLMs, such as LLaMA, Qwen, and Mistral, for classification without complex prompt engineering. Experiments across text classification, named-entity recognition, and commonsense reasoning benchmarks show consistent gains over instruction-tuned and encoder-based baselines. While unmasking sacrifices autoregressive generation, it substantially enhances discriminative understanding and efficiency. These findings reveal how causal directionality in attention mechanisms affects representational learning and reasoning in modern LLMs.
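The key architectural change described in the abstract, dropping the causal mask so that every token can attend to both left and right context, can be illustrated with a minimal scaled dot-product attention sketch. This is an illustrative reconstruction, not the paper's implementation: the function `self_attention` and its `causal` flag are hypothetical names, and LoRA adaptation and label supervision are omitted.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax over the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(q, k, v, causal=True):
    """Single-head scaled dot-product attention over one sequence.

    q, k, v: arrays of shape (seq_len, dim).
    causal=True reproduces decoder-only masking (token i sees only j <= i);
    causal=False is the bidirectional variant the abstract describes.
    """
    scores = q @ k.T / np.sqrt(q.shape[-1])      # (seq_len, seq_len) logits
    if causal:
        # Mask out future positions with -inf before the softmax.
        future = np.triu(np.ones(scores.shape, dtype=bool), k=1)
        scores = np.where(future, -np.inf, scores)
    return softmax(scores) @ v

# With the causal mask, the first token can only attend to itself,
# so its output is exactly v[0]; without the mask it mixes all positions.
rng = np.random.default_rng(0)
q, k, v = (rng.normal(size=(4, 8)) for _ in range(3))
causal_out = self_attention(q, k, v, causal=True)
bi_out = self_attention(q, k, v, causal=False)
```

Note that the last token's row is identical in both variants, since it already attends to the full prefix under causal masking; the gain from unmasking accrues to earlier positions, which is why it matters for token-level tasks such as named-entity recognition.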
Original language: English
Article number: 104568
Journal: Information Processing and Management
Volume: 63
Issue number: 4
Early online date: 7 Jan 2026
Publication status: E-pub ahead of print - 7 Jan 2026

Funding

Zongxi Li and Haoran Xie have been supported by Lingnan University through Faculty Research Grants (SDS24A2, SDS24A8, SDS24A12, SDS24A19), a Direct Grant (No. DR25E8), and the Lam Woo Research Fund (No. LWP20040), and by the Hong Kong Research Grants Council through the Faculty Development Scheme (No. UGC/FDS16/E10/23). Xianming Li and Jing Li are partially supported by a grant from the Hong Kong Research Grants Council (Project No. PolyU/25200821), the Innovation and Technology Fund (Project No. PRP/047/22FX), and a gift fund from Huawei (N-ZGM3). Qing Li has been supported by the Hong Kong Research Grants Council under the General Research Fund (No. 15216225).

Keywords

  • Large language models
  • Natural language processing
  • Named entity recognition
  • Sequence classification
  • Token classification

