In the ever-evolving landscape of agricultural technology, a study published in *Smart Agriculture* (智慧农业) proposes a new way to extract and use critical information from Chinese texts on tea pest and disease management. Led by XIE Yuxin, WEI Jiangshu, ZHANG Yao, and LI Fang from the College of Information Engineering at Sichuan Agricultural University, the research introduces a named entity recognition (NER) method designed to improve the accuracy and efficiency of intelligent agricultural information systems.
Named entity recognition is a cornerstone of natural language processing (NLP), enabling machines to identify and categorize key information within text. While NER has made significant strides in general domains like news and social media, the agricultural sector has lagged due to a lack of specialized datasets and the unique challenges posed by domain-specific texts. “Traditional sequence labeling models often struggle with nested and long-span entities, leading to poor segmentation and labeling performance,” explains lead author XIE Yuxin. “Our research addresses these challenges by developing a tailored NER approach that significantly improves recognition accuracy in the context of tea pest and disease.”
The proposed model comprises two core modules: a boundary prediction module and a label enhancement module. The boundary prediction module employs an attention-based mechanism to dynamically estimate the probability that consecutive tokens belong to the same entity, effectively addressing boundary ambiguity. The label enhancement module refines entity recognition using a biaffine classifier that jointly models entity spans and their corresponding category labels. This joint modeling approach captures intricate interactions between span representations and semantic label information, improving the identification of long or syntactically complex entities.
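To make the two modules more concrete, here is a minimal Python sketch of the general techniques involved, not the authors' implementation. It assumes the boundary signal is a sigmoid over a scaled dot product between projections of adjacent tokens, and it uses a standard biaffine span classifier in which label 0 means "not an entity"; the label-semantic fusion described above is not modelled. All function names and the toy weights are hypothetical.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def matvec(m, v):
    # Multiply matrix m (a list of rows) by vector v.
    return [dot(row, v) for row in m]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def boundary_link_probs(h, w_q, w_k):
    """Probability that token i and token i+1 belong to the same entity.

    Assumption: scaled dot product between a query projection of token i and a
    key projection of token i+1, squashed by a sigmoid.
    """
    d = len(w_q)  # projected dimension
    probs = []
    for i in range(len(h) - 1):
        q = matvec(w_q, h[i])
        k = matvec(w_k, h[i + 1])
        probs.append(sigmoid(dot(q, k) / math.sqrt(d)))
    return probs


def biaffine_scores(starts, ends, U, W, b):
    """scores[(i, j)][c] = s_i^T U_c e_j + W_c . [s_i; e_j] + b_c for every span i <= j."""
    n, num_labels = len(starts), len(U)
    scores = {}
    for i in range(n):
        for j in range(i, n):
            concat = starts[i] + ends[j]  # list concatenation [s_i; e_j]
            scores[(i, j)] = [
                dot(starts[i], matvec(U[c], ends[j])) + dot(W[c], concat) + b[c]
                for c in range(num_labels)
            ]
    return scores


def decode(scores):
    """Keep every span whose best label is not 0; overlapping (nested) spans are allowed."""
    spans = []
    for (i, j), row in sorted(scores.items()):
        best = max(range(len(row)), key=row.__getitem__)
        if best != 0:
            spans.append((i, j, best))
    return spans


if __name__ == "__main__":
    eye = [[1.0, 0.0], [0.0, 1.0]]
    tokens = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
    # Similar neighbours get a higher "same entity" probability than dissimilar ones.
    print(boundary_link_probs(tokens, eye, eye))
    # Toy biaffine weights: label 0 has a constant bias, label 1 rewards s_i == e_j.
    U = [[[0.0, 0.0], [0.0, 0.0]], eye]
    W = [[0.0] * 4, [0.0] * 4]
    scores = biaffine_scores(eye, eye, U, W, [0.5, 0.0])
    print(decode(scores))  # [(0, 0, 1), (1, 1, 1)]
```

Because every span (i, j) receives its own label scores, a biaffine classifier can in principle output nested or long entities directly, which is the property sequence-labeling models struggle with.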
To reduce computational complexity while preserving model effectiveness, the architecture incorporates low-rank linear layers. These layers, constructed by integrating the adaptive channel weighting mechanism of Squeeze-and-Excitation Networks with low-rank decomposition techniques, replace traditional linear transformations, yielding improvements in both efficiency and representational capacity.
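One plausible reading of this design is sketched below in plain Python: a dense d_in × d_out layer is factored into a rank-r down-projection `A` and an up-projection `B`, and a Squeeze-and-Excitation style gate, computed from the sequence-averaged rank-r features, re-weights each of the r channels before the up-projection. The exact placement of the gate in the authors' model is not stated here, so the layout, the function names, and the reduction ratio are all assumptions.

```python
import math


def matvec(m, v):
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def se_low_rank_linear(xs, A, B, W1, W2, bias):
    """Hypothetical SE-gated low-rank replacement for a d_in -> d_out linear layer.

    xs: sequence of d_in-dim token vectors
    A:  r x d_in down-projection; B: d_out x r up-projection (r << d_in, d_out)
    W1: (r // reduction) x r and W2: r x (r // reduction) form the SE bottleneck
    """
    zs = [matvec(A, x) for x in xs]  # project each token to rank r
    r = len(A)
    squeeze = [sum(z[c] for z in zs) / len(zs) for c in range(r)]  # mean over tokens
    hidden = [max(0.0, v) for v in matvec(W1, squeeze)]  # ReLU
    gate = [sigmoid(v) for v in matvec(W2, hidden)]  # per-channel weights in (0, 1)
    out = []
    for z in zs:
        gated = [g * v for g, v in zip(gate, z)]  # re-weight the rank-r channels
        out.append([y + b_ for y, b_ in zip(matvec(B, gated), bias)])
    return out


def param_counts(d_in, d_out, r, reduction):
    """Parameter count of a dense layer vs. the SE-gated low-rank layer above."""
    dense = d_in * d_out + d_out
    low_rank = r * d_in + d_out * r + d_out + 2 * r * (r // reduction)
    return dense, low_rank


if __name__ == "__main__":
    print(param_counts(768, 768, 64, 4))  # (590592, 101120)
```

With d_in = d_out = 768, r = 64 and a reduction ratio of 4, the factored layer needs roughly a sixth of the parameters of the dense one, which illustrates where the efficiency gain of low-rank layers comes from; the gate adds only a small number of extra weights.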
In addition to developing the model, the researchers constructed a domain-specific NER corpus by systematically collecting and annotating entity information on tea pests and diseases from scientific literature, agricultural technical reports, and online texts. The annotated entities fall into ten classes, including tea plant diseases, tea pests, disease symptoms, and pest symptoms. From this labeled corpus, the team built a Chinese NER dataset for the domain, referred to as the Chinese tea pest and disease dataset.
Extensive experiments were conducted on the constructed dataset and on public benchmarks, comparing the proposed method with several mainstream NER approaches, including traditional sequence labeling models (e.g., BiLSTM-CRF), lexicon-enhanced models (e.g., SoftLexicon), and boundary smoothing strategies (e.g., Boundary Smooth). The proposed model achieved higher F1-scores than the compared methods on every dataset used, with gains of 0.68% on the self-built dataset, 0.29% on ResumeNER, 0.96% on WeiboNER, 0.7% on CLUENER, and 0.5% on Taobao. “These outcomes demonstrate the model’s superior capacity for capturing intricate entity boundaries and semantics, confirming its robustness and adaptability when compared to current state-of-the-art methods,” notes co-author WEI Jiangshu.
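NER F1-scores are conventionally computed at the entity level: a predicted entity counts as correct only if both its boundaries and its category exactly match the annotation, which is one reason boundary errors on long entities are costly. The sketch below shows that metric, assuming micro-averaging over (sentence, start, end, label) tuples; the paper's exact evaluation script is not described here, and the label names in the example are hypothetical.

```python
def span_f1(gold, pred):
    """Micro precision, recall and F1 over exact-match entity tuples.

    Each entity is a (sentence_id, start, end, label) tuple; a prediction is a
    true positive only if all four fields match a gold entity.
    """
    gold_set, pred_set = set(gold), set(pred)
    tp = len(gold_set & pred_set)
    precision = tp / len(pred_set) if pred_set else 0.0
    recall = tp / len(gold_set) if gold_set else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    gold = [(0, 0, 2, "TeaDisease"), (0, 5, 8, "DiseaseSymptom"), (1, 3, 4, "TeaPest")]
    # The second prediction stops one token early, so it counts as both a false
    # positive and a false negative.
    pred = [(0, 0, 2, "TeaDisease"), (0, 5, 7, "DiseaseSymptom"), (1, 3, 4, "TeaPest")]
    p, r, f1 = span_f1(gold, pred)
    print(f"P={p:.3f} R={r:.3f} F1={f1:.3f}")  # P=0.667 R=0.667 F1=0.667
```

A single off-by-one boundary in this toy example costs a full third of the F1-score, which puts gains of a fraction of a percent on benchmark datasets into perspective.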
The implications of this research for the agriculture sector are profound. Accurate and efficient NER can streamline information retrieval, knowledge graph construction, and decision-making processes, ultimately enhancing pest and disease management practices. “By improving the recognition of complex, nested, and long-span entities, our model can provide more precise and actionable insights to farmers and agricultural professionals,” says ZHANG Yao. “This can lead to better resource allocation, timely interventions, and ultimately, improved crop yields and quality.”
The study’s findings also point to the broader applicability of the proposed NER approach. Because the model performed well on both the newly constructed dataset and publicly available benchmarks, it may be suitable for other specialized domains within agriculture and beyond. “Our research not only addresses a critical gap in agricultural NLP but also sets a new benchmark for entity recognition in complex and fine-grained contexts,” adds LI Fang.
As the agriculture sector continues to embrace digital transformation, advanced NLP techniques like the one proposed by XIE Yuxin and colleagues are likely to play an important role in driving innovation and efficiency. This research, published in *Smart Agriculture* (智慧农业) by a team from the College of Information Engineering at Sichuan Agricultural University, marks a significant step toward smarter, more sustainable agricultural practices, with precise named entity recognition as one of its building blocks.

