Multi-label topic classification for COVID-19 literature annotation: A BioBERT-based feature enhancement approach

Conference: CIBDA 2022 - 3rd International Conference on Computer Information and Big Data Applications
03/25/2022 - 03/27/2022 at Wuhan, China

Proceedings: CIBDA 2022

Pages: 4
Language: English
Type: PDF

Authors:
Wang, Xin; Wang, Jian; Tang, Wentai; Zhang, Hongtong (College of Computer Science and Technology, Dalian University of Technology, Dalian, China)

Abstract:
With the rapid, exponential growth of biomedical literature, especially during the COVID-19 pandemic, there is an urgent need for effective techniques to automatically manage and categorize large volumes of biomedical text. BERT has been widely applied and has shown strong performance in natural language processing. We therefore first take the improved pre-trained language models CovidBERT and BioBERT as the basis, select the better-performing of the two, and further enhance the semantic representation of the abstract with additional title information. Finally, we propose a novel feature enhancement method that exploits and integrates the distribution of label information effectively. Experimental results show that our model achieves an instance-based F1 score, precision, and recall of 93.94%, 93.5%, and 94.38%, respectively, on the multi-label topic classification task of BioCreative VII Track 5.
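For readers unfamiliar with instance-based metrics in multi-label classification, the following is a minimal sketch of one common definition: precision, recall, and F1 are computed per document from its gold and predicted label sets, then averaged over all documents. This is an assumption about the metric's definition, not the official BioCreative VII evaluation script, which may aggregate differently. The function name `instance_based_scores` and the topic labels in the example are illustrative placeholders that do not come from the abstract.

```python
from typing import List, Set, Tuple


def instance_based_scores(
    gold: List[Set[str]], pred: List[Set[str]]
) -> Tuple[float, float, float]:
    """Return (precision, recall, F1), each averaged over instances.

    Assumed definition: every score is computed per document from the
    overlap of its gold and predicted label sets, then averaged.
    """
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    if not gold:
        raise ValueError("at least one instance is required")

    p_sum = r_sum = f_sum = 0.0
    for g, p in zip(gold, pred):
        overlap = len(g & p)
        # An empty prediction or empty gold set scores 0 for that instance.
        p_sum += overlap / len(p) if p else 0.0
        r_sum += overlap / len(g) if g else 0.0
        denom = len(g) + len(p)
        f_sum += 2 * overlap / denom if denom else 0.0

    n = len(gold)
    return p_sum / n, r_sum / n, f_sum / n


# Hypothetical example: two documents with illustrative topic labels.
gold_labels = [{"Treatment", "Mechanism"}, {"Prevention"}]
pred_labels = [{"Treatment"}, {"Prevention", "Diagnosis"}]
precision, recall, f1 = instance_based_scores(gold_labels, pred_labels)
print(f"P={precision:.4f} R={recall:.4f} F1={f1:.4f}")
```

Under this definition, the averaged F1 is the mean of the per-document F1 values, which is not in general equal to the harmonic mean of the averaged precision and recall.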