A comparative analysis of knowledge injection strategies for large language models in the scholarly domain

Published: 24 July 2024

Abstract

In recent years, transformer-based models have emerged as powerful tools for natural language processing tasks, demonstrating remarkable performance across several domains. However, they still exhibit significant limitations. These shortcomings become more noticeable when dealing with highly specific and complex concepts, particularly within the scientific domain. For example, transformer models struggle to process scientific articles because of the domain-specific terminology and sophisticated ideas common in scientific literature. To overcome these challenges and further enhance the effectiveness of transformers in specific fields, researchers have turned their attention to knowledge injection: the process of incorporating external knowledge into transformer models to improve their performance on particular tasks. In this paper, we present a comprehensive study of knowledge injection strategies for transformers within the scientific domain. Specifically, we provide a detailed overview and comparative assessment of four primary methodologies, evaluating their efficacy on the task of classifying scientific articles. For this purpose, we constructed a new benchmark comprising 24K labelled papers and a knowledge graph of 9.2K triples describing pertinent research topics. We also developed a full codebase to easily re-implement all knowledge injection strategies in different domains. A formal evaluation indicates that the majority of the proposed knowledge injection methodologies significantly outperform the baseline established by Bidirectional Encoder Representations from Transformers (BERT).
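One widely used family of knowledge injection strategies enriches the classifier's input text with verbalised knowledge-graph facts before it reaches the transformer. The following is a minimal, self-contained sketch of that idea only; the toy graph, function names, and verbalisation templates are illustrative assumptions, not the benchmark or pipeline described in the paper:

```python
# Input-level knowledge injection (sketch): topics mentioned in a paper's
# text are linked to triples in a small knowledge graph, the matching
# triples are verbalised, and the facts are appended to the input that a
# transformer classifier would receive.

# Toy knowledge graph of (subject, predicate, object) topic triples.
KG = [
    ("neural networks", "subTopicOf", "machine learning"),
    ("machine learning", "subTopicOf", "artificial intelligence"),
    ("ontology", "relatedTo", "knowledge graphs"),
]

def link_topics(text, kg):
    """Naive entity linking: keep triples whose subject occurs in the text."""
    lowered = text.lower()
    return [t for t in kg if t[0] in lowered]

def verbalise(triple):
    """Turn one triple into a natural-language fact."""
    s, p, o = triple
    template = "{} is a sub-topic of {}" if p == "subTopicOf" else "{} is related to {}"
    return template.format(s, o)

def inject(text, kg, sep=" [SEP] "):
    """Append verbalised background knowledge to the classifier input."""
    facts = [verbalise(t) for t in link_topics(text, kg)]
    return text + sep + ". ".join(facts) if facts else text

enriched = inject("We train neural networks for topic classification.", KG)
```

In a real system the string matching would be replaced by a proper entity linker and the enriched text fed to a fine-tuned BERT-style classifier; the sketch only shows where the external knowledge enters the input.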


Published In

Engineering Applications of Artificial Intelligence, Volume 133, Issue PB, Jul 2024, 1659 pages

Publisher: Pergamon Press, Inc., United States


Author Tags

  1. Knowledge injection
  2. Knowledge graphs
  3. Large language models
  4. Transformers
  5. BERT
  6. Classification
  7. Natural language processing

Qualifiers

  • Research-article
