(1)

__________________________________________________________________________________

3140

A Question Answering Model to identify supporting sentences from Electronic medical records using GNN

V.Nandini 1, K C Rajeswari 2

1 Associate Professor, Sona College of Technology, https://orcid.org/0000-0003-1935-109X
2 Assistant Professor (Sr.G), Sona College of Technology

Abstract: A Question Answering (QA) system is a generative model that now reaches a broad audience across a growing number of fields, including medicine. Clinical NLP systems have attempted to apply general language models to clinical tasks. The healthcare documents provided to a patient comprise both structured data and unstructured clinical notes. These records are often large, containing several documents about a single patient. A major barrier to using them is that much of the information in Electronic Medical Records (EMR) is still narrative. The aim of this paper is to design a QA model for COVID medical records using a Graph Neural Network (GNN) that identifies clinically supporting sentences, that is, the sentences carrying the most clinically useful information, and explains a patient's health condition in terms a layperson can understand. In the graph structure, nodes represent individual sentences and edges between pairs of nodes represent the relationships between them. Because a question can be asked in many different ways, the range of variation a QA system can handle is important, and the graph accommodates a variety of questions. The GNN explains and expands the summary record to answer the questions who, what, where, when, why, and how. Finding supporting answers to multiple question types in long clinical documents poses a unique challenge that calls for the redesign and evaluation of models that aid clinical decision systems.

Keywords: Question Answering, Graph Neural Network, Symptom Classification

1. Introduction

Natural Language Processing (NLP) is a branch of Artificial Intelligence that facilitates human-computer interaction. It helps computers communicate in, interpret, and understand human language. The advancement of NLP toward human-computer interaction has growing implications for organizations and consumers. The technique is significant because it can recognize the fine distinctions of human language in many contexts, from medicine to law. Its applications extend across many disciplines, particularly those focused on computational linguistics. With the advent of NLP, computers are now able to separate relevant information from irrelevant information. NLP includes various procedures and techniques for interpreting human language. Ongoing advances in machine learning have encouraged researchers to build large-scale neural frameworks, both supervised and unsupervised, that automatically answer questions posed in human language. Question Answering (QA) models combine reasoning with neural network approaches to automatically answer natural language questions using data contained in a knowledge graph. Despite significant research efforts, question answering for factoid-type questions is still a challenge. Among the domains explored, understanding electronic medical data poses additional challenges. Health records comprise a patient's treatment history for a health issue: medical histories, diagnoses, immunization dates, allergies, and progress notes. They may also include test results and prescribed medications, described in medical terms that are unfamiliar to a layperson.
This paper presents a survey of QA systems that automatically extract and integrate information from the literature and reformulate medical data to answer questions in simple terms.

2. Review of Related Studies

Question Answering models for the medical domain help patients find the appropriate information in the clinical literature issued to them after hospitalization. For this purpose, a QA model is designed to identify supporting sentences that hold relevant information. The information recorded in the documents can be converted into a graph structure using a GNN that provides answers to questions of the form What, Why, Which, Where, and How. Researchers are active in the QA domain, where NLP plays a significant role in human-computer communication. Some of the related works are discussed in the following sections.

3. Neural networks for medical QA domain

Lukovnikov et al. [1] presented an end-to-end, neural network-based approach for answering medical questions over large-scale knowledge graphs, leveraging a hierarchical word- and character-level question encoder. A different line of research for QA over knowledge graphs examines semantic parsing approaches that translate the clinical record of a patient into formal queries that can be executed against the knowledge graph.

Yin et al. [2] used Convolutional Neural Networks (CNNs) and proposed an attentive pooling approach that yields better answers for the QA model. An entity-relation pair is extracted from each sentence. The entity is encoded using a character-level CNN and matched against a character-level encoded label. The entity-relation pair is encoded using a separate word-level CNN with max-pooling and matched against a word-level encoded predicate.

Yoon et al. [3] focused on a model that detects supporting sentences for answering questions from EMR. Their experiments assess and analyze the ability of the proposed GNN-based model to classify supporting sentences. The model exploits the relational information among sentences in passages to classify the supporting sentences that contain the key information for answering the question. The resulting GNN model, named Propagate-Selector, is used as a subsystem in the QA pipeline.
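To make the idea of a sentence graph concrete, the following is a minimal sketch of how a question and the sentences of several passages could be turned into nodes and edges. The function name `build_sentence_graph` and the chosen topology (the question linked to every sentence, and sentences linked to each other within the same passage) are assumptions made for illustration; they are not taken from the surveyed implementation.

```python
from itertools import combinations


def build_sentence_graph(passages):
    """Build an undirected graph over a question and passage sentences.

    passages: list of passages, each a list of sentence strings.
    Node 0 is the question; every sentence gets its own node.
    Assumed topology (illustrative only): the question is linked to
    every sentence, and sentences are linked pairwise within a passage.
    Returns (node_names, sorted_edge_list).
    """
    nodes = ["Q"]
    edges = set()
    for p_idx, sentences in enumerate(passages):
        ids = []
        for s_idx, _ in enumerate(sentences):
            nodes.append(f"P{p_idx}S{s_idx}")
            ids.append(len(nodes) - 1)
        # Question node connects to each sentence of this passage.
        for i in ids:
            edges.add((0, i))
        # Sentences in the same passage are fully connected.
        for i, j in combinations(ids, 2):
            edges.add((i, j))
    return nodes, sorted(edges)


if __name__ == "__main__":
    record = [
        ["Patient reports fever for three days.", "Cough is dry."],
        ["Chest X-ray shows no consolidation."],
    ]
    names, links = build_sentence_graph(record)
    print(names)
    print(links)
```

A graph built this way can then be handed to any GNN layer; the edge set decides which sentences exchange information.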

Wenqi Fan et al. [4] presented a Graph Neural Network framework, GraphRec, for social recommendation. The work offers a principled approach to jointly capture interactions and opinions in the user-item graph. The research shows that opinion information plays a vital role in improving the performance of the model. GraphRec distinguishes tie strengths by considering the heterogeneous strengths of social relations.

Nicola De Cao et al. [5] described a graph neural approach that works over a compact representation of a set of documents, where nodes are entities and edges specify relations such as within-document and cross-document coreference. The model learns to answer questions by gathering evidence from multiple documents through a differentiable message passing algorithm that updates each node representation based on its neighborhood.
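The neighborhood update at the heart of message passing can be sketched in a few lines. The example below is a simplified illustration under stated assumptions: it uses plain mean aggregation with no learned weights, whereas the surveyed models use trainable, differentiable layers. The function name `message_passing` and the toy vectors are hypothetical.

```python
def message_passing(features, edges, rounds=1):
    """Mean-aggregation message passing on an undirected graph.

    features: dict mapping node -> list of floats (node representation).
    edges: iterable of (u, v) pairs.
    In each round a node's vector becomes the average of its own vector
    and the mean of its neighbours' vectors. No parameters are learned;
    this only illustrates how information flows along edges.
    """
    neighbours = {node: [] for node in features}
    for u, v in edges:
        neighbours[u].append(v)
        neighbours[v].append(u)

    for _ in range(rounds):
        updated = {}
        for node, vec in features.items():
            nbrs = neighbours[node]
            if not nbrs:
                # Isolated nodes keep their representation.
                updated[node] = list(vec)
                continue
            mean = [
                sum(features[n][d] for n in nbrs) / len(nbrs)
                for d in range(len(vec))
            ]
            updated[node] = [(a + b) / 2 for a, b in zip(vec, mean)]
        features = updated
    return features


if __name__ == "__main__":
    feats = {"q": [1.0, 0.0], "s1": [0.0, 1.0], "s2": [0.0, 0.0]}
    print(message_passing(feats, [("q", "s1"), ("q", "s2")]))
```

After one round, each sentence node has absorbed part of the question vector, which is the effect a supporting-sentence classifier relies on.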

Hong Yu et al. [6] examined relation extraction relevant to Adverse Drug Events (ADEs), where clinical information is linked to ADEs and their relations are represented using a graph. The experiments explored supervised machine learning approaches. A rule-based method resembling supervised rule induction was applied. LSTM and attention-based neural network techniques were used to classify clinical relations. The experiments produced appreciable results.

Wenhui Wang et al. [7] introduced gated self-matching networks for reading comprehension and question answering. A gated attention-based Recurrent Neural Network (RNN) and a self-matching representation of the passage with respect to the question are used, together with pointer networks to identify answer boundaries. Pointer networks are additionally used to determine the importance of information in the passage with respect to the question. Given the question and passage representations, sentence pairs are formed by aligning words in the documents. The implementation is restricted to documents containing long passages.

Fang et al. [8] proposed a new methodology, Hierarchical Graph Network (HGN), for multi-hop question answering. Clues are captured at different levels of granularity and transformed into a graph with heterogeneous nodes that combine clues from texts scattered across multiple paragraphs. The hierarchical graph is created by constructing nodes at different levels of granularity: questions, paragraphs, sentences, and entities. This hierarchical differentiation of node granularity enables HGN to support different question answering sub-tasks simultaneously and allows a detailed analysis of the information recorded in the QA model. The results show the effectiveness of the model, which achieved state-of-the-art performance on HotpotQA.

Marzieh Saeidi et al., [9] developed an annotation protocol to gather annotations for conversational machine reading. Rule-based text extraction is used to identify the source documents that contain the rules to be annotated. Each document is then converted into a set of rule texts using a heuristic that identifies and groups paragraphs and bulleted lists. The classification system implements a decision algorithm to decide the entailment of true sentences. The performance of the system depends on the rules that are framed.

4. ROLE OF NATURAL LANGUAGE PROCESSING (NLP)

Leyh-Bannurah et al. [9] proposed a novel tool that harvests Electronic Health Record (EHR) documents in Portable Document Format (PDF), scanned-report, and email formats and automatically extracts Prostate-Specific Antigen (PSA) values from them. The PSA values were verified by clinical experts, and the predictors were the words before and after each value. A support vector machine was trained on these predictors as a bag of words. Validation checks whether each extracted PSA value is correct. For each report, the largest PSA value is returned as the final value.
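The extraction step described above can be illustrated with a short sketch. It finds numeric values that follow the token "PSA", collects the words before and after each value as bag-of-words context, and returns the largest value per report. The regular expression, the window size, and the function names are assumptions for illustration; the SVM that judges each candidate in the surveyed tool is deliberately left out.

```python
import re

# Assumed pattern: "PSA", up to 20 non-digit characters, then a number.
PSA_PATTERN = re.compile(r"\bPSA\b\D{0,20}?(\d+(?:\.\d+)?)", re.IGNORECASE)
WORD_PATTERN = re.compile(r"[A-Za-z]+")


def extract_psa_candidates(text, window=3):
    """Return candidate PSA values with their surrounding context words."""
    candidates = []
    for match in PSA_PATTERN.finditer(text):
        value = float(match.group(1))
        # Words immediately before and after the number act as predictors.
        before = WORD_PATTERN.findall(text[: match.start(1)])[-window:]
        after = WORD_PATTERN.findall(text[match.end(1):])[:window]
        candidates.append(
            {"value": value, "context": [w.lower() for w in before + after]}
        )
    return candidates


def final_psa(text):
    """Return the largest candidate PSA value, or None if there is none."""
    values = [c["value"] for c in extract_psa_candidates(text)]
    return max(values) if values else None


if __name__ == "__main__":
    report = "PSA was 4.2 ng/mL in March. Repeat PSA: 6.8 ng/mL after biopsy."
    print(extract_psa_candidates(report))
    print(final_psa(report))
```

In a fuller pipeline, the `context` lists would be vectorized and passed to a classifier that accepts or rejects each candidate before the maximum is taken.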

Biswal et al. [10] described a system that automatically detects seizures and epileptiform discharges from EEG reports using supervised learning. The documents were labeled based on the presence or absence of seizures and epileptiform discharges. A Naïve Bayes algorithm is used to classify the categories. The system extracts key sentences and features such as keywords and special word patterns called elastic word sequences (EWS). The final features are selected via sequential backward selection. Under cross-validation, the selected features for seizure and epileptiform discharge detection achieved an accuracy above 85%.
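As a rough illustration of the classification step, the sketch below implements a multinomial Naïve Bayes text classifier with add-one smoothing using only the standard library. The class name `NaiveBayesText`, the whitespace tokenizer, and the toy report snippets are assumptions; the surveyed system additionally uses EWS features and sequential backward selection, which are not reproduced here.

```python
import math
from collections import Counter, defaultdict


class NaiveBayesText:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, documents, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        self.vocabulary = set()
        for doc, label in zip(documents, labels):
            tokens = doc.lower().split()
            self.word_counts[label].update(tokens)
            self.vocabulary.update(tokens)
        return self

    def predict(self, document):
        tokens = document.lower().split()
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, count in self.class_counts.items():
            # Log prior plus the sum of smoothed log likelihoods.
            score = math.log(count / total_docs)
            denom = sum(self.word_counts[label].values()) + len(self.vocabulary)
            for tok in tokens:
                score += math.log((self.word_counts[label][tok] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


if __name__ == "__main__":
    # Toy, made-up report snippets for illustration only.
    docs = [
        "generalized seizure activity recorded",
        "frequent epileptiform discharges seizure",
        "normal background rhythm",
        "normal awake recording no abnormality",
    ]
    tags = ["seizure", "seizure", "normal", "normal"]
    model = NaiveBayesText().fit(docs, tags)
    print(model.predict("seizure activity noted"))
    print(model.predict("normal rhythm"))
```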

Kaur et al. [11] achieved promising results in automatically identifying patients who meet the Asthma Predictive Index (API) criteria from EHRs using NLP. An algorithm was developed on data labeled with asthma symptom status taken from an existing API-based manual chart review (CW). Validation was performed by determining the difference in asthma status between NLP-API, knowledge graphs [13], and manual chart review. The risk factors are thus identified and recorded. Pattern-based rules were framed for identifying symptoms and their associations.

Lin et al. [12] emphasized word embeddings combined with a Convolutional Neural Network (CNN) for extracting text from medical discharge notes. Classification methods were adopted for mining discharge summaries. A feature extraction module extracts terms and n-gram phrases as features to train a set of supervised machine learning models using Support Vector Machines (SVM).

Usha et al. [14] proposed a normalized gain (NG) based IDS for MAC Intrusions (NMI) to improve intrusion detection system (IDS) performance significantly. The proposed NMI comprises two primary components, OFSNP and DCMI. The first component is optimal feature selection using NG and PSO (OFSNP), and the second is Detecting and Categorizing MAC 802.11 Intrusions (DCMI) using an SVM classifier. The SSC is based on particle swarm optimization (PSO), which uses labeled and unlabeled features simultaneously to find a group of optimal features. Using the optimal set of features, the proposed DCMI applies fast support vector machine (SVM) learning to classify each attack under the appropriate class. In this way, the proposed NMI achieves a better trade-off between detection accuracy and learning time. The results show that NMI accurately detects and classifies 802.11-specific intrusions and, furthermore, reduces false positives and computation by reducing the number of features.

Sathiyamoorthi [15] proposed a web mining strategy that incorporates the size, cost, frequency, aging, and cache entry time of Web objects, together with their popularity, into the cache removal policy. It uses Web usage mining to improve the Web caching policy. Experimental results show that the proposed strategy performs better than existing policies on execution metrics such as hit rate and byte hit rate.

5. CHALLENGES IN QA MODELS IN MEDICAL DOMAIN

QA for clinical records is still challenging, as the machine must be trained on medical phrases. The models need to be trained on a combination of difficult tasks: reading, extracting features, tokenizing medical vocabulary, comprehending, reasoning, and finally providing answers that any person can interpret easily.

Table 1: Prevailing QA models and methods adopted.

Existing models in clinical QA

Similarity models       Approach
Vector space models     Knowledge graphs
                        Knowledge graphs for sentence-level embedding
                        Knowledge graphs using word chunk overlaps
Statistic models        Map reduce
Probabilistic models    Machine translation
Corpus based models     NLP based measure

6. SUPPORTING SENTENCES IN NLP

A supporting sentence in a medical record is a sentence whose information supports the details of a symptom or the description of an illness. Such sentences are written by the medical practitioner and provide evidence of why a claim is true or correct.

The particular problem we plan to handle in this study is to classify supporting sentences. The data are organized as tuples (Q, P_n, Y_i, A),

where Q is the question,

P_n is the set of passages from the medical record, each of which further consists of a set of sentences S_i (S_i ∈ P_n),

Y_i is the label that indicates whether S_i contains the evidence for answering the question, and A is the answer. To obtain the supporting sentences for question-answer pairs, the appropriate answers need to be carefully chosen first. Existing answer selection methods are adopted for this processing.
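The tuple described above maps naturally onto a small data structure. The sketch below is one possible encoding, with hypothetical class and field names (`Passage`, `QAExample`, `supporting_sentences`) and a made-up example record; it only shows how the labels select the supporting sentences.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Passage:
    sentences: List[str]  # the sentences S_i of one passage
    labels: List[int]     # Y_i: 1 if S_i holds supporting evidence, else 0


@dataclass
class QAExample:
    question: str            # Q
    passages: List[Passage]  # P_n
    answer: str              # A


def supporting_sentences(example: QAExample) -> List[str]:
    """Collect every sentence whose label marks it as supporting."""
    return [
        sentence
        for passage in example.passages
        for sentence, label in zip(passage.sentences, passage.labels)
        if label == 1
    ]


if __name__ == "__main__":
    # Made-up record for illustration only.
    example = QAExample(
        question="Why was the patient admitted?",
        passages=[
            Passage(
                sentences=[
                    "Patient admitted with shortness of breath.",
                    "Family history is unremarkable.",
                ],
                labels=[1, 0],
            ),
            Passage(sentences=["Oxygen saturation was 88%."], labels=[1]),
        ],
        answer="Shortness of breath with low oxygen saturation.",
    )
    print(supporting_sentences(example))
```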

7. RESEARCH QUESTIONS TO BE ADDRESSED

A gap exists between humans and computers in providing solutions in real time. Variations arise in synthesis, paraphrasing, and inference. Based on this exploration, the following questions need to be addressed.

• What methods can mitigate the vocabulary problem in clinical texts?

• How can the lexical gap between NLP and clinical knowledge bases be bridged?

• How can an appropriate model be identified and the semantic features of complex questions extracted?

• What are the ways to handle reasoning over complex questions, and which mechanism proves most efficient when measured?

The aim of this study is to extract important sentences from clinical texts and categorize the answers under various question types.

8. CONCLUSION

A QA model that uses NLP to exploit the relational information among sentences in the passages of a medical record is an emerging research direction. The survey covers QA models that extract essential, relevant information from text documents. Each model deals with the extraction and classification of supporting sentences that contain the information necessary for answering the queries. Based on the gathered information on supporting sentence detection, a QA model is proposed that transforms the sentences into a graph from which the answers to factoid questions about a clinical record can be obtained.

9. REFERENCES

1. Lukovnikov, D., Fischer, A., Lehmann, J., & Auer, S. (2017, April). Neural network-based question answering over knowledge graphs on word and character level. In Proceedings of the 26th international conference on World Wide Web (pp. 1211-1220).

2. Yin, W., Yu, M., Xiang, B., Zhou, B., & Schütze, H. (2016). Simple question answering by attentive convolutional neural network. arXiv preprint arXiv:1606.03391.

3. Yoon, S., Dernoncourt, F., Kim, D. S., Bui, T., & Jung, K. (2019). Propagate-Selector: Detecting Supporting Sentences for Question Answering via Graph Neural Networks. arXiv preprint arXiv:1908.09137.

4. Fan, W., Ma, Y., Li, Q., He, Y., Zhao, E., Tang, J., & Yin, D. (2019, May). Graph neural networks for social recommendation. In The World Wide Web Conference (pp. 417-426).

5. De Cao, N., Aziz, W., & Titov, I. (2018). Question answering by reasoning across documents with graph convolutional networks. arXiv preprint arXiv:1808.09920.

6. Munkhdalai, T., Liu, F., & Yu, H. (2018). Clinical relation extraction toward drug safety surveillance using electronic health record narratives: classical learning versus deep learning. JMIR public health and surveillance, 4(2), e29.

7. Wang, W., Yang, N., Wei, F., Chang, B., & Zhou, M. (2017, July). Gated self-matching networks for reading comprehension and question answering. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) (pp. 189-198).

8. Fang, Y., Sun, S., Gan, Z., Pillai, R., Wang, S., & Liu, J. (2019). Hierarchical graph network for multi-hop question answering. arXiv preprint arXiv:1911.03631.

9. Leyh-Bannurah, S. R., Dell'Oglio, P., Tian, Z., Graefen, M., Huland, H., & Budäus, L. (2016). 353 A new era of data extraction: Example of automated extraction PSA values from electronic health records. European Urology Supplements, 15(3), e353.

10. Biswal, S., Nip, Z., Junior, V. M., Bianchi, M. T., Rosenthal, E. S., & Westover, M. B. (2015, August). Automated information extraction from free-text EEG reports. In 2015 37th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC) (pp. 6804-6807). IEEE.

11. Kaur, H., Sohn, S., Wi, C. I., Ryu, E., Park, M. A., Bachman, K., ... & Juhn, Y. J. (2018). Automated chart review utilizing natural language processing algorithm for asthma predictive index. BMC pulmonary medicine, 18(1), 1-9.

12. Lin, C., Hsu, C. J., Lou, Y. S., Yeh, S. J., Lee, C. C., Su, S. L., & Chen, H. C. (2017). Artificial intelligence learning semantics via external resources for classifying diagnosis codes in discharge notes. Journal of medical Internet research, 19(11), e380.

13. Bao, J., Duan, N., Yan, Z., Zhou, M., & Zhao, T. (2016, December). Constraint-based question answering with knowledge graph. In Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers (pp. 2503-2514).

14. Usha, M., & Kavitha, P. J. W. N. (2017). Anomaly based intrusion detection for 802.11 networks with optimal features using SVM classifier. Wireless Networks, 23(8), 2431-2446.

15. Sathiyamoorthi, V. (2016). A novel cache replacement policy for web proxy caching system using web usage mining. International Journal of Information Technology and Web Engineering (IJITWE), 11(2), 1-13.