
Medical education, biomedical structured and unstructured data to construct knowledge and management systems


Academic year: 2021



Biomedical structured and unstructured data collection and management system for medical education

Yi-Fen Tsai (a), I-Jen Chiang (a, b), Pei-Lan Hsu (a), Pi-Chin Fann (a)

(a) Institute of Medical Informatics, Taipei Medical University

(b) Institute of Biomedical Engineering, National Taiwan University

Summary

This article describes a knowledge platform for searching, organizing, and managing collected documents and for classifying them automatically.

The system is built on Bayes' theorem and consists of two main modules: (1) a training module for automatic classification and (2) a hierarchical knowledge classification module. Documents are represented as quantified concepts and compared against a thesaurus to find word associations. Using automatic text classification guided by clinical experts, the system learns classification rules, classifies documents accurately, and helps users retrieve the documents they need from a large body of information with high precision.

This study adopts a stricter standard: rather than using all unrelated literature in the databases as the denominator (the sample size), we extracted samples from the databases by keyword search. Clinicians trained the system on 305 training articles, and the system's classification of 108 test articles was then evaluated, giving an observed accuracy of 95.4%.

Keywords: knowledge management, text mining, medical education, medical literature

Introduction


To improve the quality of clinical decision making, physicians rely on their past experience, textbooks, the literature, reviews, and expert evidence [1, 2, 3].

Most medical knowledge is organized by domain. To improve the quality of clinical decisions, physicians must find answers in textbooks, clinical guidelines, reviews, research articles, and expert evidence. This often costs a great deal of time and money, and searching the huge volume of medical information for the terms that best express a medical question is laborious [4]. Biomedical knowledge spreads rapidly, and a large body of medical literature has accumulated; by one estimate, the volume of medical literature doubles every 20 years [7, 10].

On the other hand, most medical diagnosis and management decisions are made under a high degree of uncertainty. When making a diagnostic decision for a patient, a physician should consider the possible diagnoses together with an estimate of their relative probabilities, and use these estimates to choose the next test or treatment. Each subsequent decision builds on the conclusions reached before it, so the process is naturally one of conditional probability judgments. Bayesian inference models fit this kind of reasoning under uncertainty well.
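To make this reasoning concrete, here is a minimal Python sketch of Bayes' theorem applied to a diagnostic test. The probabilities (a 1% prior, 90% sensitivity, 5% false-positive rate) are hypothetical illustration values, not data from this study, and the second update assumes the two test results are conditionally independent given the disease status.

```python
def bayes_update(priors, likelihoods):
    """Posterior P(h | e) for each hypothesis h, given priors P(h)
    and the likelihood P(e | h) of the observed evidence e."""
    joint = {h: priors[h] * likelihoods[h] for h in priors}
    evidence = sum(joint.values())  # P(e), the normalizing constant
    return {h: p / evidence for h, p in joint.items()}


# Hypothetical values for illustration only.
priors = {"disease": 0.01, "no disease": 0.99}
# P(positive result | h): sensitivity 0.90, false-positive rate 0.05
positive = {"disease": 0.90, "no disease": 0.05}

# First positive test: the prior is updated to a posterior.
posterior = bayes_update(priors, positive)
# Second (assumed independent) positive test: the previous posterior
# becomes the new prior, so each step conditions on what came before.
posterior2 = bayes_update(posterior, positive)

print(round(posterior["disease"], 3))   # 0.154
print(round(posterior2["disease"], 3))  # 0.766
```

A single positive result raises the probability of disease only to about 15%, while a second one raises it to about 77%, which illustrates why the next test is chosen in light of the estimates already obtained.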

We therefore use text mining techniques: domain experts train the text mining system, which then automatically indexes, classifies, and conceptually clusters a large number of documents. The resulting platform for organizing and managing medical literature supports searching, browsing, and predicting conceptual associations between medical terms, so that medical researchers can quickly obtain the documents they need from a large collection and estimate the possibilities before making a decision.

Text mining has been defined as "discovering useful or meaningful patterns, models, directions, trends, or rules implicit in unstructured or semi-structured text," and also as "the process of analyzing text to capture important document information." Only through this exploration stage can data or information be transformed into knowledge; otherwise, the data remain meaningless numbers and symbols that cannot be applied. How to find useful knowledge in the medical literature is therefore an important issue for text mining of medical information.

Health care workers often face major changes in clinical disease, such as AIDS in the past and SARS more recently. When facing an unfamiliar illness, they must rely on the dissemination of new clinical knowledge, which is obtained by collecting newly published medical literature. One challenge in strengthening physicians' ability to make correct decisions for their patients is therefore training them in the decision-support process, and information systems can provide considerable support here [14, 15, 16]. An automatic classification system helps clinicians search for the documents they need quickly and accurately, and a knowledge network graph expresses the associations between pieces of knowledge.

Literature Review

Automatic Document Classification

An effective medical knowledge management system must integrate a large number of distributed, heterogeneous resources and provide a quick way to obtain correct information for answering the problems encountered in clinical care.

Automatic document cataloging, or classification, has been an active research topic in recent years and a research focus for at least a decade. Since Maron's 1961 paper on automatic classification was published by the ACM, applications of document classification have gradually appeared. For example, Harold Borko and Bernick used previously hand-classified documents as training documents: they computed the inner product between the keyword vector of each training document and the vector of a test document, and used this inner product as the basis for classification, with a larger value indicating greater similarity [17]. Linear Discriminant Analysis (LDA) is a statistical pattern-recognition method that projects documents from their original feature space into a lower-dimensional space for classification while retaining the relevant information [18]. The Category Discrimination Method (CDM) classifies documents using positive and negative associations to find the best-weighted features; its precision and recall reach 74.2% [19]. SYNDIKATE is a natural language analysis system developed specifically for the text structure of medical articles.

Before text is used for learning and classification, it is represented as a tfidf (term frequency times inverse document frequency) vector, where the weight of word w_i in document d_j is defined as follows:

$$\mathrm{tfidf}(w_i, d_j) = \mathrm{TF}(w_i, d_j) \times \mathrm{IDF}(w_i) = \mathrm{TF}(w_i, d_j) \times \log\frac{D}{\mathrm{DF}(w_i)}$$

where TF(w_i, d_j) is the frequency of word w_i in document d_j, D is the total number of documents, and DF(w_i) is the number of documents containing w_i [22].

All words in the training documents are processed to collect statistics such as each word's document frequency; the DF values are then used to construct a tfidf vector for each article.
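A minimal Python sketch of the tfidf weighting defined above follows. The paper does not state the base of the logarithm, so the natural logarithm is assumed; the token lists are hypothetical and stand in for already tokenized training articles.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """docs: list of token lists. Returns one {word: weight} dict per document
    with tfidf(w, d) = TF(w, d) * log(D / DF(w))."""
    D = len(docs)
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))  # each word counted at most once per document
    vectors = []
    for tokens in docs:
        tf = Counter(tokens)  # raw term frequency of each word in this document
        vectors.append({w: tf[w] * math.log(D / df[w]) for w in tf})
    return vectors


# Hypothetical tokenized articles.
docs = [
    ["transfusion", "neonate", "transfusion"],
    ["transfusion", "platelet"],
    ["neonate", "anemia"],
]
vectors = tfidf_vectors(docs)
print(round(vectors[0]["transfusion"], 3))  # 2 * log(3/2) = 0.811
print(round(vectors[1]["platelet"], 3))     # 1 * log(3/1) = 1.099
```

A word that occurs in every document receives weight log(1) = 0, so very common words contribute nothing to the vector, while rarer words are weighted up.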

Word Association

We use term associations to classify unstructured resources. Physicians group words that co-occur into the same category, so that each category defines a meaningful concept, and relations exist between the concepts of different categories. Because categories are defined differently, the term associations differ from category to category.

Traditional text mining techniques analyze the literature by extracting keywords and concepts, but, as Feldman and colleagues showed [12], association rules between words can also be mined from the literature to uncover hidden knowledge. Automatic document classification uses the number of occurrences of words in a document and their parts of speech to compute classification probabilities; the words that are important to a document, that is, its keywords, serve as the basis for classification.
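The sketch below shows term-level association rules in their simplest form, assuming Boolean document indexes (each document reduced to the set of words it contains) and the usual support and confidence measures. The thresholds are hypothetical, and the toy documents borrow the SARS terms used as an example later in the paper; this is an illustration of the general technique, not the system's own rule miner.

```python
from itertools import permutations


def term_association_rules(docs, min_support=2, min_confidence=0.5):
    """Mine single-term rules A -> B from Boolean document indexes.

    Each document is reduced to the set of words it contains (presence only).
    support(A, B)    = number of documents containing both A and B
    confidence(A->B) = support(A, B) / number of documents containing A
    """
    doc_sets = [set(d) for d in docs]
    terms = sorted(set().union(*doc_sets))
    doc_freq = {t: sum(t in s for s in doc_sets) for t in terms}
    rules = []
    for a, b in permutations(terms, 2):
        support = sum(a in s and b in s for s in doc_sets)
        if support >= min_support:
            confidence = support / doc_freq[a]
            if confidence >= min_confidence:
                rules.append((a, b, support, confidence))
    return rules


# Hypothetical documents, already reduced to key terms.
docs = [
    ["sars", "who", "cdc"],
    ["sars", "who"],
    ["sars", "countries"],
    ["who", "cdc"],
]
for a, b, support, conf in term_association_rules(docs):
    print(f"{a} -> {b}  support={support}  confidence={conf:.2f}")
```

Here "cdc -> who" has confidence 1.0 because every document that mentions CDC also mentions WHO, while the reverse rule has confidence 2/3.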

Word selection requires considering word frequency, part of speech, meaning, and the relationships and combinations between words, which is where controlled vocabularies have an advantage. The most common problem in content analysis, however, is that the distinctions between categories lack guidance from domain experts, or that the analysis categories do not meet users' needs. Categories must therefore follow users' interdisciplinary reading habits and their own conceptual categories [20], forming user-defined categories.

Category Structure

Automatic document classification helps interns, residents, and attending physicians manage large amounts of medical information. Every document is assigned to the appropriate category based on word associations. In response to the rapid expansion of medical knowledge, the system must dynamically collect and classify information from the Internet and online digital libraries.

Accurately classifying documents by concept and topic is therefore the top priority: to let users obtain useful information quickly and effectively, automatic classification is an important evaluation criterion for a text mining system. Studies of building automatic text mining classification systems have found that their sources of knowledge depend on natural language processing, thesauri, and controlled vocabularies [21]. The trend in information retrieval is to integrate subject-heading methods with classification, forming a subject category structure through expert training.

System Description

This paper uses the knowledge management system Clever Craft, developed entirely in Java, as a knowledge discovery tool for professionals engaged in knowledge management and analysis, mainly for analyzing unstructured text. For users with large text collections, Clever Craft provides text mining based on a Bayesian network algorithm to analyze conceptual relationships between terms in the literature, and displays the hidden knowledge as a semantic network diagram. Its functions are outlined below:

1. Full-text information retrieval (Information retrieval)

2. Conceptual clustering of documents (Conceptual clustering)

3. Multi-language support (Multi-linguistics)

4. Vocabulary system (Dictionary)

5. Statistical segmentation (Statistical NGram)

6. Category learning (Document classifications)

7. Automatic classification (Automatic clustering)

8. Automatic data collection and download (Automatic downloads)

9. Construction of inference rules (Deductive rules)

The system architecture has two main modules: (1) the automatic classification training module and (2) the hierarchical knowledge classification module.

(1) Automatic classification training module:

The framework uses MeSH categories as its basic subject headings. Clinical specialists and staff modify the concepts of these headings to meet the needs of the relevant clinicians. Specialists then assign each training article to the category it belongs to, ensuring that every category has at least one training article. From the term associations in these articles, the system learns the classification rules for the domain. A separate set of test articles is then classified automatically with the learned rules, and other physicians evaluate the resulting categories (see Figure 1).
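The paper does not spell out Clever Craft's Bayesian network algorithm, so the sketch below swaps in the simplest Bayes-based text classifier, multinomial naive Bayes with add-one smoothing, to illustrate the train-then-test flow: experts label training articles, the model learns from them, and unseen test articles are classified. The category names and token lists are hypothetical.

```python
import math
from collections import Counter, defaultdict


class NaiveBayesClassifier:
    """Multinomial naive Bayes with add-one (Laplace) smoothing.

    Illustrative stand-in only; not Clever Craft's Bayesian network algorithm.
    """

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)       # training documents per category
        self.word_counts = defaultdict(Counter)  # category -> word -> count
        for tokens, label in zip(docs, labels):
            self.word_counts[label].update(tokens)
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        self.n_docs = len(labels)
        return self

    def log_scores(self, tokens):
        """Unnormalized log P(category | document) for every category."""
        v = len(self.vocab)
        scores = {}
        for c, n_c in self.class_counts.items():
            total = sum(self.word_counts[c].values())
            score = math.log(n_c / self.n_docs)  # log prior P(c)
            for w in tokens:
                if w in self.vocab:  # skip words never seen in training
                    score += math.log((self.word_counts[c][w] + 1) / (total + v))
            scores[c] = score
        return scores

    def predict(self, tokens):
        scores = self.log_scores(tokens)
        return max(scores, key=scores.get)


# Hypothetical expert-labeled training articles and categories.
train_docs = [
    ["platelet", "transfusion", "neonate"],
    ["exchange", "transfusion", "bilirubin"],
    ["platelet", "count", "thrombocytopenia"],
    ["bilirubin", "jaundice", "exchange"],
]
train_labels = ["platelets", "exchange transfusion",
                "platelets", "exchange transfusion"]

model = NaiveBayesClassifier().fit(train_docs, train_labels)
print(model.predict(["platelet", "thrombocytopenia"]))  # platelets
print(model.predict(["bilirubin", "exchange"]))         # exchange transfusion
```

Working in log space avoids numerical underflow when many word probabilities are multiplied, and add-one smoothing keeps a single unseen word from forcing a category's probability to zero.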

Figure 1. Automatic classification framework of the system

(2) Hierarchical knowledge classification module:

This module segments words in two modes, using a word list from domain experts and a thesaurus, and then ranks and filters the keywords. Thesaurus-based segmentation uses natural language processing to build keywords automatically from word frequencies: matching word strings are counted across the literature, and higher-frequency strings are regarded as meaningful words. Using these words as the basis for segmenting articles often reveals hidden knowledge that is not easily found. However, because the system handles medical literature rather than general text, the documents contain many specialized medical terms as well as the names of various blood products, so domain knowledge is needed as a supplement. The segmentation dictionary therefore needs an expert thesaurus to compensate for the weakness that a dictionary built automatically by natural language processing is less accurate than a manually built one. The experts' controlled vocabulary is compared automatically with the segmented words to select and rank keywords, which then feed the guided learning classification; documents are assigned to the appropriate category according to their similarity (Figure 2).
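A minimal sketch of combining a frequency-based dictionary with an expert thesaurus follows, assuming English token lists, frequent contiguous n-grams as the automatically built part, and greedy longest-match segmentation. The paper does not name its segmentation algorithm, so these choices, the thresholds, and the example thesaurus entry are all hypothetical.

```python
from collections import Counter


def ngram_counts(docs, max_n=3):
    """Count every contiguous word n-gram (1 <= n <= max_n) in the corpus."""
    counts = Counter()
    for tokens in docs:
        for n in range(1, max_n + 1):
            for i in range(len(tokens) - n + 1):
                counts[" ".join(tokens[i:i + n])] += 1
    return counts


def build_dictionary(docs, expert_terms, min_freq=2, max_n=3):
    """Frequent n-grams from the corpus plus every term in the expert thesaurus."""
    counts = ngram_counts(docs, max_n)
    auto_terms = {g for g, c in counts.items() if c >= min_freq}
    return auto_terms | set(expert_terms)


def segment(tokens, dictionary, max_n=3):
    """Greedy longest-match segmentation; tokens not covered by a
    multi-word dictionary term are kept as single words."""
    out, i = [], 0
    while i < len(tokens):
        for n in range(min(max_n, len(tokens) - i), 0, -1):
            cand = " ".join(tokens[i:i + n])
            if n == 1 or cand in dictionary:
                out.append(cand)
                i += n
                break
    return out


# Hypothetical corpus and expert thesaurus entry.
docs = [["fresh", "frozen", "plasma", "for", "neonates"],
        ["fresh", "frozen", "plasma", "dose"]]
expert_terms = ["packed red blood cells"]
dictionary = build_dictionary(docs, expert_terms, min_freq=2, max_n=4)

sentence = ["transfuse", "packed", "red", "blood", "cells",
            "and", "fresh", "frozen", "plasma"]
print(segment(sentence, dictionary, max_n=4))
# ['transfuse', 'packed red blood cells', 'and', 'fresh frozen plasma']
```

"Fresh frozen plasma" is learned from frequency alone, but "packed red blood cells" never reaches the frequency threshold in this tiny corpus and survives as one term only because the expert thesaurus supplies it, which is the gap the thesaurus is meant to fill.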

Figure 2. Hierarchical knowledge classification module

The system workflow is outlined in Figure 3. All documents are converted to vectors after removing common words (stop words) and words whose tfidf value falls below a threshold. Keyword filtering then retains only the meaningful keyword stems. Documents with high similarity are classified into the same category, and each category has its own association rules.
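The filtering and similarity steps can be sketched as follows. The stop-word list, the tfidf threshold, the category profile vectors, and the use of cosine similarity are assumptions for illustration (the paper does not name its similarity measure); the cutoff of 0.5 is the value reported in the Results.

```python
import math


def filter_terms(vector, stop_words, threshold):
    """Drop stop words and terms whose tfidf weight is below the threshold."""
    return {w: x for w, x in vector.items()
            if w not in stop_words and x >= threshold}


def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as {term: weight}."""
    dot = sum(x * v.get(w, 0.0) for w, x in u.items())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def classify(doc_vector, category_vectors, cutoff=0.5):
    """Return (best category, similarity), or (None, similarity) below the cutoff."""
    best, best_sim = None, 0.0
    for name, cvec in category_vectors.items():
        sim = cosine(doc_vector, cvec)
        if sim > best_sim:
            best, best_sim = name, sim
    return (best, best_sim) if best_sim >= cutoff else (None, best_sim)


# Hypothetical tfidf weights and category profile vectors.
doc = {"the": 0.9, "platelet": 1.2, "neonate": 0.8, "rare": 0.05}
filtered = filter_terms(doc, stop_words={"the", "of", "and"}, threshold=0.1)
categories = {
    "platelets": {"platelet": 1.0, "thrombocytopenia": 1.0},
    "exchange transfusion": {"bilirubin": 1.0, "exchange": 1.0},
}
print(filtered)                               # {'platelet': 1.2, 'neonate': 0.8}
print(classify(filtered, categories)[0])      # platelets
print(classify({"anemia": 1.0}, categories))  # (None, 0.0)
```

Returning None below the cutoff mirrors how documents with similarity under 0.5 are left unclassified rather than forced into the nearest category.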

Figure 3. System flowchart

For example, we collected literature on SARS from the New England Journal of Medicine and the Lancet. Because we wanted to see only the affected areas, we considered only the key terms "countries", "CDC", "WHO", and "SARS", and eliminated terms with low relevance to these concepts. The resulting association rules are shown in Figure 4 for the Lancet and in Figure 5 for the New England Journal of Medicine.

Figure 4. Term associations for SARS in the Lancet

Figure 5. Term associations for SARS in the New England Journal of Medicine

We collected medical literature from many different databases, and pediatricians classified these documents for training. The technical terms are classified according to MeSH, a hierarchical system of subject headings organized around subject concepts and the most widely respected set of subject headings in medicine. As a traditional controlled vocabulary, MeSH makes the wording of a concept consistent across documents and helps users grasp the focus of a concept; for pediatric transfusion, it yielded 48 terms in 7 categories. Its disadvantages are that its vocabulary does not fully match users' concepts, that as a controlled vocabulary it often lacks new terms, and that different categories require different domain knowledge, which leads to some categories being too fine and others too coarse. A senior blood bank physician and a pediatric attending physician therefore revised the categories related to transfusion medicine and pediatrics, selecting 21 terms in 10 categories as the classification framework (Figure 6) so that the categories match how physicians classify the sample documents.

Figure 6. Hierarchical categories

The literature comes from Transfusion, Transfusion Medicine, Transfusion Science, the Journal of Pediatrics, Archives of Disease in Childhood: Fetal and Neonatal Edition, and other journals, retrieved electronically from the Journals@OVID database (source: Shin Kong Hospital Library website, http://library.skh.org.tw) and from SDOS-ES and Blackwell Science (source: Taipei Medical University Library website, http://library.tmu.edu.tw). Articles were retrieved with the keyword queries "transfusion and newborn", "transfusion and fetal", and "transfusion and pediatrics". Classification uses NCBI's MeSH as its basic framework, from which we selected 21 technical terms in 10 categories. Physicians first classified 305 training samples; the 108 test articles were then classified automatically.

The training samples were classified into the categories listed in Table 1; the system interface is shown in Figure 7.

Table 1. Clinical categories defined by pediatricians

Figure 7. Automatic classification interface

Results


(A) System reliability assessment

We assess classification accuracy and internal consistency (inter-coder agreement) on the test documents using the Kappa statistic. Here Kappa measures the reliability and consistency between the clinicians who tested the trained system and the system's classification results on the test documents.

For each test document, we therefore compare whether the clinician and the system produce the same classification, and compute the Kappa value to assess the differences (interpretation in Table 2). Kappa is calculated as follows:

$$\kappa = \frac{P_o - P_r}{1 - P_r}$$

where P_o is the observed agreement, P_r is the agreement expected by chance (random agreement), and 1 is the maximum possible agreement.

Table 2. Interpretation of Kappa values for internal consistency

Kappa        Agreement
0.00         Poor
0.01-0.20    Slight
0.21-0.40    Fair
0.41-0.60    Moderate
0.61-0.80    Substantial
0.81-1.00    Almost perfect

After clinicians trained the system on the 305 training articles, the system classified the 108 test articles, and a clinician then evaluated the classifications produced by the trained system. The results are shown in Table 3.

Table 3. Clinicians' evaluation of the validity of the system's classifications


With a similarity cutoff of 0.5, the clinician-guided classification system was tested on the 108 test documents. Physicians judged that 103 of these documents could be assigned to a category, while 5 had no suitable category.

Of the test documents, 98 (90.7%) were assigned to the correct category; clinicians judged the assigned category to be wrong for 5 articles (4.6%); and the remaining 5 articles (4.6%) had similarity below 0.5 and were left unclassified by the system. These happened to be articles that physicians judged not to belong to the field of pediatric transfusion. Counting the 98 correctly classified articles together with the 5 articles that physicians also considered unclassifiable, the observed accuracy is 95.4%:

Observed agreement = (98 + 5) / 108 = 0.954

Random agreement = 0.907 × 0.954 + 0.092 × 0.046 = 0.869

Kappa = (0.954 - 0.869) / (1 - 0.869) = 0.649

Since the Kappa value obtained for Clever Craft exceeds 0.6, Table 2 indicates that the trained document classification system agrees substantially with the clinicians' concepts.
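The Kappa arithmetic above can be reproduced with a short Python sketch. It assumes the 2 x 2 agreement table implied by the marginal proportions in the text: rows are the clinician's judgment (category correct: 98; not correct or not classifiable: 10) and columns are the system's output (classified: 103; left unclassified: 5), with the 98 correct and the 5 jointly unclassified articles on the diagonal. This table layout is our reconstruction, not one printed in the paper.

```python
def cohen_kappa(table):
    """Cohen's kappa for a square table of counts, where table[i][j] is the
    number of items rater A put in class i and rater B put in class j."""
    k = len(table)
    n = sum(sum(row) for row in table)
    p_o = sum(table[i][i] for i in range(k)) / n            # observed agreement
    row_marg = [sum(table[i]) / n for i in range(k)]
    col_marg = [sum(table[i][j] for i in range(k)) / n for j in range(k)]
    p_r = sum(row_marg[i] * col_marg[i] for i in range(k))  # chance agreement
    return p_o, p_r, (p_o - p_r) / (1 - p_r)


# Rows: clinician (category correct, not correct / not classifiable)
# Columns: system (classified, left unclassified at cutoff 0.5)
table = [[98, 0],
         [5, 5]]
p_o, p_r, kappa = cohen_kappa(table)
print(f"{p_o:.3f} {p_r:.3f} {kappa:.3f}")  # 0.954 0.870 0.645
```

Computed from unrounded proportions, chance agreement is 0.870 and Kappa is about 0.645; the 0.649 in the text comes from using rounded intermediate values. Either way, Kappa falls in the 0.61-0.80 band of Table 2.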

(B) Accuracy by category

After training by pediatricians, the system's accuracy was tested category by category. For each test document classified by the system, clinicians judged whether the assigned category was correct; a document whose content spans several categories was counted in each of them. The results, shown in Table 4, indicate high precision: after clinician-guided training, the system's classification concepts closely match those of the clinicians who trained it, so the guided learning classification module is highly reliable.

Table 4. Accuracy assessment by category

Discussion and Future Prospects


As more knowledge is represented electronically (over 80%), finding the underlying knowledge becomes more important. Word association rules are an important means of representing knowledge. With association rules, we look not only at which words co-occur in a document but also at the connections between the events the document mentions.

In past work, text association rules were mined from documents that had been manually assigned multiple labels, or from keywords extracted from documents. This paper uses document indexing instead of Boolean indexing (Boolean indexing records only whether a word appears in a document and computes the confidence of association rules from that).

The advantage of this method is that it does not require manually labeled documents prepared in advance, so it can be applied in different domains. Our algorithm can easily be linked to background knowledge or an ontology, and it can find association rules starting from specific words. This addresses the problem that term-level association rules cannot incorporate background knowledge as a whole.

Hospital examination, diagnosis, and prescription records are structured data, whereas medical reports, physicians' orders, and the literature are text, that is, unstructured data. Organizing this information into knowledge-based rules, using clinical cases for evidence-based, case-based learning, and presenting the acquired knowledge visually as a knowledge network diagram lets medical students and residents build the knowledge they need very quickly. When facing patients, they can then search data and maps of related concepts quickly and accurately and corroborate them with the literature, in order to choose the best course of treatment.

An automatic text mining classification system classifies the medical literature precisely and presents it visually, so that clinicians can find the information they need quickly; we are therefore committed to improving the performance of automatic classification. Classification groups objects by their relationships, so it is important that the chosen categories are accurate, that users can clearly understand them, and that they meet the required precision. This matters all the more for medical knowledge, which has a strict theoretical basis and a large impact: the standardized, unique, specific, and direct purpose of each category must be made explicit. Because accuracy is relatively uniform across categories, the system can help medical staff quickly and accurately find the literature they need, so its automatic text classification performs quite well.

References

1. Huth EJ, The underused medical literature. Ann Intern Med, 1989. 110 (2): 99-100.

2. Covell DG, Uman GC, Manning PR, Information needs in office practice: are they being met? Ann Intern Med, 1985. 103 (4): 596-599.

3. Gorman PN, Helfand M., Information seeking in primary care: how physicians choose which clinical questions to pursue and which to leave unanswered. Med Decis Making, 1995. 15 (2): 113-119.

4. Smith R., What clinical information do doctors need? BMJ, 1996. 313 (7064): 1062-1068.

5. Godin P., Hubbs R., Woods B., Tsai M., Nag D., Rindfleisch T., Dev P., Melmon KL, A New Instrument for Medical Decision Support and Education: The Stanford Health Information Network for Education, in Proceedings of the 32nd Hawaii International Conference on System Sciences, 1999.

6. Huth E., Needed: an economics approach to systems for medical information. Ann Intern Med, 1985. 103: 617-619.

7. Gorman PN, Ash J., Wykoff L., Can primary care physicians' questions be answered using the medical journal literature? Bull Med Libr Assoc, 1994. 82 (2): 140-146.

8. Hubbs PR, Tsai MC, et al., The Stanford Health Information Network for Education: integrated information for decision making and learning. In Proceedings of AMIA Annual Fall Symposium, 1997, 505-508.

9. Barnes BE, Creating the practice-learning environment: using information technology to support a new model of continuing medical education. Acad Med, 1998. 73 (3): 278-281.

10. Wyatt JC, Knowledge management and innovation in medicine: how to go beyond practice guidelines? Advances in Clinical Knowledge Management, 2002; 5.

11. Sebastiani F., Machine learning in automated text categorization. ACM Computing Surveys, 2000.

12. Feldman R., Aumann Y., Amir A., Klösgen W., Zilberstein A., Text mining at the term level. In Proceedings of the 3rd International Conference on Knowledge Discovery, KDD-97, pages 167-172, Newport Beach, CA, 1998.

13. Tierney WM, Miller ME, Overhage JM, McDonald CJ, Physician order writing on microcomputer workstations. JAMA 1993; 269: 379-383

14. Barnes BE, Creating the practice-learning environment: using information technology to support a new model of continuing medical education. Acad Med, 1998. 73 (3): 278-281.


15. Frize M., Solven FG, Stevenson M., Nickerson BG, Buskard T., Taylor K., Computer-Assisted Decision-support Systems for Patient Management in an Intensive Care Unit. Proc. Medinfo '95, Vancouver, 1995: 1009-1012.

16. Schank RC, Case-based teaching: Four experiences in educational software design (Technical Report No. 7). Institute for Learning Sciences, Northwestern University, 1991.

17. Borko H., Bernick M., Automatic document classification. J ACM, 1963. 10 (2): 131-135.

18. Fukunaga K., Introduction to statistical pattern recognition (2nd edition). (New York, 1990).

19. Feldman R., Mining unstructured data. In ACM SIGIR, pages 182-192, San Diego, CA, 1999.

20. Saracevic T., A Study of Information Seeking and Retrieving. Background and Effectiveness. Journal of the American Society for Information Science, 1988. 39 (3): 177-196.

21. Sebastiani F., Machine Learning in Automated Text Categorization. ACM Computing Surveys, 2002. 34 (1): 1-47.

22. Salton G., Buckley C., Term-weighting approaches in automatic text retrieval. Information Processing & Management, 1988. 24 (5): 513-523.
