
Research Article

A Study on Classical Texts in the Field of Computational Linguistics through Bibliometric Analysis

Athira Najwa Zakaria1, Anida Sarudin2*, Zulkifli Osman3, Husna Faredza Mohamed Redzwan4, Muhammad Fadzllah Hj. Zaini5

1,5 Universiti Pendidikan Sultan Idris

2*,3,4 Fakulti Bahasa dan Komunikasi, Universiti Pendidikan Sultan Idris

athiranajwa3@gmail.com1, anida@fbk.upsi.edu.my2*, zulkifli@fbk.upsi.edu.my3, husna.faredza@fbk.upsi.edu.my4, muhamadfadzllahhjzaini@gmail.com5

Article History: Received: 10 November 2020; Revised: 12 January 2021; Accepted: 27 January 2021; Published online: 05 April 2021

Abstract: Manuscripts, or classical texts handwritten on paper, bark, and rattan, are relics of past generations. According to the generated data, SpringerLink published 111,070 articles concerning classical texts from 2015 to 2019. This bibliometric analysis focuses on three aspects: year of publication, type of document, and field of discipline. Tabulated data and visual displays show the current trends in classical text studies. The analysis finds that “article” is the most common type of publication, both in SpringerLink overall (6,962,098 records) and in the results for the classical text keyword. The field with the highest search frequency is “Physics”, with 30,075 results, whereas “Linguistics” records only 1,425. The analysis, however, concentrates on research in the subdiscipline of computational linguistics. Within linguistics, the Language and Literature subdiscipline records the highest count, 148. Visualising the topics of the research papers as a word cloud reveals three prominent lexicons in linguistics that are closely related to classical texts: Language, Corpus, and Arabic Language. In conclusion, bibliometric analysis in linguistics not only gives a clear view of current developments in classical text studies worldwide but also points to the future research potential of the field.

Keywords: Computational Linguistics, Bibliometric Analysis, Subdiscipline

1. Introduction

Researchers all around the world are ardently pursuing knowledge through classical text studies. In Malaysia, studies of Malay manuscripts have piqued the curiosity not only of local researchers but also of international researchers, such as those at Leiden University and Hamburg University, who are avidly delving into manuscript research (Ming, 2003). These researchers find contemporary, interdisciplinary relevance in the contents of Malay manuscripts. Using higher-order thinking indicators (Sarudin et al., 2019a; 2019b), advanced Malay language proficiency, and a strong grasp of the socio-cultural skills known as “santun berbahasa” (socio-cultural language politeness) (Mohamed Redzwan et al., 2020), one can explore the comprehensive knowledge contained in Malay manuscripts. This includes aspects of intellect, intellectuals, weltanschauung (worldview), and the civilisation of society. Malay manuscripts are the highest contribution of traditional Malay society; they contain a versatile knowledge tradition and reflect the sociological advancement of its civilisation (Abd. Azam & Yatim, 2012). Historically, manuscripts were produced in the palace, as only rulers such as kings could read and write, whereas ordinary citizens disseminated knowledge orally. The estimated number of manuscript collections in Jawi script is 265. Among the prominent manuscripts still researched today are Sulalatus-Salatin, Hikayat Amir Hamzah and Hikayat Hang Tuah.

The British occupation of the Malay lands spread Malay manuscripts widely to an international audience (Ming, 2003). The Malay language gained popularity in the West, beginning with the publication of Malay-English dictionaries for communication in trade. Concurrently with the rise of European scholarship in education, European traders who came to the Malay lands were encouraged to purchase and preserve Malay manuscripts relating to the history and politics of the Malay Archipelago. English rulers at the time actively copied manuscripts they were unable to purchase. The transcribers were paid local people or even English people who copied without understanding the contents. This foreign ownership is why the Malays themselves own few manuscripts. For example, the owner of Hikayat Sri Rama, Edward Pococke, gave the manuscript to the library of Oxford University for research (Ming, 2003). As a result, Malay manuscripts are now held in many places in Britain and elsewhere in Europe, such as the Royal Asiatic Society, London (the collection of Stamford Raffles), the library of Leiden University and the Baptist College library in Bristol.


Classical text studies are not confined to local research, so the objective of this study is to investigate how they have developed in international research. To that end, this paper examines trends in keyword searches on classical texts through bibliometric analysis.

2. Methodology

This paper uses bibliometric analysis to map the general knowledge structure of the research subject (Rialti, Marzi, Ciappei, & Busso, 2019). Bibliometric analysis makes it possible to trace the progress of methods of application within a field of study (Iqbal et al., 2019). By searching with relevant keywords, the method uncovers current trends emerging around the target field. It also helps researchers identify key contributors to the target field by analysing the frequency of related works in, for example, journals, conference papers, books, and reports. Bibliometric analysis can be applied in all fields of study, including science, social science, and language. Gunashekar, Wooding and Guthrie (2017), Iqbal et al. (2019) and Muslu (2018) are among the researchers who have applied bibliometric analysis in the fields of medicine, computer science, and health science. For the literature review in this bibliometric analysis, the main keyword “classical text” was used to retrieve the related studies that underpin this study.

The data were drawn from articles indexed by SpringerLink in October 2019. SpringerLink was chosen because it covers a wide range of document types, such as journal articles, book chapters, conference papers, reference work entries, protocols, and videos. SpringerLink is a leading database on par with Scopus and ISI, and its quality is maintained by experts in various research fields around the world. As of 2019, SpringerLink hosted 13,118,849 research items (Table 1), making it the most suitable database for studying trends in classical texts through bibliometric analysis.

Table 1. General Information in SpringerLink

Type of Document Total Publications

Article 6,962,098

Book Chapters 4,391,851

Conference Papers 1,143,906

Reference Work Entries 562,947

Protocols 57,939

Video 108

3. Results and Discussions

All the articles obtained were analysed according to the following aspects, namely type of document, discipline, and subdiscipline.

Table 2. Document Types for the Keyword [classic text]

Types of Documents Frequency Percentage (%)

Articles 111,070 53.94

Book Chapters 90,104 43.76

Reference Work Entry 4,217 2.05

Protocols 494 0.24

Books 36 0.017

Videos 4 0.0019

Book Series 1 0.00049

Total 205,926 100

According to Table 2, articles have the highest frequency, 111,070 (53.94%), followed by book chapters with 90,104 (43.76%). Both types show far higher frequencies and percentages than reference work entries, protocols, books, videos and book series. Reference work entries account for 4,217 (2.05%), while each of the remaining four types is below 1%: protocols 0.24% (494), books 0.017% (36), videos 0.0019% (4) and book series 0.00049% (1). These results do not tally exactly with the generated data because the data count only accessible articles. The analysis then turns to the discipline fields (refer to Table 3).
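Each percentage in Table 2 is a document type's frequency divided by the total (205,926) and multiplied by 100. The short Python sketch below recomputes that column from the published frequencies; the function name `percentage_shares` is illustrative and not part of the study's own workflow.

```python
# Frequencies for the keyword [classic text], copied from Table 2.
TABLE_2 = {
    "Articles": 111_070,
    "Book Chapters": 90_104,
    "Reference Work Entry": 4_217,
    "Protocols": 494,
    "Books": 36,
    "Videos": 4,
    "Book Series": 1,
}


def percentage_shares(freqs: dict[str, int]) -> dict[str, float]:
    """Return each category's share of the grand total, in percent."""
    total = sum(freqs.values())
    return {name: 100 * count / total for name, count in freqs.items()}


if __name__ == "__main__":
    print(f"Total: {sum(TABLE_2.values()):,}")
    for name, pct in percentage_shares(TABLE_2).items():
        # Frequency and unrounded share; the table reports rounded values.
        print(f"{name:<22}{TABLE_2[name]:>9,}{pct:>12.5f}")
```

Rounded to two decimals, the shares reproduce the 53.94%, 43.76% and 2.05% reported for articles, book chapters and reference work entries.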

Table 3. Discipline Fields for the Keyword [classic text]

Discipline Frequency Percentage (%)

Physics 30,075 15.53

Mathematics 29,076 15.01

Engineering 28,705 14.82

Computer Science 28,565 14.75

Material Science 7,881 4.07

Philosophy 7,262 3.75

Chemistry 6,436 3.32

Medicine and Allied Sciences 6,366 3.29

Life Sciences 6,079 3.14

Earth Sciences 5,161 2.66

Social Sciences 5,161 2.66

Business Management 4,524 2.34

Biomedicine 4,520 2.33

Education 4,382 2.26

Economics 3,881 2.00

Media and Culture Studies 3,725 1.92

Literature 3,681 1.90

Political Sciences and International Relations 2,942 1.52

Psychology 2,362 1.21

History 2,342 1.21

Statistics 2,316 1.20

Law 1,989 1.03

Environmental Studies 1,528 0.79

Linguistics 1,425 0.74

Energy Studies 997 0.51

Religious Studies 980 0.51

Popular Sciences 653 0.34

Geography 639 0.33

Finance 541 0.28

Criminology and Criminal Justice 361 0.19

Pure Sciences, Humanities and Social Sciences, Multidisciplinary Sciences 208 0.11

Dentistry 108 0.06

Pharmacy 36 0.02

Architecture 24 0.01

Total 193,691 100

Table 3 shows the discipline fields in the search results for the keyword [classic text]. Physics has the highest count, 30,075 (15.53%). Mathematics, Engineering and Computer Science are close behind, with 15.01%, 14.82% and 14.75%, or 29,076, 28,705 and 28,565 results respectively. The remaining fields lag far behind these four: Material Science, the fifth largest, records only 7,881 (4.07%).

Meanwhile, the disciplines at around 3% are Philosophy, Chemistry, Medicine and Allied Sciences, and Life Sciences. Those between 2.00% and 2.66% are Earth Sciences, Social Sciences, Business Management, Biomedicine, Education and Economics. Twelve disciplines fall below 1%, among them Environmental Studies, Linguistics, Finance, Dentistry and Pharmacy, with frequencies ranging from 24 to 1,528.
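The grouping above amounts to sorting the rows of Table 3 into percentage bands. A minimal sketch, assuming the published percentages are used as given rather than recomputed; `in_band` is a hypothetical helper name.

```python
# (discipline, frequency, published percentage), copied from Table 3.
TABLE_3 = [
    ("Physics", 30_075, 15.53), ("Mathematics", 29_076, 15.01),
    ("Engineering", 28_705, 14.82), ("Computer Science", 28_565, 14.75),
    ("Material Science", 7_881, 4.07), ("Philosophy", 7_262, 3.75),
    ("Chemistry", 6_436, 3.32), ("Medicine and Allied Sciences", 6_366, 3.29),
    ("Life Sciences", 6_079, 3.14), ("Earth Sciences", 5_161, 2.66),
    ("Social Sciences", 5_161, 2.66), ("Business Management", 4_524, 2.34),
    ("Biomedicine", 4_520, 2.33), ("Education", 4_382, 2.26),
    ("Economics", 3_881, 2.00), ("Media and Culture Studies", 3_725, 1.92),
    ("Literature", 3_681, 1.90),
    ("Political Sciences and International Relations", 2_942, 1.52),
    ("Psychology", 2_362, 1.21), ("History", 2_342, 1.21),
    ("Statistics", 2_316, 1.20), ("Law", 1_989, 1.03),
    ("Environmental Studies", 1_528, 0.79), ("Linguistics", 1_425, 0.74),
    ("Energy Studies", 997, 0.51), ("Religious Studies", 980, 0.51),
    ("Popular Sciences", 653, 0.34), ("Geography", 639, 0.33),
    ("Finance", 541, 0.28), ("Criminology and Criminal Justice", 361, 0.19),
    ("Pure Sciences, Humanities and Social Sciences, "
     "Multidisciplinary Sciences", 208, 0.11),
    ("Dentistry", 108, 0.06), ("Pharmacy", 36, 0.02), ("Architecture", 24, 0.01),
]


def in_band(rows, low, high):
    """Rows whose published percentage p satisfies low <= p < high."""
    return [row for row in rows if low <= row[2] < high]


if __name__ == "__main__":
    below_one = in_band(TABLE_3, 0.0, 1.0)
    freqs = [freq for _, freq, _ in below_one]
    print(len(below_one), min(freqs), max(freqs))  # 12 24 1528
```

The half-open band (`low <= p < high`) keeps adjacent bands from double-counting a discipline that sits exactly on a boundary, such as Economics at 2.00%.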

The results for the keyword [classic text] in Tables 2 and 3 show a general trend: classical text studies span a wide range of interdisciplinary fields. The next step therefore limits the search results to Linguistics articles only, using the search [classic text] - articles - linguistics - 2015-2020 (refer to Diagram 1).

Diagram 1. Keyword Search [classic text]
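The restriction above was applied through SpringerLink's search interface. As an illustration only, the same filter can be sketched over exported search results that already match the keyword; the `Record` fields and the sample records are hypothetical and do not reflect SpringerLink's actual export format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    """One search result already matching the keyword; fields are hypothetical."""
    title: str
    content_type: str  # e.g. "Article" or "Chapter"
    discipline: str    # e.g. "Linguistics"
    year: int


def restrict(records, content_type="Article", discipline="Linguistics",
             first_year=2015, last_year=2020):
    """Keep records of one document type and discipline within an inclusive year range."""
    return [
        r for r in records
        if r.content_type == content_type
        and r.discipline == discipline
        and first_year <= r.year <= last_year
    ]


# Invented placeholder records, for illustration only.
SAMPLE = [
    Record("Title A", "Article", "Linguistics", 2016),
    Record("Title B", "Article", "Linguistics", 2020),
    Record("Title C", "Article", "Physics", 2017),
    Record("Title D", "Chapter", "Linguistics", 2019),
    Record("Title E", "Article", "Linguistics", 2012),
]

if __name__ == "__main__":
    for r in restrict(SAMPLE):
        print(r.year, r.title)
```

Only Title A and Title B survive: Title C has the wrong discipline, Title D the wrong document type, and Title E falls outside 2015-2020.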

Under these search specifications, the search returned 362 Linguistics articles spanning several subdisciplines related to classical text studies. Only the related subdisciplines were selected, as shown in Table 4.

Table 4. Subdisciplines for the Keyword [classic text]

Subdiscipline Frequency Percentage (%)

Language and Literature 148 10.33

General Linguistics 131 9.14

Comparative Linguistics 93 6.49

General Education 90 6.28

Syntax 82 5.72

Historical Linguistics 66 4.61

Language Education 62 4.33

Language Philosophy 60 4.19

Comparative Literature 56 3.91

Philology 56 3.91

Computational Linguistics 53 3.70

General Computer Science 51 3.56

Neurology 47 3.30

Psycholinguistics 47 3.30

General Sociology 43 3.00

Literature 41 2.86

Semantics 41 2.86

Applied Linguistics 30 2.09

Neuro-Linguistic Programming (NLP) 30 2.09

Phonetics and Phonology 23 1.61

Sign Language 23 1.61

Theoretical Linguistics 23 1.61

Political Science 19 1.33

Sociolinguistics 19 1.33

Chinese Language 16 1.12


Computer Applications in Literature and Humanities 11 0.77

Pragmatics 11 0.77

Culture and Religious Studies 10 0.70

Russian Language 10 0.70

Cognitive Linguistics 9 0.63

Corpus Linguistics 9 0.63

Multi-language 9 0.63

German Language 7 0.49

Japanese Language 7 0.49

Total 1,433 100

Table 4 lists 34 subdisciplines comprising 1,433 articles from the linguistics discipline identified earlier. Language and Literature shows the highest count, 148 (10.33%), followed by General Linguistics with 131 (9.14%). Comparative Linguistics and General Education show similar shares of 6.49% (93) and 6.28% (90) respectively. The subdisciplines below 1% are Computer Applications in Literature and Humanities, Pragmatics, Culture and Religious Studies, Russian Language, Cognitive Linguistics, Corpus Linguistics, Multi-language, German Language and Japanese Language, with 0.77% (11), 0.77% (11), 0.70% (10), 0.70% (10), 0.63% (9), 0.63% (9), 0.63% (9), 0.49% (7) and 0.49% (7) respectively.

The subdiscipline examined in more detail, because of its relevance to the literature reviewed above, is Computational Linguistics. In line with this paper's focus on Computational Linguistics, the WordSift website was used to produce a holistic word cloud that reflects the lexical frequencies in the search results for the research topic.

Diagram 2. Word Cloud for the Keyword [classic text]. Source: https://wordsift.org/

Diagram 2 illustrates the word cloud produced by WordSift from the titles of the uploaded research papers. A word cloud is a web-based application service that visualises the words of a language sample or text, in line with the growing prominence of big data (Jin, 2017). Academics use word clouds to gain a critical and holistic overview of research ideas (Qeis, 2015). In Diagram 2, the most prominent lexicons, namely Language, corpus, and Arabic Language, indicate a substantial correlation with classical text studies in the context of linguistics. Other lexicons include historical, method, resources, recognition, analysis and linguistics. An initial review of the lexicons in the word cloud identifies 54 articles as related to classical texts.
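A word cloud sizes each word by how often it occurs. The counting step behind it can be sketched as follows; this is not WordSift's implementation, and the stop-word list and tokenisation rule are assumptions. The sample titles are copied from Arabic-corpus papers in the bibliography. The sketch counts single words, so a two-word lexicon such as "Arabic Language" would need phrase (bigram) counting on top of it.

```python
import re
from collections import Counter

# Illustrative stop-word list (an assumption, not WordSift's own list).
STOP_WORDS = {"a", "an", "and", "for", "in", "of", "on", "the", "to", "with"}

# Titles copied from Arabic-corpus papers in the bibliography.
TITLES = [
    "Studying the history of the Arabic Language: language technology "
    "and a large-scale historical corpus",
    "Exploring and exploiting a historical corpus for Arabic",
    "A 700M+ Arabic corpus: KACST Arabic corpus design and construction",
    "Curras: an annotated corpus for the Palestinian Arabic dialect",
]


def lexical_frequencies(titles, top_n=10):
    """Count lower-cased words across titles, skipping stop words and single letters."""
    counts = Counter()
    for title in titles:
        words = re.findall(r"[^\W\d_]+", title.lower())  # runs of letters only
        counts.update(w for w in words if w not in STOP_WORDS and len(w) > 1)
    return counts.most_common(top_n)


if __name__ == "__main__":
    print(lexical_frequencies(TITLES, top_n=4))
    # [('arabic', 5), ('corpus', 5), ('language', 2), ('historical', 2)]
```

`Counter.most_common` orders equal counts by first appearance, which is why "arabic" precedes "corpus" here.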

Overall, Language is the most frequent lexicon for the keyword classic text, since 53 of the 54 uploaded articles discuss classical texts from the perspective of language, covering many languages rather than just one. Within classical text studies, Arabic dominates the discipline of Linguistics and its subdiscipline, Computational Linguistics. Research on Arabic includes Belinkov, Magidow, Barrón-Cedeño, Shmidman and Romanov (2019), Neme and Paumier (2019), Hammo, Yagi, Ismail and AbuShariah (2016), Djellab, Amrouche, Bouridane and Mehallegue (2017) and Al-Thubaity (2015). These studies discuss the development of language corpora aimed at uncovering the beauty of the language in terms of sociology, history, semantics and grammar. Studies exploring the history of Arabic, such as Hammo et al. (2016), aim to develop historical corpora of classical texts, including the al-Quran, that have existed for centuries. Meanwhile, lexicological studies such as Neme and Paumier (2019) and Mohamed and Oussalah (2019) investigate the relationship between text and semantics, the latter through a hybrid approach.

Asian researchers have also studied classical texts in their native languages. Koo (2015) studied an unsupervised method for identifying loanwords in Korean using a fully statistical approach based on word frequency and bigram analysis. He developed a character-based n-gram classifier that detects loanwords, or transliterated imported vocabulary, in Korean classical text. In addition, Pham, Tucker, and Baayen (2019) built corpora from various Vietnamese materials, including children's literature, and compared them with film subtitles; Pham et al. also examined semantics using Latent Semantic Analysis (LSA). Classical text studies in other languages include Latin (Kabala, 2018; Boschetti, 2015), French regional languages (Magistry, Ligozat, & Rosset, 2019), Turkish Sign Language (Eryiğit et al., 2019), Middle High German (Schulz & Ketschik, 2019), Slovene (Fišer, Ljubešić, & Erjavec, 2018), early Indo-European languages (Eckhoff et al., 2018) and Hausa (Bimba, Idris, Khamis, & Noor, 2016). In summary, classical text studies cover a wide range of the world's languages, although Arabic receives more research attention than any other language.
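To illustrate what a character-based n-gram classifier does, the sketch below scores a word under two character-bigram models with add-one smoothing (a naive Bayes classifier with equal priors) and picks the likelier model. It is a simpler, supervised stand-in, not Koo's (2015) unsupervised method, and the romanised word lists are invented toy data.

```python
import math
from collections import Counter


def char_ngrams(word, n=2):
    """Character n-grams of a word, padded with '<' and '>' boundary markers."""
    padded = f"<{word}>"
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


class CharNgramModel:
    """Bag of character n-grams with add-one smoothing (toy model)."""

    def __init__(self, words, n=2):
        self.n = n
        self.counts = Counter(g for w in words for g in char_ngrams(w, n))
        self.total = sum(self.counts.values())
        self.vocab = len(self.counts) + 1  # one extra slot for unseen n-grams

    def log_prob(self, word):
        """Log-likelihood of the word's n-grams, treated as independent."""
        return sum(
            math.log((self.counts[g] + 1) / (self.total + self.vocab))
            for g in char_ngrams(word, self.n)
        )


def classify(word, native, loan):
    """Label a word by whichever model gives it the higher log-likelihood."""
    return "loanword" if loan.log_prob(word) > native.log_prob(word) else "native"


# Invented romanised toy lists, not data from Koo (2015).
NATIVE = CharNgramModel(["saram", "maeum", "haneul", "baram", "sarang", "namu"])
LOAN = CharNgramModel(["keompyuteo", "teribijeon", "keopi",
                       "peurinteo", "aiseukeurim", "taeksi"])

if __name__ == "__main__":
    for w in ["moniteo", "maram"]:
        print(w, classify(w, NATIVE, LOAN))
```

Bigrams such as "te", "eo" and "o>" are frequent in the toy loanword list, so "moniteo" is labelled a loanword, while "maram" shares most of its bigrams with the native list.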

Another word that occurs frequently in the word cloud is corpus. Classical texts store an abundance of data on various aspects of life, and researchers investigating them organise this data in digital corpora for easy access in future research. For example, Rubinstein (2019) developed the first corpus database of Emergent Modern Hebrew, which combines writings and visuals from various genres. The database supports research on the historical development of Hebrew from the classical era to modern times. Besides tracking language change, a corpus database can also reveal differences between dialects of the same language.

Jarrar, Habash, Alrimawi, Akra, and Zalmout (2017), Djellab et al. (2017), Masmoudi, Bougares, Ellouze, Estève, and Belguith (2018), and Abainia (2019) are among the researchers who study Arabic in its regional varieties, such as the Tunisian, Algerian and Palestinian dialects, and Algerian Arabizi–French code-switching. This research is made possible by corpora that record classical texts. Studies of dialects within one language show that classical text studies are not limited to literary studies but span a wide range of disciplines, including in-depth discussions of phonetics and phonology.

4. Conclusion

In brief, international classical text studies in the SpringerLink search results are mostly articles rather than other publications such as book chapters, conference papers, reference work entries, protocols and videos. The large number of articles suggests that the article format offers researchers broader reach and a large quantity of high-quality, relevant material.

The discipline with the most search results is Physics, with 30,075 (15.53%), followed by Mathematics, Engineering and Computer Science. The disciplines with the lowest percentages, below 1%, include Environmental Studies, Linguistics, Finance, Dentistry and Pharmacy, with frequencies ranging from 24 to 1,528. The pure sciences thus record more classical text studies than this paper's field of interest, language and linguistics. Further academic exploration of classical texts in all related disciplines is recommended to establish their extent within each discipline. This study, however, examines only the development and trends of classical text studies retrieved through the keyword classic text.

In addition, 34 subdisciplines related to linguistics, comprising 1,433 articles, were examined. Language and Literature shows the highest engagement with 148 (10.33%), whereas the subdisciplines below 1% are Computer Applications in Literature and Humanities, Pragmatics, Culture and Religious Studies, Russian Language, Cognitive Linguistics, Corpus Linguistics, Multi-language, German Language and Japanese Language, with 0.77% (11), 0.77% (11), 0.70% (10), 0.70% (10), 0.63% (9), 0.63% (9), 0.63% (9), 0.49% (7) and 0.49% (7) respectively. The article then focuses on Computational Linguistics, the subdiscipline identified in the initial bibliometric analysis. In line with the research title, the WordSift website was used to generate a word cloud of the frequent lexicons in the database search results. The three most frequent lexicons are Language, corpus and Arabic Language, and each gives a clear overview of the related field of study.

In conclusion, this bibliometric analysis of classical texts in Computational Linguistics gives researchers interested in a similar field a systematic overview of publication types, disciplines and subdisciplines. Mapping research development through bibliometric analysis of related keywords is crucial for observing global education trends holistically.

Bibliography

1. Abainia, K. (2019). DZDC12: a new multipurpose parallel Algerian Arabizi–French code-switched corpus. Language Resources and Evaluation. https://doi.org/10.1007/s10579-019-09454-8.

2. Abd. Azam, A. and Yatim, O. (2012). Manuskrip lama: asas keupayaan dan kearifan Melayu tradisi. International Journal of the Malay World and Civilisation, Vol. 30, No. 1, pp. 29–39.

3. Al-Thubaity, A. O. (2015). A 700M+ Arabic corpus: KACST Arabic corpus design and construction. Language Resources and Evaluation, Vol. 49, No. 3, pp. 721–751. https://doi.org/10.1007/s10579-014-9284-1.

4. Alkema, A. (2021). Lines and semi-countably differentiable primes. Mathematical Statistician and Engineering Applications, 70(2), 90-98.

5. Belinkov, Y., Magidow, A., Barrón-Cedeño, A., Shmidman, A., and Romanov, M. (2019). Studying the history of the Arabic Language: language technology and a large-scale historical corpus. Language Resources and Evaluation, Vol. 53, No. 4, pp. 771–805. https://doi.org/10.1007/s10579-019-09460-w.

6. Bimba, A., Idris, N., Khamis, N., and Noor, N. F. M. (2016). Stemming Hausa text: using affix-stripping rules and reference look-up. Language Resources and Evaluation, Vol. 50, No. 3, pp. 687–703. https://doi.org/10.1007/s10579-015-9311-x.

7. Boschetti, F. (2015). Barbara McGillivray: Methods in Latin computational linguistics. (Brill’s studies in historical linguistics). Language Resources and Evaluation, Vol. 49, No. 4, pp. 927–931. https://doi.org/10.1007/s10579-015-9305-8.

8. Chen, X., Xie, H., Wang, F. L., Liu, Z., Xu, J., and Hao, T. (2018). A bibliometric analysis of natural language processing in medical research. BMC Medical Informatics and Decision Making, Vol. 18(Suppl 1), 1–14. https://doi.org/10.1186/s12911-018-0594-x

9. Djellab, M., Amrouche, A., Bouridane, A., and Mehallegue, N. (2017). Algerian Modern Colloquial Arabic Speech Corpus (AMCASC): regional accents recognition within complex socio-linguistic environments. Language Resources and Evaluation, Vol. 51, No. 3, pp. 613–641. https://doi.org/10.1007/s10579-016-9347-6.

10. Djellab, M., Amrouche, A., Bouridane, A., and Mehallegue, N. (2017). Algerian Modern Colloquial Arabic Speech Corpus (AMCASC): regional accents recognition within complex socio-linguistic environments. Language Resources and Evaluation, Vol. 51, No. 3, pp. 613–641. https://doi.org/10.1007/s10579-016-9347-6.

11. Eckhoff, H., Bech, K., Bouma, G., Eide, K., Haug, D., Haugen, O. E., and Jøhndal, M. (2018). The PROIEL treebank family: a standard for early attestations of Indo-European languages. Language Resources and Evaluation, Vol. 52, No. 1, pp. 29–65. https://doi.org/10.1007/s10579-017-9388-5.

12. Eryiğit, G., Eryiğit, C., Karabüklü, S., Kelepir, M., Özkul, A., Pamay, T., … Köse, H. (2019). Building the first comprehensive machine-readable Turkish sign language resource: methods, challenges and solutions. Language Resources and Evaluation. https://doi.org/10.1007/s10579-019-09465-5.

13. Fišer, D., Ljubešić, N., and Erjavec, T. (2018). The Janes project: language resources and tools for Slovene user-generated content. Language Resources and Evaluation. https://doi.org/10.1007/s10579-018-9425-z.

14. Gunashekar, S., Wooding, S., and Guthrie, S. (2017). How do NIHR peer review panels use bibliometric information to support their decisions? Scientometrics, Vol. 112, No. 3, pp. 1813–1835. https://doi.org/10.1007/s11192-017-2417-8.

15. Hammo, B., Yagi, S., Ismail, O., and AbuShariah, M. (2016). Exploring and exploiting a historical corpus for Arabic. Language Resources and Evaluation, Vol. 50, No. 4, pp. 839–861. https://doi.org/10.1007/s10579-015-9304-9.


16. Iqbal, W., Qadir, J., Tyson, G., Mian, A. N., Hassan, S., and Crowcroft, J. (2019). A bibliometric analysis of publications in computer networking research. Scientometrics, Vol. 119. https://doi.org/10.1007/s11192-019-03086-z.

17. Jarrar, M., Habash, N., Alrimawi, F., Akra, D., and Zalmout, N. (2017). Curras: an annotated corpus for the Palestinian Arabic dialect. Language Resources and Evaluation, Vol. 51, No. 3, pp. 745–775. https://doi.org/10.1007/s10579-016-9370-7.

18. Jin, Y. (2017). Development of Word Cloud Generator Software Based on Python. Procedia Engineering, Vol. 174, pp. 788–792. https://doi.org/10.1016/j.proeng.2017.01.223.

19. Kabala, J. (2018). Computational authorship attribution in medieval Latin corpora: the case of the Monk of Lido (ca. 1101–08) and Gallus Anonymous (ca. 1113–17). Language Resources and Evaluation. https://doi.org/10.1007/s10579-018-9424-0.

20. Koo, H. (2015). An unsupervised method for identifying loanwords in Korean. Language Resources and Evaluation, Vol. 49, No. 2, pp. 355–373. https://doi.org/10.1007/s10579-015-9296-5.

21. Magistry, P., Ligozat, A. L., and Rosset, S. (2019). Exploiting languages proximity for part-of-speech tagging of three French regional languages. Language Resources and Evaluation, Vol. 53, No. 4, pp. 865–888. https://doi.org/10.1007/s10579-019-09463-7.

22. Masmoudi, A., Bougares, F., Ellouze, M., Estève, Y., and Belguith, L. (2018). Automatic speech recognition system for Tunisian dialect. Language Resources and Evaluation, Vol. 52, No. 1, pp. 249–267. https://doi.org/10.1007/s10579-017-9402-y.

23. Ming, D. C. (2003). Kajian manuskrip Melayu: masalah, kritikan dan cadangan. Kuala Lumpur: Utusan Publications & Distributors Sdn Bhd.

24. Mohamed, M., and Oussalah, M. (2019). A hybrid approach for paraphrase identification based on knowledge-enriched semantic heuristics. Language Resources and Evaluation. https://doi.org/10.1007/s10579-019-09466-4.

25. Mohamed Redzwan, H. F., Bahari, K. A., Sarudin, A. and Osman, Z. (2020). Strategi pengukuran upaya berbahasa menerusi kesantunan berbahasa sebagai indikator profesionalisme guru pelatih berasaskan skala morfofonetik, sosiolinguistik dan sosiopragmatik. Malaysian Journal of Learning & Instruction, Vol. 17, No. 1, pp. 187-228.

26. Muslu, Ü. (2018). The Evolution of Breast Reduction Publications: A Bibliometric Analysis. Aesthetic Plastic Surgery, Vol. 42, No. 3, pp. 679–691. https://doi.org/10.1007/s00266-018-1080-7.

27. Neme, A. A., and Paumier, S. (2019). Restoring Arabic vowels through omission-tolerant dictionary lookup: تشكيل الكلمات عبر موارد حاسوبية. Language Resources and Evaluation. https://doi.org/10.1007/s10579-019-09464-6.

28. Pham, H., Tucker, B. V., and Baayen, R. H. (2019). Constructing two Vietnamese corpora and building a lexical database. Language Resources and Evaluation, Vol. 53, No. 3, pp. 465–498. https://doi.org/10.1007/s10579-019-09451-x.

29. Qeis, M. I. (2015). Aplikasi wordcloud sebagai alat bantu analisis wacana. International Conference on Language, Culture, and Society - ICLCS LIPI, (November 2015). Retrieved from https://www.researchgate.net/publication/316736417_APLIKASI_WORDCLOUD_SEBAGAI_ALAT_BANTU_ANALISIS_WACANA.

30. Rialti, R., Marzi, G., Ciappei, C., & Busso, D. (2019). Big data and dynamic capabilities: a bibliometric analysis and systematic literature review. Management Decision. https://doi.org/10.1108/MD-07-2018-0821.

31. Rubinstein, A. (2019). Historical corpora meet the digital humanities: the Jerusalem Corpus of Emergent Modern Hebrew. Language Resources and Evaluation, Vol. 53, No. 4, pp. 807–835. https://doi.org/10.1007/s10579-019-09458-4.

32. Sarudin, A., Mohamed Redzwan, H. F., Osman, Z., Raja Ma’amor Shah, R. N. F., and Mohd Ariff Albakri, I. S. (2019a). Menangani kekaburan kemahiran prosedur dan terminologi awal Matematik: Pendekatan leksis berdasarkan Teori Prosodi Semantik. Malaysian Journal of Learning and Instruction, Vol. 16, No. 2, pp. 255-294.

33. Sarudin, A., Mohamed Redzwan, H. F., Osman, Z., and Mohd Ariff Al-Bakry, I. S. (2019b). Using the Cognitive Research Trust scale to assess the implementation of the elements of higher-order thinking skills in Malay Language teaching and learning. International Journal of Recent Technology and Engineering (IJRTE), Vol. 8, No. 2S2, pp. 392-398.

34. Schulz, S., and Ketschik, N. (2019). From 0 to 10 million annotated words: part-of-speech tagging for Middle High German. Language Resources and Evaluation, Vol. 53, No. 4, pp. 837–863. https://doi.org/10.1007/s10579-019-09462-8