ChatGPT vs Gemini: Which Digs Deeper into Arabic Semantics?
DOI:
https://doi.org/10.25217/mantiqutayr.v6i1.7146
Keywords:
Arabic AI Translation, ChatGPT and Gemini, METEOR Evaluation, Semantic Interpretation
Abstract
This study examined how AI models perform when translating classical Arabic grammatical literature, focusing on Alfiyah Ibn Mālik and Naẓm al-Imrīṭī, two foundational texts marked by dense syntactic structures and strong pedagogical significance. ChatGPT and Gemini were evaluated for translation accuracy, terminological precision, and contextual sensitivity. Translation quality was measured through a combined framework of METEOR scoring and human expert judgement: a panel of expert evaluators, each with more than fifteen years of experience in Arabic instruction, assessed each model’s capacity to apply syntactic rules, preserve semantic coherence, and maintain stylistic and didactic integrity. Qualitative evaluation further explored the models’ adaptability to classical Arabic rhetorical patterns and instructional conventions. According to METEOR scores, ChatGPT achieved higher lexical alignment and word-level accuracy than Gemini; however, a Mann–Whitney U test found no statistically significant difference between the two models, and both showed notable limitations in rendering idiomatic expressions and conveying deeper grammatical and contextual meanings. These results point to the limited explanatory power of automated metrics when applied to highly structured classical texts. They underscore the ongoing need for expert validation beyond numerical scoring and support a hybrid translation framework in which AI-generated outputs are systematically refined through scholarly review. Future research should broaden the textual corpus, incorporate additional AI models and evaluation metrics, and further strengthen expert-led validation to enhance the reliability of AI-assisted translation in advanced Arabic grammatical studies.
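The Mann–Whitney U comparison mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' analysis: the score lists below are hypothetical placeholders rather than data from the study, the function name `mann_whitney_u` is our own, and the p-value uses the normal approximation without tie or continuity correction, which is crude for small samples.

```python
import math


def mann_whitney_u(a, b):
    """Return (U_a, U_b, two-sided p) for two independent samples.

    Assumption: p is taken from the normal approximation with no tie
    or continuity correction, so it is only a rough guide for small n.
    """
    # U_a counts pairs (x, y) with x > y; a tie contributes one half.
    u_a = sum(1.0 if x > y else 0.5 if x == y else 0.0 for x in a for y in b)
    n1, n2 = len(a), len(b)
    u_b = n1 * n2 - u_a

    # Mean and standard deviation of U under the null hypothesis.
    mean_u = n1 * n2 / 2
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)

    z = (min(u_a, u_b) - mean_u) / sd_u
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided tail probability
    return u_a, u_b, min(p, 1.0)


# Hypothetical per-verse METEOR scores (NOT taken from the study).
chatgpt_scores = [0.42, 0.38, 0.51, 0.47, 0.40]
gemini_scores = [0.39, 0.36, 0.45, 0.44, 0.41]

u_chatgpt, u_gemini, p_value = mann_whitney_u(chatgpt_scores, gemini_scores)
print(f"U(ChatGPT)={u_chatgpt}, U(Gemini)={u_gemini}, p={p_value:.3f}")
```

With these placeholder scores one model has the higher U statistic, yet the approximate p-value stays above 0.05. This matches the pattern the abstract describes, where a higher average score does not by itself amount to a statistically significant difference.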
License
Copyright (c) 2026 Nely Rahmawati Zaimah, Chafidhoh Rizqiyah, Syamsul Hadi, Rifatul Muthiah, Wakhidati Nurrohmah Putri

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.







