ChatGPT vs Gemini: Which Digs Deeper into Arabic Semantics?

Authors

  • Nely Rahmawati Zaimah, Islamic Elementary School Teacher Education Program, Sekolah Tinggi Agama Islam Al-Anwar Rembang, Indonesia.
  • Chafidhoh Rizqiyah, Islamic Religious Teaching, Sekolah Tinggi Agama Islam Subang, Indonesia.
  • Syamsul Hadi, Islamic Elementary School Teacher Education Program, Sekolah Tinggi Agama Islam Al-Anwar Rembang, Indonesia.
  • Rifatul Muthiah, Islamic Education Management, Sekolah Tinggi Agama Islam Al-Kamal Rembang, Indonesia.
  • Wakhidati Nurrohmah Putri, Arabic Language Education Department, Universitas Islam Negeri Salatiga, Indonesia.

DOI:

https://doi.org/10.25217/mantiqutayr.v6i1.7146

Keywords:

Arabic AI Translation, ChatGPT and Gemini, METEOR Evaluation, Semantic Interpretation

Abstract

This study examined how well AI models translate classical Arabic grammatical literature, focusing on Alfiyah Ibn Mālik and Naẓm al-Imrīṭī, two foundational texts marked by dense syntactic structures and strong pedagogical significance. ChatGPT and Gemini were evaluated for translation accuracy, terminological precision, and contextual sensitivity. Translation quality was measured through a combined framework of METEOR scoring and human expert judgement. A panel of expert evaluators with more than fifteen years of experience in Arabic instruction assessed each model’s capacity to apply syntactic rules, preserve semantic coherence, and maintain stylistic and didactic integrity. Qualitative evaluation further explored the models’ adaptability to classical Arabic rhetorical patterns and instructional conventions. ChatGPT achieved higher METEOR scores than Gemini, indicating closer lexical alignment and word-level accuracy; however, a Mann–Whitney U test showed that the difference between the two models was not statistically significant, and both models had notable limitations in rendering idiomatic expressions and conveying deeper grammatical and contextual meanings. These results point to the limited explanatory power of automated metrics when applied to highly structured classical texts. The findings underscore the ongoing need for expert validation beyond numerical scoring and support a hybrid translation framework in which AI-generated outputs are systematically refined through scholarly review. Future research should broaden the textual corpus, incorporate additional AI models and evaluation metrics, and further strengthen expert-led validation to enhance the reliability of AI-assisted translation in advanced Arabic grammatical studies.
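The statistical comparison described in the abstract can be illustrated with a minimal sketch of a two-sided Mann–Whitney U test on two sets of per-passage METEOR scores. The scores below are invented placeholders, not the study's data, and the function names (`average_ranks`, `mann_whitney_u`) are hypothetical. The sketch uses the normal approximation without tie or continuity correction, which is crude for small samples; the paper's actual procedure and software are not stated on this page.

```python
import math


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions i..j are 0-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def mann_whitney_u(sample_a, sample_b):
    """Two-sided Mann-Whitney U test (normal approximation, no corrections)."""
    n1, n2 = len(sample_a), len(sample_b)
    ranks = average_ranks(list(sample_a) + list(sample_b))
    rank_sum_a = sum(ranks[:n1])
    u_a = rank_sum_a - n1 * (n1 + 1) / 2
    u_b = n1 * n2 - u_a
    u = min(u_a, u_b)
    mean_u = n1 * n2 / 2
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u - mean_u) / sd_u
    p_two_sided = math.erfc(abs(z) / math.sqrt(2))
    return u, p_two_sided


# Hypothetical METEOR scores per translated passage (placeholders only).
chatgpt_scores = [0.42, 0.38, 0.51, 0.45, 0.40, 0.47]
gemini_scores = [0.39, 0.36, 0.48, 0.41, 0.37, 0.44]

u_stat, p_value = mann_whitney_u(chatgpt_scores, gemini_scores)
print(f"U = {u_stat}, p = {p_value:.3f}")
print("significant at 0.05" if p_value < 0.05 else "not significant at 0.05")
```

With these placeholder values, one model has the higher scores on most passages, yet the test does not reach significance at the 0.05 level, which mirrors the kind of outcome the abstract reports. For real analyses, an established routine such as `scipy.stats.mannwhitneyu`, which offers an exact method for small samples, would likely be preferable to this hand-rolled approximation.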

Published

2026-02-08

How to Cite

Zaimah, N. R., Rizqiyah, C., Hadi, S., Muthiah, R., & Putri, W. N. (2026). ChatGPT vs Gemini: Which Digs Deeper into Arabic Semantics? Mantiqu Tayr: Journal of Arabic Language, 6(1), 330–347. https://doi.org/10.25217/mantiqutayr.v6i1.7146