Acceptability and Effectiveness Analysis of Large Language Model-Based Artificial Intelligence

This research stems from the broad use of AI based on Large Language Models (LLMs), which many academics find relevant and effective in higher education Arabic language learning. The goal is to confirm these views. This is a mixed-methods study combining qualitative and quantitative methodologies. The qualitative segment involves observations and literature reviews. Observations involved reviewing how participants used chatbots and carefully checking the accuracy and consistency of platform responses. The quantitative facet uses a paired experimental design, analyzed with both classical and Bayesian paired sample t-tests. The research involves 45 individuals with a proficient understanding of Modern Standard Arabic and no hindrances in comprehending the material. These individuals are enrolled as students at the Islamic College (STAI) Al-Anwar Rembang, Indonesia. The results show increased motivation and ease of use with the chatbot in Arabic language learning. However, concerns about the consistency of chatbot content have arisen, affecting participants' confidence in the accuracy of AI responses. This prompts an evaluation of effectiveness through classical and Bayesian tests, which fail to demonstrate statistically significant differences, even in the adaptive Bayesian probability analysis. These outcomes deviate from previous research on relevance and effectiveness and corroborate preceding studies on academic apprehensions and accuracy enhancements. The researchers advocate further investigations, especially concerning the accuracy analysis of AI chatbots in Arabic.

Accepted: 05-12-2023

Introduction
The utilization of Artificial Intelligence (AI) has become a new trend in virtual communication over the past several years and an interesting research topic that is still under development in terms of its positive and negative aspects. 1 The emergence of Large Language Models (LLMs) such as Bing Chat and ChatGPT has opened up new opportunities for creating more adaptive and interactive learning experiences. 2 In addition to numerous studies on its effectiveness and progress in various aspects, there are also many studies that encourage new considerations and caution in the use of AI, for example in the fields of medicine, religion, and linguistics. 3 This research arises as a response to the growing use of AI and the increasing recognition of its utility as an aid or resource in various academic fields. The relevance and effectiveness of AI have drawn the attention of many scholars and experts, who see it as a significant influence on the educational landscape. The main motivation behind this research is to validate these views in the context of Arabic language learning in higher education: to understand AI's position and relevance in Arabic language learning and the extent to which it influences learners. As AI continues to advance, especially in the field of language learning, this research becomes a necessary step.
However, the context of the Arabic language differs from other educational landscapes. With all its intricacies and rich linguistic heritage, it serves as the primary backdrop for this exploration of the relevance of AI chatbots in Arabic language studies. The complexities and variations within the Arabic language make it a challenging yet highly valuable subject to study. Understanding how AI can be utilized to facilitate Arabic language learning is not just an academic pursuit but also a practical need for educators and learners. 4
In essence, this research aims to bridge the gap between the theoretical potential of AI in language education and its practical application in the unique context of Arabic language learning. It seeks to provide insights into AI's position and relevance in Arabic language learning in higher education institutions, shedding light on the extent to which it affects learners. As AI continues to evolve, this research serves as a crucial step in understanding its role in Arabic language education in higher institutions, in line with Ritonga et al. on its implementational systematization. 5 The main goal of this research is to assess the level of acceptability of AI chatbots that use Large Language Models (LLMs) among Indonesian learners of Arabic as a second language (L2). This study also aims to identify the factors that influence the acceptance of LLM-based AI chatbots as a learning tool for non-native speakers of Arabic and to measure their effectiveness. These factors include motivation and enthusiasm, 6 learners' beliefs, 7 and the validity of using AI in education, which serve as the measurement instruments and analysis themes.
Furthermore, this research intends to compare its findings with recent research in the field, which has shown AI's effectiveness, 8 although there are also concerns, 9 especially in the medical, psychological, and educational landscapes. Thus, this research aims to determine whether its findings align with or contradict recent research, contributing to the current discussion about AI chatbots and their role in Arabic language learning. This comparative analysis serves to highlight any novelty in the findings and to establish the research's position within the current academic landscape.
Artificial Intelligence has brought many new opportunities to the way society functions and to the dynamics of education. 10 The extensive use of AI has divided people into two groups: the progressive ones, who are enthusiastic about using and developing AI in their respective fields, and the concerned ones, who call for careful consideration before further entanglement. 11 In academic research, there has been a conflict between these two camps. Many researchers have even advocated institutional or ethical policies and further scrutinized AI's negative aspects. The Arabic language, which is directly linked to social, cultural, and religious studies, is highly sensitive to this issue, and AI's application in the learning of Arabic language and culture has also drawn attention. 12 On the positive side, the research conducted by Abdulkader and Al-Irhayimin 13 emphasizes that Arabic, with its unique characteristics and variations, including Classical Arabic (CAL), Modern Standard Arabic (MSA), and Arabic dialects (DA), each used differently in practice and in different contexts, can be integrated into various chatbots: intelligent technologies that use Artificial Intelligence (AI) to communicate with humans (in this case, language learners) in their natural language. The primary function of chatbots is to understand user requests and provide the most appropriate responses using Natural Language Processing (NLP) techniques. 14 These findings reflect the importance of expanding the scope of the technology and measuring its effectiveness for pedagogical and language education purposes.
Additionally, Fuad and Yahya's paper 15 on introducing AI into conversational Arabic language learning illustrates the significant potential of AI-based chatbots for language learning but also acknowledges the need to understand how learners accept this technology for learning Arabic. In line with that, Rumaisha's research titled "Application of Natural Language Processing (NLP) in Education" and the research by Chiu et al. on "Teacher Support and Student Motivation to Learn with an Artificial Intelligence (AI) Based Chatbot" 16 highlight the effectiveness and relevance of AI chatbots in education. Furthermore, Shao et al. 17 emphasize that teaching Arabic as an L2 in an online or technology-based environment has a significant impact on Arabic speaking skills and has become a new phenomenon. However, the research mentioned above does not examine experts' acceptance with regard to language norms and contextual precision, supported by empirical studies. This aspect of acceptance will undoubtedly correlate with the study's outcomes.
Therefore, an analysis of the acceptance of LLM-based chatbots in Arabic language learning will provide valuable insights into how this technology can be effectively integrated into Arabic language learning. Given its various advantages, there are no longer any limitations on developing AI-based learning, even in challenging situations like the COVID-19 pandemic, as well as for Arabic in open and distributed learning contexts. 18 On the other hand, De Angelis et al., in their research titled "ChatGPT and the Rise of Large Language Models: the New AI-Driven Infodemic Threat in Public Health," and Perkins 19 in "Academic integrity considerations of AI Large Language Models in the post-pandemic era: ChatGPT and beyond" emphasize the importance of rapid policy development to address potential threats and ethical issues. One of the main challenges is the difficulty of accurately detecting text generated by artificial intelligence. LLMs can quickly generate large amounts of text, which can be used to spread misinformation on an unprecedented scale.
Previously, in the medical field, both Hatherley 20 and Gilbert et al. 21 indicated that generative AI has been questioned for its accuracy and reliability. However, LLM-based technology is primarily a model for language communication, not one predominantly designed to consider scientific precision and the empirical spirit of science. So how reliable can its use be in constructing educational frameworks? Further research is needed on various measurement variables, including acceptance and effectiveness, as in this study. There is a need for in-depth analysis of its usage, especially in determining its relevance and linguistic validity. Further studies are therefore needed to develop tools and formulate policies to address these gaps.
The authors attempt to explore the acceptability factors and analyze them in the context of Arabic language learning, where there are strict constructions regarding proper Arabic language techniques and practices, as reviewed by several previous researchers, whether curriculum-based or using various approaches. Several acceptability parameters are analyzed: Usability, Effectiveness, and Validity. The Usability parameter measures how easy users find it to use AI chatbots. This includes measuring the ease of interaction with leading AI chatbots, especially those with interfaces and features in Indonesian and Arabic for communication. Further analysis will assess whether the chatbot interface is intuitive and how quickly learners can master its use. The Effectiveness parameter measures whether the use of LLM-based AI chatbots actually improves learners' Arabic language proficiency. The authors measure learners' progress before and after using chatbots to see how effective they are in supporting learning. The Validity parameter 22 can be used to analyze the linguistic standardization of these chatbots, such as their recognition of and adaptation to learners' language style, and whether there is a risk of diglossic spelling and grammar violations. 23 By formulating these parameters, the researchers have a strong framework for analyzing the acceptability of LLM-based AI chatbots in Arabic language learning.
Lastly, the research findings will provide valuable input into the scientific knowledge related to the capabilities of AI, its usage, accuracy, level of acceptance, and effectiveness in the field of education, especially its applicability in Arabic language learning.Of course, time limitations and research subjectivity are also potential gaps.However, the authors hope for new research in the scope of learning with this technology to further expand the horizons of knowledge in this field.

Method
The research employs a two-fold, mixed-methods design combining qualitative and quantitative approaches to gauge the acceptability of LLM-based AI chatbots in the context of learning Arabic as an L2. The qualitative approach involves assessing the participants' initial performance by observing their average grades before they began the Arabic language course. This data will be used to identify patterns and trends in how participants interact with AI chatbots. Additionally, a brief questionnaire will be administered to explore the motivations driving individuals to learn Arabic with the assistance of AI chatbots.
On the other hand, the quantitative approach adopts an experimental design, using a paired sample t-test to measure initial effectiveness after the distribution (normality) of the data has been tested. Quantitative data will be collected through pre-tests and post-tests containing relevant Arabic language tasks. 24 The participants in the experiment consist of 45 students, both male and female, from STAI Al-Anwar Sarang Rembang. These students are enrolled in Arabic language classes (IQT) and have a sufficient level of proficiency in Modern Standard Arabic (MSA); most have a traditional pesantren background and have studied Standard Arabic for a long time.
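The classical paired sample t-test described above can be sketched in a few lines of Python. This is a minimal sketch, not the authors' JASP workflow: it uses only the standard library, computes t = mean(d) / (sd(d) / √n) on the post-minus-pre differences d with n - 1 degrees of freedom, and obtains the two-sided p-value by numerically integrating the Student t density with Simpson's rule instead of calling a statistics package. The function names (`paired_t_test`, `two_sided_p`, `t_pdf`) and the integration step count are illustrative assumptions.

```python
import math
from statistics import mean, stdev


def t_pdf(x: float, df: int) -> float:
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c) * (1 + x * x / df) ** (-(df + 1) / 2)


def two_sided_p(t: float, df: int, steps: int = 20_000) -> float:
    """Two-sided p-value: 1 minus twice the t-density area between 0 and |t|."""
    b = abs(t)
    if b == 0:
        return 1.0
    h = b / steps  # composite Simpson's rule; steps must be even
    area = t_pdf(0.0, df) + t_pdf(b, df)
    for i in range(1, steps):
        area += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area *= h / 3
    return max(0.0, 1.0 - 2.0 * area)


def paired_t_test(pre, post):
    """Classical paired-sample t-test on the post-minus-pre differences."""
    if len(pre) != len(post) or len(pre) < 2:
        raise ValueError("need two equal-length samples with at least 2 pairs")
    diffs = [b - a for a, b in zip(pre, post)]
    n = len(diffs)
    se = stdev(diffs) / math.sqrt(n)  # standard error of the mean difference
    if se == 0:
        raise ValueError("all differences are identical; t is undefined")
    t = mean(diffs) / se
    df = n - 1
    return t, df, two_sided_p(t, df)
```

For example, `paired_t_test([1, 2, 3, 4, 5], [2, 3, 4, 5, 7])` gives t ≈ 6.0 with 4 degrees of freedom. The study's own 45 pre-test and post-test scores are not reproduced in the text, so any inputs used with this sketch are hypothetical.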
The research design centers on observing acceptance levels in Arabic language learning through AI chatbots. This is followed by an experimental phase using paired sample t-tests to evaluate initial effectiveness after the distribution test. The resulting data are analyzed with both the classical paired sample t-test and Bayesian inference models. This analysis compares pre-test and post-test scores on Arabic language tasks, together with their respective probability values, which makes it possible to measure the AI chatbots' effectiveness in improving participants' Arabic language skills. JASP 0.18.0.0 is used for this analysis because it reports Bayes factors and their associated posterior distributions. 25
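For the Bayesian counterpart, the default Bayesian paired t-test in JASP reports a Bayes factor based on the JZS prior described by Rouder and colleagues: a Cauchy prior on the standardized effect size with scale r = 0.707. The sketch below computes BF10 for a given t statistic and number of pairs n by integrating numerically over the prior, assuming that default prior. It is an illustrative reimplementation under that assumption, not JASP's code; the name `jzs_bf10` and its integration settings are hypothetical. BF10 > 1 favors a difference between the pre-test and post-test means, while BF10 < 1 favors no difference.

```python
import math


def jzs_bf10(t: float, n: int, r: float = 0.707,
             lo: float = -20.0, hi: float = 30.0, steps: int = 4000) -> float:
    """Two-sided JZS Bayes factor BF10 for a paired (one-sample) t statistic.

    n is the number of pairs; r is the Cauchy prior scale on the effect size.
    """
    if n < 2:
        raise ValueError("need at least 2 pairs")
    nu = n - 1

    def integrand(u: float) -> float:
        g = math.exp(u)  # substitute g = e^u, so dg = g du
        # Inverse-gamma(1/2, r^2/2) density on g, equivalent to a Cauchy(0, r) prior on delta
        prior = r / math.sqrt(2 * math.pi) * g ** -1.5 * math.exp(-r * r / (2 * g))
        # Likelihood of t under H1 given g (up to a constant shared with H0)
        like = (1 + n * g) ** -0.5 * (1 + t * t / ((1 + n * g) * nu)) ** (-(nu + 1) / 2)
        return like * prior * g

    # Composite Simpson's rule over u in [lo, hi]; steps must be even
    h = (hi - lo) / steps
    total = integrand(lo) + integrand(hi)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * integrand(lo + i * h)
    marginal_h1 = total * h / 3
    # Likelihood of t under H0 (delta = 0), with the same constant dropped
    marginal_h0 = (1 + t * t / nu) ** (-(nu + 1) / 2)
    return marginal_h1 / marginal_h0
```

Given the paired t statistic for the 45 pre/post pairs, `jzs_bf10(t, 45)` should approximate the default BF10 under the stated assumptions; values between 1/3 and 3 are conventionally read as only anecdotal evidence either way. Integrating over log g rather than g keeps the sharp peak near zero and the long right tail within a finite, evenly spaced grid.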

Moving Beyond the Acceptance Observation
The process of collecting qualitative data began with gathering academic records from previous years from the official website of STAI Al-Anwar Sarang Rembang. Among the 45 students enrolled in the Department of Quranic Studies (IQT), there was significant variation in academic performance. The researcher also conducted an initial assessment of their competence in various aspects of Arabic, including conversation, grammar, vocabulary, and presentation skills. Of the 45 students, 27 scored above average and exceeded the threshold set beforehand by the researcher (5.50). Their average score was 6.81. Twelve students scored in the range close to the threshold (between 5.51 and 6.80), 1 student scored exactly at the threshold, and 5 students scored below it. On the basis of this assessment, the students were qualified for participation in the research.
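The score bands used to screen participants can be expressed as a small classification function. This is a sketch under an assumption: the text does not state the lower bound of the "above average" band explicitly, so it is taken here as any score above 6.80 (the top of the near-threshold band). The function name `score_band` is hypothetical, and the score list is illustrative, not participant data.

```python
from collections import Counter

THRESHOLD = 5.50   # qualification threshold stated in the text
NEAR_UPPER = 6.80  # upper end of the "close to the threshold" band (5.51 to 6.80)


def score_band(score: float) -> str:
    """Assign a two-decimal score to one of the four bands described in the text."""
    if score > NEAR_UPPER:
        return "above"  # assumption: "above average" means above 6.80
    if score > THRESHOLD:
        return "near"   # 5.51 to 6.80
    if score == THRESHOLD:
        return "at"
    return "below"


# Hypothetical scores for illustration only
scores = [7.25, 6.10, 5.50, 4.90, 6.80, 6.81]
print(Counter(score_band(s) for s in scores))
```

Under this reading, the reported counts (27 above, 12 near, 1 at, 5 below) partition all 45 students, since 27 + 12 + 1 + 5 = 45.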
Five questions were posed to all participants during the observation phase and the questionnaire, and the responses were enthusiastic. Only minor notes were added as additional information, emphasizing the guidance provided by the researcher and instructors on getting the most out of Bing Chat and ChatGPT. The observations and questionnaires provided valuable insights into the participants' initial conditions and their readiness to interact with AI chatbots in Arabic language learning, while the academic records and competence assessments played a crucial role in identifying variations in the students' language abilities. Participants showed a desire and eagerness to interact with AI chatbots in their language learning journey. In this initial stage, the researcher measured the participants' initial language abilities, paving the way for the next phase of the research: a quantitative assessment of the effectiveness of AI chatbots in improving their Arabic language skills. This information is crucial for designing and implementing effective language learning experiences with AI chatbots as learning tools.
The researcher conducted a brief training session on the use of AI chatbots in Arabic language learning, aiming to provide students with a strong understanding of how to integrate AI chatbot technology into the Arabic language learning process, including its intricacies.This training will include various steps that will enable participants to effectively utilize AI chatbots in enhancing their Arabic language skills.The duration of this brief training session is one hour.The duration may expand during the utilization and assessment sessions, allowing participants to gain a better understanding of the concept of AI chatbots in Arabic language learning.This training incorporates various learning approaches and categories, including presentations, direct demonstrations, practice sessions, and discussions in conversation, writing, and interpretation panels.Each participant will have access to their own personal AI chatbot account for direct practice.The brief training session is expected to equip participants with the necessary skills and understanding to maximize the use of AI chatbots in their Arabic language learning.The integration of AI chatbots into Arabic language learning sessions will be conducted over two subsequent meetings, including practical exercises.Afterward, the researcher will administer utilization tasks or exams consisting of three assessment categories: conversational fluency, writing skills, and interpretation.
In the integration phase of Arabic language learning, the first step begins with a session of simple conversation with the chatbots in Modern Standard Arabic (MSA) and Classical Arabic, freely discussing anything related to the Mubtada' (the subject, or the noun that begins a nominal sentence) and Khobar (the predicate, or the part of the sentence that provides information about the subject) material and scrutinizing every response provided by ChatGPT and Bing Chat. All of this is conducted in Arabic. The 45 students are organized into 5 groups of at least 8 participants each, with 5 question models focusing on the same material, which are further developed based on each chatbot's responses. The chatbot's responses vary. The first group's theme is the definition of Mubtada' and Khobar; each participant follows the chatbot's responses, and the researcher allows them to discuss these with their group members. Each participant receives a different answer. The second group focuses on the types of Mubtada' and Khobar, with examples. All participants receive nearly the same answers, albeit in different phrasing, and then discuss them collectively. The third group creates questions about examples of Mubtada'-Khobar constructions built from ism dhomir (pronouns) and ism dhohir (explicit nouns), and all participants receive different answers. The fourth group asks the chatbot about Mubtada' and Khobar, focusing on Mufrod and Ghoyru Mufrod. As their responses develop, all participants receive identical answers with varying phrasing. The last group, with the highest average score, focuses on fact-based questions derived from Mubtada' and Khobar, regardless of type, using ghoryb examples: unique instances that are rare but sometimes encountered in classical literature. Almost all participants receive different answers. Finally, the researcher poses a key question on interactive comprehension, material absorption, and acceptability (in the form of criticism and feedback), which is then assessed numerically.
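The comparison of answers within each group (identical, rephrased, or different) can be approximated with a simple surface-similarity score. The sketch below is not the procedure used in the study, which relied on the judgment of participants and researchers; it assumes that the responses to the same prompt have been collected as strings and uses Python's `difflib.SequenceMatcher` to compute the mean pairwise similarity ratio (1.0 for identical texts, lower for divergent ones). The function name `pairwise_consistency` is hypothetical.

```python
from difflib import SequenceMatcher
from itertools import combinations
from statistics import mean


def pairwise_consistency(responses: list[str]) -> float:
    """Mean pairwise character-level similarity (0.0 to 1.0) of responses to one prompt."""
    if len(responses) < 2:
        raise ValueError("need at least two responses to compare")
    return mean(
        SequenceMatcher(None, a, b).ratio()
        for a, b in combinations(responses, 2)
    )


# Hypothetical chatbot answers to the same question about Mubtada'
same = ["المبتدأ اسم مرفوع", "المبتدأ اسم مرفوع"]
print(pairwise_consistency(same))  # 1.0
```

Because the ratio measures surface overlap, answers that say the same thing in different words (as reported for the second and fourth groups) can still score well below 1.0, so judging whether two answers are equivalent still requires a human reader.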
In the subsequent integrative meetings, a more complex scheme is introduced, with a final assessment in the same categories. The same groups are assigned different themes. The first group discusses An-na'tal-haqîqy (the true adjective, which describes the noun it follows) and An-na't As-sababy (the causal adjective, which describes something connected to that noun) and follows the chatbot's responses. The second group focuses on At-tawkid (emphasis). The third group deals with Al-Hal (the circumstantial accusative) and the dynamics of the chatbot's responses. The fourth group explores At-tamyiz (specification) with its models and rules. The fifth group examines the distinguishing features of Adawātul-Istifhām (interrogative particles) and anything in the chatbot that enhances their understanding. The researcher evaluates them in the same categories as before.
The third session involves translating an Arabic paraphrase with the chatbot and then reading it aloud. This phase assesses participants' understanding and critical thinking. Differences in translation between the chatbot and other translation platforms, both literal and contextual, are examined. At this stage, participants encounter difficulties due to variations in the chatbot's responses. Group 1, which tends to produce more literal translations, may have difficulty understanding and recognizing figurative language (majāz) in Arabic texts; the discussion of majāz rules gives them a better understanding of how majāz can be used in Arabic to convey deeper meanings or for rhetorical effect. Group 2, which tends to produce more contextual translations, is more open to discussing majāz. They have started to recognize the presence of majāz in Arabic texts and try to understand it, and this discussion helps them see how majāz can express complex and profound meanings in Arabic. Groups 3 and 4, with varied translation approaches, also benefit from the discussion of majāz rules; they begin to understand that majāz can make Arabic texts more beautiful and expressive. Group 5, which also strives to reflect nuances in its translations, finds that the discussion of majāz rules helps them appreciate the beauty of the Arabic language and its ability to convey meaning creatively, and they begin to recognize various forms of majāz and how they can express deep and complex meanings. Overall, the discussion of majāz rules gives all groups deeper insight into how Arabic is used rhetorically and artistically in classical literature, helping them understand nuances in Arabic texts and recognize the use of majāz. With a better understanding of majāz rules, participants can become more skilled readers and translators of Arabic. The researcher also awards extra points because participants demonstrate linguistic understanding beyond the assigned material, in a broader context.
Based on the observations and monitoring of using a chatbot in Arabic language learning, we have uncovered several crucial findings regarding usability, user comfort, and the validity of utilizing this chatbot in the context of Arabic language learning.In this discussion, we will delve deeper into these discoveries.
First and foremost, concerning the chatbot's usability in Arabic language learning, our findings from observations and monitoring indicate that the chatbot can serve as a relatively user-friendly tool for learners. This is attributed to the chatbot's intuitive interface and its well-designed, user-friendly features. Learners, especially those who are tech-savvy, tend to feel at ease while interacting with the chatbot to enhance their Arabic language skills. However, a few critical considerations should be noted. First, the chatbot's usability may vary depending on learners' initial proficiency levels. Learners with a basic understanding of Arabic may find it easier to adapt to the chatbot than complete beginners. Therefore, the instructional design should be adjusted to cater to learners with varying proficiency levels. Additionally, continuous improvement and development of the chatbot are vital to make it more responsive to users' requirements. In language learning, the chatbot should be able to detect learners' errors, offer constructive feedback, and customize learning materials based on individual progress. These measures would enhance usability and effectiveness.
Regarding comfortability in using the chatbot as a tool for Arabic language learning, most participants feel comfortable interacting with the chatbot during the learning process.They perceive the chatbot as a helpful learning partner and not intimidating.However, there are some aspects of comfort that need attention.Some participants may experience slight anxiety or discomfort when interacting with technology, especially if they are not tech-savvy users.Therefore, additional support in the form of user guides or resources that can help participants feel more comfortable is needed.Additionally, the chatbot should create an inclusive and friendly learning environment.It should not make participants feel pressured or afraid to make mistakes.Instead, the chatbot should encourage language experimentation and provide constructive feedback.
Lastly, we consider the validity of using the chatbot in Arabic language learning. Validity is a measure of how well the chatbot assesses what it should assess, namely the participants' Arabic language proficiency. Findings indicate that the chatbot has the potential to assess participants' Arabic language proficiency effectively. However, its validity can be influenced by several factors. First, it is important to ensure that the chatbot assesses various aspects of Arabic, including comprehension, speaking, reading, and writing. Additionally, the chatbot should have evaluation items that measure language proficiency comprehensively across these aspects. Validity can also be enhanced by ensuring that the chatbot provides accurate and informative feedback to participants; this feedback should help participants understand their errors and guide their improvement. Interestingly, different responses emerged for each task presented to the participants. These varying responses also corresponded to the columns for question corrections (1, 2, etc.). This raised doubts about the chatbot's accuracy, given the participants' own mastery of the subject matter, and somewhat diminished their motivation. Some consider it just for fun, while others are afraid to develop it further.

Effectiveness Hindered by Concerns
The results of the assessment for each participant within the groups are summarized in Table 1.
Table 1 compares the mean scores before and after the intervention for the 45 participants. Before the intervention, the mean score was 6.457 with a standard deviation of 1.164; after the intervention, the mean score increased to 6.562 with a standard deviation of 1.000. This indicates that the intervention may have had a positive impact, as participants' mean scores increased. The lower standard deviation after the intervention suggests that the scores are more concentrated around the mean, which can be interpreted as greater consistency in the results. There is also a change in the standard error (SE), which indicates the precision of the estimated mean. The lower SE after the intervention (0.149) suggests that the mean estimated from the sample is closer to the true value. The coefficient of variation (CoV) is the ratio of the standard deviation to the mean. The lower CoV after the intervention (0.152, compared with 0.180 before) indicates that the relative variation in the data decreased, meaning that participants may have become more consistent in their results. Overall, this table suggests that the intervention tends to improve participants' outcomes, with less variation in results and better precision in estimating the mean.
Nely Rahmawati Zaimah, Eko Budi Hartanto, Fatchiatu Zahro: Acceptability and Effectiveness Analysis. Mantiqu Tayr: Journal of Arabic Language, Vol. 4, No. 1, January 2024. E-ISSN: 2774-6372
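The standard error and coefficient of variation reported for Table 1 follow directly from the means and standard deviations: SE = SD / √n and CoV = SD / mean. The short check below recomputes them from the summary values given in the text (n = 45). The helper names are illustrative, and the pre-intervention SE is not stated in the text; the value printed for it here is derived, not reported.

```python
import math


def standard_error(sd: float, n: int) -> float:
    """Standard error of the mean: SD / sqrt(n)."""
    return sd / math.sqrt(n)


def coefficient_of_variation(sd: float, mu: float) -> float:
    """Coefficient of variation: SD relative to the mean."""
    return sd / mu


n = 45
pre_mean, pre_sd = 6.457, 1.164    # before the intervention (from the text)
post_mean, post_sd = 6.562, 1.000  # after the intervention (from the text)

print(round(standard_error(post_sd, n), 3))                    # 0.149
print(round(standard_error(pre_sd, n), 3))                     # 0.174 (derived, not reported)
print(round(coefficient_of_variation(pre_sd, pre_mean), 3))    # 0.18
print(round(coefficient_of_variation(post_sd, post_mean), 3))  # 0.152
```

The recomputed values match the reported SE (0.149) and CoV (0.180 and 0.152), which confirms that the descriptive figures are internally consistent; it says nothing about whether the difference between the means is statistically significant.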

Figure 1. Descriptives of Mean-Measuring Plots
Based on Figure 1, the plot graphically shows an improvement, but both the table and the descriptive chart present descriptive data only and are not statistical evidence. Figure 1 illustrates the gap between the mean pre-test value and the mean post-test value and closely mirrors Table 1. Meanwhile, the assumption test (Shapiro-Wilk) conducted before the t-test needs to meet the minimum requirement of assumptions (p > 0.05), as follows:
Acceptability and Effectiveness Analysis... E-ISSN: 2774-6372

Table 2. Shapiro-Wilk Normality Test
Variable: Average Before
Note. Significant results suggest a deviation from normality.
Based on Table 2, the Shapiro-Wilk normality test shows no significant deviation from normality (p > 0.05), so the data fulfill the requirements for a parametric test. A p-value greater than 0.05 indicates that the distribution of the data does not significantly deviate from a normal distribution. This is important because it justifies the use of statistical techniques that assume a normal distribution and supports the reliability and validity of the subsequent statistical analysis.
Then, the result of the classical paired sample t-test is as follows. Based on Table 3, the statistical results indicate that there is no significant difference between the "Average Before" and "Average Now" measurements. The t-statistic, which measures the difference between the two averages relative to its standard error, is -1.161 with 44 degrees of freedom. The p-value of 0.252 exceeds the typical significance threshold (e.g., 0.05), so the observed difference is not statistically significant. The mean difference between the two measurements (Average Before minus Average Now) is -0.105, indicating a slight increase in the "Average Now" measurement compared to the "Average Before", in line with the means in Table 1 (6.457 before, 6.562 after). The Cohen's d value of -0.173 shows the magnitude of this difference in standard deviation units, with the negative value indicating that the "Now" measurement is higher than the "Before" measurement; its absolute size is below the conventional 0.2 threshold for a small effect.
As described in Table 3 and visually confirmed by the bar plot in Figure 2, the findings are consistent. The bar plot provides a visual representation of the data, with the horizontal axis illustrating the measurements described in the table, and the 95% confidence interval shown in the plot further supports these findings. This visual aid complements the tabular data, making it easier to understand the distribution and relationships within the data.
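For a paired design, the t-statistic and Cohen's d are both computed from the per-participant differences. The sketch below assumes the sign convention implied by the text (difference = Before minus Now) and the usual paired-samples definition d = mean difference / SD of differences, which implies d = t / sqrt(n). The five-participant scores are hypothetical, since the study's raw data are not given, and `paired_t` is a hypothetical helper; the last lines only check that the reported t and n reproduce the reported d.

```python
import math
import statistics

def paired_t(before: list[float], now: list[float]) -> tuple[float, int, float]:
    """Return (t, df, Cohen's d) for paired scores, with differences taken as before - now."""
    if len(before) != len(now) or len(before) < 2:
        raise ValueError("need two equally long lists with at least two pairs")
    diffs = [b - a for b, a in zip(before, now)]
    n = len(diffs)
    mean_d = statistics.mean(diffs)
    sd_d = statistics.stdev(diffs)          # sample SD (n - 1 denominator)
    t = mean_d / (sd_d / math.sqrt(n))      # mean difference over its standard error
    d = mean_d / sd_d                       # standardized mean difference
    return t, n - 1, d

# Hypothetical scores for five participants (not the study's data).
before = [6.0, 7.0, 5.5, 8.0, 6.5]
now = [6.5, 7.0, 6.0, 8.0, 7.5]
t, df, d = paired_t(before, now)
print(f"t = {t:.3f}, df = {df}, d = {d:.3f}")

# Consistency check with the reported values: d = t / sqrt(n).
reported_t, n = -1.161, 45
print(f"implied d = {reported_t / math.sqrt(n):.3f}")  # the text reports d = -0.173
```

With this convention, a negative t and d mean that the later ("Now") scores are higher, which is why the reported negative values accompany an increase in the mean from 6.457 to 6.562.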
The result of the Bayesian Paired T-Test is as follows. Based on Table 4, the Bayes Factor (BF₁₀) between Measure 1 (Mean Before) and Measure 2 (Mean) is 0.303, with an error rate of approximately 0.045%. Because BF₁₀ is below 1, the data favor the null hypothesis rather than the alternative: its reciprocal, BF₀₁ = 1/0.303 ≈ 3.3, indicates a moderate level of evidence that there is no difference between the two measures. The researchers regard the Bayesian method as more adaptive than the classical statistical method in measuring differences; here it points in the same direction as the classical test.
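Because BF₁₀ is the ratio of the evidence for H₁ to the evidence for H₀, a value below 1 favours H₀ and is usually read through its reciprocal BF₀₁ = 1/BF₁₀. The sketch below classifies a Bayes factor with the commonly used Jeffreys-style cut-offs (1-3 anecdotal, 3-10 moderate, 10-30 strong, 30-100 very strong, above 100 extreme), the labels shown by tools such as JASP; treating these exact cut-offs as the ones behind this study's plots is an assumption, and `evidence_label` is a hypothetical helper.

```python
def evidence_label(bf10: float) -> tuple[str, str, float]:
    """Return (favoured hypothesis, evidence category, Bayes factor in its favour).

    The Jeffreys-style thresholds are applied to whichever of BF10 or
    BF01 = 1 / BF10 is at least 1.
    """
    if bf10 <= 0:
        raise ValueError("a Bayes factor must be positive")
    favoured, bf = ("H1", bf10) if bf10 >= 1 else ("H0", 1.0 / bf10)
    for upper, label in ((3, "anecdotal"), (10, "moderate"),
                         (30, "strong"), (100, "very strong")):
        if bf < upper:
            return favoured, label, bf
    return favoured, "extreme", bf

favoured, label, bf01 = evidence_label(0.303)   # BF10 reported in Table 4
print(f"{label} evidence for {favoured}: BF01 = {bf01:.2f}")
```

Working with the larger of BF₁₀ and BF₀₁ keeps the scale symmetric, so "moderate evidence for H₀" means the data are between 3 and 10 times more likely under H₀ than under H₁.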
Furthermore, the results of the inferential calculations and the posterior distribution for the probability analysis are visualized in Figure 3 as follows:

Figure 3. Inferential Plots of Mean-Measuring Test
Based on Figure 3, the plot provides a visual representation of the results obtained from the Bayesian Paired Sample T-Test, as detailed in the corresponding table. The graph indicates that the Bayes Factor (BF₁₀) between Measure 1 (Pre-Test) and Measure 2 (Post-Test) is 0.303. This value is a measure of the strength of evidence in favor of the alternative hypothesis, with values closer to 0 indicating stronger evidence for the null hypothesis. The error rate associated with this calculation is 0.045%. The interval (-0.447, 0.122) is the 95% credible interval for the effect size, that is, the range within which the true standardized difference is likely to fall with 95% probability; because it contains zero, no difference remains a plausible value. This visual aid complements the tabular data, making it easier to understand the statistical relationships within the data.
In Bayesian factor analysis, there is also the Bayes Factor Robustness Check, which tests the stability of the findings and the sensitivity of the results to the choice of prior width. Based on the data, it can be visualized as follows:
Based on Figure 4, the plot presents the Bayes Factor for the alternative hypothesis across different prior widths. For all prior widths, the Bayes Factor stays below 1; it approaches 1 only as the prior width approaches zero, and for the user-defined, wide, and ultrawide priors it remains in the moderate range of evidence for the null hypothesis. This suggests that the evidence for the alternative hypothesis is weak regardless of the prior width used. The maximum Bayes Factor (BF₁₀) observed is 0.9994 at a prior width of r = 5e-04. For the user-defined prior, the Bayes Factor is 0.3032, indicating moderate evidence for the null hypothesis. For the wide prior, the Bayes Factor is lower at 0.2235, and for the ultrawide prior, it is at its lowest at 0.1618. These values suggest that the strength of evidence for the alternative hypothesis decreases as the prior width increases.
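The robustness check recomputes BF₁₀ while varying the width r of the Cauchy prior on the standardized effect size. Below is a stdlib-only Python sketch of the default Jeffreys-Zellner-Siow (JZS) Bayes factor for a paired or one-sample t statistic, following the integral given by Rouder et al. (2009). It assumes that the "user", "wide", and "ultrawide" priors correspond to r = 0.707, 1, and sqrt(2) (the usual JASP defaults) and that the reported t = -1.161 with n = 45 is the input; under those assumptions it should approximately reproduce the reported 0.3032, 0.2235, and 0.1618 and approach 1 as r shrinks. The function name `jzs_bf10` is hypothetical.

```python
import math

def jzs_bf10(t: float, n: int, r: float = math.sqrt(2) / 2,
             lo: float = -25.0, hi: float = 25.0, steps: int = 20000) -> float:
    """JZS Bayes factor BF10 for a one-sample or paired t statistic.

    The Cauchy(0, r) prior on the effect size is written as a normal prior with
    variance g, mixed over an inverse-gamma(1/2, r**2 / 2) density on g; the
    integral over g is taken on a log(g) grid with the trapezoid rule.
    """
    nu = n - 1
    # Log-likelihood of t under H0 (effect size 0), up to a shared constant.
    log_null = -(nu + 1) / 2 * math.log1p(t * t / nu)
    h = (hi - lo) / steps
    total = 0.0
    for i in range(steps + 1):
        x = lo + i * h                      # x = log(g)
        g = math.exp(x)
        a = 1.0 + n * g
        # Log-likelihood of t under H1 for this g (same shared constant).
        log_alt = -0.5 * math.log(a) - (nu + 1) / 2 * math.log1p(t * t / (a * nu))
        # Log inverse-gamma(1/2, r^2/2) density of g.
        log_prior = (math.log(r) - 0.5 * math.log(2 * math.pi)
                     - 1.5 * x - r * r / (2 * g))
        # "+ x" is the Jacobian of the change of variables dg = g dx.
        value = math.exp(log_alt - log_null + log_prior + x)
        total += value if 0 < i < steps else 0.5 * value
    return total * h

t_reported, n = -1.161, 45
for label, r in (("user (r = 0.707)", math.sqrt(2) / 2),
                 ("wide (r = 1)", 1.0),
                 ("ultrawide (r = sqrt 2)", math.sqrt(2)),
                 ("near zero (r = 5e-4)", 5e-4)):
    print(f"{label}: BF10 = {jzs_bf10(t_reported, n, r):.4f}")
```

Widening the prior spreads its mass over large effects that the data do not support, so BF₁₀ falls roughly in proportion to 1/r; shrinking r toward zero makes H₁ indistinguishable from H₀, so BF₁₀ tends to 1.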
As a significance control and a further reference for decision-making, Bayesian sequential analysis is used to test hypotheses by analyzing the data sequentially. Here is the sequential analysis presented by the researchers:
Based on Figure 5, the plot visualizes the two hypotheses typically tested in Bayesian sequential analysis: the null hypothesis (H0) and the alternative hypothesis (H1). The Bayes Factor is recomputed as each participant's data are added, so the plot shows how the evidence develops as data accumulate. At the end of the sequence, the plot reports "Evidence for H0: Moderate", meaning that the complete data set provides moderate evidence for the null hypothesis of no difference over the alternative hypothesis of a difference; "Moderate" is a label for the strength of the evidence, not an outcome category. The process of Bayesian sequential analysis involves continuously updating our beliefs about these hypotheses as more data are collected, allowing more accurate conclusions over time.
As a result of the effectiveness test using a quantitative approach, the t-statistic, degrees of freedom (df), and p-value (p) reported in Table 3 are used to assess the effectiveness of the method or treatment. In this case, the t-value is -1.161, df is 44, and p is 0.252. The t-value expresses the mean difference between the two conditions in units of its standard error; a value of -1.161 is small, indicating that the difference between the conditions being compared is not statistically significant. Degrees of freedom reflect the amount of information available for estimating variability; for a paired t-test, df equals the number of pairs minus one, so df = 44 corresponds to the 45 participants. More degrees of freedom generally increase the precision of the analysis and help detect smaller differences, and 44 is a reasonably large value, indicating that the analysis is based on a substantial amount of data. The p-value is a measure of statistical significance: it is the probability of observing a difference at least this large if there were no true difference. In this case, p = 0.252, which exceeds 0.05, so the difference between the conditions is not statistically significant.
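The reported p-value can be checked from t and df alone. The following stdlib-only Python sketch assumes a two-sided test (the usual default for a paired t-test) and computes the tail area of Student's t distribution by numerical integration with Simpson's rule rather than with a statistics library; the function names `t_pdf` and `two_sided_p` are hypothetical.

```python
import math

def t_pdf(x: float, df: int) -> float:
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

def two_sided_p(t: float, df: int, steps: int = 2000) -> float:
    """Two-sided p-value: 1 - 2 * (area under the t density from 0 to |t|)."""
    a, b = 0.0, abs(t)
    if b == 0.0:
        return 1.0
    if steps % 2:
        steps += 1                      # Simpson's rule needs an even number of intervals
    h = (b - a) / steps
    total = t_pdf(a, df) + t_pdf(b, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(a + i * h, df)
    central = total * h / 3             # P(0 <= T <= |t|)
    return 1.0 - 2.0 * central

p = two_sided_p(-1.161, 44)             # values reported in Table 3
print(f"p = {p:.3f}")
```

Under the two-sided assumption the sketch should return a value close to the reported 0.252; the same function shows that, with df = 44, |t| would need to exceed roughly 2.02 for p to fall below 0.05.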
In the context of effectiveness analysis, these results can be interpreted as follows: the method or treatment being tested is not statistically proven to be effective. The analysis does not support a significant difference between participants' scores before and after using the chatbot. However, it is important to remember that statistical analysis is just one tool for assessing the effectiveness of a method or treatment. Many other factors can influence the results, and statistical analysis does not always reflect the actual impact of an action or method. In conclusion, the results of the analysis, with a t-value of -1.161, df of 44, and p-value of 0.252, indicate that the method or treatment being tested is not statistically proven to be effective. This analysis can serve as a starting point for further evaluation or changes in the approach used. The lack of a significant effect may be due to participants' doubts about the validity and accuracy of the chatbot used in various cases.
The results of the Bayesian paired t-test, which yielded a Bayes Factor (BF₁₀) of 0.303 with an error rate of around 0.045 percent, show the extent to which the observed data support the tested hypotheses. The Bayes Factor (BF₁₀) is a measure used in Bayesian analysis to assess the strength of evidence supporting the alternative hypothesis (H₁) relative to the null hypothesis (H₀). In this case, the alternative hypothesis is that there is a significant effect or difference, while the null hypothesis is that there is no significant effect or difference. A BF₁₀ of 0.303 indicates that the existing evidence does not support the alternative hypothesis: the observed data are about 3.3 times (1/0.303) more likely under the null hypothesis than under the alternative. In other words, the observed data are not strong enough to support a significant difference or effect between the conditions being compared. The error rate of around 0.045 percent is not a significance level; it expresses the numerical uncertainty in computing the Bayes Factor, and such a small value means that the estimate of 0.303 is numerically stable. Therefore, the very low error rate does not make the findings statistically significant; it confirms that the low BF₁₀ is a reliable estimate, and the evidence still does not support the claim of a significant effect or difference. In further interpretation, it is essential to consider the specific context of this analysis and the practical implications of these results. These findings can serve as a basis for further evaluation or advanced analysis and can help in making further decisions related to the tested hypothesis.
In the context of the analysis with a Bayes Factor (BF₁₀) of 0.303 and an error rate of around 0.045 percent, the generated interval plot illustrates the interval estimate for the observed parameter. This interval encompasses plausible values for the parameter based on the observed data and the chosen level, and interval plots typically depict two vertical lines marking its boundaries. At the 95 percent level, the parameter is estimated to lie within that interval with 95 percent probability. The notation "95% CI: [-0.447, 0.112]" represents such a 95 percent interval for the observed parameter in the statistical analysis: the lower bound is -0.447 and the upper bound is 0.112. This means that, based on the observed data and the statistical analysis conducted, there is about a 95 percent probability that the true value of the parameter lies between -0.447 and 0.112. Because this range includes zero, a true difference of zero remains plausible, which is consistent with the non-significant results above. The interval also conveys the level of uncertainty in the parameter estimate: the narrower the interval, the higher the confidence in the parameter's value within the given range 26. In the sequential evidence plot, the label "Evidence for H0: Moderate" in the Bayesian sequential analysis means that the observed or collected data provide moderate evidence for the null hypothesis of no difference. Finally, the discussion on the acceptability and effectiveness of using AI chatbots based on Large Language Models (LLMs) in Arabic language learning as a second language (L2) revealed several important insights that can enhance our understanding of how AI chatbots can influence the Arabic language learning process and their impact on learners.
The research highlights the significance of the context and learners' experiences in accepting and integrating AI chatbots into Arabic language learning.Qualitative findings indicate that the majority of learners are enthusiastic about using AI chatbots.They see them as valuable tools for improving their understanding and proficiency in the Arabic language.However, some learners also expressed concerns about the AI chatbots' ability to understand colloquial Arabic and dialects commonly used in everyday conversations.This suggests that the context of using AI chatbots in Arabic language learning needs to be carefully considered to maximize their benefits.
These findings provide insights into the importance of learners' motivation and confidence in accepting and using AI chatbots. Learners with high motivation to learn Arabic and confidence in using technology are more likely to embrace AI chatbots as learning aids. Conversely, learners with lower motivation or less confidence in their technological abilities may require additional support to integrate AI chatbots into their learning. This research also underscores validity as a shared concern in using AI chatbots for Arabic language learning. The validity of AI chatbots in teaching Arabic should be ensured so that learners not only learn well but also gain a sound understanding of the material being taught. Therefore, the development of AI chatbots should consider accuracy in understanding the Arabic language, not only in the context of daily conversations but throughout the various subject matters of Arabic learning. This aligns with the research by Hong and Fourney 27, although not on the same platform, and contradicts the findings of several researchers 28 set in different objects and categories, while remaining in line with concerns about content accuracy 29.
However, it is essential to note that this research has some limitations. Firstly, this research was conducted in a specific educational institution with a group of learners with varying levels of Arabic language proficiency. The results of the research may differ when applied to groups of learners with different characteristics. Moreover, the chatbots gave varying answers to the same question from one chat room to another, with different definitions, which raised concerns, particularly in the context of Arabic literature. Such behavior may not have been expected, especially in Arabic literature; in Arabic language fields, the responses should be more consistent and precise. Fields of science such as technology, information, sociology, medicine, and psychology may approach high accuracy with transformer-based analyses 30. However, this might not be the case for Arabic literature, which has a strong foundation and strict conventions in its hierarchy of validation. Secondly, this research focused on the use of AI chatbots in Arabic language learning and may not be directly applicable to other language learning contexts or to different educational settings.

Closing
In the progressive landscape of Arabic language education, learners' motivation and self-confidence wield substantial influence over the acceptance and effective utilization of AI chatbots. Those who harbor high levels of motivation to master Arabic and possess robust confidence in technology tend to seamlessly integrate AI chatbots into their learning journey. However, safeguarding the validity of AI chatbots in Arabic language instruction emerges as a paramount concern. These chatbots must exhibit a profound understanding of Arabic across a wide array of subject matters, extending beyond everyday conversations. While participants acknowledge this critical aspect, lingering doubts persist regarding the accuracy of the learning materials. These doubts cast a shadow on the full embrace of AI within this context and play a pivotal role in shaping its adoption.
Turning to the test results, both classical and Bayesian paired t-tests show no statistically significant difference in Arabic language proficiency among learners after AI chatbot use, so the efficacy hypothesis (Ha) is not supported. This implies that while some degree of enhancement in Arabic language proficiency is evident, it falls short of statistical significance. The meager Bayes Factor (BF₁₀) value of 0.303 signals that the existing evidence fails to support a substantial distinction between the pre-test and post-test scores; read in the other direction, it provides moderate evidence (BF₀₁ ≈ 3.3) for no difference. This suggests that the impact of AI chatbot usage may not be as pronounced as initially envisioned, potentially influenced by the characteristics of the learners, the learning environment, or the measurement methods applied in this study. Consequently, an in-depth scrutiny of these outcomes, coupled with an exploration of their underlying influences, becomes imperative. Moreover, effective harnessing of AI chatbots in learning requires continuity and the simultaneous consideration of an array of assessment factors. The researchers are cognizant of the constraints imposed by time limitations and the confined research timeline, which encompassed only a few sessions. Nonetheless, AI retains its value as a component in the pursuit of Arabic language proficiency. Future research could make significant contributions to scholarly discourse, with a particular focus on the accuracy of Arabic language learning materials and their adaptability across diverse educational levels. In summary, this study underscores the importance of learners' motivation and self-assurance in embracing AI chatbots in Arabic language education. It also highlights the need for ongoing efforts to enhance the validity and accuracy of AI chatbots and their role in fostering language proficiency.

Table 3. Paired Samples T-Test