Conversational AI—i.e., chatbots, talking robots, conversational agents and dialogue systems—offers a promising affordance for language learning: a low-anxiety environment for meaningful interactional practice that can potentially be controlled for complexity and targeted exposure, scaffolding strategies, and corrective feedback. Yet, beyond the attractive vision offered by dialogue-based CALL (the term we will use to refer to conversational AI applied to language learning), we know relatively little about how real systems fare in practice and how different approaches compare in effectiveness for learning. The present thesis attempts to address this gap by providing insights from, first, a comprehensive research synthesis and meta-analysis and, second, a large-scale experimental evaluation of two versions of dialogue-based CALL differing in interactivity.
Our systematic review of the literature on dialogue-based CALL (chapter 2) organised the domain into a structured conceptual framework. From a corpus of 417 publications, we formalised an operational definition of dialogue-based CALL and a typology of systems built on a continuum of constraints on form and meaning. We summarised the main results from empirical studies of such systems and discussed the impact of dialogue-based CALL on motivation and L2 development, identifying positive evidence for both outcomes.
From the same corpus of publications, we meta-analysed 17 relevant effectiveness studies (chapter 3), modelling the $k=100$ individual effect sizes with a multilevel random-effects model. Results confirmed that dialogue-based CALL practice had a significant, medium-sized effect on L2 proficiency development ($\bar{d}= 0.58$). Our extensive moderator analyses confirmed, among other insights, the effectiveness of form-focused and goal-oriented systems, system-guided interactions, corrective feedback provision, and gamification features. We also established significant effects for lower-proficiency learners and on vocabulary, morphosyntax, holistic proficiency, and accuracy.
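The pooling step behind such a synthesis can be sketched in a few lines. The thesis used a multilevel random-effects model to handle dependent effect sizes; the simplified univariate DerSimonian–Laird estimator below, run on invented toy data, only illustrates the general idea of weighting study effects by their combined sampling and between-study variance.

```python
import math

def random_effects_pool(d, v):
    """Pool standardised mean differences d with sampling variances v
    using the (univariate) DerSimonian-Laird random-effects estimator."""
    k = len(d)
    w = [1.0 / vi for vi in v]                               # fixed-effect weights
    d_fe = sum(wi * di for wi, di in zip(w, d)) / sum(w)     # fixed-effect mean
    q = sum(wi * (di - d_fe) ** 2 for wi, di in zip(w, d))   # Cochran's Q
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - (k - 1)) / c)                       # between-study variance
    w_re = [1.0 / (vi + tau2) for vi in v]                   # random-effects weights
    d_re = sum(wi * di for wi, di in zip(w_re, d)) / sum(w_re)
    se = math.sqrt(1.0 / sum(w_re))                          # standard error of pooled d
    return d_re, se, tau2

# Toy data: three hypothetical study effects and their variances.
d_bar, se, tau2 = random_effects_pool([0.4, 0.6, 0.8], [0.04, 0.05, 0.06])
```

A multilevel extension, as used in the thesis, additionally nests effect sizes within studies so that multiple outcomes from one sample do not count as independent evidence.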
Based on the conclusions of our meta-analysis, we helped design a dialogue-based CALL game, LanguageHero, which offered contextualised, meaning-focused, and prolonged dialogic interactions with non-player characters, governed by a series of microtasks. The game implemented automated corrective feedback, scaffolding via on-demand glossing, output-supporting hints and model answers, and light gamification. We analyse its dialogue management design, its constraints on meaning, and the types of conversational tasks and microtasks it features, as well as its limitations (chapter 4).
We conducted a multisite controlled experiment with $N = 215$ Dutch-speaking teenage learners of French, cluster-randomly assigned to three conditions: a Dialogue System condition, which used LanguageHero interactively and dynamically; a Dialogue Completion condition, which worked through the same dialogues in the same interface but with the interlocutor’s side of the conversation pre-displayed and static; and a “business-as-usual” control group. We studied the effect of three sessions of interaction with each system on the learners’ perceptions, engagement, and vocabulary development.
In particular, we wanted to measure the effect of interactivity in dialogue-based CALL, operationalised through our proposed bidimensional model of user control and bidirectionality of communication (chapter 5). We observed no differences in perceptions across the two types of systems (except for the pilot versions), but clear differences in behavioural and cognitive task engagement. The absence of differences in perceptions could be explained by the explicit constraints originating in the microtask prompts, which reduced the emergent nature and interactivity of the dialogue. The engagement results, on the other hand, pointed towards faster, fluency-focused interactions with the dialogue system, contrasting with a dialogue completion behaviour more focused on form.
We observe that dialogue-based CALL is an ecologically valid and effective environment for incidental vocabulary learning (chapter 6). We show that the effect of frequency of occurrence in the input is complemented by frequency of use in the output, which emerges as the stronger predictor of vocabulary learning. The interactivity of the dialogue system increases the quantity of output and the number of target word uses, which slightly improves the incidental learning of productive word knowledge.
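The relation described above — encounters in input complemented by a stronger effect of use in output — can be sketched with a toy logistic model. The function, its coefficients, and the weighting of output over input below are purely illustrative assumptions, not the estimates from the thesis’s actual analyses.

```python
import math

def p_learned(freq_input, freq_output, b0=-2.0, b_in=0.3, b_out=0.6):
    """Toy logistic model of the probability that a target word is learned.
    Frequency of use in output (b_out) is weighted more heavily than
    frequency of occurrence in input (b_in); all coefficients are
    hypothetical, chosen only to illustrate the qualitative pattern."""
    return 1.0 / (1.0 + math.exp(-(b0 + b_in * freq_input + b_out * freq_output)))

# Under these assumed weights, four uses in output predict a higher
# learning probability than four encounters in input alone.
low_via_input = p_learned(4, 0)
high_via_output = p_learned(0, 4)
```

In this sketch, a more interactive system that elicits more target-word uses shifts `freq_output` upwards, mirroring the advantage observed for productive word knowledge.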
Our findings suggest a new understanding of interactivity in dialogue-based CALL, not so much as a motivational quality of open-endedness and user control, but rather as the intensity of the negotiation of form and meaning, of which interactional or corrective feedback and scaffolding are key components (chapter 7). This interactivity increased engagement, production, and, hence, incidental learning. This study also demonstrates the potential of dialogue systems to offer designable meaningful interactions, particularly valuable for autonomous language learning and interactional research.