Dialogue-based CALL: a multilevel meta-analysis


Dialogue-based CALL systems allow learners to practice an L2 meaningfully with an automated agent, through either a spoken interface (spoken dialogue systems) or a written one (chatbots) (Bibauw, François, & Desmet, 2015). To better understand their effects on L2 proficiency development, we conducted a multilevel meta-analysis of all experimental studies measuring the impact of such systems on language learning outcomes (40 publications). Effect sizes were systematically computed for each variable and group under observation ($k = 96$). Combining all studies in a multilevel linear model, we observed a significant medium effect of dialogue-based CALL on general L2 proficiency development ($d = .61$). By adding moderator variables to the statistical model, we provide insights into the relative effectiveness of specific technological and instructional characteristics (spoken vs. written, task-oriented vs. open-ended, form-focused vs. meaning-focused) for different learning outcomes (writing vs. speaking vs. comprehension skills; complexity, accuracy and fluency measures…) and different populations (L2 proficiency, age, context…). We also model the effect of treatment duration (number of sessions and time on task) and spacing on these outcomes, to better inform future system and research design.

23 July 2018
LEAD Summer School in L2 Acquisition
Tübingen, Germany



Dialogue-based CALL

Dialogue-based CALL systems involve

  • a dialogue (i.e., a sequence of conversational turns)
  • with an automated agent (chatbot, robot, voice assistant, non-player character…)
  • used as a language learning task (rather than as scaffolding).

See Bibauw, François, & Desmet (2015, 2019) for a full discussion.


Effect sizes were computed on a single raw-score metric, aligned with between-groups effects, across experimental designs, using the following formulas from Morris & DeShon (2002) for pre-post designs without (PP) and with (ECPP) a control group:

$$d_{ \text{PP} } = J ( df_{\text{PP}} ) \left( \frac { M_{ \text{post,E} } - M_{ \text{pre,E} } } { \mathit{SD}_{ \text{pre,E}} } \right) $$

$$d_{\text{ECPP}} = J (df_{\text{ECPP}}) \left(\frac{M_{\text{post,E}}-M_{\text{pre,E}}}{\mathit{SD}_{\text{pre,E}}}-\frac{M_{\text{post,C}}-M_{\text{pre,C}}}{\mathit{SD}_{\text{pre,C}}}\right)$$

In the equations above, Hedges’ $J$ corrects for small-sample bias. To obtain a more accurate estimate of the effect size, we use its exact formula (Hedges & Olkin, 1985) rather than the commonly used approximation $J(df) \approx 1 - 3/(4\,df - 1)$:

$$J(df)=\frac{\Gamma{\left(df/2\right)}}{\sqrt{df/2}\ \Gamma{\left[\left(df-1\right)/2\right]}}$$

where $df$ denotes the degrees of freedom, computed from the group sizes $n$ of each study as $df_{\text{PP}}=n_{\text{E}}-1$ and $df_{\text{ECPP}}=(n_{\text{E}}-1) + (n_{\text{C}}-1)$, with the subscripts E and C referring to the experimental and control groups.
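As an illustration, the formulas above can be sketched in base R as follows. The function and argument names are hypothetical (not taken from the original analysis scripts), and `lgamma()` is used in place of `gamma()` only to avoid numerical overflow for large $df$; the result is the same exact $J$.

```r
# Sketch of the effect-size formulas (Morris & DeShon, 2002) in base R.
# Function and argument names are illustrative assumptions.

# Exact small-sample correction J(df) (Hedges & Olkin, 1985):
# Gamma(df/2) / (sqrt(df/2) * Gamma((df - 1)/2)), computed on the log scale.
# Requires df > 1.
hedges_j <- function(df) {
  exp(lgamma(df / 2) - lgamma((df - 1) / 2)) / sqrt(df / 2)
}

# Pre-post design without control group: df = n_E - 1.
# The pre-test SD of the experimental group is the denominator (raw-score metric).
d_pp <- function(m_pre_e, m_post_e, sd_pre_e, n_e) {
  hedges_j(n_e - 1) * (m_post_e - m_pre_e) / sd_pre_e
}

# Pre-post design with control group: df = (n_E - 1) + (n_C - 1).
# Standardized gain of the control group is subtracted from that of the experimental group.
d_ecpp <- function(m_pre_e, m_post_e, sd_pre_e, n_e,
                   m_pre_c, m_post_c, sd_pre_c, n_c) {
  df <- (n_e - 1) + (n_c - 1)
  hedges_j(df) * ((m_post_e - m_pre_e) / sd_pre_e -
                    (m_post_c - m_pre_c) / sd_pre_c)
}
```

With hypothetical values, `d_pp(m_pre_e = 50, m_post_e = 58, sd_pre_e = 10, n_e = 11)` returns $J(10) \times 0.8$. At $df = 10$, the exact $J$ (about 0.9227) and the approximation $1 - 3/(4\,df - 1)$ (about 0.9231) already differ only in the fourth decimal, so the exact form matters mainly for very small groups.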

Multilevel meta-analysis modeling

| Level | Number of clusters/items | Source of variance |
|---|---|---|
| Subjects | $k=96$ ($n=804$) | Random sampling variance |
| Effect sizes | $k=96$ | Variation within studies |
| Studies | $k_{\text{studies}}=17$ | Variation between studies |
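The three levels in the table correspond to the standard three-level meta-analytic model discussed by Van den Noortgate et al. (2013). As a sketch in illustrative notation, with $d_{ij}$ the $i$-th effect size of study $j$:

$$d_{ij} = \mu + u_{j} + u_{ij} + e_{ij}, \qquad u_{j} \sim \mathcal{N}(0, \sigma^2_{\text{between}}), \quad u_{ij} \sim \mathcal{N}(0, \sigma^2_{\text{within}}), \quad e_{ij} \sim \mathcal{N}(0, v_{ij})$$

where $\mu$ is the overall mean effect, $\sigma^2_{\text{between}}$ and $\sigma^2_{\text{within}}$ are the between-study and within-study variance components estimated by the model, and $v_{ij}$ is the known sampling variance of $d_{ij}$.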

All computations were done in R, with the metafor package, using the rma.mv() function (Viechtbauer, 2010):

library(metafor)
rma.mv(di, vi, data = dataset, random = ~ 1 | Paper/Effect)  # effect sizes (Effect) nested within papers (Paper)

See Van den Noortgate, López-López, Marín-Martínez & Sánchez-Meca (2013) for a discussion of multilevel modeling in meta-analyses.


Mean effect of dialogue-based CALL use on L2 development: $d = .61$ (95% CI: $[.373, .831]$).

Moderator analyses

Significant moderators:

  • L2 proficiency level
  • Outcome variable type (production vs. comprehension) and dimension (accuracy, complexity, fluency…)
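Moderators of this kind can be tested by extending the three-level model into a mixed-effects meta-regression. A sketch in illustrative notation, where each $x_{pij}$ is a hypothetical dummy-coded moderator (e.g., one L2 proficiency level or one outcome type) for effect size $i$ of study $j$:

$$d_{ij} = \beta_0 + \sum_{p=1}^{P} \beta_p \, x_{pij} + u_{j} + u_{ij} + e_{ij}$$

Here $u_{j}$ and $u_{ij}$ are the between-study and within-study random effects and $e_{ij}$ is the sampling error. In metafor, such terms are passed to rma.mv() through its mods argument, and the output includes an omnibus test ($Q_M$) of whether the moderator coefficients jointly differ from zero.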

Many other exploratory results are promising but need to be confirmed by future research.


  • Bibauw, S., François, T., & Desmet, P. (2015). Dialogue-based CALL: an overview of existing research. In F. Helm, L. Bradley, M. Guarda, & S. Thouësny (Eds.), Critical CALL – Proceedings of the 2015 EUROCALL Conference, Padova, Italy (pp. 57–64). Dublin: Research-publishing.net.

  • Bibauw, S., François, T., & Desmet, P. (2019). Discussing with a computer to practice a foreign language: from a conceptual framework to a research agenda for dialogue-based CALL. Computer Assisted Language Learning.

  • Hedges, L. V., & Olkin, I. (1985). Statistical methods for meta-analysis. Orlando, FL: Academic Press.

  • Mackey, A., & Goo, J. (2007). Interaction research in SLA: A meta-analysis and research synthesis. In A. Mackey (Ed.), Conversational interaction in second language acquisition: A collection of empirical studies (pp. 407–452). Oxford: Oxford University Press.

  • Morris, S. B., & DeShon, R. P. (2002). Combining effect size estimates in meta-analysis with repeated measures and independent-groups designs. Psychological Methods, 7(1), 105–125. doi:10.1037//1082-989X.7.1.105

  • Van den Noortgate, W., López-López, J. A., Marín-Martínez, F., & Sánchez-Meca, J. (2013). Three-level meta-analysis of dependent effect sizes. Behavior Research Methods, 45(2), 576–594. doi:10.3758/s13428-012-0261-6

  • Viechtbauer, W. (2010). Conducting meta-analyses in R with the metafor package. Journal of Statistical Software, 36(3). doi:10.18637/jss.v036.i03