Dialogue-based CALL systems allow a learner to practice an L2 meaningfully with an automated agent, through an oral (spoken dialogue systems) or written (chatbots) interface (Bibauw, François, & Desmet, 2015). To better understand their effects on L2 proficiency development, we conducted a multilevel meta-analysis of all experimental studies measuring the impact of such systems on language learning outcomes (40 publications). Effect sizes were systematically computed for each variable and group under observation ($k = 96$). Combining all studies in a multilevel linear model, we observed a significant medium effect of dialogue-based CALL on general L2 proficiency development ($d = .61$). Integrating moderator variables into the statistical model provides insights into the relative effectiveness of certain technological and instructional characteristics (spoken vs. written, task-oriented vs. open-ended, form-focused vs. meaning-focused) on different learning outcomes (writing vs. speaking vs. comprehension skills, complexity, accuracy and fluency measures…) and for different populations (L2 proficiency, age, context…). It also allows us to model the effect of treatment duration (number of sessions and time on task) and spacing on these outcomes, to better inform future system and research design.
Dialogue-based CALL systems involve
See Bibauw, François, & Desmet (2015, 2019) for a full discussion.
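Effect sizes were computed on a single “raw” metric (aligned with between-groups effects) across experimental designs, using the formulas from Morris & DeShon (2002). The subscripts E and C denote the experimental and control groups; PP refers to a single-group pre-post design and ECPP to an experimental-control pre-post design: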
$$d_{ \text{PP} } = J ( df_{\text{PP}} ) \left( \frac { M_{ \text{post,E} } - M_{ \text{pre,E} } } { \mathit{SD}_{ \text{pre,E}} } \right) $$
$$d_{\text{ECPP}} = J (df_{\text{ECPP}}) \left(\frac{M_{\text{post,E}}-M_{\text{pre,E}}}{\mathit{SD}_{\text{pre,E}}}-\frac{M_{\text{post,C}}-M_{\text{pre,C}}}{\mathit{SD}_{\text{pre,C}}}\right)$$
In the equations above, Hedges' $J$ corrects for small-sample bias. We use the exact formula rather than the commonly used approximation, in order to obtain a more accurate estimate of the effect size (Hedges & Olkin, 1985):
$$J(df)=\frac{\Gamma{\left(df/2\right)}}{\sqrt{df/2}\ \Gamma{\left[\left(df-1\right)/2\right]}}$$
where $df$ corresponds to the degrees of freedom, calculated from the subsample sizes ($n$) in each study as $df_{\text{PP}}=n_{\text{E}}-1$ and $df_{\text{ECPP}}=n_{\text{E}}-1 + n_{\text{C}}-1$.
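As an illustration, the formulas above could be sketched in base R as follows. The function and argument names (`hedges_j`, `d_pp`, `d_ecpp`, `m_pre`, …) are hypothetical, chosen for this example only, and the summary statistics at the end are invented, not taken from any study in the meta-analysis. The exact $J$ is computed on the log scale with `lgamma()` simply to avoid overflow for large $df$; the approximation $1 - 3/(4\,df - 1)$ is included only for comparison.

```r
# Exact small-sample correction J(df), evaluated on the log scale
# (lgamma) so that large df do not overflow gamma().
hedges_j <- function(df) {
  exp(lgamma(df / 2) - lgamma((df - 1) / 2)) / sqrt(df / 2)
}

# Commonly used approximation, shown only for comparison.
hedges_j_approx <- function(df) {
  1 - 3 / (4 * df - 1)
}

# Single-group pre-post design (PP): df = n_E - 1.
d_pp <- function(m_pre, m_post, sd_pre, n) {
  hedges_j(n - 1) * (m_post - m_pre) / sd_pre
}

# Experimental-control pre-post design (ECPP): df = (n_E - 1) + (n_C - 1).
d_ecpp <- function(m_pre_e, m_post_e, sd_pre_e, n_e,
                   m_pre_c, m_post_c, sd_pre_c, n_c) {
  df <- (n_e - 1) + (n_c - 1)
  gain_e <- (m_post_e - m_pre_e) / sd_pre_e  # standardized gain, experimental
  gain_c <- (m_post_c - m_pre_c) / sd_pre_c  # standardized gain, control
  hedges_j(df) * (gain_e - gain_c)
}

# Hypothetical summary statistics (invented for illustration).
d_pp(m_pre = 50, m_post = 56, sd_pre = 10, n = 11)
d_ecpp(m_pre_e = 50, m_post_e = 56, sd_pre_e = 10, n_e = 11,
       m_pre_c = 50, m_post_c = 52, sd_pre_c = 10, n_c = 11)

# Exact and approximate corrections for a small sample.
c(exact = hedges_j(10), approx = hedges_j_approx(10))
```

For small $df$ the exact and approximate corrections differ only slightly, and the difference shrinks further as $df$ grows.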
| Level | Number of clusters/items | Source of variance |
|---|---|---|
| Subjects | $k=96$ ($n=804$) | Random sampling variance |
| Effect sizes | $k=96$ | Variation within study |
| Studies | $k_{\text{studies}}=17$ | Variation between studies |
All computations were done in R with the metafor package, using the rma.mv() function (Viechtbauer, 2010), with effect sizes nested within papers as random effects:

    rma.mv(di, vi, data = dataset, random = ~ 1 | Paper/Effect)
See Van den Noortgate, López-López, Marín-Martínez & Sánchez-Meca (2013) for a discussion of multilevel modeling in meta-analyses.
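For reference, the three levels listed in the table can be written as a single model, in the spirit of the three-level formulation discussed by Van den Noortgate et al. (2013). The notation below is chosen for this illustration and is not taken from the original analysis: $d_{jk}$ is effect size $j$ in study $k$, and $\beta_0$ is the overall mean effect.

$$d_{jk} = \beta_0 + v_k + u_{jk} + e_{jk}$$

$$v_k \sim N(0, \sigma^2_v), \qquad u_{jk} \sim N(0, \sigma^2_u), \qquad e_{jk} \sim N(0, \sigma^2_{e_{jk}})$$

Here $v_k$ captures variation between studies, $u_{jk}$ variation between effect sizes within a study, and $e_{jk}$ the random sampling error, whose variance is treated as known for each effect size (the `vi` argument in the call above).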
Mean effect of dialogue-based CALL on L2 development: $d = .61$ (95% CI: $[.373, .831]$).
Significant moderators:
Many of the exploratory results are interesting but need to be confirmed in future research.
Bibauw, S., François, T., & Desmet, P. (2015). Dialogue-based CALL: an overview of existing research. In F. Helm, L. Bradley, M. Guarda, & S. Thouësny (Eds.), Critical CALL – Proceedings of the 2015 EUROCALL Conference, Padova, Italy (pp. 57–64). Dublin: Research-publishing.net.
Bibauw, S., François, T., & Desmet, P. (2019). Discussing with a computer to practice a foreign language: from a conceptual framework to a research agenda for dialogue-based CALL. Computer Assisted Language Learning.
Mackey, A., & Goo, J. (2007). Interaction research in SLA: A meta-analysis and research synthesis. In A. Mackey (Ed.), Conversational interaction in second language acquisition: A collection of empirical studies (pp. 407–452). Oxford: Oxford University Press.
Morris, S. B., & DeShon, R. P. (2002). Combining effect size estimates in meta-analysis with repeated measures and independent-groups designs. Psychological Methods, 7(1), 105–125. doi:10.1037//1082-989X.7.1.105
Van den Noortgate, W., López-López, J. A., Marín-Martínez, F., & Sánchez-Meca, J. (2013). Three-level meta-analysis of dependent effect sizes. Behavior Research Methods, 45(2), 576–594. doi:10.3758/s13428-012-0261-6
Viechtbauer, W. (2010). Conducting meta-analyses in R with the metafor package. Journal of Statistical Software, 36(3). doi:10.18637/jss.v036.i03