Automatizing L2 fluency measurement: validity and developmental sensitivity of temporal fluency metrics variations

18 August 2021 15:10 — 16:40
Groningen, Netherlands


Utterance fluency in speaking, as a dimension of L2 performance, is assumed to correlate with L2 proficiency, and the ability to measure it objectively and precisely is key for testing and research. Many utterance fluency metrics have been proposed, compared, and validated in terms of how well they discriminate between or predict proficiency levels, capture short-term L2 development, or correlate with perceived fluency (e.g., Segalowitz et al., 2017; Tavakoli et al., 2020). However, the precise operationalization of these fluency measurements is rarely discussed in detail and often diverges across studies (Dumont, 2018). While some issues, such as the silent pause threshold, have been studied more closely (de Jong & Bosker, 2013), others, such as pruning, have rarely been discussed in depth.

The present study attempts to (semi-)automatize the testing and the computation of multiple variations of L2 fluency metrics, in order to compare how well they predict external proficiency estimates, including within a limited proficiency range, and how sensitive they are to very short-term developmental changes.

We used a computer-delivered oral interview to record 215 young low-intermediate learners of French in a pretest and a posttest separated by 1–3 weeks and, for the experimental group, by a short pedagogical intervention based on interactions in a dialogue-based computer-assisted language learning game. The resulting 12,000 audio files were transcribed by automatic speech recognition, manually corrected, and annotated for a series of “disfluencies”. We computed both signal-based (e.g., via de Jong et al., 2020) and transcription-based fluency metrics, in as many variations as possible in terms of pruning (e.g., do L1 words count? proper nouns? self-talk?) and normalization (words, syllables, silent pauses…).
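To illustrate how pruning choices multiply the variants of each metric, here is a minimal sketch that derives a few temporal metrics from an annotated token list. The `Token` fields, the tag names, the pruning variant names, and the 0.25 s silent pause threshold are all assumptions made for this example, not the study's annotation scheme or settings. The metric formulas follow common definitions and may differ from the ones used in the study.

```python
from dataclasses import dataclass


@dataclass
class Token:
    text: str
    syllables: int
    tag: str = "word"  # hypothetical tags: "word", "disfluency", "l1", "proper_noun"


# Hypothetical pruning variants: each lists the tags excluded from the count.
PRUNING = {
    "unpruned": set(),
    "meant": {"disfluency"},
    "meant_l2_only": {"disfluency", "l1"},
    "meant_l2_no_proper": {"disfluency", "l1", "proper_noun"},
}


def count_syllables(tokens, variant):
    """Sum the syllables of all tokens kept under the given pruning variant."""
    excluded = PRUNING[variant]
    return sum(t.syllables for t in tokens if t.tag not in excluded)


def fluency_metrics(tokens, total_time, pauses, variant="unpruned", threshold=0.25):
    """Compute temporal metrics; times are in seconds.

    `pauses` holds the durations of utterance-internal silences; only those
    at or above `threshold` count as silent pauses (assumed value).
    """
    silent = [p for p in pauses if p >= threshold]
    phonation = total_time - sum(silent)
    n_syll = count_syllables(tokens, variant)
    n_runs = len(silent) + 1  # assumes no leading or trailing pause
    return {
        "n_syllables": n_syll,
        "speech_rate": n_syll / total_time,
        "articulation_rate": n_syll / phonation,
        "length_of_runs": n_syll / n_runs,
        "speech_time_ratio": phonation / total_time,
        "silent_pausing_rate": len(silent) / total_time,
    }


# Invented example response, for illustration only.
tokens = [
    Token("euh", 1, "disfluency"), Token("je", 1), Token("vais", 1),
    Token("à", 1), Token("London", 2, "l1"), Token("Paris", 2, "proper_noun"),
]
for variant in PRUNING:
    print(variant, fluency_metrics(tokens, 5.0, [0.1, 0.5, 0.5], variant))
```

Because pruning only changes the syllable count in this sketch, the duration-based values (speech-time ratio, silent pausing rate) stay constant across variants, while the count-based ones shift with each exclusion.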

We evaluate how well each metric’s variations correlate with external proficiency estimates, including a vocabulary size test, how well they detect changes over such a short timeframe, and how reliable the fully automated metrics are.
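The three kinds of statistics involved in such an evaluation (correlation with an external estimate, internal consistency, and agreement between automated and manual counts) can be sketched with the Python standard library alone. The function names and the example values below are illustrative assumptions. They are not the study's data or analysis code, and the grouping of scores into "items" (for example, one score per interview question) is also an assumption.

```python
from math import sqrt
from statistics import mean, variance


def pearson_r(x, y):
    """Pearson correlation between two equally long score lists."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    ss_x = sum((a - mx) ** 2 for a in x)
    ss_y = sum((b - my) ** 2 for b in y)
    return cov / sqrt(ss_x * ss_y)


def cronbach_alpha(items):
    """Cronbach's alpha; `items` holds one score list per item,
    with the same learners in the same order in every list."""
    k = len(items)
    totals = [sum(scores) for scores in zip(*items)]
    item_var = sum(variance(scores) for scores in items)
    return k / (k - 1) * (1 - item_var / variance(totals))


def mean_absolute_error(reference, estimate):
    """Average absolute gap between reference and estimated values."""
    return mean(abs(r - e) for r, e in zip(reference, estimate))


# Invented values for five learners, for illustration only.
speech_rate = [1.2, 1.8, 2.1, 2.6, 3.0]
vocab_size = [900, 1400, 1300, 2100, 2500]
manual_syllables = [40, 55, 61, 72, 90]
auto_syllables = [38, 57, 60, 75, 86]

print("r with vocabulary size:", round(pearson_r(speech_rate, vocab_size), 3))
print("MAE auto vs. manual:", mean_absolute_error(manual_syllables, auto_syllables))
```

The same `pearson_r` call can be repeated for each metric variant to rank variants by predictive power, which is the comparison this study sets out to make.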



Automated estimators vs. Manual annotation

Raw metrics                                          MAE       Cron. α (intern. consist.)   (pred. power)
Nb of syllables (auto count, manual transcript)      “truth”   .92                          .373
vs. Google ASR transcript (auto count)
vs. Syllable Nuclei Praat script (de Jong et al.)


Number of syllables: Variant / Pruning                                  M     SD    Cron. α   r (#Syll.-VS)   r (SpeechRate-VS)
Unpruned (manual transcript)
‘Meant’ pruning: –disfluencies (f.pauses, repet., self-corr., meta)
‘Meant’, L2-only pruning: –L1/lingua franca words                      12.
‘Meant’, L2-only, –proper nouns                                        12.

Best predictors of L2 proficiency

Semi-auto vs. fully automated composite metrics
Fully auto*, ASR-based count
Fully auto*,
Fully auto signal alt.
Length of runs           .628   .588   .479
Speech rate              .609   .585   .461
Articulation rate        .524   .496   .392   .172
Syllable duration⁻¹      .473   .283   .473   .106
Number of syllables      .473   .370   .154
Number of words          .463   .355
Silent pausing rate⁻¹    .409   .428
Duration of runs         .338   .352
Speech-time ratio        .269   .305

Developmental sensitivity


  • Bosker, H. R., Pinget, A.-F., Quené, H., Sanders, T., & de Jong, N. H. (2013). What makes speech sound fluent? The contributions of pauses, speed and repairs. Language Testing, 30(2), 159–175. DOI: 10.1177/0265532212455394
  • Cucchiarini, C., Strik, H., & Boves, L. (2002). Quantitative assessment of second language learners’ fluency: Comparisons between read and spontaneous speech. The Journal of the Acoustical Society of America, 111(6), 2862–2873. DOI: 10.1121/1.1471894
  • de Jong, N. H., & Bosker, H. R. (2013). Choosing a threshold for silent pauses to measure second language fluency. In R. Eklund (Ed.), Proceedings of the 6th Workshop on Disfluency in Spontaneous Speech (DiSS) (pp. 17–20).
  • de Jong, N. H., Pacilly, J., & Heeren, W. (2020). Praat scripts to measure fluency automatically.
  • de Jong, N. H., Steinel, M. P., Florijn, A. F., Schoonen, R., & Hulstijn, J. H. (2012). Facets of speaking proficiency. Studies in Second Language Acquisition, 34(1), 5–34. DOI: 10.1017/S0272263111000489
  • Detey, S., Fontan, L., Le Coz, M., & Jmel, S. (2020). Computer-assisted assessment of phonetic fluency in a second language: A longitudinal study of Japanese learners of French. Speech Communication, 125, 69–79. DOI: 10.1016/j.specom.2020.10.001
  • Dumont, A. (2018). Fluency and disfluency: A corpus study of non-native and native speaker (dis)fluency profiles [Unpublished doctoral dissertation]. Université catholique de Louvain.
  • Ferrari, S. (2012). A longitudinal study of complexity, accuracy and fluency variation in second language development. In A. Housen, F. Kuiken, & I. Vedder (Eds.), Dimensions of L2 performance and proficiency: Complexity, accuracy and fluency in SLA (pp. 277–298). John Benjamins. DOI: 10.1075/lllt.32.12fer
  • Götz, S. (2013). Fluency in native and nonnative English speech. John Benjamins.
  • Hilton, H. (2014). Oral fluency and spoken proficiency: Considerations for research and testing. In P. Leclercq, A. Edmonds, & H. Hilton (Eds.), Measuring L2 proficiency: Perspectives from SLA (pp. 27–53). Multilingual Matters.
  • Koizumi, R. (2005). Predicting speaking ability from vocabulary knowledge. Japan Language Testing Association Journal, 7, 1–20.
  • Leclercq, P., & Edmonds, A. (2014). How to assess L2 proficiency? An overview of proficiency assessment research. In P. Leclercq, A. Edmonds, & H. Hilton (Eds.), Measuring L2 proficiency: Perspectives from SLA. Multilingual Matters.
  • Milton, J. (2013). Measuring the contribution of vocabulary knowledge to proficiency in the four skills. In C. Bardel, C. Lindqvist, & B. Laufer (Eds.), L2 vocabulary acquisition, knowledge and use (pp. 57–78). European Second Language Association.
  • Noreillie, A.-S. (2019). It’s all about words. Three empirical studies into the role of lexical knowledge and use in French listening and speaking tasks [Doctoral dissertation, KU Leuven].
  • Noreillie, A.-S., Kestemont, B., Heylen, K., Desmet, P., & Peters, E. (2018). Vocabulary knowledge and listening comprehension at an intermediate level in English and French as foreign languages. ITL - International Journal of Applied Linguistics, 169(1), 212–231. DOI: 10.1075/itl.00013.nor
  • Révész, A., Ekiert, M., & Torgersen, E. N. (2016). The effects of complexity, accuracy, and fluency on communicative adequacy in oral task performance. Applied Linguistics, 37(6), 828–848. DOI: 10.1093/applin/amu069
  • Saito, K., Ilkan, M., Magne, V., Tran, M. N., & Suzuki, S. (2018). Acoustic characteristics and learner profiles of low-, mid- and high-level second language fluency. Applied Psycholinguistics, 39(3), 593–617. DOI: 10.1017/S0142716417000571
  • Segalowitz, N. (2010). Cognitive bases of second language fluency. Routledge.
  • Segalowitz, N., French, L., & Guay, J.-D. (2017). What features best characterize adult second language utterance fluency and what do they reveal about fluency gains in short-term immersion? Canadian Journal of Applied Linguistics / Revue Canadienne de Linguistique Appliquée, 20(2), 90–116. DOI: 10.7202/1050813ar
  • Tavakoli, P. (2016). Fluency in monologic and dialogic task performance: Challenges in defining and measuring L2 fluency. International Review of Applied Linguistics in Language Teaching, 54(2), 133–150. DOI: 10.1515/iral-2016-9994
  • Tavakoli, P., Campbell, C., & McCormack, J. (2016). Development of speech fluency over a short period of time: Effects of pedagogic intervention. TESOL Quarterly, 50(2), 447–471. DOI: 10.1002/tesq.244
  • Tavakoli, P., Nakatsuhara, F., & Hunter, A.-M. (2020). Aspects of fluency across assessed levels of speaking proficiency. Modern Language Journal, 104(1), 169–191. DOI: 10.1111/modl.12620
  • Tonkyn, A. P. (2012). Measuring and perceiving changes in oral complexity, accuracy and fluency: Examining instructed learners’ short-term gains. In A. Housen, F. Kuiken, & I. Vedder (Eds.), Dimensions of L2 performance and proficiency: Complexity, accuracy and fluency in SLA (pp. 221–244). John Benjamins. DOI: 10.1075/lllt.32.10ton
  • Williams, J., Segalowitz, N., & Leclair, T. (2014). Estimating second language productive vocabulary size: A Capture-Recapture approach. The Mental Lexicon, 9(1), 23–47. DOI: 10.1075/ml.9.1.02wil
  • Wright, C., & Tavakoli, P. (2016). New directions and developments in defining, analyzing and measuring L2 speech fluency. International Review of Applied Linguistics in Language Teaching, 54(2), 73–77. DOI: 10.1515/iral-2016-9990