Pilot RCT in youth who stutter: real-time photorealistic-avatar VR is well-accepted and elicits arousal, but one session did not outperform SLP role-play
How this was rated
Pilot RCT with randomized allocation (n=12; 6 per group), bootstrap inference (10,000 resamples) appropriate for small samples, and pre-specified objectives. Objective 1 (acceptability, presence) was supported; Objective 2 (eliciting physiological responses) was partially supported, with an unexpected HR decrease in both groups; Objective 3 (added value of VR over SLP role-play after a single session) was NOT supported. Limitations constraining certainty: very small sample, single training session, two different actor-teachers in Session 2, retrospective SUDS, no immersive-tendencies measure, and a relevant industry-academic relationship - co-author Stephane Bouchard is a consultant for and holds equity in Cliniques et Developpement In Virtuo (a VR-development company), though the paper notes that company did not create the environments used in this study.
Ratings use a simplified four-tier scheme (High, Moderate, Low, Very Low) informed by the GRADE working group.
A pilot RCT randomized 12 children and adolescents who stutter (ages 9-18) to one of two training conditions before facing an unknown actor-teacher: a conversation with a photorealistic virtual teacher in VR controlled live by their own SLP via facial motion capture (n=6), or face-to-face SLP role-play (n=6). The VR system was well-accepted (high presence, low cybersickness). Skin conductance was elevated from baseline in the VR group; SLP role-play raised self-reported anxiety more. A single session did not outperform role-play on self-efficacy or post-task in vivo anxiety.
This small-sample pilot RCT (n=12, 6 per group) shows that a real-time photorealistic-avatar VR system, driven by the SLP via facial motion capture, is acceptable and feasible for children and adolescents who stutter and elicits physiological arousal during face-to-face conversation. The study did NOT demonstrate added value of one VR session over one SLP role-play session, either for self-efficacy or for reducing anxiety during a subsequent in vivo conversation. It is best interpreted as a feasibility study and a signal for multi-session research, not as evidence of clinical effectiveness.
Key findings
- System acceptability and presence were high: ITC-SOPI Spatial Presence M=3.60/5 (SD=0.70), Engagement M=3.87/5 (SD=0.78), Ecological Validity M=3.27/5 (SD=1.53), Negative Effects M=1.58/5 (SD=0.60). Interaction with Avatars (IWA) questionnaire items showed 'felt like talking to a real person' M=8.43/10, 'practiced situations were relevant' M=9.29/10, 'would like to use this tool for stressful situations' M=9.57/10
- Most participants did NOT recognize their own SLP behind the avatar (IWA item 2 M=2.86/10; lower is better here). Voice modulation (Clownfish Voice Changer) combined with Live Link Face motion capture masked the SLP's identity effectively, except for the two oldest participants (16 and 18 years), who recognized their SLP's prosody (scores of 10 and 8, respectively)
- Skin conductance level (SCL) in the experimental group was significantly elevated from baseline during both speech preparation and conversation in Session 1 (in virtuo p=.006 and p=.009) and Session 2 (in vivo p<.001 and p=.008). The SLP role-play control group did NOT show significant SCL elevation from baseline in any phase of either session
- Skin conductance responses (SCRs) to specific anxiety-provoking stimuli were significantly MORE frequent in the SLP role-play group than in the VR group during Session 1, especially for frowning (t=-3.79, p<.05) and across all stimuli combined (t=-3.76, p<.05). Within the experimental group, SCR detection rates were comparable between in virtuo (Session 1) and in vivo (Session 2) conditions (t(4)=-1.07, p=.35)
- Self-reported SUDS in the VR group did NOT differ significantly from baseline during the in virtuo conversation, whereas the SLP role-play group's SUDS was significantly elevated above baseline (p<.001 at both conversation start and end). Between-group effect at conversation end was large (d=1.35, 95% CI [-2.77, -0.67], p=.031)
- Unexpectedly, heart rate DECREASED from baseline in both groups across most phases (e.g., experimental Session 1 conversation HR 95% CI [-9.828, -2.325], p=.002; control Session 1 conversation HR 95% CI [-6.038, -2.029], p<.001), and RMSSD often INCREASED - interpreted by the authors as autonomic adaptation/habituation rather than the predicted arousal increase
- Self-efficacy did NOT differ significantly between groups at any timepoint (all p>.58) and did NOT change significantly within either group from Session 1 to Session 2 (experimental Z=-1.36, p=.17, r=.60; control Z=-0.94, p=.345, r=.39). Moderate within-group effect sizes (r=0.36-0.60) suggest potential trends that larger samples might detect
- A single training session in either condition did NOT reduce anxiety during the subsequent in vivo conversation with the unfamiliar actor-teacher - both groups showed elevated SUDS at conversation start in Session 2 (experimental p=.003; control p=.002), returning to near-baseline by the end of the conversation
Background
Stuttering in school-age children and adolescents is frequently accompanied by social anxiety. Iverach et al. (2016) reported that children who stutter are approximately six times more likely than non-stuttering peers to develop a social anxiety disorder, and social anxiety in this population tends to increase through adolescence. CBT with graded exposure is effective at reducing social anxiety in adults who stutter, but few empirical interventions have specifically addressed anxiety in young people who stutter.
Exposure therapy traditionally relies on in vivo experiences or in-office role-play with the clinician. Both have constraints: in vivo exposure is logistically difficult and gives the therapist little control over the situation, while role-play is limited by participants’ awareness that the clinician is a familiar, safe figure rather than a stranger. Virtual reality has been proposed as a controllable middle ground, but most stuttering-VR work to date has used group or audience scenarios (Brundage et al. 2006, 2016; Brundage & Hancock 2015; Moise-Richard et al. 2021) rather than naturalistic one-on-one face-to-face conversation, and has relied on pre-scripted or static avatar behavior rather than real-time dynamic responses.
Delangle and colleagues (the same research team as Moise-Richard et al. 2021) set out to address two gaps: the absence of physiological measures in the team’s prior virtual-classroom work, and the absence of a real-time face-to-face VR scenario that simulates the naturalistic, reciprocal dynamics of everyday conversation.
What the researchers did
Thirteen children and adolescents who stutter were recruited from the Marie Enfant Rehabilitation Center (CHU Sainte-Justine), the Raymond-Dewar Institute, and private SLP clinics in Montreal, Quebec. One adolescent was excluded (prior similar VR exposure during a television show recording), leaving twelve participants in the analysis (6 girls; ages 9-18, M=13.1, SD=3.0). All had been professionally diagnosed with stuttering by a certified SLP, were currently in therapeutic treatment, and had completed at least three sessions with their own therapist. Random assignment by random number generator produced two balanced groups (n=6 per arm). Baseline measures of social anxiety (LSAS-CA) and public-speaking confidence (PRCS short form) showed no significant between-group differences.
The VR system. A photorealistic virtual teacher was created with MetaHuman in Unreal Engine 5.03 and displayed on a Meta Quest 2 headset connected to a laptop. From an adjacent room, the SLP controlled the avatar in real time: an iPhone 11 running Live Link Face captured the therapist’s lip, mouth, eye, eyebrow, and head movements and streamed them to the avatar, and Clownfish Voice Changer applied a consistent male-voice effect (the SLPs were predominantly female, and a male avatar approximately 40-50 years old was chosen specifically to limit recognition). The SLP saw a live mirror of the participant’s view on a laptop, enabling dynamic verbal and non-verbal responsiveness. Subtle avatar body animation (slight rocking, neutral arm movement) was driven by Adobe Mixamo to avoid an unnatural static appearance. The virtual classroom and the experimentation room used matched desk geometry to enhance embodiment.
Procedure. Each participant completed two sessions approximately two weeks apart (M=12 days, SD=11.3, range 3-42). Each session had three phases:
- Baseline - a 3-minute conversation with an unfamiliar research-team member (different from both the SLP and the actor-teacher) on neutral topics (hobbies, music, travel destinations) to capture speaking-related arousal.
- Speech Preparation - a 3-minute discussion with the same team member about possible classroom accommodations the participant might request related to their stuttering.
- Conversation Task - a 5-minute conversation (M=4.7 min, SD=1.6) with the “teacher” in which the participant disclosed their stuttering and negotiated classroom accommodations. The teacher introduced three categories of anxiety-provoking stimuli, with each presented at least once: frowning, yawning, and scripted uncomfortable phrases (“That’s not fair to the other students”, “Why should I make special accommodations for you?”). SLPs received training to wait at least 20 seconds between stimuli and not introduce two simultaneously; they were instructed not to reinforce reassurance-seeking or avoidance behaviors.
In Session 1, the experimental group’s Conversation Task was in virtuo (their own SLP controlled the avatar from the adjacent room); the control group’s was an in-person SLP role-play. In Session 2 (in vivo, for both groups), the Conversation Task was with an unfamiliar actor-teacher in person.
Measures.
- Acceptability and presence (Objective 1): ITC-SOPI (44 items across Spatial Presence, Engagement, Ecological Validity, Negative Effects subscales) and a tailor-made Interaction with Avatars (IWA) questionnaire.
- Physiological anxiety (Objectives 2 and 3): Electrodermal activity (skin conductance level SCL, skin conductance response SCR) and electrocardiogram (heart rate HR, heart rate variability via RMSSD) recorded with a Biopac MP36R system and processed with AcqKnowledge and NeuroKit2. An adapted peripheral ECG electrode placement (left leg + right forearm) was used instead of the standard clavicle-costal placement because it is less intrusive for children. Physiological measures were expressed as change from each session’s own baseline to account for natural day-to-day variation.
- Self-reported anxiety (Objectives 2 and 3): a modified 0-10 SUDS administered after baseline, after speech preparation, and retrospectively after the conversation (for the start and end of the conversation separately).
- Self-efficacy (Objective 3): a 14-item tailor-made questionnaire (1 = “Not at all confident” to 5 = “Very confident”) adapted from Bray et al. 2003 / Manning 1994 (Bandura’s social cognitive framework). Cronbach’s alpha = 0.87.
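The internal-consistency figure reported for the self-efficacy questionnaire is Cronbach's alpha. As a minimal sketch of the standard formula (not the authors' SPSS procedure), alpha compares the sum of item variances with the variance of respondents' total scores: alpha = k/(k - 1) x (1 - sum of item variances / variance of total scores). The function name and the toy response matrix are hypothetical; the paper's item-level data are not reproduced here.

```python
from statistics import variance


def cronbach_alpha(responses):
    """Cronbach's alpha for a respondents x items score matrix.

    responses: list of rows, one per respondent, each holding k item scores
    (e.g. 1-5 ratings). Uses sample variances (n - 1 denominator) throughout.
    """
    k = len(responses[0])
    if k < 2 or len(responses) < 2:
        raise ValueError("need at least two items and two respondents")
    items = list(zip(*responses))                 # transpose: one tuple per item
    item_var_sum = sum(variance(col) for col in items)
    total_var = variance([sum(row) for row in responses])
    return k / (k - 1) * (1 - item_var_sum / total_var)


# Hypothetical data: 3 respondents x 2 items.
print(round(cronbach_alpha([[1, 1], [2, 3], [3, 2]]), 3))  # → 0.667
```

Values near 1 indicate that items move together; the reported 0.87 is conventionally read as good internal consistency for a 14-item scale.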
Statistical analysis. Because of the small sample, the authors used parametric (continuous physiological data) and non-parametric (ordinal questionnaire data) bootstrap resampling with 10,000 iterations in IBM SPSS Statistics 29 and Python. Significance was inferred from 95% confidence intervals (an interval excluding zero, i.e., no change from baseline, or not overlapping the other group’s interval) and from bootstrap p-values. Effect sizes (Cohen’s d, Pearson’s r) were reported alongside. The authors did not correct for multiple comparisons, reasoning that bootstrap resampling provides empirical significance estimates without parametric assumptions.
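The bootstrap logic described above can be sketched in plain Python for a one-sample question such as "did SCL change from baseline?". This is a minimal percentile-bootstrap sketch under stated assumptions: the input is one change-from-baseline score per participant, the interval is a simple percentile interval, and the two-sided p-value is twice the smaller share of resampled means falling on either side of zero. This summary does not specify the authors' exact choices (SPSS, for instance, also offers bias-corrected intervals), so the function name, seed, and p-value convention are illustrative assumptions, not the authors' code.

```python
import random
from statistics import mean


def bootstrap_change_ci(changes, n_resamples=10_000, alpha=0.05, seed=0):
    """Percentile bootstrap CI and two-sided p for the mean of change scores.

    changes: one (phase - baseline) value per participant.
    Returns (observed_mean, (ci_low, ci_high), p_value).
    """
    rng = random.Random(seed)                     # fixed seed for reproducibility
    n = len(changes)
    boot_means = sorted(
        mean(rng.choices(changes, k=n))           # resample participants with replacement
        for _ in range(n_resamples)
    )
    lo_idx = round(alpha / 2 * n_resamples)
    hi_idx = round((1 - alpha / 2) * n_resamples) - 1
    ci = (boot_means[lo_idx], boot_means[hi_idx])
    # Share of resampled means at or below / at or above zero (no change).
    below = sum(m <= 0 for m in boot_means) / n_resamples
    above = sum(m >= 0 for m in boot_means) / n_resamples
    p_value = min(1.0, 2 * min(below, above))
    return mean(changes), ci, p_value


# Hypothetical SCL changes (microsiemens) for six participants.
obs, ci, p = bootstrap_change_ci([0.8, 1.2, 1.5, 0.9, 2.0, 1.1])
print(f"mean={obs:.2f}, 95% CI=[{ci[0]:.2f}, {ci[1]:.2f}], p={p:.4f}")
```

When no resampled mean crosses zero, this convention returns p=0, which should be read as p < 1/10,000, the resolution limit of 10,000 resamples.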
What they found
Objective 1 - acceptability and presence. ITC-SOPI subscales indicated good presence: Spatial Presence M=3.60 (SD=0.70), Engagement M=3.87 (SD=0.78), Ecological Validity M=3.27 (SD=1.53; one participant rated this as 1/5, contributing to the larger SD), Negative Effects M=1.58 (SD=0.60). On the IWA, participants felt they were talking to a real person (M=8.43/10), did NOT feel they were talking to their own SLP (M=2.86/10; lower scores indicate that the SLP's control of the avatar was successfully masked), found the simulated scenarios relevant (M=9.29/10), and strongly wanted access to the tool to practice stressful situations (M=9.57/10). The two oldest participants (16 and 18 years) gave the highest “felt like talking to my SLP” scores (10 and 8, respectively); they recognized their SLP’s prosody and intonation despite the voice modulation. Open-ended responses showed participants wanted to practice with the tool more frequently (daily to twice weekly before in vivo presentations) and identified oral presentations to a virtual class as a desired future scenario. Two participants tested on the same day reported a lag between voice and lip movements, attributed to a weak Wi-Fi connection.
Objective 2 - physiological and subjective anxiety responses. The VR experimental group’s SCL was significantly elevated from baseline during both speech preparation (Session 1: 95% CI [0.404, 2.578], p=.006; Session 2: 95% CI [0.215, 0.935], p<.001) and conversation (Session 1: 95% CI [0.351, 2.142], p=.009; Session 2: 95% CI [0.859, 4.471], p=.008). The SLP role-play control group did NOT show significant SCL elevation from baseline in any phase of either session (all p > .05). Between-group SCL effects were generally small (d=0.06-0.41), with one medium effect for the Session 2 conversational task (d=0.80).
SCR detection rates (proportion of anxiety-provoking stimuli that elicited a skin conductance response) showed an unexpected pattern in Session 1: the control group exhibited significantly MORE SCRs than the experimental group, particularly for frowning (t=-3.79, p<.05) and across all stimuli combined (t=-3.76, p<.05). The authors suggest that frowning is a subtle non-verbal cue that may be harder to perceive accurately on a photorealistic avatar than on a real person. Within the experimental group, SCR detection rates were comparable between the in virtuo (Session 1) and in vivo (Session 2) conditions (t(4)=-1.07, p=.35), suggesting, within the limits of a five-participant paired comparison, that the avatar elicited stimulus-locked physiological responses comparable to those evoked by a real person for the same individuals.
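An SCR detection rate of this kind can be computed by checking, for each stimulus onset, whether a skin conductance response begins within a short latency window afterwards. The sketch below assumes SCR onsets have already been detected upstream (for example by a peak-detection step) and uses a 1-4 s window; windows of roughly 1-3 or 1-4 s are common in the electrodermal literature, but this summary does not state the authors' window, so treat it as an assumption. Because SLPs spaced stimuli at least 20 seconds apart, a window this short cannot credit one response to two stimuli. Function and variable names are hypothetical.

```python
def scr_detection_rate(stimulus_onsets_s, scr_onsets_s, window_s=(1.0, 4.0)):
    """Proportion of stimuli followed by an SCR onset within the latency window.

    stimulus_onsets_s: times (s) at which anxiety-provoking stimuli were presented.
    scr_onsets_s: times (s) at which detected SCRs began.
    window_s: (min, max) latency after a stimulus that counts as a response.
    """
    if not stimulus_onsets_s:
        raise ValueError("no stimuli to score")
    lo, hi = window_s
    hits = sum(
        any(lo <= scr - stim <= hi for scr in scr_onsets_s)
        for stim in stimulus_onsets_s
    )
    return hits / len(stimulus_onsets_s)


# Hypothetical conversation: frown at 30 s, yawn at 60 s, uncomfortable phrase at 95 s.
rate = scr_detection_rate([30.0, 60.0, 95.0], [32.1, 70.0, 97.5])
print(f"{rate:.2f}")  # two of three stimuli were followed by an SCR
```

Group comparisons such as the t-tests above would then be run on one such rate per participant.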
Heart rate and RMSSD produced an unexpected pattern across both groups. Rather than the predicted HR increase (and RMSSD decrease) with stress, both groups showed DECREASED HR from baseline across most phases (e.g., experimental Session 1 conversation HR 95% CI [-9.828, -2.325], p=.002; control Session 1 conversation HR 95% CI [-6.038, -2.029], p<.001) and INCREASED RMSSD during the experimental Session 1 conversation (95% CI [4.699, 12.030], p<.001). The authors interpret this as autonomic adaptation or habituation during social stress (Kreibig, 2010), noting that HR responses do not always follow the textbook pattern in social-anxiety contexts.
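To make the direction of these effects concrete, the sketch below computes mean heart rate and RMSSD from RR intervals and expresses each as change from the session's own baseline, as described under Measures. It uses the standard textbook definitions in plain Python rather than the authors' AcqKnowledge/NeuroKit2 pipeline, and the RR values and function names are invented for illustration. Longer RR intervals mean a lower heart rate, and larger beat-to-beat differences mean a higher RMSSD, which is the HR-down, RMSSD-up pattern reported above.

```python
import math
from statistics import mean


def heart_rate_bpm(rr_ms):
    """Mean heart rate (beats per minute) from RR intervals in milliseconds."""
    return 60_000 / mean(rr_ms)


def rmssd(rr_ms):
    """Root mean square of successive RR-interval differences (ms)."""
    if len(rr_ms) < 2:
        raise ValueError("need at least two RR intervals")
    diffs = [b - a for a, b in zip(rr_ms, rr_ms[1:])]
    return math.sqrt(mean(d * d for d in diffs))


def change_from_baseline(metric, phase_rr, baseline_rr):
    """Phase value minus the same session's baseline value for a given metric."""
    return metric(phase_rr) - metric(baseline_rr)


# Hypothetical RR intervals (ms) from one session.
baseline = [800, 810, 790, 805]
conversation = [850, 870, 840, 860]
print(round(change_from_baseline(heart_rate_bpm, conversation, baseline), 2))  # negative: HR fell
print(round(change_from_baseline(rmssd, conversation, baseline), 2))           # positive: RMSSD rose
```

Real recordings would use several minutes of artifact-corrected RR intervals per phase; four beats are used here only to keep the arithmetic visible.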
Self-reported SUDS showed a clear dissociation from physiological arousal. In Session 1, the VR group’s SUDS during the in virtuo conversation did NOT differ significantly from baseline (start 95% CI [-1.167, 2.833], p=.40; end 95% CI [-2.667, 1.833], p=1.0). The SLP role-play group’s SUDS was significantly elevated above baseline at both start (95% CI [2.00, 4.33], p<.001) and end (95% CI [2.16, 5.16], p<.001). The between-group effect at conversation end was large (d=1.35, 95% CI [-2.77, -0.67], p=.031); other Session 1 between-group comparisons showed large effect sizes (d=0.94-1.27) that did not reach statistical significance with this sample.
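The between-group effect sizes above are reported as Cohen's d. A minimal sketch of the common pooled-standard-deviation form follows; whether the authors used this exact variant (rather than, say, a small-sample corrected Hedges' g) is not stated in this summary, and the SUDS values and function name are hypothetical. Note that d is signed by subtraction order, so the same effect appears as positive or negative depending on which group is subtracted from which.

```python
import math
from statistics import mean, variance


def cohens_d(group_a, group_b):
    """Cohen's d = (mean_a - mean_b) / pooled SD, using sample variances."""
    na, nb = len(group_a), len(group_b)
    if na < 2 or nb < 2:
        raise ValueError("each group needs at least two observations")
    pooled_var = ((na - 1) * variance(group_a) + (nb - 1) * variance(group_b)) / (na + nb - 2)
    return (mean(group_a) - mean(group_b)) / math.sqrt(pooled_var)


# Hypothetical end-of-conversation SUDS changes (0-10 scale), six per group.
vr = [0, 1, -1, 0, 2, 1]
role_play = [3, 4, 2, 5, 3, 4]
print(round(cohens_d(vr, role_play), 2))  # negative because the VR mean is lower
```

By Cohen's conventional benchmarks, |d| of about 0.2, 0.5, and 0.8 are read as small, medium, and large effects.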
In Session 2 (in vivo for both groups), both groups showed elevated SUDS at conversation start (experimental p=.003; control p=.002) and returned to near-baseline by the end, with between-group effect sizes becoming small (d=0.25-0.30).
Objective 3 - added value of VR over SLP role-play. Self-efficacy showed no statistically significant between-group differences at any measurement time (Mann-Whitney U tests, all p>.58) and no statistically significant within-group changes from before the Session 1 conversation to after the Session 2 conversation (experimental Wilcoxon Z=-1.36, p=.17, r=.60; control Z=-0.94, p=.345, r=.39). The authors conclude that a single training session in either condition did not produce significant gains in self-efficacy or reduce anxiety during the subsequent in vivo conversation with the unfamiliar actor-teacher. The moderate within-group effect sizes (r=0.36-0.60) suggest larger or longer studies might be able to detect signals that this pilot was underpowered to demonstrate.
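The within-group statistics above pair a Wilcoxon signed-rank Z with an effect size r. A widely used convention is r = |Z| / sqrt(N), where N is the number of observations used (here taken as the non-zero paired differences; conventions vary). The sketch below computes Z with the large-sample normal approximation (average ranks for tied magnitudes, no tie or continuity correction in the variance) and then r. SPSS applies a tie correction and may report Z with the opposite sign, so this illustrates the technique under those assumptions rather than reproducing the paper's values; the self-efficacy differences and function names are hypothetical.

```python
import math


def wilcoxon_z(diffs):
    """Normal-approximation Z for the Wilcoxon signed-rank test.

    Zero differences are dropped and tied |differences| receive average ranks;
    the variance is NOT tie-corrected (a simplifying assumption).
    Returns (z, n_nonzero).
    """
    d = [x for x in diffs if x != 0]
    n = len(d)
    if n == 0:
        raise ValueError("all differences are zero")
    order = sorted(range(n), key=lambda i: abs(d[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(d[order[j + 1]]) == abs(d[order[i]]):
            j += 1
        avg_rank = (i + j) / 2 + 1                # mean of 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    w_plus = sum(r for r, x in zip(ranks, d) if x > 0)
    mu = n * (n + 1) / 4
    sigma = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    return (w_plus - mu) / sigma, n


def effect_size_r(z, n):
    """Effect size r = |Z| / sqrt(N)."""
    return abs(z) / math.sqrt(n)


# Hypothetical per-participant change in mean self-efficacy (Session 2 minus Session 1).
z, n = wilcoxon_z([0.3, 0.5, -0.1, 0.4, 0.2, 0.0])
print(f"Z={z:.2f}, r={effect_size_r(z, n):.2f}")
```

With only five or six pairs, the normal approximation is rough; exact signed-rank p-values are usually preferred at this sample size, and a sizeable r can coexist with a non-significant test, as in this pilot.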
Why this matters
This is the first study in stuttering to implement a real-time face-to-face VR environment with a photorealistic avatar whose verbal and non-verbal behavior is driven live by a clinician via facial motion capture, and to combine that with paired physiological and subjective measures. It extends the same research team’s prior virtual-classroom work (Moise-Richard et al. 2021) from group-audience scenarios into one-on-one conversation, and adds the physiological measurement that the prior study lacked.
The headline interpretive contribution is the dissociation between elevated physiological arousal (SCL) and unchanged subjective distress (SUDS) in the VR condition - a pattern consistent with Lang’s tripartite model of anxiety and with Brundage et al. (2016) in adults who stutter. If a future multi-session protocol confirms this dissociation reliably, the authors propose it could support VR as an early-stage entry point for avoidant adolescents who would otherwise refuse in vivo exposure: the body engages the fear-extinction-relevant arousal mechanism while the conscious experience of threat remains tolerable. The authors are explicit that this is an inference from the dissociation pattern, not a demonstrated treatment effect of this study.
Equally important is what the study did NOT show: a single VR training session was not superior to a single SLP role-play session in reducing anxiety or improving self-efficacy when participants subsequently faced an unfamiliar actor-teacher. The authors are clear that multi-session protocols within a full CBT framework are needed before any clinical recommendation about VR’s added value can be made.
For Therapy withVR specifically: this study did not use, evaluate, or compare against Therapy withVR. The system tested is a custom Unreal Engine 5.03 / MetaHuman application driven by SLP facial motion capture on a Meta Quest 2. Therapy withVR is a different platform with a different control model (clinician-adjustable environments, emotions, and audience behavior from a web application rather than facial embodiment of a single avatar). The Delangle paper is included in the Evidence Hub because it adds to the broader evidence base on immersive VR for stuttering anxiety in young people, not because it relates to Therapy withVR.
Limitations
The authors flag most of the following in their Discussion (Section 6); the conflict-of-interest item comes from the paper's Declaration of Competing Interest:
- Very small sample. N=12, 6 per group. Substantial individual variation in both perceived and physiological responses was observed. A larger sample is needed to characterize variability and identify response profiles.
- Single training session only. The authors are explicit that one session is insufficient to assess training effects on self-efficacy and anxiety; multi-session protocols are needed.
- Inconsistency of actor-teachers in Session 2. Two different actors played the in vivo teacher across the sample (matched on age range and body type, and each used with participants from both groups), but each actor naturally had different prosodic features and unique reactions.
- The same SLPs played both roles across participants. Each participant worked with their own SLP, and a given SLP might control the avatar for one participant (experimental group) and act as the in-person role-play partner for another (control group). The authors note this reflects natural clinical-setting variation but adds variability to the data.
- Recognition of SLP by older participants. The two oldest participants (16 and 18 years) recognized their SLP’s prosody and speaking style despite voice modulation, which may have influenced their emotional responses.
- Retrospective SUDS measurement. SUDS was administered after the conversation to avoid interrupting the task. Participants who naturally regulated their anxiety during the conversation may have underestimated peak anxiety retrospectively.
- No immersive-tendencies measure. The Immersive Tendencies Questionnaire (Witmer & Singer, 1998) was excluded to reduce cognitive load on younger participants. Individual differences in immersion propensity may account for variability in subjective presence and emotional responses.
- One-on-one vs group settings not compared. Some participants reported lower anxiety than expected in one-on-one settings; future work could compare single-avatar VR to virtual-group scenarios.
- No eye-tracking. Some children appeared to avoid eye contact with the virtual teacher; eye-tracking would help quantify avoidance behaviors.
- Unknown actor instead of real teacher. The in vivo Session 2 used an unfamiliar actor rather than each participant’s own teacher. This improved experimental control but reduced personal stakes (no real academic or social consequences for the participant’s responses), potentially contributing to lower perceived anxiety.
- Not a full CBT exposure framework. The procedure is more accurately a “training session” than formal exposure therapy. A full CBT-based exposure protocol with graded hierarchy, expectancy-violation framing, and post-exposure consolidation across multiple sessions was not implemented.
- COI to disclose. Co-author Stephane Bouchard is a consultant for and holds equity in Cliniques et Developpement In Virtuo, a VR-development company. The paper explicitly states that company did not create the environments used in this study, but the equity relationship is a relevant background factor when evaluating the paper’s interpretive framing about VR’s therapeutic potential.
Implications for practice
For clinicians considering immersive VR for adolescents who stutter, this pilot trial supports acceptability and feasibility of a real-time photorealistic-avatar VR system but provides NO evidence that one VR session reduces anxiety or improves self-efficacy more than one SLP role-play session before a real-world speaking task. The authors' interpretive proposal - that elevated physiological arousal combined with unchanged subjective distress could make VR a useful entry point for avoidant young people who would refuse in vivo exposure - is an inference from the dissociation pattern, not a demonstrated treatment effect of this study. The authors are explicit that VR should be used within a multi-session CBT framework alongside traditional approaches, not as a standalone single-session intervention.
Implications for research
Replication is needed in larger samples and across multiple training sessions before any claim of clinical added value of VR over SLP role-play can be made. Future studies should incorporate a validated immersive-tendencies measure (e.g., Witmer & Singer's ITQ), eye-tracking for avoidance behaviors, comparison of one-on-one with group/audience VR scenarios, and a full CBT-based exposure protocol with expectancy-violation framing and a graded session hierarchy. The authors also note that prosodic recognition by older adolescents (16-18 years) of their own SLP behind the avatar warrants further investigation.
Where this connects to Therapy withVR
The study above is independent research and does not endorse any product. The notes below are commentary from withVR on how the themes in this research relate to features of Therapy withVR. The research findings are not claims about Therapy withVR.
Real-time clinician-controlled avatar (different platform)
This study used a custom Unreal Engine 5.03 / MetaHuman system on a Meta Quest 2, where the SLP controlled the virtual teacher's facial expressions in real time via Live Link Face on an iPhone 11, with voice modulation through Clownfish Voice Changer. Therapy withVR uses a different control model: the clinician adjusts environmental parameters, avatar emotions, and audience behavior from a web application rather than facially embodying a single avatar via motion capture. Editorial parallel only - the studied tool is research software custom-built by the authors, not a commercial product.
Adjustable conversational difficulty
The Delangle study introduced anxiety-provoking stimuli (frowning, yawning, and scripted uncomfortable phrases such as 'That's not fair to the other students') during the conversation, with the SLP timing each stimulus based on participant reactions. Therapy withVR's clinician controls allow analogous real-time adjustments to avatar emotions and conversational dynamics within its own design. Editorial parallel only.
Multi-session flexibility
The Delangle authors explicitly note that one training session was insufficient to detect effects on self-efficacy or anxiety transfer, and recommend multi-session protocols within a full CBT framework. Therapy withVR's session profiles and saved configurations facilitate the kind of repeated, graded practice that multi-session research calls for. Editorial parallel only.
Cite this study
If you reference this study in your work, the canonical citation formats are:
@article{delangle2026,
author = {Delangle, M. and Moise-Richard, A. and Leclercq, A.-L. and Labbe, D. and Bouchard, S. and Andrews, S. and Menard, L.},
title = {Speaking face-to-face with a virtual avatar to reduce anxiety in students who stutter: Tool development and pilot study results},
journal = {Journal of Fluency Disorders},
year = {2026},
doi = {10.1016/j.jfludis.2026.106194},
url = {https://withvr.app/evidence/studies/delangle-2026}
}

TY  - JOUR
AU  - Delangle, M.
AU  - Moise-Richard, A.
AU  - Leclercq, A.-L.
AU  - Labbe, D.
AU  - Bouchard, S.
AU  - Andrews, S.
AU  - Menard, L.
TI  - Speaking face-to-face with a virtual avatar to reduce anxiety in students who stutter: Tool development and pilot study results
JO  - Journal of Fluency Disorders
PY  - 2026
DO  - 10.1016/j.jfludis.2026.106194
UR  - https://withvr.app/evidence/studies/delangle-2026
ER  -

Know of research that should be in this hub? If a relevant peer-reviewed study is not listed here, send the reference to hello@withvr.app. The hub is kept up to date as the literature grows.
Funding & independence
From the paper's own Declaration of Competing Interest: 'Stephane Bouchard is consultant for, and holds equity in Cliniques et Developpement In Virtuo, which develops virtual environments; however, Cliniques et Developpement In Virtuo did not create the virtual environments used in this study. None of the authors have any conflicts of interest to declare.' Bouchard's equity in a commercial VR-development company is a relevant background relationship that any reader should be aware of when evaluating the paper's interpretive framing about VR's therapeutic potential, even though the company is explicitly not the developer of the tool tested here. From the paper's Acknowledgements: 'This work was supported by the Fonds de recherche du Quebec (FRQ) through the AUDACE program (grant number 2022-AUDC-300126).' The VR tool was custom-developed by the research team using Unreal Engine 5.03 (Epic Games) with assets from Quixel Bridge and a MetaHuman avatar, running on a Meta Quest 2 connected to a laptop with Intel Core i7-12700H, 16 GB RAM, GeForce RTX 3070i; it is not a commercial product and is not Therapy withVR. No withVR BV involvement in funding, study design, or authorship. Summary prepared independently by withVR using the published paper.