What is ecological validity in VR speech therapy?

Ecological validity is the extent to which a research situation produces the behaviors and responses that would happen in the real-world situation it is meant to represent. A virtual cafe that triggers the same kinds of reactions a real cafe does has high ecological validity. Validity is not a single property of a VR environment; it depends on what you want to study or practice.

Do virtual audiences produce real-world speech responses?

Yes. Brundage and Hancock (2015) found that the primary speech measure during a virtual audience speech correlated at rho = 0.99 with the same measure during a live audience speech in adults who stutter. Bettahi et al. (2026) extended this to voice and physiology in 60 university students, finding that virtual audiences produced anticipatory anxiety, heart rate increases, and voice changes comparable to a real audience.

Can the visual environment alone change voice production?

Yes. Daşdöğen and Hitchcock (2026) found that visual distance cues in a virtual room significantly affected vocal intensity and pitch, even when the acoustic environment was held constant. Trained singers adjusted their voice more systematically than untrained speakers. The voice responds to the perceived speaking context, not only to the physical acoustics.

Does the current evidence support VR for everyday speech therapy practice?

The evidence supports VR-based speaking practice as a controlled, graded practice context that engages genuine communicative responses, not performative ones. The evidence base is still small (sample sizes mostly under 20, populations often narrow, long-term transfer largely untested), so VR is best treated as a tool in a kit rather than a standalone treatment. Building generalization into a plan and checking in about real-world experience is a sensible default.


Ecological Validity in VR Speech Therapy: What the Evidence Says

By Gareth Walkom · April 22, 2026 · 9 min read

A grid of twelve VR speaking environments used in clinical practice and research, including cafe, classroom, auditorium, and meeting room.

Key takeaways

Ecological validity is context-specific - a VR environment can be valid for one task (e.g., speaking anxiety) and not another (e.g., voice production).
Across five studies in stuttering, voice, and speaking-anxiety contexts, the three that directly compare conditions found that VR speaking environments evoke responses closely corresponding to those in matched real-world situations; the two small pilots speak to feasibility rather than correspondence.
The strongest evidence comes from Brundage & Hancock (2015) - rho = 0.99 between virtual and live audience stuttering frequency in adults who stutter.
Audience behavior (inattention, distraction, eye contact) matters more for evoking real responses than the size or visual realism of the virtual environment.
For clinicians, choose the VR scenario based on the specific behavior or response you want the client to practice - not based on visual fidelity alone.

“Does it feel real enough to matter?” is a reasonable question to ask of any virtual environment used in communication practice. If a virtual cafe does not evoke the reactions that a real cafe evokes, then practice in the virtual one is unlikely to transfer. If it does, it opens up a kind of practice space that is otherwise hard to arrange.

Over the last decade, a small but growing body of peer-reviewed research has tried to answer this question, not in the abstract but with measurements of anxiety, heart rate, voice acoustics, and other speech behaviors across matched real and virtual conditions. This post draws together what five of those studies tell us and what the evidence suggests for day-to-day practice.

The question

Ecological validity is the extent to which a research situation produces the behaviors and responses that would happen in the real-world situation it is meant to represent. A virtual cafe that looks plausible but triggers no anxiety at all has low ecological validity for studying speaking anxiety. A virtual cafe that triggers the same kinds of reactions a real cafe does has high ecological validity.

Validity is not a single property of a VR environment. It depends on what you want to study or practice. A VR setup might be ecologically valid for adult public speaking and not valid for child classroom participation, or valid for speaking anxiety and not for voice production, or valid for some people and not others.

What five studies show

The five studies at a glance

Ecological-validity evidence for VR speaking environments

2015 · n = 10 · within-subjects
Brundage & Hancock - virtual and live audiences produce nearly identical speech
Primary speech measure correlated at rho = 0.99 across virtual and live conditions. Communication apprehension and confidence ratings closely matched.

2026 · n = 60 · 3-condition
Bettahi et al. - virtual audiences trigger real anxiety and real voice changes
Anticipatory anxiety (SUDS), heart rate, and voice measures (F0, F0 variability) were comparable between real and virtual audience conditions. Higher reported presence = closer responses.

2026 · n = 8 · within-subjects
Daşdöğen & Hitchcock - virtual distance alone changes vocal behavior
Visual distance cues significantly affected vocal intensity and pitch even with acoustics held constant. Trained singers adjusted more systematically than untrained speakers.

2016 · n = 6 · pilot
Walkom - early prototype, honest pilot
Anxiety results were mixed across participants - some decreased, some unchanged, some increased. Physiological arousal appeared during exposure; observer noted speech-pattern shifts by session 2. Feasibility, not effect.

2024 · n = 5 · feasibility
Kumar, Cecil & Tetnowski - feasibility of at-home VR practice
Stuttering frequency dropped from 18.67% to 9.71% of syllables across a week; heart rate dropped too. No comparison condition - feasibility evidence, not causal effect.

Each study is summarized below. Sample sizes are small; the convergence across multiple measure types matters more than any single finding.

Brundage and Hancock, 2015: virtual and real audiences produce nearly identical stuttering responses

Brundage and Hancock (2015) had ten adults who stutter speak in both a live and a virtual audience condition. The primary speech measure showed a near-perfect correlation between virtual and live conditions (rho = 0.99). Communication apprehension and speaker confidence ratings were closely matched across conditions too.

This study is often cited as the foundational demonstration that virtual audiences are ecologically valid for studying stuttering under audience pressure. The sample is small, but the correlation is strong and the design matched individual participants across both conditions.
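To make the rho = 0.99 figure more concrete, here is a minimal sketch of how a Spearman rank correlation is computed. The scores are invented for illustration and are not data from Brundage and Hancock; the function names are hypothetical too. In this made-up sample of ten participants with no tied scores, swapping just two neighbouring ranks already gives rho of about 0.988, which suggests how closely participants' ordering has to match across conditions to reach a value near 0.99.

```python
from statistics import mean


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the window over any run of tied values.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        tied_rank = (start + end) / 2 + 1  # mean of 1-based positions in the run
        for k in range(start, end + 1):
            ranks[order[k]] = tied_rank
        start = end + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho: the Pearson correlation computed on ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


# HYPOTHETICAL per-participant scores (not data from any study).
# Participants 3 and 7 swap neighbouring ranks between conditions.
live = [2.1, 4.5, 7.9, 11.2, 3.3, 15.0, 6.4, 9.8, 1.2, 12.7]
virtual = [1.8, 4.9, 6.6, 10.5, 3.0, 13.2, 6.9, 9.1, 1.5, 12.9]

rho = spearman_rho(live, virtual)
print(f"rho = {rho:.3f}")  # prints "rho = 0.988"
```

Note that rho reflects rank order only: doubling every virtual score in the sketch leaves it unchanged. A high rho shows that participants kept their relative positions across conditions, not that the absolute values were the same.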

Bettahi and colleagues, 2026: virtual audiences trigger real anxiety and real voice changes

Bettahi et al. (2026) extended the validation question to voice and physiology. Sixty university students presented to a real audience, a virtual audience, and an empty virtual room. The virtual audience produced anticipatory anxiety (measured with SUDS) and heart rate increases that were comparable to the real audience. Voice measures (fundamental frequency and its variability) were largely equivalent across the real and virtual audience conditions.

A notable finding: participants who reported stronger feelings of presence in VR showed responses closest to their real-audience responses. Presence appears to be one of the variables that determines whether a given environment is ecologically valid for a given person.

Daşdöğen and Hitchcock, 2026: virtual distance alone changes how people use their voice

Daşdöğen and Hitchcock (2026) looked at a different question: whether visual properties of the virtual environment (room size, speaker-to-listener distance) would change vocal behavior even when the acoustic environment was held constant. Using the Rooms situation of Therapy withVR, they found that distance cues significantly affected vocal intensity and pitch. Trained singers adjusted their voice more systematically than untrained speakers.

This is a smaller study (eight adult females) but an important one conceptually. It shows that the visual virtual environment can drive vocal behavior on its own - the voice responds to the perceived speaking context, not only to the physical acoustics.

Walkom, 2016: early prototype, honest pilot

Walkom (2016) was an early-stage bachelor’s-dissertation pilot of a custom VR public-speaking tool with six adults who stutter. Self-reported anxiety results were mixed across participants - some showed decreases, some showed no change, and some showed increases. Physiological arousal appeared during exposure, and the observer noted visible shifts in speech patterns by Session 2. Six participants is not evidence of effect, and results were explicitly varied - but the pilot supported feasibility and raised useful questions for later work.

Kumar and colleagues, 2024: feasibility of at-home VR practice

Kumar, Cecil, and Tetnowski (2024) took the next step by moving VR out of the lab. Five adolescents and young adults who stutter used commercial VR headsets at home for a week with graded speaking scenarios. Stuttering frequency dropped from 18.67% to 9.71% of syllables, and heart rate dropped too. As with the Walkom pilot, five participants without a comparison condition is not evidence of effect, but the study demonstrates that at-home VR programs are feasible and worth testing at scale.
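For readers who want the arithmetic behind those percentages, the sketch below shows how a percent-syllables-stuttered figure is formed and how the reported drop translates into absolute and relative change. The two percentages come from the text above; the syllable counts and function names are hypothetical, chosen only so the example produces a similar starting value.

```python
def percent_syllables_stuttered(stuttered, total):
    """Stuttered syllables as a percentage of all syllables spoken."""
    if total <= 0:
        raise ValueError("total syllables must be positive")
    return 100.0 * stuttered / total


def change_summary(before_pct, after_pct):
    """Absolute change (percentage points) and relative change (%)."""
    absolute = before_pct - after_pct
    relative = 100.0 * absolute / before_pct
    return absolute, relative


# HYPOTHETICAL counts: 28 stuttered syllables in a 150-syllable sample.
example_rate = percent_syllables_stuttered(28, 150)  # about 18.67

# Figures as reported in the text above.
absolute, relative = change_summary(18.67, 9.71)
print(f"{absolute:.2f} percentage points, {relative:.1f}% relative reduction")
# prints "8.96 percentage points, 48.0% relative reduction"
```

A relative reduction of roughly 48% sounds large, but with five participants and no comparison condition it describes what happened in this group rather than what the VR practice caused.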

A forest-style view of the convergence

Figure: Forest plot of convergence statistics from the three direct-comparison studies, showing how closely VR responses tracked real-world equivalents, by reported correlation and effect.

The three direct-comparison studies converge. Brundage and Hancock show near-perfect rank-order correspondence between VR and live audiences on the primary speech measure, with closely matched apprehension ratings. Bettahi et al. show small-to-moderate effects of condition on most voice measures (i.e., real and VR audiences produced similar responses). Daşdöğen and Hitchcock show that visual cues alone significantly shift voice production even with acoustics held constant. Walkom 2016 (n = 6, mixed pilot) and Kumar et al. 2024 (n = 5, home feasibility) are not plotted here because they do not test virtual-versus-real audience correspondence directly.

Sources: Brundage & Hancock 2015 (American Journal of Speech-Language Pathology); Bettahi et al. 2026 (Frontiers in Virtual Reality); Daşdöğen & Hitchcock 2026 (Journal of Voice). Lower partial η² in Bettahi means the VR and real conditions produced more similar responses; for the disfluency measure, the condition effect was non-significant after Bonferroni correction (i.e., comparable performance across conditions). Daşdöğen's significant F-values for listener-distance show that visual distance cues alone reliably shift vocal intensity and pitch. Note: %SS-style frequency counts are reported here as the original studies measured them; the field is increasingly moving toward self-rated confidence, willingness-to-communicate, and participation-oriented measures.
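As a rough guide to the two statistics mentioned in that note, the sketch below uses the standard definitions: partial η² is the effect sum of squares divided by the effect plus error sums of squares, and a Bonferroni correction divides the significance threshold by the number of tests. All numbers, including the number of tests, are hypothetical and are not taken from Bettahi et al. or any other study.

```python
def partial_eta_squared(ss_effect, ss_error):
    """Share of effect-plus-error variance attributable to the effect."""
    return ss_effect / (ss_effect + ss_error)


def bonferroni_alpha(alpha, n_tests):
    """Per-test significance threshold after Bonferroni correction."""
    return alpha / n_tests


# HYPOTHETICAL sums of squares, test count, and p-value.
eta_sq = partial_eta_squared(ss_effect=4.0, ss_error=96.0)  # 0.04
threshold = bonferroni_alpha(alpha=0.05, n_tests=5)         # 0.01

p_value = 0.03
significant_uncorrected = p_value < 0.05      # True
significant_corrected = p_value < threshold   # False
```

In this made-up case the condition accounts for 4% of the relevant variance, and a p-value of 0.03 passes the uncorrected threshold but not the corrected one. That is the pattern the note describes for the disfluency measure: a condition effect that does not survive correction for multiple comparisons.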

What the evidence suggests

Drawing across these five studies, and the wider Evidence Hub they sit within, several patterns emerge.

Well-designed virtual audiences produce responses that resemble real-audience responses. Brundage and Hancock (2015) and Bettahi et al. (2026) both showed this, using different outcome measures (heart rate, voice, anxiety, and behavioral observation). The convergence across measures is more compelling than any single finding.

Presence matters, and it varies between people. Presence is the subjective feeling of being there inside a virtual environment. Higher presence is associated with responses closer to real-world equivalents. This suggests that ecological validity is partly a property of the person using the environment, not only of the environment itself.

Visual context alone can shape vocal and communicative behavior. The Daşdöğen and Hitchcock study shows that people adjust their voice based on the perceived virtual context, even when the acoustics are held constant. This matters for voice work and for any question about how speakers calibrate their output to audiences.

The evidence base is still small. Sample sizes are mostly under twenty. Populations are often non-clinical or narrow. Long-term transfer to everyday situations is largely untested. These are real limitations that should shape how confidently any finding is applied.

What this means for day-to-day practice

A few tentative takeaways for speech-language professionals considering VR practice as part of their work:

VR-based speaking practice appears to engage genuine communicative responses, not just performative ones. A session in a virtual audience is closer to a practice audience than to a role-play.
Individual differences in presence are worth attending to. If a person does not feel immersed, the environment is probably not doing the ecological work you want it to do. Checking in about presence is cheap and informative.
Generalization is not guaranteed. What happens in a virtual cafe happens in a virtual cafe. Whether it carries into a real cafe depends on factors the studies above do not fully answer yet. Building generalization into a plan (practicing the same skill across several situations, checking in about real-world experience) is a sensible default.
VR does not replace other forms of practice. It adds a controlled, graded practice context to the options available. The evidence supports it as a tool in a kit, not as a standalone treatment.

Editorial notes from withVR

The themes in this research shaped the design of Therapy withVR. The Auditorium situation exists because of work like Brundage and Hancock and Bettahi. The Room situation exists because of studies like Daşdöğen and Hitchcock. The Goal feature exists to support the generalization question - letting people rate their own confidence before and after a session, rather than relying on production targets.

None of that means research findings from studies on other VR systems transfer directly to Therapy withVR. They do not. What Therapy withVR tries to do is provide a practice environment consistent with the themes the evidence raises: graded situations, real-time clinician control, self-rated confidence over time, and environments people report feeling present in.

