Who participated in this study?

This study involved 31 vocally healthy adults.

Within-subjects study (n=31): auditory, visual, and audiovisual VR room cues all shift vocal loudness, effort, and output

Daşdöğen Ü et al. · 2023 · Journal of Voice · Experimental · n = 31 · Vocally healthy adults · DOI: 10.1016/j.jvoice.2023.07.026

Evidence certainty: Moderate

How this was rated

Within-subjects design with 31 vocally healthy adults across 18 carefully crossed conditions (auditory-only, visual-only, audiovisual, ± background noise) - a strong factorial design for hypothesis-testing. Peer-reviewed in Journal of Voice (Elsevier, established peer-reviewed voice journal). Self-reported and objective acoustic measures combined. Limitations: vocally healthy adults only (does not test clinical voice populations); single VR system (research-grade); the 18-condition design optimizes mechanism-testing over clinical-protocol validation. The findings support the realism-and-validity construct for voice-in-VR but do not directly establish therapeutic efficacy in voice patients - that requires follow-on work in clinical populations.

Ratings use a simplified four-tier scheme (High, Moderate, Low, Very Low) informed by the GRADE working group.

Thirty-one vocally healthy men and women were tested under 18 sensory-input conditions in immersive virtual reality - two auditory rooms with different reverberation times, two visual rooms with different volumes, and audiovisual combinations - each with and without background noise. Speakers performed counting, sustained vowels, an all-voiced CAPE-V sentence, and a Rainbow Passage sentence. Self-perceived vocal loudness and effort INCREASED, and self-perceived vocal comfort DECREASED, as room volume, speaker-to-listener distance, audiovisual richness, and background noise increased. Sound pressure level (SPL) and spectral moments (mean, SD, skewness, kurtosis) showed concomitant changes. Visual and audiovisual input - not just auditory - measurably shaped voice production.

Clinical bottom line

A controlled within-subjects experiment in 31 vocally healthy adults showed that visual and audiovisual room cues in immersive VR - not just acoustic cues - measurably change self-perceived vocal loudness, effort, and comfort, and also change acoustic output (SPL and spectral moments). This is foundational realism-and-validity evidence for using immersive VR in voice therapy: it establishes that the immersive visual context can drive vocal adaptations beyond what acoustic simulation alone produces. Clinicians using or considering immersive VR for voice work should expect the visual environment to be a meaningful therapeutic variable, not a backdrop.

Key findings

31 vocally healthy adults (men and women) tested under 18 sensory-input conditions in immersive VR: 2 auditory rooms (varying reverberation time), 2 visual rooms (varying volume), and audiovisual combinations of the two, each with and without background noise
Self-perceived VOCAL LOUDNESS increased as room volume, speaker-to-listener distance, audiovisual richness, and background noise increased
Self-perceived VOCAL EFFORT increased under the same conditions
Self-perceived VOCAL COMFORT decreased - the inverse pattern, consistent with effort-comfort tradeoff
Objective acoustic outputs (sound pressure level [SPL] and spectral moments - mean, SD, skewness, kurtosis) changed in line with the self-reports - speakers automatically adjusted their voice to the perceived room
Visual and audiovisual input - not just auditory cues - measurably shaped voice production. This is the first immersive-VR evidence that the visual environment is a meaningful therapeutic variable in its own right, not just a backdrop for acoustic simulation
Speech tasks spanned counting, sustained vowel phonation, an all-voiced CAPE-V sentence, and the first sentence of the Rainbow Passage - covering both phonation and connected speech

Background

Voice therapy traditionally happens in a clinic room - quiet, acoustically dry, with no visible audience. The voice the client produces in that room is often very different from the voice they produce in the real-world settings where their voice problem actually matters (large rooms, noisy backgrounds, social or performance audiences). Acoustic simulation alone (reverberation, background noise) partly addresses this, but immersive VR offers something acoustic simulation cannot: a synchronised VISUAL environment that the client can see, including room size, perceived listener distance, and ambient context.

Whether the visual environment actually drives measurable vocal adaptations beyond the acoustic environment had not been systematically tested in immersive VR.

What the researchers did

31 vocally healthy adults (men and women) were tested under 18 sensory-input conditions in immersive VR. The 18 conditions were created by crossing:

2 auditory rooms with different reverberation times
2 visual rooms with different volumes
audiovisual combinations of the two
each with and without background noise

Each participant completed all 18 conditions, performing four speech tasks per condition: counting, sustained vowel phonation, an all-voiced CAPE-V sentence, and the first sentence of the Rainbow Passage.

Outcomes were self-perceived vocal loudness, effort, and comfort (each rated 0-100); plus objective acoustic measures - sound pressure level (SPL in dB) and spectral moments (spectral mean and SD in Hz, skewness, kurtosis).
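For readers unfamiliar with the acoustic measures named above, the following is a minimal sketch of how SPL and the four spectral moments are commonly defined. It is not the authors' analysis pipeline. The function names `spl_db` and `spectral_moments` are hypothetical, the pressure samples are assumed to be calibrated in pascals, the moments are assumed to be weighted by the power spectrum, and kurtosis is reported as excess kurtosis (raw kurtosis minus 3); the paper may use different conventions.

```python
import math


def spl_db(pressure_pa, ref_pa=20e-6):
    """Sound pressure level in dB re 20 µPa.

    Assumes `pressure_pa` holds calibrated pressure samples in pascals.
    """
    rms = math.sqrt(sum(p * p for p in pressure_pa) / len(pressure_pa))
    return 20.0 * math.log10(rms / ref_pa)


def spectral_moments(freqs_hz, power):
    """Return (mean, SD, skewness, excess kurtosis) of a spectrum.

    The spectrum is treated as a distribution over frequency,
    with each bin weighted by its share of the total power.
    """
    total = sum(power)
    weights = [p / total for p in power]
    mean = sum(f * w for f, w in zip(freqs_hz, weights))

    def central(k):
        # k-th central moment of the power-weighted frequency distribution
        return sum(((f - mean) ** k) * w for f, w in zip(freqs_hz, weights))

    sd = math.sqrt(central(2))
    skewness = central(3) / sd ** 3
    kurtosis = central(4) / sd ** 4 - 3.0
    return mean, sd, skewness, kurtosis


# Toy spectrum: equal power at 1000 Hz and 3000 Hz.
print(spectral_moments([1000.0, 3000.0], [1.0, 1.0]))
# (2000.0, 1000.0, 0.0, -2.0)
```

In this toy case the spectral mean and SD come out in Hz while skewness and kurtosis are unitless, which matches the units listed for the study's outcomes. Broadly, a higher spectral mean indicates relatively more energy at higher frequencies.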

What they found

Self-perceived vocal loudness and effort INCREASED as room volume, speaker-to-listener distance, audiovisual richness, and background noise increased.
Self-perceived vocal comfort DECREASED under the same conditions - the inverse pattern, consistent with the effort-comfort tradeoff.
Objective SPL and spectral moments changed concomitantly - speakers automatically adjusted their voice to match the perceived room.
Visual and audiovisual input - not just auditory cues - measurably shaped voice production. This is the central new finding: the visual environment in immersive VR is a meaningful therapeutic variable.

Why this matters

For voice researchers and clinicians exploring immersive VR, this study establishes that the immersive visual context drives measurable changes in vocal output and self-perceived voice - BEYOND what acoustic-only simulation can achieve. The choice of scenario in a VR environment (small cafe vs. large auditorium vs. noisy classroom) can affect vocal adaptations. The study is foundational evidence for voice-in-VR work that has since proliferated (e.g., Leyns 2025 RCT for gender-affirming voice training, Hoff 2026 voice meditation, Daşdöğen 2026 follow-on).

Limitations

Vocally healthy adults only - clinical efficacy in voice-disordered populations is not directly tested.
Single VR system - generalization to consumer hardware (Meta Quest) requires replication.
18-condition factorial optimizes mechanism-testing over clinical-protocol validation; a clinically-grounded VR voice-therapy protocol is a separate translational step.
No clinical outcomes (vocal endurance, voice handicap, patient-reported outcomes) - this is a controlled experimental study of voice perception and production mechanics.
Sample size n=31 is adequate for within-subjects effects but modest for subgroup analyses (e.g., by sex, by speaking style, by baseline vocal habits).

Implications for practice

For voice clinicians considering immersive VR as a therapy tool: the immersive visual context drives measurable changes in vocal output and self-perceived voice, BEYOND what acoustic-only simulation can achieve. This is foundational evidence that immersive VR has a unique affordance for voice therapy (e.g., training projection to realistic distances, voice-in-noise habituation, ecologically valid environmental cueing for behavioral voice goals). Clinicians using Therapy withVR or similar products for voice work should treat the choice of scenario (cafe vs. auditorium vs. classroom) as a therapeutic decision, not a cosmetic one. The study is in vocally healthy adults, so clinical efficacy in voice-disordered populations still needs direct testing. The same research team (Daşdöğen and colleagues) published a 2026 Journal of Voice paper extending this work; see dasdogen-2026 in this Hub.

Cite this study

If you reference this study in your work, the canonical citation formats are:

APA 7th

Daşdöğen, Ü., Awan, S. N., Bottalico, P., Iglesias, A., Getchell, N., & Verdolini Abbott, K. (2023). The influence of multisensory input on voice perception and production using immersive virtual reality. Journal of Voice. https://doi.org/10.1016/j.jvoice.2023.07.026

AMA 11th

Daşdöğen Ü, Awan SN, Bottalico P, Iglesias A, Getchell N, Verdolini Abbott K. The influence of multisensory input on voice perception and production using immersive virtual reality. J Voice. 2023. doi:10.1016/j.jvoice.2023.07.026

BibTeX

@article{dasdogen2023,
  author = {Daşdöğen, Ü. and Awan, S. N. and Bottalico, P. and Iglesias, A. and Getchell, N. and Verdolini Abbott, K.},
  title = {The Influence of Multisensory Input on Voice Perception and Production Using Immersive Virtual Reality},
  journal = {Journal of Voice},
  year = {2023},
  doi = {10.1016/j.jvoice.2023.07.026},
  url = {https://withvr.app/evidence/studies/dasdogen-2023}
}

RIS

TY  - JOUR
AU  - Daşdöğen, Ü.
AU  - Awan, S. N.
AU  - Bottalico, P.
AU  - Iglesias, A.
AU  - Getchell, N.
AU  - Verdolini Abbott, K.
TI  - The Influence of Multisensory Input on Voice Perception and Production Using Immersive Virtual Reality
JO  - Journal of Voice
PY  - 2023
DO  - 10.1016/j.jvoice.2023.07.026
UR  - https://withvr.app/evidence/studies/dasdogen-2023
ER  -

Funding & independence

Affiliations: New York University, Orlando, Champaign IL, Newark DE. Funding details and conflict-of-interest disclosures were not extracted from the abstract excerpt available for this summary. Open or paywalled status: Journal of Voice (Elsevier). No withVR BV involvement in funding, study design, or authorship. Summary prepared independently by withVR using the published peer-reviewed paper. The immersive VR system used was a research-grade custom configuration, NOT Therapy withVR or Research withVR.

Last reviewed: 2026-05-17 · Next review due: 2027-05-17 · Reviewed by: Gareth Walkom

Related Studies

External attentional focus in VR promotes more flexible speech movement in adults who stutter

In VR, how far away the listener appears drives vocal loudness more than room size does

Virtual room size and listener distance influence how people use their voice

VR-based meditation reduced anxiety before voice therapy in a small exploratory RCT, with lower attrition in the VR arm

VR-based speaking practice increases willingness to communicate in gender-affirming voice training

VR voice therapy with clinician feedback drew out teaching prosody in trainee teachers, but raised vocal discomfort

A VR classroom successfully brings out how teachers really use their voice when teaching
