Within-subjects study in 31 vocally healthy adults: auditory, visual, and audiovisual room cues in immersive VR all measurably change self-perceived vocal loudness, effort, comfort, and acoustic output
How this was rated
Within-subjects design with 31 vocally healthy adults across 18 crossed conditions (auditory-only, visual-only, and audiovisual, each with and without background noise) - a strong factorial design for hypothesis-testing. Published in Journal of Voice (Elsevier), an established peer-reviewed voice journal. Self-reported measures are combined with objective acoustic measures. Limitations: vocally healthy adults only (clinical voice populations were not tested); a single, research-grade VR system; and an 18-condition design that optimises mechanism-testing over clinical-protocol validation. The findings support the realism-and-validity construct for voice-in-VR but do not directly establish therapeutic efficacy in voice patients - that requires follow-on work in clinical populations.
Ratings use a simplified four-tier scheme (High, Moderate, Low, Very Low) informed by the GRADE working group.
Thirty-one vocally healthy men and women were tested under 18 sensory-input conditions in immersive virtual reality - two auditory rooms with different reverberation times, two visual rooms with different volumes, and audiovisual combinations - each with and without background noise. Speakers performed counting, sustained vowels, an all-voiced CAPE-V sentence, and a Rainbow Passage sentence. Self-perceived vocal loudness and effort INCREASED, and self-perceived vocal comfort DECREASED, as room volume, speaker-to-listener distance, audiovisual richness, and background noise increased. Sound pressure level (SPL) and spectral moments (mean, SD, skewness, kurtosis) showed concomitant changes. Visual and audiovisual input - not just auditory - measurably shaped voice production.
A controlled within-subjects experimental study in 31 vocally healthy adults showing that visual and audiovisual room cues in immersive VR - not just acoustic cues - measurably change self-perceived vocal loudness, effort, and comfort, AND change acoustic output (SPL and spectral moments). This is foundational realism-and-validity evidence for using immersive VR in voice therapy: it establishes that the immersive visual context can drive vocal adaptations beyond what acoustic simulation alone produces. Clinicians using or considering immersive VR for voice work should expect the visual environment to be a meaningful therapeutic variable, not a backdrop.
Key findings
- 31 vocally healthy adults (men and women) tested under 18 sensory-input conditions in immersive VR: 2 auditory rooms (different reverberation times), 2 visual rooms (different volumes), and audiovisual combinations of the two, each with and without background noise
- Self-perceived VOCAL LOUDNESS increased as room volume, speaker-to-listener distance, audiovisual richness, and background noise increased
- Self-perceived VOCAL EFFORT increased under the same conditions
- Self-perceived VOCAL COMFORT decreased - the inverse pattern, consistent with an effort-comfort tradeoff
- Objective acoustic outputs (sound pressure level [SPL] and spectral moments - mean, SD, skewness, kurtosis) changed in line with the self-reports - speakers automatically adjusted their voice to the perceived room
- Visual and audiovisual input - not just auditory cues - measurably shaped voice production. This is presented as the first immersive-VR evidence that the visual environment is a meaningful variable in its own right, not just a backdrop for acoustic simulation; its therapeutic relevance in voice patients remains untested
- Speech tasks spanned counting, sustained vowel phonation, an all-voiced CAPE-V sentence, and the first sentence of the Rainbow Passage - covering both phonation and connected speech
Background
Voice therapy traditionally happens in a clinic room - quiet, acoustically dry, with no visible audience. The voice the client produces in that room is often very different from the voice they produce in the real-world settings where their voice problem actually matters (large rooms, noisy backgrounds, social or performance audiences). Acoustic simulation alone (reverberation, background noise) partly addresses this, but immersive VR offers something acoustic simulation cannot: a synchronised VISUAL environment that the client can see, including room size, perceived listener distance, and ambient context.
Whether the visual environment actually drives measurable vocal adaptations beyond the acoustic environment had not been systematically tested in immersive VR.
What the researchers did
31 vocally healthy adults (men and women) were tested under 18 sensory-input conditions in immersive VR. The 18 conditions were created by crossing:
- 2 auditory rooms with different reverberation times
- 2 visual rooms with different volumes
- audiovisual combinations of the two
- each with and without background noise
Each participant completed all 18 conditions, performing four speech tasks per condition: counting, sustained vowel phonation, an all-voiced CAPE-V sentence, and the first sentence of the Rainbow Passage.
Outcomes were self-perceived vocal loudness, effort, and comfort (each rated 0-100), plus objective acoustic measures: sound pressure level (SPL, in dB) and spectral moments (spectral mean and SD in Hz, plus skewness and kurtosis, which are dimensionless).
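For readers unfamiliar with the acoustic measures, the following is a minimal Python sketch of how SPL and the four spectral moments are commonly computed. It is not the authors' analysis pipeline, which this summary does not describe. The function names (spl_db, spectral_moments) are hypothetical, and the sketch assumes the signal is already calibrated in pascals, the moments are weighted by the power spectrum of the whole signal, and kurtosis is reported in its non-excess form.

```python
import numpy as np

P_REF = 20e-6  # standard SPL reference pressure in air: 20 micropascals


def spl_db(pressure_pa):
    """SPL in dB re 20 uPa. Assumes samples are calibrated in pascals."""
    rms = np.sqrt(np.mean(np.square(pressure_pa)))
    return 20.0 * np.log10(rms / P_REF)


def spectral_moments(signal, sample_rate):
    """Return (mean_hz, sd_hz, skewness, kurtosis) of the power spectrum.

    The power spectrum is normalised to sum to 1 and treated as a
    distribution over frequency (an assumption; magnitude weighting
    is another common choice and gives different values).
    """
    power = np.abs(np.fft.rfft(signal)) ** 2
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / sample_rate)
    weights = power / power.sum()

    mean = np.sum(freqs * weights)            # first moment, in Hz
    centered = freqs - mean
    variance = np.sum(centered ** 2 * weights)
    sd = np.sqrt(variance)                    # second central moment, in Hz
    skewness = np.sum(centered ** 3 * weights) / sd ** 3
    kurtosis = np.sum(centered ** 4 * weights) / variance ** 2  # non-excess
    return mean, sd, skewness, kurtosis
```

A higher spectral mean indicates relatively more energy at high frequencies, which often accompanies louder or more effortful voicing. Absolute values depend on the weighting and windowing choices, so moments are only comparable between analyses that use the same settings.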
What they found
- Self-perceived vocal loudness and effort INCREASED as room volume, speaker-to-listener distance, audiovisual richness, and background noise increased.
- Self-perceived vocal comfort DECREASED under the same conditions - the inverse pattern, consistent with the effort-comfort tradeoff.
- Objective SPL and spectral moments changed concomitantly - speakers automatically adjusted their voice to match the perceived room.
- Visual and audiovisual input - not just auditory cues - measurably shaped voice production. This is the central new finding: the visual environment in immersive VR is a meaningful therapeutic variable.
Why this matters
For voice clinicians considering immersive VR, this study establishes that the immersive visual context drives measurable changes in vocal output and self-perceived voice - BEYOND what acoustic-only simulation can achieve. Clinically, this means the choice of scenario in a VR voice-therapy session (small cafe vs. large auditorium vs. noisy classroom) is a therapeutic decision affecting expected vocal adaptations. The study is foundational evidence for voice-in-VR work that has since proliferated (e.g., Leyns 2025 RCT for gender-affirming voice training, Hoff 2026 voice meditation, Daşdöğen 2026 follow-on).
Limitations
- Vocally healthy adults only - clinical efficacy in voice-disordered populations is not directly tested.
- Single VR system - generalization to consumer hardware (Meta Quest) requires replication.
- 18-condition factorial optimises mechanism-testing over clinical-protocol validation; a clinically-grounded VR voice-therapy protocol is a separate translational step.
- No clinical outcomes (vocal endurance, voice handicap, patient-reported outcomes) - this is a controlled experimental study of voice perception and production mechanics.
- Sample size n=31 is adequate for within-subjects effects but modest for subgroup analyses (e.g., by sex, by speaking style, by baseline vocal habits).
Implications for practice
For voice clinicians considering immersive VR as a therapy tool: the immersive visual context drives measurable changes in vocal output and self-perceived voice, BEYOND what acoustic-only simulation can achieve. This is foundational evidence that immersive VR has a unique affordance for voice therapy (e.g., training projection to realistic distances, voice-in-noise habituation, ecologically valid environmental cueing for behavioral voice goals). Clinicians using Therapy withVR or similar products for voice work should treat the choice of scenario (cafe vs. auditorium vs. classroom) as a therapeutic decision, not a cosmetic one. The study is in vocally healthy adults, so clinical efficacy in voice-disordered populations still needs direct testing. The same research team (Daşdöğen and colleagues) published a 2026 Journal of Voice paper extending this work; see dasdogen-2026 in this Hub.
Cite this study
If you reference this study in your work, the canonical citation formats are:
BibTeX

@article{dasdogen2023,
  author = {Daşdöğen, Ü. and Awan, S. N. and Bottalico, P. and Iglesias, A. and Getchell, N. and Verdolini Abbott, K.},
  title = {The Influence of Multisensory Input on Voice Perception and Production Using Immersive Virtual Reality},
  journal = {Journal of Voice},
  year = {2023},
  doi = {10.1016/j.jvoice.2023.07.026},
  url = {https://withvr.app/evidence/studies/dasdogen-2023}
}

RIS

TY  - JOUR
AU  - Daşdöğen, Ü.
AU  - Awan, S. N.
AU  - Bottalico, P.
AU  - Iglesias, A.
AU  - Getchell, N.
AU  - Verdolini Abbott, K.
TI  - The Influence of Multisensory Input on Voice Perception and Production Using Immersive Virtual Reality
JO  - Journal of Voice
PY  - 2023
DO  - 10.1016/j.jvoice.2023.07.026
UR  - https://withvr.app/evidence/studies/dasdogen-2023
ER  -

Know of research that should be in this hub? If a relevant peer-reviewed study is not listed here, send the reference to hello@withvr.app. The hub is kept up to date as the literature grows.
Funding & independence
Affiliations: New York University, Orlando, Champaign IL, Newark DE. Funding details and conflict-of-interest disclosures were not available in the abstract excerpt used for this summary. Open or paywalled status: Journal of Voice (Elsevier). No withVR BV involvement in funding, study design, or authorship. Summary prepared independently by withVR using the published peer-reviewed paper. The immersive VR system used was a research-grade custom configuration, NOT Therapy withVR or Research withVR.