In VR, how far away the listener appears drives vocal loudness more than room size does
How this was rated
This certainty rating reflects confidence in the evidence for clinical or therapeutic use - not the robustness of the underlying finding, which is strong and replicates the team's earlier work (Daşdöğen et al., 2023). The study is published in the peer-reviewed Journal of Speech, Language, and Hearing Research (ASHA), is IRB-approved, and is NIH/NIDCD-funded (R21-DC020494), with voice scientist Katherine Verdolini-Abbott as multiple principal investigator. Design strengths: auditory feedback was held constant across all conditions so visual-spatial input was isolated as the cause; the mechanistic question (room size versus speaker-to-listener distance) was separated cleanly, unlike the team's earlier work that varied them together; three speech tasks were used; SPL was externally calibrated; and the analysis uses a defensible linear mixed-model framework with Tukey-Kramer post-hoc contrasts. What keeps certainty low for clinical application: small total sample (N = 15) with an unbalanced sex ratio (12 female, 3 male); vocally healthy adults only, so no evidence yet in voice-disordered populations; a single session; an acoustic surrogate outcome (SPL) rather than a clinical voice outcome; simulated environments that do not capture real-world acoustics or live social dynamics; no self-reported vocal effort or perceptual outcomes; and self-reported (not objectively screened) hearing. The study robustly establishes the visual-spatial mechanism; therapeutic efficacy and real-world transfer are separate questions that require larger, multisession studies in clinical populations with control comparators.
Ratings use a simplified four-tier scheme (High, Moderate, Low, Very Low) informed by the GRADE working group.
Using the Room situation in Therapy withVR with the sound kept constant, 15 vocally healthy adults spoke across virtual conditions that varied room size and speaker-to-listener distance. Distance was the main driver of vocal intensity - the farther away the virtual listener appeared, the louder people spoke - while room size acted as a moderator that strengthened the distance effect, especially at the farthest distances. Because only the visuals changed, the study shows that visual-spatial cues alone can scale the voice.
An NIH-funded, peer-reviewed experimental study that isolates the visual drivers of vocal intensity in immersive VR. Across three speech tasks (a sustained vowel, a standard phrase, and spontaneous speech), 15 vocally healthy adults raised their loudness as the virtual listener moved farther away, with room size moderating the effect - strongest in the large room at the far (15 m) distance. Keeping the sound identical across conditions isolated vision as the cause. The statistics are robust (linear mixed models; all main effects and the Room Size x Distance interaction at p < .001). It is limited by a small sample (N = 15), an unbalanced sex ratio (12 female, 3 male), vocally healthy adults only, and a single session - so it establishes the mechanism, not clinical efficacy. The study used the Therapy withVR Room situation, and withVR's founder built the custom environments used in it.
Key findings
- Across all three speech tasks, speaker-to-listener distance was the primary driver of vocal intensity (SPL): the farther the virtual listener appeared, the louder participants spoke (listener-distance main effect p < .0001 for every task)
- Room size had a smaller, secondary effect and acted as a moderator - a significant Room Size x Distance interaction (p < .0001 across tasks) showed the distance effect was strongest in the large room at the far (15 m) listener distance
- For the sustained vowel, the far-distance large-room condition was about 4.4 dB louder than the large room alone, and moving the listener to 15 m in the large room raised intensity by about 2.9 dB
- The distance effect was larger for the sustained vowel and the read phrase (task types that naturally elicit louder, steadier voicing); in spontaneous speech the overall distance effect still held, but the specific 3 m vs 15 m post-hoc contrast did not reach significance
- Auditory feedback was held identical across all conditions, so the vocal changes were driven by visual-spatial input alone, not by any change in what participants heard
- Robust analysis (linear mixed models, Tukey-Kramer post-hoc): for spontaneous speech, listener distance F(3, 112) = 84.6, room size F(2, 112) = 51.6, and the interaction F(3, 112) = 53.0, all p < .0001
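To put the decibel differences in the key findings (about 2.9 dB and 4.4 dB) in physical terms, the following is a minimal sketch that converts a level difference into a sound-pressure ratio and an acoustic-intensity ratio using the standard decibel definitions. The function names are illustrative and not taken from the study; only the two dB values come from the findings above.

```python
def db_to_pressure_ratio(delta_db: float) -> float:
    """Sound-pressure ratio for a level difference in dB (SPL is 20 * log10 of the pressure ratio)."""
    return 10 ** (delta_db / 20)


def db_to_intensity_ratio(delta_db: float) -> float:
    """Acoustic-intensity (power) ratio for the same difference (10 * log10 of the intensity ratio)."""
    return 10 ** (delta_db / 10)


# The two differences reported for the sustained vowel.
for delta in (2.9, 4.4):
    print(
        f"{delta} dB -> pressure x{db_to_pressure_ratio(delta):.2f}, "
        f"intensity x{db_to_intensity_ratio(delta):.2f}"
    )
```

On these definitions, a 4.4 dB increase corresponds to roughly 1.66 times the sound pressure, and a 2.9 dB increase to roughly 1.40 times. This says nothing about perceived loudness, which the study did not measure.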
Background
When we speak, we adjust our voice to the situation without thinking about it - projecting across a large hall, lifting our volume when a listener is far away, easing off when they are close. A long-standing challenge in voice therapy is that gains made in a quiet clinic room often do not carry over to these real-world settings, where the demands on the voice are completely different.
A growing line of work asks whether immersive virtual reality (IVR) can recreate those demands closely enough to drive - and eventually train - real vocal behavior. Daşdöğen and colleagues had already shown that virtual room size and listener distance can change how loudly people speak, but because they varied both at once, they could not tell which cue was doing the work. This study set out to separate them.
What the researchers did
Fifteen vocally healthy adults completed three speech tasks - a sustained vowel /a/, the standard phrase “We were away a year ago,” and a spontaneous response - across eight immersive-VR conditions delivered through the Room situation in Therapy withVR on an Oculus Quest 3 headset. The conditions varied room size (a small 5 x 4 x 4 m room versus a large 20 x 20 x 20 m room), speaker-to-listener distance (1 m, 3 m, or 15 m), and combinations of the two, plus listener-only conditions with no room.
Crucially, the sound was held identical across every condition - participants wore earplugs and closed-back headphones, and the natural room acoustics were kept constant - so the only thing that changed was what they saw. That design lets the study attribute any change in the voice to visual-spatial input alone. Sound pressure level (SPL) was recorded with a calibrated head-mounted microphone and analyzed with linear mixed models. The work was funded by a US National Institutes of Health grant, with voice scientist Katherine Verdolini-Abbott as multiple principal investigator.
What they found
Speaker-to-listener distance was the main driver of vocal intensity. Across all three tasks, the farther away the virtual listener appeared, the louder participants spoke - a strong, statistically robust effect (p < .0001 in every task). The effect was larger in the structured tasks (the sustained vowel and the read phrase, which naturally elicit louder, steadier voicing); in spontaneous speech the overall distance effect still held, though the specific 3 m vs 15 m contrast did not reach significance.
Room size was a secondary, moderating cue. Room size on its own produced smaller changes, but it strengthened the distance effect: a significant Room Size x Distance interaction showed that the jump in loudness at the far (15 m) distance was largest in the big room. In the authors’ framing, distance is the behaviorally relevant constraint, and room size is a contextual “gain” factor that turns that constraint up or down.
Because the sound never changed, these vocal adjustments were produced by vision alone.
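What it means for room size to "moderate" the distance effect can be sketched as a difference of differences. The cell means below are hypothetical numbers chosen only to illustrate the arithmetic; they are not values from the study, and the 2 x 2 layout is a simplification of its eight conditions.

```python
# Hypothetical cell means in dB SPL -- illustrative only, NOT data from the study.
cell_means = {
    ("small", 3): 70.0,
    ("small", 15): 71.5,
    ("large", 3): 70.5,
    ("large", 15): 74.0,
}


def distance_effect(room: str) -> float:
    """Change in level from the 3 m to the 15 m listener within one room."""
    return cell_means[(room, 15)] - cell_means[(room, 3)]


def interaction_contrast() -> float:
    """Difference of differences: how much larger the distance effect is in the large room."""
    return distance_effect("large") - distance_effect("small")


print(distance_effect("small"))   # 1.5
print(distance_effect("large"))   # 3.5
print(interaction_contrast())     # 2.0
```

A contrast of zero would mean distance raises the voice by the same amount in both rooms. A positive contrast, as in this made-up example, is the pattern the authors describe as room size acting as a "gain" on the distance cue.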
Why this matters
This study pins down a clean, controllable lever for voice work: the apparent distance of a listener. A clinician can dial that distance up or down in VR to elicit graded changes in vocal projection, with room size available as a secondary cue to amplify the demand - all without leaving the clinic, and all measurable. The precise control the platform offers is what made it possible to isolate one variable at a time, which is not feasible in a real room.
It also adds rigorous, NIH-funded evidence to the broader case for ecologically valid practice in voice and speech rehabilitation: the contexts a person needs their voice in can be recreated well enough to change real vocal behavior, which is the foundation for practicing in those contexts rather than in a stripped-down room.
Limitations
The sample was small (15 participants) and not sex-balanced (12 female, 3 male), and all were vocally healthy - so the findings do not yet speak to people with voice disorders, where the clinical value would lie. It was a single session, and only objective acoustic measures were collected, with no self-reported vocal effort, comfort, or perceived-distance data to link the SPL changes to experience. Holding the sound constant was essential for isolating vision, but it also removed the multisensory and social complexity of real communication, and the simulated scenes do not capture live, responsive listeners. The study establishes the visual-spatial mechanism; whether distance-cued practice in VR transfers to everyday voice use remains to be tested.
Implications for practice
For voice clinicians using or evaluating immersive VR: this study shows that a single, quantifiable visual parameter - how far away the listener appears - reliably scales vocal intensity, even when the person knows the scene is simulated and even when the sound never changes. That makes virtual speaker-to-listener distance a clean, controllable dial for graded voice-projection practice (for example, building loudness for a far listener, or rehearsing comfortable projection across distances) without leaving the clinic. Room size is best understood as a secondary, context-strengthening cue rather than the main lever. Distance scaled the voice across all three task types, with the largest and most cleanly separated effects in the structured tasks (a sustained vowel and a read phrase) - a practical starting point for graded drills, though the study was not designed to compare tasks. The findings sit comfortably with the social model of communication: the demands that shape voice live in the contexts where voice is used, and rehearsing in those contexts - rather than in a stripped-down clinic room - is what the evidence supports. The work is in vocally healthy adults, so direct testing in people with voice differences is still needed before clinical-efficacy claims.
Implications for research
Replication and extension are needed in: (a) larger, sex-balanced samples powered for individual-differences analysis; (b) voice-disordered populations (e.g., presbyphonia, muscle tension dysphonia, Parkinson's hypophonia), where the clinical payoff would lie; (c) multisession protocols that test learning, retention, and transfer of distance-cued vocal scaling to real-world speaking, since generalization is the central unmet need; (d) designs that link the SPL changes to perceptual and self-report outcomes (vocal effort, comfort, perceived distance) within the same trials; and (e) conditions that reintroduce auditory and social complexity (noise, reverberation, responsive live listeners) to test how the isolated visual effect holds up in ecologically richer scenes.
Where this connects to Therapy withVR
The study above is independent research and does not endorse any product. The notes below are commentary from withVR on how the themes in this research relate to features of Therapy withVR. The research findings are not claims about Therapy withVR.
Avatar Distance Controls
This study found speaker-to-listener distance to be the primary driver of vocal intensity - Therapy withVR lets you move the listening avatar nearer or farther to create the same graded distance cues for voice-projection practice.
Room Situation with Custom Dimensions
The study modeled a small (5 x 4 x 4 m) and a large (20 x 20 x 20 m) room - Therapy withVR's Room situation lets you set width, length, and height to reproduce the spatial context that moderated the distance effect.
Lighting Controls
Adjust overall and per-position lighting to build the varied, controlled visual scenes this study used to isolate visual-spatial influences on the voice.
Cite this study
If you reference this study in your work, the canonical citation formats are:
BibTeX
@article{daden2026,
author = {Daşdöğen, Ü. and Hitchcock, J. and Ahn, S. and Ng, B. B. and Verdolini-Abbott, K.},
title = {Visual–Spatial Influences on Vocal Intensity: Effects of Speaker-to-Listener Distance and Room Size in Immersive Virtual Reality},
journal = {Journal of Speech, Language, and Hearing Research},
year = {2026},
doi = {10.1044/2026_JSLHR-25-00798},
url = {https://withvr.app/evidence/studies/dasdogen-2026-distance}
}
RIS
TY  - JOUR
AU  - Daşdöğen, Ü.
AU  - Hitchcock, J.
AU  - Ahn, S.
AU  - Ng, B. B.
AU  - Verdolini-Abbott, K.
TI  - Visual–Spatial Influences on Vocal Intensity: Effects of Speaker-to-Listener Distance and Room Size in Immersive Virtual Reality
JO  - Journal of Speech, Language, and Hearing Research
PY  - 2026
DO  - 10.1044/2026_JSLHR-25-00798
UR  - https://withvr.app/evidence/studies/dasdogen-2026-distance
ER  -
Know of research that should be in this hub? If a relevant peer-reviewed study is not listed here, send the reference to hello@withvr.app. The hub is kept up to date as the literature grows.
Funding & independence
This study was funded by a US National Institutes of Health / NIDCD grant (R21-DC020494, awarded to Ümit Daşdöğen and Katherine Verdolini-Abbott as multiple principal investigators). It used the Room situation in Therapy withVR, and withVR's founder, Gareth Walkom, built the custom virtual environments used in the study. The research is independent of withVR BV - the company did not fund, design, or author it, and the authors declared no competing interests. See the publication for the authors' full disclosure.