What did Bartyzel et al. (2025) find?

Engineering + user-reception study published in Computers & Graphics special section on XRIOS 2024
Speech-controlled VR system: virtual characters respond DYNAMICALLY to the speaker's voice parameters (pitch, timbre, speech rate) in real time
Speech recordings corpus: 529 utterances given during presentations by 15 students
Voice parameters extracted using speech processing methods: pitch, timbre, speech rate - then mapped to real-time animation control of virtual characters
Six expert annotators evaluated stress levels present in each presentation - mixed-methods feature for stress-modulated character response
Polish-British international collaboration: AGH University of Science and Technology (Krakow), SWPS University (Warsaw), Polish Academy of Sciences (Krakow), Kielce University of Technology, University of Cambridge
Emerging VR affordance illustrated: virtual characters that respond DYNAMICALLY to speaker behavior - moving beyond pre-recorded audience animations toward true responsive social-VR systems

Who participated in this study?

This study involved 15 non-clinical adult speakers (students) whose presentations were recorded to build the speech corpus.

Speech-controlled VR system for voice and public-speaking training, evaluated for user reception with 15 students

Bartyzel P et al. · 2025 · Computers & Graphics · Experimental · n = 15 · Non-clinical adult speakers for speech-corpus methodology · DOI: 10.1016/j.cag.2024.104104

Evidence certainty: Low certainty

How this was rated

Engineering / user-reception study with a 15-participant speech corpus and 6 expert annotators. Peer-reviewed in Computers & Graphics (Elsevier, special section on XRIOS 2024). The paper's contribution is system design and user-reception evaluation, not clinical efficacy. Limitations: not a clinical trial; speech corpus is small for generalizability to clinical populations; voice parameters extracted are technical-engineering features (pitch, timbre, rate) rather than clinically validated voice-handicap measures.

Ratings use a simplified four-tier scheme (High, Moderate, Low, Very Low) informed by the GRADE working group.

An engineering and user-reception study published in the Computers & Graphics special section on XRIOS 2024. Polish-British collaboration (AGH Krakow, SWPS Warsaw, Polish Academy of Sciences, Kielce University of Technology, University of Cambridge). The system is built on a corpus of 529 utterances recorded during presentations by 15 students. Voice parameters extracted: pitch, timbre, speech rate. Six expert annotators evaluated stress levels per presentation. The multi-parameter analysis selects features for real-time animation of virtual characters responding dynamically to speech changes. The contribution is design and user-reception evaluation rather than clinical efficacy.

Clinical bottom line

An engineering / user-reception study of a speech-controlled VR system for voice and public-speaking training. The contribution is design methodology (speech corpus, parameter extraction, real-time animation control) rather than clinical evidence. For voice clinicians and researchers, this paper illustrates an emerging affordance in VR: virtual characters that respond DYNAMICALLY to the speaker's voice parameters in real time. It is not appropriate as a clinical efficacy citation, but it is useful as a methodology and design reference for next-generation VR voice-training systems.

Key findings

Engineering + user-reception study published in Computers & Graphics special section on XRIOS 2024
Speech-controlled VR system: virtual characters respond DYNAMICALLY to the speaker's voice parameters (pitch, timbre, speech rate) in real time
Speech recordings corpus: 529 utterances given during presentations by 15 students
Voice parameters extracted using speech processing methods: pitch, timbre, speech rate - then mapped to real-time animation control of virtual characters
Six expert annotators evaluated stress levels present in each presentation - mixed-methods feature for stress-modulated character response
Polish-British international collaboration: AGH University of Science and Technology (Krakow), SWPS University (Warsaw), Polish Academy of Sciences (Krakow), Kielce University of Technology, University of Cambridge
Emerging VR affordance illustrated: virtual characters that respond DYNAMICALLY to speaker behavior - moving beyond pre-recorded audience animations toward true responsive social-VR systems

Background

Most VR voice and public-speaking training systems use pre-recorded audience animations - the virtual audience does not respond to what the speaker actually says or how they say it. Real-time, speech-controlled virtual characters that respond to the speaker’s voice parameters and stress levels are a next-generation design direction. By 2024-2025, the engineering pipeline for this was maturing.

What they did and found

A speech-controlled VR system was built on a corpus of 529 presentation utterances from 15 students. Voice parameters (pitch, timbre, speech rate) were extracted using speech processing methods. Six expert annotators evaluated stress levels. The multi-parameter analysis selected features for real-time animation control of virtual characters that respond dynamically to speech changes. User-reception evaluation followed.
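The parameter-to-animation step described above can be pictured with a minimal Python sketch. It is an illustration under stated assumptions, not the authors' implementation: the class names (VoiceFrame, AudienceState), the use of spectral centroid as a timbre proxy, the feature ranges, the mapping weights, and the smoothing factor are all hypothetical and are not taken from the paper. The sketch assumes the voice features have already been extracted for each analysis window and shows only how they might be normalized and smoothed into character animation controls.

```python
from dataclasses import dataclass


@dataclass
class VoiceFrame:
    """One analysis window of already-extracted voice features (hypothetical)."""
    pitch_hz: float              # fundamental frequency estimate
    spectral_centroid_hz: float  # assumed stand-in for timbre "brightness"
    syllables_per_sec: float     # speech-rate estimate


@dataclass
class AudienceState:
    """Hypothetical animation controls for a virtual listener, each in [0, 1]."""
    attention: float = 0.5
    restlessness: float = 0.5


def normalize(value: float, low: float, high: float) -> float:
    """Linearly rescale value from [low, high] to [0, 1], clamping outliers."""
    if high <= low:
        raise ValueError("high must be greater than low")
    return min(1.0, max(0.0, (value - low) / (high - low)))


def update_audience(state: AudienceState, frame: VoiceFrame,
                    alpha: float = 0.2) -> AudienceState:
    """Blend the newest frame into the state with exponential smoothing.

    The ranges and weights below are illustrative assumptions, not values
    reported by the study.
    """
    pitch = normalize(frame.pitch_hz, 80.0, 300.0)
    brightness = normalize(frame.spectral_centroid_hz, 500.0, 4000.0)
    rate = normalize(frame.syllables_per_sec, 2.0, 7.0)

    # Assumed mapping: higher pitch and brighter timbre raise attention,
    # faster speech raises restlessness.
    target_attention = 0.5 * pitch + 0.5 * brightness
    target_restlessness = rate

    # Smoothing keeps the characters from jittering on every new frame.
    return AudienceState(
        attention=(1 - alpha) * state.attention + alpha * target_attention,
        restlessness=(1 - alpha) * state.restlessness + alpha * target_restlessness,
    )


if __name__ == "__main__":
    state = AudienceState()
    frames = [VoiceFrame(190.0, 2250.0, 4.5), VoiceFrame(300.0, 4000.0, 7.0)]
    for frame in frames:
        state = update_audience(state, frame)
        print(state)
```

In a sketch like this, the smoothing factor alpha trades responsiveness against visual stability: a larger value makes the virtual characters react faster to changes in the speaker's voice, while a smaller value produces calmer, slower transitions.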

Why it matters

For voice clinicians and SLP researchers, this paper illustrates the engineering trajectory toward responsive virtual characters in VR voice-training contexts. It serves as a methodology and design reference for next-generation clinical VR systems.

Limitations

Not a clinical trial. Small speech corpus. Engineering-feature voice parameters rather than clinically validated voice-handicap measures.

Implications for practice

For voice clinicians and SLP researchers, this paper illustrates the engineering trajectory toward VR systems with virtual characters that respond DYNAMICALLY to the speaker's voice and stress parameters. This is a meaningful design direction for next-generation VR voice and public-speaking training systems - moving beyond static or pre-recorded virtual audiences toward responsive social-VR contexts. Not appropriate as clinical efficacy evidence; use as methodology reference for clinical-engineering collaboration. For Therapy withVR product design, the speech-parameter-to-character-animation pipeline is a relevant emerging affordance.

Cite this study

If you reference this study in your work, the canonical citation formats are:

APA 7th

Bartyzel, P., Igras-Cybulska, M., Hekiert, D., Majdak, M., Łukawski, G., Bohné, T., & Tadeja, S. (2025). Exploring user reception of speech-controlled virtual reality environment for voice and public speaking training. Computers & Graphics. https://doi.org/10.1016/j.cag.2024.104104

AMA 11th

Bartyzel P, Igras-Cybulska M, Hekiert D, Majdak M, Łukawski G, Bohné T, Tadeja S. Exploring user reception of speech-controlled virtual reality environment for voice and public speaking training. Computers & Graphics. 2025. doi:10.1016/j.cag.2024.104104

BibTeX

@article{bartyzel2025,
  author = {Bartyzel, P. and Igras-Cybulska, M. and Hekiert, D. and Majdak, M. and Łukawski, G. and Bohné, T. and Tadeja, S.},
  title = {Exploring user reception of speech-controlled virtual reality environment for voice and public speaking training},
  journal = {Computers \& Graphics},
  year = {2025},
  doi = {10.1016/j.cag.2024.104104},
  url = {https://withvr.app/evidence/studies/bartyzel-2025}
}

RIS

TY  - JOUR
AU  - Bartyzel, P.
AU  - Igras-Cybulska, M.
AU  - Hekiert, D.
AU  - Majdak, M.
AU  - Łukawski, G.
AU  - Bohné, T.
AU  - Tadeja, S.
TI  - Exploring user reception of speech-controlled virtual reality environment for voice and public speaking training
JO  - Computers & Graphics
PY  - 2025
DO  - 10.1016/j.cag.2024.104104
UR  - https://withvr.app/evidence/studies/bartyzel-2025
ER  -

Funding & independence

Affiliations: AGH University Krakow, SWPS University Warsaw, Polish Academy of Sciences, Kielce University of Technology, University of Cambridge. Funding sources are reported in the published article. Peer-reviewed in Computers & Graphics (Elsevier). No withVR BV involvement.

Last reviewed: 2026-05-17 · Next review due: 2027-05-17 · Reviewed by: Gareth Walkom

Related Studies

External attentional focus in VR promotes more flexible speech movement in adults who stutter

Within-subjects study (n=31): auditory, visual, and audiovisual VR room cues all shift vocal loudness, effort, and output

Virtual room size and listener distance influence how people use their voice

Seven-year case study co-designing a VR kitchen for speech-language pathology and aging-in-place

VR-based meditation reduced anxiety before voice therapy in a small exploratory RCT, with lower attrition in the VR arm

VR-based speaking practice increases willingness to communicate in gender-affirming voice training
