What did Vona et al. (2023) find?

All five VR scenarios successfully provoked measurable physiological stress responses. The job interview scenario was the most stressful for every participant. Every detected stuttering moment co-occurred with periods classified as high stress. Speech emotion recognition identified fear as the predominant emotion across scenarios. Individual emotion profiles varied meaningfully across participants and scenarios.

Who participated in this study?

This study involved five male adolescents and young adults who stutter (ages 15-19).

Stuttering Speaking Anxiety

VR public speaking tool tracks stress and emotion in real time

Vona F et al. · 2023 · Extended Abstracts of the 2023 CHI Conference on Human Factors in Computing Systems · Experimental · n = 5 · Male adolescents and young adults who stutter (ages 15-19) · DOI: 10.1145/3544549.3585612

Evidence certainty: Very low

How this was rated

Small experimental study (n=5) in male adolescents and young adults who stutter. Useful as early evidence; sample size limits conclusions.

Ratings use a simplified four-tier scheme (High, Moderate, Low, Very Low) informed by the GRADE working group.

Researchers developed 'Speak in Public,' combining VR scenarios with wearable biosensors and speech emotion recognition for people who stutter. Testing with five young people showed every stuttering moment coincided with biosensor-identified stress, and emotion profiles varied meaningfully across scenarios.

Clinical bottom line

This five-person experimental study suggests that male adolescents and young adults who stutter respond measurably to VR speaking situations; the sample is too small to support effect claims.

Key findings

All five VR scenarios successfully provoked measurable physiological stress responses
The job interview scenario was the most stressful for every participant
Every detected stuttering moment co-occurred with periods classified as high stress
Speech emotion recognition identified fear as the predominant emotion across scenarios
Individual emotion profiles varied meaningfully across participants and scenarios

Background

People who stutter often experience heightened stress and anxiety in social speaking situations, but the relationship between those internal states and moments of disfluency can be difficult to observe from the outside. Traditional approaches rely heavily on subjective self-reports or clinician observation, both of which can miss important detail. The “Speak in Public” project aimed to build a multimodal system that captures what is happening physiologically and emotionally in real time while a person speaks in a virtual environment.

What the researchers did

The team created a VR application featuring five progressively challenging social scenarios: reading aloud alone, reading to a small group, delivering a presentation, having a conversation, and completing a job interview. Each participant wore a biosensor wristband that continuously recorded electrodermal activity, heart rate, and skin temperature. Simultaneously, a speech emotion recognition module analyzed vocal characteristics to classify the speaker’s emotional state. Five young males who stutter (ages 15 to 19) completed all five scenarios.

What they found

Every scenario produced measurable physiological arousal, suggesting that the VR environments felt socially meaningful. The job interview generated the highest stress readings for every participant. Each moment of stuttering during the sessions fell within a period the biosensors had classified as high stress, consistent with a link between physiological arousal and disfluency. The emotion recognition system identified fear as the most common emotional state, though individual profiles varied considerably: some participants showed more anger or sadness in specific scenarios, highlighting the personal nature of these experiences.
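The co-occurrence finding can be pictured as a simple interval check: for each stuttering timestamp, ask whether it falls inside a window labeled high stress. The sketch below is illustrative only. The function names, the interval representation, and the example timestamps are all assumptions made for this page, not the authors' code or data, and this summary does not describe how the study itself defined a high-stress window.

```python
from bisect import bisect_right
from typing import List, Tuple

# Hypothetical representation: (start_s, end_s) in seconds from session start.
Interval = Tuple[float, float]


def in_any_interval(t: float, intervals: List[Interval]) -> bool:
    """Return True if time t lies inside one of the sorted, non-overlapping intervals."""
    starts = [start for start, _ in intervals]
    i = bisect_right(starts, t) - 1  # index of the last interval starting at or before t
    return i >= 0 and t <= intervals[i][1]


def co_occurrence_rate(events: List[float], intervals: List[Interval]) -> float:
    """Fraction of event timestamps that fall inside a high-stress interval."""
    if not events:
        return 0.0
    ordered = sorted(intervals)
    hits = sum(in_any_interval(t, ordered) for t in events)
    return hits / len(events)


# Made-up illustrative values, NOT data from the study.
high_stress = [(12.0, 30.5), (48.0, 61.0)]
stutter_times = [14.2, 29.9, 50.1]
rate = co_occurrence_rate(stutter_times, high_stress)
print(rate)
```

A rate of 1.0 corresponds to the pattern reported here, where every detected stuttering moment sat inside a high-stress period. The same number is easier to reach when high-stress windows cover most of a session, so the proportion of session time classified as high stress is worth reading alongside it.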

Why this matters

This study suggests that combining VR with wearable sensors and voice analysis can produce a rich, objective, real-time picture of how a person who stutters responds to different social pressures. Rather than relying solely on what someone reports afterward, clinicians could see when stress spikes, which scenarios are most challenging for a given individual, and how emotional responses shift across contexts. This kind of data could support highly personalized planning for graduated exposure work.

Limitations

The sample was very small (five participants, all male and within a narrow age range), so the findings cannot be generalized broadly. The study demonstrated that the system works technically but did not measure whether using it over time leads to meaningful changes in anxiety or communication confidence. The emotion recognition component, while promising, has known accuracy limitations and may not capture the full complexity of what someone is feeling.

Implications for practice

Fusing physiological stress markers with emotion analytics gives clinicians a richer, more objective picture of individual reactions to social challenges, enabling more personalized support planning.

Editorial notes from withVR

Where this connects to Therapy withVR

The study above is independent research and does not endorse any product. The notes below are commentary from withVR on how the themes in this research relate to features of Therapy withVR. The research findings are not claims about Therapy withVR.

Session Logging

This study integrated biosensors with VR. Therapy withVR automatically logs every session event (sentences spoken, emotions used, timing) for clinician and participant reference.

Graded Exposure Across Environments

Move systematically through increasingly challenging environments as biometric data and clinical observation indicate readiness, similar to the progressively challenging scenario structure this study explored.

Cite this study

If you reference this study in your work, the canonical citation formats are:

APA 7th

Vona, F., Pentimalli, F., Catania, F., Patti, A., & Garzotto, F. (2023). Speak in Public: an Innovative Tool for the Treatment of Stuttering through Virtual Reality, Biosensors, and Speech Emotion Recognition. Extended Abstracts of the 2023 CHI Conference on Human Factors in Computing Systems. https://doi.org/10.1145/3544549.3585612

AMA 11th

Vona F, Pentimalli F, Catania F, Patti A, Garzotto F. Speak in Public: an Innovative Tool for the Treatment of Stuttering through Virtual Reality, Biosensors, and Speech Emotion Recognition. Extended Abstracts of the 2023 CHI Conference on Human Factors in Computing Systems. 2023. doi:10.1145/3544549.3585612

BibTeX

@inproceedings{vona2023,
  author = {Vona, F. and Pentimalli, F. and Catania, F. and Patti, A. and Garzotto, F.},
  title = {Speak in Public: an Innovative Tool for the Treatment of Stuttering through Virtual Reality, Biosensors, and Speech Emotion Recognition},
  booktitle = {Extended Abstracts of the 2023 CHI Conference on Human Factors in Computing Systems},
  year = {2023},
  doi = {10.1145/3544549.3585612},
  url = {https://withvr.app/evidence/studies/vona-2023}
}

RIS

TY  - CONF
AU  - Vona, F.
AU  - Pentimalli, F.
AU  - Catania, F.
AU  - Patti, A.
AU  - Garzotto, F.
TI  - Speak in Public: an Innovative Tool for the Treatment of Stuttering through Virtual Reality, Biosensors, and Speech Emotion Recognition
T2  - Extended Abstracts of the 2023 CHI Conference on Human Factors in Computing Systems
PY  - 2023
DO  - 10.1145/3544549.3585612
UR  - https://withvr.app/evidence/studies/vona-2023
ER  -

Know of research that should be in this hub? If a relevant peer-reviewed study is not listed here, send the reference to hello@withvr.app. The hub is kept up to date as the literature grows.

Funding & independence

Study conducted at Politecnico di Milano (authors Vona, Pentimalli, Catania, Patti, Garzotto); participants recruited with the support of CRC Balbuzie, Rome (specialized stuttering center). Published as Extended Abstracts at CHI 2023, Hamburg, Germany (23-28 April 2023). No withVR BV involvement in funding, study design, or authorship. Summary prepared independently by withVR using the published paper.

Last reviewed: 2026-05-12 · Next review due: 2027-05-12 · Reviewed by: Gareth Walkom