How many participants are needed for a VR speech therapy study to be useful?

There is no absolute rule, but as a rough guide: five participants is a pilot, fifteen is a small study, fifty starts to be a study whose findings can generalize. The clinically important question is whether the population matches the people you see in clinic. If a study recruited non-clinical university students and you work with adults who stutter, the findings do not necessarily transfer.

What study designs are used in VR speech therapy research?

The main designs are within-subjects (every participant does every condition), between-subjects (different participants in different conditions), pre-post (measured before and after an intervention), and randomized controlled trials. RCTs are the strongest design for causal claims but are rarer in early-stage VR work. The key question to hold: if the intervention had no effect at all, are there other reasons the outcomes might have changed?

What outcome measures should I look for in VR speech therapy studies?

The most convincing studies combine measures: self-report (questionnaires, SUDS, confidence ratings), observed behavior (conversational turns, speaking time), physiological (heart rate, skin conductance), and acoustic (fundamental frequency, intensity). If anxiety goes up on SUDS and heart rate and voice measures shift consistently, that is stronger evidence than any single measure alone. Studies that report only one type of measure tell a partial story.

Why are effect sizes more important than p-values?

A finding can be statistically significant and practically meaningless. Statistical significance depends on sample size: a tiny difference will be statistically significant if the sample is large enough. Effect sizes (Cohen's d, correlation r, partial eta-squared) tell you whether the effect is actually large. Cohen's d of 0.2 is small, 0.5 is medium, 0.8 is large. If a paper reports only p-values without effect sizes, that is a weakness.

What is the difference between a feasibility study and an effect study?

A feasibility study asks: can this be done at all? Will participants tolerate it? Does the equipment work? It does not test whether the intervention works. A feasibility study with five participants showing anxiety decreased across a week tells you a week of practice is feasible; it does not tell you VR caused the change. When you see a small-sample pre-post VR study with favorable results, ask whether it is a pilot pointing toward a bigger study, or whether it is being presented as evidence of effect.

How do I tell whether a VR study generalizes to real-world speaking?

Most VR studies measure responses inside the virtual environment. Fewer measure whether gains transfer to real-world situations. Hold three questions: did the study measure anything outside the VR setting? Were there follow-up measurements after the VR sessions ended? Did participants report changes in their everyday speaking experiences? If none of these are present, the study cannot tell you much about real-world transfer.

How to Read a VR Speech Therapy Study: A Guide for Clinicians

By Gareth Walkom · April 22, 2026 · 8 min read

A paper lands in your inbox. Someone on your team says “look at this VR study, it sounds useful.” You want to know what to make of it before your next session or your next commissioning meeting. Where do you even start?

This is a short guide to reading a VR speech therapy study with a critical eye. Not a research methods course. Not a statistics primer. Just a practical set of questions a speech-language professional can hold in mind to tell the difference between a study that supports a clinical decision and a study that is interesting but not ready to change what you do.

Start with who, not what

Before anything else, read the Participants section. Who was in this study?

How many participants? Five is a pilot. Fifteen is a small study. Fifty starts to be a study whose findings can generalize. Not an absolute rule, but a useful rough guide.
What population? Non-clinical university students? Adults who stutter recruited from a clinic? Children with language differences? The population shapes what the findings can tell you.
Were participants paid, or did they volunteer? How were they recruited and selected?

If the population in the study is very different from the people you see in clinic, the findings do not necessarily transfer. This is not a criticism of the study. It is a reminder that no single study answers every question, and evidence needs to be matched to the population you care about.

Understand what they actually compared

The next section worth reading is Design. What did the researchers compare?

Within-subjects: every participant did every condition. Good for controlling individual differences. Can be exhausting for participants.
Between-subjects: different participants did different conditions. Needs larger samples. Random assignment is important.
Pre-post: participants measured before and after an intervention. Useful but vulnerable to practice effects, expectation effects, and regression to the mean unless there is a control.
Randomized controlled trial: participants randomly assigned to intervention or control. Strongest design for causal claims, but rarer in early-stage work.

Ask yourself: if the intervention had no effect at all, is there any other reason the outcomes might have changed across conditions? If the answer is “yes, many reasons,” then the design is weak for a causal claim. A good study design rules out most alternatives.
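Regression to the mean, one of the alternatives named above, is easy to underestimate. The short simulation below is a sketch under stated assumptions, not a model of any real study: every number in it (the score scale, the noise level, the screening cutoff) is hypothetical. Nobody in it receives any intervention, yet the people recruited for high screening scores still "improve" at the second measurement.

```python
import random
from statistics import mean

random.seed(0)  # fixed seed so the sketch is reproducible

N = 10_000
# Each simulated person has a stable "true" anxiety level plus day-to-day noise.
# All values are hypothetical and chosen only for illustration.
true_level = [random.gauss(50, 10) for _ in range(N)]
pre = [t + random.gauss(0, 10) for t in true_level]
post = [t + random.gauss(0, 10) for t in true_level]  # no intervention applied

# Recruit only people who scored high at screening (a high-anxiety inclusion rule).
cutoff = 65
recruited = [i for i in range(N) if pre[i] >= cutoff]

pre_mean = mean(pre[i] for i in recruited)
post_mean = mean(post[i] for i in recruited)
print(f"recruited: {len(recruited)}")
print(f"pre  mean: {pre_mean:.1f}")
print(f"post mean: {post_mean:.1f}")  # lower than pre, with nothing done
```

The drop comes entirely from selecting people on a noisy high score. A control group recruited the same way would show the same drop, which is why a pre-post design without a control cannot separate it from a real effect.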

Look at what they measured

The Outcome measures section tells you what the researchers decided counted as evidence. This matters because different measures tell different stories.

Self-report (questionnaires, SUDS ratings, confidence ratings) captures the participant’s experience. High ecological meaning, but sensitive to expectations and demand characteristics.
Observed behavior (conversational turns, speaking time) is closer to objective but still requires interpretation and often relies on human raters.
Physiological (heart rate, skin conductance) is harder to fake but does not always map neatly onto felt experience.
Acoustic (fundamental frequency, intensity, variability) measures voice signal properties directly, independent of self-report.

The most convincing VR validation studies combine measures. If anxiety goes up on SUDS and heart rate and voice measures shift consistently, that is stronger evidence than any single measure alone. Watch for studies that report only one type of measure - they tell a partial story.

Check whether the effect is actually large

A finding can be statistically significant and practically meaningless. This is a hard lesson. It happens because statistical significance depends on sample size: a tiny difference will be statistically significant if the sample is large enough.
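A minimal sketch of that dependence, assuming two independent groups of equal size and a normal approximation to the two-sample t-test (reasonable for larger samples, rough for small ones). The function name and the group sizes are illustrative and not taken from any particular study.

```python
from math import sqrt
from statistics import NormalDist


def approx_p_value(d: float, n_per_group: int) -> float:
    """Approximate two-sided p-value for a standardized mean difference d.

    Assumes two independent, equal-sized groups and uses a normal
    approximation, so treat small-sample values as rough.
    """
    z = abs(d) * sqrt(n_per_group / 2)  # test statistic grows with sample size
    return 2 * (1 - NormalDist().cdf(z))


# The same tiny standardized difference (0.1) at three hypothetical sample sizes.
for n in (20, 200, 2000):
    print(f"n per group = {n:>4}: p = {approx_p_value(0.1, n):.3f}")
```

Under these assumptions the same tiny difference is nowhere near significant with 20 people per group and falls below 0.05 with 2000 per group. The effect did not change; only the sample size did.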

What you want is an effect size. Common ones in this literature:

Cohen’s d: roughly, 0.2 is small, 0.5 is medium, 0.8 is large. Tiny d values (< 0.1) mean the effect is barely there even if “significant.”
Correlation r: 0.1 small, 0.3 medium, 0.5 large. Values above 0.7 are striking.
Partial eta squared (η²ₚ): 0.01 small, 0.06 medium, 0.14 large.
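If a paper gives group means and standard deviations but no effect size, Cohen's d can be worked out by hand. The sketch below uses the common pooled-standard-deviation form for two independent groups. The ratings are made-up numbers for illustration only, and the helper names are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def cohens_d(group_a: list[float], group_b: list[float]) -> float:
    """Cohen's d for two independent groups, using the pooled standard deviation."""
    n_a, n_b = len(group_a), len(group_b)
    pooled_var = (
        (n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)
    ) / (n_a + n_b - 2)
    return (mean(group_a) - mean(group_b)) / sqrt(pooled_var)


def label_d(d: float) -> str:
    """Map |d| onto the rough benchmarks listed above."""
    size = abs(d)
    if size < 0.2:
        return "below small"
    if size < 0.5:
        return "small"
    if size < 0.8:
        return "medium"
    return "large"


# Hypothetical SUDS ratings (0-100); not real data.
control = [60, 55, 70, 65, 50, 62, 58, 66]
vr_practice = [52, 50, 61, 58, 47, 55, 49, 60]

d = cohens_d(control, vr_practice)
print(f"d = {d:.2f} ({label_d(d)})")
```

With only eight people per group the estimate is imprecise, so a large d from a sample this small is a reason to look for a bigger study, not a settled result.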

If a paper reports only p-values without effect sizes, that is a weakness. If it reports effect sizes, check them. A small p-value paired with a small effect size can still be clinically uninteresting, even if the statistics are legitimate.

Read the limitations section (seriously)

Authors know their own studies’ limitations better than you do. Read what they say. A good limitations section will tell you:

What the sample size limits
What the population limits (who the findings may not apply to)
What the design cannot rule out
What the follow-up period does or does not tell you about long-term effects

If a paper’s limitations section is a single throwaway paragraph, treat the findings cautiously. If the authors have thought carefully about what their study can and cannot tell us, give the paper more weight.

Distinguish feasibility from effect

A lot of early VR research is about feasibility rather than effect. A feasibility study asks: “can this be done at all? Will participants tolerate it? Does the equipment work as intended?” These are legitimate research questions, and the findings can be informative - but they are not evidence that the intervention works.

A feasibility study with five participants showing anxiety decreased across a week tells you that a week of practice is feasible. It does not tell you that VR caused the change. Other things could have - practice effects, expectation, the researcher’s attention, regression to the mean.

When you see a small-sample pre-post VR study with favorable results, ask: “is this a pilot telling me the idea is worth a bigger study, or is this being presented as evidence of effect?” The first is useful. The second would be overclaiming.

Ask about generalization honestly

Most VR studies measure responses inside the virtual environment. Fewer measure whether gains transfer to real-world situations. And yet what clients usually want is change in real life, not in a virtual room.

Questions to hold:

Did the study measure anything outside the VR setting?
Were there follow-up measurements after the VR sessions ended?
Did participants report changes in their everyday speaking experiences?

If none of these are present, the study cannot tell you much about real-world transfer. That is not a flaw - it is a limitation of scope. But it matters when you are deciding what a study supports.

Check who funded the study

The Funding and Conflicts of Interest declarations are worth reading. Independent funding from research councils, universities, or government bodies is different from industry funding or a study conducted by a company on its own product.

Neither kind of funding automatically invalidates a study. But knowing who paid for it and who has a financial stake in its results helps you weigh the findings. A study on virtual audiences funded by a research council carries different weight than a study on a specific VR product conducted by that product’s company.

A short checklist

If a VR speech therapy study comes across your desk, these six questions will get you most of the way:

The 6-question checklist

Reading a VR speech therapy study with a critical eye

None of this requires a statistics background. It requires slowing down and asking the questions authors usually answer in plain language somewhere in the paper.
