The inter-rater reliability of stroboscopy evaluations

Professorship/Faculty: TRAc - Projects  
Authors: Nawka, Tadeus; Konerding, Uwe
Title of the Journal: Journal of voice : official journal of the Voice Foundation
ISSN: 0892-1997
Corporate Body: Elsevier
Year of publication: 2012
Volume: 26
Issue: 6
Pages / Size: 471 - 483
Language(s): English
Licence: German Act on Copyright 
DOI: 10.1016/j.jvoice.2011.09.009
Document Type: Article

The study investigated the interrater reliability of stroboscopy evaluations made with Poburka's Stroboscopy Evaluation Rating Form (SERF).

The design was a single-factor experiment with repeated measures on the same element.

Evaluations by nine experts of 68 stroboscopy recordings on 16 SERF variables were analyzed. For the 14 SERF variables measured at interval scale level, interrater reliability was investigated using the intraclass correlations for absolute agreement (ICC-a) and consistency (ICC-c). ICCs-c were computed for both the original values and values standardized with respect to each rater's mean and standard deviation (ipsative values). For the two nominally scaled SERF variables, "vertical level" and "glottal closure," interrater reliability was investigated using kappa coefficients.
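The difference between the two ICC variants and the effect of ipsative standardization can be illustrated with a minimal Python sketch. It assumes a two-way layout (recordings in rows, raters in columns) and the standard single-rater formulas based on two-way ANOVA mean squares. The abstract does not state which ICC model or software was used, so the function names `icc_single` and `ipsatize` and the toy ratings are hypothetical and serve only as an illustration.

```python
import numpy as np


def icc_single(ratings):
    """Return (ICC-a, ICC-c) for single raters from an n x k rating matrix.

    Rows are rated objects (e.g. recordings), columns are raters.
    Assumption: two-way model, single-measure coefficients.
    """
    x = np.asarray(ratings, dtype=float)
    n, k = x.shape
    grand = x.mean()

    # Sums of squares for objects (rows), raters (columns) and residual.
    ss_rows = k * np.sum((x.mean(axis=1) - grand) ** 2)
    ss_cols = n * np.sum((x.mean(axis=0) - grand) ** 2)
    ss_err = np.sum((x - grand) ** 2) - ss_rows - ss_cols

    msr = ss_rows / (n - 1)              # between-object mean square
    msc = ss_cols / (k - 1)              # between-rater mean square
    mse = ss_err / ((n - 1) * (k - 1))   # residual mean square

    # Consistency ignores systematic differences between raters.
    icc_c = (msr - mse) / (msr + (k - 1) * mse)
    # Absolute agreement additionally penalizes rater mean differences.
    icc_a = (msr - mse) / (msr + (k - 1) * mse + k / n * (msc - mse))
    return icc_a, icc_c


def ipsatize(ratings):
    """Standardize each rater's column by that rater's own mean and SD."""
    x = np.asarray(ratings, dtype=float)
    return (x - x.mean(axis=0)) / x.std(axis=0, ddof=1)


# Made-up ratings: rater 2 always scores two points higher than rater 1.
toy = np.array([[1, 3], [2, 4], [3, 5], [4, 6]])

icc_a, icc_c = icc_single(toy)
_, icc_c_ipsative = icc_single(ipsatize(toy))
print(f"ICC-a = {icc_a:.3f}, ICC-c = {icc_c:.3f}, "
      f"ICC-c (ipsative) = {icc_c_ipsative:.3f}")
```

In the toy data the constant offset between the two raters leaves consistency untouched but lowers absolute agreement. Ipsatizing removes each rater's individual mean and spread before the coefficient is computed.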

For evaluations of single raters, ICCs-a ranged from 0.32 to 0.71, ICCs-c for original values from 0.41 to 0.72, and ICCs-c for ipsative values from 0.43 to 0.72. For mean evaluations of two raters, the corresponding values were 0.48 to 0.83 for ICCs-a, 0.58 to 0.84 for ICCs-c for original values, and 0.60 to 0.84 for ICCs-c for ipsative values. The interval scale variables with the lowest interrater reliabilities were phase closure, phase symmetry, and regularity. The kappa coefficients for vertical level and glottal closure were 0.15 and 0.38, respectively.
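The relationship between the single-rater and two-rater coefficients can be sketched with the Spearman-Brown formula, which is the usual way to step from a single-measure to an average-measure reliability. The abstract does not say how the two-rater values were obtained, so applying this formula is an assumption; the function name `spearman_brown` is hypothetical. The inputs are the single-rater range limits reported above.

```python
def spearman_brown(single_rater_reliability, n_raters):
    """Reliability of the mean of n_raters ratings, given single-rater reliability."""
    r = single_rater_reliability
    return n_raters * r / (1 + (n_raters - 1) * r)


# Single-rater range limits as reported in the abstract.
reported_single = {
    "ICC-a": (0.32, 0.71),
    "ICC-c, original values": (0.41, 0.72),
    "ICC-c, ipsative values": (0.43, 0.72),
}

for label, (low, high) in reported_single.items():
    predicted_low = spearman_brown(low, 2)
    predicted_high = spearman_brown(high, 2)
    print(f"{label}: two raters -> {predicted_low:.2f} to {predicted_high:.2f}")
```

Because the reported single-rater values are themselves rounded, the predictions are approximate. To two decimal places they match the two-rater ranges given above.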

The interrater reliabilities for vertical level, glottal closure, phase closure, phase symmetry, and regularity are so low that these variables should not be assessed via stroboscopy. For the remaining variables, adequate reliability can be obtained by aggregating evaluations from at least two raters.
Keywords: Stroboscopy; Laryngeal examination; Voice diagnostics; Interrater reliability; Intraclass correlation
Release Date: 17 July 2014