Objective: This study combined a learning outcomes-based checklist and salient characteristics derived from wisdom-of-crowds theory to test whether differing groups of judges (diversity maximized versus expertise maximized) would be able to appropriately assess videotaped, manikin-based simulation scenarios.
Methods: Two groups of 3 judges scored 9 videos of interns managing a simulated cardiac event. The first group had a diverse range of knowledge of simulation procedures, while the second group was more homogeneous in its knowledge and had greater simulation expertise. All judges viewed 3 types of videos (predebriefing, postdebriefing, and 6-month follow-up) in a blinded fashion and provided their scores independently. Intraclass correlation coefficients (ICCs) were used to assess the reliability of the judges within each group. Scores from each group of judges were averaged to determine the impact of group on scores.
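The abstract does not state which form of the ICC was used. As a minimal sketch only, the following assumes ICC(2,1) in the Shrout and Fleiss notation (two-way random effects, absolute agreement, single rater) and shows how such a coefficient could be computed for one panel of 3 judges scoring 9 videos. The function name `icc_2_1` and the scores are hypothetical illustrations, not the study's data or the authors' analysis.

```python
def icc_2_1(scores):
    """ICC(2,1): two-way random effects, absolute agreement, single rater.

    `scores` is a list of rows: one row per video, one column per judge.
    This ICC form is an assumption; the abstract does not specify one.
    """
    n = len(scores)        # number of videos (targets)
    k = len(scores[0])     # number of judges (raters)

    grand = sum(sum(row) for row in scores) / (n * k)
    row_means = [sum(row) / k for row in scores]
    col_means = [sum(row[j] for row in scores) / n for j in range(k)]

    # Two-way ANOVA sums of squares without replication.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_error = ss_total - ss_rows - ss_cols

    # Mean squares for videos, judges, and residual error.
    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))

    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


# Hypothetical checklist scores: 9 videos (rows) x 3 judges (columns).
example_scores = [
    [10, 11, 10],
    [12, 12, 13],
    [9, 10, 9],
    [15, 14, 15],
    [16, 16, 17],
    [14, 15, 14],
    [15, 15, 16],
    [17, 16, 17],
    [13, 14, 13],
]

print(round(icc_2_1(example_scores), 2))
```

Values near 1 indicate that most of the score variance comes from differences between videos rather than from disagreement among judges. With only 9 videos and 3 judges per panel, the confidence interval around such an estimate can be wide.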
Results: ICCs were strong for both groups of judges (diverse, 0.89; expert, 0.97), although the diverse group had a much wider 95% confidence interval for the ICC. Analysis of variance of the average checklist scores indicated no significant difference between the 2 groups of judges for any of the types of videos assessed (F = 0.72, p = .4094). There was, however, a statistically significant difference between the types of videos (F = 14.39, p = .0004), with higher scores at the postdebriefing and 6-month follow-up time periods.
Conclusions: The results of this study support optimism about assessment procedures in simulation that use learning outcomes-based checklists and a small panel of judges.
This abstract is reproduced with the permission of the publisher.