OBJECTIVE: The purpose of this project was to determine whether generalizability theory (G-Theory) could be applied successfully to a high-stakes licensure objective structured clinical examination (OSCE) as part of its normal administrative procedures, and whether the analysis could yield useful information about sources of variance.
METHODS: Anonymous data provided by the Canadian Chiropractic Examining Board from its June 2005 Clinical Skills Examination were analyzed using G-Theory. Variance components were estimated in SPSS 11.5, treating the data as partially nested. The data comprised 182 candidates, 43 raters, 40 standardized patient actors, and 18 individual cases. The examination was administered twice per day over two days, with 4 parallel tracks of 10 stations each. In each station, raters completed a checklist that used three types of rating scale: fourteen to sixteen 3-point rating scales (0 = not performed, 1 = performed but not correctly, 2 = performed correctly), a 5-point rating scale for professionalism, and a 10-point rating scale for overall technique. Both the professionalism and overall technique scales were anchored at the borderline pass and borderline fail levels.
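To make the variance-component step concrete, here is a minimal Python sketch of a G-study for the simplest possible case: a single station in which every candidate is scored on every checklist item, i.e. a fully crossed candidates x items (p x i) design. This is a deliberate simplification; the analysis described above treated the data as partially nested (raters, standardized patients, and cases) and used SPSS 11.5, which this sketch does not reproduce. The function name `g_study_p_by_i` and the example scores are hypothetical. Components are estimated from ANOVA expected mean squares, with negative estimates truncated to zero, a common convention.

```python
def g_study_p_by_i(scores):
    """Variance components for a crossed candidates x items (p x i) design.

    scores: list of rows, one per candidate, one column per checklist item
    (needs at least 2 candidates and 2 items). Uses ANOVA expected mean
    squares; negative component estimates are truncated to 0.
    """
    n_p, n_i = len(scores), len(scores[0])
    grand = sum(map(sum, scores)) / (n_p * n_i)
    p_means = [sum(row) / n_i for row in scores]
    i_means = [sum(row[j] for row in scores) / n_p for j in range(n_i)]

    # Mean squares for candidates, items, and the p x i residual.
    ms_p = n_i * sum((m - grand) ** 2 for m in p_means) / (n_p - 1)
    ms_i = n_p * sum((m - grand) ** 2 for m in i_means) / (n_i - 1)
    ss_res = sum(
        (scores[p][j] - p_means[p] - i_means[j] + grand) ** 2
        for p in range(n_p)
        for j in range(n_i)
    )
    ms_res = ss_res / ((n_p - 1) * (n_i - 1))

    # Solve the expected-mean-square equations for each component.
    components = {
        "p": max((ms_p - ms_res) / n_i, 0.0),  # candidate (universe-score) variance
        "i": max((ms_i - ms_res) / n_p, 0.0),  # item difficulty
        "pi,e": ms_res,                        # interaction confounded with error
    }
    total = sum(components.values())
    if total == 0:
        raise ValueError("scores have no variance")
    shares = {k: v / total for k, v in components.items()}

    # Relative G coefficient for a candidate's mean over n_i items.
    denom = components["p"] + components["pi,e"] / n_i
    g_rel = components["p"] / denom if denom > 0 else float("nan")
    return components, shares, g_rel


# Hypothetical 0/1/2 checklist scores: 4 candidates x 3 items.
scores = [[0, 1, 2], [1, 2, 2], [2, 2, 2], [0, 0, 1]]
components, shares, g = g_study_p_by_i(scores)
for source, pct in shares.items():
    print(f"{source}: {pct:.0%} of total variance")
print(f"relative G (mean over items): {g:.2f}")
```

Each entry of `shares` is one source's proportion of the total estimated variance, which is analogous to how a statement such as "raters contributed 7% of the variance" is typically read; in this toy design the sources are candidates, items, and residual rather than raters and standardized patients.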
RESULTS: Internal consistency was high on both days of testing (Cronbach's alpha: day 1 = 0.86, day 2 = 0.91). Alpha for individual stations averaged 0.68 on day 1 and 0.74 on day 2. Generalizability coefficients for the day 1 stations averaged 0.63, and the generalizability coefficient for the day 1 examination was 0.65. Generalizability coefficients for the day 2 stations averaged 0.74, and the generalizability coefficient for the day 2 examination was 0.42. On day 1, raters accounted for 7% of the variance in candidate measures and standardized patients for 1%. On day 2, raters accounted for 8% of the variance in candidate measures, and the standardized patient variance component could not be estimated.
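The station-level alpha and G coefficients reported above are related but not identical quantities. As a hedged illustration: for a fully crossed candidates x items design in which items are the only facet, Cronbach's alpha equals the relative G coefficient (Hoyt's ANOVA form, 1 - MS_res / MS_p); once raters, standardized patients, or other facets enter the G-study model, the two can diverge. The sketch below uses hypothetical function names and made-up 0/1/2 checklist scores to compute both and show that they agree in that simple case.

```python
def _sample_var(xs):
    """Sample variance with an n - 1 denominator."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)


def cronbach_alpha(scores):
    """scores: list of rows, one per candidate; columns are checklist items."""
    k = len(scores[0])
    item_vars = [_sample_var([row[j] for row in scores]) for j in range(k)]
    total_var = _sample_var([sum(row) for row in scores])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


def g_coefficient_p_by_i(scores):
    """Relative G coefficient for a crossed p x i design: 1 - MS_res / MS_p."""
    n_p, n_i = len(scores), len(scores[0])
    grand = sum(map(sum, scores)) / (n_p * n_i)
    p_means = [sum(row) / n_i for row in scores]
    i_means = [sum(row[j] for row in scores) / n_p for j in range(n_i)]
    ms_p = n_i * sum((m - grand) ** 2 for m in p_means) / (n_p - 1)
    ss_res = sum(
        (scores[p][j] - p_means[p] - i_means[j] + grand) ** 2
        for p in range(n_p)
        for j in range(n_i)
    )
    ms_res = ss_res / ((n_p - 1) * (n_i - 1))
    return 1 - ms_res / ms_p


# Hypothetical 0/1/2 checklist scores: 5 candidates x 4 items.
scores = [
    [2, 2, 1, 2],
    [1, 1, 0, 1],
    [2, 1, 1, 2],
    [0, 1, 0, 0],
    [2, 2, 2, 1],
]
print(round(cronbach_alpha(scores), 4), round(g_coefficient_p_by_i(scores), 4))
# prints: 0.8704 0.8704
```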
DISCUSSION: Applying G-Theory in a naturalistic, operational examination setting can improve understanding of the sources of variance and provide direction for improving individual stations. Decision studies (D-studies) can be used to estimate the effect on reliability of using multiple raters in a room. The size of the rater variance in a station may also indicate a need for more rater training in that station or for a clearer, more definitive scoring checklist. G-Theory must, however, be applied cautiously: it requires careful selection of the floating raters and rigorous training of the raters in each station.
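A D-study projects reliability under an alternative measurement design from G-study variance components. A minimal sketch, assuming candidates fully crossed with raters in a station (p x r) and purely hypothetical variance components: averaging over n_r raters divides the rater-related error terms by n_r, so the relative coefficient is var_p / (var_p + var_pr,e / n_r), and the absolute (dependability, Phi) coefficient also adds var_r / n_r to the error term. The function name `d_study_raters` and the component values are illustrative, not estimates from this examination.

```python
def d_study_raters(var_p, var_r, var_pr_e, n_raters):
    """Project relative (G) and absolute (Phi) coefficients for n_raters.

    Assumes a crossed candidates x raters (p x r) design in which each
    candidate's station score is the mean over n_raters raters.
    """
    rel_error = var_pr_e / n_raters            # candidate-rater interaction + residual
    abs_error = (var_r + var_pr_e) / n_raters  # also counts rater leniency/severity
    g = var_p / (var_p + rel_error)
    phi = var_p / (var_p + abs_error)
    return g, phi


# Hypothetical variance components (not taken from the examination data).
VAR_P, VAR_R, VAR_PR_E = 0.5, 0.1, 0.4

for n in (1, 2, 3):
    g, phi = d_study_raters(VAR_P, VAR_R, VAR_PR_E, n)
    print(f"{n} rater(s): G = {g:.2f}, Phi = {phi:.2f}")
```

With these made-up components, moving from one to two raters raises the relative coefficient from about 0.56 to about 0.71. A real projection would need the station's own estimated components and a design in which raters are actually crossed with candidates.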