Evaluation of Performance of Predictive Classification of Sentinel Events in Public Hospitals in Hong Kong by Large Language Model: A Cross-sectional Retrospective Study

This abstract has open access

Abstract Description

Submission ID :

HAC274

Submission Type

HA Staff

Authors (including presenting author) :

Wong WHR(1), Leung PNC(1), Ho YHS(1)

Affiliation :

(1) Cluster Quality & Safety Office, Hong Kong East Cluster

Introduction :

The latest “Safe Care for All” report of the review committee on the management of the public hospital system highlighted the importance of timely identification of critical incidents, including sentinel events (SEs). In light of this report and the rapid development of large language models (LLMs) in recent years, this study evaluates the performance of an LLM in the predictive classification of SEs in public hospitals in Hong Kong.

Objectives :

1) To assess the predictive classification capabilities of LLMs for SEs by comparing the model's classifications with the pre-labeled categories published in the HA Sentinel & Serious Untoward Events Annual Report.

2) To explore the potential for clinical use of LLM-based SE classification in real-world settings.

Methodology :

A cross-sectional retrospective analysis was conducted using a dataset of pre-labeled, categorized SEs from 2010 to 2020. The LLM was then tested on unseen data from 2021 to 2023, and its classification accuracy was evaluated by comparing the true categories of the unseen cases against the classifications made by the LLM. Performance metrics including sensitivity, specificity, positive predictive value (PPV), and negative predictive value (NPV) were calculated. A chi-square test was performed to assess the statistical significance of the association between the LLM's classifications and the true categories, with the significance level set at p < 0.05.
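
As an illustration of the metrics named above, the following is a minimal sketch of how they can be derived from a 2x2 confusion matrix. It is not the study's code: the function names and the counts in the usage lines are hypothetical, and it assumes each case is reduced to a binary outcome (for example, one SE category versus all others), which the abstract does not specify. The chi-square test shown is the Pearson test without continuity correction, which is also an assumption.

```python
import math


def binary_metrics(tp, fp, fn, tn):
    """Return sensitivity, specificity, PPV and NPV from 2x2 confusion counts."""
    return {
        "sensitivity": tp / (tp + fn),  # true positives among actual positives
        "specificity": tn / (tn + fp),  # true negatives among actual negatives
        "ppv": tp / (tp + fp),          # true positives among predicted positives
        "npv": tn / (tn + fn),          # true negatives among predicted negatives
    }


def chi_square_2x2(tp, fp, fn, tn):
    """Pearson chi-square test of independence for a 2x2 table (1 degree of freedom)."""
    n = tp + fp + fn + tn
    numerator = n * (tp * tn - fp * fn) ** 2
    denominator = (tp + fp) * (fn + tn) * (tp + fn) * (fp + tn)
    chi2 = numerator / denominator
    # With 1 degree of freedom, the upper-tail probability is erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Hypothetical counts, chosen only to exercise the functions.
metrics = binary_metrics(tp=40, fp=10, fn=10, tn=40)
chi2, p_value = chi_square_2x2(tp=40, fp=10, fn=10, tn=40)
print(metrics, chi2, p_value)
```

For a multi-category problem, the same functions could be applied per category in a one-versus-rest fashion and then averaged; whether the study did this is not stated.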

Result & Outcome :

The LLM demonstrated a sensitivity of 87.5% and a specificity of 84.7%. The PPV was 85.1%, while the NPV was 87.1%. The chi-square test yielded p < 0.05, indicating a statistically significant association between the model's classifications and the true categories of SEs.
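
The four reported figures can be cross-checked against one another. The sketch below assumes they all come from a single 2x2 table (the abstract does not say so) and uses hypothetical function names. It solves the standard PPV identity for the share of positive cases, then recomputes the NPV from that share.

```python
def implied_positive_share(sensitivity, specificity, ppv):
    """Solve PPV = s*p / (s*p + (1 - spec)*(1 - p)) for the positive share p."""
    a = sensitivity * (1 - ppv)
    b = ppv * (1 - specificity)
    return b / (a + b)


def npv_from(sensitivity, specificity, share):
    """Recompute NPV from sensitivity, specificity and the positive share."""
    true_neg = specificity * (1 - share)
    false_neg = (1 - sensitivity) * share
    return true_neg / (true_neg + false_neg)


# Figures reported in the abstract, expressed as proportions.
share = implied_positive_share(sensitivity=0.875, specificity=0.847, ppv=0.851)
npv = npv_from(sensitivity=0.875, specificity=0.847, share=share)
print(round(share, 3), round(npv, 3))
```

Under that assumption, the implied positive share is roughly one half, and the recomputed NPV is close to the reported 87.1%, so the four metrics are mutually consistent.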