Albert Atserias, Anuj Dawar, et al.
Journal of the ACM
Clinical databases are essential for clinical and translational research. Traditionally, curating a clinical database involves manually collecting data from free-text notes in the electronic medical record (EMR), a process that is time-consuming and error-prone. Recently, large language models (LLMs) such as OpenAI's ChatGPT and Google's Gemini have demonstrated impressive semantic understanding of free text, and they could be used to automate free-text data extraction tasks that previously could only be done by human experts and trainees. Unfortunately, these notes often contain protected health information and are also a valuable asset, which leads health systems to restrict their transfer to entities such as the third-party AI providers mentioned above. The goal of this study is to evaluate the feasibility of avoiding data transfer by using an open-source AI model to generate a clinical database of kidney cancer patients from free-text radiology, pathology, and operative notes.
Imran Nasim, Melanie Weber
SCML 2024
Saeel Sandeep Nachane, Ojas Gramopadhye, et al.
EMNLP 2024
Giuseppe Romano, Aakrati Jain, et al.
ECTC 2025