Wednesday, June 14th 2023
14:00-17:00 Pre-conference workshop:
Annotation of clinical NLP tools in HDR UK Gateway
Day 1: Thursday, June 15th 2023
10:00-10:30 Registration
10:30-10:50 Welcome

  • Goran Nenadic, HealTAC 2023 conference chair
  • Niels Peek, Director of the Pankhurst Institute for Healthcare Research and Innovation
10:50-11:50 PhD forum session
11:50-12:00 Break
12:00-13:00 PPIE panel: Co-production of clinical NLP applications
13:00-13:15 Open community forum and discussions: session 1
13:15-14:00 Lunch
14:00-14:45 Keynote: Dr Angus Roberts (King’s College London)
From regular expressions to pre-trained language models – 14 years of applying NLP at the Maudsley Biomedical Research Centre
14:45-15:00 Coffee break
15:00-16:00 Posters and demos: session 1
16:00-17:00 Panel: Annotation guidelines: from clinical needs to textual annotations
17:00-18:00 Birds of a feather meetings: session
18:30-22:00 Drinks reception (18:30) and conference dinner (from 19:00) in The Hyatt Hotel
Day 2: Friday, June 16th 2023
09:15-09:20 Introduction to Day 2
09:20-10:50 Papers and presentations: session
10:50-11:00 Break
11:00-12:00 Posters and demos: session 2
12:00-13:00 Panel: Towards evaluation guidelines for clinical NLP applications
13:00-13:15 Open community forum and discussions: session 2
13:15-14:00 Lunch
14:00-14:45 Keynote: Dr Yonghui Wu (University of Florida)
Opportunities and Challenges of Conversational Artificial Intelligence and Large Language Models in Healthcare
14:45-15:00 Break
15:00-16:00 Industry forum: Translational opportunities and challenges of healthcare generative models
16:00-16:15 Final remarks and close



The keynotes this year will focus on the impact and promise of large healthcare language models. We will hear from two experts who are involved in large centres that work with clinical free-text data in the UK and the US.

Dr Angus Roberts, King's College London

From regular expressions to pre-trained language models – 14 years of applying NLP at the Maudsley Biomedical Research Centre

Dr Yonghui Wu, University of Florida

Opportunities and Challenges of Conversational Artificial Intelligence and Large Language Models in Healthcare



Panels and Forums

Four panels will discuss the main challenges in processing healthcare free-text:

  • PPIE forum: Co-production of clinical NLP applications with patients and the public

    This forum will discuss the engagement of patients and members of the public in co-production during the entire clinical NLP application lifecycle: from conception and development to evaluation and deployment. It will focus on putting this approach into practice and identifying the opportunities and challenges. The panel will be moderated by Dr Liz Ford (Brighton and Sussex Medical School), in collaboration with the Co-production Collective.

  • Industry forum: Translational opportunities and challenges of healthcare generative models

    The industry panel will discuss the opportunities and challenges of generative language models in healthcare, and the roles that industry, academia and the NHS would play when these models are trained, evaluated and used to support various activities. How would these models change the landscape of healthcare text analytics? What new challenges do such models bring to industry involvement? How will we ensure privacy and fairness, in particular if the models are not tuned on local data? The forum will be moderated by Dr Dan Schofield (NHS England), with panelists from industry, academia, governance and patient communities.

  • Annotation guidelines: from clinical needs to textual annotations

    The panel will discuss the practice of producing effective clinical annotation guidelines that both capture clinical intention and provide usable recipes for textual annotation. What are the common principles and steps in developing clinical annotation guidelines? How can we re-purpose and customise annotation schemas? What lessons have we learnt so far? How can we make clinical annotation guidelines FAIR (Findable, Accessible, Interoperable, and Reusable)? The panel will be moderated by Prof Rob Stewart, King’s College London.

  • Towards evaluation guidelines for clinical NLP applications

    This panel will explore how clinical NLP applications should be evaluated in terms of their fitness for purpose: what steps and metrics are needed when evaluating NLP software for use with real-world data? How should the quality and utility of the software be estimated: is comparison to a gold standard enough? If so, what is the right sample size? If not, what else needs to be evaluated? How should scalability and architectural fit be evaluated? How are the evaluation outcomes to be communicated to stakeholders? The panel will be moderated by Prof James Teo, King’s College London.


PhD forum

  • Matúš Falis. Can ChatGPT Generate and Code Discharge Summaries?
  • Nastazja Laskowski. Data Transparency and Anonymization when Sharing Clinical Study Reports: An exploration of Natural Language Processing and Statistical Disclosure Control
  • Ratchakrit Arreerard. Feasibility of Emotions as Features for Suicide Ideation Detection in Social Media


Research presentations

  • Daphne Chopard, Padraig Corcoran and Irena Spasic. Word Sense Disambiguation of Acronyms in Clinical Narratives
  • Arlene Casey, Emma Davidson, Claire Grover, Richard Tobin, Andreas Grivas, Huayu Zhang, Patrick Schrempf, Alison Q. O’Neil, Liam Lee, Michael Walsh, Frey Pellie, Karen Ferguson, Vera Cvero, Honghan Wu, Heather Whalley, Grant Mair, William Whiteley and Beatrice Alex. Understanding performance and reliability of NLP tools: A comparison of four NLP tools predicting stroke phenotypes in radiology reports
  • Ghada Alfattni, Niels Peek, Anthony Wilson and Goran Nenadic. Integrating Patients’ Medication Histories from Structured and Unstructured Data
  • Emma Davidson, Arlene Casey, Claire Grover, Beatrice Alex, Honghan Wu, Archie Campbell, Fionna Chalmers, Mark Adams, Matthew Iveson, Andrew M Macintosh, Emily Ball, Kristiina Rannikmae, Heather Whalley and William Whiteley. The epidemiological characteristics of stroke phenotypes defined with ICD-10 and free-text: a cohort study linked to electronic health records
  • Siqu Long, Shuang Zhang, Feiqi Cao, Josiah Poon and Soyeon Han. Suicide-NLU: Suicidality Detection by using Joint Intent Classification and Slot Filling
  • Anna-Grace Linton, Vania Dimitrova, Amy Downing, Richard Wagland and Adam Glaser. Weakly Supervised Text Classification on Free Text Comments in Patient-Reported Outcome Measures



  • James Brandreth, Jennifer Jiang and Anoop Shah. MiADE (Medical information AI Data Extractor): Natural language processing at the point of care
  • Maksim Belousov, Vladislav Yotkov. Can Digital Humans transform healthcare? (DEMO from Re:course)



  • Jaya Chaturvedi, Diana Shamsutdinova, Felix Zimmer, Sumithra Velupillai, Daniel Stahl, Robert Stewart and Angus Roberts. Sample Size in Natural Language Processing within Healthcare Research
  • Sophie Gibbons, Panagiota Kontari, Simon Pillinger, Elizabeth Ford and Ben Fell. Patient and Public Involvement Co-Development of a Route to Record-Level Data Access to Akrivia Health’s Secondary Healthcare Dataset
  • Arooj Hussain, Haifa Alrdahi, Hendrik Šuvalov, Lifeng Han, Goran Nenadic, Will Dixon and Meghna Jani. M3: Manchester Medication Mining – Extracting medication and related attributes from outpatient letters
  • Malik Ahmed. Combining Rule-Based Techniques and GPT-4 for Clinical Drug Information Extraction from SmPC Documents: A Natural Language Processing Approach to Developing Accessible and Up-to-Date Drug Databases
  • Thanh-Tung Nguyen, Viktor Schlegel, Abhinav Ramesh Kashyap, Hao Li and Stefan Winkler. Mimic-IV-ICD: A new benchmark for eXtreme MultiLabel Classification

  • Ceyda Uysal, Gloria Roque, Tarso Franarin, Luke Brydon, Sophie Gibbons and Benjamin Fell. Contextual Classification of Substance Use in Electronic Health Records
  • Areej Alhassan, Viktor Schlegel, Monira Aloud, Riza Batista-Navarro and Goran Nenadic. Towards Recognising Discontinuous Named Entities in Clinical Text Using a Seq2Seq Prompt-guided Model
  • Yang Cui, Lifeng Han and Goran Nenadic. Prompt-based Temporal Classification of Treatment Events from Discharge Summaries
  • Oli Delgaram-Nejad, Dawn Archer and Gerasimos Chatzidamianos. Developing the 4SC: a Small, Specialised, Spoken, Schizophrenia corpus
  • Matthew Coole, Paul Rayson, Paul Marshall and Fiona Lobban. iPOF: Improving Peer Online Forums Project Progress
  • Bernadeta Griciūtė, Lifeng Han, Hao Li and Goran Nenadic. Topic Modelling of Swedish Newspaper Articles about Coronavirus: a Case Study using Latent Dirichlet Allocation Method
  • Zhen Zhu. Insights From Diabetes-Related Food Reviews on an E-Commerce Platform: A Text Analytics Approach
  • Ivo Fins, Heather Davies, Sean Farrell and Peter-John Noble. Can ChatGPT help to tackle canine obesity? – Uncovering Body Condition Score measurements and overweight companion animals described in veterinary clinical narratives using ChatGPT. Performance comparison with a regular expression-based tool
  • Sean Farrell, Charlotte Appleton, Peter-John Mäntylä Noble and Noura Al Moubayed. PetBERT: Automated ICD-11 Syndromic Disease Coding for Outbreak Detection in First Opinion Veterinary Electronic Health Records
  • Shuai Niu and Xian Yang. Enhancing Clinical Decision Making with Interpretable Evidence for Personalized Disease Risk Prediction
  • Tomas Goldsack, Zhihao Zhang, Chenghua Li and Carolina Scarton. Making Science Simple: Corpora for the Lay Summarisation of Scientific Literature
  • F. Dalla Serra, G. Jacenków, F. Deligianni, J. Dalton and A. Q. O’Neil. Improving Image Representations via MoCo Pre-Training for Multimodal CXR Classification
  • Reem Bin-Hezam and Mark Stevenson. Improving Stopping in Technology Assisted Reviews

  • Xu Wang, Edward Meinert, Andrea Preston and Shang-Ming Zhou. Identification of Influential Factors in Bladder Cancer: A Co-Designed Study by Utilizing Epidemiology and Machine Learning Framework on Large Electronic Health Records Cohort
  • Antanas Kascenas, Nicolas Pugeault and Alison Q. O’Neil. Denoising Autoencoders for Unsupervised Anomaly Detection in Brain MRI


Pre-conference events (June 14)

  • Workshop: Annotation of clinical NLP tools in HDR UK Gateway

    This session will discuss the requirements and implementation of meta-data to describe clinical NLP tools and resources in the HDR UK Gateway, as well as how the Gateway search capabilities can be improved using NLP. In the first part of the workshop, we will discuss the requirements and plans for the Gateway. In the second part, there will be two hands-on streams: one to discuss description of clinical NLP tools, and the other to demonstrate indexing of the existing Gateway data. The workshop is co-organised with HDR UK. If you are interested in taking part in this workshop, please contact us on

