Programme - Healtex

Wednesday, June 14th 2023

14:00-17:00

Pre-conference workshop:
Annotation of clinical NLP tools in HDR UK Gateway

Day 1: Thursday, June 15th 2023

10:00-10:30	Registration
10:30-10:50	Welcome
Goran Nenadic, HealTAC 2023 conference chair
Niels Peek, Director of the Pankhurst Institute for Healthcare Research and Innovation

10:50-11:50	PhD forum session
Chairs: Riza Batista-Navarro (University of Manchester) and Denis Newman-Griffis (University of Sheffield)
Matúš Falis. Can ChatGPT Generate and Code Discharge Summaries?
Nastazja Laskowski. Data Transparency and Anonymization when Sharing Clinical Study Reports: An exploration of Natural Language Processing and Statistical Disclosure Control
Ratchakrit Arreerard. Feasibility of Emotions as Features for Suicide Ideation Detection in Social Media
Panel: Angus Roberts (King’s College London), Yonghui Wu (University of Florida), Sarah Markham (patient representative), Anoop Shah (University College London, UCLH)

11:50-12:00	Break

12:00-13:00	PPIE panel: Co-production of clinical NLP applications
Chair: Dr Liz Ford (Brighton and Sussex Medical School)
This panel will discuss how patients and members of the public can be involved in co-production during the entire clinical NLP application lifecycle: from conception of the idea to the deployment and monitoring of its efficacy and impact. The panellists include:
Niccola Hutchinson-Pascal: Niccola is part of Co-Production Collective, a co-produced community working to support the authentic co-production (not faux-production!) of research, service and policy development. Everyone is welcome. Together we learn, connect, and champion co-production for lasting change. Niccola has worked for a wide variety of organisations across health, well-being and physical activity, from charities, to government-related bodies, to large agencies. All of these roles have had a focus on culture change and involved her working closely with the public, patients and local community members. She is passionate about co-production, about all parties communicating on a level playing field, sharing power and decision making, and about ensuring organisations are aware of the value gained from this way of working.
Debbie Keatley: Debbie is a cancer survivor who believes in the patient’s right to be an equal partner in treatment and care decisions and in the meaningful involvement of the public and patients in research. Impatient for the data resources already held to be used to benefit patients, her ambition is for people across the UK to be able to access and understand their own health records and to make informed choices about how their data is used in research, with confidence that regulatory institutions will respect and uphold those decisions. She is involved with many organisations that recognise data’s importance in research to improve outcomes for patients, including use MY data, CRUK’s Data Advisory Board and Clinical Research Committee, the office of the National Data Guardian and Northern Ireland Cancer Registry, and is a former member of HDR UK’s Public Advisory Board. She is actively involved in steering groups of clinical studies across Northern Ireland, the UK and Europe and in the work of Ireland’s All-Island Cancer Research Institute.
Sophie Gibbons: Sophie is a Research Scientist at Akrivia Health and has a core role in empowering Akrivia’s NHS partners to unlock insights more easily from their existing data. In addition, Sophie works on: ensuring the research utility and transparency of AI approaches to deriving structured data from the free-text portion of electronic health records (namely, the application and communication of Akrivia’s Natural Language Processing); co-development of approaches to mental health data access with Akrivia’s in-house Patient and Public Involvement (PPI) team; and expansion of Akrivia’s research network so that more high-quality research can leverage Akrivia’s valuable data asset. Sophie will talk on patient involvement in developing steps to access mental health medical data.

13:00-13:15	Open community forum and discussions: session 1
Chair: Goran Nenadic
This is an open slot for colleagues to briefly inform the community about any ongoing or future activities, initiatives, projects, etc. It can be used to invite collaborations, highlight opportunities and challenges, etc. Every speaker will have 3 minutes.

13:15-14:00	Lunch

14:00-14:45	Keynote: Dr Angus Roberts (King’s College London)
From regular expressions to pre-trained language models – 14 years of applying NLP at the Maudsley Biomedical Research Centre
The Maudsley Biomedical Research Centre has been developing and applying natural language processing methods to identify a variety of clinical variables in clinical free-text notes. This talk will review the NLP developments, from dictionary and rule-based approaches to training, fine-tuning and evaluation of pre-trained language models.
Chair: Honghan Wu

14:45-15:00	Coffee break

15:00-16:00	Posters and demos: session 1
James Brandreth, Jennifer Jiang and Anoop Shah. MiADE (Medical information AI Data Extractor): Natural language processing at the point of care (DEMO)
Jaya Chaturvedi, Diana Shamsutdinova, Felix Zimmer, Sumithra Velupillai, Daniel Stahl, Robert Stewart and Angus Roberts. Sample Size in Natural Language Processing within Healthcare Research
Sophie Gibbons, Panagiota Kontari, Simon Pillinger, Elizabeth Ford and Ben Fell. Patient and Public Involvement Co-Development of a Route to Record-Level Data Access to Akrivia Health’s Secondary Healthcare Dataset
Bernadeta Griciūtė, Lifeng Han, Hao Li and Goran Nenadic. Topic Modelling of Swedish Newspaper Articles about Coronavirus: a Case Study using Latent Dirichlet Allocation Method
Malik Ahmed. Combining Rule-Based Techniques and GPT-4 for Clinical Drug Information Extraction from SmPC Documents: A Natural Language Processing Approach to Developing Accessible and Up-to-Date Drug Databases
Areej Alhassan, Viktor Schlegel, Monira Aloud, Riza Batista-Navarro and Goran Nenadic. Towards Recognising Discontinuous Named Entities in Clinical Text Using a Seq2Seq Prompt-guided Model
Sean Farrell, Charlotte Appleton, Peter-John Mäntylä Noble and Noura Al Moubayed. PetBERT: Automated ICD-11 Syndromic Disease Coding for Outbreak Detection in First Opinion Veterinary Electronic Health Records
Matthew Coole, Paul Rayson, Paul Marshall and Fiona Lobban. iPOF: Improving Peer Online Forums Project Progress
Tomas Goldsack, Zhihao Zhang, Chenghua Lin and Carolina Scarton. Making Science Simple: Corpora for the Lay Summarisation of Scientific Literature
F. Dalla Serra, G. Jacenków, F. Deligianni, J. Dalton and A. Q. O’Neil. Improving Image Representations via MoCo Pre-Training for Multimodal CXR Classification
Reem Bin-Hezam and Mark Stevenson. Improving Stopping in Technology Assisted Reviews

16:00-17:00	Panel: Annotation guidelines: from clinical needs to textual annotations
Chair: Rob Stewart (KCL)
The panel will discuss the practice of producing effective clinical annotation guidelines that both capture clinical intention and provide usable recipes for textual annotation. What are common principles and steps in developing clinical annotation guidelines? What lessons have we learnt so far? How can we make clinical annotation guidelines FAIR (Findable, Accessible, Interoperable, and Reusable)?
Panel: Ben Fell (Akrivia Health), Warren Del-Pinto (University of Manchester), Eulàlia Farré-Maduell (Barcelona Supercomputing Center), Imane Guellil (University of Edinburgh)

17:00-18:00	Birds of a feather meetings
Space will be available for colleagues to self-organise and run birds-of-a-feather or specific project meetings. The following groups will meet:
Standards for data modelling and representation (OMOP, FHIR)
Towards guidelines for clinical annotation guidelines
PPIE
Clinical NLP governance

18:30-22:00	Drinks reception (18:30) and conference dinner (from 19:00) in The Hyatt Hotel

Day 2: Friday, June 16th 2023

09:15-09:20	Introduction to Day 2

09:20-10:50	Papers and presentations
Chair: Honghan Wu
Daphne Chopard, Padraig Corcoran and Irena Spasic. Word Sense Disambiguation of Acronyms in Clinical Narratives
Arlene Casey, Emma Davidson, Claire Grover, Richard Tobin, Andreas Grivas, Huayu Zhang, Patrick Schrempf, Alison Q. O’Neil, Liam Lee, Michael Walsh, Frey Pellie, Karen Ferguson, Vera Cvero, Honghan Wu, Heather Whalley, Grant Mair, William Whiteley and Beatrice Alex. Understanding performance and reliability of NLP tools: A comparison of four NLP tools predicting stroke phenotypes in radiology reports
Emma Davidson, Arlene Casey, Claire Grover, Beatrice Alex, Honghan Wu, Archie Campbell, Fionna Chalmers, Mark Adams, Matthew Iveson, Andrew M Macintosh, Emily Ball, Kristiina Rannikmae, Heather Whalley and William Whiteley. The epidemiological characteristics of stroke phenotypes defined with ICD-10 and free-text: a cohort study linked to electronic health records
Anna-Grace Linton, Vania Dimitrova, Amy Downing, Richard Wagland and Adam Glaser. Weakly Supervised Text Classification on Free Text Comments in Patient-Reported Outcome Measures
Ghada Alfattni, Niels Peek, Anthony Wilson and Goran Nenadic. Integrating Patients’ Medication Histories from Structured and Unstructured Data
Siqu Long, Shuang Zhang, Feiqi Cao, Josiah Poon and Soyeon Han. Suicide-NLU: Suicidality Detection by using Joint Intent Classification and Slot Filling

10:50-11:00	Break

11:00-12:00	Posters and demos: session 2
Maksim Belousov and Vladislav Yotkov. Can Digital Humans transform healthcare? (DEMO from Re:course)
Thanh-Tung Nguyen, Viktor Schlegel, Abhinav Ramesh Kashyap, Hao Li and Stefan Winkler. Mimic-IV-ICD: A new benchmark for eXtreme MultiLabel Classification
Ceyda Uysal, Gloria Roque, Tarso Franarin, Luke Brydon, Sophie Gibbons and Benjamin Fell. Contextual Classification of Substance Use in Electronic Health Records
Yang Cui, Lifeng Han and Goran Nenadic. Prompt-based Temporal Classification of Treatment Events from Discharge Summaries
Oli Delgaram-Nejad, Dawn Archer and Gerasimos Chatzidamianos. Developing the 4SC: a Small, Specialised, Spoken, Schizophrenia corpus
Arooj Hussain, Haifa Alrdahi, Hendrik Šuvalov, Lifeng Han, Goran Nenadic, Will Dixon and Meghna Jani. M3: Extracting medication and related attributes from outpatient letters
Zhen Zhu. Insights From Diabetes-Related Food Reviews on an E-Commerce Platform: A Text Analytics Approach
Ivo Fins, Heather Davies, Sean Farrell and Peter-John Noble. Can ChatGPT help to tackle canine obesity? – Uncovering Body Condition Score measurements and overweight companion animals described in veterinary clinical narratives using ChatGPT. Performance comparison with a regular expression-based tool
Shuai Niu and Xian Yang. Enhancing Clinical Decision Making with Interpretable Evidence for Personalized Disease Risk Prediction
Xu Wang, Edward Meinert, Andrea Preston and Shang-Ming Zhou. Identification of Influential Factors in Bladder Cancer: A Co-Designed Study by Utilizing Epidemiology and Machine Learning Framework on Large Electronic Health Records Cohort
Antanas Kascenas, Nicolas Pugeault and Alison Q. O’Neil. Denoising Autoencoders for Unsupervised Anomaly Detection in Brain MRI


12:00-13:00	Panel: Towards evaluation guidelines for clinical NLP applications
Chair: Prof James Teo (KCL)
This panel will focus on exploring how clinical NLP applications should be evaluated in terms of their fitness for purpose: what steps and metrics are needed in the process of evaluation of NLP software for use with real-world data? How to estimate the quality and utility of the software: is comparison to a gold standard enough? If so, what is the right sample size? If not, what else needs to be evaluated? How to evaluate scalability and architectural fit? How are the evaluation outcomes to be communicated to stakeholders?
Panel: Will Dixon (University of Manchester), Darren Lunn (Medicines and Healthcare products Regulatory Agency), Rob Brisk (Eolas Medical), Arlene Casey (University of Edinburgh)

13:00-13:15	Open community forum and discussions: session 2
Chair: Goran Nenadic
This is an open slot for colleagues to briefly inform the community about any ongoing or future activities, initiatives, projects, etc. It can be used to invite collaborations, highlight opportunities and challenges, etc. Every speaker will have 3 minutes.

13:15-14:00	Lunch

14:00-14:45	Keynote: Dr Yonghui Wu (University of Florida)
Opportunities and Challenges of Conversational Artificial Intelligence and Large Language Models in Healthcare
Conversational artificial intelligence (AI) built on large language models (LLMs) such as ChatGPT has surprised the world with its ability not only to communicate with humans but also to generate good-quality textual content such as presentations, emails, articles, and even computer source code, approaching human-like language processing. People are curious about its potential utility for healthcare. LLMs bring substantial opportunities and challenges not only to clinical natural language processing but also to AI in electronic health record (EHR) systems and healthcare. Based on experience in developing and applying clinical LLMs, this keynote talk will give an overview of the recent progress of LLMs, identify opportunities in using LLMs for intelligent electronic health record systems and better healthcare, and examine potential risks, bias, and disparities of this disruptive AI technology.
Chair: Richard Dobson

14:45-15:00	Break

15:00-16:00	Industry forum: Translational opportunities and challenges of healthcare generative models
Chair: Dr Dan Schofield (NHS England)
The industry panel will discuss the opportunities and challenges of generative language models in healthcare, and the roles that industry, academia and the NHS would play when these models are trained, evaluated and used to support various activities. How would these models change the landscape of healthcare text analytics? What new challenges do such models bring to industry involvement? How will we ensure privacy and fairness, in particular if the models are not tuned on local data? The panellists will be from industry, academia, governance and patient communities.
Panel: Yonghui Wu (University of Florida), Debbie Keatley (use MY data), Ben Fell (Akrivia), Maksim Belousov (Re:course AI), Richard Dobson (King’s College London)

16:00-16:15	Final remarks and close

Keynotes

The keynotes this year will naturally focus on the impact and promises of large healthcare language models. We will hear from two experts who are involved in large centres that work with clinical free-text data in the UK and the US.

Dr Angus Roberts, King's College London

From regular expressions to pre-trained language models – 14 years of applying NLP at the Maudsley Biomedical Research Centre


Dr Yonghui Wu, University of Florida

Opportunities and Challenges of Conversational Artificial Intelligence and Large Language Models in Healthcare


Panels and Forums

Four panels will discuss the main challenges in processing healthcare free-text:

PPIE forum: Co-production of clinical NLP applications with patients and public
This forum will discuss the engagement of patients and members of the public in co-production during the entire clinical NLP application lifecycle: from conception and development, to evaluation and deployment. It will focus on putting this approach into practice, identifying the opportunities and challenges. The panel will be moderated by Dr Liz Ford (Brighton and Sussex Medical School), in collaboration with the Co-Production Collective.
Industry forum: Translational opportunities and challenges of healthcare generative models
The industry panel will discuss the opportunities and challenges of generative language models in healthcare, and the roles that industry, academia and the NHS would play when these models are trained, evaluated and used to support various activities. How would these models change the landscape of healthcare text analytics? What new challenges do such models bring to industry involvement? How will we ensure privacy and fairness, in particular if the models are not tuned on local data? The forum will be moderated by Dr Dan Schofield (NHS England), with panellists from industry, academia, governance and patient communities.
Annotation guidelines: from clinical needs to textual annotations
The panel will discuss the practice of producing effective clinical annotation guidelines that both capture clinical intention and provide usable recipes for textual annotation. What are common principles and steps in developing clinical annotation guidelines? How can we re-purpose and customise annotation schemas? What lessons have we learnt so far? How can we make clinical annotation guidelines FAIR (Findable, Accessible, Interoperable, and Reusable)? The panel will be moderated by Prof Rob Stewart, King’s College London.
Towards evaluation guidelines for clinical NLP applications
This panel will focus on exploring how clinical NLP applications should be evaluated in terms of their fitness for purpose: what steps and metrics are needed in the process of evaluation of NLP software for use with real-world data? How to estimate the quality and utility of the software: is comparison to a gold standard enough? If so, what is the right sample size? If not, what else needs to be evaluated? How to evaluate scalability and architectural fit? How are the evaluation outcomes to be communicated to stakeholders? The panel will be moderated by Prof James Teo, King’s College London.

PhD forum

Matúš Falis. Can ChatGPT Generate and Code Discharge Summaries?
Nastazja Laskowski. Data Transparency and Anonymization when Sharing Clinical Study Reports: An exploration of Natural Language Processing and Statistical Disclosure Control
Ratchakrit Arreerard. Feasibility of Emotions as Features for Suicide Ideation Detection in Social Media

Research presentations

Daphne Chopard, Padraig Corcoran and Irena Spasic. Word Sense Disambiguation of Acronyms in Clinical Narratives
Arlene Casey, Emma Davidson, Claire Grover, Richard Tobin, Andreas Grivas, Huayu Zhang, Patrick Schrempf, Alison Q. O’Neil, Liam Lee, Michael Walsh, Frey Pellie, Karen Ferguson, Vera Cvero, Honghan Wu, Heather Whalley, Grant Mair, William Whiteley and Beatrice Alex. Understanding performance and reliability of NLP tools: A comparison of four NLP tools predicting stroke phenotypes in radiology reports
Ghada Alfattni, Niels Peek, Anthony Wilson and Goran Nenadic. Integrating Patients’ Medication Histories from Structured and Unstructured Data
Emma Davidson, Arlene Casey, Claire Grover, Beatrice Alex, Honghan Wu, Archie Campbell, Fionna Chalmers, Mark Adams, Matthew Iveson, Andrew M Macintosh, Emily Ball, Kristiina Rannikmae, Heather Whalley and William Whiteley. The epidemiological characteristics of stroke phenotypes defined with ICD-10 and free-text: a cohort study linked to electronic health records
Siqu Long, Shuang Zhang, Feiqi Cao, Josiah Poon and Soyeon Han. Suicide-NLU: Suicidality Detection by using Joint Intent Classification and Slot Filling
Anna-Grace Linton, Vania Dimitrova, Amy Downing, Richard Wagland and Adam Glaser. Weakly Supervised Text Classification on Free Text Comments in Patient-Reported Outcome Measures

Demos

James Brandreth, Jennifer Jiang and Anoop Shah. MiADE (Medical information AI Data Extractor): Natural language processing at the point of care
Maksim Belousov, Vladislav Yotkov. Can Digital Humans transform healthcare? (DEMO from Re:course)

Posters

Jaya Chaturvedi, Diana Shamsutdinova, Felix Zimmer, Sumithra Velupillai, Daniel Stahl, Robert Stewart and Angus Roberts. Sample Size in Natural Language Processing within Healthcare Research
Sophie Gibbons, Panagiota Kontari, Simon Pillinger, Elizabeth Ford and Ben Fell. Patient and Public Involvement Co-Development of a Route to Record-Level Data Access to Akrivia Health’s Secondary Healthcare Dataset
Arooj Hussain, Haifa Alrdahi, Hendrik Šuvalov, Lifeng Han, Goran Nenadic, Will Dixon and Meghna Jani. M3: Manchester Medication Mining – Extracting medication and related attributes from outpatient letters
Malik Ahmed. Combining Rule-Based Techniques and GPT-4 for Clinical Drug Information Extraction from SmPC Documents: A Natural Language Processing Approach to Developing Accessible and Up-to-Date Drug Databases
Thanh-Tung Nguyen, Viktor Schlegel, Abhinav Ramesh Kashyap, Hao Li and Stefan Winkler. Mimic-IV-ICD: A new benchmark for eXtreme MultiLabel Classification
Ceyda Uysal, Gloria Roque, Tarso Franarin, Luke Brydon, Sophie Gibbons and Benjamin Fell. Contextual Classification of Substance Use in Electronic Health Records
Areej Alhassan, Viktor Schlegel, Monira Aloud, Riza Batista-Navarro and Goran Nenadic. Towards Recognising Discontinuous Named Entities in Clinical Text Using a Seq2Seq Prompt-guided Model
Yang Cui, Lifeng Han and Goran Nenadic. Prompt-based Temporal Classification of Treatment Events from Discharge Summaries
Oli Delgaram-Nejad, Dawn Archer and Gerasimos Chatzidamianos. Developing the 4SC: a Small, Specialised, Spoken, Schizophrenia corpus
Matthew Coole, Paul Rayson, Paul Marshall and Fiona Lobban. iPOF: Improving Peer Online Forums Project Progress
Bernadeta Griciūtė, Lifeng Han, Hao Li and Goran Nenadic. Topic Modelling of Swedish Newspaper Articles about Coronavirus: a Case Study using Latent Dirichlet Allocation Method
Zhen Zhu. Insights From Diabetes-Related Food Reviews on an E-Commerce Platform: A Text Analytics Approach
Ivo Fins, Heather Davies, Sean Farrell and Peter-John Noble. Can ChatGPT help to tackle canine obesity? – Uncovering Body Condition Score measurements and overweight companion animals described in veterinary clinical narratives using ChatGPT. Performance comparison with a regular expression-based tool
Sean Farrell, Charlotte Appleton, Peter-John Mäntylä Noble and Noura Al Moubayed. PetBERT: Automated ICD-11 Syndromic Disease Coding for Outbreak Detection in First Opinion Veterinary Electronic Health Records
Shuai Niu and Xian Yang. Enhancing Clinical Decision Making with Interpretable Evidence for Personalized Disease Risk Prediction
Tomas Goldsack, Zhihao Zhang, Chenghua Lin and Carolina Scarton. Making Science Simple: Corpora for the Lay Summarisation of Scientific Literature
F. Dalla Serra, G. Jacenków, F. Deligianni, J. Dalton and A. Q. O’Neil. Improving Image Representations via MoCo Pre-Training for Multimodal CXR Classification
Reem Bin-Hezam and Mark Stevenson. Improving Stopping in Technology Assisted Reviews
Xu Wang, Edward Meinert, Andrea Preston and Shang-Ming Zhou. Identification of Influential Factors in Bladder Cancer: A Co-Designed Study by Utilizing Epidemiology and Machine Learning Framework on Large Electronic Health Records Cohort
Antanas Kascenas, Nicolas Pugeault and Alison Q. O’Neil. Denoising Autoencoders for Unsupervised Anomaly Detection in Brain MRI

Pre-conference events (June 14)

Workshop: Annotation of clinical NLP tools in HDR UK Gateway
This session will discuss the requirements and implementation of metadata to describe clinical NLP tools and resources in the HDR UK Gateway, as well as how the Gateway search capabilities can be improved using NLP. In the first part of the workshop, we will discuss the requirements and plans for the Gateway. In the second part, there will be two hands-on streams: one to discuss the description of clinical NLP tools, and the other to demonstrate indexing of the existing Gateway data. The workshop is co-organised with HDR UK. If you are interested in taking part in this workshop, please contact us at contact@healtex.org.
