Programme - Healtex

Wednesday, June 16th 2021

All times are GMT+1/BST.

12:30-14:30	Tutorial 1: MedCAT/CogStack
This session will introduce the MedCAT/CogStack framework, a software stack that facilitates user-friendly concept annotation and information extraction from clinical text. The tutorial will start by describing the different components of the software and finish by applying a text mining model to data.
By the end, attendees will have:
- Been introduced to the MedCAT approach and explored its main components
- Trained a model unsupervised on some texts
- Validated a MedCAT model and collected some supervised training data
- Extracted text spans and linked them with SNOMED-CT codes
- Organised the final output into a table for further analysis
Instructors: Thomas Searle and colleagues from King’s College London.

14:30-15:00	Break

15:00-17:00	Tutorial 2: Exploring Text-derived Patient Phenotype Profiles
This tutorial will introduce Komenti and discuss the construction and use of text-derived patient phenotype profiles. It will consist of learning definitions, building phenotype profiles from clinical text, and using semantic methods to employ these profiles for ranking and classification outcomes, e.g. differential diagnosis.
By the end, attendees will:
- Be able to create patient phenotype profiles from text using Komenti
- Understand the major concepts behind semantic similarity, as it applies to ranking and outcome classification with patient phenotype profiles
- Be able to create, view, and evaluate differential diagnoses for patient phenotype profiles from textual phenotype descriptions
- Have an understanding of how to start applying these concepts to different problems and contexts
Instructors: Luke Slater, Andreas Karwath, John A Williams (University of Birmingham).

Thursday, June 17th 2021

All times are GMT+1/BST.

The Zoom chat from the conference is available here.

10:00-10:10	Welcome
- Angus Roberts, HealTAC 2021 conference chair
- Ben Gordon, Executive Director, HDR UK (Hubs and Data Improvement)

10:10-10:50	PhD forum: session 1
Chair: Anoop Shah
- S. Pendleton: The patient voice as a form of clinical narrative: expanding ClinicalBERT with patient forum conversation (slides)
- M. Falis: Towards Better Use of Ontological Structure in the Evaluation of Automated ICD Coding
Panelists: Anoop Shah (University College London), Rob Stewart (King’s College London), Nick Cummins (King’s College London)

10:50-11:00	Coffee break

11:00-11:50	Research presentations: session 1
Chair: Paul Rayson
- Aurelie Mascio, Robert Stewart, Riley Botelle, Marcus Williams, Luwaiza Mirza, Rashmi Patel, Thomas Pollak, Richard Dobson and Angus Roberts: Cognitive impairments in Schizophrenia: a study in a large clinical sample using Natural Language Processing (slides)
- Laurens Rook, Maria Chiara Mazza, Iulia Lefter and Frances Brazier: Towards Personalized Linguistic Anxiety Recognition (slides)
- Fahrurrozi Rahman and Juliana Küster Filipe Bowles: Semantic Labelling for Processing Clinical Guidelines
- Kristof Anetta: Standardizing a Baseline for Mining Health Records in Low-Resourced Languages (A Case Study on Polish)

11:50-12:00	Break

12:00-13:00	Panel: Breaking the deadlock: working towards better access to clinical free text data for research
Chair: Elizabeth Ford
Better access to unstructured text stored in electronic health records is a priority, so that research questions for patient benefit that rely on such data can be answered. The text analytics community has begun to address key requirements including ongoing engagement with patients, development of a national framework for governance, and safeguards around use of free text for research. Recently, the UK Health Data Research Alliance was established to ensure best practice for ethical use of data in research, and a new review into the use of health data for research was launched. Despite this, it remains a major challenge to obtain approvals from data regulators to access unstructured data outside the NHS, due to concerns around re-identification of patients. This special session will feature a panel discussion with key stakeholders to identify crucial next steps to break the current deadlock in access to UK clinical free text for research.
Panelists:
- Patient/lay representative: Debbie Keatley, HDR UK Public Advisory Board member and use MY data member
- NLP researcher: Rebecca Bendayan, MRC/NIHR Research Fellow, King’s College London
- medConfidential: Phil Booth, Coordinator
- Research Ethics Committee panel member: Hugh Davies, Chair, South Central – Oxford REC committee
- Data custodian: Gianpiero Celino, Clinical Director, Cegedim UK group – THIN Database
Organisers: Natalie K. Fitzpatrick (University College London, HDR UK), Elizabeth Ford (Brighton and Sussex Medical School), Jean Gallagher (HDR UK Phenomics for People PPI Advisory Group), Lamiece Hassan (University of Manchester), Kerina Jones (Swansea University), Amanda Roberts (HDR UK Phenomics for People PPI Advisory Group, Nottingham Support Group for Carers of Children with Eczema), Anoop D. Shah (UCL, University College London Hospitals NHS Trust), Ayath Ullah (HDR UK Phenomics for People PPI Advisory Group), Colin Wilkinson (HDR UK Phenomics for People PPI Advisory Group)

13:00-13:15	Open community forum and discussions: session 1
Chair: Angus Roberts
This is an open slot for colleagues to briefly inform the community about any ongoing or future activities, initiatives, projects, etc. It can be used to invite collaborations, highlight opportunities and challenges, etc. Every speaker will have 3 minutes.

13:15-14:00	Lunch

14:00-14:45	Keynote: Prof Maria Liakata, Queen Mary University, The Alan Turing Institute
Opportunities, challenges and progress in longitudinal natural language processing for mental health
There has been increasing interest in processing user generated content such as language in social media posts collected over time to make predictions about individuals’ state of mind. I will give an overview of the state-of-the-art in this area, describe the challenges involved and present work in progress on developing sensors for capturing digital biomarkers from language and heterogeneous user generated content to understand the evolution of an individual over time.

14:45-15:00	Coffee break

15:00-16:00	Posters and demos: session 1
Chair: Angus Roberts
- Lama Alsudias and Paul Rayson: Detecting COVID-19 Misinformation in Arabic Tweets (slides)
- Thomas Searle, Zina Ibrahim and Richard Dobson: The Effectiveness of Online Learning for Validation and Improvement of NER+L Clinical Models
- Tim Liu, Mauricio Barahona, Robert Peach, Emma Lawrance, Ariele Noble and Mark Ungless: DeTECT: Deep Text Embeddings for Mental Health Crises using Transformer Models
- Lumina Wang and Jie Yang: Mining Twitter to investigate symptom characteristics of COVID-19
- Madalina Moise, Megna Jani and Goran Nenadic: Using Twitter to Analyse Opioid Use during the Covid-19 Pandemic
- Zhuoyu Li, Filip Makraduli, Cheng Yeung, Nicholas McQuibban, Casiana Popovici, Shujian Sun, Yan Hu, Thomas Rowlands, Joram Posma and Tim Beck: Auto-CORPus: Automated and Consistent Outputs from Research Publications
- Heather Davies, David Killick, Gina Pinchbeck and P-J Noble: Expanding adverse drug reaction terminology using Word2Vec
- Beata Fonferko-Shadrach, Huw Strafford, Cathy White, Arron Lacey and William Owen Pickrell: Using natural language processing to extract features and results from free text electroencephalography (EEG) reports
- Anna-Grace Linton, Vania Dimitrova, Amy Downing, Richard Wagland and Adam Glaser: Themes in Free Text Comments from Patient Reported Outcomes in the Setting of Chronic Illnesses: Scoping Review
- Hannah Caulfield: Can NLP be applied to Covid 19 data sets for the autonomous summarisation of scientific literature curated by REDASA?
- Álvaro Abella, Marc Asenjo, Alejandro Castrelo, Paula Chocron, Flavius Nicu and Gabriel de Maeztu: ixchel: IOMED’s framework to conduct text mining on hospital narratives (demo)

16:00-17:00	Panel: Healthcare Speech Analytics: challenges and opportunities
Chair: Nick Cummins
Embedded alongside the linguistic content of a speech signal is a rich array of health information. The analysis of the acoustic and prosodic properties of speech can aid the detection of a range of conditions, from muscular disorders through to mental health and neurodegenerative disorders. This panel will discuss the latest research in this growing and fascinating field of speech research. Topics covered will include the advantages and challenges associated with acoustic speech analysis in healthcare, and how acoustic and linguistic analysis can complement each other in future research applications.
Panelists: Dr Thanasis Tsanas (University of Edinburgh), Prof Mark Huckvale (UCL), Dr Heidi Christensen (University of Sheffield), Dr Simone Graetzer (University of Salford).

17:00-18:00	Virtual reception with prize giving

Friday, June 18th 2021

All times are GMT+1/BST.

The Zoom chat from the conference is available here.

10:00-10:15	Introduction to Day 2

10:15-11:15	PhD forum: session 2
Chair: Beatrice Alex
- AG Linton: The Role of Free Text Comments in Adding Value to Patient-Reported Outcome Measures of Cancer Patients
- J. Chaturvedi: Combining empirical and knowledge-based methods for clinical modelling of electronic health record text
- D. Harvey: The Language of Risk-Taking in Bipolar Disorder
Panelists: Beatrice Alex (University of Edinburgh), Aurélie Névéol (Université Paris-Saclay), Rob Stewart (King’s College London)

11:15-11:30	Break

11:30-12:15	Research presentations: session 2
Chair: Nick Cummins
- Jaya Chaturvedi, Aurelie Mascio, Sumithra Velupillai and Angus Roberts: Development of a Lexicon for Pain (slides)
- Tanjeb Tawhid, Philipp Cimiano and Matthias Hartung: Intensity Prediction over Health-related Quality-of-Life Variables Extracted from Self-reported Patient Narratives
- Luke Slater, Andreas Karwath, Robert Hoehndorf and Georgios Gkoutos: Effects of Negation and Uncertainty Stratification on Text-derived Patient Profile Similarity

12:15-12:30	Break

12:30-13:00	Research presentations: session 3
Chair: Fabio Rinaldi
- Dmitri Roussinov, Andrew Conkie, Andrew Patterson and Christopher Sainsbury: Predicting Clinical Events Based on Raw Text: from Bag of Words to Attention-Based Transformers (slides)
- Siyue Song, Tianhua Chen and Grigoris Antoniou: Sentiment classification and feature extraction by using multi-sense word embedding models
- Malek Djelassi, Li Yan and Aron Lagerberg: Ontology Driven Bootstrapping of Classification Models in Electronic Patient-Authored Text

13:00-13:15	Open community forum and discussions: session 2
Chair: Goran Nenadic
This is an open slot for colleagues to briefly inform the community about any ongoing or future activities, initiatives, projects, etc. It can be used to invite collaborations, highlight opportunities and challenges, etc. Every speaker will have 3 minutes.

13:15-14:00	Lunch

14:00-14:45	Keynote: Dr Aurélie Névéol, Université Paris-Saclay, CNRS, LIMSI
Responsible NLP in the making: contributions from ethics and reproducibility
This presentation will explore responsible science principles as they are currently applied in the field of biomedical Natural Language Processing. I will show how concerns for ethics and reproducibility have been contributing to the quality and vitality of research in our field. As a community, it is important to leverage this experience and to continue supporting responsible NLP research. (slides)

14:45-15:00	Break

15:00-16:00	Posters and demos: session 2
Chair: Angus Roberts
- Maryam Abdollahyan, Carol Dezateux, Louise Jones and Claude Chelala: Natural Language Processing on Breast Cancer Imaging Reports for Modelling Prognosis of Recurrence
- Minghui Li, Jie Yang, Yanhui Liao and Ling Wang: Public Mental Health Monitoring on Twitter During the COVID-19
- Micheal Abaho, Danushka Bollegala, Paula Williamson and Susanna Dodd: Detecting health outcomes from medical text records
- Marc Asenjo, Paula Chocron, Álvaro Abella, Flavius Nicu, Alejandro Castrelo and Gabriel de Maeztu: Multilingual transfer learning for medical entity contexts
- Ghada Alfattni, Niels Peek and Goran Nenadic: Integrating Structured and Unstructured Sources for Temporal Representation of Patients’ Medication Histories
- Paula Chocron, Marc Asenjo, Álvaro Abella, Flavius Nicu, Alejandro Castrelo and Gabriel de Maeztu: A Bert-based approach to data augmentation for medical NER
- Ewart J Sheldon, Anthony Shek, Mohammed Al-Agil, Vlad Dinu, Sophie E Maxey, Clodagh H McGuire, Phil Davidson and James TH Teo: Using Bidirectional Encoder Representations from Transformers (BERT) to remove personal data from healthcare records
- Hang Dong, Victor Suarez-Paniagua, Huayu Zhang, Minhong Wang, Emma Whitfield and Honghan Wu: Free texts vs. ICD codes for rare disease cohort identification: a case study of MIMIC-III discharge summaries (slides)
- Ivo Fins, Lucy Bunker, Alan Radford, Alex German and Peter-John Noble: Tackling canine obesity: development of a regular expression-based tool for uncovering overweight and obese canine patients from veterinary clinical narratives – a pilot study
- Álvaro Samuel Dobbie, Huw Strafford, Owen Pickrell, Beata Fonferko-Shadrach, Carys Jones, Ashley Akbari, Simon Thompson and Arron Lacey: Markup: A Web-based annotation tool powered by Active Learning (demo)

16:00-17:00	The 5th Healtex Industry Forum
Chair: Arron Lacey, Swansea University
This is the 5th industry forum, which will discuss the successes and challenges of healthcare text analytics in practice, aiming to understand how the community can work together to support efficient translation of research into deployable systems that can support clinical practice and research.
Panelists: Swapnil Gadgil (Therapy Box), David Milward (Linguamatics), Dennis Kehoe (AIMES), Azad Dehghan (Deep Cognito) and Álvaro Abella Bascarán (IOMED).

17:00-17:15	Final remarks and close

Keynote speakers

Dr Aurélie Névéol, Université Paris-Saclay & CNRS

Aurélie is a CNRS Researcher at LISN (formerly, LIMSI) working on clinical and biomedical Natural Language Processing. Her research interests include information extraction and knowledge representation in specialized domains. Her research addresses both methods and applications of biomedical text analysis, ranging from explorations of representation models and their cross-language or cross-domain adaptability, to the integration of representation frameworks to extract new medical knowledge from clinical text.

Prof. Maria Liakata, Queen Mary University & The Alan Turing Institute

Maria Liakata is a Turing AI fellow and Professor in Natural Language Processing (NLP) at the School of Electronic Engineering and Computer Science, Queen Mary University of London and the Department of Computer Science, University of Warwick. At the Turing she founded and co-leads the NLP and data science for mental health special interest groups and supervises PhD students. Maria is in receipt of a five-year EPSRC/UKRI Turing AI Fellowship. Her fellowship, on “Creating time sensitive sensors from user-generated language and heterogeneous content”, involves developing new methods for NLP and multi-modal data to allow the creation of longitudinal personalized language monitoring. She is also the co-PI of projects on “Mobile Sensing of Altered EveryDay Function in Early Alzheimer’s Disease (MEDEA)”, “Language sensing for dementia monitoring & diagnosis”, “Opinion summarization from social media”, and “PANACEA: An AI-enabled evidence-driven framework for claim veracity assessment during pandemics”. She leads a team of 5 postdocs and 7 PhD students.

Oral presentations

- Jaya Chaturvedi, Aurelie Mascio, Sumithra Velupillai and Angus Roberts: Development of a Lexicon for Pain
- Aurelie Mascio, Robert Stewart, Riley Botelle, Marcus Williams, Luwaiza Mirza, Rashmi Patel, Thomas Pollak, Richard Dobson and Angus Roberts: Cognitive impairments in Schizophrenia: a study in a large clinical sample using Natural Language Processing
- Tanjeb Tawhid, Philipp Cimiano and Matthias Hartung: Intensity Prediction over Health-related Quality-of-Life Variables Extracted from Self-reported Patient Narratives
- Luke Slater, Andreas Karwath, Robert Hoehndorf and Georgios Gkoutos: Effects of Negation and Uncertainty Stratification on Text-derived Patient Profile Similarity
- Laurens Rook, Maria Chiara Mazza, Iulia Lefter and Frances Brazier: Towards Personalized Linguistic Anxiety Recognition
- Siyue Song, Tianhua Chen and Grigoris Antoniou: Sentiment classification and feature extraction by using multi-sense word embedding models
- Fahrurrozi Rahman and Juliana Küster Filipe Bowles: Semantic Labelling for Processing Clinical Guidelines
- Malek Djelassi, Li Yan and Aron Lagerberg: Ontology Driven Bootstrapping of Classification Models in Electronic Patient-Authored Text
- Kristof Anetta: Standardizing a Baseline for Mining Health Records in Low-Resourced Languages (A Case Study on Polish)
- Dmitri Roussinov, Andrew Conkie, Andrew Patterson and Christopher Sainsbury: Predicting Clinical Events Based On Raw Text: From Bag of Words to Attention-Based Transformers

Posters

- Thursday, June 17th 2021
  - Lama Alsudias and Paul Rayson: Detecting COVID-19 Misinformation in Arabic Tweets
  - Thomas Searle, Zina Ibrahim and Richard Dobson: The Effectiveness of Online Learning for Validation and Improvement of NER+L Clinical Models
  - Tim Liu, Mauricio Barahona, Robert Peach, Emma Lawrance, Ariele Noble and Mark Ungless: DeTECT: Deep Text Embeddings for Mental Health Crises using Transformer Models
  - Lumina Wang and Jie Yang: Mining Twitter to investigate symptom characteristics of COVID-19
  - Madalina Moise, Megna Jani and Goran Nenadic: Using Twitter to Analyse Opioid Use during the Covid-19 Pandemic
  - Zhuoyu Li, Filip Makraduli, Cheng Yeung, Nicholas McQuibban, Casiana Popovici, Shujian Sun, Yan Hu, Thomas Rowlands, Joram Posma and Tim Beck: Auto-CORPus: Automated and Consistent Outputs from Research Publications
  - Heather Davies, David Killick, Gina Pinchbeck and P-J Noble: Expanding adverse drug reaction terminology using Word2Vec
  - Beata Fonferko-Shadrach, Huw Strafford, Cathy White, Arron Lacey and William Owen Pickrell: Using natural language processing to extract features and results from free text electroencephalography (EEG) reports
  - Anna-Grace Linton, Vania Dimitrova, Amy Downing, Richard Wagland and Adam Glaser: Themes in Free Text Comments from Patient Reported Outcomes in the Setting of Chronic Illnesses: Scoping Review
  - Hannah Caulfield: Can NLP be applied to Covid 19 data sets for the autonomous summarisation of scientific literature curated by REDASA?
- Friday, June 18th 2021
  - Maryam Abdollahyan, Carol Dezateux, Louise Jones and Claude Chelala: Natural Language Processing on Breast Cancer Imaging Reports for Modelling Prognosis of Recurrence
  - Minghui Li, Jie Yang, Yanhui Liao and Ling Wang: Public Mental Health Monitoring on Twitter During the COVID-19
  - Micheal Abaho, Danushka Bollegala, Paula Williamson and Susanna Dodd: Detecting health outcomes from medical text records (poster)
  - Marc Asenjo, Paula Chocron, Álvaro Abella, Flavius Nicu, Alejandro Castrelo and Gabriel de Maeztu: Multilingual transfer learning for medical entity contexts
  - Ghada Alfattni, Niels Peek and Goran Nenadic: Integrating Structured and Unstructured Sources for Temporal Representation of Patients’ Medication Histories
  - Paula Chocron, Marc Asenjo, Álvaro Abella, Flavius Nicu, Alejandro Castrelo and Gabriel de Maeztu: A Bert-based approach to data augmentation for medical NER
  - Ewart J Sheldon, Anthony Shek, Mohammed Al-Agil, Vlad Dinu, Sophie E Maxey, Clodagh H McGuire, Phil Davidson and James TH Teo: Using Bidirectional Encoder Representations from Transformers (BERT) to remove personal data from healthcare records (slides)
  - Hang Dong, Victor Suarez-Paniagua, Huayu Zhang, Minhong Wang, Emma Whitfield and Honghan Wu: Free texts vs. ICD codes for rare disease cohort identification: a case study of MIMIC-III discharge summaries
  - Ivo Fins, Lucy Bunker, Alan Radford, Alex German and Peter-John Noble: Tackling canine obesity: development of a regular expression-based tool for uncovering overweight and obese canine patients from veterinary clinical narratives – a pilot study

Demos

- Thursday, June 17th 2021
  - Álvaro Abella, Marc Asenjo, Alejandro Castrelo, Paula Chocron, Flavius Nicu and Gabriel de Maeztu: ixchel: IOMED’s framework to conduct text mining on hospital narratives
  - Stuart Gough: LanguageExplorer from TherapyBox
- Friday, June 18th 2021
  - Álvaro Samuel Dobbie, Huw Strafford, Owen Pickrell, Beata Fonferko-Shadrach, Carys Jones, Ashley Akbari, Simon Thompson and Arron Lacey: Markup: A Web-based annotation tool powered by Active Learning

PhD forum

- S. Pendleton: The patient voice as a form of clinical narrative: expanding ClinicalBERT with patient forum conversation
- M. Falis: Towards Better Use of Ontological Structure in the Evaluation of Automated ICD Coding
- AG Linton: The Role of Free Text Comments in Adding Value to Patient-Reported Outcome Measures of Cancer Patients
- J. Chaturvedi: Combining empirical and knowledge-based methods for clinical modelling of electronic health record text
- D. Harvey: The Language of Risk-Taking in Bipolar Disorder