HDR UK National Text Analytics Workshop (March 2021)

On 12 March 2021, more than 85 people from across the UK text analytics community came together to discuss the challenges of accessing unstructured text for research, explore new opportunities, and see how existing text analytics and NLP data extraction tools could be used in their research.

The event was organised by the HDR UK National Text Analytics Project, which is funded by Health Data Research UK (HDR UK) and led by Prof Richard Dobson and Dr Angus Roberts. The project is helping to build the UK’s natural language processing (NLP) community for healthcare by making shared tools, methods and datasets available across the NHS, creating richer, more useful clinical information to improve healthcare.

The full report from the event, including the list of speakers and presented tools, and discussion summaries and next steps, is available here.


Post-HealTAC 2020 publications (January 2021)

Following the HealTAC 2020 conference and an open call for contributions, the Research Topic on “Healthcare Text Analytics: Unlocking the Evidence from Free Text” is now available in Frontiers in Digital Health. It includes the following papers:

  • Walsh et al: “Spontaneously generated online patient experience of Modafinil: A qualitative and NLP analysis”
  • Ford et al: “Toward an Ethical Framework for the Text Mining of Social Media for Health Research: A Systematic Review”
  • Karystianis: “Utilizing text mining, data linkage and deep learning in police and health records to predict future offences in family and domestic violence”
  • Ford et al: “The potential of research drawing on clinical free text to bring benefits to patients in the United Kingdom: A systematic review of the literature”
  • Sarker et al: “A Light-Weight Text Summarization System for Fast Access to Medical Evidence”
  • Zhang et al: “Applying Artificial Intelligence Methods for the Estimation of Disease Incidence: The Utility of Language Models”
  • Newman-Griffis and Fosler-Lussier: “Automated Coding of Under-Studied Medical Concept Domains: Linking Physical Activity Reports to the International Classification of Functioning, Disability, and Health”



Hands-on MedCAT/CogStack training course (September 2020)

On 10 and 11 September 2020, University College London and King’s College London jointly ran an introductory MedCAT and CogStack training course for UK data scientists. The event was initiated through discussions between Healtex and Health Data Research UK (HDR UK), and was supported by funding from HDR UK, the Maudsley Biomedical Research Centre and the Maudsley Charity.

The Medical Concept Annotation Toolkit (MedCAT) can be used to extract information from electronic health records (EHR) and link it to biomedical ontologies like SNOMED CT and UMLS. CogStack is an information retrieval and extraction platform that implements open-source enterprise search, natural language processing, analytics, and visualisation technologies to unlock EHR and assist in clinical decision-making and research. Eighteen people attended the training, which introduced how to use CogStack and MedCAT to organise, structure and analyse EHR, and showed how to set up a CogStack and MedCAT environment, configure pipelines and carry out simple data exploration and presentation. A number of use cases for MedCAT and CogStack were also demonstrated. The training will be run again in Spring 2021 and details will be advertised on the IHI website.



New Healtex publications (June 2020)

Here are some new publications from the Healtex feasibility studies and working groups:

  • Ive J, Viani N, Kam J, et al. Generation and evaluation of artificial mental health records for Natural Language Processing. NPJ Digit Med. 2020;3:69. doi:10.1038/s41746-020-0267-x (link)
  • Spasic I, Nenadic G: Clinical Text Data in Machine Learning: Systematic Review. JMIR Med Inform. 2020;8(3):e17984. doi:10.2196/17984 (link)
  • Ford E, Oswald M, Hassan L, Bozentko K, Nenadic G, Cassell J: Should free-text data in electronic medical records be shared for research? A citizens’ jury study in the UK. Journal of Medical Ethics 2020;46:367-377 (link)
  • Jones KH, Ford EM, Lea N, Griffiths L, Hassan L, Heys S, Squires E, Nenadic G: Towards the development of data governance standards for using clinical free-text data in health research: a position paper. Journal of Medical Internet Research. 23/03/2020:16760, DOI: 10.2196/16760 (link)



Healtex newsletter (December 2019)

A new Healtex newsletter features the following stories:

  • HealTAC 2020
  • HDR UK – National Text Analytics Project
  • Healtex early career research network – supporting future leaders
  • Call for dissemination and outreach events (31st January 2020)
  • New Healtex feasibility studies
  • Past events



Manchester Digital Epidemiology Summer School (July 2018)

Please note that this event has been postponed to November 2020.

The Centre for Epidemiology Versus Arthritis, one of the Healtex partners, is hosting the Manchester Digital Epidemiology Summer School, which is an exciting 3-day course where participants will learn all about how to capture and use digital health data to support high-quality epidemiological research.

The programme covers opportunities, challenges and methods across a variety of data types: from data in electronic health records (including free-text data) to data collected through smartphones and wearable devices. An internationally renowned and multidisciplinary faculty will deliver interactive seminars on cutting-edge methods for collecting and analysing digital health data in the context of epidemiological studies. They will use real-life examples to demonstrate these methods in action.

The summer school is intended for clinical epidemiologists, health informaticians, health data scientists, and clinicians with an interest in epidemiology and/or data science. Delegates from academic, industry and other backgrounds are welcome.

More details and registration here.


Healtex newsletter (April 2019)

A new Healtex newsletter features the following stories:

  • HealTAC 2019
  • Call for dissemination and outreach events (15th May 2019)
  • New Healtex feasibility studies
  • Past events
  • Healtex early career research network – supporting future leaders



Un-tapping key patient information from free-text data (April 2019)

In a new article published in Open Access Government, Goran Nenadic argues for using patient information stored in routinely collected healthcare free-text data.


Sharing your free-text healthcare data safely (March 2019)

Clinical information written in patients’ medical notes contains a wealth of detail which could be unlocked for research to improve health and wellbeing. Researchers are often refused access to this written text because of concerns about privacy breaches. Healtex, the research network for healthcare text analytics, is seeking to draw up a national framework for data governance and safeguards for the acceptable use of medical free-text data from patient records within research for public benefit.

This workshop focuses on gaining feedback from the public on a range of safeguards which could be put in place to enable your medical data to be shared for research, whilst keeping your identity and privacy safe.

The workshop, co-organised by The Alan Turing Institute and Healtex, takes place in London on March 28th, 2019. Applications and details are available here.



Citizens’ jury on using free-text healthcare data for research (April 2019)

Healtex, Brighton and Sussex Medical School and Citizens’ Juries c.i.c. organised a three-day citizens’ jury in June 2018 to understand whether, and under what conditions, the public would accept medical free-text data being used for research. The jury of 18 people, comprising a cross-section of the public, learnt from expert witnesses over three days about structured and unstructured health data, and the challenges, needs and opportunities in processing free-text data. The jury deliberated together, exploring the complexities and potential trade-offs between privacy and the public good, and reached reasoned conclusions about whether and under what circumstances use of free-text data for medical research can be justified. This was the first study to ask the public what they think about de-identified clinical narrative being used for secondary purposes.

The outcomes of the jury are available here and include short and full reports, a video and a brief leaflet.



Working group: Text Mining Radiology Reports (March 2019)

At the beginning of March, we held a Healtex working group meeting for Text Mining Radiology Reports in the Bayes Centre in Edinburgh. This was an excellent opportunity to bring together teams with a shared focus on improving health services through a better understanding of radiology reports. Each team brought different insights into the challenges they face. We discussed several topics, including: standardising reports and their annotation; linking annotations to ontologies, e.g. UMLS or SNOMED CT; comparing rule-based, machine learning and deep learning methods; data governance protocols; working with systems on locked-down infrastructure; and the availability of different tools and knowledge resources. More details are available here.


Processing veterinary electronic health records – the SAVSNET project (February 2019)

Our partners from SAVSNET and HeRC have released a short video explaining how they use veterinary electronic health records to monitor disease trends over time and to provide data resources for academics and others to improve the health of our pets.



Working group: NLP for mental health research (January 2019)

The first meeting of the Working group on Natural Language Processing (NLP) for Mental Health Research was held in London on Jan. 15th 2019. In this meeting, the group discussed ongoing projects in this area and identified challenges and potential collaboration points for advancing the field. In the UK, there are several research groups working in this area, and we hope for this working group to become a place for sharing experiences and enabling further cross-institutional collaborations. More details are available here.



Healtex article (December 2018)

A new article summarising the need for processing free-text data in healthcare and the aims of Healtex has been published.



Healtex newsletter (November 2018)

A new Healtex newsletter features the following stories:

  • HealTAC 2019 – keynote speakers and call for contributions
  • Call for dissemination and outreach events (14th January 2019)
  • New Healtex feasibility studies
  • Past events
  • Healtex early career research network – supporting future leaders



New feasibility studies (November 2018)

We are very pleased to announce the successful proposals from the second round of feasibility studies funded by Healtex. We received eight proposals on a range of relevant topics. The panel (with representatives from academia, NHS and industry) decided to support the following studies:

  • Conceptualising and Quantifying Social Media Signal Strength Relating to Non-Adherence in the Treatment of Depression (A. Belz, E. Ford, D. Weir, H. van Marwijk, J. Cassell)
  • Towards Shareable Data in Clinical Natural Language Processing: Generating Synthetic Electronic Health Records (J. Ive, S. Velupillai, N. Viani, A. Roberts, R. Stewart, S. Puntis, W.O. Pickrell, R.N. Cardinal)
  • AuTomated prioRitisation and categorIsation of sAfety and PharmacoviGilance Events in CTIMPs (TRIAGE) (I. Spasic, M. Busse, A. Balinsky, D. Owen, C. Johnson)
  • Developing data governance for using free-text data in research (TexGov) (K.H. Jones, E. Ford, N. Lea, D. Ford, S. Thompson, A. Lacey)
  • Automated coding of Free Text to Clinical Ontologies (A. Lacey, S. Thompson, O. Pickrell, B. Fonferko-Shadrach, S. Dobbie, A. Roberts)



SAVSNET’s 10th birthday (October 2018)

To mark ten years of The Small Animal Veterinary Surveillance Network (SAVSNET), an event was organised to showcase the research being done using electronic health data from veterinary practices and laboratories, and how it is informing primary veterinary practice. Much of the data that SAVSNET collects is in free text, which is why text analytics is an integral part of their processing workflow. Details of the event are available here and the HeRC blog is here.



Healtex datathon on ADR identification from Social Media (Sept 2018)

Healtex and HealthUnlocked co-organised a datathon on adverse drug reaction (ADR) mining from social media posts, which took place in London on September 28th, 2018. A team of 17 participants from the universities of Manchester, Sheffield, Brighton and Sussex, along with the hosts from HealthUnlocked, spent the full day analysing and discussing the current state of the art and challenges in identifying key elements of automated pharmacovigilance using social media.


The datathon was part of a Healtex feasibility study that’s looking into how much information required for Yellow Card reports is present in social media posts. Yellow Card is a scheme used by the Medicines and Healthcare products Regulatory Agency (MHRA) to monitor the safety of all healthcare products in the UK.

The participants, including clinicians, text and data miners, qualitative researchers and user engagement experts, worked with an annotated dataset of 200 posts from five communities present in HealthUnlocked to establish the feasibility of automated extraction of the core Yellow Card data (e.g. treatment indication, drug, side effect, severity and outcome). After an exciting and exhausting day of coding and discussions, we were able to identify a number of challenges that would require additional work before the results of text mining of patient-generated data can be used in real-world applications. The participants also looked at the wider context of how to support better and more efficient reporting of ADRs to regulatory bodies.

Here is a brief video that reflects on the datathon aims.




HealTAC 2018 (April 2018)

HealTAC 2018 was a huge success – we had almost 100 attendees gathered for a busy 2-day event at the Manchester Conference Centre. The conference featured two excellent keynotes from leading experts in healthcare text analytics, nine research paper presentations, 15 posters, two panels (gaining public trust in healthcare text analytics and mining veterinary clinical records), and an industry forum (with key players from the UK and internationally) with seven demo sessions for various software solutions. We also had a PhD forum where early career researchers presented their projects and received feedback from an expert panel and the audience. The forum was followed by an excellent career talk by Prof Wendy Chapman.

See what people were saying on Twitter and in blogs.



Healtex newsletter (April 2018)

A new Healtex newsletter features the following stories:

  • Second call for feasibility studies (deadline: 1st June 2018)
  • Healtex Governance Working Group – call for governance case studies
  • Healtex early career research network – supporting future leaders
  • Sharing clinical text analytics methods and algorithms: a GATE-based hackathon (May 16-17, 2018)
  • Datathon on Identification of Adverse Drug Reactions from Social Media (July 11-12, 2018 – tbc)



Healtex early career researcher (HECR) network (February 2018)

Healtex aims to bring together early career researchers (PhD students, post-doctoral researchers, early career fellows, etc.) whose research is in any aspect of healthcare text mining in an informal network. We will provide logistic support to the network’s activities, which might include hackathons, support for sharing tools/data, fellowship grant preparation, etc. We’d also like to better understand the scope of current PhD projects in the community by presenting relevant projects and future leaders.

Please join the activities by filling in a brief questionnaire at Register your interests.



Medical Confidentiality: When is it OK to use your Patient Records in Research (January 2018)

Healtex and the Brighton and Sussex Medical School organised an event to discuss the following questions with members of the public: Do you know what’s in your medical record? Have you ever thought how information about your care and outcomes might help others? Where should we draw the lines between privacy and making use of data for better care? Some data is harder to make anonymous, such as letters written between GPs and specialists, scan reports, and doctors’ notes about patients’ symptoms, and so is often not included in data sets used by researchers. We wanted to know what the public thinks about such harder-to-anonymise medical data being used for research. The event involved learning more about research using NHS patient data through presentations from medical researchers, doctors, law and public engagement specialists, as well as roundtable discussions.



Feasibility studies (November 10 2017)

We are very pleased to announce the successful proposals from the first round of feasibility studies funded by Healtex. We received five proposals on a range of relevant topics. The panel (with representatives from academia, NHS and industry) decided to support two excellent studies:

  • A citizens’ jury study to understand whether, and under what conditions, the public would accept medical free text being used for research (E. Ford, M. Oswald, L. Hassan, J. Cassell, J. Stockdale)
  • Feasibility of text-mining to support nudging of real-time side effect reporting to drug regulators within the online health social network ‘HealthUnlocked’ (W. Dixon, G. Nenadic, A. Bulcock, M. Evans, A. Anand)

The second call for feasibility studies will be announced early in 2018, with a deadline for applications in May 2018.



Workshop: “Extracting evidence from clinical free text: opportunities and challenges” (April 25 2017)

As part of the Informatics for Health 2017 conference, we organised a successful workshop with around 70 attendees to discuss the opportunities and challenges in clinical free-text analytics. The workshop started with four short presentations that introduced examples of previous and current projects, followed by discussions in smaller groups. The groups then fed back on key opportunities and challenges in clinical text analytics, focusing on technical, ethical and legal barriers and unmet needs, and aligning those with other efforts in healthcare sciences. The workshop also discussed the state of clinical text analytics across the world.

  • Introduction
  • Goran Nenadic; University of Manchester, Health e-Research Centre, Farr Institute
  • Robert Stewart; King’s College London
  • Johannes Starlinger; Humboldt-Universität zu Berlin
  • Sumithra Velupillai; KTH, Stockholm and King’s College, London


Tutorial on “Healthcare Text Analytics – Analysing Free-Text Health Data” (April 22 2017)

A group of 20 clinicians, NLP researchers and epidemiologists attended the tutorial, organised as a pre-conference event for the Informatics for Health 2017 conference in Manchester. It explained the main steps in text mining and discussed challenges and opportunities for healthcare text analytics.


Contact: G. Nenadic or G. Demetriou (University of Manchester)


Ready to start – Healtex tutorial at the Informatics for Health 2017

Veterinary text mining workshop (25th November 2016)

The workshop focused on the state-of-the-art in veterinary text mining. It was co-organised by the Centre for Evidence-based Veterinary Medicine @ Nottingham, SAVSNET and Healtex.



Launch of Healtex (14th November 2016)

The UK healthcare text analytics network (Healtex) was launched on November 14th with an event held in the Manchester Museum of Science and Technology. The event was attended by over 60 participants from a number of universities, NHS trusts, industry, regulators and funders.

The press release is available here.



Unmet needs and challenges in clinical text analytics (15th November 2016)

The UK healthcare text analytics network (Healtex) organised a workshop on unmet needs and challenges in clinical text analytics. The event was held at Farr@HeRC and was attended by over 40 participants from a number of universities, NHS trusts, industry, regulators and funders.