Healtex is a UK multi-disciplinary research network that aims to explore the barriers to effectively utilising healthcare narrative text data, and to road-map research efforts and principles for sharing text data and text analytics methods between academia, the NHS and industry. It is funded as part of the EPSRC Healthcare Technologies Grand Challenges theme.

The majority of concerted research efforts have focused on real-time processing and integration of structured data streams coming from clinical coding, diagnostic tests, sensor measurements, questionnaires, etc., to support timely clinical interventions and facilitate patients’ self-management. Nonetheless, natural language remains the main means of communication within healthcare, and its written accounts are increasingly available in electronic form. These include free-text data embedded within electronic health records (e.g. referral letters, case notes, pathology reports, hospital discharge summaries), patient-reported outcome measures (e.g. questionnaires, diaries) and unsolicited informal feedback shared openly on the Web 2.0 (e.g. social media, fora).

Unfortunately, the capacity to effectively utilise information from unstructured text data at scale lags behind that for structured data. The aim of our network is to enable research that will deploy healthcare narratives as real-time sensors and integrate them with structured data streams into a patient-focused collaborative ecosystem.

The network's objectives are to:
  • Establish a strong, collaborative and sustainable UK community in healthcare text analytics by bringing together partners from academia, industry, healthcare services and policy makers, with collaborative links internationally, and integrate it into the growing health informatics community.
  • Identify and understand clinical, technical, ethical and legal barriers, challenges and unmet needs through a forum of clinicians, information guardians, patients, carers and health informaticians, and align those with other efforts in healthcare sciences, in particular in the area of actionable data analytics.
  • Identify key underpinning engineering research challenges, and facilitate road-mapping through a series of feasibility studies, datathons and hackathons.
  • Explore and build the means to share data, methodologies and tools through a secure health informatics research platform, building on existing developments in healthcare data science research such as those within the Farr Institute of Health Informatics Research.
  • Organise a series of project proposal workshops across the UK, bringing together the healthcare, industry and academic sectors to address key issues in harnessing and making use of healthcare text data. These workshops will result in a number of EPSRC, MRC and NIHR project proposals and discipline-bridging fellowship proposals.


Challenge streams

The challenges in healthcare text analytics span several engineering and healthcare areas. Given the current state of the art, we can identify five streams of work needed to facilitate research in this area.

A. User requirements, needs and barriers

Different communities and sectors in healthcare have different practices in producing narrative data, and various clinical research questions will trigger diverse text analytics needs. Within the network, we will work with several user groups involving clinicians, patients/carers and information custodians. We will also consider GP-produced free-text data available in the Clinical Practice Research Datalink (CPRD), building on several pilot studies already carried out by network members. By looking across different conditions, we will be better positioned to understand what the user requirements are and what models of involvement are required.

B. Knowledge-intensive text mining and NLP

Due to the lack of reliable and sufficiently large training datasets, current systems for clinical text analysis often rely on knowledge-based approaches, and specifically on numerous terminological resources. However, one of the main bottlenecks is that these resources were designed to support other healthcare processes and needs rather than the analysis of healthcare narratives.

C. Data-driven text mining and NLP

Statistical and machine-learning approaches have been successful in many domains, but the lack of shared, annotated datasets in healthcare is hindering the use of adaptable data-driven approaches. This stream will look at the barriers in building data-driven models of healthcare languages and how existing solutions can be efficiently adapted to new environments using an agile approach. We will also explore annotation standards that training data should follow.

D. Privacy, confidentiality and data availability

While there is a clear need and demand for sharing healthcare data, there is also a real concern about the disclosure of Protected Health Information (PHI) from clinical/care records and patient-generated data. PHI, often defined differently by different organisations, usually includes not only personal information but also individual-level health status, family context and/or details of care. De-identification of unstructured data is particularly challenging, as PHI can appear virtually anywhere in a clinical narrative or letter. Sharing healthcare narrative datasets is therefore a key legal and ethical bottleneck for healthcare text analytics and its use in health data science. This stream aims to identify the barriers to text data sharing and to explore what research and legislative work is needed.

E. Actionable healthcare analytics: integration of narratives with other data streams

Actionable healthcare analytics aims to use routinely collected data from EHRs to identify missed opportunities and learn care patterns, which can then be used for better decision-making by clinicians, patients and commissioners. Information extracted from healthcare narratives brings added value to the clinical/care ecosystem only once it is integrated with other data streams. We will coordinate this theme with network partners from across the Farr Institute to co-develop a research roadmap on how to interface with other areas and data types to support healthcare discovery science.

The current team behind Healtex:

| Name | Affiliation | Role | Contact |
|---|---|---|---|
| Dr Goran Nenadic | Reader in Text Analytics, University of Manchester and | PI; theme D co-lead | gnenadic@manchester.ac.uk |
| Prof Robert Stewart | Professor of Psychiatric Epidemiology & Clinical Informatics, King’s College London | CI; theme A co-lead | |
| Prof Robert Gaizauskas | Professor in Computer Science, University of Sheffield | CI; theme B co-lead | |
| Prof Irena Spasic | Professor of Computer Science, Cardiff University | CI; theme C co-lead | |
| Prof David Robertson | Chair of Applied Logic, University of Edinburgh | CI; theme E co-lead | |
| Prof Jackie Cassell | Professor of Primary Care Epidemiology, Brighton and Sussex Medical School | Theme A co-lead | |
| Prof Jane Kaye | Professor of Health, Law and Policy, University of Oxford | Theme D co-lead | |
| Dr Nigel Collier | Director of Research, University of Cambridge | Theme C co-lead | |
| Dr Claire Grover | Senior Research Fellow, University of Edinburgh | Theme B co-lead | |
| Dr Niels Peek | Reader in Health Informatics, University of Manchester, Farr@HeRC | Theme E co-lead | |
| Dr Azad Dehghan | Healtex Network Manager, University of Manchester | Network Manager | |