M16 - From Language to Information: Natural Language Processing


Type of course

This is an on-campus course with blended learning options.


Dates

Three full days during the Easter holidays: Monday April 11, Tuesday April 12 and Wednesday April 13, 2022, from 9 am to 12 pm and from 1 pm to 4 pm.
Please note: UGent PhD students who want a reimbursement must open a dossier on the DS website (Application for Recognition) by March 11, 2022.


Venue

Faculty of Science, Site Sterre, Krijgslaan 281, building S9, 9000 Gent.


Description

In many sources of data, relevant information is conveyed by free text: this is the case, for instance, in patient records, scientific publications and social media. Because human language is informal, unlike programming languages, the computer-based extraction of structured information from natural language text is made difficult by the wide variation in how things are expressed and by the importance of context for correct interpretation. Natural Language Processing aims to design methods that address these challenges, using either human knowledge or data-driven methods. This course aims to bring participants to the level where they can independently perform text classification and extract data from text for further processing and analysis.

The course provides an introduction to Natural Language Processing, including how to handle language units such as words, phrases and sentences, and additional information such as parts of speech and syntactic structure. It introduces the most common applications of supervised machine learning to text analytics: text classification; sequence labelling for information extraction, with a focus on entity recognition and classification; and the creation and use of word embeddings and neural classifiers. Biomedical text serves as the running illustration, supported by a short introduction to the representation and processing of biomedical terminology.

Content structure:

  • Introduction to Natural Language Processing
  • Basic Natural Language Processing tools
  • Machine learning for text classification
  • Sequence labelling for information extraction
  • Biomedical terminology for entity recognition
  • Word embeddings and neural classifiers for entity recognition

Target audience

This course is aimed at professionals and investigators from diverse areas, who need to analyze information conveyed by texts. It is of particular interest to researchers, graduate students or postdocs in health-related specialities who need to analyze information conveyed by patient records, scientific publications, social media, etc.


Exam

Participants may optionally take an exam. Those who pass receive a certificate from Ghent University.
The exam consists of a take-home project assignment, for which participants write a report to be submitted by a set deadline.

Incorporation in DTP and reimbursement from DS for UGent PhD students

As a UGent PhD student, to incorporate this 'specialist course' in your Doctoral Training Program (DTP) and get a reimbursement of the registration fee from your Doctoral School (DS), you must follow strict rules, so please take the necessary action in time. The deadline to open a dossier on the DS website (Application for Recognition) for this course is March 11, 2022. Please note that opening a dossier does not mean that you are enrolled: you still need to enrol via the registration form on this site.

Please note: UGent PhD students no longer need to take or pass the exam to incorporate the course in their DTP.

Course prerequisites

Participants are expected to be familiar with the Python programming language. Some knowledge of supervised machine learning is considered a plus.


Teacher

Prof. dr. Pierre Zweigenbaum, PhD, FACMI, FIAHSI, is a Senior Researcher at LISN (Orsay, France), a laboratory of the French National Research Council (CNRS) and Université Paris-Saclay, where he led the ILES Natural Language Processing group for seven years. Before joining CNRS, he was a researcher at Paris Public Hospitals and a part-time professor at the National Institute for Oriental Languages and Civilizations. His research focuses on Natural Language Processing, with medicine as a main application domain. His research interests lie in Information Extraction in multilingual settings, and he is the author or co-author of methods and tools to detect various types of medical entities, expand abbreviations, resolve coreferences and detect relations. He has also designed methods to acquire linguistic knowledge automatically from corpora and thesauri, helping to extend monolingual and bilingual lexicons and terminologies using parallel and comparable corpora. He graduated from École Polytechnique and Télécom Paris, holds a PhD in Computer Science from Télécom Paris (1985) and a habilitation in Computer Science from Université Paris-Nord.

Course material

Slides, lab session configuration instructions and notebooks, data files; book chapter 'Advanced Literature-Mining Tools'.


Fees

The price depends on your main type of employment.

Employment                                            Module 16   Exam
Industry/Private sector¹                              1110        30
Non-profit, government, higher education staff²       835         30
(Doctoral) students, retired, unemployed²             375         30

¹ If two or more employees from the same company enrol simultaneously for this course, a 20% reduction on the module price applies from the second enrolment onwards.

² UGent staff and UGent doctoral students who pay internally (via SAP or internal transfer) can participate at these special rates.
