Introduction to Machine Learning

15th September - 27th November 2023
8 half-day sessions (9.30 am to 1.00 pm)
Registration is now closed.

Fundamentals of Machine Learning for Health and Care Using R

Summary of the course

This course introduces the basic ideas and algorithms of supervised learning and implements them in the R programming language (you will need to be comfortable with using base R and the tidyverse). After a brief theoretical overview of the learning setting, the main focus will be on the practical analysis and modelling of healthcare data.

"Penny and Filippo were very good course instructors, explained everything clearly, and made some of the more complex elements easier to understand. I found the course very enjoyable and application of the insight gained will definitely be of value to my organisation."

Learning outcomes

  • To understand the concepts of machine learning for healthcare, and to compare and test a range of techniques.
  • To classify features of data sources, and to analyse and interpret the outputs of machine learning techniques in the context of practical healthcare solutions.


Session 1: Introduction

  • What is machine learning?
  • Types of machine learning.
  • Classification and regression.
  • Training and test sets.
  • Model evaluation.
  • Over-fitting.
  • Overview of machine learning algorithms.
  • No free lunch theorem.
  • Cross validation.

Session 2: Introduction to Regression

  • Simple and multiple linear regression.
  • Polynomial regression.
  • Parameter estimates.
  • Residual analysis.
  • Metrics for model evaluation.
  • Plots and predictions.
  • Feature selection.

Session 3: Data Preparation

Data analysis and pre-processing, exploratory data analysis, handling missing data.

Session 4: Feature Engineering

Feature engineering techniques including but not limited to: transformations, feature extraction, reduction and selection.

Session 5: Classification

Logistic Regression:

  • why logistic regression;
  • logistic function;
  • simple logistic regression;
  • multinomial logistic regression (tentative);
  • ROC curve;
  • feature interpretation;
  • predictions using logistic regression.

Session 6: Classification using Tree Models

Decision Trees:

  • classification using decision trees;
  • understanding and visualising decision trees;
  • advantages and disadvantages of decision trees;
  • predictions.

Random Forests:

  • from decision trees to random forests;
  • training and tuning random forests;
  • predictions.

Session 7: Regression using Tree Models

Using decision trees and random forests for regression.

Variable importance.

Session 8: Introduction to Regularisation

  • Regularisation and over-fitting.
  • Ridge penalty and LASSO penalty.
  • Elastic net.
  • Tuning regularised models.


Your training will be led by:

  • Filippo Cavallari, Data Science Lecturer, Data Science Campus, Office for National Statistics | Swyddfa Ystadegau Gwladol
  • Penny Holborn, Head of Faculty, Data Science Campus, Office for National Statistics | Swyddfa Ystadegau Gwladol

To do this course you will need to be comfortable with using base R and the tidyverse.

This course is free and available to everyone working in health and care in the Midlands, e.g. the NHS, Public Health, Local Authorities, ICBs and ICSs.

8 half-days (9.30 am to 1.00 pm), September to November 2023

Online – delivered via Zoom with a combination of delivery styles.


  • 15/09/23 (please note change in starting date)
  • 22/09/23
  • 13/10/23
  • 20/10/23
  • 27/10/23
  • 10/11/23
  • 17/11/23
  • 27/11/23

For more information about this course, please contact:

Training & Development Operational Lead, Rachel Caswell