Statistics Seminars are held on Wednesdays 14:00 – 15:00. Everyone is welcome!  We gather for coffee/tea and biscuits around 15 minutes before the seminar begins.

The organiser is Dr Giorgos Minas.  Please contact Giorgos to find out more about the seminars, to suggest a future seminar speaker, or to request joining seminars online.

Statistics seminar dates alternate with the CREEM seminar series. Some seminars are joint with CREEM.

Some of the seminars this year will be held in-person and some online. The in-person seminars will be held in either the Mathematical Institute or the Observatory seminar room. Please see below for more details.

Forthcoming statistics seminars

We are currently preparing the seminar series for academic year 2023-24 – watch this space!
The following seminars were postponed from last academic year, and will be rescheduled:
  • Blanca Sarzo Carles, Valencia University (joint CREEM/Stats Seminar)
  • Maria Kiladi, UCL (Maths History interest)
  • Catriona Keerie, University of Edinburgh Medical School

Past seminars


  • Wed, Apr-26, 2-3pm: Magnus Rattray, University of Manchester
Title: Gaussian process methods for modeling temporal and spatial gene expression changes
Abstract: Gaussian process (GP) inference provides a flexible nonparametric probabilistic modelling framework that is well suited to modelling spatial and temporal data. I will provide a brief tutorial introduction to GP inference and present some applications to problems in single-cell and spatial transcriptomics data analysis and modelling. GPs can be used to incorporate prior knowledge into trajectory inference from single-cell data, e.g. to model periodic trajectories or to include capture time labels into trajectory inference. GPs also provide a natural framework for modelling branching trajectories and can be used to infer the ordering of gene branching events along pseudotime. RNA-Seq data are typically summarised as counts and we have implemented a negative binomial likelihood that can be used to improve the performance of GP inference methods.
[1] BinTayyash, N., Georgaka, S., John, S. T., Ahmed, S., Boukouvalas, A., Hensman, J., & Rattray, M. (2021). Non-parametric modelling of temporal and spatial counts data from RNA-seq experiments. Bioinformatics, 37(21), 3788-3795.
[2] Ahmed, S., Rattray, M., & Boukouvalas, A. (2019). GrandPrix: scaling up the Bayesian GPLVM for single-cell data. Bioinformatics, 35(1), 47-54.
[3] Boukouvalas, A., Hensman, J., & Rattray, M. (2018). BGP: identifying gene-specific branching dynamics from single-cell data with a branching Gaussian process. Genome biology, 19, 1-15.
About the speaker: Magnus Rattray is Professor of Computational & Systems Biology, and Director of the Institute for Data Science & Artificial Intelligence at the University of Manchester. He is co-president of the Machine Learning for Computational Systems Biology (MLCSB) COSI of the major international bioinformatics conference ISMB and is a Fellow of the European Laboratory of Learning and Intelligent Systems (ELLIS) associated with the ELLIS Health Programme. His research group is funded by the Wellcome Trust, UKRI, CRUK and the EU. Magnus uses probabilistic modelling and Bayesian inference techniques to study biological systems across a broad range of temporal and spatial scales, from gene expression in single cells to longitudinal population health data. Recent work includes methods to uncover oscillations from single-cell imaging time course data and the development of scalable Gaussian process models for pseudotime and branching process inference using single-cell omics data.
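The exact GP regression machinery introduced in the tutorial part of the abstract can be sketched in a few lines. This is a generic illustration rather than code from the speaker's papers; the RBF kernel, noise level and toy sinusoidal "expression profile" are all illustrative assumptions.

```python
import numpy as np

def rbf(x1, x2, lengthscale=1.0, variance=1.0):
    """Squared-exponential (RBF) covariance between two sets of 1-D inputs."""
    d = x1[:, None] - x2[None, :]
    return variance * np.exp(-0.5 * (d / lengthscale) ** 2)

def gp_posterior(x_train, y_train, x_test, noise=0.1):
    """Exact GP regression: posterior mean and variance at the test inputs."""
    K = rbf(x_train, x_train) + noise**2 * np.eye(len(x_train))
    Ks = rbf(x_train, x_test)  # train-test cross-covariance
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))
    mean = Ks.T @ alpha
    v = np.linalg.solve(L, Ks)
    var = np.diag(rbf(x_test, x_test)) - np.sum(v**2, axis=0)
    return mean, var

# Toy "expression profile": a smooth periodic signal observed at 8 time points
t = np.linspace(0, 10, 8)
y = np.sin(t)
t_new = np.linspace(0, 10, 50)
mu, var = gp_posterior(t, y, t_new)
```

The Cholesky-based solves are the standard numerically stable way to compute the GP posterior without forming an explicit matrix inverse.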
  • Wed, Mar-1, 2-3pm: Rachel McCrea, Lancaster University

Title: Statistical models for conservation translocations

Collaborative work with: Katie Bickerton (University of Kent), Fay Frost (Lancaster University), Stefano Canessa (Bern University), John Ewen (ZSL)

Abstract: Conservation translocations are being increasingly used in the conservation of threatened species and as part of ecological restoration programmes (Bickerton et al, 2022). Robust estimates of abundance are essential for meaningful conservation decision-making and the impact of translocations on source populations needs to be understood.  Within this talk I will present a new capture-recapture model for translocated populations and will then present a modelling framework where capture-recapture is combined with removal/depletion methodology (Zhou et al, 2019).  I will demonstrate that an exact likelihood is possible when individual level information is available, and I will show how a standard integrated population modelling approach (Frost, et al, 2022), which assumes independence between component data sets, can be adapted to provide an approximate likelihood when individual level data is not available.  This approach, as well as providing a valuable tool for estimating the abundance of source populations post translocation, also motivates a new direction of research for overcoming issues of dependence of data within a standard integrated population modelling framework.

Bickerton, K., Ewen, J.G., Canessa, S., Cole, N., Frost, F., Mootoocurpen, R. and McCrea, R.S. (2022) Estimating population size of translocated populations: a modification of the Jolly-Seber model. Submitted.

Frost, F., McCrea, R.S., King, R., Gimenez, O. and Zipkin, E. (2022) Integrated population models: Achieving their potential. Journal of Statistical Theory and Practice. In press.

Zhou, M., McCrea, R.S., Matechou, E., Cole, D.J. and Griffiths, R.A. (2019) Removal models accounting for temporary emigration. Biometrics, 75, 24-35.

About the speaker: Rachel McCrea is a Professor of Statistics at Lancaster University. She is currently the Director of the National Centre for Statistical Ecology. She was also elected Fellow of the Learned Society of Wales.

Professor McCrea’s research includes developing new methods for model selection and diagnostic testing for a class of models which are fitted to capture-recapture data. Capture-recapture studies involve the capture and unique marking of wild animals, which are then released back into the population; subsequently attempts are made to recapture them. Models can be used to estimate demographic parameters which are essential to understand the viability of the populations and drivers of population change.

More recently Professor McCrea’s research has moved into the related field of multiple systems estimation and she has worked on problems such as estimating the number of victims of human trafficking. She is the Sciences Theme Lead for the Lancaster University Migration and Movement Signature Research Theme.
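The removal/depletion idea in the abstract can be illustrated with a minimal sketch: a closed population of unknown size N in which every animal still present is caught with probability p on each occasion, and caught animals are removed. The grid-search maximum-likelihood fit, the catch numbers and the function names are illustrative assumptions, not the exact-likelihood machinery of the talk.

```python
import numpy as np
from math import lgamma

def removal_loglik(N, p, catches):
    """Closed-population removal model: each animal still present is caught
    with probability p on each occasion, and caught animals are removed."""
    T, R = len(catches), sum(catches)
    if N < R:
        return -np.inf
    # multinomial combinatorial term
    ll = lgamma(N + 1) - lgamma(N - R + 1) - sum(lgamma(c + 1) for c in catches)
    for t, c in enumerate(catches):
        ll += c * (np.log(p) + t * np.log(1 - p))  # first caught on occasion t+1
    ll += (N - R) * T * np.log(1 - p)              # never caught
    return ll

def removal_mle(catches, N_max=1000):
    """Maximise the likelihood over integer N and a grid of capture probabilities."""
    best_ll, best_N, best_p = -np.inf, None, None
    for N in range(sum(catches), N_max + 1):
        for p in np.linspace(0.05, 0.95, 91):
            ll = removal_loglik(N, p, catches)
            if ll > best_ll:
                best_ll, best_N, best_p = ll, N, p
    return best_N, best_p

# Declining catches consistent with depletion of a closed population
N_hat, p_hat = removal_mle([120, 70, 45, 28])
```

The steadily shrinking catches identify both the capture probability (via the depletion rate) and the total population size.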

  • Wed, Feb-22, 3-4pm: Victor Elvira, University of Edinburgh 

Title: State-Space Models as Graphs

Abstract: Modeling and inference in multivariate time series is central in statistics, signal processing, and machine learning. A fundamental question when analyzing multivariate sequences is the search for relationships between their entries (or the modeled hidden states), especially when the inherent structure is a directed (causal) graph. In such contexts, graphical modeling combined with parsimony constraints allows us to limit the proliferation of parameters and enables a compact data representation that is easier to interpret in applications, e.g. in inferring causal relationships between physical processes in a Granger sense. In this talk, we present a novel perspective in which state-space models are interpreted as graphs. Then, we propose two novel algorithms that exploit this perspective for the estimation of the linear matrix operator in the state equation of a linear-Gaussian state-space model. Finally, we discuss extending this perspective to the estimation of other model parameters in more complicated models.
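The central object in the abstract, the linear matrix operator in the state equation whose sparsity pattern encodes a directed (Granger-causal) graph, can be illustrated with a toy sketch. A plain least-squares estimate stands in for the algorithms of the talk; the transition matrix, noise level and edge threshold below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Sparse transition matrix: its nonzero pattern is the directed causal graph
A_true = np.array([[0.8, 0.0, 0.0],
                   [0.5, 0.7, 0.0],
                   [0.0, 0.4, 0.6]])

# Simulate a linear-Gaussian state equation: x_{t+1} = A x_t + noise
T, d = 2000, 3
x = np.zeros((T, d))
for t in range(T - 1):
    x[t + 1] = A_true @ x[t] + 0.1 * rng.standard_normal(d)

# Least-squares estimate of A, then threshold small entries to read off edges
X0, X1 = x[:-1], x[1:]
A_hat = np.linalg.lstsq(X0, X1, rcond=None)[0].T
graph = np.abs(A_hat) > 0.2
```

With enough data the recovered support of `A_hat` matches the true graph, which is the sense in which the state-space model "is" a graph.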

  • Wed, Jan-18, 2-3pm: Dr Chih-Li Sung, Assistant Professor, Department of Statistics and Probability, Michigan State University
Title: When epidemic models meet statistics: understanding COVID-19 outbreak
Abstract: As the coronavirus disease 2019 (COVID-19) has shown profound effects on public health and the economy worldwide, it becomes crucial to assess the impact on virus transmission and to develop effective strategies to address the challenge. A new statistical model derived from the SIR epidemic model with functional parameters is proposed to understand the impact of weather and government interventions on the virus spread in the presence of asymptomatic infections among eight metropolitan areas in the United States. The model uses Bayesian inference with Gaussian process priors to study the functional parameters nonparametrically, and sensitivity analysis is adopted to investigate the main and interaction effects of these factors. This analysis reveals several important results, including the potential interaction effects between weather and government interventions, which shed new light on effective strategies for policymakers to mitigate the COVID-19 outbreak.
Bio: Chih-Li Sung is an Assistant Professor in the Department of Statistics and Probability at Michigan State University. His research interests include computer experiments, uncertainty quantification, machine learning, big data, and applications of statistics in engineering. He was awarded the Statistics in Physical Engineering Sciences (SPES) Award from the ASA in 2019. He is currently an associate editor for Technometrics and Computational Statistics & Data Analysis (CSDA). His research is supported by NSF DMS 2113407.
Chih-Li Sung received a Ph.D. at the Stewart School of Industrial & Systems Engineering at Georgia Tech in 2018. He was jointly advised by Profs. C. F. Jeff Wu and Benjamin Haaland. He also received a B.S. in applied mathematics and an M.S. in statistics from National Tsing Hua University in Taiwan in 2008 and 2010, respectively.
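For context, the deterministic SIR backbone underlying the model in the abstract can be sketched with constant (rather than functional, GP-distributed) parameters; all numbers here are illustrative assumptions, not fitted values from the paper.

```python
import numpy as np

def sir(beta, gamma, s0, i0, days, dt=0.1):
    """Forward-Euler integration of the SIR ODEs (S, I, R as fractions):
    dS/dt = -beta*S*I,  dI/dt = beta*S*I - gamma*I,  dR/dt = gamma*I."""
    s, i, r = s0, i0, 1.0 - s0 - i0
    traj = []
    for _ in range(int(days / dt)):
        new_inf, rec = beta * s * i, gamma * i
        s, i, r = s - dt * new_inf, i + dt * new_inf - dt * rec, r + dt * rec
        traj.append((s, i, r))
    return np.array(traj)

# Basic reproduction number R0 = beta/gamma = 2.5: growth, peak, then decline
traj = sir(beta=0.5, gamma=0.2, s0=0.99, i0=0.01, days=200)
peak_infected = traj[:, 1].max()
```

In the talk's model, beta and gamma become unknown functions of time (and of covariates such as weather and interventions) with Gaussian process priors, rather than the fixed constants used here.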
  • 16-Nov, in-person talk, Dr Pantelis Samartsidis, Investigator Statistician, MRC Biostatistics Unit, University of Cambridge

Title: A Bayesian multivariate factor analysis model for causal inference using time-series observational data on mixed outcomes

Abstract: Assessing the impact of an intervention by using time-series observational data on multiple units and outcomes is a frequent problem in many fields of scientific research. Here, we propose a novel Bayesian multivariate factor analysis model for estimating intervention effects in such settings and develop an efficient Markov chain Monte Carlo algorithm to sample from the high-dimensional and non-tractable posterior of interest. The proposed method is one of the few that can simultaneously deal with outcomes of mixed type (continuous, binomial, count), increase efficiency in the estimates of the causal effects by jointly modelling multiple outcomes affected by the intervention (as shown via a simulation study), and easily provide uncertainty quantification for all causal estimands of interest. Using the proposed approach, we evaluate the impact that Local Tracing Partnerships had on the effectiveness of England’s Test and Trace programme for COVID-19.
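The idea of using shared latent structure across units to impute an intervened unit's untreated counterfactual can be illustrated with a toy panel. A simple pre-period regression on control units stands in for the Bayesian multivariate factor-analysis model and MCMC of the talk; all data and numbers below are synthetic.

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic panel: T time points x n units driven by k latent factors
T, n, k = 60, 12, 2
factors = rng.standard_normal((T, k))
loadings = rng.standard_normal((k, n))
Y = factors @ loadings + 0.1 * rng.standard_normal((T, n))

# Unit 0 receives an intervention that adds +2 from time 40 onwards
Y[40:, 0] += 2.0

# Regress the treated unit's pre-period outcomes on the control units,
# then predict its untreated counterfactual in the post-period
controls, treated = Y[:, 1:], Y[:, 0]
w = np.linalg.lstsq(controls[:40], treated[:40], rcond=None)[0]
effect = float((treated[40:] - controls[40:] @ w).mean())
```

Because all units load on the same latent factors, the controls predict the treated unit's untreated trajectory, and the post-period residual recovers the intervention effect (about +2 here).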

  • 14-Sept, Online talk, Dr Dennis Prangle, Senior Lecturer in Statistics, School of Mathematics, University of Bristol
Title: Distilling importance sampling for likelihood-free inference
Abstract: Likelihood-free inference involves inferring parameter values given observed data and a simulator model. The simulator is computer code taking the parameters, performing stochastic calculations, and outputting simulated data. In this work, we view the simulator as a function whose inputs are (1) the parameters and (2) a vector of pseudo-random draws, and attempt to infer all these inputs. This is challenging as the resulting posterior can be high dimensional and involve strong dependence.

We approximate the posterior using normalising flows, a flexible parametric family of densities. Training data is generated by ABC importance sampling with a large bandwidth parameter and is then “distilled” by using it to train the normalising flow parameters. The process is iterated, using the updated flow as the importance sampling proposal and slowly reducing the ABC bandwidth, until the proposal is a good approximation to the posterior. Unlike most other likelihood-free methods, we avoid the need to reduce the data to low-dimensional summary statistics, and hence can achieve more accurate results.
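The iterate-and-shrink scheme can be sketched on a toy one-parameter model, with a refitted Gaussian proposal standing in for the trained normalising flow; the bandwidth schedule, sample sizes and toy Gaussian model are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)
y_obs = 1.5  # observed data (a single scalar observation)

# Prior: theta ~ N(0, 5^2); simulator: y | theta ~ N(theta, 1)
def simulate(theta):
    return theta + rng.standard_normal(theta.shape)

# Start from the prior; each round: weight draws by a Gaussian ABC kernel
# with bandwidth eps, refit a Gaussian proposal to the weighted sample
# (a crude stand-in for training a normalising flow), then shrink eps.
mu, sigma = 0.0, 5.0
for eps in [2.0, 1.0, 0.5, 0.25]:
    theta = mu + sigma * rng.standard_normal(5000)
    y = simulate(theta)
    log_prior = -theta**2 / (2 * 5.0**2)
    log_kernel = -(y - y_obs)**2 / (2 * eps**2)
    log_prop = -(theta - mu)**2 / (2 * sigma**2) - np.log(sigma)
    log_w = log_prior + log_kernel - log_prop
    w = np.exp(log_w - log_w.max())  # self-normalised importance weights
    w /= w.sum()
    mu = float(np.sum(w * theta))
    sigma = float(np.sqrt(np.sum(w * (theta - mu)**2)))
```

As eps shrinks, the fitted proposal approaches the exact posterior N(1.44, 0.98^2); the flow in the talk plays the same role as the refitted Gaussian but can represent far richer, higher-dimensional posteriors.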

  • 28-Sept, In-person talk, Rachel Phillip, Medical Statistician, Clinical Trials Research Unit, University of Leeds

Rachel is an alumna of our School and the talk should be of interest to both students and staff.

Title: Working as a statistician on Phase I cancer clinical trials

Abstract: Clinical trials are research studies that are conducted in people in order to study and test new medical treatments. Trials are usually conducted in phases that build on each other, with Phase I trials being the first steps of testing new treatments in people. There is often limited safety information on new treatments, so the primary aims of Phase I studies are to ascertain the safety profile of the intervention and to determine the highest dose that can be given safely, without severe side effects, to take forward for further investigation in future studies. This talk will provide an introduction to the different areas that a statistician works on in clinical trials and the common statistical designs of Phase I studies, as well as discussing CONCORDE – an innovative phase I platform trial testing different drug-radiotherapy combinations.

  • 05-Oct, Online talk (attending from Maths Tutorial Room 1A), Prof Alexandros Beskos, Professor in Statistics, UCL
Title: Manifold Markov chain Monte Carlo methods for Bayesian inference in diffusion models

Abstract: Bayesian inference for nonlinear diffusions, observed at discrete times, is a challenging task that has prompted the development of a number of algorithms, mainly within the computational statistics community. We propose a new direction, and accompanying methodology, borrowing ideas from statistical physics and computational chemistry, for inferring the posterior distribution of latent diffusion paths and model parameters, given observations of the process. Joint configurations of the underlying process noise and of parameters, mapping onto diffusion paths consistent with observations, form an implicitly defined manifold. Then, by making use of a constrained Hamiltonian Monte Carlo algorithm on the embedded manifold, we are able to perform computationally efficient inference for a class of discretely observed diffusion models. Critically, in contrast with other approaches proposed in the literature, our methodology is highly automated, requiring minimal user intervention and applying alike in a range of settings, including: elliptic or hypo-elliptic systems; observations with or without noise; linear or non-linear observation operators. Exploiting Markovianity, we propose a variant of the method with complexity that scales linearly in the resolution of path discretisation and the number of observation times. The talk is based on a forthcoming JRSSB paper.

  • 19-Oct, In-person talk joint with CREEM, Dr Ben Swallow, Lecturer in Statistics, University of St Andrews

Title: Bayesian causal inference for zero-inflated GLMs using a potential outcomes framework

Abstract: We propose a method for conducting Bayesian causal inference under a generalised linear model potential outcomes framework, for data where there are many more zeros than would naturally be expected. We develop an approach using both semi-continuous and fully continuous probability distributions and apply the approach to both simulated data and ornithological citizen science data in the UK, comparing the results to purely observational studies. Further analyses of the contrasting GLMs are also discussed.

  • 26-Oct, JJ Valletta Memorial lecture: Dr TJ McKinley, Lecturer in Mathematical Biology, Department of Mathematics and Statistics, University of Exeter.  In person in Lecture Theatre D, Mathematical Institute.

Title: Emulation-driven inference for complex spatial meta-population models

Abstract: Calibration of complex stochastic infectious disease models is challenging. These often have high-dimensional input spaces, with the models exhibiting complex, non-linear dynamics. Coupled with this is a paucity of necessary data, resulting in a large number of hidden states that must be handled by the inference routine. Likelihood-based approaches to this missing data problem are very flexible, but challenging to scale due to having to monitor and update these hidden states. Methods based on simulating the hidden states directly from the model-of-interest have the advantage that they are often much more straightforward to code, and thus are easier to implement and adapt to changing model structures. However, they often require very large numbers of simulations in order to adequately explore the input space, which can render them infeasible for many large-scale problems.

This seminar will be given in memory of our colleague JJ Valletta, who passed away suddenly while hillwalking in October 2020. The seminar will be followed by a reception in the Mathematical Institute Common Room (on the ground floor). All are welcome, both to the lecture and the reception!

  • 02-Nov, joint with CREEM, Dr Wei Zhang, Lecturer in Statistics, School of Mathematics and Statistics, University of Glasgow
Title: A flexible and efficient Bayesian implementation of point process models for spatial capture-recapture data

Abstract: Spatial capture-recapture (SCR) is now routinely used for estimating abundance and density of wildlife populations. A standard SCR model includes sub-models for the distribution of individual activity centres and for individual detections conditional on the locations of these activity centres. Both sub-models can be expressed as point processes taking place in continuous space, but there is a lack of accessible and efficient tools to fit such models in a Bayesian paradigm. In this talk, I will describe a set of custom functions and distributions to achieve this. Our work allows for more efficient model fitting with spatial covariates on population density, offers the option to fit SCR models using the semi-complete data likelihood (SCDL) approach instead of data augmentation, and better reflects the spatially continuous detection process in SCR studies that use area searches. In addition, the SCDL approach is more efficient than data augmentation for simple SCR models while losing its advantages for more complicated models that account for spatial variation in either population density or detection. I will present the model formulation, test it with simulations, quantify computational efficiency gains, and conclude with a real-life example using non-invasive genetic sampling data for an elusive large carnivore, the wolverine (Gulo gulo) in Norway.
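A standard ingredient of SCR models of this kind is a half-normal detection function linking each individual's activity centre to trap-level detection probabilities, which can be sketched as follows; the trap grid, population size and parameter values are illustrative assumptions, not those of the wolverine study.

```python
import numpy as np

rng = np.random.default_rng(3)

# 4x4 trap grid inside the unit square, and N simulated activity centres
traps = np.array([(x, y) for x in np.linspace(0.2, 0.8, 4)
                         for y in np.linspace(0.2, 0.8, 4)])
N = 50
centres = rng.uniform(0.0, 1.0, size=(N, 2))

# Half-normal detection function: detection probability decays with the
# squared distance between activity centre and trap
p0, sigma = 0.6, 0.15
d2 = ((centres[:, None, :] - traps[None, :, :]) ** 2).sum(axis=2)
p_det = p0 * np.exp(-d2 / (2 * sigma**2))

# One sampling occasion: independent Bernoulli detections per individual/trap
detections = rng.random((N, traps.shape[0])) < p_det
n_observed = int(detections.any(axis=1).sum())
```

Fitting an SCR model inverts this simulation: the spatial pattern of detections across traps informs both the detection parameters and the density of activity centres, including individuals never detected.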

  • 09-Nov, online talk, Prof Chris Holmes, Professor in Biostatistics at the Departments of Statistics and the Nuffield Department of Medicine, University of Oxford

Title: Bayesian Predictive inference

Abstract: De Finetti promoted the importance of predictive models for observables as the basis for Bayesian inference. The assumption of exchangeability, implying aspects of symmetry in the predictive model, motivates the usual likelihood-prior construction and with it the traditional learning approach involving a prior to posterior update using Bayes’ rule. We discuss an alternative approach, treating Bayesian inference as a missing data problem for observables not yet obtained from the population needed to estimate a parameter precisely or make a decision correctly. This motivates the direct use of predictive models for inference, relaxing exchangeability to start modelling from the data in hand (with or without a prior). Martingales play a key role in the construction. This is joint work with Stephen Walker and Edwin Fong, based on the paper “Martingale Posteriors”, to appear with discussion in JRSS Series B.