Statistics Seminars are held on Wednesdays, 14:00 – 15:00. Everyone is welcome! We gather for coffee/tea and biscuits around 15 minutes before the seminar begins.

The organiser is Dr Giorgos Minas. Please contact Giorgos to find out more about the seminars, to suggest a future seminar speaker, or to request joining seminars online.

Statistics seminar time-slots are shared with the CREEM seminar series.

Most seminars this year will be held in person, with a few online. In-person seminars are held in the Observatory seminar room. Please see below for more details.

Forthcoming statistics seminars 2024-25

We are currently planning the seminar programme, which will start in September 2024.

Past seminars


  • Wed, March 27th, 2-3pm: Professor Paul Kirk (MRC Biostatistics Unit, University of Cambridge)
    Title: Large scale outcome-guided Bayesian mixture models for cluster analysis of EHR datasets

    Abstract: Motivated by the analysis of electronic health records (EHRs) for characterising patterns of multi-morbidity in UK populations, we present a divide-and-conquer approach for using Dirichlet process mixtures to model large-scale datasets. We review existing approaches, and consider the general challenge of matching cluster labels across data shards. We present a novel technique for cluster matching and assessing cluster stability, and introduce a recursive refinement approach to ensure that smaller clusters are not lost. We illustrate its application in the context of outcome-guided clustering to define patient populations that have similar patterns of co-occurring long-term conditions.
    Keywords: Divide-and-conquer, Bayesian mixture models, cluster analyses.
  • Wed, March 20th, 2-3pm: Fanny Empacher (University of St Andrews) – joint talk with CREEM

Title: Efficient Methods for Fitting Nonlinear Non-Gaussian State Space Models of Wildlife Population Dynamics

Abstract: State-space models (SSMs) are a popular and flexible framework for modelling time series because they separate changes in the underlying state of a system from the noisy observations made on that state. However, fitting these models can be challenging. In my thesis, I explored methods for fitting non-linear and non-Gaussian Bayesian SSMs, using the UK grey seal population as a case study. Here, I introduce these methods and summarise the findings of my thesis. These include a comparison of sequential Monte Carlo methods with the much faster Kalman filter, which relies on a linear, Gaussian (normal) approximation, and a demonstration that estimating likelihood components separately can give a five-fold increase in speed.
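The Kalman filter mentioned above is exact only for linear Gaussian models, which is why it is so fast. The sketch below shows a one-pass likelihood computation via the prediction-error decomposition. It is a minimal illustration under stated assumptions: it uses a one-dimensional local-level (random walk plus noise) model rather than the grey seal model from the talk. The function name `kalman_loglik` and its parameters are hypothetical.

```python
import math


def kalman_loglik(ys, q, r, m0=0.0, p0=1e6):
    """Kalman-filter log-likelihood for a 1-D local-level state-space model.

    Assumed model (illustrative, not the talk's seal model):
      State:       x_t = x_{t-1} + w_t,  w_t ~ N(0, q)
      Observation: y_t = x_t + v_t,      v_t ~ N(0, r)
    (m0, p0) are the mean and variance of the Gaussian prior on x_0.
    Returns the log-likelihood and the final filtered mean and variance.
    """
    m, p = m0, p0
    loglik = 0.0
    for y in ys:
        # Predict: a random walk keeps the mean and inflates the variance by q.
        p_pred = p + q
        # The one-step-ahead predictive distribution of y_t is N(m, p_pred + r).
        s = p_pred + r
        innov = y - m
        loglik += -0.5 * (math.log(2.0 * math.pi * s) + innov * innov / s)
        # Update: Kalman gain, then filtered mean and variance of x_t.
        gain = p_pred / s
        m = m + gain * innov
        p = (1.0 - gain) * p_pred
    return loglik, m, p


# Usage with made-up values (hypothetical data, not the grey seal series).
ll, mean, var = kalman_loglik([10.2, 11.0, 10.7, 12.1], q=0.5, r=1.0, m0=10.0, p0=4.0)
print(ll, mean, var)
```

Sequential Monte Carlo replaces the closed-form predict/update steps with weighted particles. That is what allows non-linear, non-Gaussian models, but at a much higher computational cost.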

  • Wed, Mar-6, 2-3pm: Professor Jason Matthiopoulos (University of Glasgow)

Title: Defining, estimating and understanding the fundamental niches of complex animals in heterogeneous environments

Abstract: During the past century, the fundamental niche (the complete set of environments that allow an individual, population or species to persist) has shaped ecological thinking. It is a crucial concept connecting population dynamics, spatial ecology and evolutionary theory, and a prerequisite for predictive ecological models at a time of rapid environmental change. Yet its properties have eluded quantification, particularly for mobile, cognitively complex organisms. These difficulties arise mainly from the separation between niche theory and field data, and from the dichotomy between environmental and geographical spaces. Here, I combine recent mathematical and statistical results linking habitats to population growth to achieve a quantitative and intuitive understanding of the fundamental niches of animals. I trace the development of niche ideas from the early steps of ecology to their use in modern statistical and conservation practice. I examine, in particular, how animal mobility and behaviour may blur the division between geographical and environmental space. I discuss how the fundamental models of population and spatial ecology lead to a concise mathematical equation for the fundamental niche of animals, and demonstrate how fitness parameters can be understood and directly estimated by fitting this model simultaneously to field data on population growth and spatial distributions. I illustrate these concepts and methods using both simulations and real animal data and, in this way, confirm ideas that had been anticipated in the historical niche literature. Specifically, within traditionally defined environmental spaces, habitat heterogeneity and behavioural plasticity make the fundamental niche more complex and malleable than was historically envisaged. However, once examined in higher-dimensional spaces, the niche is more predictable than recently suspected.
This re-evaluation quantifies how organisms might buffer themselves from change by bending the boundaries of viable environmental space, and offers a framework for designing optimal habitat interventions to protect biodiversity or obstruct invasive species. It therefore promotes the fundamental niche as a key theoretical concept for understanding animal responses to changing environments and a central tool for environmental management. To this end, ecological mechanisms (dispersal, density dependence, community effects and individual variation), integrated inference and ecosystem optimization are the key future areas of development.

  • Wed, Feb-14, 2-3pm: Professor Andrew Golightly (University of Durham)

Title: Bayesian inference for partially observed Markov process (POMP) models: challenges and developments

Abstract: We consider the problem of parameter inference for partially observed Markov process models using data at discrete times that may be incomplete and subject to measurement error. This is a particularly challenging problem due to the intractability of the underlying Markov process and, in turn, of the observed-data likelihood. We therefore integrate over uncertainty in the latent process between observation times via a state-of-the-art correlated pseudo-marginal Metropolis-Hastings algorithm, which aims to improve mixing of the parameter chains by inducing positive correlation between successive estimates of the observed-data likelihood. However, unless the measurement error or the dimension of the latent process is small, this correlation can be eroded by the resampling steps in the particle filter. We therefore propose a novel augmentation scheme that allows for conditioning on values of the latent process at the observation times, completely avoiding the need for resampling steps. We illustrate the resulting methodology in the context of nonlinear multivariate diffusion processes and find that our approach offers substantial increases in overall efficiency compared with some competing methods.
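The correlation idea can be illustrated with a toy model. This is a minimal sketch, not the augmentation scheme from the talk. It makes the following assumptions:

- A latent x ~ N(θ, 1) is observed with unit-variance noise.
- p(y | θ) is estimated by Monte Carlo from auxiliary standard normals u.
- u is refreshed with a Crank-Nicolson move u' = ρu + √(1 − ρ²)ε, which leaves N(0, I) invariant.

With ρ close to 1, successive likelihood estimates are strongly positively correlated. All function names are hypothetical.

```python
import math
import random


def normal_pdf(x, mean, sd):
    """Density of N(mean, sd^2) at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def likelihood_estimate(y, theta, u):
    """Unbiased Monte Carlo estimate of p(y | theta) in a toy latent model.

    Assumed toy model (illustrative only):
      Latent:      x = theta + u_i with u_i ~ N(0, 1)
      Observation: y | x ~ N(x, 1)
    The exact answer is the N(theta, 2) density at y.
    """
    return sum(normal_pdf(y, theta + ui, 1.0) for ui in u) / len(u)


def correlated_aux_update(u, rho, rng):
    """Crank-Nicolson move u' = rho * u + sqrt(1 - rho^2) * eps.

    If u ~ N(0, I) then so is u', so the move leaves the distribution of the
    auxiliary variables unchanged while keeping u' close to u.
    """
    scale = math.sqrt(1.0 - rho * rho)
    return [rho * ui + scale * rng.gauss(0.0, 1.0) for ui in u]


rng = random.Random(1)
u = [rng.gauss(0.0, 1.0) for _ in range(200)]
u_new = correlated_aux_update(u, rho=0.99, rng=rng)
# Two estimates at the same theta: close to each other because u_new is close to u.
print(likelihood_estimate(0.5, 0.0, u), likelihood_estimate(0.5, 0.0, u_new))
```

In a correlated pseudo-marginal sampler, a new θ' is proposed jointly with u'. Both likelihood estimates then share most of their Monte Carlo noise, so their ratio in the acceptance probability is less variable. Resampling in a particle filter introduces discontinuities in u, which is the erosion of correlation the abstract refers to.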

  • Wed, Feb-7, 2-3pm: Dr Ben Swallow (University of St Andrews) – joint seminar with CREEM

Title: Hierarchical GAMs for studying the irruptive migration of crossbill species in northern Europe

Abstract: Irruptions by seed-eating birds are assumed to be driven by the production of seeds and fruits, whose crops vary greatly between years. Using data from Sweden, Finland and the UK, we tested a variety of assumptions about synchrony in coniferous seed crops and whether irruptions of crossbills (Loxia spp.) were correlated with seed production further afield. In a second set of analyses, we developed hierarchical generalised additive models to study when irruptions into northern Europe took place. The models indicate that the incidental co-occurrence of low seed production of Norway spruce and Scots pine in a given year, after a year of high seed production, may result in an irruption. The seed production of Norway spruce and Scots pine in Sweden was correlated with production by the same species in Finland, indicating widespread synchrony of cropping across northern Europe.

  • Wed, Jan-31, 2-3pm: Professor Sofia Dias (University of York)
Title: Network meta-analysis for decision making: making best use of relevant evidence
Abstract: Meta-analyses are typically used to pool evidence from multiple studies in order to decide which treatment is most effective or cost-effective, out of several alternatives. When deciding which treatments to recommend for use in a national health service, we typically start with a well-defined decision problem specifying the patient population, interventions and outcomes of interest. A search of the literature for randomised controlled trials (RCTs) comparing the interventions of interest then follows, where evidence is collected and assessed for quality and relevance to the decision problem.
Network meta-analysis (NMA) extends the idea of pairwise meta-analysis to pool evidence on more than one intervention, allowing for multiple treatments to be compared simultaneously and indirect evidence on treatment comparisons to be incorporated. Whilst standard NMA methods are now well established, some recent extensions allow pooling of additional data, potentially reducing uncertainty.
After briefly introducing the principles of meta-analysis and NMA, the extension of NMA models to incorporate dose-response relationships will be described. An example will illustrate how evidence on different doses of interventions can be combined to strengthen inferences and how key modelling assumptions can be checked. Further extensions will also be briefly discussed.
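The way indirect evidence enters a network can be shown with its simplest building block: an adjusted indirect comparison through a common comparator. This is a minimal sketch assuming two independent pairwise summary estimates and the consistency relation d_BC = d_AC − d_AB. It is not the Bayesian dose-response NMA model described in the talk, and the function name and numbers are hypothetical.

```python
import math


def indirect_comparison(d_ab, se_ab, d_ac, se_ac):
    """Adjusted indirect comparison of C versus B via a common comparator A.

    d_xy is the relative effect of treatment y versus x (for example a log
    odds ratio) from independent pairwise meta-analyses. Under the
    consistency assumption d_bc = d_ac - d_ab, and because the two direct
    estimates are independent, their variances add.
    """
    d_bc = d_ac - d_ab
    se_bc = math.sqrt(se_ab ** 2 + se_ac ** 2)
    return d_bc, se_bc


# Hypothetical log odds ratios: B vs A = -0.5 (SE 0.1), C vs A = -0.8 (SE 0.2).
d_bc, se_bc = indirect_comparison(-0.5, 0.1, -0.8, 0.2)
lower, upper = d_bc - 1.96 * se_bc, d_bc + 1.96 * se_bc
print(f"C vs B: {d_bc:.2f} (95% CI {lower:.2f} to {upper:.2f})")
```

The indirect estimate is less precise than either direct estimate. A full NMA therefore pools direct and indirect evidence in one model, and where both exist it can check the consistency assumption.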
  • Wed, Jan-24, 2-3pm: Dr Fergus Chadwick (University of St Andrews) – joint seminar with CREEM

Title: Do identification guides hold the key to species misclassification by citizen scientists?

Abstract: Citizen science data often contain high levels of species misclassification that can bias inference and conservation decisions. Current approaches to address mislabelling rely on expert taxonomists validating every record. This approach makes intensive use of a scarce resource and reduces the role of the citizen scientist. Species, however, are not confused at random: two species that look alike are more likely to be confused than two highly distinctive species. Identification guides are intended to use these patterns to aid correct classification, but misclassifications still occur due to user error and imperfect guidebook design. Statistical models should be able to exploit this non-randomness to learn confusion patterns from small validation datasets provided by expert taxonomists, yielding a much-needed reduction in expert workload. Here, we use a variety of Bayesian hierarchical models to classify species probabilistically based on the species label provided by the citizen scientist. We also explore the utility of the guidebooks provided by the citizen science schemes as a prior for species similarity, and hence draw conclusions for their future improvement. We find that the species label assigned to a record by a citizen scientist, even when incorrect, contains useful information about the true species identity. The citizen scientists correctly identify the species in around 58% of records. Using models trained on only 10% of these records (validated by experts), we can correctly predict species identity for 69% (90% CI: 64–73%) of records when the guidebook is used, vs 64% (58–69%) for models that do not use the guidebook. The fact that misclassifications can be predicted systematically indicates that the guidebook could be improved to reduce misclassification.
By using Bayesian hierarchical models, we can greatly reduce the workload for experts by providing a probabilistic correction to citizen science records rather than requiring manual review. This is increasingly important as the number of citizen science schemes grows and the relative number of taxonomists shrinks. By learning confusion patterns statistically, we open up future avenues of research to identify what causes these confusions and how to better address them.
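The core of a probabilistic correction is Bayes' rule applied to a confusion matrix. This is a minimal sketch assuming the species prevalences and confusion probabilities are already known. In the talk these are learned with Bayesian hierarchical models from expert-validated records, optionally using the guidebook as a prior. The species, probabilities and function name below are hypothetical.

```python
def posterior_true_species(label, prior, confusion):
    """P(true species | citizen-scientist label) by Bayes' rule.

    prior[j]        : P(true species = j)
    confusion[j][k] : P(label = k | true species = j); each row sums to 1
    """
    unnormalised = [prior[j] * confusion[j][label] for j in range(len(prior))]
    total = sum(unnormalised)
    return [w / total for w in unnormalised]


# Hypothetical three-species example: species 0 and 1 look alike,
# species 2 is distinctive.
prior = [0.5, 0.3, 0.2]
confusion = [
    [0.70, 0.25, 0.05],
    [0.30, 0.65, 0.05],
    [0.05, 0.05, 0.90],
]
print(posterior_true_species(1, prior, confusion))
```

A record labelled as species 1 still leaves substantial posterior probability on its look-alike, species 0, and very little on the distinctive species 2. This illustrates how even an incorrect label carries information about the true identity.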

  • Wed, Jan-17, 2024, 2-3pm: Professor Theodore Kypraios (University of Nottingham)

Title: Bayesian nonparametric inference for stochastic infectious disease models

Abstract: Infectious disease transmission models require assumptions about how the pathogen spreads between individuals. These assumptions may be somewhat arbitrary, particularly when it comes to describing how transmission varies between individuals of different types or in different locations, and may in turn lead to incorrect conclusions or policy decisions.

In this talk, we will present a novel and general Bayesian nonparametric framework for transmission modelling which removes the need to make such specific assumptions with regard to the infection process. We use multi-output Gaussian process prior distributions to model different infection rates in populations containing multiple types of individuals. Further challenges arise because the transmission process itself is unobserved, and large outbreaks can be computationally demanding to analyse. We address these issues by data augmentation and a suitable efficient approximation method. Simulation studies using synthetic data demonstrate that our framework gives accurate results. Finally, we use our methods to enhance our understanding of the transmission mechanisms of the 2001 UK Foot and Mouth Disease outbreak.

Seymour, R.G., Kypraios, T., O’Neill, P.D. and Hagenaars, T.J. (2021), A Bayesian nonparametric analysis of the 2003 outbreak of highly pathogenic avian influenza in the Netherlands. J R Stat Soc Series C, 70: 1323-1343.

Seymour, R. G., Kypraios, T., & O’Neill, P. D. (2022). Bayesian nonparametric inference for heterogeneously mixing infectious disease models. Proceedings of the National Academy of Sciences, 119(10).
  • Wed, Dec-6th, 2023, 2-3pm: Dr Maria Kiladi (University College London)

Title: Eugenics and Statistics at UCL: Karl Pearson and the Department of Applied Statistics and Eugenics, 1913-1933.

Abstract: In 2018 a University of London newspaper, The London Student, revealed that UCL had been the venue of the London Conference on Intelligence, which had been held there since 2014. Questions were asked not only about the organisation of this particular event, but also about the links between UCL and eugenics, and about the fact that the institution appeared to celebrate notable eugenicists by naming some of its buildings after them. As a result, the Eugenics Inquiry Committee was set up in 2018, tasked with uncovering the history of eugenics at UCL and the extent to which UCL might have benefited from funds linked to the study of eugenics. The historical aspect of this research pointed towards the statistician Karl Pearson and the Department of Applied Statistics and Eugenics.

In this seminar, I will explore the history of eugenics at UCL, having worked with the Eugenics Inquiry Committee as its Research Fellow in 2018, and as a Research Fellow for the subsequent Legacies of Eugenics Project set up at UCL (2020-2022). My research placed the history of eugenics at UCL in the wider context of the day, rather than treating it as something exceptional that happened to one institution completely cut off from its context, as had been the case before I joined the committee. I also uncovered material at the University of London that provided much-needed context on the links between UCL and eugenics, and reconfigured previously simplistic understandings of how the Department operated, and indeed of the alleged ‘support’ by the University’s leadership. By looking into this part of UCL history, I will also discuss the work of Karl Pearson as a statistician and Head/Director of the Department of Applied Statistics and Eugenics, particularly at the Francis Galton Laboratory for the Study of National Eugenics.

  • Wed, Nov-15th, 2023, 2-3pm: Dr Elham Mirfarah (University of St Andrews)

Title: An Introduction to Mixture of Experts Modelling and my Contribution to this Topic

Abstract: This seminar provides an introduction to mixture of experts (MoE) modelling, a versatile technique used in various fields. I will cover the fundamentals of MoE models, introducing their structure and practical applications. I will then highlight my contributions to this field, especially for datasets with censored observations, and wrap up with some issues that I am currently working on.
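The basic MoE structure combines several expert models through a gating network whose weights depend on the covariates. This is a minimal sketch assuming a softmax gate and linear experts with one covariate, computing only the predictive mean. It does not cover the censored-data extensions from the talk, and all names and parameter values are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax: subtract the max before exponentiating."""
    mx = max(scores)
    exps = [math.exp(s - mx) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def moe_mean(x, gate_params, expert_params):
    """Predictive mean of a mixture of linear-regression experts.

    gate_params[k]   = (a_k, b_k): gating score a_k + b_k * x
    expert_params[k] = (c_k, d_k): expert k predicts c_k + d_k * x
    The gate weights depend on x and sum to one.
    """
    weights = softmax([a + b * x for a, b in gate_params])
    preds = [c + d * x for c, d in expert_params]
    return sum(w * p for w, p in zip(weights, preds)), weights


# Hypothetical two-expert model: expert 0 dominates for x < 0, expert 1 for x > 0.
gates = [(0.0, -5.0), (0.0, 5.0)]
experts = [(1.0, 0.0), (-1.0, 0.0)]
for x in (-2.0, 0.0, 2.0):
    print(x, moe_mean(x, gates, experts))
```

Because the weights vary with x, the model can switch smoothly between regimes. Fitting is usually done with the EM algorithm, treating the expert membership of each observation as missing data.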

  • Wed, Nov-29th, 2023, 2-3pm: Professor James Russell (University of Auckland)

Title: Introduced rodent management on islands

Abstract: This talk will be heavily applied in content and light on analytical detail, providing an overview of a career drawing on diverse analytical approaches to addressing the impacts and management of introduced rodents on islands. The results presented will include those that have used population differential equation modelling, spatially explicit capture-recapture analyses, proof-of-absence (false absence) modelling, probabilistic genetic assignment of individuals, individual agent-based models, survival analyses and gradient-boosted decision trees. Overall, the talk will present an example of how a broad training in ecology and statistics can empower a practitioner to wield an incredibly diverse analytical toolkit in pursuit of high-impact conservation outcomes. References to all material will be provided for those who wish to dive deeper into particular elements, or to meet the speaker, who is on sabbatical at the University of Aberdeen until Christmas.

Short bio: Professor James Russell is a conservation biologist at the University of Auckland, jointly appointed in the School of Biological Sciences and the Department of Statistics. He obtained his PhD in Biology and Statistics from the University of Auckland in 2007 and then worked overseas as a research fellow at UC Berkeley and in the French territories of Réunion Island and French Polynesia before joining the faculty in 2010. He has held visiting professor appointments at Université Paris-Saclay, Universidade de São Paulo and the University of Aberdeen.

James is a strategic advisor to Predator Free New Zealand, scientific advisor to Zero Invasive Predators, a National Geographic Explorer, associate editor of the journal Biological Invasions, a member of the IUCN Invasive Species and Pigeon and Dove Specialist Groups, and a life member of the Ornithological Society of New Zealand.

James works on islands throughout the world to enable biodiversity conservation, drawing on mixed methodologies from the natural and social sciences. He was the 2012 New Zealand Emerging Scientist of the Year and received the 2018 Society for Conservation Biology Oceania Section Distinguished Service Award and a 2020 University of Auckland Research Excellence Medal.

  • Wed, Nov-1, 2023, 2-3pm: Dr Sarah Christofides (University of Cardiff)
Title: Adventures in Statistics: An Ecologist’s Story
Abstract: Ecology and statistics make a natural pairing, but working at the interface between the two requires developing interdisciplinary skills. In this talk I reflect on approaching this interface from the ecology side. I will talk about some interesting statistical problems that have come up in my research, and how collaborating with statisticians has helped to provide biological insights.
Short bio of the speaker: I did my PhD on fungus-bacteria interactions in decomposing wood, under the supervision of Prof Lynne Boddy and Prof Andy Weightman. I then did a postdoc in the lab of Prof Hilary Rogers, on how stressed grass is when it gets eaten by cows. After a couple of years doing bioinformatics support at Cardiff University’s Genome Research Hub, I was appointed lecturer in bioinformatics.
  • Wed, Oct-25, 2023, 2-3pm: JJ Valetta Memorial Lecture

Speaker: Professor Colin Torney (University of Glasgow)

Topic: From machine learning to migration: Quantitative approaches for understanding animal groups on the move

Abstract: Recent advances in technology and quantitative methods have led to a growth in our ability to study mobile animal groups in their natural environments. Understanding the movement patterns of these groups requires the study of individual behaviour and the interactions between leadership, imitation, and environmental drivers that influence movement decisions. In this talk I will discuss the methods we’re using to investigate these questions, including tools to collect movement and behavioural data from migratory species, and machine learning techniques to infer behavioural rules from movement data for both individuals and social groups.

Short bio of the speaker: Colin is a Professor in Applied Mathematics in the School of Mathematics and Statistics at the University of Glasgow. He completed his PhD in Applied and Computational Mathematics in 2009 at University College Dublin, and also holds a Master's degree in Engineering from Liverpool University and an MSc in Computational Science from UCD. Colin’s background is in mathematics and high performance computing, but his work now has a strong applied focus in the areas of movement ecology, data science and AI. In previous roles, he has been a software developer for a financial risk company, a risk analyst for a multinational hedge fund, and an English and I.T. teacher in Pangani, Tanzania.

  • Wed, Oct-18, 2023, 2-3pm: Professor Christopher Jewell (University of Lancaster) 

Title: Parameter inference in epidemic models incorporating population

Abstract: Epidemic models which incorporate a high level of population heterogeneity are useful for studying the drivers of disease transmission in space, via networks, and due to human behaviour. To do this effectively, models must be fitted to observations of disease prevalence and incidence in a principled manner. Inference, however, is complicated by the inability to observe key quantities that would otherwise lead to tractable likelihood functions. For example, infection times are impossible to observe directly: typically it is only when a subject is tested or reports feeling ill that we know they are infected. To address this, particle filtering methods have been used successfully to marginalise over censored data in epidemic models, though they fail rapidly as model complexity increases. On the other hand, data-augmentation MCMC methods have proved highly successful in small populations, but lose efficiency rapidly with increasing population size. This talk presents work inspired by COVID-19 and antimicrobial resistance, proposing a new class of data-augmentation algorithms capable of fitting discrete space and time Markov models via constrained and non-centred Metropolis-Hastings proposals. This approach shows promise for opening up inference on nuanced epidemic models in the future, evaluating disease interventions, and supporting public health decision making.

Short bio of the speaker: Chris works at the interface between epidemiology, infectious disease modelling, statistics, and high performance computing. He originally trained as a veterinary surgeon, but became interested in epidemics through his experience working on the foot and mouth disease outbreak in the UK in 2001. He believes strongly in application-focused statistical research and in effective communication of scientific outputs.

As a trained vet, Chris is interested in decision support systems for disease outbreak response, public health and zoonotic diseases. His applications include communicable diseases such as foot and mouth disease, vector-borne diseases such as theileriosis, and zoonoses such as campylobacteriosis. In computational statistics, he works on MCMC methods for inference on stochastic dynamical models. He has a particular interest in high performance computing techniques for applying modern statistical methods to real-time inference on large population datasets.

  • Wed, Sep-27, 2023, 2-3pm: Dr Cecilia Balocchi (University of Edinburgh)

Title: Bayesian Nonparametric Analysis of Spatial Variation with Discontinuities

Abstract: Spatial data often display high levels of smoothness but can simultaneously present abrupt discontinuities, especially in urban environments. We model neighbourhood crime trends over time in the City of Philadelphia by combining a spatial local shrinkage model with spatial partitions of areal units to allow for discontinuities. Two main challenges arise in this setting. First, the vast space of spatial partitions makes typical stochastic search techniques computationally prohibitive. We introduce an ensemble optimization procedure that summarises the posterior by simultaneously targeting several high-probability partitions. Second, the data are organised in a hierarchical structure with multiple resolution levels. We introduce a model combining the Nested Dirichlet Process with the Hierarchical Dirichlet Process to allow for flexible partitions of multi-resolution data and sharing of information between the partitions at different resolutions. Both methods are demonstrated on synthetic data and on real data from Philadelphia.

Short bio: Cecilia is a Lecturer in Statistics in the School of Mathematics at the University of Edinburgh. She completed her PhD at the University of Pennsylvania in 2020, and her dissertation received the 2021 Savage Award in Applied Methodology. Her research interests include Bayesian nonparametrics, model-based clustering, and spatial methods, with applications in urban analytics, maternal health, and genomics.

Related papers:

1. Cecilia Balocchi, Sameer K. Deshpande, Edward I. George & Shane T. Jensen (2023) Crime in Philadelphia: Bayesian Clustering with Particle Optimization, Journal of the American Statistical Association, 118:542, 818-829, DOI: 10.1080/01621459.2022.2156348


  • Wed, Sep-20, 2023, 2-3pm: Professor Mario Cortina Borja (University College London)

Title: Modelling high-dimensional time series with generalised network autoregressive processes

in collaboration with Guy Nason and Daniel Salnikov (Department of Mathematics, Imperial College London)

Abstract: In this talk we will present applications and extensions of generalised network autoregressive (GNAR) processes, taking advantage of the corresponding R package introduced by Knight et al. (J Statistical Software, 96:5, 2020). GNAR models are remarkably parsimonious and facilitate modelling high-dimensional time series data. We will present the network autocorrelation and partial network autocorrelation functions for multivariate time series and introduce the Corbit and Wagner plots, which serve as visual diagnostics for GNAR model selection. We will describe modelling the daily number of COVID-19 patients transferred to mechanical ventilation beds in NHS Trusts in England, and forecasting yearly live births in Spanish provinces.
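The parsimony of GNAR models comes from sharing a few parameters across all nodes of a network. The sketch below computes a one-step conditional mean for a GNAR(1, [1]) model with a single global α and equal weights on each node's immediate neighbours. This is a minimal sketch under those assumptions. It is not the interface of the GNAR R package, and the function name, network and values are hypothetical.

```python
def gnar1_predict(x_prev, neighbours, alpha, beta):
    """One-step conditional mean of a GNAR(1, [1]) model with a global alpha.

    X_{i,t} = alpha * X_{i,t-1}
              + beta * (mean of X_{q,t-1} over stage-1 neighbours q of i)
              + noise
    x_prev     : dict node -> value at time t-1
    neighbours : dict node -> list of immediate neighbours
    Nodes with no neighbours get no network term.
    """
    pred = {}
    for i, xi in x_prev.items():
        nbrs = neighbours.get(i, [])
        nbr_mean = sum(x_prev[q] for q in nbrs) / len(nbrs) if nbrs else 0.0
        pred[i] = alpha * xi + beta * nbr_mean
    return pred


# Hypothetical path network a - b - c.
x_prev = {"a": 1.0, "b": 2.0, "c": 3.0}
neighbours = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}
print(gnar1_predict(x_prev, neighbours, alpha=0.5, beta=0.2))
```

This model has two parameters however many nodes the network has, whereas an unrestricted VAR(1) on N series needs N² autoregressive coefficients.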

About the speaker: Mario Cortina Borja is chair of Significance’s editorial board, and professor of biostatistics in the Population Policy and Practice Teaching and Research Department at the Great Ormond Street Institute of Child Health, University College London. He studied Actuarial Science and Statistics at the Universidad Nacional Autónoma de México, and the University of Bath. Before coming to UCL in 2000, Mario was a statistician at the Instituto de Investigaciones Antropológicas in Mexico; a research officer at the School of Chemical Engineering in the University of Bath; a lecturer, then senior lecturer, in Statistics at the Instituto Tecnológico Autónomo de México; and, for five years, consulting and teaching officer in the Department of Statistics, University of Oxford.

  • Wed, Sep-13, 2023, 2-3pm: Dr Chrissy Fell (University of St Andrews), Dr Ben Baer (University of St Andrews)

Speaker: Chrissy Fell

Title: Examples of deep learning applied to medical and ecological images.

Abstract: In this talk I will discuss three projects in which I have applied deep learning to images. The first is an automated classifier for images of endometrial and cervical biopsies that allows pathology workloads to be prioritised. The second is automatically detecting animals in aerial images. Finally, I will explain my recent work on using brain MRI scans from the UK Biobank to classify whether someone is at high or low genetic risk of a mental health condition.

Speaker: Ben Baer

Title: Some problems I’m working on

Abstract: The talk has three parts, in which an overview of an estimation framework I now commonly use is sandwiched between some problems I’m working on. In the first part, I present a problem involving a barely identified discrete parameter in a highly structured model, and briefly describe the failure of various Bayes estimators with non-informative priors. In the second part, I explain some aspects of data coarsening and semi- and non-parametric efficiency theory alongside examples from causal inference and survival analysis. In the third part, I present several ongoing projects involving coarsening or efficiency theory, with a slide or two per project. The audience is encouraged to stop me frequently with comments and questions.


  • Wednesday, Apr-26, 2023, 2-3pm: Professor Magnus Rattray (University of Manchester)
Title: Gaussian process methods for modeling temporal and spatial gene expression changes
Abstract: Gaussian process (GP) inference provides a flexible nonparametric probabilistic modelling framework that is well suited to modelling spatial and temporal data. I will provide a brief tutorial introduction to GP inference and present some applications to problems in single-cell and spatial transcriptomics data analysis and modelling. GPs can be used to incorporate prior knowledge into trajectory inference from single-cell data, e.g. to model periodic trajectories or to include capture time labels in trajectory inference. GPs also provide a natural framework for modelling branching trajectories and can be used to infer the ordering of gene branching events along pseudotime. RNA-Seq data are typically summarised as counts, and we have implemented a negative binomial likelihood that can be used to improve the performance of GP inference methods.
[1] BinTayyash, N., Georgaka, S., John, S. T., Ahmed, S., Boukouvalas, A., Hensman, J., & Rattray, M. (2021). Non-parametric modelling of temporal and spatial counts data from RNA-seq experiments. Bioinformatics, 37(21), 3788-3795.
[2] Ahmed, S., Rattray, M., & Boukouvalas, A. (2019). GrandPrix: scaling up the Bayesian GPLVM for single-cell data. Bioinformatics, 35(1), 47-54.
[3] Boukouvalas, A., Hensman, J., & Rattray, M. (2018). BGP: identifying gene-specific branching dynamics from single-cell data with a branching Gaussian process. Genome Biology, 19, 1-15.
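GP regression in its simplest form can be sketched as follows. This is a minimal illustration assuming a squared-exponential (RBF) kernel, a zero prior mean and a Gaussian likelihood, computing only the posterior mean. It does not cover the negative binomial likelihood, branching kernels or pseudotime models from the talk, which need approximate inference. The helper names are hypothetical. A small Gaussian-elimination solver stands in for the Cholesky factorisation a real implementation would use.

```python
import math


def rbf(x1, x2, variance=1.0, lengthscale=1.0):
    """Squared-exponential covariance between two scalar inputs."""
    return variance * math.exp(-0.5 * ((x1 - x2) / lengthscale) ** 2)


def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [bi] for row, bi in zip(a, b)]  # augmented matrix (a copy)
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back-substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def gp_posterior_mean(x_train, y_train, x_test, noise_var=0.1,
                      variance=1.0, lengthscale=1.0):
    """Posterior mean k_*^T (K + noise_var * I)^{-1} y of a zero-mean GP."""
    K = [[rbf(xi, xj, variance, lengthscale) + (noise_var if i == j else 0.0)
          for j, xj in enumerate(x_train)]
         for i, xi in enumerate(x_train)]
    weights = solve(K, y_train)
    return [sum(rbf(xs, xi, variance, lengthscale) * w
                for xi, w in zip(x_train, weights))
            for xs in x_test]


# Hypothetical expression values at four time points.
times = [0.0, 1.0, 2.0, 3.0]
expr = [0.0, 0.8, 0.9, 0.1]
print(gp_posterior_mean(times, expr, [0.5, 1.5, 2.5]))
```

The kernel encodes prior assumptions such as smoothness or, with a periodic kernel, oscillation. This is how GPs incorporate prior knowledge into trajectory inference.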
About the speaker: Magnus Rattray is Professor of Computational & Systems Biology, and Director of the Institute for Data Science & Artificial Intelligence at the University of Manchester. He is co-president of the Machine Learning for Computational Systems Biology (MLCSB) COSI of the major international bioinformatics conference ISMB and is a Fellow of the European Laboratory of Learning and Intelligent Systems (ELLIS) associated with the ELLIS Health Programme. His research group is funded by the Wellcome Trust, UKRI, CRUK and the EU. Magnus uses probabilistic modelling and Bayesian inference techniques to study biological systems across a broad range of temporal and spatial scales, from gene expression in single cells to longitudinal population health data. Recent work includes methods to uncover oscillations from single-cell imaging time course data and the development of scalable Gaussian process models for pseudotime and branching process inference using single-cell omics data.
  • Wednesday, Mar-1, 2023, 2-3pm: Professor Rachel McCrea (Lancaster University)

Title: Statistical models for conservation translocations

Collaborative work with: Katie Bickerton (University of Kent), Fay Frost (Lancaster University), Stefano Canessa (Bern University), John Ewen (ZSL)

Abstract: Conservation translocations are increasingly used in the conservation of threatened species and as part of ecological restoration programmes (Bickerton et al., 2022). Robust estimates of abundance are essential for meaningful conservation decision-making, and the impact of translocations on source populations needs to be understood. Within this talk I will present a new capture-recapture model for translocated populations, and will then present a modelling framework in which capture-recapture is combined with removal/depletion methodology (Zhou et al., 2019). I will demonstrate that an exact likelihood is possible when individual-level information is available, and I will show how a standard integrated population modelling approach (Frost et al., 2022), which assumes independence between component data sets, can be adapted to provide an approximate likelihood when individual-level data are not available. As well as providing a valuable tool for estimating the abundance of source populations after translocation, this approach motivates a new direction of research for overcoming issues of dependence between data sets within a standard integrated population modelling framework.

Bickerton, K., Ewen, J.G., Canessa, S., Cole, N., Frost, F., Mootoocurpen, R. and McCrea, R.S. (2022) Estimating population size of translocated populations: a modification of the Jolly-Seber model. Submitted.

Frost, F., McCrea, R.S., King, R., Gimenez, O. and Zipkin, E. (2022) Integrated population models: Achieving their potential. Journal of Statistical Theory and Practice. In press.

Zhou, M., McCrea, R.S., Matechou, E., Cole, D.J. and Griffiths, R.A. (2019) Removal models accounting for temporary emigration. Biometrics. 75, 24-35.
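To give a feel for the removal/depletion idea referred to in the abstract, here is the classical two-pass removal estimator. This is a textbook method shown for illustration only. It is not the speaker's model, which combines removal with capture-recapture and handles translocations. The closed-population, equal-catchability assumptions are stated in the code.

```python
def two_pass_removal(c1: int, c2: int) -> tuple[float, float]:
    """Classical two-pass removal (depletion) estimator.

    c1, c2: numbers of animals caught and removed on the first and second pass.
    Assumes a closed population and the same capture probability p on both
    passes, so E[c1] = N p and E[c2] = N (1 - p) p.
    Returns (N_hat, p_hat)."""
    if c1 <= c2:
        raise ValueError("estimator requires a declining catch (c1 > c2)")
    p_hat = (c1 - c2) / c1          # from c2 / c1 = 1 - p
    n_hat = c1 ** 2 / (c1 - c2)     # from c1 = N p
    return n_hat, p_hat
```

For example, catches of 60 and then 30 give an estimated capture probability of 0.5 and an estimated population of 120.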

About the speaker: Rachel McCrea is a Professor of Statistics at Lancaster University. She is currently the Director of the National Centre for Statistical Ecology. She was also elected Fellow of the Learned Society of Wales.

Professor McCrea’s research includes developing new methods for model selection and diagnostic testing for a class of models which are fitted to capture-recapture data. Capture-recapture studies involve the capture and unique marking of wild animals, which are then released back into the population; subsequently attempts are made to recapture them. Models can be used to estimate demographic parameters which are essential to understand the viability of the populations and drivers of population change.
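The capture-mark-recapture logic described above can be illustrated with the simplest two-sample estimators, Lincoln-Petersen and its bias-corrected Chapman form. These are standard examples for orientation. The models discussed in the research above are considerably richer (open populations, model selection, diagnostics).

```python
def lincoln_petersen(n1: int, n2: int, m2: int) -> float:
    """n1 animals marked and released; n2 caught in the second sample,
    of which m2 carry marks. Assumes a closed population and equal catchability."""
    return n1 * n2 / m2


def chapman(n1: int, n2: int, m2: int) -> float:
    """Chapman's bias-corrected version, which is also defined when m2 = 0."""
    return (n1 + 1) * (n2 + 1) / (m2 + 1) - 1
```

With 50 marked animals and a second sample of 40 containing 10 marked ones, the Lincoln-Petersen estimate is 200.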

More recently Professor McCrea’s research has moved into the related field of multiple systems estimation and she has worked on problems such as estimating the number of victims of human trafficking. She is the Sciences Theme Lead for the University of Lancaster Migration and Movement Signature Research Theme.

  • Wednesday, Feb-22, 2023, 3-4pm: Victor Elvira, University of Edinburgh

Title: State-Space Models as Graphs

Abstract: Modeling and inference in multivariate time series is central in statistics, signal processing, and machine learning. A fundamental question when analyzing multivariate sequences is the search for relationships between their entries (or the modeled hidden states), especially when the inherent structure is a directed (causal) graph. In such a context, graphical modeling combined with parsimony constraints limits the proliferation of parameters and enables a compact data representation that is easier to interpret in applications, e.g., in inferring causal relationships of physical processes in a Granger sense. In this talk, we present a novel perspective in which state-space models are interpreted as graphs. We then propose two novel algorithms that exploit this perspective to estimate the linear matrix operator in the state equation of a linear-Gaussian state-space model. Finally, we discuss extending this perspective to the estimation of other parameters in more complicated models.
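The "state-space model as graph" reading can be sketched as follows. A non-zero entry A[i, j] of the state matrix is read as a directed edge j -> i, meaning state j at time t-1 influences state i at time t. This is a deliberately simplified sketch that assumes the states are observed directly. It uses plain least squares plus a hypothetical threshold, whereas the algorithms in the talk estimate A from noisy observations of hidden states.

```python
import numpy as np


def simulate_states(A, T, q, rng):
    """Simulate x_t = A x_{t-1} + Gaussian noise with standard deviation q, x_0 = 0."""
    d = A.shape[0]
    x = np.zeros((T, d))
    for t in range(1, T):
        x[t] = A @ x[t - 1] + rng.normal(scale=q, size=d)
    return x


def estimate_graph(x, threshold):
    """Least-squares estimate of A, then a directed edge j -> i wherever
    |A_hat[i, j]| exceeds the threshold."""
    d = x.shape[1]
    # Solve x[1:] ~ x[:-1] @ A.T for A.T
    A_T, *_ = np.linalg.lstsq(x[:-1], x[1:], rcond=None)
    A_hat = A_T.T
    edges = {(j, i) for i in range(d) for j in range(d) if abs(A_hat[i, j]) > threshold}
    return A_hat, edges


rng = np.random.default_rng(0)
A_true = np.array([[0.5, 0.0, 0.0],
                   [0.4, 0.5, 0.0],
                   [0.0, 0.4, 0.5]])
x = simulate_states(A_true, T=5000, q=1.0, rng=rng)
A_hat, edges = estimate_graph(x, threshold=0.2)
```

Here the recovered graph is the chain 0 -> 1 -> 2, plus a self-loop on each state.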

  • Wednesday, Jan-18, 2023, 2-3pm, Dr Chih-Li Sung, Assistant Professor, Department of Statistics and Probability, Michigan State University

Title: When epidemic models meet statistics: understanding COVID-19 outbreak

Abstract: As coronavirus disease 2019 (COVID-19) has had profound effects on public health and the economy worldwide, it has become crucial to assess the factors affecting virus transmission and to develop effective strategies to address the challenge. A new statistical model, derived from the SIR epidemic model with functional parameters, is proposed to understand the impact of weather and government interventions on the spread of the virus in the presence of asymptomatic infections across eight metropolitan areas in the United States. The model uses Bayesian inference with Gaussian process priors to study the functional parameters nonparametrically, and sensitivity analysis is adopted to investigate the main and interaction effects of these factors. This analysis reveals several important results, including potential interaction effects between weather and government interventions, which shed new light on effective strategies for policymakers to mitigate the COVID-19 outbreak.
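The SIR model with a functional (time-varying) transmission rate can be sketched by forward-Euler integration. The step-change intervention below is a hypothetical example. The talk instead models such functions nonparametrically with Gaussian process priors and accounts for asymptomatic infections, both of which this sketch omits.

```python
def simulate_sir(beta, gamma, s0, i0, days, dt=0.1):
    """Forward-Euler solution of the SIR equations, in population fractions,
    with a time-varying transmission rate beta(t):
        dS/dt = -beta(t) S I,  dI/dt = beta(t) S I - gamma I,  dR/dt = gamma I.
    Returns a list of daily (S, I, R) tuples."""
    s, i, r = s0, i0, 1.0 - s0 - i0
    steps = round(1 / dt)
    out = [(s, i, r)]
    for day in range(days):
        for k in range(steps):
            t = day + k * dt
            infections = beta(t) * s * i * dt
            recoveries = gamma * i * dt
            s, i, r = s - infections, i + infections - recoveries, r + recoveries
        out.append((s, i, r))
    return out


# Hypothetical intervention on day 30 that cuts beta(t) from 0.3 to 0.1
with_intervention = simulate_sir(lambda t: 0.3 if t < 30 else 0.1, 0.1, 0.999, 0.001, 200)
no_intervention = simulate_sir(lambda t: 0.3, 0.1, 0.999, 0.001, 200)
```

Comparing the final susceptible fractions of the two runs shows how a change in beta(t) alters the eventual size of the outbreak.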

Bio: Chih-Li Sung is an Assistant Professor in the Department of Statistics and Probability at Michigan State University. His research interests include computer experiments, uncertainty quantification, machine learning, big data, and applications of statistics in engineering. He received the Statistics in Physical Engineering Sciences (SPES) Award from the ASA in 2019. He is currently an associate editor for Technometrics and Computational Statistics & Data Analysis (CSDA). His research is supported by NSF DMS 2113407. He received his Ph.D. from the Stewart School of Industrial & Systems Engineering at Georgia Tech in 2018, where he was jointly advised by Profs. C. F. Jeff Wu and Benjamin Haaland. He also received a B.S. in applied mathematics and an M.S. in statistics from National Tsing Hua University in Taiwan in 2008 and 2010, respectively.

  • Wednesday, 16-Nov, 2022, in-person talk, Dr Pantelis Samartsidis, Investigator Statistician, MRC Biostatistics Unit, University of Cambridge

Title: A Bayesian multivariate factor analysis model for causal inference using time-series observational data on mixed outcomes

Abstract: Assessing the impact of an intervention by using time-series observational data on multiple units and outcomes is a frequent problem in many fields of scientific research. Here, we propose a novel Bayesian multivariate factor analysis model for estimating intervention effects in such settings and develop an efficient Markov chain Monte Carlo algorithm to sample from the high-dimensional and intractable posterior of interest. The proposed method is one of the few that can simultaneously deal with outcomes of mixed type (continuous, binomial, count), increase efficiency in the estimates of the causal effects by jointly modelling multiple outcomes affected by the intervention (as shown via a simulation study), and easily provide uncertainty quantification for all causal estimands of interest. Using the proposed approach, we evaluate the impact that Local Tracing Partnerships had on the effectiveness of England’s Test and Trace programme for COVID-19.
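The core factor-model idea can be illustrated in a non-Bayesian way. Latent factors are learned from untreated units, the treated unit's loadings are fitted on pre-intervention data, and its untreated counterfactual is projected forward. This sketch is a simplified frequentist stand-in that uses an SVD and least squares for a single continuous outcome. It is not the proposed Bayesian model, which handles mixed outcome types and provides full uncertainty quantification via MCMC.

```python
import numpy as np


def factor_counterfactual(y_treated, Y_control, t0, k):
    """y_treated: (T,) outcome series of the treated unit
    Y_control: (T, N) outcome series of N untreated control units
    t0: index of the first post-intervention time point; k: number of factors.
    Returns the counterfactual series and the estimated effects for t >= t0."""
    # Latent factors from the leading k singular vectors of the control outcomes
    U, s, _ = np.linalg.svd(Y_control, full_matrices=False)
    F = U[:, :k] * s[:k]
    # Loadings of the treated unit, fitted on pre-intervention data only
    loadings, *_ = np.linalg.lstsq(F[:t0], y_treated[:t0], rcond=None)
    y0_hat = F @ loadings
    return y0_hat, y_treated[t0:] - y0_hat[t0:]
```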

  • Wednesday, 09-Nov, 2022, online talk, Prof Chris Holmes, Professor in Biostatistics at the Department of Statistics and the Nuffield Department of Medicine, University of Oxford

Title: Bayesian Predictive Inference

Abstract: De Finetti promoted the importance of predictive models for observables as the basis for Bayesian inference. The assumption of exchangeability, implying aspects of symmetry in the predictive model, motivates the usual likelihood-prior construction and with it the traditional learning approach involving a prior to posterior update using Bayes’ rule. We discuss an alternative approach, treating Bayesian inference as a missing data problem for observables not yet obtained from the population needed to estimate a parameter precisely or make a decision correctly. This motivates the direct use of predictive models for inference, relaxing exchangeability to start modelling from the data in hand (with or without a prior). Martingales play a key role in the construction. This is joint work with Stephen Walker and Edwin Fong, based on the paper “Martingale Posteriors”, to appear with discussion in JRSS Series B.
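The "missing data" view can be made concrete with predictive resampling. Future observations are imputed one at a time from a predictive model, and the parameter (here the mean) is computed on the completed data set. The sketch below assumes the simplest possible predictive: each new value is drawn uniformly from all values seen so far, which is a Pólya urn scheme whose limit is the Bayesian bootstrap. The richer predictives discussed in the talk are not shown.

```python
import numpy as np


def predictive_resample_mean(y, n_future, n_samples, rng):
    """Draw n_samples values of the mean by imputing n_future observations,
    each sampled uniformly from the current (observed + imputed) data."""
    y = np.asarray(y, dtype=float)
    draws = np.empty(n_samples)
    for b in range(n_samples):
        data = list(y)
        for _ in range(n_future):
            data.append(data[rng.integers(len(data))])
        draws[b] = np.mean(data)
    return draws


rng = np.random.default_rng(0)
draws = predictive_resample_mean(np.arange(10.0), n_future=500, n_samples=300, rng=rng)
```

The spread of the draws plays the role of posterior uncertainty about the mean. Because the imputation is a martingale, the draws are centred on the observed sample mean.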

  • Wednesday, 02-Nov, 2022, joint with CREEM, Dr Wei Zhang, Lecturer in Statistics, School of Mathematics and Statistics, University of Glasgow

Title: A flexible and efficient Bayesian implementation of point process models for spatial capture-recapture data

Abstract: Spatial capture-recapture (SCR) is now routinely used for estimating abundance and density of wildlife populations. A standard SCR model includes sub-models for the distribution of individual activity centres and for individual detections conditional on the locations of these activity centres. Both sub-models can be expressed as point processes taking place in continuous space, but there is a lack of accessible and efficient tools to fit such models in a Bayesian paradigm. In this talk, I will describe a set of custom functions and distributions to achieve this. Our work allows for more efficient model fitting with spatial covariates on population density, offers the option to fit SCR models using the semi-complete data likelihood (SCDL) approach instead of data augmentation, and better reflects the spatially continuous detection process in SCR studies that use area searches. In addition, the SCDL approach is more efficient than data augmentation for simple SCR models while losing its advantages for more complicated models that account for spatial variation in either population density or detection. I will present the model formulation, test it with simulations, quantify computational efficiency gains, and conclude with a real-life example using non-invasive genetic sampling data for an elusive large carnivore, the wolverine (Gulo gulo) in Norway.
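The two SCR sub-models described above can be illustrated by simulating from them. Activity centres are drawn as a homogeneous Poisson point process, and detections at each trap depend on distance through a half-normal detection function. The half-normal form, the parameter names, and the trap layout are assumptions for illustration. The custom Bayesian fitting functions from the talk are not reproduced.

```python
import numpy as np


def simulate_scr(density, xlim, ylim, traps, g0, sigma, n_occasions, rng):
    """Simulate SCR data: Poisson-process activity centres, then binomial
    detection counts with half-normal detection probability
    p_ij = g0 * exp(-d_ij^2 / (2 sigma^2)). Returns all centres and the
    detection counts of individuals seen at least once."""
    area = (xlim[1] - xlim[0]) * (ylim[1] - ylim[0])
    n = rng.poisson(density * area)
    centres = np.column_stack([rng.uniform(*xlim, size=n), rng.uniform(*ylim, size=n)])
    # Squared distance from every activity centre to every trap
    d2 = ((centres[:, None, :] - traps[None, :, :]) ** 2).sum(axis=-1)
    p = g0 * np.exp(-d2 / (2 * sigma ** 2))
    counts = rng.binomial(n_occasions, p)
    return centres, counts[counts.sum(axis=1) > 0]


rng = np.random.default_rng(0)
traps = np.array([[x, y] for x in range(1, 10, 2) for y in range(1, 10, 2)], dtype=float)
centres, counts = simulate_scr(0.5, (0.0, 10.0), (0.0, 10.0), traps,
                               g0=0.3, sigma=1.0, n_occasions=5, rng=rng)
```

Only the rows in `counts` would be observed in practice. The undetected activity centres are what data augmentation or the SCDL approach must account for.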

  • Wednesday, 26-Oct, 2022, JJ Valletta Memorial lecture: Dr TJ McKinley, Lecturer in Mathematical Biology, Department of Mathematics and Statistics, University of Exeter.  In person in Lecture Theatre D, Mathematical Institute.

Title: Emulation-driven inference for complex spatial meta-population models

Abstract: Calibration of complex stochastic infectious disease models is challenging. These often have high-dimensional input spaces, with the models exhibiting complex, non-linear dynamics. Coupled with this is a paucity of necessary data, resulting in a large number of hidden states that must be handled by the inference routine. Likelihood-based approaches to this missing data problem are very flexible, but challenging to scale due to having to monitor and update these hidden states. Methods based on simulating the hidden states directly from the model-of-interest have the advantage that they are often much more straightforward to code, and thus are easier to implement and adapt to changing model structures. However, they often require very large numbers of simulations in order to adequately explore the input space, which can render them infeasible for many large-scale problems.
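A common building block of emulation-driven inference is a Gaussian process emulator. It is fitted to a handful of simulator runs and then used to predict output, with uncertainty, at untried inputs. The sketch below assumes a one-dimensional input, a zero-mean GP, a squared-exponential kernel with fixed hyperparameters, and a toy function standing in for the expensive simulator. The emulators used in the talk may differ in all of these choices.

```python
import numpy as np


def sq_exp_kernel(a, b, length, var):
    """Squared-exponential covariance between 1-D input arrays a and b."""
    d2 = (a[:, None] - b[None, :]) ** 2
    return var * np.exp(-0.5 * d2 / length ** 2)


def fit_gp_emulator(x_train, y_train, length=0.3, var=1.0, nugget=1e-6):
    """Zero-mean GP conditioned on simulator runs (x_train, y_train).
    Returns a function giving predictive mean and variance at new inputs."""
    K = sq_exp_kernel(x_train, x_train, length, var) + nugget * np.eye(len(x_train))
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))

    def predict(x_new):
        Ks = sq_exp_kernel(x_new, x_train, length, var)
        mean = Ks @ alpha
        v = np.linalg.solve(L, Ks.T)
        return mean, var - np.sum(v ** 2, axis=0)

    return predict


def simulator(x):
    """Hypothetical cheap stand-in for an expensive simulator."""
    return np.sin(3 * x)


x_train = np.linspace(0.0, 1.0, 6)
emulator = fit_gp_emulator(x_train, simulator(x_train))
```

The emulator's predictive variance is near zero at the training runs and returns to the prior variance far from them. This property is what lets emulation-based methods decide where further simulations are needed.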

This seminar will be given in memory of our colleague JJ Valletta, who passed away suddenly while hillwalking in October 2020. The seminar will be followed by a reception in the Mathematical Institute Common Room (on the ground floor). All are welcome, both to the lecture and the reception!

  • Wednesday, 19-Oct, 2022, In-person talk joint with CREEM, Dr Ben Swallow, Lecturer in Statistics, University of St Andrews

Title: Bayesian causal inference for zero-inflated GLMs using a potential outcomes framework

Abstract: We propose a method for conducting Bayesian causal inference under a generalised linear model potential outcomes framework, for data where there are many more zeros than would naturally be expected. We develop an approach using both semi-continuous and fully continuous probability distributions and apply the approach to both simulated data and ornithological citizen science data in the UK, comparing the results to purely observational studies. Further analyses of the contrasting GLMs are also discussed.
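To illustrate the kind of model involved, here is a zero-inflated Poisson GLM, one standard choice for counts with excess zeros. It is paired with the potential-outcomes contrast E[Y(1)] - E[Y(0)] for a binary treatment. The log/logit links and the parameter names are assumptions. The talk's semi-continuous and fully continuous formulations and its Bayesian fitting are not shown.

```python
import math


def zip_logpmf(y: int, lam: float, pi: float) -> float:
    """Zero-inflated Poisson: with probability pi a structural zero,
    otherwise a Poisson(lam) count."""
    if y == 0:
        return math.log(pi + (1 - pi) * math.exp(-lam))
    return math.log(1 - pi) + y * math.log(lam) - lam - math.lgamma(y + 1)


def zip_mean(x: float, beta, gamma) -> float:
    """E[Y | x] = (1 - pi) * lam, with log lam = beta0 + beta1 x and
    logit pi = gamma0 + gamma1 x."""
    lam = math.exp(beta[0] + beta[1] * x)
    pi = 1 / (1 + math.exp(-(gamma[0] + gamma[1] * x)))
    return (1 - pi) * lam


def treatment_effect(beta, gamma) -> float:
    """Difference of potential-outcome means E[Y(1)] - E[Y(0)]."""
    return zip_mean(1, beta, gamma) - zip_mean(0, beta, gamma)
```

In a Bayesian analysis, this contrast would be evaluated at each posterior draw of (beta, gamma) to give a posterior for the causal effect.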

  • Wednesday, 05-Oct, 2022, Online talk (attending from Maths Tutorial Room 1A), Prof Alexandros Beskos, Professor in Statistics, UCL

Title: Manifold Markov chain Monte Carlo methods for Bayesian inference in diffusion models

Abstract: Bayesian inference for nonlinear diffusions, observed at discrete times, is a challenging task that has prompted the development of a number of algorithms, mainly within the computational statistics community. We propose a new direction, and accompanying methodology, borrowing ideas from statistical physics and computational chemistry, for inferring the posterior distribution of latent diffusion paths and model parameters, given observations of the process. Joint configurations of the underlying process noise and of parameters, mapping onto diffusion paths consistent with observations, form an implicitly defined manifold. Then, by making use of a constrained Hamiltonian Monte Carlo algorithm on the embedded manifold, we are able to perform computationally efficient inference for a class of discretely observed diffusion models. Critically, in contrast with other approaches proposed in the literature, our methodology is highly automated, requiring minimal user intervention and applying equally across a range of settings, including: elliptic or hypo-elliptic systems; observations with or without noise; linear or non-linear observation operators. Exploiting Markovianity, we propose a variant of the method with complexity that scales linearly in the resolution of path discretisation and the number of observation times. The talk is based on a forthcoming JRSSB paper.
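The implicitly defined manifold can be made concrete with a constraint function. Parameters and standard normal process noise are mapped to a discretised path, and the path is compared with noise-free observations. The manifold is the set where this function is zero. This sketch assumes a toy Ornstein-Uhlenbeck-type SDE, an Euler-Maruyama discretisation, and noise-free observations of the state. The constrained Hamiltonian Monte Carlo sampler that moves on this manifold is not shown.

```python
import numpy as np


def euler_path(theta, v, x0, dt):
    """Euler-Maruyama path of the toy SDE dX = -theta X dt + dW,
    driven by standard normal increments v."""
    x = np.empty(len(v) + 1)
    x[0] = x0
    for k, vk in enumerate(v):
        x[k + 1] = x[k] - theta * x[k] * dt + np.sqrt(dt) * vk
    return x


def constraint(theta, v, x0, dt, obs_idx, y_obs):
    """Zero exactly when the path generated by (theta, v) passes through the
    observations y_obs at grid indices obs_idx."""
    return euler_path(theta, v, x0, dt)[obs_idx] - y_obs
```

Each point (theta, v) on the zero set corresponds to a full path consistent with the data, which is why sampling on this manifold gives joint inference for paths and parameters.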

  • Wednesday, 28-Sept, 2022, In-person talk, Rachel Phillip, Medical Statistician, Clinical Trials Research Unit, University of Leeds

Rachel is an alumna of our School, and the talk should be of interest to both students and staff.

Title: Working as a statistician on Phase I cancer clinical trials

Abstract: Clinical trials are research studies conducted in people to test new medical treatments. Trials are usually conducted in phases that build on each other, with Phase I trials being the first steps in testing new treatments in people. Because there is often limited safety information on new treatments, the primary aims of Phase I studies are to establish the safety profile of the intervention and to determine the highest dose that can be given without severe side effects, which can then be taken forward for further investigation in later studies. This talk will introduce the different areas a statistician works on in clinical trials and the common statistical designs of Phase I studies, and will discuss CONCORDE, an innovative Phase I platform trial testing different drug-radiotherapy combinations.
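A widely taught rule-based Phase I design is the "3+3" design. The sketch below simulates one common variant, given assumed true dose-limiting toxicity (DLT) probabilities at each dose. It is shown only to illustrate what a Phase I dose-escalation design does. It is not necessarily a design used in CONCORDE, and the details of 3+3 variants differ between trials.

```python
import numpy as np


def three_plus_three(dlt_probs, rng):
    """Simulate one trial under a common 3+3 variant.

    Treat 3 patients at the current dose: 0 DLTs -> escalate; 1 DLT -> treat
    3 more, escalate if 1/6 and stop if >= 2/6; >= 2 DLTs -> stop. On stopping,
    the selected maximum tolerated dose (MTD) is the dose below.
    Returns (MTD index, or -1 if even the lowest dose is too toxic; patients)."""
    n_patients = 0
    for level, p in enumerate(dlt_probs):
        dlts = rng.binomial(3, p)
        n_patients += 3
        if dlts == 1:
            dlts += rng.binomial(3, p)
            n_patients += 3
            if dlts >= 2:
                return level - 1, n_patients
        elif dlts >= 2:
            return level - 1, n_patients
    # No stopping rule triggered: the highest dose studied is selected
    return len(dlt_probs) - 1, n_patients
```

Running this many times for a given toxicity scenario estimates the design's operating characteristics, such as how often each dose is selected and the expected sample size.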

  • Wednesday, 14-Sept, 2022, Online talk, Dr Dennis Prangle, Senior Lecturer in Statistics, School of Mathematics, University of Bristol

Title: Distilling importance sampling for likelihood-free inference

Abstract: Likelihood-free inference involves inferring parameter values given observed data and a simulator model. The simulator is computer code that takes the parameters, performs stochastic calculations, and outputs simulated data. In this work, we view the simulator as a function whose inputs are (1) the parameters and (2) a vector of pseudo-random draws, and attempt to infer all these inputs. This is challenging, as the resulting posterior can be high-dimensional and involve strong dependence. We approximate the posterior using normalising flows, a flexible parametric family of densities. Training data are generated by ABC importance sampling with a large bandwidth parameter. These data are “distilled” by using them to train the normalising flow parameters. The process is iterated, using the updated flow as the importance sampling proposal and slowly reducing the ABC bandwidth until the proposal is a good approximation to the posterior. Unlike most other likelihood-free methods, we avoid the need to reduce data to low-dimensional summary statistics, and hence can achieve more accurate results.
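The iterate-and-shrink loop described above can be sketched with a much simpler proposal family. Here a single Gaussian, refitted to the weighted sample after each round, stands in for the normalising flow. The toy simulator, prior, and bandwidth schedule are assumptions. Unlike the method in the talk, this sketch infers only the parameter, not the pseudo-random draws.

```python
import numpy as np


def norm_logpdf(x, mean, sd):
    return -0.5 * ((x - mean) / sd) ** 2 - np.log(sd) - 0.5 * np.log(2 * np.pi)


def simulator(theta, u):
    """Toy simulator: a deterministic function of theta and pseudo-random draws u."""
    return theta + u


def iterated_abc_is(y_obs, bandwidths, n, rng, prior_mean=0.0, prior_sd=10.0):
    """ABC importance sampling with a shrinking bandwidth; after each round the
    Gaussian proposal is refitted to the weighted sample."""
    q_mean, q_sd = prior_mean, prior_sd
    for eps in bandwidths:
        theta = rng.normal(q_mean, q_sd, size=n)
        y_sim = simulator(theta, rng.normal(size=n))
        # log weight = log prior - log proposal + log Gaussian ABC kernel
        log_w = (norm_logpdf(theta, prior_mean, prior_sd)
                 - norm_logpdf(theta, q_mean, q_sd)
                 - 0.5 * ((y_sim - y_obs) / eps) ** 2)
        w = np.exp(log_w - log_w.max())
        w /= w.sum()
        q_mean = np.sum(w * theta)
        q_sd = np.sqrt(np.sum(w * (theta - q_mean) ** 2))
    return theta, w


rng = np.random.default_rng(0)
theta, w = iterated_abc_is(2.0, bandwidths=[5.0, 2.0, 1.0, 0.5, 0.25], n=20000, rng=rng)
```

Early rounds with a wide bandwidth give a rough but well-spread proposal. Later rounds refine it as the bandwidth shrinks, which mirrors the schedule described in the abstract.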