Seminars

Statistics seminars are held on Wednesdays, 14:00–15:00. Everyone is welcome! We gather for coffee/tea and biscuits around 15 minutes before the seminar begins.

The organiser is Ben Baer. Please contact Ben to find out more about the seminars, to suggest a future seminar speaker, or to ask to join a seminar online.

Most of the seminars this year will be held in person, with a few online. The in-person seminars will be held in the Observatory seminar room. Please see below for more details.

Forthcoming statistics seminars 2024-25

  • 23 April: Helen Warren, Queen Mary University of London

This seminar will be online-only. 

Title: Statistical Genetics: my Perspectives; Polygenic risk scores & Pharmacogenetics

Abstract: Dr Helen Warren is a Senior Lecturer in Statistical Genetics and has been a member of research staff at the Centre for Clinical Pharmacology and Precision Medicine at the William Harvey Research Institute, Queen Mary University of London, since 2013.

Her research focuses on the genetics of cardiovascular traits, with applications to genetic discovery (especially for blood pressure & hypertension), pharmacogenetics (e.g. for response to statin therapy and for antihypertensive drug response), polygenic risk scores, and risk prediction.

To be accessible to a wide audience, her talk will have three parts: (i) her Perspective: an insight into the life of a statistical geneticist, covering her own career path and giving an introduction to statistical genetics to provide context; (ii) the use of Polygenic risk scores from genome-wide association study data, with an overview of the different methods and applications being attempted; (iii) Pharmacogenetics, focusing on the important issues of study design, model choice and comparisons, bias, etc.

Past seminars

This academic year

  • 18th September (joint with CREEM): Hannah Worthington, University of St Andrews

Title: Capture-recapture Models: A lifetime expectation perspective

Abstract: Capture-recapture(-recovery) models featuring time- and age-dependent parameters are commonly used to offer biologically reasonable structures for features of a population. In particular, survival probabilities are often strongly linked to age, for example showing high mortality in young and old individuals, or different survival probabilities for different age classes (e.g. first-year, sub-adult, breeding adult, etc.). Unfortunately, fully age-dependent models, which allow for a different probability of survival in each year of life, tend to result in estimating very large numbers of parameters. We propose taking a semi-Markov approach to offer a straightforward mechanism to include an age component in survival whilst requiring far fewer parameters. Instead of considering this problem from the perspective of survival from one year to the next, we instead consider the distribution of the age at death. However, adding additional temporal elements to account for adverse or favourable environmental conditions creates some difficulties. I’ll present our current ideas, which look to embed a random walk structure into the model to overcome these challenges.
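
As a toy illustration of the reparameterisation at the heart of this approach (a sketch only, not the speaker's model: the "bathtub" hazard below is an invented assumption), a distribution for the age at death and a set of age-specific annual survival probabilities carry the same information, so modelling the former with a few parameters implicitly determines all of the latter:

```python
import numpy as np

# Invented "bathtub" hazard: high mortality for young and old individuals.
ages = np.arange(20)                             # age in years
hazard = np.clip(0.4 * np.exp(-ages / 2) + 0.02 * ages, 0, 1)

# Survivor function S(a) = P(alive at start of age a), and the implied
# age-at-death distribution f(a) = S(a) * hazard(a).
S = np.concatenate(([1.0], np.cumprod(1 - hazard)))[:-1]
f = S * hazard

# Age-specific annual survival probabilities follow directly:
phi = 1 - hazard                                 # phi_a = S(a+1) / S(a)
print(np.round(phi[:5], 3))
```

A semi-Markov formulation exploits exactly this: a low-dimensional parametric family for the age-at-death distribution replaces one free survival parameter per year of life.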

  • 25th September: Nguyen Dang, University of St Andrews

Title: Reinforcement Learning for Dynamic Algorithm Configuration

Abstract: Most algorithms have their own parameters that need to be tuned to achieve the best performance. In some cases, instead of finding the best static parameter setting for an algorithm, it is highly beneficial to adapt the parameter values while the algorithm is running. Dynamic Algorithm Configuration (DAC) focuses on developing techniques to solve this task in an automated and data-driven fashion. The aim is to learn a policy that maps from the current state of the algorithm to the best parameter value for that state during the solving process. DAC is an emerging topic and has many potential applications in various domains. Given the dynamic nature of the task, Reinforcement Learning (RL) seems like a suitable family of techniques for tackling DAC problems. However, research on DAC methods is still in its early stages. It is not clear whether RL methods, which were originally developed for other application domains such as robotics and game playing, are effective in DAC contexts. In this talk, I will give a brief introduction to DAC and present our recent study benchmarking a commonly used RL algorithm on DAC problems.
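
As a toy illustration of what learning a policy from algorithm state to parameter value means in practice (everything here, the states, candidate values, and reward, is an invented stand-in, not the benchmark studied in the talk), a tabular Q-learning sketch might look like:

```python
import random
from collections import defaultdict

STATES = range(5)            # coarse progress levels of the running algorithm
ACTIONS = [0.1, 0.5, 1.0]    # candidate values of the tunable parameter

def step(state, action):
    """Stub environment: large steps help early, small steps help late."""
    best = 1.0 if state < 2 else 0.1
    reward = -abs(action - best)         # closer to the ideal value is better
    next_state = min(state + 1, 4)
    return next_state, reward, next_state == 4

Q = defaultdict(float)                   # state-action values
alpha, gamma, eps = 0.1, 0.95, 0.2
for episode in range(2000):
    s, done = 0, False
    while not done:
        if random.random() < eps:        # epsilon-greedy exploration
            a = random.choice(ACTIONS)
        else:
            a = max(ACTIONS, key=lambda v: Q[(s, v)])
        s2, r, done = step(s, a)
        target = r + gamma * max(Q[(s2, v)] for v in ACTIONS)
        Q[(s, a)] += alpha * (target - Q[(s, a)])
        s = s2

policy = {s: max(ACTIONS, key=lambda v: Q[(s, v)]) for s in STATES}
print(policy)                            # learned state -> parameter mapping
```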

  • 23rd October: Rui Borges, University of St Andrews

Title: A rant about mutation models in population genetics

Abstract: Mutations are essential drivers of evolution, and their mathematical modeling in population genetics depends on how we perceive their frequency and the timescales at which they occur. A common assumption is that mutations are rare, and by the time a new mutation arises, the previous one has either been fixed or lost from the population. However, more realistic models should account for reversible or even recurrent mutations. In this talk, I compare different mutation models, focusing on their implications for two very important inferential tasks in evolutionary biology: estimating effective population sizes and reconstructing phylogenies. Finally, I will introduce the concept of the distribution of fitness effects, highlight its fundamental role in molecular evolution in describing the fate of new mutations, and discuss my current approach to inferring this distribution using genomic data.

  • 30th October (joint with CREEM): Simon Wood, University of Edinburgh

Title: Neighbourhood Cross Validation and modelling under spatial correlation without a spatial correlation model

Abstract: Cross validation comes in many varieties, but some of the more interesting flavours require multiple model fits with consequently high cost. This talk shows how the high cost can be side-stepped for a wide range of models estimated using a quadratically penalized smooth loss, with rather low approximation error. Once the computational cost has the same leading order as a single model fit, it becomes feasible to efficiently optimize the chosen cross-validation criterion with respect to multiple smoothing/precision parameters. Interesting applications include cross-validating smooth additive quantile regression models, and the use of leave-out-neighbourhood cross validation for dealing with nuisance short range autocorrelation. The link between cross validation and the jackknife can be exploited to obtain reasonably well calibrated uncertainty quantification in these cases.
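
To make the leave-out-neighbourhood idea concrete, here is a deliberately naive sketch (the basis, penalty, and noise process are illustrative assumptions, and it refits once per point, exactly the cost the talk shows how to avoid): dropping a window of nearby observations before predicting each point stops short-range autocorrelation from rewarding over-wiggly fits.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200
x = np.sort(rng.uniform(0, 1, n))
e = np.zeros(n)
for i in range(1, n):                    # AR(1)-style short-range correlation
    e[i] = 0.8 * e[i - 1] + rng.normal(scale=0.3)
y = np.sin(2 * np.pi * x) + e

# Polynomial basis with a ridge penalty, standing in for a penalised smoother.
X = np.vander(x, 10, increasing=True)

def fit(Xtr, ytr, lam):
    p = Xtr.shape[1]
    return np.linalg.solve(Xtr.T @ Xtr + lam * np.eye(p), Xtr.T @ ytr)

def ncv_score(lam, halfwidth=5):
    """Leave-out-neighbourhood CV: exclude the 2*halfwidth+1 points nearest
    each target point before predicting it."""
    err = 0.0
    for i in range(n):
        keep = np.abs(np.arange(n) - i) > halfwidth
        beta = fit(X[keep], y[keep], lam)
        err += (y[i] - X[i] @ beta) ** 2
    return err / n

for lam in [1e-4, 1e-2, 1.0]:
    print(lam, round(ncv_score(lam), 4))
```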

  • 6th November (joint with CREEM): Andrew Solow, Woods Hole Oceanographic Institution

Title: The use of sighting records in ecology

Abstract: This talk presents three examples of the use of sighting records of individual animals to address ecological issues:  population declines in the Yangtze river dolphin, the extinction of the Ivory-billed Woodpecker, and the rediscovery of the polecat in Scotland.  Technical material will be kept to a reasonable minimum. 
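
The abstract does not say which methods will be used, but a classical tool in this literature, due to the speaker (Solow, 1993), illustrates the flavour: under a constant sighting rate, n sightings over an observation window (0, T] with the last at time t_n give p = (t_n / T)^n as a p-value for the null hypothesis that the species is still extant. A minimal sketch with a hypothetical record:

```python
def solow_p(sightings, T):
    """Solow's (1993) sighting-record test: p-value for the null hypothesis
    that the species is extant at time T, assuming a constant sighting rate."""
    n, t_n = len(sightings), max(sightings)
    return (t_n / T) ** n

# Hypothetical record: six sightings early in a 25-year survey window.
print(solow_p([1, 2, 3, 5, 6, 7], T=25))   # small p suggests extinction
```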

  • 13th November (joint with CREEM): Regina Bispo, University of St Andrews

Title: Breezes, Blazes, and Stats: A Research Journey

Abstract: In this talk, I will summarize my research on estimating wildlife fatalities at onshore wind farms and, more recently, on modelling the occurrence of both urban and rural fires. 

Understanding the impact of onshore wind farms on avian and bat populations requires mortality estimation. In this context, we want to estimate the number of deaths driven by collision with the wind farm structures. Mortality assessment is typically based on counting detected carcasses underneath turbines. However, there are several sources of uncertainty, including carcass removal (e.g., by scavengers) and the observers’ detection ability. Moreover, mortality rates vary across space and time, influenced by turbine placement and changing collision risks. 

Urban fires remain a major threat, contributing to property damage, physical injury, and loss of life. High population density and socio-economic factors can further amplify fire risk and firefighting costs. Wildfires, on the other hand, represent a global challenge. In Portugal, despite a declining trend in the number of rural fires, the total burned area has increased in recent years, reaching 110,097 hectares in 2022. This shift is tied to a rise in large, intense fires, which are often linked to climate change and result in more extensive environmental impacts and higher socio-economic costs. I will conclude my presentation by sharing some recent ongoing work on modelling the occurrence and size of rural fires in Portugal.

  • 20th November: Sara Wade, University of Edinburgh (rescheduled to 19 February; see below)

  • 27th November (joint with CREEM): April Zhou, Lancaster University

Title: Using Simulation Optimisation to Solve the Reserve Site Selection Problem

Abstract: The Reserve Site Selection (RSS) problem aims to select a combination of sites from potential sites to assemble a reserve that meets specific conservation goals. Traditionally, it is formulated as a mathematical programming problem, which often fails to capture the complexity of ecosystems. Stochastic simulation models can help capture this complexity, but they are typically used in an exploratory way rather than for finding optimal solutions. Simulation Optimisation (SO) overcomes the challenges of both methods by finding optimal solutions via stochastic simulations.

In our research, we formulate the RSS problem as an SO problem with the goal of finding the best combination of sites that not only minimises cost but also ensures that species survival probabilities meet desired thresholds. We use the grey wolf (Canis lupus) as a case study to examine the performance of SO in solving RSS problems.

This talk will cover the problem formulation, the solution methods, and two enhancements aimed at improving these methods.
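
To fix ideas, here is a deliberately tiny sketch of the SO formulation (the costs, habitat qualities, and persistence simulator are all invented): choose the cheapest subset of sites whose simulated species survival probability clears a threshold. Real SO methods replace the exhaustive enumeration below with guided search and handle the noise in the simulated survival estimates explicitly.

```python
import itertools
import numpy as np

rng = np.random.default_rng(7)
n_sites = 8
cost = rng.uniform(1, 5, n_sites)           # hypothetical site costs
quality = rng.uniform(0.3, 0.9, n_sites)    # hypothetical habitat quality

def simulate_survival(subset, n_reps=500):
    """Stub stochastic simulator: persistence probability grows with the
    aggregate habitat quality of the selected sites."""
    q = quality[list(subset)].sum()
    return (rng.random(n_reps) < 1 - np.exp(-q)).mean()

best, best_cost = None, np.inf
subsets = itertools.chain.from_iterable(
    itertools.combinations(range(n_sites), k) for k in range(1, n_sites + 1))
for subset in subsets:
    c = cost[list(subset)].sum()
    if c >= best_cost:
        continue                             # cannot beat the incumbent
    if simulate_survival(subset) >= 0.95:    # conservation threshold
        best, best_cost = subset, c

print(best, round(best_cost, 2))
```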

  • 4th December: Sjoerd Victor Beentjes, University of Edinburgh

Title: Semi-parametric efficient estimation of small genetic effects in large-scale population cohorts

Abstract: We present a unified statistical workflow for the semiparametric efficient and doubly robust estimation of n-point interactions amongst categorical variables in the presence of confounding and weak population dependence. N-point interactions, or Average Interaction Effects (AIEs), are a direct generalisation of the usual average treatment effect (ATE). We estimate AIEs with cross-validated and/or weighted versions of Targeted Minimum Loss-based Estimators (TMLE) and One-Step Estimators (OSE). The effect of dependence amongst units on variance estimates is corrected by utilising sieve plateau variance estimators based on a meaningful notion of unit relatedness.

Our motivating application is the targeted estimation of causal genetic effects on traits, including two-point and higher-order gene-gene and gene-environment interactions, in large-scale genomic databases such as UK Biobank and All of Us. Computing millions of estimates in large cohorts, in which small effect sizes are expected, necessitates minimising model-misspecification bias to control false discoveries. We report on significant findings, both replicated and novel, contradicting overconfident findings from the parametric linear mixed models commonly employed in statistical genomics.

All cross-validated and/or weighted TMLE and OSE for the AIE n-point interaction, as well as ATEs, CATEs and functions thereof, are implemented in the general-purpose Julia package TMLE.jl. For high-throughput applications in population genomics, we provide the open-source Nextflow pipeline and software TarGene, which integrates seamlessly with modern high-performance and cloud computing platforms.
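
For readers unfamiliar with this family of estimators, here is a minimal sketch of its simplest member, a one-step (AIPW) estimator of the ATE, on simulated data. The plain parametric nuisance fits below are a simplifying assumption; the workflow described in the talk uses cross-validated, flexible nuisance estimators and targets the more general AIEs.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression

rng = np.random.default_rng(0)
n = 5000
W = rng.normal(size=(n, 3))                       # confounders
A = rng.binomial(1, 1 / (1 + np.exp(-W[:, 0])))   # treatment depends on W
Y = 1.0 * A + W[:, 0] + rng.normal(size=n)        # true ATE = 1.0

# Nuisance estimates: outcome regression Q and propensity score g.
Q = LinearRegression().fit(np.column_stack([A, W]), Y)
g = LogisticRegression().fit(W, A).predict_proba(W)[:, 1]
Q1 = Q.predict(np.column_stack([np.ones(n), W]))
Q0 = Q.predict(np.column_stack([np.zeros(n), W]))

# One-step estimator: plug-in corrected by the efficient influence function.
eif = (A / g) * (Y - Q1) - ((1 - A) / (1 - g)) * (Y - Q0) + Q1 - Q0
ate, se = eif.mean(), eif.std(ddof=1) / np.sqrt(n)
print(f"ATE = {ate:.3f} +/- {1.96 * se:.3f}")
```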

  • 11th December (joint with CREEM): Graeme MacGilchrist, University of St Andrews 

Title: Timescales and mechanisms of predictability in marine ecosystems

Abstract: Robust predictions of marine ecosystem health on interannual-to-decadal timescales would be valuable for ecosystem and fisheries management. Previous work has shown that important ecosystem parameters such as ocean temperature, primary production, and dissolved oxygen content can have predictability time horizons of up to several years. Here, we present results from a new suite of perfect model experiments run with GFDL’s ESM4 earth system model to assess the theoretical limits and mechanisms of predictability of the ocean’s biogeochemical state. We find that while the time horizon of predictability is several years in many oceanic regions, it is generally shorter than what was found in previous model generations. For net primary production, for example, the global average predictability time horizon is 14 months, in contrast to the 30+ months found in prior work. Thus, by comparing model generations, we are able to assess the impact of ocean circulation and biogeochemical complexity on the intrinsic variability and predictability of ocean ecosystems. Using ensemble initializations in different months and years, we consider the effect of both the seasonal cycle and modes of atmospheric variability (e.g. ENSO) on biogeochemical predictability. Finally, using high temporal resolution diagnostics, we assess limits on the temporal granularity at which robust predictions can be made, i.e. the sensitivity of predictions to the time-averaging of the target period (daily, weekly, monthly, yearly).

  • 29th January: Stephen Senn, Honorary Professor, University of St Andrews

This seminar will be in the Maths Institute, Lecture Theatre C. 

Title: Questions and answers from randomised clinical trials

Abstract: I consider five types of possible question that might be asked of a clinical trial and the answers that we might reasonably expect of them.

  • Q1. Was there an effect of treatment in this trial?
  • Q2. What was the average effect of treatment in this trial?
  • Q3. Was the treatment effect identical for all patients in the trial?
  • Q4. What was the effect of treatment for different subgroups of patients?
  • Q5. What will be the effect of treatment when used more generally (outside of the trial)?[1]

I consider the role of randomisation in addressing the first two, in particular where it is considered important to blind clinical trials, and the general prospects for answering the other three. I argue that covariate balance is not what randomisation can be expected to deliver, and that if it did, the conventional analyses of clinical trials would be wrong.

I argue that representativeness of clinical trials is largely overplayed and that answers to the fifth type of question have more to do with reasonable mechanistic theory and less to do with “representativeness” than has been claimed[2].

Amongst various historical matters I shall cover are why Fisher was right on blinding and Bradford Hill was wrong, and what Yates and Cochran knew about experiments that we seem to have forgotten. I speculate that in teaching statistics we ought to pay more attention to data generating mechanisms.

References

  1. Senn, S.J., Added Values: Controversies concerning randomization and additivity in clinical trials. Statistics in Medicine, 2004. 23(24): p. 3729-3753.
  2. Uschner, D., et al., Using Randomization Tests to Address Disruptions in Clinical Trials: A Report from the NISS Ingram Olkin Forum Series on Unplanned Clinical Trial Disruptions. Statistics in Biopharmaceutical Research, 2023: p. 1-9.

  • 5 February: Jan-Ole Koslik, Bielefeld University

This seminar will be in the Maths Institute, Tutorial Room 1A at 13:00 – 14:00. 

Title: Efficient smoothness selection for nonparametric Markov-switching models via quasi restricted maximum likelihood estimation

Abstract: Markov-switching models are powerful tools that allow capturing complex patterns from time series data driven by latent states. Recent work has highlighted the benefits of estimating components of these models nonparametrically, enhancing their flexibility and reducing biases, which in turn can improve state decoding, forecasting, and overall inference. Formulating such models using penalised splines is straightforward, but practically feasible methods for data-driven smoothness selection in these models are still lacking. Traditional techniques, such as cross-validation and information criterion-based selection, suffer from major drawbacks, most importantly their reliance on computationally expensive grid search methods, hampering practical usability. Michelot (2022) suggested treating spline coefficients as random effects with a multivariate normal distribution and using the R package TMB (Kristensen et al., 2015) for marginal likelihood maximisation. While this method avoids grid search and typically results in adequate smoothness selection, it entails a nested optimisation problem, thus being computationally demanding. We propose to exploit the simple structure of penalised splines treated as random effects, thereby greatly reducing the computational burden while potentially improving fixed-effects parameter estimation accuracy. The proposed method offers a reliable and efficient mechanism for smoothness selection, rendering the estimation of Markov-switching models involving penalised splines feasible for complex data structures.
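
A rough sketch of the random-effects view the abstract builds on (illustrative only: a Gaussian radial basis stands in for a penalised spline basis, and the criterion is a plain profiled marginal likelihood rather than the quasi-REML variant the talk proposes): treating the spline coefficients as random effects turns smoothness selection into a smooth one-dimensional optimisation, with no grid search.

```python
import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(3)
n = 150
x = np.sort(rng.uniform(0, 1, n))
y = np.sin(3 * np.pi * x) + rng.normal(scale=0.3, size=n)

# Gaussian radial basis standing in for a penalised spline basis.
knots = np.linspace(0, 1, 20)
Z = np.exp(-((x[:, None] - knots[None, :]) ** 2) / (2 * 0.05 ** 2))

def neg_profile_loglik(log_lam):
    """Negative marginal log-likelihood when coefficients are random effects,
    b ~ N(0, (sigma^2 / lambda) I), with sigma^2 profiled out."""
    lam = np.exp(log_lam)
    V = Z @ Z.T / lam + np.eye(n)
    _, logdet = np.linalg.slogdet(V)
    quad = y @ np.linalg.solve(V, y)
    return n * np.log(quad) + logdet

# Smooth 1-D optimisation of the criterion: no grid search required.
opt = minimize_scalar(neg_profile_loglik, bounds=(-10, 10), method="bounded")
lam = np.exp(opt.x)
beta = np.linalg.solve(Z.T @ Z + lam * np.eye(Z.shape[1]), Z.T @ y)
print(f"selected smoothing parameter: {lam:.3g}")
```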

  • 12 February: Ioana Colfescu, School of Earth & Environmental Sciences, University of St Andrews

Title: Bridging Disciplines: A conversation on harnessing machine learning and big data for effective climate change research-based solutions

Abstract: The seminar aims to introduce the newly formed group of the National Centre for Atmospheric Science (NCAS) at the University of St Andrews to the Centre for Research into Ecological and Environmental Modelling (CREEM), with the goal of fostering new avenues for joint research between these two St Andrews-based research centres.

The presentation will first outline the nature and role of NCAS within the UK research landscape, emphasizing its commitment to understanding the atmosphere, the changes it undergoes, and the resultant impacts on life on Earth. It will further explore the use of big data and atmospheric science techniques, with a focus on new digital methods (i.e. AI and ML), to tackle various research challenges, highlighting how these methods can facilitate a multidisciplinary approach to addressing climate change impacts, particularly on ecosystems.

  • 19 February: Sara Wade, University of Edinburgh

Title: Understanding uncertainty in Bayesian cluster analysis

Abstract: The Bayesian approach to clustering is often appreciated for its ability to provide uncertainty in the partition structure. However, summarizing the posterior distribution over the clustering structure can be challenging. Wade and Ghahramani (2018) proposed to summarize the posterior samples using a single optimal clustering estimate, which minimizes the expected posterior Variation of Information (VI). In instances where the posterior distribution is multimodal, it can be beneficial to summarize the posterior samples using multiple clustering estimates, each corresponding to a different part of the space of partitions that receives substantial posterior mass. In this work, we propose to find such clustering estimates by approximating the posterior distribution in a VI-based Wasserstein distance sense. An interesting byproduct is that this problem can be seen as using the k-medoids algorithm to divide the posterior samples into different groups, each represented by one of the clustering estimates. Using both synthetic and real datasets, we show that our proposal helps to improve the understanding of uncertainty, particularly when the data clusters are not well separated, or when the employed model is misspecified.
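
A minimal sketch of the two ingredients, VI between partitions and k-medoids over posterior samples under a VI distance matrix (the "posterior samples" below are fabricated for illustration; this is not the authors' implementation):

```python
import numpy as np

def vi(c1, c2):
    """Variation of Information between two partitions (label vectors)."""
    n = len(c1)
    cont = np.array([[np.sum((c1 == a) & (c2 == b)) for b in np.unique(c2)]
                     for a in np.unique(c1)]) / n
    p1, p2 = cont.sum(axis=1), cont.sum(axis=0)
    h1, h2 = -np.sum(p1 * np.log(p1)), -np.sum(p2 * np.log(p2))
    with np.errstate(divide="ignore", invalid="ignore"):
        mi = np.nansum(cont * np.log(cont / np.outer(p1, p2)))
    return h1 + h2 - 2 * mi

def k_medoids(dist, k, iters=20, seed=0):
    """Plain k-medoids on a precomputed distance matrix: each medoid is the
    sample minimising total distance to the other members of its group."""
    medoids = np.random.default_rng(seed).choice(len(dist), k, replace=False)
    for _ in range(iters):
        assign = np.argmin(dist[:, medoids], axis=1)
        for j in range(k):
            members = np.flatnonzero(assign == j)
            if len(members):
                within = dist[np.ix_(members, members)].sum(axis=1)
                medoids[j] = members[np.argmin(within)]
    return medoids, assign

# Fabricated "posterior samples": label vectors concentrated near two modes.
samples = [np.array(s) for s in
           [[0, 0, 0, 1, 1, 1]] * 10 + [[0, 0, 1, 1, 2, 2]] * 10]
samples[3][2] = 1                                # a little posterior noise
D = np.array([[vi(a, b) for b in samples] for a in samples])
medoids, _ = k_medoids(D, k=2)
print([samples[m].tolist() for m in medoids])    # two representative partitions
```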

  • 12 March: William Smith

Title: Doctoral Students as Carbon Accountants: Calculating Carbon Costs of a PhD in Neuroscience

Abstract: PhD students are drivers of innovation in research; however, the carbon intensity of PhD work is often unclear, especially in specialised STEM disciplines. Over 250,000 doctoral students graduate annually across all academic disciplines; empowering this community to engage in carbon accounting could create a generational force for decarbonisation in key areas of production and consumption in research communities. Here, we demonstrate how doctoral students and other researchers can measure the carbon footprint of their work, using one PhD student in a Drosophila neuroscience laboratory as a case study. We propose a common framework for including carbon life-cycle analyses in Carbon Appendices to PhD theses and other publications. We envision doctoral students carrying insights from Carbon Appendices forward into academia and industry to catalyse community-driven decarbonisation of the research sector.
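
As background on the accounting pattern the abstract alludes to, the sketch below shows the basic activity-data-times-emission-factor calculation underlying any such footprint. All quantities and factors are invented placeholders, not figures from the study; real analyses use published life-cycle emission factors.

```python
# Hypothetical annual activity data for one PhD student, with made-up
# emission factors: footprint = sum(activity amount * kgCO2e per unit).
activities = {
    "electricity_kwh": (3500, 0.25),
    "flights_km":      (4000, 0.15),
    "consumables_gbp": (2000, 0.50),
}
total_kg = sum(amount * factor for amount, factor in activities.values())
print(f"annual footprint: {total_kg / 1000:.1f} tCO2e")
```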

  • 9 April: Catriona Harris, CREEM, University of St Andrews

Title: Marine mammals and sonar: The past, present, and future of behavioural response studies

Abstract: Research into the behavioural responses of marine mammals to naval sonar exposure has been funded by US and European Navies for around two decades.  As behavioural response studies (BRS) have evolved over this time period, with new technologies and study designs, the analytical challenges have also evolved.  I will talk through some of our solutions to these challenges, particularly as they relate to animal-borne tag data, the detection of behavioural responses, and relating responses to levels of sound exposure.  I will also give a summary of a recent review of the status of BRS science and highlight some of the outstanding analytical challenges.

  • 16 April: Simon Wood, University of Edinburgh

Title: Covid, Risk and Statistics

Abstract: In many respects the Covid pandemic upended the usual evidence-based approach to public health in favour of a rapidly developed alternative approach to risk communication and assessment that would previously, or in other contexts, have been viewed as unusual and potentially damaging. This talk discusses some of the statistical deficiencies of the Covid consensus that emerged, in particular with regard to Covid and economically mediated health risks, excess deaths, epidemic modelling, and the necessity or otherwise of lockdowns.

Previous academic years

Seminars from previous academic years (since 2022) are listed here.