Seminars

Statistics seminars are held on Tuesdays, 14:00–15:00. Everyone is welcome! We gather for coffee/tea and biscuits about 15 minutes before the seminar begins.

The organiser is Ben Baer. Please contact Ben to find out more about the seminars, to suggest a future seminar speaker, or to ask about joining seminars online.

Most of the seminars this year will be held in person, with a few online. The in-person seminars will take place in the Observatory seminar room (except for the JJ Valletta memorial seminar, which is in the Mathematical Institute). Please see below for more details.

Forthcoming statistics seminars 2024-25

    • 26th November: Hannah Wauchope, University of Edinburgh

Title: What is a unit of nature? Measurement challenges in the emerging biodiversity credit market (and other musings on biodiversity measurements)

Abstract: TBA

    • 3rd December: Janine Illian, University of Glasgow

Title: TBA 

Abstract: TBA

Past seminars

This academic year

    • 10th September: Devin Johnson, NOAA

Title: A Computationally Flexible Approach to Population-Level Inference and Data Integration

Abstract: We propose a multistage method for making inference at all levels of a Bayesian hierarchical model (BHM), using natural data partitions to increase efficiency by allowing computations to take place in parallel with software that is most appropriate for each data partition. The full hierarchical model is then approximated by the product of independent normal distributions for the data component of the model. In the second stage, the Bayesian maximum a posteriori (MAP) estimator is found by maximizing the approximated posterior density with respect to the parameters. If the parameters of the model can be represented as normally distributed random effects, then the second-stage optimization is equivalent to fitting a multivariate normal linear mixed model. We consider a third stage that updates the estimates of distinct parameters for each data partition based on the results of the second stage. The method is demonstrated with two ecological data sets and models: a generalized linear mixed effects model (GLMM) and an integrated population model (IPM). The multistage results were compared to estimates from models fit in a single stage to the entire data set. In both cases, the multistage results were very similar to those from a full MCMC analysis.
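As a rough illustration of the product-of-independent-normals approximation mentioned in the abstract (a minimal sketch, not the speaker's implementation), suppose each data partition has been analysed separately and returns a normal approximation for a shared parameter; combining the partitions then reduces to a precision-weighted product of normals. The partition estimates and variances below are made up.

```python
import numpy as np

def combine_normal_approximations(means, variances):
    """Combine independent normal approximations N(mean_i, var_i) of a shared
    parameter by multiplying their densities: the result is normal, with
    precision equal to the sum of the partition precisions."""
    means = np.asarray(means, dtype=float)
    precisions = 1.0 / np.asarray(variances, dtype=float)
    combined_var = 1.0 / precisions.sum()
    combined_mean = combined_var * (precisions * means).sum()
    return combined_mean, combined_var

# Hypothetical partition-level fits (estimate, variance), obtained in parallel
partition_fits = [(1.8, 0.10), (2.1, 0.25), (1.9, 0.15)]
mean, var = combine_normal_approximations(*zip(*partition_fits))
print(f"combined estimate {mean:.3f}, variance {var:.3f}")
```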

    • 24th September: Cornelia Oedekoven, CREEM

Title: Gibbon monitoring in Laos – challenges and opportunities

Abstract: Monitoring gibbons in their natural habitat is challenging for many reasons. Living in the canopy of the jungle, they are hard to see but easy to hear, so acoustic methods are preferred for detection. As human observers are prone to fatigue and other biases, we have developed new acoustic directional recorders and want to test and promote acoustic spatial capture-recapture methods, using our recorders, as the preferred approach for monitoring gibbons. In this seminar I will describe the last few years, the current state and the future of this project: starting with a description of the recorders, then testing them at Tentsmuir Forest in 2023 and in proper gibbon habitat in the Xe Sap Protected Area in Laos in 2024, and now planning a full-scale survey in Laos in parallel with developing new acoustic processing software. Each of these steps has revealed new questions and challenges, whether technical, habitat-related or software-related. Hence, this seminar will not be about new statistical methods, but rather about the challenges of applying them.

    • 1st October: Dan Kowal, Cornell University

Title: Facilitating heterogeneous effect estimation via statistically efficient categorical modifiers 

Abstract: Categorical covariates such as race, sex, or group are ubiquitous in regression analysis. While main-only (or ANCOVA) linear models are predominant, linear models that include categorical-continuous or categorical-categorical interactions are increasingly important and allow heterogeneous, group-specific effects. However, with standard approaches, the addition of categorical interactions fundamentally alters the estimates and interpretations of the main effects, often inflates their standard errors, and introduces significant concerns about group (e.g., racial) biases. We advocate an alternative parametrization and estimation scheme using abundance-based constraints (ABCs). ABCs induce a model parametrization that is both interpretable and equitable. Crucially, we show that with ABCs, the addition of categorical interactions 1) leaves main effect estimates unchanged and 2) enhances their statistical power, under reasonable conditions. Thus, analysts can, and arguably should, include categorical interactions in linear models to discover potential heterogeneous effects, without compromising estimation, inference, or interpretability for the main effects. Using simulated data, we verify these invariance properties for estimation and inference and showcase the capabilities of ABCs to increase statistical power. We apply these tools to study demographic heterogeneities among the effects of social and environmental factors on STEM educational outcomes for children in North Carolina. An R package, lmabc, is available.
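One plausible reading of the abundance-based constraints mentioned in the abstract (a minimal sketch, not taken from the lmabc package) is a sample-proportion-weighted sum-to-zero constraint on the group-specific effects. The sketch below encodes that constraint in a design matrix for a single categorical covariate, with the last level acting as the reference; the example groups are hypothetical.

```python
import numpy as np

def abundance_constrained_coding(groups):
    """Encode a categorical variable with K levels as K-1 columns so that the
    implied level effects satisfy sum_k pi_k * beta_k = 0, where pi_k is the
    sample proportion ("abundance") of level k; the last level is the reference."""
    levels, counts = np.unique(groups, return_counts=True)
    pi = counts / counts.sum()
    X = np.zeros((len(groups), len(levels) - 1))
    for j, level in enumerate(levels[:-1]):
        X[groups == level, j] = 1.0
        # reference-level rows get -pi_j / pi_K so the weighted effects sum to zero
        X[groups == levels[-1], j] = -pi[j] / pi[-1]
    return X, levels, pi

# Hypothetical example: three groups with unequal abundances
groups = np.array(["A", "A", "A", "B", "B", "C"])
X, levels, pi = abundance_constrained_coding(groups)
print(levels, pi)
print(X)
```

Under this kind of coding the intercept can be read as a proportion-weighted average across groups, which is the flavour of interpretability the abstract appears to describe; the actual lmabc parametrization and estimation scheme may differ in detail.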

    • 29th October: Karla Diaz Ordaz, University College London (JJ Valletta Memorial Lecture)

Title: From causal inference to machine learning and back: a two-way street towards better science 

Abstract: Machine learning methods have become established for prediction problems, but there is increasing interest in using these algorithms for causal inference. However, causal effect estimation often involves counterfactuals, and prediction tools from the machine learning literature cannot be used “out-of-the-box” for causal inference. 

At the same time, there is an increasing interest in using causal reasoning when building and interpreting machine learning algorithms. Doing so can help reduce unfairness and other algorithmic biases stemming from the training data not being representative of the target population. Causality can also help with interpretability and explainability of machine learning outputs.  

In this talk, I will review causal machine learning, a framework to ‘de-bias’ standard machine learning algorithms so they perform well for causal tasks. I will also discuss the role causal inference can play in machine learning to improve fairness and explainability of so-called “black-box” models.

 This two-way street opens the way to making better use of the data and obtaining reliable answers to real-life scientific problems, while maintaining good statistical principles.  

    • 5th November: Fiona Seaton, Centre for Ecology & Hydrology (POSTPONED)

Title: TBA

Abstract: TBA

    • 12th November: Jere Koskela, Newcastle University

Title: Detecting structural variation in reconstructed genealogies

Abstract: Modelling the latent ancestry of a sample of DNA sequences is a gold-standard data augmentation method in population genetic inference. The genetic ancestry of a sample at a single site of DNA is a tree, but the tree varies along the genome due to a process called recombination. It is now possible to produce fitted ancestries for very large datasets, and a growing number of inference pipelines operate directly on those fitted ancestries. I’ll present the data structure that has made the explosion in scalable ancestral inference possible, and describe a model for the correlation between local ancestral trees at different sites. Comparing fitted ancestries with model predictions reveals that leading ancestry-fitting approaches suffer from pervasive biases, which turn out to be straightforward to correct. The correlation model also yields a way to identify genomic regions where recombination is suppressed, i.e. where ancestral trees change less frequently than expected, which is often a signal of large-scale rearrangements in DNA. The method identifies 50 such regions in a sample of 2504 human genomes, of which 24 are known structural variants and the remaining 26 appear to be previously unevidenced. This is joint work with Anastasia Ignatieva (Oxford), Martina Favero (Stockholm), Jaromir Sant (Turin), and Simon Myers (Oxford).

    • 19th November: Alex Aylward, University of Oxford

Title: Trustworthy figures: the many lives of a eugenic number

Abstract: Eugenics was, is, a numbers game. Its rise in the decades around 1900 coincided with the emergence of modern statistical methods. Early advocates of eugenics attempted to assess the racial ‘value’ of populations through ambitious projects of measurement and quantification. Eugenicists collected data, and they made numbers. But how did they use them? In this talk I ask what it is eugenicists hoped quantification could do for them, via a case study of a notable eugenic number: 17.4%. For several years, this peculiarly precise figure dominated the Eugenics Society’s campaign to legalise voluntary sterilisation in interwar Britain. Then, suddenly, it disappeared. The rise and fall of this number speak to the contested place of quantification in science and public life. Its curious afterlives, meanwhile, exemplify the power of numbers to survive and evolve beyond the contexts of their production.

Previous academic years

Seminars from previous academic years (since 2022) are listed here.
