Brian C. Zhang, PhD

I am currently a senior computational biologist at Adaptive Biotechnologies on the TCR Discovery Team.
I defended my PhD at the University of Oxford in 2022, where I was a member of Pier Palamara’s group. We developed a method, ARG-Needle, to infer the genome-wide genealogy of a set of genetic samples from genotyping array or sequencing data. We built the genome-wide genealogy of 337K samples in the UK Biobank and scanned it for associations with 7 complex traits. This scan yielded more rare and ultra-rare associations than genotype imputation did, and these associations were overall enriched for loss-of-function variation. See our paper in Nature Genetics or my thesis.
During my PhD, I was supported by the Clarendon Scholarship and was a member of St. John’s College. Before Oxford, I worked for two years as a research engineer at DeepMind in London.
Interests. Adaptive Immune Receptor Repertoires, Autoimmune Disease, Statistical Genetics, Population Genetics, Machine Learning

Projects

Biobank-Scale Ancestral Recombination Graph Inference [code] [blog] In population genetics, the ancestral recombination graph (ARG) captures the history of coalescence, recombination, and mutation events that gives rise to observed genetic data. We developed a method, ARG-Needle, that uses coalescent modeling of ascertained genotyping array data to infer accurate, biobank-scale ARGs. We also developed a framework for performing mixed-model association of unobserved variation implied by an inferred ARG. Using these methods, we inferred the ARG of 337,464 individuals in the UK Biobank and performed genealogy-based association of 7 complex traits, both recapitulating associations found by reference-based imputation and detecting complementary ones. As these methods only require SNP array data, we anticipate they will be particularly relevant for populations that are currently undersequenced.
Mathematics of Linear Mixed Models My first PhD project focused on improving linear mixed model association in genetics. Standard inference under the mixed model does not scale to modern genetic datasets, so my PhD supervisors and I sought to build on earlier methods such as BOLT-LMM and LDpred to make mixed model association more scalable. Although I have put the project on pause, my work in this area led me to write a set of expository notes on the mathematics of linear mixed models. (In progress, last modified February 2020.)
Coconuts and Islanders: A Statistics-First Guide to the Boltzmann Distribution An arXiv writeup presenting the Boltzmann distribution in what I hope is an accessible and intuitive way. I learned this approach from my father and the notes are dedicated to his memory.
Random Graphs and Giant Components [code] An R Markdown blog post introducing the Erdős-Rényi random graph and its giant component. I tried to build intuition through figures and animations, and have also linked to further reading on random graphs. Done in my free time during my PhD.

Publications

2023

Biobank-scale inference of ancestral recombination graphs enables genealogy-based mixed model association of complex traits, Nature Genetics. [code] [blog] Brian C. Zhang, Arjun Biddanda, Árni Freyr Gunnarsson, Fergus Cooper, and Pier Francesco Palamara

2019

Coconuts and Islanders: A Statistics-First Guide to the Boltzmann Distribution, arXiv 2019. Brian Zhang

2018

Vector-based navigation using grid-like representations in artificial agents, Nature. [blog] Andrea Banino, Caswell Barry, Benigno Uria, Charles Blundell, Timothy Lillicrap, Piotr Mirowski, Alexander Pritzel, Martin Chadwick, Thomas Degris, Joseph Modayil, Greg Wayne, Hubert Soyer, Fabio Viola, Brian Zhang, Ross Goroshin, Neil Rabinowitz, Razvan Pascanu, Charlie Beattie, Stig Petersen, Amir Sadik, Stephen Gaffney, Helen King, Koray Kavukcuoglu, Demis Hassabis, Raia Hadsell, Dharshan Kumaran

2017

The Kinetics Human Action Video Dataset, arXiv 2017. Will Kay, João Carreira, Karen Simonyan, Brian Zhang, Chloe Hillier, Sudheendra Vijayanarasimhan, Fabio Viola, Tim Green, Trevor Back, Paul Natsev, Mustafa Suleyman, Andrew Zisserman

D.Phil. Thesis

Biobank-Scale Ancestral Recombination Graphs: Inference and Applications to the Analysis of Complex Traits, University of Oxford, Department of Statistics. Brian C. Zhang