Brian Zhang's blog

Statistics and other topics

Recent posts

Jul 10, 2018 · 14 min read
Random Graphs and Giant Components
This post will introduce some of the ideas behind random graphs, a very exciting area of current probability research. As in my earlier posts, I try to emphasize a reproducible, computational example. In this case, we’ll look at the “giant component” and how it arises in random graphs. There’s a lot beyond this example that I find exciting, so I’ve deferred a longer discussion of random graphs to the end of this post, along with many references for the interested reader.
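To give a flavor of the phenomenon, here is a minimal Python sketch. It is not the code from the post itself. It assumes the Erdős–Rényi model G(n, p) with p = c/n, since the excerpt doesn't say which model the post uses. The sketch samples one graph and uses a union-find structure to measure its largest connected component. For c below 1 the largest component should hold only a vanishing fraction of the vertices. For c above 1, a single component containing a positive fraction of them emerges. The function name `largest_component_fraction` is hypothetical.

```python
import random


def largest_component_fraction(n, c, seed=0):
    """Sample an Erdos-Renyi graph G(n, p) with p = c / n and return the
    fraction of vertices that lie in its largest connected component."""
    rng = random.Random(seed)
    p = c / n
    parent = list(range(n))  # union-find forest: each vertex starts alone

    def find(v):
        # Follow parent pointers to the root, halving the path as we go.
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    # Visit every unordered pair once and include the edge with probability p.
    for i in range(n):
        for j in range(i + 1, n):
            if rng.random() < p:
                ri, rj = find(i), find(j)
                if ri != rj:
                    parent[ri] = rj  # merge the two components

    # Count how many vertices share each root.
    sizes = {}
    for v in range(n):
        root = find(v)
        sizes[root] = sizes.get(root, 0) + 1
    return max(sizes.values()) / n


if __name__ == "__main__":
    for c in (0.5, 1.0, 2.0):
        print(c, largest_component_fraction(1000, c))
```

Looping over all pairs is quadratic, which is fine for a few thousand vertices but not much more. For c = 2, the classical limiting giant-component fraction x solves x = 1 - exp(-2x), which gives roughly 0.797, so a sample with large n should land near that value.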
Apr 4, 2018 · 8 min read
Distributions with SymPy
Any good statistics student will need to compute some integrals in her or his life. While I generally feel comfortable with simple integrals, I thought it might be worth setting up a workflow to help automate this process! Coming from a physics background, I’ve previously worked a lot with Mathematica, the software that also underlies the online tool WolframAlpha. Mathematica is extremely powerful, but it’s not open-source and requires a hefty license, so I decided to research alternatives.
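As a small illustration of this kind of workflow, here is a minimal sketch. It is not necessarily the setup the post settles on. It uses SymPy's `integrate` to check that the Exponential density with rate λ integrates to 1, and to derive its mean and variance. The choice of distribution is purely illustrative. Declaring λ as `positive=True` lets SymPy evaluate the improper integrals without returning conditional (piecewise) results.

```python
import sympy as sp

x = sp.symbols("x", real=True)
lam = sp.symbols("lambda", positive=True)  # rate parameter, assumed > 0

# Density of an Exponential(rate = lambda) random variable on x >= 0.
pdf = lam * sp.exp(-lam * x)

total = sp.integrate(pdf, (x, 0, sp.oo))            # should equal 1
mean = sp.integrate(x * pdf, (x, 0, sp.oo))         # E[X]
second = sp.integrate(x**2 * pdf, (x, 0, sp.oo))    # E[X^2]
variance = sp.simplify(second - mean**2)            # Var[X] = E[X^2] - E[X]^2

print(total, mean, variance)
```

The same pattern works for any density: write it as a SymPy expression, then let `integrate` handle the normalizing constant and moments.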
Jan 30, 2018 · 10 min read
Clustering with K-Means and EM
K-means and EM for Gaussian mixtures are two clustering algorithms commonly covered in machine learning courses. In this post, I’ll go through my implementations on some sample data. I won’t go through much theory, since that is easy to find elsewhere. Instead, I’ve focused on highlighting two things: pretty visualizations in ggplot, with the helper packages deldir, ellipse, and knitr for animations; and structural similarities between the algorithms, shown by splitting K-means into an E step and an M step.
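To illustrate that second point, here is a minimal NumPy sketch. It is not the post's R implementation. It writes K-means (Lloyd's algorithm) as an alternation of two functions. The hypothetical `e_step` assigns each point to its nearest center. This hard assignment plays the role that computing soft responsibilities plays in EM. The `m_step` recomputes each center as the mean of its assigned points. The random initialization and the convergence check are illustrative choices.

```python
import numpy as np


def e_step(X, centers):
    """Hard 'E step': label each point with the index of its nearest center."""
    # Squared Euclidean distances via broadcasting, shape (n_points, k).
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)


def m_step(X, labels, centers):
    """'M step': move each center to the mean of the points assigned to it."""
    new_centers = centers.copy()
    for j in range(len(centers)):
        members = X[labels == j]
        if len(members) > 0:  # an empty cluster keeps its previous center
            new_centers[j] = members.mean(axis=0)
    return new_centers


def kmeans(X, k, n_iter=100, seed=0):
    rng = np.random.default_rng(seed)
    # Initialize the centers at k distinct data points chosen at random.
    centers = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    for _ in range(n_iter):
        labels = e_step(X, centers)
        new_centers = m_step(X, labels, centers)
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    # Return labels that match the final centers.
    return centers, e_step(X, centers)
```

To turn this skeleton into EM for a Gaussian mixture, replace the hard `argmin` in `e_step` with posterior cluster probabilities. Then let `m_step` compute weighted means, covariances, and mixing weights. The loop in `kmeans` stays the same.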
Nov 25, 2017 · 4 min read
Statistics / ML Books
At the start of the last post, I talked briefly about courses I’ve been working through. Here are some follow-up thoughts on good books! (This post is strategically placed so I can cite some of these textbooks in later posts.) It focuses on textbooks with a machine learning emphasis. I’ve read fewer of the classic statistics textbooks, as I hadn’t specialized much in statistics until my PhD. However, these are a few texts on my radar to consult:
Nov 9, 2017 · 8 min read
Polynomial Regression
As a PhD student in the UK system, I expected much less coursework, with my first year diving straight into research. However, there are still many gaps in my knowledge, so I hope to stay on the lookout for learning opportunities, including side classes. At the moment, I’m hoping to follow along with these three courses and do some assignments from time to time:
Nov 4, 2017 · 2 min read
Blogging Aims
Hi there, and thanks for stopping by! In this post, I briefly introduce my current ideas for this blog and say a bit about myself. Since September, I’ve been a first-year PhD student in Oxford’s Department of Statistics. I received my bachelor’s in Physics from Harvard in 2015, and after two years of working, I’m excited to be back in an academic setting. Part of this transition means more freedom and much more self-structured learning time.