Aapeli Vuorinen

Masters Thesis

Over the past few months, I’ve been hectically pushing to get my thesis — titled The Blockchain Propagation Process: a Machine Learning and Matrix Analytic Approach — completed as I’ll be graduating in June this year from the Master of Science program from the University of Melbourne.

Last Friday, I was happy to at last print off a final version of my thesis and hand it in to the university! I have yet to give a presentation about my research and to finish off a couple of exams, but other than that, I’m almost done!

My research

My research was on the blockchain propagation process, that is, the process by which newly mined blocks (for example in the Bitcoin system) propagate from the miner to the rest of the network. This has important implications on the fairness and efficiency of the network, and is one of the large reasons why transfering Bitcoins is a slow task. I modeled this process using a matrix analytic approach and in particular, using phase-tyep distributions. I’ve attached the (somewhat lengthy) abstract below.

I wish to express my gratitude towards Peter Taylor for his patient, inspiring supervision, and for providing me with direction when I needed it through insightful comments and criticism. I would also like to thank my good friend and mentor, Yoni Nazarathy for his continuous encouragement, support, and guidance over the past years, both in mathematics and in life. Finally, I would like to thank the friends and family who supported me through this endeavour.

Abstract

Blockchain, spearheaded by the Bitcoin cryptocurrency, is a novel, emerging technology that allows a distributed network of users to synchronise and maintain a decentralised, global ledger of transactions with no central control or authority. One key part of this technology is the process by which blocks and transactions propagate through the underlying network of users, allowing them to come to a consensus on the state of the ledger and advance the blockchain. In this thesis, we explored the propagation process for Bitcoin within the mathematical framework of matrix analytic processes by performing an observational experiment to collect data and fitting phase-type distributions to this dataset.

It is well known that the propagation delay of blocks is a key limiting factor in the efficiency and scalability of blockchain technology. It is imperative that the network reaches an overall consensus on the current state of the chain of blocks at regular epochs. If blocks propagate faster through the network, then this happens faster and the time between blocks can be decreased. This means that the blockchain can then process more events as well as verify those events faster. Additionally, recent work on a strategy known as selfish-mining has shown that adversarial miners can take advantage of this delay to inflate their share of mining rewards.

We set out to model the block propagation process of Bitcoin blocks. To do so, we set up a distributed, large-scale observational experiment by creating a global data collection system for observing the block propagation process. We collected data on the propagation patterns of 14810 blocks over 5 months from November 2018 until March 2019, and created a novel tool for observing the blockchain from several geographically diverse locations.

We fitted Coxian phase-type distributions of a varying number of phases to the data and performed model selection, finally choosing a model with \(p=4\) phases. We used a variant of the EM-algorithm originally introduced by Asmussen, and made improvements to the computational run-time of the algorithm. We present the final parameter estimates of our model and discuss the merits and shortcomings of this approach.

This thesis makes three contributions. The first is an open dataset of Bitcoin block propagation data which we have cleaned and pre-processed to include additional covariates extracted from the blockchain. We present a short exploratory data analysis of patterns in the dataset in Part III. Our second contribution is a phase-type model for the block propagation delay, which we discuss in Part II. Finally, we have made an incremental improvement to the EM-algorithm for fitting phase-type distributions by parallelising it and reducing its memory footprint, which is particularly useful for very large, “big data” datasets. We also believe that our exposition describing the Bitcoin ecosystem in Part I is of value.

Website

I’ve set up a dedicated site which contains the tehsis, the cleaned dataset, the source code for data collection, and my improved phase-type fitting algorithm.