Geometric Introduction to Markov Chain Monte Carlo
MCMC is a foundational concept in machine learning. Animations in Excel provide a simple conceptual understanding.
Description
The objective of the course is to explain Markov Chain Monte Carlo (MCMC) in a manner that is simple to understand, using graphic animations.
MCMC is a probabilistic technique for solving problems that would otherwise be considered ‘complex’, such as integrating multidimensional functions. In this course we focus on simple examples of MCMC and discuss how these simple examples can be extended to more complex problems.
The explanations contain only limited mathematics.
SECTION 1. INTRODUCTION
The Introduction section provides preliminary concepts without going into the MCMC algorithms themselves. The name Markov Chain Monte Carlo combines two concepts: a Markov Chain is a process in which the value of a variable depends only on its previous value, rather than on many of its previous values, and a Monte Carlo process simply means a random process.
Before focusing on MCMC, a non-MCMC approach is discussed: what is here called a linear process, in which samples are taken systematically rather than randomly. You will learn, at an overview level, the strengths and weaknesses of this systematic approach compared with MCMC.
Many texts on MCMC go straight into detailed mathematics. This course is different: it uses geometric animations to explain the concepts. In the Introduction you will learn the foundational basis for the graphic and animation methodology.
1.1 Lecture 1 Introduction – What is Markov Chain Monte Carlo?
You will learn what is meant by Markov Chain Monte Carlo, broken down into its two component concepts: the Markov Chain (a process whose next value depends only on its current value, rather than on many previous values) and Monte Carlo (simply, a random process).
In this lecture you will also learn what is meant by a multivariate distribution, and the scope of the course is set out.
The course uses simple geometric applications, and the strengths and weaknesses of this approach are discussed. In short, the geometric approach is suited to a ‘first look’ at MCMC rather than an in-depth treatment.
The instructor provides this course because he actively develops Machine Learning algorithms, for which MCMC is a foundational methodology. This course does not explain precisely how MCMC is used in Machine Learning, but you will learn why MCMC is a prerequisite for some later Machine Learning courses.
You will learn the difference between random sampling and systematic uniform sampling.
1.2 Lecture 2 Main Workbook Setup
The participant needs to set up the appropriate files in Excel. There are only three files used:
MCMC.xlsm
MCMC_Simple.xlsm
Xlshape.bmp
MCMC.xlsm is the main file, and you will learn how to ensure that macros are enabled. The language used for the animations and simulations is VBA. You will be exposed to the code, but not in depth. You will learn, at a conceptual level, how the Freeform shape in Excel is read in, and how a mirror image of it in an external file (Xlshape.bmp) is read in. Reading in this external mirror file makes it possible to identify the pixels belonging to the Freeform shape.
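As a hedged illustration (not the course's actual code), the following minimal VBA sketch shows how the outline points of a Freeform shape can be read; the shape name "Freeform 1" is an assumption, and the bitmap-based pixel identification in the actual workbook is more involved:

' Minimal sketch: list the outline node coordinates of a Freeform shape.
' The shape name "Freeform 1" is assumed for illustration only.
Sub ReadFreeformNodes()
    Dim shp As Shape, i As Long, pts As Variant
    Set shp = ActiveSheet.Shapes("Freeform 1")    ' assumed name
    For i = 1 To shp.Nodes.Count
        pts = shp.Nodes.Item(i).Points            ' 1x2 array: x then y (in points)
        Debug.Print "Node "; i; ": x = "; pts(1, 1); ", y = "; pts(1, 2)
    Next i
End Sub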
1.3 Lecture 3 Linear Method
A brief overview of a geometric MCMC simulation is provided. This simulation includes animation.
For the geometric case study, you will learn how to apply a systematic (i.e. non-random) strategy.
You will also learn how to use the command buttons in MCMC.xlsm.
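The following minimal sketch, written in VBA to match the course files, illustrates the idea of a systematic (linear) scan. The helper InsideShape is hypothetical: a circle stands in for the pixel test that the workbook builds from Xlshape.bmp:

' Systematic (non-random) scan over a uniform grid of points.
Sub SystematicScan()
    Dim x As Double, y As Double
    Dim nInside As Long, nTotal As Long
    For x = 0 To 100 Step 2
        For y = 0 To 100 Step 2
            nTotal = nTotal + 1
            If InsideShape(x, y) Then nInside = nInside + 1
        Next y
    Next x
    ' For the stand-in circle this fraction approaches pi/4 as the grid is refined.
    Debug.Print "Fraction of grid points inside the shape: "; nInside / nTotal
End Sub

' Hypothetical stand-in for the pixel test built from Xlshape.bmp:
' a circle of radius 50 centred at (50, 50).
Function InsideShape(x As Double, y As Double) As Boolean
    InsideShape = ((x - 50) ^ 2 + (y - 50) ^ 2 <= 2500)
End Function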
1.4 Lecture 4 The Excel Animation
In this lecture you will learn more about the process of tracking points (which is a form of animation) and identifying the position of each point created. You will also learn how to calculate the mean and variance of the created points.
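A minimal sketch of the statistics step, applied to one coordinate of the tracked points (the array name and calling convention are assumptions, not the workbook's actual code):

' Mean and (population) variance of one coordinate of the tracked points.
Sub PointStatistics(xs() As Double, n As Long)
    Dim i As Long, sumX As Double, sumSq As Double, meanX As Double
    For i = 1 To n
        sumX = sumX + xs(i)
    Next i
    meanX = sumX / n
    For i = 1 To n
        sumSq = sumSq + (xs(i) - meanX) ^ 2
    Next i
    Debug.Print "mean = "; meanX; ", variance = "; sumSq / n
End Sub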
SECTION 2. SIMPLE MCMC
There are two main MCMC algorithms considered in this course: Gibbs Sampling and the Metropolis-Hastings algorithm. Gibbs Sampling is the main focus of this section. You will learn how Gibbs Sampling is related to the random walk, and some of the historic basis of Gibbs Sampling. You will learn some of the simple problems Gibbs Sampling can be applied to, although the main focus is the geometric animation.
2.1 Lecture 5 Random Walk
A random walk is the simplest form of MCMC, so in this lecture you will learn how to apply a very simple MCMC algorithm. The random walk is a simple form of Gibbs Sampling. You will learn how the random walk is applied to the geometric shape: if a chosen point falls outside the shape, it is not accepted.
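A minimal sketch of one such step, reusing the hypothetical InsideShape test from the Lecture 3 sketch; a proposed move that leaves the shape is rejected, and the point simply stays where it is:

' One random-walk step inside the shape. Rnd returns a value in [0, 1),
' so (2 * Rnd - 1) is a uniform step in (-1, 1), scaled by stepSize.
Sub RandomWalkStep(ByRef x As Double, ByRef y As Double, ByVal stepSize As Double)
    Dim xNew As Double, yNew As Double
    xNew = x + stepSize * (2 * Rnd - 1)
    yNew = y + stepSize * (2 * Rnd - 1)
    If InsideShape(xNew, yNew) Then   ' reject moves that leave the shape
        x = xNew
        y = yNew
    End If
End Sub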
2.2 Lecture 6 Gibbs Sampling discussion
You will learn some of the history of Gibbs sampling, and that MCMC-style methods existed centuries before MCMC was formally named as such. In addition, you will learn that Gibbs sampling was named after a mathematical physicist (Josiah Willard Gibbs) only because of the analogies between Gibbs sampling and theoretical physics. You will also briefly consider simple MCMC problems such as gambling.
2.3 Lecture 7 Probability in Markov Chains
Texts available on the internet are often difficult to read because they do not distinguish the different types of probability functions to which MCMC can be applied. In this lecture, three classes of probability function are defined:
The transition probability function
The main probability function
The combined probability function
The main probability function can be conditional or non-conditional. A non-conditional function is one in which a new value, chosen randomly, does not depend on the previously chosen value.
You will learn that when the main probability function is linked with the transition function to form the combined probability function, the combined probability function is a conditional function.
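As a hedged illustration in standard textbook notation (the course's own terminology may differ): if f is the main probability function and q(y | x) the transition function, the combined function takes the form p(y | x) = q(y | x) · a(x, y), where a is an acceptance rule; because p depends on the current value x, it is conditional.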
2.4 Lecture 8 Calculating the mean and variance
You have learnt how to apply Gibbs Sampling to a ‘natural’ conditional probability function, or transition function. In this lecture, you will learn how to calculate the mean and variance of both a conditional probability function and a non-conditional probability function. For the latter, Gibbs Sampling requires the use of a transition probability.
Hence, in summary, you will learn how to calculate the mean and variance of a multidimensional probability function within an enclosed multidimensional space.
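In symbols, with simulated points x_1, …, x_N in a given coordinate, the estimates are mean ≈ (1/N) Σ x_i and variance ≈ (1/N) Σ (x_i − mean)², applied per coordinate of the multidimensional space.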
SECTION 3 METROPOLIS-HASTINGS (MH) ALGORITHM
MH is the main algorithm used by the instructor. It is designed to be applied to non-conditional probability functions. Although Gibbs sampling can also be applied to non-conditional probabilities, MH generates points that are more centred near the mode, and as a result MH is in many cases more efficient. Because the examples in this course are simple rather than high-dimensional, the full benefit of this approach is discussed rather than fully demonstrated. It is expected that the participant will learn, at a conceptual level, the difference between MH and Gibbs Sampling, sufficient to understand the textbooks far more easily.
3.1 Lecture 9 The Metropolis-Hastings algorithm (MH)
MH links a non-conditional function with a transition function to create a conditional function; in effect, the MH algorithm converts a problem into one to which Gibbs-style sampling can be applied. Here you will learn how to link a non-conditional function with a transition function. You will learn that the mathematics behind MH is very simple and requires very little coding.
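For reference, the standard textbook form of the MH acceptance rule (not necessarily the course's notation): a proposed move from x to y is accepted with probability a(x, y) = min(1, f(y) · q(x | y) / (f(x) · q(y | x))), which for a symmetric transition function reduces to min(1, f(y) / f(x)).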
3.2 Lecture 10 Simple MH Exercise
Before considering the geometric exercise, we consider a simpler one-dimensional example. With this example, you will learn precisely how the MH algorithm is applied. Although there is some VBA code, it is less than 100 lines, and any participant familiar with software development will understand it easily.
You will learn that once MH has been applied to a function, the mean and variance of that function can be calculated from the mean and variance of the simulated points; the function itself need never be integrated or normalised.
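As a rough indication of scale, a complete one-dimensional Metropolis-Hastings loop fits comfortably under 100 lines. The sketch below is not the course's code: the target function is a hypothetical un-normalised Gaussian, and burn-in is omitted for simplicity:

' Minimal 1D Metropolis-Hastings with a symmetric random-walk proposal.
Sub SimpleMH1D()
    Const nSteps As Long = 10000
    Const stepSize As Double = 1#
    Dim x As Double, xNew As Double, i As Long
    Dim sumX As Double, sumSq As Double, meanX As Double
    Randomize
    x = 0#
    For i = 1 To nSteps
        xNew = x + stepSize * (2 * Rnd - 1)        ' symmetric proposal
        If Rnd < TargetF(xNew) / TargetF(x) Then   ' MH accept/reject step
            x = xNew
        End If
        sumX = sumX + x
        sumSq = sumSq + x * x
    Next i
    meanX = sumX / nSteps
    ' For this target the mean is near 0 and the variance near 1.
    Debug.Print "mean = "; meanX; ", variance = "; sumSq / nSteps - meanX ^ 2
End Sub

' Hypothetical un-normalised target: a standard normal shape.
Function TargetF(ByVal x As Double) As Double
    TargetF = Exp(-0.5 * x * x)
End Function

Note that only ratios of the target appear in the accept/reject step, which is why the normalising constant of the function is never needed.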
3.3 Lecture 11 Geometric MH Exercise
This lecture contains the principal exercise of the course. You will learn how MH can be applied to a multidimensional function within a multidimensional space. You will see how the chosen particles are tracked as they move randomly within the geometric shape. You will learn how to calculate the mean and variance of the main probability function by simply calculating the mean and variance of the points.
3.4 Lecture 12 VBA code for MH algorithm
The course does not go through the VBA code in depth, but in this lecture some of its details are explained, with the focus on the MH algorithm. A participant from a software development background will learn how to integrate MH into code for other MCMC problems.
3.5 Lecture 13 Applying MH to a collection of points
Thus far, MH (and other MCMC algorithms) have been applied to single points which then move randomly. You will learn an alternative strategy, which is to use a collection of points (which then move randomly). In this lecture the advantages and disadvantages of this alternative approach are discussed.
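A minimal sketch of the alternative strategy, reusing the RandomWalkStep routine from the Lecture 5 sketch (the array names are assumptions):

' Advance every point in the collection by one step per iteration,
' rather than advancing a single point for many iterations.
Sub MoveEnsemble(xs() As Double, ys() As Double, n As Long, ByVal stepSize As Double)
    Dim i As Long
    For i = 1 To n
        RandomWalkStep xs(i), ys(i), stepSize
    Next i
End Sub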
SECTION 4 CONCLUSIONS
In this section we will summarise what you have learnt in the course.
4.1 Lecture 14 Conclusions
The participant will have learnt the foundations of MCMC and will be able to read textbooks (and similar material) with much greater understanding. A student studying MCMC in a university course will have a considerable advantage over peers in understanding the foundational concepts.
SECTION 5 ADDITIONAL LECTURES
In this section we introduce lectures that are related to the course but that, if included in the course proper, would have extended its scope beyond the original objective. You will learn about some applications of MCMC and the mathematical basis of MH, and related courses under development are discussed.
5.1 Lecture 15 Applications of MCMC
Whilst there are many applications of MCMC, the instructor largely limits the discussion to applications he has worked on.
These include:
Geometric Probability
Comminution
Mineralogy
Games – Mastermind
5.2 Lecture 16 Proof of MH algorithm
The student will have learnt that a key property of the MH algorithm is that the distribution of the simulated points is weighted according to the target function itself. The full proof of this result is difficult, so in this lecture the proof is given only for a 2-value problem and should be considered a conceptual proof only. However, guidance is given to those familiar with advanced multivariate mathematics (eigenvalues and eigenvectors) on how to solve the problem more formally.
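For orientation, the 2-value case can be written in standard notation (which may differ from the course's): with target weights f(1), f(2) and transition probabilities P(1→2), P(2→1), the detailed-balance condition f(1) · P(1→2) = f(2) · P(2→1) makes (f(1), f(2)) a stationary distribution of the chain, i.e. an eigenvector of the transposed transition matrix with eigenvalue 1.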
5.3 Lecture 17
What You Will Learn!
- Understand what is meant by Markov Chain Monte Carlo (MCMC).
- Apply and understand random walk methods.
- Solve provided problems using MCMC.
- Understand the two main algorithms (Gibbs Sampling and the Metropolis-Hastings algorithm) and their applicability.
- Understand what is meant by conditional probability, transition probability and non-conditional probability.
- Understand that although MCMC is directly relevant to conditional problems, it can also be applied to non-conditional problems.
- For VBA developers: understand how to create animations useful for explaining mathematical concepts.
- Learn how to apply the Metropolis-Hastings algorithm to problems, and examine the VBA code that focuses on applying it.
- Although the code is written in VBA, it will be obvious to a developer how the methodology applies to other software languages.
Who Should Attend!
- The course is largely non-mathematical and should therefore be understandable to most participants, although it will be of most interest to those with mathematical leanings, such as senior high-school students, and mathematicians and engineers at both graduate and professional level.
- It is strongly suggested that most Machine Learning practitioners would benefit from the course, as MCMC is a foundational concept of machine learning.