
ST451 BAYESIAN MACHINE LEARNING

1.

Consider a linear regression model where the response variable is y = (y1,...,yn) and there is one feature x = (x1,...,xn). We are interested in fitting a model where

yi = βxi + εi,

where the error terms εi are independent and distributed according to the Normal distribution with mean 0 and known variance σ². In other words, the yi's are independent given x, and distributed according to the Normal distribution with mean βxi and known variance σ².

(a) Derive the likelihood function for the unknown parameter β. [2 marks]


(b) Derive the Jeffreys prior for β. Use it to obtain the corresponding posterior distribution. [5 marks]


(c) Consider the Normal distribution prior for β with zero mean and variance τ². Use it to obtain the corresponding posterior distribution. [5 marks]
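As a numerical sketch of the conjugate posterior asked for in part (c), the code below computes the posterior mean and variance of the coefficient under a Normal prior with zero mean; the data, σ² and τ² values are illustrative assumptions, not part of the question.

```python
import numpy as np

# Illustrative setup: simulated data, known error variance sigma2,
# and a N(0, tau2) prior on the single coefficient beta.
rng = np.random.default_rng(0)
n, beta_true, sigma2, tau2 = 50, 2.0, 1.0, 10.0
x = rng.normal(size=n)
y = beta_true * x + rng.normal(scale=np.sqrt(sigma2), size=n)

# Conjugate posterior: beta | y ~ N(m, v) with
#   v = sigma2 / (sum x_i^2 + sigma2/tau2)
#   m = (sum x_i y_i) / (sum x_i^2 + sigma2/tau2)
denom = np.sum(x**2) + sigma2 / tau2
post_var = sigma2 / denom
post_mean = np.sum(x * y) / denom
print(post_mean, post_var)
```

As τ² grows, the term σ²/τ² vanishes and the posterior mean approaches the least squares estimator of part (d).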


(d) Consider the least squares criterion

S(β) = Σi (yi − βxi)²,    (1)

and show that the estimator of β that minimises equation (1) also maximises the likelihood function derived in part (a). Derive this estimator. In addition, consider the following penalised least squares criterion

Sλ(β) = Σi (yi − βxi)² + λβ²,

and derive the estimator of β that minimises it. [4 marks]
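A numerical sketch of the two estimators in part (d), on simulated data (the data and the penalty value are illustrative assumptions): for a single coefficient, the least squares estimator is Σxiyi / Σxi², and the penalised version divides by Σxi² + λ instead, shrinking the estimate towards zero.

```python
import numpy as np

# Illustrative comparison of the least squares and penalised least
# squares estimators for a single-coefficient model (simulated data).
rng = np.random.default_rng(1)
x = rng.normal(size=100)
y = 1.5 * x + rng.normal(size=100)

# Least squares: beta_hat = sum(x_i y_i) / sum(x_i^2)
beta_ls = np.sum(x * y) / np.sum(x**2)

# Penalised least squares with penalty lam * beta^2 shrinks towards zero:
lam = 2.0
beta_ridge = np.sum(x * y) / (np.sum(x**2) + lam)
print(beta_ls, beta_ridge)
```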


(e) Find a Bayes estimator for each of the posteriors in parts (b) and (c) and compare them with the estimators of part (d). [5 marks]


(f) Let yn+1 represent a future value from the same model given the corresponding value of the predictor xn+1. Find the posterior predictive distribution of yn+1 for one of the posteriors in parts (b) or (c). [4 marks]



2.

(a) Consider the following classification problem, where the aim is to determine whether a person should take a Covid-19 test based on four binary features, each a common Covid-19 symptom: B: Breathing problems, C: Cough, L: Loss of smell and F: Fever. A training sample is given, consisting of 6 persons who were asked to report whether they had each of these symptoms (responding 1 if they had that symptom and 0 otherwise) and who afterwards took a Covid-19 test with either a positive or negative result. The data of this training sample are summarised in the table below:

Suppose that a person has all the above symptoms, i.e. breathing problems, cough, loss of smell and fever. We would like to use a naive Bayes classifier, trained on the above sample, and follow its classification to decide whether she should take the test or not.

i. What would the classification of the naive Bayes classifier be using maximum likelihood? Provide justification for your answer.


ii. What would the recommendation of the naive Bayes classifier be using Laplace smoothing? Provide justification for your answer.
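Since the question's table is not reproduced in this extract, the sketch below trains a naive Bayes classifier on a hypothetical 6-person sample to illustrate parts i and ii; the data and the smoothing value α = 1 are assumptions. It shows the typical effect: under maximum likelihood a single zero count can zero out an entire class score, which Laplace smoothing avoids.

```python
import numpy as np

# Hypothetical training sample (the question's table is not reproduced here).
# Columns: B, C, L, F; labels: 1 = positive test, 0 = negative.
X = np.array([[1, 1, 1, 0],
              [1, 0, 1, 1],
              [0, 1, 0, 1],
              [1, 1, 0, 0],
              [0, 0, 1, 1],
              [0, 1, 1, 0]])
y = np.array([1, 1, 0, 1, 0, 0])

def nb_scores(x_new, alpha=0.0):
    """Unnormalised naive Bayes class scores.
    alpha = 0 gives maximum likelihood; alpha = 1 gives Laplace smoothing."""
    scores = {}
    for c in (0, 1):
        Xc = X[y == c]
        prior = len(Xc) / len(X)
        # P(feature_j = 1 | class c) with add-alpha smoothing
        p1 = (Xc.sum(axis=0) + alpha) / (len(Xc) + 2 * alpha)
        lik = np.prod(np.where(x_new == 1, p1, 1 - p1))
        scores[c] = prior * lik
    return scores

print(nb_scores(np.ones(4), alpha=0))  # ML: a zero count can zero out a class
print(nb_scores(np.ones(4), alpha=1))  # Laplace: all scores strictly positive
```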


iii. Reformulate the problem, without making the assumption of the naive Bayes classifier, and explain the difficulties in this case. [13 marks]

(b) Consider the following model

Show that this is a linear Gaussian state space model by writing down the state and observation equations and identifying the matrices required for the Kalman filter equations. [12 marks]
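A generic sketch of the Kalman filter recursions for a linear Gaussian state space model (the specific model of part (b) is not reproduced in this extract, so the matrices G, W, F, V below are illustrative scalars):

```python
import numpy as np

# Generic linear Gaussian state space model (illustrative scalar matrices):
#   state:       theta_t = G theta_{t-1} + w_t,  w_t ~ N(0, W)
#   observation: y_t     = F theta_t + v_t,      v_t ~ N(0, V)
G, W = np.array([[1.0]]), np.array([[0.5]])
F, V = np.array([[1.0]]), np.array([[1.0]])

def kalman_step(m, C, y):
    """One predict/update cycle of the Kalman filter."""
    a = G @ m                        # predicted state mean
    R = G @ C @ G.T + W              # predicted state variance
    f = F @ a                        # one-step forecast mean of y
    Q = F @ R @ F.T + V              # one-step forecast variance of y
    K = R @ F.T @ np.linalg.inv(Q)   # Kalman gain
    return a + K @ (y - f), R - K @ F @ R

m, C = np.array([0.0]), np.array([[10.0]])
for obs in [1.2, 0.8, 1.1]:
    m, C = kalman_step(m, C, np.array([obs]))
print(m, C)
```

Identifying G, W, F and V for the given model is exactly what the question asks; once they are written down, the recursion above applies unchanged.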


3.

Assume that the data x = (x1,...,xn) are independent random variables and the distribution of each xi is given by a mixture of two Normal distributions with means µk, k = 0, 1, and variance of 1 in both cases. The model can be written in terms of the augmented likelihood

p(x, z|µ0, µ1) = ∏i [p N(xi|µ1, 1)]^zi [(1 − p) N(xi|µ0, 1)]^(1−zi),

where zi is an unobserved latent binary variable that takes the values 0 and 1 to indicate the mixture component k, zi follows the Bernoulli(p) distribution with p known, and N(x|m, S) denotes the probability density function of the Normal distribution with mean m and variance S in terms of the random variable x. The model is completed by assigning improper priors on µ0, µ1 such that π(µ0, µ1) ∝ 1.

(a) Let θ = (z1,...,zn, µ0, µ1) and write down the variational Bayes algorithm that approximates π(θ|x) using the mean field approximation.

Provide the necessary derivations and list the steps of the algorithm.
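A numerical sketch of the mean field (CAVI) updates for this mixture on simulated data (all numerical settings are illustrative assumptions); the update formulas follow from the standard mean field derivation under the flat prior on (µ0, µ1):

```python
import numpy as np

# Simulated two-component data; p is the known mixing weight P(z_i = 1).
rng = np.random.default_rng(4)
p, n = 0.4, 200
z_true = rng.random(n) < p
x = np.where(z_true, rng.normal(3, 1, n), rng.normal(-1, 1, n))

# Mean field approximation q(theta) = q(mu0) q(mu1) prod_i q(z_i),
# with q(mu_k) = N(m_k, s2_k) and q(z_i = 1) = r_i.
m0, m1, s2_0, s2_1 = -5.0, 5.0, 1.0, 1.0
for it in range(50):
    # Update q(z_i): logit r_i = log(p/(1-p))
    #   + E[log N(x_i|mu1,1)] - E[log N(x_i|mu0,1)], expectations under q(mu_k)
    logit = (np.log(p / (1 - p))
             - 0.5 * ((x - m1) ** 2 + s2_1)
             + 0.5 * ((x - m0) ** 2 + s2_0))
    r = 1.0 / (1.0 + np.exp(-logit))
    # Update q(mu_k) under the flat prior pi(mu0, mu1) ∝ 1:
    s2_1, s2_0 = 1.0 / r.sum(), 1.0 / (1.0 - r).sum()
    m1, m0 = s2_1 * (r * x).sum(), s2_0 * ((1.0 - r) * x).sum()
print(m0, m1)
```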


(b) Write down the Gibbs algorithm to sample from π(θ|x). Derive the relevant full conditional distributions and list the steps of the algorithm. [25 marks]
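A sketch of the corresponding Gibbs sampler on simulated data (simulation settings are illustrative assumptions): each zi is drawn from its Bernoulli full conditional, and each µk from its Normal full conditional under the flat prior.

```python
import numpy as np

# Simulated two-component data; p is the known mixing weight P(z_i = 1).
rng = np.random.default_rng(2)
p, n = 0.4, 200
z_true = rng.random(n) < p
x = np.where(z_true, rng.normal(3, 1, n), rng.normal(-1, 1, n))

def npdf(x, m):                       # N(x | m, 1) density
    return np.exp(-0.5 * (x - m) ** 2) / np.sqrt(2 * np.pi)

mu0, mu1 = -5.0, 5.0
for it in range(500):
    # Full conditional of each z_i is Bernoulli(w_i):
    w = p * npdf(x, mu1) / (p * npdf(x, mu1) + (1 - p) * npdf(x, mu0))
    z = rng.random(n) < w
    # Full conditional of mu_k under the flat prior: N(xbar_k, 1/n_k)
    n1, n0 = int(z.sum()), int((~z).sum())
    mu1 = rng.normal(x[z].mean(), 1 / np.sqrt(n1)) if n1 else rng.normal(0, 10)
    mu0 = rng.normal(x[~z].mean(), 1 / np.sqrt(n0)) if n0 else rng.normal(0, 10)
print(mu0, mu1)
```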


4.

Consider a probit regression model, where the target is the vector y = (y1,...,yn) and there is a single feature x = (x1,...,xn). The yi's are considered as independent Bernoulli random variables given x, with P(yi = 1|xi) = Φ(βxi), where Φ is the cumulative distribution function of the standard Normal distribution.

(a) Assign a prior on β and describe an algorithm to sample from its posterior distribution. Provide the steps of the algorithm in detail.
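One standard choice is a Normal prior on the coefficient combined with the Albert and Chib data augmentation Gibbs sampler. The sketch below uses simulated data and plain rejection sampling for the truncated Normal draws; all numerical settings are assumptions.

```python
import numpy as np
from math import erf, sqrt

rng = np.random.default_rng(3)
Phi = lambda t: 0.5 * (1 + erf(t / sqrt(2)))

# Simulated probit data: P(y_i = 1 | x_i) = Phi(beta * x_i)
n, beta_true = 200, 1.0
x = rng.normal(size=n)
y = (rng.random(n) < np.array([Phi(v) for v in beta_true * x])).astype(int)

tau2 = 100.0          # prior: beta ~ N(0, tau2)
beta, draws = 0.0, []
for it in range(400):
    # 1. Latent ystar_i ~ N(beta x_i, 1), truncated to the positive half-line
    #    if y_i = 1 and the negative half-line if y_i = 0 (rejection sampling).
    ystar = np.empty(n)
    for i in range(n):
        while True:
            d = rng.normal(beta * x[i], 1.0)
            if (d > 0) == (y[i] == 1):
                ystar[i] = d
                break
    # 2. beta | ystar is Gaussian (linear regression with unit error variance)
    denom = np.sum(x**2) + 1.0 / tau2
    beta = rng.normal(np.sum(x * ystar) / denom, 1.0 / np.sqrt(denom))
    draws.append(beta)

post_mean = np.mean(draws[100:])
print(post_mean)
```

In practice the truncated Normal draws would be done by inverse CDF rather than rejection, but the two-step structure (latent variables, then a Gaussian draw for β) is the algorithm the question asks for.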


(b) Now consider the following mixture prior

π(β) = w δ0(β) + (1 − w) N(β|0, τ²),

where δ0 denotes a point mass at zero and N(x|m, S) denotes the probability density function of the Normal distribution with mean m and variance S in terms of the random variable x. Describe the shape of the prior above and provide an interpretation of the parameter w. Comment also on how its posterior can be useful in this case.


(c) Describe an algorithm that can be used to sample from the joint posterior. Provide the steps of the algorithm in detail.


(d) Now consider the case of having p features in the model and consider priors that dedicate a separate such mixture prior to each one of them. Describe how the algorithm in part 4(c) above can be used for feature selection. [25 marks]


