MATH38172 GENERALISED LINEAR MODELS

Updated: Aug 26, 2021

1.

Data have been collected on the following variables in a medical study:

Sex sex of the patient (Male/Female)

Treatment treatment given to the patient (A/B)

Recover did the patient recover? (1 - Yes/0 - No)

A generalised linear model has been fitted.

(a) Using the R output below (OUTPUT 1 overleaf), write down the model that has been fitted in equation form. Take care to define any notation and to specify the distribution of any random variables. [3]

(b) Interpret the parameters of the fitted model. [4]

(c) Use the fitted model to predict which treatment is best. [3]

[Total 10 marks]



2.

(a) Explain what is meant by the random component of a generalised linear model. For a given dataset or problem, how would you choose the random component? [4]

Data have been collected about depression in young people, consisting of the following variables:

Age, the person’s age, categorised as 12-14, 15-17, or 18-19;

Group, categorised as learning difficulties (coded LD) or serious emotional disturbance (coded SED);

Gender, male or female, coded M or F

Depression, high or low, coded 1 or 0.

Several logistic regression models have been fitted to investigate how the probability of depression depends on the other variables. Each model has a different linear predictor. Some information about the fits is reported below.

(b) Explain how you would choose the best linear predictor for these data from the above choices. Carry out your procedure and report your results. [4]

(c) What other models would you suggest fitting, if any? [2]

[Total 10 marks]

3.

Consider the following generalised linear model:

Y_i ∼ Gamma(µ_i, φ)

g(µ_i) = β_0 + β_1 x_i

with negative reciprocal link, g(µ) = −1/µ.

(a) Show that the score and Fisher information matrix for the β parameters [5]

(b) Explain how these quantities are used in an iterative procedure to estimate the β parameters, and briefly explain how φ can be estimated. [5]

[Total 10 marks]
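
As an illustration of the iterative procedure referred to in part (b), the following is a minimal sketch of Fisher scoring for this model. It assumes the parametrisation Var(Y_i) = φ µ_i^2, under which the negative reciprocal link is canonical, so that the score is (1/φ) Σ (y_i − µ_i)(1, x_i)^T and the Fisher information is (1/φ) Σ µ_i^2 (1, x_i)(1, x_i)^T. The function name `fit_gamma_glm`, the starting values and the Pearson-based estimate of φ are choices made for this sketch, not part of the question.

```python
def fit_gamma_glm(x, y, max_iter=50, tol=1e-12):
    """Fisher scoring for Y_i ~ Gamma(mu_i, phi) with -1/mu_i = b0 + b1 * x_i.

    Assumes Var(Y_i) = phi * mu_i^2. No step-halving is used, so the
    linear predictor is not forced to stay negative (illustration only).
    """
    n = len(y)
    ybar = sum(y) / n
    b0, b1 = -1.0 / ybar, 0.0  # start from the intercept-only fit

    for _ in range(max_iter):
        mu = [-1.0 / (b0 + b1 * xi) for xi in x]

        # Score multiplied by phi: sum of (y_i - mu_i) * (1, x_i)
        u0 = sum(yi - mi for yi, mi in zip(y, mu))
        u1 = sum((yi - mi) * xi for xi, yi, mi in zip(x, y, mu))

        # Fisher information multiplied by phi: sum of mu_i^2 * (1, x_i)(1, x_i)^T
        i00 = sum(mi ** 2 for mi in mu)
        i01 = sum(mi ** 2 * xi for xi, mi in zip(x, mu))
        i11 = sum(mi ** 2 * xi ** 2 for xi, mi in zip(x, mu))

        # Solve the 2x2 system I * delta = U; phi cancels from the update
        det = i00 * i11 - i01 ** 2
        d0 = (i11 * u0 - i01 * u1) / det
        d1 = (i00 * u1 - i01 * u0) / det
        b0, b1 = b0 + d0, b1 + d1

        if abs(d0) + abs(d1) < tol:
            break

    # Pearson estimate of phi: sum of squared Pearson residuals over n - p
    mu = [-1.0 / (b0 + b1 * xi) for xi in x]
    phi = sum(((yi - mi) / mi) ** 2 for yi, mi in zip(y, mu)) / (n - 2)
    return b0, b1, phi
```

Because φ cancels from the update, the β estimates do not depend on it; φ is estimated afterwards from the fitted values, here using the Pearson statistic divided by the residual degrees of freedom.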

4.

Two generalised linear models have been fitted to the data (x_i, y_i), i = 1, ..., 40, both using the linear predictor η = β_0 + β_1 x + β_2 x^2.

(a) Discuss what these plots tell us about the model assumptions. Which of the two models fits best? What is the most likely difference between Model 1 and Model 2? [5]

(b) What other diagnostic plots would you suggest? What features of these plots would suggest problems with the model? [5]

[Total 10 marks]

5.

(a) Show that BinProp(m, µ), µ ∈ (0, 1), m ∈ N, is an exponential dispersion family, clearly stating the constraints on the dispersion parameter. [4]

(b) Suppose that Y ∼ BinProp(m, µ). Without using the fact that this is an exponential dispersion family, show directly that K_Y(t) = m log(1 − µ + µ e^{t/m}). [8]

(c) Show that BinProp(m, µ) is the unique exponential dispersion family that (i) has variance function V (µ) = µ(1 − µ) and (ii) satisfies the constraints identified in part (a) above. [8]

[Total 20 marks]
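
The identity in part (b) can be checked numerically before attempting the proof. The sketch below assumes that Y ∼ BinProp(m, µ) means Y = Z/m with Z ∼ Binomial(m, µ), and that K_Y denotes the cumulant generating function log E[exp(tY)]. The function names are made up for this illustration.

```python
import math


def cgf_direct(t, m, mu):
    """log E[exp(t * Y)] computed by summing over the binomial pmf of Z = m * Y."""
    mgf = sum(
        math.comb(m, k) * mu ** k * (1 - mu) ** (m - k) * math.exp(t * k / m)
        for k in range(m + 1)
    )
    return math.log(mgf)


def cgf_closed_form(t, m, mu):
    """The closed form stated in part (b): m * log(1 - mu + mu * exp(t / m))."""
    return m * math.log(1 - mu + mu * math.exp(t / m))


# Compare the two expressions at a few arbitrary values of t
for t in (-1.0, 0.0, 0.7, 2.5):
    print(t, cgf_direct(t, 5, 0.3), cgf_closed_form(t, 5, 0.3))
```

Agreement at a handful of points is only a sanity check, not a proof; the direct argument still has to go through the binomial theorem.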

6.

Data have been collected on ear infections experienced by swimmers, consisting of the following variables:

NumInfec the number of ear infections experienced by the swimmer

Swim how often the person swims in the ocean (occasionally or frequently)

Loc the usual swimming location (Beach/non Beach)

Age the age group of the swimmer, with levels 15–19, 20–24, and 25–29

Two GLMs have been fitted to these data, with the associated R output given in OUTPUT 6A and OUTPUT 6B overleaf.

(a) Write down in equation form the model that has been fitted in OUTPUT 6A, and interpret the parameters of the model and their estimated values. [5]

(b) Using the model fitted in OUTPUT 6A, calculate a 95% Wald confidence interval for the expected number of infections for an occasional swimmer aged 15-19 who usually swims at the beach. [5]

(c) Explain how to compute a 95% profile likelihood confidence interval for the same quantity. [5]

(d) Carry out a hypothesis test to determine whether the expected number of infections depends on age. Use a 5% significance level and explain your methodology. [5]

[Total 20 marks]
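
The arithmetic behind parts (b) and (d) can be sketched as follows. OUTPUT 6A and OUTPUT 6B are not reproduced here, so the sketch assumes a Poisson model with log link and dispersion fixed at 1, and that the two models being compared are nested and differ only by the two Age parameters. The function names and the input values in the usage lines are hypothetical and are not taken from the R output.

```python
import math


def wald_ci_mean(eta_hat, se_eta, z=1.959964):
    """95% Wald interval for the mean under a log link.

    The interval is formed on the linear predictor scale and then
    transformed with exp, so it always lies in (0, infinity).
    """
    return math.exp(eta_hat - z * se_eta), math.exp(eta_hat + z * se_eta)


def deviance_test_2df(dev_reduced, dev_full):
    """Likelihood ratio test for dropping a three-level factor (2 parameters).

    Assumes dispersion 1, so the deviance difference is compared with a
    chi-squared distribution on 2 degrees of freedom, whose upper tail
    probability is exp(-x / 2).
    """
    stat = dev_reduced - dev_full
    return stat, math.exp(-stat / 2)


# Hypothetical inputs, for illustration only
lower, upper = wald_ci_mean(eta_hat=0.5, se_eta=0.2)
stat, p_value = deviance_test_2df(dev_reduced=310.0, dev_full=302.0)
print(lower, upper, stat, p_value)
```

If the swimmer in part (b) is not at the reference level of every factor, the standard error of the linear predictor also involves covariances between the coefficient estimates, which have to be taken from the estimated covariance matrix rather than from the coefficient table alone.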
