SPCE0038 - Machine Learning with Big Data
- 2695389849
- Aug 20, 2021
Updated: Aug 25, 2021
Question 1
(a) Describe the one-versus-rest (OvR) (also called one-versus-all; OvA) strategy to perform multiclass classification given a binary classifier. [2 marks]
(b) Describe the one-versus-one (OvO) strategy to perform multiclass classification given a binary classifier. [2 marks]
(c) Given N classes, how many binary classifiers are required for the one-versus-rest (OvR) and one-versus-one (OvO) strategies? [2 marks]
(d) Specify the advantages and disadvantages of the one-versus-rest (OvR) and one-versus-one (OvO) strategies. [4 marks]
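The classifier counts asked for in part (c) can be sanity-checked with a minimal Python sketch (the function name is illustrative, not from the course):

```python
from math import comb

def n_binary_classifiers(n_classes: int) -> dict:
    """Number of binary classifiers needed for each multiclass strategy.

    OvR trains one classifier per class (class k vs. all the rest);
    OvO trains one classifier per unordered pair of classes.
    """
    return {
        "ovr": n_classes,           # N classifiers
        "ovo": comb(n_classes, 2),  # N(N-1)/2 classifiers
    }

print(n_binary_classifiers(4))  # {'ovr': 4, 'ovo': 6}
```

The quadratic growth of the OvO count is one of the trade-offs part (d) is after.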
(e) Describe the problems of over-fitting and under-fitting. Also illustrate your descriptions with diagrams. [3 marks]
(f) Describe what a learning curve is and discuss how learning curves can be used to check when models are over-fitted, under-fitted, or well-fitted. Also illustrate your discussion with diagrams. [7 marks]
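One crude way to turn the end of a learning curve into the diagnosis asked for in part (f) is to compare the final training and validation errors; the threshold below is an arbitrary illustrative choice, not from the course:

```python
def diagnose_fit(train_error: float, val_error: float, tol: float = 0.05) -> str:
    """Crude reading of the right-hand end of a learning curve.

    - validation error much worse than training error -> over-fitting
    - both errors high and close together             -> under-fitting
    - both errors low and close together              -> well-fitted
    """
    gap = val_error - train_error
    if gap > tol:
        return "over-fitted"
    if train_error > tol:
        return "under-fitted"
    return "well-fitted"

print(diagnose_fit(0.02, 0.03))  # well-fitted
print(diagnose_fit(0.01, 0.30))  # over-fitted
print(diagnose_fit(0.25, 0.27))  # under-fitted
```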
Question 2
Gradient descent algorithms take a step α in the direction of decreasing gradient, where the update of parameter θ is given by a form similar to

θ(t+1) = θ(t) − α ∂C(θ(t))/∂θ,

where t denotes the iteration number and C the cost function. The variable α is often also called the learning rate. Gradient descent based algorithms are often used to train deep learning models.
(a) Explain what happens if the learning rate is too small. Also illustrate your explanation with a diagram. [2 marks]
(b) Explain what happens if the learning rate is too large. Also illustrate your explanation with a diagram. [2 marks]
(c) How does the ideal value for the learning rate change between early and late stages of training? [2 marks]
(d) Describe how an annealing strategy (i.e. adaptive strategy) may be used to address this issue (the issue highlighted in part (c) directly above). Be sure to discuss the pitfalls to be aware of with this approach. Supplement your description with an equation of an example learning schedule. [5 marks]
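A minimal sketch for part (d): gradient descent on the toy cost C(θ) = θ², whose gradient is 2θ, using the example power-decay schedule α_t = α₀ / (1 + t/k). The constants α₀ and k are illustrative choices:

```python
def lr_schedule(t: int, alpha0: float = 0.1, k: float = 50.0) -> float:
    """Example power-decay learning schedule: alpha_t = alpha0 / (1 + t/k)."""
    return alpha0 / (1 + t / k)

# Gradient descent on the toy cost C(theta) = theta^2 (gradient 2*theta).
theta = 5.0
for t in range(200):
    theta -= lr_schedule(t) * 2 * theta

print(abs(theta) < 1e-3)  # True: the decaying rate still reaches the minimum
```

The pitfalls the question hints at are visible in the schedule: if α decays too fast the iterate freezes short of the minimum, and if it decays too slowly the early-training oscillation persists.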
(e) Describe three reasons why deep learning has been so effective in recent years. [3 marks]
(f) Explain the process of transfer learning and the conditions required for transfer learning to be effective. Illustrate your explanation with a diagram. [6 marks]

Question 3
Consider the binary logistic regression cost function

C(θ) = −(1/m) ∑_{i=1}^{m} [ y_i log(p̂_i) + (1 − y_i) log(1 − p̂_i) ] + λ ∑_{j=1}^{n} θ_j²
(a) How are the predicted probabilities p̂ defined in terms of the model parameters θ and features x? [2 marks]
(b) Define the cost function penalty terms for the cases where the class targets are 0 and 1, respectively, i.e. for y = 0 and y = 1. Plot these penalty terms and explain why they make intuitive sense. [6 marks]
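A minimal sketch for parts (a) and (b), assuming the standard definitions p̂ = σ(θᵀx) and the cross-entropy penalty terms (the function names are illustrative):

```python
import math

def sigmoid(z: float) -> float:
    return 1 / (1 + math.exp(-z))

def p_hat(theta: list, x: list) -> float:
    """Predicted probability p_hat = sigmoid(theta . x) for one example."""
    return sigmoid(sum(t * xi for t, xi in zip(theta, x)))

def penalty(y: int, p: float) -> float:
    """Cost contribution of one example: -log(p) if y = 1, -log(1 - p) if y = 0."""
    return -math.log(p) if y == 1 else -math.log(1 - p)

# The penalty vanishes for a confident correct prediction and blows up
# for a confident wrong one, which is the intuition part (b) asks for:
print(penalty(1, 0.99) < penalty(1, 0.01))  # True
```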
(c) Explain why the second term in the cost function, i.e. the ∑_{j=1}^{n} θ_j² term, is included. [2 marks]
(d) Derive the unconstrained support vector machine (SVM) cost function. [8 marks]
(e) What are the advantages of SVM classification compared to logistic regression? [2 marks]
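The contrast behind parts (d) and (e) can be sketched by writing both per-example losses as a function of the margin t = y·(θᵀx) with y ∈ {−1, +1} (a standard formulation, assumed here):

```python
import math

def logistic_loss(t: float) -> float:
    """Logistic per-example loss: smooth and never exactly zero."""
    return math.log(1 + math.exp(-t))

def hinge_loss(t: float) -> float:
    """SVM hinge loss: exactly zero once the margin exceeds 1."""
    return max(0.0, 1 - t)

# Well-classified points beyond the margin contribute nothing to the
# SVM cost, while every point contributes to the logistic cost:
print(hinge_loss(2.0))          # 0.0
print(logistic_loss(2.0) > 0)   # True
```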
Question 4
(a) For an ensemble of classifiers describe in words the difference between hard and soft voting, and give a reason why soft voting may be better in some circumstances. [6 marks]
(b) If one is using the same classifier in an ensemble there are two standard ways in which the training set is manipulated to generate an ensemble. State the names of these two methods, and briefly describe their features. [4 marks]
(c) In an ensemble of classifiers one can sample from the features as well as, or instead of, the training set. State the name of the approaches when: (i) One samples from both features and training instances. (ii) One samples from the features only. [2 marks]
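The sampling schemes behind parts (b) and (c) differ only in whether indices are drawn with or without replacement, and whether they index the training instances or the features; a toy sketch (the helper name is illustrative):

```python
import random

def sample_indices(n: int, m: int, with_replacement: bool, seed: int = 0) -> list:
    """Draw m indices from range(n): with replacement (bagging-style)
    or without (pasting-style). Sampling the feature subspace works the
    same way, applied to feature indices instead of instance indices."""
    rng = random.Random(seed)
    if with_replacement:
        return [rng.randrange(n) for _ in range(m)]  # duplicates possible
    return rng.sample(range(n), m)                   # all distinct

bag = sample_indices(10, 10, with_replacement=True)
paste = sample_indices(10, 8, with_replacement=False)
print(len(set(paste)))  # 8
```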
(d) Describe what a Random Forest is in words and how it is typically trained. [2 marks]
(e) Consider the following classifier:

rnd_clf = RandomForestClassifier(
    n_estimators=500,
    max_leaf_nodes=16,
    n_jobs=1,
    random_state=42)
(i) How many individual classifiers are being used in the ensemble?
(ii) What does max_leaf_nodes mean?
(iii) If one wanted to parallelise this code, how could one do this?
(iv) What would happen if the random_state input was not specified? [6 marks]
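Part (iv) can be illustrated with a toy stand-in for a stochastic training run (the function below is hypothetical, not RandomForestClassifier itself, but the seeding behaviour is the same idea):

```python
import random

def toy_training_run(random_state=None) -> list:
    """Stand-in for a stochastic training procedure, e.g. the bootstrap
    sampling inside a random forest. Seeded runs are reproducible;
    unseeded runs draw a fresh seed each time and so vary."""
    rng = random.Random(random_state)
    return [rng.random() for _ in range(3)]

print(toy_training_run(42) == toy_training_run(42))  # True: reproducible
```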