This course introduces the fundamental techniques and theory underlying modern regression analysis and statistical machine learning. Through a combination of lectures, R practice sessions, and a final project, students gain both conceptual understanding and hands-on experience in analyzing real data.
Learning Objectives
- Understand the theoretical foundations of regression and classification models.
- Apply penalized regression, tree-based methods, neural networks, and unsupervised learning techniques.
- Develop practical data analysis skills using the R statistical programming language.
Topics Covered
Regression Analysis — Ch. 3 Linear Regression · Ch. 4 Classification · Ch. 5 Resampling Methods · Ch. 6 Penalized Regression · Ch. 7 Regression Splines
Machine Learning — Ch. 8 Tree-Based Methods · Ch. 10 Deep Learning · Ch. 12 Unsupervised Learning
Teaching Method
Approximately 50% lectures and discussions, 50% presentations and R practice. Theoretical concepts are taught through PPT slides; real data analysis skills are acquired through hands-on R exercises.
Textbook
James, Witten, Hastie & Tibshirani, An Introduction to Statistical Learning, 2nd ed. (Springer, 2021) — free PDF
Prerequisites
Introduction to Statistics, Mathematical Statistics. Working knowledge of calculus and linear algebra.
All sessions are 75 minutes. Closed-book quizzes are written exams without notes. Open-book quizzes allow notes but no internet access.
| Wk | Date | Topic | Notes |
|---|---|---|---|
| 1 | Mar 2 (Mon) | Public holiday: Independence Movement Day (substitute) | No class |
| 1 | Mar 4 (Wed) | Ch. 3: Introduction & Simple Linear Regression (I) · 3.1: Estimating coefficients, assessing accuracy | |
| 2 | Mar 9 (Mon) | Ch. 3: Simple Linear Regression (II) · 3.1 (cont.): Confidence intervals, hypothesis tests | |
| 2 | Mar 11 (Wed) | Ch. 3: Multiple Linear Regression (I) · 3.2: Estimating coefficients, model fit | |
| 3 | Mar 16 (Mon) | Ch. 3: Multiple Linear Regression (II) · 3.2 (cont.): Variable selection, predictions | |
| 3 | Mar 18 (Wed) | Ch. 3: Other Considerations · 3.3: Qualitative predictors, interactions, non-linearity | |
| 4 | Mar 23 (Mon) | Quiz 1 · Coverage: Ch. 3 Linear Regression | Closed book |
| 4 | Mar 25 (Wed) | Ch. 4: Logistic Regression (I) · 4.3: Logistic model, MLE, multiple logistic regression | Make-up class Mar 27 |
| 5 | Mar 30 (Mon) | Ch. 4: Logistic Regression (II) · 4.3 (cont.): Multinomial logistic regression | |
| 5 | Apr 1 (Wed) | Ch. 4: Generative Models for Classification · 4.4: LDA, QDA, Naive Bayes | |
| 6 | Apr 6 (Mon) | Quiz 2 · Coverage: Ch. 4 Classification | Closed book |
| 6 | Apr 8 (Wed) | Ch. 5: Resampling Methods · 5.1: Cross-Validation · 5.2: The Bootstrap | |
| 7 | Apr 13 (Mon) | Ch. 6: Shrinkage Methods · 6.2: Ridge regression, Lasso | |
| 7 | Apr 15 (Wed) | Ch. 7: Regression Splines · 7.4: Piecewise polynomials, spline basis, natural splines | |
| 8 | Apr 20 (Mon) | Midterm Exam · Coverage: Ch. 3–7 | Closed book |
| 8 | Apr 22 (Wed) | Midterm Exam · Coverage: Ch. 3–7 | Closed book |
| 9 | Apr 27 (Mon) | Ch. 8: Decision Trees · 8.1: Regression trees, classification trees, pruning | |
| 9 | Apr 29 (Wed) | Ch. 8: Ensemble Methods · 8.2: Bagging, Random Forests, Boosting, BART | |
| 10 | May 4 (Mon) | Quiz 3 · Coverage: Ch. 8 Tree-Based Methods | Open book |
| 10 | May 6 (Wed) | Cadet Day | No class |
| 11 | May 11 (Mon) | Admissions Outreach | No class (make-up class held Apr 10) |
| 11 | May 13 (Wed) | Admissions Outreach | No class |
| 12 | May 18 (Mon) | Ch. 10: Multilayer Neural Networks · 10.2: Single- and multi-layer perceptrons, activation functions | |
| 12 | May 20 (Wed) | Ch. 10: CNNs, RNNs & Fitting · 10.3: Convolutional NNs · 10.5: Recurrent NNs · 10.7: Training | Make-up class May 22 |
| 13 | May 25 (Mon) | Public holiday: Buddha's Birthday (substitute) | No class (make-up class held May 22) |
| 13 | May 27 (Wed) | Quiz 4 · Coverage: Ch. 10 Deep Learning | Open book |
| 14 | Jun 1 (Mon) | Ch. 12: PCA & Matrix Completion · 12.2: Principal components · 12.3: Missing values | |
| 14 | Jun 3 (Wed) | Local Elections | No class (make-up class held Jun 12) |
| 15 | Jun 8 (Mon) | Ch. 12: Clustering Methods · 12.4: K-means clustering, hierarchical clustering | |
| 15 | Jun 10 (Wed) | Final Project Presentations | Project (make-up class held Jun 12) |
| 16 | Jun 15 (Mon) | Final Exam · Coverage: Ch. 8, 10, 12 (+ comprehensive) | Open book |
| 16 | Jun 17 (Wed) | Final Exam · Coverage: Ch. 8, 10, 12 (+ comprehensive) | Open book |
| 17 | Jun 22 (Mon) | National Trail March | No class |
| 17 | Jun 24 (Wed) | National Trail March | No class |
| Chapter | Files |
|---|---|
| Ch. 1: Introduction | Slides |
| Ch. 3: Linear Regression | Slides |
| Ch. 4: Classification | Slides |
| Ch. 5: Resampling Methods | Slides |
| Ch. 6: Linear Model Selection and Regularization | Slides |
| Ch. 7: Moving Beyond Linearity | Slides |
| Ch. 8: Tree-Based Methods | Slides |
| Ch. 10: Deep Learning | Slides |
| Ch. 12: Unsupervised Learning | Slides |
2026 Spring
| Item | Date | Format | Files |
|---|---|---|---|
| Quiz 1 (Ch. 3: Linear Regression) | Mar 23 | Closed book | Problem · Solution |
| Quiz 2 (Ch. 4: Classification) | Apr 6 | Closed book | Problem · Solution |
| Midterm Exam (Ch. 3–7) | Apr 20–22 | Closed book | Problem · Solution |
| Quiz 3 (Ch. 8: Tree-Based Methods) | May 4 | Open book, closed web | Problem · Solution |
| Quiz 4 (Ch. 10: Deep Learning) | May 27 | Open book, closed web | Problem · Solution |
| Final Exam (Ch. 8, 10, 12; comprehensive) | Jun 15–17 | Open book, closed web | Problem · Solution |
Past Exams — 2025 Fall
| Item | Files |
|---|---|
| Quiz 1 | Problem · Solution |
| Quiz 2 | Problem · Solution |
| Midterm Exam | Problem · Solution |
| Quiz 3 | Problem · Solution · Movie.csv |
| Quiz 4 | Problem · Solution · customer.csv |
| Final Exam | Problem · Solution · Fantasy.csv · travel.csv |
The final project constitutes 20% of the course grade (50% of the final exam component). Students apply statistical learning methods to a real dataset of their choice, producing a written report and an in-class presentation.
Deliverables
- Written Report — Introduce the dataset, describe the methods applied, and interpret the results.
- In-class Presentation — Present findings to the class (Jun 10 & make-up Jun 12).
- R Code — Submit well-commented, reproducible R scripts.
Guidelines
Project guidelines and submission details will be distributed in class. Please contact the instructor if you have questions about dataset selection.