Exercise Regression
BERN02
Format
Work individually or in pairs. If you work in a pair, each person submits individually, but state at the beginning of the text file whom you worked with.
Grades
Pass or Fail
Programming language
You can use the language you prefer, but I recommend using R or Python.
To get started with R, I recommend one of the following:
use it via Posit Cloud (no installation needed, just create an account) and create reports (notebooks) using Quarto,
use it on your desktop by installing R and RStudio, or
use Google Colab. The default language in Google Colab is Python. To use R instead, open a new notebook, go to Edit -> Notebook settings, select R as the runtime type and save.
Material
The data file for this exercise is available in the Git repository github.com/luchem/bern02
Local simple regression
Write your own function for performing predictions with Local Regression using one predictor.
The input arguments to the function must be
\(y\) a vector of observations of the response variable
\(x\) a vector of observations of the predictor
\(k\) the number of neighboring points to include in each local regression and
\(x_0\) a vector of predictor values at which predictions are to be made
The function shall return
\(pred\) a vector of predicted values, and
\(se\) a vector of standard errors, i.e. the standard deviation of the estimated expected value at each prediction point
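As a point of reference, the specification above could be sketched in Python roughly as follows. This is only one possible reading, not the required solution: the function name `local_regression` is hypothetical, and the sketch assumes an unweighted straight-line fit to the \(k\) nearest points (no kernel weights), with the standard error taken from ordinary least squares theory within each neighborhood, which requires \(k \geq 3\).

```python
import numpy as np


def local_regression(y, x, k, x0):
    """Predict at each x0 with an unweighted straight-line fit to the k nearest points."""
    y = np.asarray(y, dtype=float)
    x = np.asarray(x, dtype=float)
    x0 = np.atleast_1d(np.asarray(x0, dtype=float))
    if not 3 <= k <= len(x):
        raise ValueError("k must be between 3 and the number of observations")

    pred = np.empty(len(x0))
    se = np.empty(len(x0))
    for i, target in enumerate(x0):
        # Indices of the k observations closest to the target value.
        nearest = np.argsort(np.abs(x - target), kind="stable")[:k]
        xn, yn = x[nearest], y[nearest]

        # Ordinary least squares for a straight line on the neighborhood.
        x_mean = xn.mean()
        sxx = np.sum((xn - x_mean) ** 2)
        if sxx == 0:
            raise ValueError("all neighbors share the same x value")
        slope = np.sum((xn - x_mean) * (yn - yn.mean())) / sxx
        intercept = yn.mean() - slope * x_mean
        pred[i] = intercept + slope * target

        # Residual variance with k - 2 degrees of freedom (assumed estimator).
        resid = yn - (intercept + slope * xn)
        s2 = np.sum(resid ** 2) / (k - 2)
        # Standard error of the estimated mean response at the target.
        se[i] = np.sqrt(s2 * (1.0 / k + (target - x_mean) ** 2 / sxx))
    return pred, se
```

Other choices are equally defensible, for example distance-based kernel weights or a local constant fit; they change both the predictions and the standard errors.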
Apply the function on the Air pollution data set
Load the data set pollution_cleaneddata.csv; its variables are described in the metadata file pollution_metadata.txt.
Predict the total age-adjusted mortality rate per 100,000 (MORT) for areas where 10, 18 and 25 % of families have an income below $3000 (POOR).
Report each prediction as the expected value together with its standard error.
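The steps above might be organized as in the following Python sketch. It assumes that the CSV file is comma-separated with columns named MORT and POOR; check this against the metadata file. The helper `predict_mort` and its argument `predict_fn` are hypothetical names: `predict_fn` stands for your own local regression function and is assumed to be called as `predict_fn(y, x, k, x0)` and to return `(pred, se)`.

```python
import pandas as pd


def predict_mort(data, predict_fn, k, poor_levels=(10, 18, 25)):
    """Return a table of predicted MORT with standard errors at the given POOR levels.

    predict_fn is a placeholder for your own local regression function and is
    assumed to have the signature predict_fn(y, x, k, x0) -> (pred, se).
    """
    # Keep only complete rows for the two variables used here.
    subset = data[["POOR", "MORT"]].dropna()
    pred, se = predict_fn(
        subset["MORT"].to_numpy(), subset["POOR"].to_numpy(), k, list(poor_levels)
    )
    return pd.DataFrame({"POOR": list(poor_levels), "pred": pred, "se": se})


# Usage, assuming the file sits in the working directory, my_local_regression
# is your own function, and k = 15 is only a placeholder choice:
# data = pd.read_csv("pollution_cleaneddata.csv")
# print(predict_mort(data, my_local_regression, k=15))
```

Passing the prediction function as an argument keeps the data handling separate from the regression itself, so the same wrapper works whichever variant of local regression you implement.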
Submit lab report on Canvas
Upload the code for the function together with the three predictions to the assignment Exercise: Regression on Canvas.
Write your name and the date at the beginning of the code and, if applicable, the name of your collaborator.
The exercises/computer labs are designed to be carried out on the same day as the lecture. We encourage you to do the exercise during the time assigned for it, so that you can get support from the tutors, and to submit the report by the morning of the following day at the latest.
Curiosa
Local regression is an example of a memory-based method, which means that it has no single fitted model; instead, it uses the training data to recompute the local fit every time a prediction is made.
The number of neighboring points can be chosen by cross-validation.
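One way to do this is leave-one-out cross-validation, sketched below in Python. The function names `knn_line_predict` and `choose_k_loocv` are hypothetical, the local fit is assumed to be an unweighted straight line through the \(k\) nearest points, and mean squared prediction error is assumed as the criterion. Because the method is memory-based, each held-out prediction is simply recomputed from the remaining observations.

```python
import numpy as np


def knn_line_predict(y, x, k, target):
    """Unweighted straight-line fit to the k points nearest to target."""
    nearest = np.argsort(np.abs(x - target), kind="stable")[:k]
    # np.polyfit returns coefficients from highest degree to lowest.
    slope, intercept = np.polyfit(x[nearest], y[nearest], deg=1)
    return intercept + slope * target


def choose_k_loocv(y, x, candidates):
    """Return the candidate k with the lowest leave-one-out mean squared error."""
    y = np.asarray(y, dtype=float)
    x = np.asarray(x, dtype=float)
    scores = {}
    for k in candidates:
        errors = []
        for i in range(len(x)):
            # Hold out observation i and predict it from the remaining data.
            keep = np.arange(len(x)) != i
            errors.append(y[i] - knn_line_predict(y[keep], x[keep], k, x[i]))
        scores[k] = float(np.mean(np.square(errors)))
    best_k = min(scores, key=scores.get)
    return best_k, scores
```

Each candidate \(k\) must be at least 2 and smaller than the number of observations, since one observation is held out in every fit. A small \(k\) follows the data closely but gives noisy fits, while a large \(k\) smooths more and approaches a single global line.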