Exercise Regression

BERN02

Author

Ullrika Sahlin

Format

Work individually or in pairs. If you work in a pair, each person submits their own report, but write at the beginning of the file whom you have worked with.

Grades

Pass or Fail

Programming language

You may use any programming language you prefer, but I recommend R or Python.

To get started with R, you can

  • use it via Posit Cloud (no installation needed, just create an account) and write reports (notebooks) using Quarto,

  • use it on your own computer by installing R and RStudio, or

  • use Google Colab. The default language in Google Colab is Python. To use R in Google Colab, open a new notebook, go to Edit > Notebook settings, select R as the runtime type and save.

Material

The data for this exercise are available in the Git repository github.com/luchem/bern02.


Local simple regression

Write your own function that makes predictions with local regression using a single predictor.
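As orientation only, assuming an unweighted straight-line (local linear) fit, which is one common choice among several (loess, for instance, additionally weights the neighbours with a tricube kernel), the local fit at a single target value \(x_0\) can be written as follows. Here \(N_k(x_0)\) denotes the indices of the \(k\) observations with the smallest \(|x_i - x_0|\), and \(\bar{x}_k\) is the mean of their \(x\)-values:

\[
(\hat{\beta}_0, \hat{\beta}_1) = \arg\min_{\beta_0, \beta_1} \sum_{i \in N_k(x_0)} \left(y_i - \beta_0 - \beta_1 x_i\right)^2, \qquad \hat{f}(x_0) = \hat{\beta}_0 + \hat{\beta}_1 x_0
\]

\[
\hat{\sigma}^2 = \frac{1}{k-2} \sum_{i \in N_k(x_0)} \left(y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i\right)^2, \qquad \operatorname{se}\big(\hat{f}(x_0)\big) = \hat{\sigma} \sqrt{\frac{1}{k} + \frac{(x_0 - \bar{x}_k)^2}{\sum_{i \in N_k(x_0)} (x_i - \bar{x}_k)^2}}
\]

\(\hat{f}(x_0)\) plays the role of pred and \(\operatorname{se}\big(\hat{f}(x_0)\big)\) the role of se. The second line is the ordinary simple-regression standard error of the fitted mean, applied within the neighbourhood; it requires \(k \ge 3\) and at least two distinct \(x\)-values among the neighbours.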

The input arguments to the function must be

  • \(y\) a vector of observations of the response variable

  • \(x\) a vector of observations of the predictor

  • \(k\) the number of neighboring points to include in each local regression and

  • \(x_0\) a vector of predictor values at which predictions are to be made

The function shall return

  • \(pred\) a vector of predicted values, and

  • \(se\) a vector of standard errors of the predictions, i.e. the standard deviation of the estimated expected value at each point in \(x_0\)
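A minimal Python sketch of such a function is given below. It assumes an unweighted local linear fit: ordinary least squares on the \(k\) nearest neighbours of each target value, with the standard error taken from the usual simple-regression formula within the neighbourhood. The name local_regression is illustrative and not prescribed by the exercise; a weighted variant (e.g. tricube weights) would give somewhat different numbers.

```python
import numpy as np

def local_regression(y, x, k, x0):
    """Local linear regression with one predictor (illustrative sketch).

    For each target value in x0, an ordinary least-squares line is fitted to
    the k observations whose x is closest to the target (all equally
    weighted) and evaluated at the target.

    Returns (pred, se): the predicted expected values and their standard errors.
    """
    y = np.asarray(y, dtype=float)
    x = np.asarray(x, dtype=float)
    x0 = np.atleast_1d(np.asarray(x0, dtype=float))
    if y.shape != x.shape:
        raise ValueError("x and y must have the same length")
    if not 3 <= k <= x.size:
        # k - 2 residual degrees of freedom are needed to estimate sigma^2
        raise ValueError("k must satisfy 3 <= k <= len(x)")

    pred = np.empty(x0.size)
    se = np.empty(x0.size)
    for j, target in enumerate(x0):
        # 1. Indices of the k nearest neighbours of the target (ties: stable order)
        idx = np.argsort(np.abs(x - target), kind="stable")[:k]
        X = np.column_stack([np.ones(k), x[idx]])  # design matrix with rows [1, x_i]
        yk = y[idx]
        # 2. Least-squares intercept and slope (needs at least two distinct x-values)
        XtX_inv = np.linalg.inv(X.T @ X)
        beta = XtX_inv @ X.T @ yk
        # 3. Residual variance with k - 2 degrees of freedom
        resid = yk - X @ beta
        sigma2 = resid @ resid / (k - 2)
        # 4. Fitted mean at the target and its standard error
        a = np.array([1.0, target])
        pred[j] = a @ beta
        se[j] = np.sqrt(sigma2 * (a @ XtX_inv @ a))
    return pred, se
```

For this exercise the call would look like local_regression(mort, poor, k, [10, 18, 25]), where mort and poor are placeholder names for the MORT and POOR columns and k is your chosen number of neighbours. Because every prediction refits on a fresh neighbourhood, nothing is stored between calls.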

Apply the function to the Air pollution data set

Load the data set pollution_cleaneddata.csv; the variables are described in the metadata file pollution_metadata.txt.

Predict the total age-adjusted mortality rate per 100,000 (MORT) for areas where 10, 18 and 25 % of families have an income below $3000 (POOR).

Report each prediction as the expected value together with its standard error.
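Reading the data and setting up the three target values could look like the following pandas sketch. It assumes that pollution_cleaneddata.csv is comma-separated with a header row that contains the column names MORT and POOR exactly as listed in pollution_metadata.txt; if the file differs, adjust the column names or the sep argument of pandas.read_csv. The helper name load_pollution is hypothetical.

```python
import numpy as np
import pandas as pd

def load_pollution(source="pollution_cleaneddata.csv"):
    """Return (y, x) = (MORT, POOR) as float arrays (hypothetical helper).

    Assumes a comma-separated file with a header row containing the columns
    MORT and POOR. Rows with a missing value in either column are dropped.
    `source` may be a file path, a URL or a file-like object.
    """
    df = pd.read_csv(source)
    df = df[["MORT", "POOR"]].dropna()
    y = df["MORT"].to_numpy(dtype=float)  # response: age-adjusted mortality rate
    x = df["POOR"].to_numpy(dtype=float)  # predictor: % families with income < $3000
    return y, x

# Target values of POOR at which predictions are requested
x0 = np.array([10.0, 18.0, 25.0])
```

The returned y and x, the vector x0 and your chosen k are then the inputs to your local regression function.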

Submit lab report on Canvas

Upload the code for the function, together with the three predictions, to the assignment "Exercise: Regression" on Canvas.

Write your name and the date at the beginning of the code and, if applicable, the name of your collaborator.

The exercises/computer labs are designed to be carried out on the same day as the lecture. We encourage you to do the exercise during the time scheduled for it, so that you can get support from the tutors, and to submit the report by the morning of the following day at the latest.

Curiosa

Local regression is an example of a memory-based method: it does not produce a fitted model that can be stored and reused, but instead uses the training data to recompute a local fit every time a prediction is made.

The number of neighboring points, \(k\), can be chosen by cross-validation.
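Because local regression is memory-based, cross-validating it only requires calling the prediction function on different training subsets; no model object has to be stored. The sketch below assumes leave-one-out cross-validation with mean squared prediction error as the criterion (the exercise does not prescribe a particular scheme, and K-fold would work the same way). It takes the prediction function as an argument with the (y, x, k, x0) → (pred, se) interface described in the exercise. The name choose_k_loocv is hypothetical.

```python
import numpy as np

def choose_k_loocv(y, x, ks, predict):
    """Pick k by leave-one-out cross-validation (illustrative sketch).

    predict(y, x, k, x0) must return (pred, se); only pred is used here.
    For each candidate k, every observation is left out once and predicted
    from the remaining n - 1 points, so each k must be at most n - 1.
    Returns the best k and a dict {k: LOOCV mean squared error}.
    """
    y = np.asarray(y, dtype=float)
    x = np.asarray(x, dtype=float)
    n = y.size
    cv_mse = {}
    for k in ks:
        errors = np.empty(n)
        for i in range(n):
            keep = np.arange(n) != i  # training set without observation i
            pred, _ = predict(y[keep], x[keep], k, np.array([x[i]]))
            errors[i] = y[i] - pred[0]
        cv_mse[k] = float(np.mean(errors ** 2))
    best_k = min(cv_mse, key=cv_mse.get)  # k with the smallest LOOCV error
    return best_k, cv_mse
```

For example, choose_k_loocv(y, x, range(3, 31), my_local_regression) would compare k = 3, ..., 30, where my_local_regression stands for your own function. Small k gives flexible but noisy fits; large k approaches a single global straight line.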