# Resampling-Based Statistics

## Motivation: Fitting Models to Data

Situation considered yesterday: We have data and want to fit a model with certain parameters (e.g., a linear model) β we estimate the parameter.

*Notation:*

data: \(\mathbf{x} = (x_1, \ldots, x_n)\)

model with unknown parameter \(\theta\)

estimate \(\widehat \theta(\mathbf{x})\)

### Working example: Fitting a normal distribution

data: \(\mathbf{x} = (x_1, \ldots, x_n)\)

model: \(\mathcal{N}(\theta, \sigma^2)\), i.e., a normal distribution with unknown mean \(\theta\) (that we want to estimate) and variance \(\sigma^2\) (that we are less interested in)

use empirical mean as estimator: \(\widehat \theta(\mathbf{x}) = \overline{x} = \frac 1 n \sum_{i=1}^n x_i\)

```
using Distributions
using Statistics
using StatsPlots
= Normal(0.0, 1.0)
d = 100
n = rand(d, n)
x = mean(x) ΞΈ
```

*Problem:* Estimator __never__ gives the __exact__ result β if you have random data, also the estimate is random.

*Aim:* Find the distribution (or at least the variance) of the estimator \(\widehat \theta\) in order to get standard errors, confidence intervals, etc.

In some easy examples, you can calculate the distribution of \(\widehat \theta\) theoretically. *Example:* If \(x_i\) is \(\mathcal{N}(\theta,\sigma^2)\) distributed, then the distribution of \(\widehat \theta(\mathbf{x})\) is \(\mathcal{N}(\theta, \sigma^2/n)\). Strategy: Estimate \(\sigma^2\), e.g. via the sample variance \[ \widehat \sigma^2 = \frac 1 {n-1} \sum_{i=1}^n (x_i - \overline{x})^2 \] and take the standard error, confidence intervals, etc. of the corresponding normal distribution.

```
= std(x)
Ο = Normal(ΞΈ, Ο/sqrt(n))
est_d plot(est_d, legend=false)
= quantile(est_d, [0.025,0.975])
ci_bounds vline!(ci_bounds)
```

*Problem:* In more complex examples, we cannot calculate the distribution.

## The βidealβ solution: Generate *new* data

In theory, one would ideally do the following:

- Generate new independent data \(\mathbf{x}^{(1)}, \mathbf{x}^{(2)}, \ldots, \mathbf{x}^{(B)}\) (each sample of size \(n\))
- Apply the estimator separately to each sample \(\leadsto\) \(\widehat \theta(\mathbf{x}^{(1)}), \ldots, \widehat \theta(\mathbf{x}^{(B)})\)
- Use the empirical distribution \(\widehat \theta(\mathbf{x}^{(1)}), \ldots, \widehat \theta(\mathbf{x}^{(B)})\) as a proxy to the theoretical one.

```
= 1000
B = zeros(B)
est_vector_new for i in 1:B
= rand(d, n)
x_new = mean(x_new)
est_vector_new[i] end
histogram(est_vector_new, legend=false)
= quantile(est_vector_new, [0.025, 0.975])
ci_bounds_new vline!(ci_bounds_new)
```

*But:* In most real world situation, we can not generate new data if the distribution is unknown. We have to work with the data we have β¦

## The practical solution: Resampling / Bootstrap

*Idea:* Use samples \(\mathbf{x}^{(1)}, \mathbf{x}^{(2)}, \ldots, \mathbf{x}^{(B)}\) that are not completely new, but obtained from __resampling__ the original data \(\mathbf{x}\).

*Question:* How can one obtained another sample of the same size \(n\)? \(\leadsto\) (re-)sampling with replacement

The overall procedure is as follows:

- Generate \(B\) samples \(\mathbf{x}^{(1)}, \mathbf{x}^{(2)}, \ldots, \mathbf{x}^{(B)}\) of size \(n\) by independently resampling from \(\mathbf{x}\) with replacement.
- Apply the estimator separately to each sample \(\leadsto\) \(\widehat \theta(\mathbf{x}^{(1)}), \ldots, \widehat \theta(\mathbf{x}^{(B)})\)
- Use the empirical distribution \(\widehat \theta(\mathbf{x}^{(1)}), \ldots, \widehat \theta(\mathbf{x}^{(B)})\) as a proxy to the theoretical one.

```
= zeros(B)
est_vector_bs for i in 1:B
= rand(x, n)
x_bs = mean(x_bs)
est_vector_bs[i] end
histogram(est_vector_bs, legend=false)
= quantile(est_vector_bs, [0.025, 0.975])
ci_bounds_bs vline!(ci_bounds_bs)
```

If the sample \(\mathbf{x} = (x_1,\ldots,x_n)\) consists of independent and identically distributed data, the resampling procedure often provides a code proxy to the true (unknown) distribution of the estimator.

The above resampling procedure is called bootstrap (from ββTo pull oneself up by oneβs bootstraps.ββ) as only data are used that are already available.

- Reconsider the
`tree`

data set and the simple linear regression model`Volume ~ Girth`

. Calculate a 95% confidence interval for \(\beta_1\) via bootstrap and compare to the Julia output of the linear model. - Use bootstrap to estimate the standard error for the predicted volume of a tree with
`Girth=10`

the output above.

*Problem:* If the data are *not* independent, the above (i.i.d.) bootstrap samples would have a misspecified dependence structure and therefore lead to a bad uncertainty estimate. For some situations, there are specific modifications of the bootstrap procedure (e.g. block bootstrap for time series), but they tend to work well only if dependence is sufficiently weak.

## Parametric Bootstrap

There are situations where it is hardly possible to construct reasonable confidence intervals or estimate the standard error. But one could at least get a __rough guess__ of the uncertainty by the following thought experiment:

Assume that the estimated parameter value \(\theta^*\) would be equal to the true one. How uncertain would an estimate be in that case?

The answer is given by the following procedure, called *parametric bootstrap*:

- Generate independent data \(\mathbf{x}^{(1)}, \mathbf{x}^{(2)}, \ldots, \mathbf{x}^{(B)}\) (each sample of size \(n\)) from the model with parameter \(\theta^*\)
- Apply the estimator separately to each sample \(\leadsto\) \(\widehat \theta(\mathbf{x}^{(1)}), \ldots, \widehat \theta(\mathbf{x}^{(B)})\)
- Use the empirical distribution \(\widehat \theta(\mathbf{x}^{(1)}), \ldots, \widehat \theta(\mathbf{x}^{(B)})\) as a proxy to the theoretical one.

- Consider the following function that generate \(n\) correlated samples that are uniformly distributed on \([\mu-0-5,\mu+0.5]\).

```
= function(mu, n)
myrand = 0.9
rho = zeros(n)
res 1] = rand(1)[1]
res[if n > 1
for i in 2:n
= rho*res[i-1] .+ (1-rho)*rand(1)[1]
res[i] end
end
.= mu - 0.5 .+ res
res return(res)
end
```

The additional parameter `rho`

(between 0 and 1) controls the strength of dependence with 0 meaning independence and 1 meaning full dependence.

Write functions that estimate the standard deviation of the estimated mean via (a) generating new samples from the true unknown distribution, (b) i.i.d. bootstrap, (c) parametric bootstrap 2. Use the functions for different values of `rho`

and compare the results.