Exercises:

LAB

Ex. 1

Check the biased nature of \(s^2_b\) via MC simulation, generating \(n=10\) iid values from a Normal distribution. Plot also \(s^2\) and comment on the difference.

Solution

## FUNCTIONS ##

# Biased sample-variance, done in an efficient manner
var_b <- function(x)
{
  var(x)*(length(x)-1)/length(x)
}



## EXERCISE ##

set.seed(123)

R <- 1000        # Replications
n <- 10          # Samples per Replication

sigma <- 1       # Normal StdDev


samples <- array(0, c(R, n))
for (i in 1:R){
  samples[i, ] <- rnorm(n, 0, sigma)
}

samples_stat <- array(0, R)
samples_stat <- apply(samples, 1, var_b)


par(mfrow=c(1,1), oma=c(0,0,0,0))


# Biased sample-variance (hist)
hist(samples_stat, breaks= 40, probability = TRUE,
     xlab=expression(sb^2), main= bquote(Histogram: sb^2), cex.main = 1.5, ylim = c(0, 1.5))

# Biased sample-variance (line)
curve(((n)/sigma^2) * dchisq(x * ((n)/sigma^2), df = n - 1),
      add = TRUE, col="red", lwd=2, main="N(0,1)")

# UN-biased sample-variance (line)
curve(((n-1)/sigma^2) * dchisq(x * ((n-1)/sigma^2), df = n - 1),
      add = TRUE, col="blue", lwd=2, main="N(0,1)")

# Make the "bias" more apparent
segments(1, 0, 1, 1.2,
         col="orange", lwd=2)

legend("topright",
       legend = c("Biased sample-variance (dist.)", "Unbiased sample-variance (dist.)"),
       col=c("red", "blue"),
       lty=1:1)

Observations and conclusions:
From the obtained plot, we can observe that the maxima of the distributions of the estimators \(s^2\) and \(s^2_b\) do not coincide, with \(\text{max}(s^2)\) closer to the true value \(\sigma^2 = 1\) (marked by the orange line). This remains the case as the sample size \(n\) increases. Given the unbiased nature of \(s^2\), we can conclude that \(s^2_b\) is a biased estimator.
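As a quick numerical cross-check (a sketch, not part of the original solution; samples_stat_unb is an added helper name), averaging the two estimators over the \(R\) replications makes the downward shift of \(s^2_b\) visible directly:

# Average of the biased and unbiased variance estimates over the R replications;
# the biased one should fall near sigma^2*(n-1)/n = 0.9, the unbiased one near sigma^2 = 1.
samples_stat_unb <- apply(samples, 1, var)   # unbiased sample variances
mean(samples_stat)                           # biased estimator
mean(samples_stat_unb)                       # unbiased estimator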

Ex. 2

Let us consider a darts target divided into \(K=4\) radial zones. We assign to each zone the following hitting probability:

  • Zone 1 (from 1 to 3 points): \(p_1=\frac{7}{16}\);

  • Zone 2 (from 4 to 6 points): \(p_2=\frac{5}{16}\);

  • Zone 3 (from 7 to 9 points): \(p_3=\frac{3}{16}\);

  • Zone 4 (the highest points in the middle of the target; say: 10, 25, 50 points): \(p_4=\frac{1}{16}\).

The number of players is equal to \(6\) and each of them throws \(50\) darts, which corresponds to the number of considered observations (\(n=50\) for each player).

We assume as null hypothesis: due to a moderate control on darts skills, each friend has decreasing probabilities to hit the best zones, according to the table outlined just above i.e.: \[ H_0 : \,p_1=\frac{7}{16};\,\,\,\,p_2=\frac{5}{16};\,\,\,\,p_3=\frac{3}{16};\,\,\,\,p_4=\frac{1}{16}. \]

We simulate the data and perform the null-hypothesis test under two scenarios:

  • Firstly: the players asymptotically behave exactly according to the probabilities defining \(H_0\);

  • Secondly: a player is added (for a total of \(7\) players), whose darts skills are such that the hitting probability of the best zones is proportional to their points.

All hypothesis testing is performed via Pearson’s \(\chi^2\) homogeneity test on the single-player samples of darts hit-zones.

Solution

## FUNCTIONS ##

chisq_term <- function(o, e)
{
  return ((o-e)*(o-e)/e)
}



## EXERCISE ##

set.seed(1023)

# Observation nr., friends, zones, probabilities

n <- 50
K <- 4

bad_friends <- 6
good_friends <- 1
friends <- good_friends + bad_friends

bad_prob <-  c(7/16, 5/16, 3/16, 1/16)
good_prob <- c(1/16, 3/16, 5/16, 7/16)


# Simulation (bad friends only)
throws_bad <- replicate(bad_friends, sample(1:K, n, replace=TRUE, prob=bad_prob))

# Simulation (good friends only)
throws_good <- replicate(good_friends, sample(1:K, n, replace=TRUE, prob=good_prob))


pivot <- 0.0

# COMPUTE SUMS - BAD FRIENDS

for (friend in (1:bad_friends))
{
  pivot <- pivot + sum(chisq_term(table(factor(throws_bad[,friend], levels=1:K)), n*bad_prob))  # factor() keeps all K zones, even if unhit
}

# TEST - BAD-ONLY FRIENDS

pchisq(pivot, df=(K-1)*(bad_friends-1), lower.tail=FALSE)
## [1] 0.9538548
# COMPUTE SUMS - GOOD FRIENDS

for (friend in (1:good_friends))
{
  pivot <- pivot + sum(chisq_term(table(factor(throws_good[,friend], levels=1:K)), n*bad_prob))  # factor() keeps all K zones, even if unhit
}

# TEST - GOOD+BAD FRIENDS

pchisq(pivot, df=(K-1)*(friends-1), lower.tail=FALSE)
## [1] 1.217645e-25

Observations:

The relatively low number of observations (\(300\) and \(350\) throws in the two tests) makes the simulation heavily dependent on the initially set seed: regenerating the data can change the computed p-values considerably. However, the general conclusion does not vary.

Conclusions:

We can conclude that in the first case, when we consider the moderately bad players only, we cannot reject the null hypothesis at any reasonable significance level: \(p\) is generally large, close to \(0.85\) across many of the tested seeds. When the good player joins, we have to reject the null hypothesis at any reasonable significance level, since \(p\) is almost always well below \(0.001\).
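As a cross-check (a sketch, not part of the original solution; counts_bad and gof_stats are added helper names), the same per-player goodness-of-fit statistics against the \(H_0\) probabilities can be obtained with the built-in chisq.test():

# Per-player goodness-of-fit against the H0 probabilities via chisq.test();
# the sum reproduces the value of pivot computed right after the bad-friends loop.
# (chisq.test may warn about small expected counts for zone 4.)
counts_bad <- apply(throws_bad, 2, function(x) table(factor(x, levels = 1:K)))
gof_stats  <- apply(counts_bad, 2, function(o) chisq.test(o, p = bad_prob)$statistic)
sum(gof_stats)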

Ex. 3

Consider now some of the most followed Instagram accounts in 2018: for each of the owners, we also report the number of Twitter followers (in millions). Are the Instagram and Twitter accounts somehow associated? Perform a correlation test, compute the p-value and give an answer.

Solution

library("magrittr")
library("ggplot2")
library("ggpubr")

# Dataframe
Owners <- c("Katy Perry", "Justin Bieber", "Taylor Swift", "Cristiano Ronaldo",
             "Kim Kardashian", "Ariana Grande", "Selena Gomez", "Demi Lovato")
Instagram <- c(69, 98, 107, 123, 110, 118, 135, 67)
Twitter <- c(109, 106, 86, 72, 59, 57, 56, 56)

# Plot
plot(Instagram, Twitter, pch=21, bg=2, xlim=c(60, 150), ylim=c(40, 120))
text(Instagram[-6], Twitter[-6]+5, Owners[-6], cex=0.8)
text(Instagram[6], Twitter[6]-5, Owners[6], cex=0.8)

# Create a more suitable dataframe
all <- data.frame(Owners = Owners, Instagram = Instagram, Twitter = Twitter)

# Plot correlation using ggpubr library
ggpubr::ggscatter(all, x = 'Instagram', y = 'Twitter',
          color = 'red',                                          # for the points
          add = "reg.line",
          add.params = list(color = "blue", fill = "lightgray"),  # for the line
          conf.int = TRUE,
          cor.coef = TRUE,
          cor.method = "pearson",
          xlab = "Instagram", ylab = "Twitter",
          xlim=c(60, 150), ylim=c(40, 120))

# Pearson test
pearson <- cor.test(Instagram, Twitter,
                    method = "pearson")

# Print the result
pearson
## 
##  Pearson's product-moment correlation
## 
## data:  Instagram and Twitter
## t = -1.1454, df = 6, p-value = 0.2957
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  -0.8689006  0.4006898
## sample estimates:
##        cor 
## -0.4235845

Observations:

The Pearson correlation coefficient measures the linear relationship between the Instagram and Twitter follower counts; correlations of \(-1\) and \(+1\) correspond, respectively, to an exactly decreasing and an exactly increasing linear relationship.

Firstly, it must be noted that – due to the very small sample size – results from these observations can hardly be generalized to the population the sample represents.

The observed correlation is \(cor = -0.424\), indicating a moderate negative association between the Twitter and Instagram follower counts in the considered sample.

However, a significance test of the null hypothesis \(H_0:\) the true correlation is equal to \(0\) leads to the conclusion that it cannot be rejected, with \(p = 0.296\). This is also confirmed by the \(95\%\) confidence interval, which is compatible with a correlation of \(0\).
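Given the very small sample, a distribution-free check is also possible (a sketch, not part of the original solution; perm_cor is an added helper name): a permutation test of the correlation, obtained by repeatedly shuffling one of the two columns.

# Two-sided permutation p-value for the Pearson correlation
set.seed(42)   # seed chosen only for reproducibility of this sketch
perm_cor <- replicate(10000, cor(Instagram, sample(Twitter)))
mean(abs(perm_cor) >= abs(cor(Instagram, Twitter)))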

Conclusions:

It is not possible to reject the null hypothesis that the Twitter and Instagram follower counts are uncorrelated for the population the dataset represents.

Ex. 4

Compute analytically \(J(\gamma, \gamma; y)\), \(J(\gamma, \beta; y)\), \(J(\beta, \beta; y)\), i.e. the individual terms of the Observed Information Matrix for a two-parameter Weibull model.

Solution

Let \(l(\gamma,\beta;y)\) be the log-likelihood for such a model:
\[ l(\gamma,\beta;y)=n \log(\gamma) - n \gamma \log(\beta) + (\gamma-1) \sum_{i=1}^n\log(y_i) - \sum_{i=1}^n(y_i/\beta)^\gamma. \] Let \(J\) be the Observed Information Matrix, defined as:

\[ J(\theta;y) = -\frac{\partial^2 l(\theta;y)}{\partial\theta \partial\theta^T}. \] Then: \[ \frac{\partial l(\gamma,\beta;y)}{\partial \gamma} = \frac{n}{\gamma} - n\log(\beta) + \sum_{i=1}^n\log(y_i) - \sum_{i=1}^n\left(\frac{y_i}{\beta}\right)^\gamma\log\left(\frac{y_i}{\beta}\right); \] \[ \frac{\partial l(\gamma,\beta;y)}{\partial \beta} = -\frac{n\gamma}{\beta} + \frac{\gamma}{\beta^{\gamma+1}} \sum_{i=1}^n y_i^\gamma. \] Finally we can compute the second partial derivatives:

  1. \(\partial_{\gamma\gamma}^2\) term: \[ \frac{\partial^2 l(\gamma,\beta;y)}{\partial\gamma^2} = -\frac{n}{\gamma^2} - \sum_{i=1}^n \left( \frac{y_i}{\beta} \right)^\gamma \left[\log\left( \frac{y_i}{\beta} \right)\right]^2; \]

  2. \(\partial_{\gamma\beta}^2\) and \(\partial_{\beta\gamma}^2\) term:

\[ \frac{\partial^2 l(\gamma,\beta;y)}{\partial\beta\partial\gamma} = \frac{\partial^2 l(\gamma,\beta;y)}{\partial\gamma\partial\beta} = \frac{1}{\beta} \left[ - n + \frac{1}{\beta^\gamma }\sum_{i=1}^n y_i^\gamma \left( \gamma \log\left(\frac{y_i}{\beta}\right)+1\right) \right]; \] 3. \(\partial_{\beta\beta}^2\) term:

\[ \frac{\partial^2 l(\gamma,\beta;y)}{\partial\beta^2} = \frac{n\gamma}{\beta^2} - \frac{\gamma(\gamma+1)}{\beta^{\gamma+2}} \sum_{i=1}^ny_i^\gamma. \]
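As a sanity check on the algebra (a sketch, not part of the original exercise; nll_weib, y_sim, th, J_an and J_num are added helper names), the analytic entries can be compared with a numerical Hessian of the negative log-likelihood, which by definition equals \(J\):

# Weibull negative log-likelihood
nll_weib <- function(param, data) -sum(dweibull(data, shape = param[1], scale = param[2], log = TRUE))

set.seed(1)
y_sim <- rweibull(50, shape = 2, scale = 3)   # illustrative data
th    <- c(2, 3)                              # trial values of (gamma, beta)
nn    <- length(y_sim)

# Analytic J, using the expressions derived above (J = minus the Hessian of l)
J_an <- matrix(NA, 2, 2)
J_an[1,1] <- nn/th[1]^2 + sum((y_sim/th[2])^th[1] * log(y_sim/th[2])^2)
J_an[1,2] <- J_an[2,1] <- nn/th[2] - sum(y_sim^th[1]/th[2]^(th[1]+1) * (th[1]*log(y_sim/th[2]) + 1))
J_an[2,2] <- -nn*th[1]/th[2]^2 + th[1]*(th[1]+1)/th[2]^(th[1]+2) * sum(y_sim^th[1])

# Numerical J: optimHess() computes the Hessian of the negative log-likelihood
J_num <- optimHess(th, nll_weib, data = y_sim)
max(abs(J_an - J_num))   # should be small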

Ex. 5

Produce the contour plot for the quadratic approximation of the log-likelihood, based on the Taylor series: \[ l(\theta)-l(\hat{\theta}) \approx -\frac{1}{2}(\theta-\hat{\theta})^T\,J(\hat{\theta})(\theta-\hat{\theta}). \]

Solution

## FUNCTIONS ##

# Weibull negative log-likelihood (the sign is flipped back below, before plotting)
log_lik_weibull <- function(data, param)
    {
        -sum(dweibull(data, shape = param[1], scale = param[2], log = TRUE))
    }



## EXERCISE ##

# data
y <- c(155.9, 200.2, 143.8, 150.1,152.1, 142.2, 147, 146, 146, 170.3, 148, 140, 118, 144, 97)
n <- length(y)


# define parameters grid
gamma <- seq(0.1, 15, length=100)
beta <- seq(100,200, length=100)
parvalues <- expand.grid(gamma,beta)

llikvalues <- apply(parvalues, 1, log_lik_weibull, data=y)
llikvalues <- matrix(-llikvalues, nrow=length(gamma), ncol=length(beta), byrow=F)

conf.levels <- c(0,0.5,0.75,0.9,0.95,0.99)


# estimate gamma and beta
gammahat<-uniroot(function(x) n/x+sum(log(y))-n*
                      sum(y^x*log(y))/sum(y^x),
                  c(1e-5,15))$root
betahat<- mean(y^gammahat)^(1/gammahat)
weib.y.mle<-c(gammahat,betahat)

# first element is the MLE for the shape gamma, second element the MLE for the scale beta
weib.y.mle
## [1]   6.886215 155.948671
# observed information matrix
jhat<-matrix(NA,nrow=2,ncol=2)
jhat[1,1]<-n/gammahat^2+sum((y/betahat)^gammahat*
                                (log(y/betahat))^2)
jhat[1,2]<-jhat[2,1]<- n/betahat-sum(y^gammahat/betahat^(gammahat+1)*
                                         (gammahat*log(y/betahat)+1))
jhat[2,2]<- -n*gammahat/betahat^2+gammahat*(gammahat+1)/
    betahat^(gammahat+2)*sum(y^gammahat)
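# Inverse of the observed information: approximate covariance matrix of the MLE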
solve(jhat)
##          [,1]      [,2]
## [1,] 1.543250  2.550504
## [2,] 2.550504 38.406106
# quadratic approx
log_lik_qdiff <- function(data, param)
    {
        diffterm <- (param - weib.y.mle)

        # Note that in Taylor expansion around MLE-estimates:
        # - 0th order term is a fixed constant;
        # - 1st order term is 0.
        #
        # For that reason, 2nd+ order terms only account for the relative log-likelihood computation.

        -0.5*t(diffterm)%*%jhat%*%diffterm
    }


# log_lik
c_llikvalues <- apply(parvalues, 1, log_lik_qdiff, data=y)
c_llikvalues <- matrix(c_llikvalues, nrow=length(gamma), ncol=length(beta), byrow=F)


# contour plot
contour(gamma, beta, c_llikvalues,
        levels=-qchisq(conf.levels, 2)/2,
        xlab=expression(gamma),
        labels=as.character(conf.levels),
        ylab=expression(beta))
title('Weibull relative log likelihood (quadratic approx.)')
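For comparison (a sketch, not part of the original exercise; llik_mle is an added helper name), the exact relative log-likelihood can be overlaid on the same plot, to show how closely the quadratic approximation tracks it near the MLE:

# Overlay the exact relative log-likelihood: llikvalues already holds the log-likelihood
# over the grid, so subtracting its value at the MLE gives l(theta) - l(thetahat).
llik_mle <- -log_lik_weibull(y, weib.y.mle)
contour(gamma, beta, llikvalues - llik_mle,
        levels = -qchisq(conf.levels, 2)/2,
        labels = as.character(conf.levels),
        col = "red", lty = 2, add = TRUE)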

DAAG

Ex. 3.11

The following data represent the total number of aberrant crypt foci (abnormal growths in the colon) observed in seven rats that had been administered a single dose of the carcinogen azoxymethane and sacrificed after six weeks (thanks to Ranjana Bird, Faculty of Human Ecology, University of Manitoba for the use of these data):

\[ \begin{matrix} 87 & 53 & 72 & 90 & 78 & 85 & 83 \end{matrix} \]

Enter these data and compute their sample mean and variance. Is the Poisson model appropriate for these data? To investigate how the sample variance and sample mean differ under the Poisson assumption, repeat the following simulation experiment several times:

x <- rpois(7, 78.3)
mean(x); var(x)
## [1] 76.28571
## [1] 44.90476

Solution

aberrant_crypt <- c(87, 53, 72, 90, 78, 85, 83)
n <- length(aberrant_crypt)
M <- 10000

crypt_mean <-mean(aberrant_crypt)
crypt_var <- var(aberrant_crypt)

paste("Aberrant crypt foci mean (sample):", crypt_mean)
## [1] "Aberrant crypt foci mean (sample): 78.2857142857143"
paste("Aberrant crypt foci variance (sample):", crypt_var)
## [1] "Aberrant crypt foci variance (sample): 159.904761904762"
vec_mean <- c()
vec_var <- c()
for (i in 1:M){
  x <- rpois(n, 78.28)
  vec_mean <- c(vec_mean, mean(x))
  vec_var <- c(vec_var, var(x))
}

hist(x = abs(vec_mean -vec_var),
     main = "Difference between mean and variance",
     xlab = "|mean - var|",
     ylab = "frequency")

# Fisher Dispersion Test for Poissonity (1950)
dispersion_ab <- sum((aberrant_crypt - crypt_mean)^2)/crypt_mean
p_value <- pchisq(dispersion_ab, df= n-1, lower.tail= FALSE)
paste("p-value:", p_value)
## [1] "p-value: 0.0565062105551693"

Observations:

In order to investigate the behaviour of the mean and variance of many samples generated from a Poisson distribution – knowing the theoretical result that they should coincide – we plotted a histogram of the absolute difference between the mean and the variance of each simulated sample.

The result, shown above, further supports the idea of testing for Poissonity by checking whether the sample mean and variance coincide.

This idea was formalized, for example, by Fisher (1950) in the dispersion test, which we use here: the resulting statistic is asymptotically distributed as a \(\chi^2\) with degrees of freedom equal to one less than the sample size.
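A Monte Carlo version of the same test can serve as a cross-check of the asymptotic approximation (a sketch, not part of the original solution; R_mc and disp_sim are added helper names, and the fitted mean crypt_mean is used as the null mean):

# Simulate the dispersion statistic under the fitted Poisson null and compare
# with the observed value.
set.seed(1)
R_mc <- 10000
disp_sim <- replicate(R_mc, {
  x <- rpois(n, crypt_mean)
  sum((x - mean(x))^2) / mean(x)
})
mean(disp_sim >= dispersion_ab)   # Monte Carlo p-value, to be compared with the chi-squared one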

Conclusions:

\(\chi^2\) testing of the sample dispersion gives \(p=0.056\), and conventional practice would suggest a very borderline non-rejection of the null hypothesis. However, given the small sample size and the strong constraint that Poissonity imposes on mean and variance, it would not be advisable to assume a Poisson model upfront for modelling purposes. A larger sample should be gathered, if possible, or a more flexible distribution used instead.
Accepting Poissonity with \(p=0.056\) and no pre-specified significance level would look like \(p\)-hacking.

Ex. 3.13

A Markov chain for the weather in a particular season of the year has the transition matrix, from one day to the next:

\[ Pb = \begin{bmatrix} & sun & cloud & rain\\ sun & 0.6 & 0.2 & 0.2\\ cloud & 0.2 & 0.4 & 0.4\\ rain & 0.4 & 0.3 & 0.3 \end{bmatrix} \]

It can be shown, using linear algebra, that in the long run this Markov chain will visit the states according to the stationary distribution:

\[ \begin{matrix} sun & cloud & rain\\ 0.429 & 0.286 & 0.286 \end{matrix} \]

A result called the ergodic theorem allows us to estimate this distribution by simulating the Markov chain for a long enough time.

a) Simulate \(1000\) values, and calculate the proportion of times the chain visits each of the states. Compare the proportions given by the simulation with the above theoretical proportions.

b) Following, there is code that calculates rolling averages of the proportions over a number of simulations and plots the result. It uses the function \(\texttt{rollmean()}\) from the \(\texttt{zoo}\) package. Try varying the number of simulations and the width of the window. How wide a window is needed to get a good sense of the stationary distribution? This series settles down rather quickly to its stationary distribution (it “burns in” quite quickly). A reasonable width of window is, however, needed to give an accurate indication of the stationary distribution.

Solution (a)

set.seed(160898)

# Simulate a discrete MC according to transition matrix P, given the number of iterations and the initial state
mc_simulation <- function(P, n, initial)
{
  # Number of possible states
  s_states <- nrow(P)

  # States through time
  states <- numeric(n)

  # Initial state
  states[1] <- initial

  for(t in (2:n))
  {
    # Next state
    p <- P[states[t-1],]
    states[t] <- which(rmultinom(1, 1, p) == 1)
  }

  return(states)
}

# Transition matrix
P <- t(matrix(c( 0.6, 0.2, 0.2,
                 0.2, 0.4, 0.4,
                 0.4, 0.3, 0.3 ), nrow = 3, ncol = 3))

# Number of calculated states
n_iter <- 1000

# Simulate chain
MC <- numeric(n_iter)
MC <- mc_simulation( P, n_iter, 1 )

result <- t(matrix(c("Sun", "Cloud", "Rain",
                     table(MC)/n_iter), nrow = 3, ncol = 2))

# Print the result
print(result)
##      [,1]    [,2]    [,3]   
## [1,] "Sun"   "Cloud" "Rain" 
## [2,] "0.419" "0.279" "0.302"
n_iter <- 1000000

# Simulate chain
MC <- numeric(n_iter)
MC <- mc_simulation( P, n_iter, 1 )

result <- t(matrix(c("Sun", "Cloud", "Rain",
                     table(MC)/n_iter), nrow = 3, ncol = 2))

# Print the result
print(result)
##      [,1]       [,2]       [,3]      
## [1,] "Sun"      "Cloud"    "Rain"    
## [2,] "0.427971" "0.285886" "0.286143"

Observations

The proportion of times each state is “visited” by the chain depends heavily on the seed and on the initial state of the chain. The stated number of total states to consider (\(n=1000\)) is too small to obtain a result that is independent of the seed and of the initial state. A better result can be obtained by enlarging \(n\) and by discarding an appropriate number of visited states at the beginning of the simulation (the so-called burn-in).

Conclusion

Considering \(n=1000\) states from the start of the simulation, the proportions of visited states are not consistent with the theoretical convergence result, although a dominance of the \(Sun\) state is already visible.
That this is due to the lack of burn-in and to the small averaging sample is confirmed by the observation that, with a sufficiently large number of states (e.g. \(n=1000000\)) – even leaving burn-in aside – the proportions coincide with the theoretical ones.
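As a complement (a sketch, not required by the exercise; ev and pi_stat are added helper names), the theoretical stationary distribution quoted in the text can be recovered directly by linear algebra, as the normalized left eigenvector of \(P\) associated with the eigenvalue \(1\):

# Stationary distribution via eigendecomposition of t(P)
ev <- eigen(t(P))
pi_stat <- Re(ev$vectors[, which.min(abs(ev$values - 1))])
pi_stat <- pi_stat / sum(pi_stat)
round(pi_stat, 3)   # approximately 0.429 0.286 0.286 (Sun, Cloud, Rain)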

Solution (b)

In the following plots, we can observe the variation of the rolling average – for each state – computed for an increasing number of visited states of the Markov Chain and for a different window size.

library("lattice")
library("zoo")

set.seed(160898)

# Simulate a discrete MC according to transition matrix P, given the number of iterations and the initial state
mc_simulation <- function(P, n, initial)
{
  # Number of possible states
  s_states <- nrow(P)

  # States through time
  states <- numeric(n)

  # Initial state
  states[1] <- initial

  for(t in (2:n))
  {
    # Next state
    p <- P[states[t-1],]
    states[t] <- which(rmultinom(1, 1, p) == 1)
  }

  return(states)
}

# Transition matrix
P <- t(matrix(c( 0.6, 0.2, 0.2,
                 0.2, 0.4, 0.4,
                 0.4, 0.3, 0.3 ), nrow = 3, ncol = 3))

# Plot the result
plotmarkov <-
  function(n=1000, start=1, window=100, transition=P, npanels=5){
    xc2 <- mc_simulation(transition, n, start)
    mav1 <- rollmean(as.integer(xc2==1), window)
    mav2 <- rollmean(as.integer(xc2==2), window)
    mav3 <- rollmean(as.integer(xc2==3), window)
    npanel <- cut(1:length(mav1), breaks=seq(from=1, to=length(mav1),
                                             length=npanels+1), include.lowest=TRUE)
    df <- data.frame(av1=mav1, av2=mav2, av3=mav3, x=1:length(mav1),
                     gp=npanel)
    print(xyplot(av1+av2+av3 ~ x | gp, data=df, layout=c(1,npanels),
                 type="l", par.strip.text=list(cex=0.65),
                 scales=list(x=list(relation="free"))))
  }

# As a starting point:
plotmarkov(1000, 1, 200, P, 1)

# Changing window size:
plotmarkov(1000, 1, 500, P, 1)

# Changing overall number of states:
plotmarkov(2000, 1, 200, P, 1)

# Changing window size:
plotmarkov(2000, 1, 500, P, 1)

# Changing overall number of states:
plotmarkov(3000, 1, 200, P, 1)

# Changing window size:
plotmarkov(3000, 1, 500, P, 1)

# Changing window size:
plotmarkov(3000, 1, 900, P, 1)

# Changing overall number of states:
plotmarkov(4000, 1, 200, P, 1)

# Changing window size:
plotmarkov(4000, 1, 500, P, 1)

# Changing window size:
plotmarkov(4000, 1, 900, P, 1)

# Changing both:
plotmarkov(5000, 1, 1000, P, 1)

# Changing both:
plotmarkov(10000, 1, 2000, P, 1)

# Five-panel plot of different chains
plotmarkov(10000, 1, 1000, P, 5)

Observations

As is often the case, no single choice of parameters fits all purposes. Increasing the total number of observed states pushes the chain towards convergence and yields more stable results. On the other hand, an increased window size forcefully averages the results, giving a clearer picture of the converged proportions but potentially hiding variation that is still present.

Conclusion

We can generally observe that, combining a sufficiently large window with a sufficiently long simulation horizon, the plots show what we expect: the rolling proportions settle around the stationary distribution \[ \begin{bmatrix}Sun & Cloud & Rain\\ 0.429 & 0.286 & 0.286 \end{bmatrix} \]

Ex. 4.6

Generate random normal numbers with a sequential dependence structure:

y1 <- rnorm(51)
y <- y1[-1] + y1[-51]
acf(y1)  # acf is ‘autocorrelation function’

acf(y)

Repeat this several times. There should be no consistent pattern in the acf plot for different random samples y1. There will be a fairly consistent pattern in the acf plot for y, a result of the correlation that is introduced by adding to each value the next value in the sequence.

Solution

set.seed(1)

y1_acf <- list()
y_acf <- list()

par(mfrow=c(2, 1))
for(i in 1:3){
  y1 <- rnorm(51)
  y <- y1[-1] + y1[-51]

  y1_acf[[i]] <- acf(y1, na.action=na.pass, plot=FALSE)
  y_acf[[i]] <- acf(y, na.action=na.pass, plot=FALSE)
}

plot(y1_acf[[1]], type="p", pch = 15, max.mfrow=1, ylim=c(-.5,1.1))
points(y1_acf[[2]]$lag, y1_acf[[2]]$acf, pch = 16, cex = 1.1, col="orange")
points(y1_acf[[3]]$lag, y1_acf[[3]]$acf, pch = 17, cex = 1.2, col="red")

plot(y_acf[[1]], type="p", pch = 15, max.mfrow=1, ylim=c(-.5,1))
points(y_acf[[2]]$lag, y_acf[[2]]$acf, pch = 16, cex = 1.1, col="orange")
points(y_acf[[3]]$lag, y_acf[[3]]$acf, pch = 17, cex = 1.2, col="red")

Observations and conclusions:

In order to analyze the problem at hand, autocorrelation plots have been produced by repeating the generation of both the Normal and the sequentially dependent random numbers.

In the top plot (\(y_1\) series) no easily distinguishable pattern is evident, neither among different samples nor along the sequence of generated values: the estimated autocorrelations fluctuate randomly across lags, and the behaviour at any given lag also varies randomly across samples.

In the bottom plot (\(y\) series) a clearer pattern emerges: the autocorrelation function shows a roughly periodic behaviour along the sequence, with an apparent period of about \(6\) lags.
Among different samples there is no evident relationship in the autocorrelations, apart from the fact that they appear out of phase, as expected from the independent generation of the sample vectors from the \(y_1\) sequence.
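As a side check (a sketch, not part of the original solution; lag1 is an added helper name), the construction of \(y\) fixes its theoretical autocorrelation at lag \(1\) to \(0.5\) (and to \(0\) at larger lags), which can be verified by averaging over many samples:

# Since y[t] = y1[t] + y1[t+1], Corr(y[t], y[t+1]) = 1/2; averaging the estimated
# lag-1 autocorrelation over many samples should land near 0.5
# (slightly below, because of finite-sample bias).
set.seed(2)
lag1 <- replicate(1000, {
  y1 <- rnorm(51)
  y  <- y1[-1] + y1[-51]
  acf(y, plot = FALSE)$acf[2]
})
mean(lag1)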

Ex. 4.7

Create a function that does the calculations in the first two lines of the previous exercise. Put the calculation in a loop that repeats 25 times. Calculate the mean and variance for each vector y that is returned. Store the 25 means in the vector av, and store the 25 variances in the vector v. Calculate the variance of av.

Solution

corr <- function(){
  y1 <- rnorm(51)
  y <- y1[-1] + y1[-51]
  return(y)
}

computation <- replicate(25, corr(), simplify=FALSE)

av <- c()
v <- c()
for(i in 1:25){
  av <- c(av, mean(computation[[i]]))
  v <- c(v, var(computation[[i]]))
}

var_av <- var(av)
paste("variance:", var_av)
## [1] "variance: 0.104530071426194"

CS

Ex. 3.3

Rewrite the following, replacing the loop with efficient code:

n <- 100000; z <- rnorm(n)
zneg <- 0;j <- 1
for (i in 1:n) {
  if (z[i]<0) {
    zneg[j] <- z[i]
    j <- j + 1
  }
}

Confirm that your rewrite is faster but gives the same result.

Solution

library(stats)

naive_method <- function(n, z){
  zneg_1 <- 0; j_1 <- 1
  for (i in 1:n) {
    if (z[i]<0) {
      zneg_1[j_1] <- z[i]
      j_1 <- j_1 + 1
    }
  }
  return(zneg_1)
}

eff_method <- function(z){
  zneg_2 <- ifelse(z<0, z, NA)
  zneg_2 <- zneg_2[!is.na(zneg_2)]
  j_2 <- length(zneg_2) + 1
  return(zneg_2)
}

n <- 100000; z <- rnorm(n)
naive_time <- system.time(zneg_1 <- naive_method(n, z))  # Nested ass't (some linters may complain)
eff_time <- system.time(zneg_2 <- eff_method(z))         # Nested ass't (some linters may complain)

# Check for equality
comparison <- zneg_1 == zneg_2
paste("Elements that are diffent: ", which(comparison == FALSE))
## [1] "Elements that are diffent:  "
# Benchmark
print("Naive time:"); naive_time
## [1] "Naive time:"
##    user  system elapsed 
##   0.025   0.000   0.025
print("Efficient time:"); eff_time
## [1] "Efficient time:"
##    user  system elapsed 
##   0.004   0.000   0.004
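As a further simplification (a sketch, not part of the original solution; direct_method and zneg_3 are added names), the same result can be obtained with plain logical subsetting, which avoids both the loop and the intermediate NA handling:

# Direct vectorization via logical subsetting
direct_method <- function(z) z[z < 0]

direct_time <- system.time(zneg_3 <- direct_method(z))
identical(zneg_1, zneg_3)   # TRUE: same elements, in the same order
direct_time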

Ex. 3.5

Consider solving the matrix equation \(Ax = y\) for \(x\), where \(y\) is a known \(n\)-vector and \(A\) is a known \(n \times n\) matrix. The formal solution to the problem is \(x = A^{-1} \cdot y\), but it is possible to solve the equation directly, without actually forming \(A^{-1}\). This question explores this direct solution. Read the help file for solve before trying it.

  a) First create an \(A\), \(x\) and \(y\) satisfying \(Ax = y\):
set.seed(0); n <- 1000
A <- matrix(runif(n*n),n,n); x.true <- runif(n)
y <- A%*%x.true

The idea is to experiment with solving \(Ax = y\) for \(x\), but with a known truth to compare the answer to.

  b) Using solve, form the matrix \(A^{-1}\) explicitly and then form \(x_{1} = A^{-1} \cdot y\). Note how long this takes. Also assess the mean absolute difference between \(x_1\) and \(x.true\) (the approximate mean absolute ‘error’ in the solution).

  c) Now use solve to directly solve for \(x\) without forming \(A^{-1}\). Note how long this takes and assess the mean absolute error of the result.

  d) What do you conclude?

Solution

# Part: (a)
set.seed(0); n <- 1000
A <- matrix(runif(n*n),n,n); x.true <- runif(n)
y <- A%*%x.true

# Part: (b)
form_inverse <- system.time(AI <- solve(A))       # Nested ass't (some linters may complain)
x_1 <- AI %*% y

mean(abs(x_1 - x.true))
## [1] 2.743709e-11
# Part: (c)
dir_inverse <- system.time(x_2 <- solve(A, y))    # Nested ass't (some linters may complain)
mean(abs(x_2 - x.true))
## [1] 8.209841e-13
# Part: (d)
print("system time to solve Ax = y by performing the A inverse:")
## [1] "system time to solve Ax = y by performing the A inverse:"
form_inverse
##    user  system elapsed 
##   0.210   0.000   0.087
print("system time to solve Ax = y by using directly solve function:")
## [1] "system time to solve Ax = y by using directly solve function:"
dir_inverse
##    user  system elapsed 
##   0.102   0.000   0.037

From the experimental results outlined above, the best choice for solving a linear system is to use the solve function directly, without forming \(A^{-1}\): indeed, both the mean absolute error and the computation time are much smaller than with the explicit-inverse approach.
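To connect the observed errors to the problem itself (a sketch, not part of the original exercise), one can look at the condition number of \(A\), which sets a rough scale for the best achievable relative accuracy:

# Relative errors of roughly .Machine$double.eps * kappa(A) can be expected;
# both methods are consistent with this, with the direct solve doing noticeably better
# than the explicit-inverse route.
kappa(A)                          # estimated condition number of A
.Machine$double.eps * kappa(A)    # rough scale of the achievable relative error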