Exercises
Week 1
- Produce a plot of the Rongelap data in which a continuous colour scale or grey scale is used to indicate the value of the emission count per unit time at each location, and the two sub-areas with the 5 by 5 sub-grids at 50 metre spacing are shown as insets.
- Construct a polygonal approximation to the boundary of The Gambia. Construct plots of the malaria data which show the spatial variation in the values of the observed prevalence in each village and of the greenness covariate.
- (1) Consider the elevation data as a simple regression problem with elevation as the response and north-south location as the explanatory variable. Fit the standard linear regression model using ordinary least squares. Examine the residuals from the linear model, with a view to deciding whether any more sophisticated treatment of the spatial variation in elevation might be necessary.
- (2) Find a geostatistical data-set which interests you.
- What scientific questions are the data intended to address? Do these concern estimation, prediction, or testing?
- Identify the study region, the design, the response and the covariates, if any.
- What is the support of each response?
- What is the underlying signal?
- If you wished to predict the signal throughout the study region, would you choose to interpolate the response data?
- Load the Paraná data-set from geoR using the command data(parana) and inspect its documentation using help(parana). For these data, consider the same questions as were raised in Exercise 1.4.
- Read Chapter 2 of Diggle & Ribeiro (2007) (you can get this chapter here)
Week 2
- (3) Load the data sets parana, Ksat and ca20 available in geoR using commands such as data(parana), and read the documentation describing each data set with the help() function, e.g. help(parana). Perform exploratory data analysis and build a model you find suitable for each data set.
- (3) In the examples above, would you have other candidate models for each data set?
- Inspect an example geostatistical analysis for the hydraulic conductivity data.
- (4) Consider the following two models for a set of responses, $Y_i : i=1,\ldots,n$, associated with a sequence of positions $x_i : i=1,\ldots,n$ along a one-dimensional spatial axis $x$.
- (a) $Y_i = \alpha + \beta x_i + Z_i$, where $\alpha$ and $\beta$ are parameters and the $Z_i$ are mutually independent with mean zero and variance $\sigma_Z^2$.
- (b) $Y_i = A + B x_i + Z_i$, where the $Z_i$ are as in (a) but $A$ and $B$ are now random variables, independent of each other and of the $Z_i$, each with mean zero and respective variances $\sigma_A^2$ and $\sigma_B^2$.
For each of these models, find the mean and variance of $Y_i$ and the covariance between $Y_i$ and $Y_j$ for any $j \neq i$. Given a single realisation of either model, would it be possible to distinguish between them?
- (5) Suppose that $Y=(Y_1,\ldots,Y_n)$ follows a multivariate Gaussian distribution with ${\rm E}[Y_i]=\mu$ and ${\rm Var}\{Y_i\}=\sigma^2$ and that the covariance matrix of $Y$ can be expressed as $V=\sigma^2 R(\phi)$. Write down the log-likelihood function for $\theta=(\mu,\sigma^2,\phi)$ based on a single realisation of $Y$ and obtain explicit expressions for the maximum likelihood estimators of $\mu$ and $\sigma^2$ when $\phi$ is known. Discuss how you would use these expressions to find maximum likelihood estimators numerically when $\phi$ is unknown.
- (6) Is the following a legitimate correlation function for a one-dimensional spatial process $S(x) : x \in \mathbb{R}$? Give either a proof or a counter-example.
- (7) Consider the following method of simulating a realisation of a one-dimensional spatial process $S(x) : x \in \mathbb{R}$, with mean zero, variance 1 and correlation function $\rho(u)$. Choose a set of points $x_i \in \mathbb{R} : i=1,\ldots,n$. Let $R$ denote the correlation matrix of $S=\{S(x_1),\ldots,S(x_n)\}$. Obtain the singular value decomposition of $R$ as $R = D \Lambda D^\prime$ where $\Lambda$ is a diagonal matrix whose non-zero entries are the eigenvalues of $R$, in order from largest to smallest. Let $Y=\{Y_1,\ldots,Y_n\}$ be an independent random sample from the standard Gaussian distribution, ${\rm N}(0,1)$. Then the simulated realisation is $S = D \Lambda^{1/2} Y$.
- (7) Write an R function to simulate realisations using the above method for any specified set of points $x_i$ and a range of correlation functions of your choice. Use your function to simulate a realisation of $S$ on (a discrete approximation to) the unit interval $(0,1)$.
- (7) Now investigate how the appearance of your realisation $S$ changes if in the equation above you replace the diagonal matrix $\Lambda$ by a truncated form in which you replace the last $k$ eigenvalues by zeros.
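Hint for exercise (7): a minimal sketch of the simulation method in base R, using the fact that $D \Lambda^{1/2} Y$ has covariance matrix $D \Lambda D^\prime = R$. The function names sim_eigen and rho_exp, the exponential correlation function and the value phi = 0.2 are illustrative assumptions, not part of the exercise; any valid correlation function can be passed instead.

```r
# Exponential correlation function (an illustrative choice): rho(u) = exp(-u / phi)
rho_exp <- function(u, phi = 0.2) exp(-u / phi)

# Simulate S at the points x via the decomposition R = D Lambda D'.
# If k > 0, the last (smallest) k eigenvalues are replaced by zeros.
sim_eigen <- function(x, rho = rho_exp, k = 0, ...) {
  n <- length(x)
  R <- rho(as.matrix(dist(x)), ...)        # n x n correlation matrix
  e <- eigen(R, symmetric = TRUE)          # eigenvalues in decreasing order
  lambda <- pmax(e$values, 0)              # guard against tiny negative values
  if (k > 0) lambda[(n - k + 1):n] <- 0    # truncated form of Lambda
  y <- rnorm(n)                            # Y_1, ..., Y_n independent N(0, 1)
  drop(e$vectors %*% (sqrt(lambda) * y))   # S = D Lambda^{1/2} Y
}

x <- seq(0, 1, length.out = 101)           # discrete approximation to (0, 1)
set.seed(1)
s_full <- sim_eigen(x)                     # all eigenvalues kept
set.seed(1)
s_trunc <- sim_eigen(x, k = 90)            # same Y, last 90 eigenvalues zeroed
```

Resetting the seed before each call reuses the same $Y$, so any difference between s_full and s_trunc comes from the truncation alone. Drawing both against x, for example with plot(x, s_full, type = "l") followed by lines(x, s_trunc, lty = 2), gives a direct visual comparison.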
Week 3
- (8) Fit a model to the surface elevation data assuming a linear trend model on the coordinates and a Matérn correlation function with parameter kappa=2.5. Use the fitted model as the true model and perform a simulation study (i.e. simulate from this model) to compare parameter estimation based on maximum likelihood, restricted maximum likelihood and variograms.
- (9) Simulate 200 points in the unit square from the Gaussian model without measurement error, constant mean equal to zero, unit variance and exponential correlation function with $\phi=0.25$ and anisotropy parameters $(\psi_A=\pi/3, \psi_R=2)$. Obtain parameter estimates (using maximum likelihood):
- assuming an isotropic model
- estimating the anisotropy parameters as well
Compare the results and repeat the exercise for $\psi_R=4$.
- (10) Consider a stationary trans-Gaussian model with known transformation function $h(\cdot)$, let $x$ be an arbitrary location within the study region and define . Find explicit expressions for ${\rm P}(T>c|Y)$ where $Y=(Y_1,\ldots,Y_n)$ denotes the observed measurements on the untransformed scale and:
- .
- (11) Analyse the Paraná data-set or any other data set of your choice, assuming priors and obtaining:
- a map of the predicted values over the area
- a map of the predicted standard errors over the area
- a map of the probabilities of being above a certain (arbitrarily) chosen threshold over the area
- a map of the 10th, 25th, 50th, 75th and 90th percentiles over the area
- the predictive distribution of the proportion of the area with the value of the study variable below a certain threshold (as a suggestion, you can use the 30th percentile of the data as the value of such a threshold)
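Hint for exercise (11): one possible starting point for the maps, sketched in base R on simulated data rather than on the Paraná data. It assumes simple kriging with known parameters and an exponential correlation function, so the predictive distribution at each grid point is Gaussian and the exceedance probabilities and percentiles follow from pnorm() and qnorm(). The values of mu, sigma2 and phi, the grid size and all object names are hypothetical. The sketch does not account for parameter uncertainty.

```r
set.seed(42)
n <- 50
coords <- cbind(runif(n), runif(n))              # data locations in the unit square
mu <- 0; sigma2 <- 1; phi <- 0.25                # assumed known parameters
V <- sigma2 * exp(-as.matrix(dist(coords)) / phi)
y <- mu + drop(t(chol(V)) %*% rnorm(n))          # simulated data, no nugget

g <- seq(0, 1, length.out = 21)
grid <- as.matrix(expand.grid(x = g, y = g))     # 441 prediction locations

# Distances and covariances between prediction and data locations
d0 <- sqrt(outer(grid[, 1], coords[, 1], "-")^2 +
           outer(grid[, 2], coords[, 2], "-")^2)
c0 <- sigma2 * exp(-d0 / phi)

# Simple kriging: mean = mu + c0 V^{-1} (y - mu), var = sigma2 - c0 V^{-1} c0'
w <- solve(V, t(c0))
pred_mean <- mu + drop(crossprod(w, y - mu))
pred_var <- pmax(sigma2 - colSums(t(c0) * w), 0)
pred_sd <- sqrt(pred_var)

# Probability of exceeding a threshold (here the 30th percentile of the data)
thr <- unname(quantile(y, 0.3))
p_exceed <- pnorm(thr, pred_mean, pred_sd, lower.tail = FALSE)

# Pointwise predictive percentiles, one column per probability
probs <- c(0.10, 0.25, 0.50, 0.75, 0.90)
pred_q <- sapply(probs, function(p) qnorm(p, pred_mean, pred_sd))
```

Each result can be drawn as a map with, for example, image(g, g, matrix(p_exceed, 21, 21)). The last item of the exercise, the proportion of the area below a threshold, depends on the joint predictive distribution over the grid, so it needs conditional simulations rather than these pointwise summaries.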
Week 4
- (12) Consider the stationary Gaussian model in which $Y_i = \beta + S(x_i) + Z_i :i=1,\ldots,n$, where $S(x)$ is a stationary Gaussian process with mean zero, variance $\sigma^2$ and correlation function $\rho(u)$, whilst the $Z_i$ are mutually independent ${\rm N}(0,\tau^2)$ random variables. Assume that all parameters except $\beta$ are known. Derive the Bayesian predictive distribution of $S(x)$ for an arbitrary location $x$ when $\beta$ is assigned an improper uniform prior, $\pi(\beta)$ constant for all real $\beta$. Compare the result with the ordinary kriging formulae.
- (13) For the model assumed in the previous exercise, assuming a correlation function parametrised by a scalar parameter $\phi$, obtain the posterior distribution for:
- a normal prior for $\beta$ and assuming the remaining parameters are known
- a normal-scaled-inverse-$\chi^2$ prior for $(\beta, \sigma^2)$ and assuming the correlation parameter is known
- a normal-scaled-inverse-$\chi^2$ prior for $(\beta, \sigma^2|\phi)$ and assuming a generic prior $p(\phi)$ for the correlation parameter.
- (14) Analyse the Paraná data-set or any other data set of your choice assuming priors for the model parameters and obtaining:
- the posterior distribution for the model parameters
- a map of the predictive mean over the area
- a map of the predictive median over the area
- the predictive distribution at three arbitrarily selected locations within the area
- (15) Obtain simulations from the Poisson model as shown in Figure 4.1 of the textbook for the course.
- (15) Try to reproduce or mimic the results shown in Figure 4.2 of the textbook for the course by simulating a data set and carrying out a similar data analysis. Note: for the example in the book we have used set.seed(34).
- (16) Reproduce the simulated binomial data shown in Figure 4.6. Use the package geoRglm in conjunction with priors of your choice to obtain predictive distributions for the signal $S(x)$ at locations $x=(0.6, 0.6)$ and $x=(0.9, 0.5)$. Compare the predictive inferences which you obtained in the previous exercise with those obtained by fitting a linear Gaussian model to the empirical logit transformed data, $\log\{(Y+0.5)/(n-Y+0.5)\}$, where $n$ denotes the number of trials. Compare the results of the two previous analyses and comment generally.
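Hint for exercises (15) and (16): a minimal base R sketch of simulating counts from a Poisson log-linear model and a binomial logit-linear model driven by a common Gaussian signal, followed by the empirical logit transform in its common form with a 0.5 correction. It does not reproduce the figures in the book: the locations, the seed, the exponential correlation with phi = 0.2, the intercept beta = 0.5 and the number of trials m = 10 are all hypothetical choices.

```r
set.seed(2024)
n <- 100
coords <- cbind(runif(n), runif(n))          # locations in the unit square
R <- exp(-as.matrix(dist(coords)) / 0.2)     # exponential correlation, phi = 0.2
S <- drop(t(chol(R)) %*% rnorm(n))           # zero-mean, unit-variance signal

# Poisson log-linear model: Y_i | S ~ Poisson(exp(beta + S(x_i)))
beta <- 0.5
y_pois <- rpois(n, lambda = exp(beta + S))

# Binomial logit-linear model: Y_i | S ~ Bin(m, p_i) with logit(p_i) = S(x_i)
m <- 10
y_bin <- rbinom(n, size = m, prob = plogis(S))

# Empirical logit transform; the 0.5 terms keep it finite when y = 0 or y = m
elogit <- log((y_bin + 0.5) / (m - y_bin + 0.5))
```

The transformed values elogit can then be treated as the response of a linear Gaussian model and compared with the results from the binomial model fitted with geoRglm.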
Week 5
- (17) The composite likelihood (CL) is obtained as the product of the bivariate densities for pairs of variables at data locations, treating the pairs as if they were independent. Assume a Gaussian model with constant mean and isotropic exponential correlation function.
- write down the expression for the CL and discuss how parameter estimates could be obtained
- write code to obtain CL parameter estimates for the s100 data set and compare them with the ones given by ML and REML.
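Hint for exercise (17): a minimal sketch of pairwise (composite) likelihood estimation in base R, assuming a constant mean $\mu$, variance $\sigma^2$ and correlation $\rho(u)=\exp(-u/\phi)$. It uses simulated data so that it runs without geoR; the true values (1, 1, 0.3), the starting values and all object names are hypothetical. Each pair contributes a bivariate Gaussian log-density, and the contributions are summed over all pairs.

```r
set.seed(100)
n <- 100
coords <- cbind(runif(n), runif(n))
D <- as.matrix(dist(coords))
# Simulated data with mu = 1, sigma2 = 1, phi = 0.3 (illustrative values)
y <- 1 + drop(t(chol(exp(-D / 0.3))) %*% rnorm(n))

# All pairs i < j with their separation distances
ij <- which(upper.tri(D), arr.ind = TRUE)
u <- D[ij]
yi <- y[ij[, 1]]
yj <- y[ij[, 2]]

# Negative composite log-likelihood; sigma2 and phi are on the log scale
neg_cl <- function(par) {
  mu <- par[1]; sigma2 <- exp(par[2]); phi <- exp(par[3])
  r <- exp(-u / phi)                       # correlation of each pair
  a <- yi - mu
  b <- yj - mu
  ll <- -log(2 * pi) - log(sigma2) - 0.5 * log1p(-r^2) -
    (a^2 - 2 * r * a * b + b^2) / (2 * sigma2 * (1 - r^2))
  -sum(ll)
}

start <- c(mean(y), log(var(y)), log(0.1))
fit <- optim(start, neg_cl, control = list(maxit = 2000))
est <- c(mu = fit$par[1], sigma2 = exp(fit$par[2]), phi = exp(fit$par[3]))
```

With geoR loaded, the same code should apply to the s100 data by taking coords from s100$coords and y from s100$data. The sketch uses all pairs with equal weight; restricting the sum to pairs within some maximum distance is a common variation worth discussing in the first part of the exercise.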