Moment methods for analyzing longitudinal data were proposed by Liang & Zeger (1986), extended by Prentice (1988), and adapted to spatially correlated binary data by Albert & McShane (1995). In their estimating equations, these authors estimate both the parameters of the mean model and the spatial correlation parameter. The estimating equations for the correlation parameter, which we call GEE1a, use cross products of residuals (Albert & McShane, 1995). The variogram is a measure of spatial association in the class of intrinsically stationary spatial processes, which contains the class of second-order and weakly stationary processes. Based on the idea of the semivariogram, we propose a set of estimating equations that use squared differences of residuals, which we call GEE1b. We derive the estimating equations for binary and continuous data and give the corresponding expressions for asymptotic normality. A characteristic feature of the GEE methodology is the use of "working" covariance matrices. When estimating covariance parameters, we need to specify the covariance matrix of the cross products (GEE1a) or the squared differences (GEE1b). We consider two kinds of working covariance matrices for GEE1a and GEE1b. The first is a diagonal matrix for the covariance of the cross products of the residuals (GEE1a) or the squared differences of the residuals (GEE1b) (Prentice, 1988; Albert & McShane, 1995); the second uses the normal approximation suggested by Prentice & Zhao (1991) to calculate the diagonal entries. In simulations we study the relative efficiency of GEE1a, GEE1b, ML, and REML, and their robustness when the mean model is misspecified. We study performance with moderate and small numbers of lattices (including one), with different types of response (binary and normal), and with different strengths of pairwise correlation.
Although ML (GEE1a with full specification of the working covariance matrix of the cross products) performs better than GEE1a and GEE1b (the simpler formulations considered in this thesis, which are robust to underparameterizing the mean model), the simulation results show that GEE1b could be more useful than GEE1a for Gaussian or Gaussian-like data; GEE1b and GEE1a, however, seem to be fairly equivalent for spatially correlated binary data when we have replication. Specifically, for the normal case, all methods estimate beta equally well, and GEE1b provides more efficient estimates of the spatial correlation parameter and the variance than GEE1a. However, ML and REML outperform GEE with diagonal working covariance matrices when the mean and variance-covariance models are correct. The GEE methodology is more robust to misspecification of the mean than ML and REML, since the latter make full use of the incorrect information. For the binary case, the results are mixed, with GEE1b outperforming GEE1a in estimating a as the spatial correlation increases, and GEE1b doing better than GEE1a when a = 2 and a = 3 with d = 0.2. When we have only one realization (lattice), so that all observations are correlated, we explore the relationships among estimation bias, average correlation over the lattice, and number of sampled points. Increasing the sample size fourfold (from a 10 x 10 grid to a 20 x 20 grid) decreases the bias of ML, REML, GEE1a, and GEE1b in estimating a and s2 (the greatest improvement occurs for REML), and REML does best for all parameters in the 20 x 20 case. The results show that larger average correlation leads to more bias, but the average correlation is not the sole determining factor: when the grid grows from 10 x 10 to 20 x 20, the standardized biases for all methods are lower, even if the average correlation for the 20 x 20 grid is higher than for the 10 x 10 grid. The magnitude of the bias for a does follow the average correlation. For the binary case, when d = 0.2 or 0.5, GEE1a and GEE1b can estimate b without difficulty.
When d = 0.1, we do have difficulty estimating b (this also happened when using the GLIMMIX macro and PROC LOGISTIC in SAS), and the bias for a increases as the number of sampled points grows. For normally distributed data, we rewrite the estimating equations for maximum likelihood and restricted maximum likelihood for comparison with GEE. The result is a clear and unified picture of the most popular methods, including maximum likelihood (ML), restricted maximum likelihood (REML), best linear unbiased estimator (BLUE), minimum variance quadratic unbiased estimator (MIVQUE), and generalized estimating equations (GEE).
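As a rough illustration of the two building blocks contrasted in the abstract, the sketch below computes, for every pair of sites, the cross product of residuals (the quantity used by the cross-product equations) and half the squared difference of residuals (the semivariogram-style quantity). For standardized residuals with unit variance and pairwise correlation rho, the cross product has expectation rho and the half squared difference has expectation 1 - rho, so both carry information about the spatial correlation. The function name, the toy 2 x 2 lattice, and the residual values are hypothetical and chosen only for illustration; this is not the dissertation's estimating equations, which also involve a working covariance matrix for these quantities.

```python
from itertools import combinations
from math import hypot


def pairwise_moments(residuals, coords):
    """Return (distance, cross product, half squared difference) per site pair.

    residuals: standardized residuals e_i = (y_i - mu_i) / sd_i, one per site.
    coords:    (row, col) lattice coordinates, in the same order as residuals.
    """
    out = []
    for i, j in combinations(range(len(residuals)), 2):
        # Euclidean distance between the two lattice sites.
        h = hypot(coords[i][0] - coords[j][0], coords[i][1] - coords[j][1])
        # Cross product of residuals: expectation rho_ij under unit variance.
        cross = residuals[i] * residuals[j]
        # Half squared difference: expectation 1 - rho_ij under unit variance.
        half_sq = 0.5 * (residuals[i] - residuals[j]) ** 2
        out.append((h, cross, half_sq))
    return out


# Toy 2 x 2 lattice with made-up standardized residuals (illustrative only).
coords = [(0, 0), (0, 1), (1, 0), (1, 1)]
resid = [0.5, -1.0, 1.5, 0.25]
moments = pairwise_moments(resid, coords)
```

Each pair satisfies the algebraic identity half_sq = 0.5 * (e_i^2 + e_j^2) - cross, which is why the two formulations target the same correlation structure while weighting the data differently.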
Fiscal Year: fy2002 ·
Problem Area: pa98-2 ·
Theme: cctrgnas ·
Source: extra
Citation:
Wu, C.-T. 1998. Generalized estimating equations for spatially correlated data. Dissertation, Department of Statistics, North Carolina State University.
Want more? Send an email to tholmes@fs.fed.us.
If you're requesting a reprint be sure your email includes the citation and your complete mailing address.
Forest Economics and Policy
USDA Forest Service Southern Research Station