| Title: | Spatial Predictive Modeling |
|---|---|
| Description: | An updated and extended version of 'spm' package, by introducing some further novel functions for modern statistical methods (i.e., generalised linear models, glmnet, generalised least squares), thin plate splines, support vector machine, kriging methods (i.e., simple kriging, universal kriging, block kriging, kriging with an external drift), and novel hybrid methods (228 hybrids plus numerous variants) of modern statistical methods or machine learning methods with mathematical and/or univariate geostatistical methods for spatial predictive modelling. For each method, two functions are provided, with one function for assessing the predictive errors and accuracy of the method based on cross-validation, and the other for generating spatial predictions. It also contains a couple of functions for data preparation and predictive accuracy assessment. |
| Authors: | Jin Li [aut, cre] |
| Maintainer: | Jin Li <[email protected]> |
| License: | GPL (>= 2) |
| Version: | 1.1.3 |
| Built: | 2026-06-03 09:41:28 UTC |
| Source: | https://github.com/cran/spm2 |
This dataset contains 212 samples of 61 variables including three bee species, inflorescence, temperature, wid speed and various derived landscape variables.
data("bees")data("bees")
A data frame with 212 observations on the following 61 variables.
transida factor with levels G1 G10 G11 G12 G13 G14 G15 G16 G1D1 G1D10 G1D11 G1D12 G1D13 G1D14 G1D15 G1D16 G1D17 G1D2 G1D3 G1D4 G1D5 G1D6 G1D7 G1D8 G1D9 G1G1 G1G10 G1G11 G1G12 G1G13 G1G14 G1G15 G1G16 G1G17 G1G2 G1G3 G1G4 G1G5 G1G6 G1G7 G1G8 G1G9 G2 G2H1 G2H10 G2H11 G2H2 G2H3 G2H4 G2H5 G2H6 G2H7 G2H8 G2H9 G2S1 G2S10 G2S11 G2S2 G2S3 G2S4 G2S5 G2S6 G2S7 G2S8 G2S9 G3 G4 G5 G6 G7 G8 G9 GCH1 GCH10 GCH11 GCH12 GCH13 GCH14 GCH15 GCH16 GCH17 GCH18 GCH2 GCH3 GCH4 GCH5 GCH6 GCH7 GCH8 GCH9 GCS1 GCS10 GCS11 GCS12 GCS13 GCS14 GCS15 GCS16 GCS17 GCS18 GCS2 GCS3 GCS4 GCS5 GCS6 GCS7 GCS8 GCS9 H1 H10 H11 H12 H13 H14 H15 H16 H2 H3 H4 H5 H6 H7 H8 H9 MC1G1 MC1G2 MC1G3 MC1G4 MC1G5 MC1G6 MC1G7 MC1G8 MC1G9 MC1H1 MC1H2 MC1H3 MC1H4 MC1H5 MC1H6 MC1H7 MC1H8 MC1H9 MC2AH1 MC2AH10 MC2AH11 MC2AH12 MC2AH13 MC2AH14 MC2AH15 MC2AH16 MC2AH17 MC2AH18 MC2AH2 MC2AH3 MC2AH4 MC2AH5 MC2AH6 MC2AH7 MC2AH8 MC2AH9 MC2AS1 MC2AS10 MC2AS11 MC2AS12 MC2AS13 MC2AS14 MC2AS15 MC2AS16 MC2AS17 MC2AS18 MC2AS2 MC2AS3 MC2AS4 MC2AS5 MC2AS6 MC2AS7 MC2AS8 MC2AS9 MC2BB1 MC2BB10 MC2BB11 MC2BB12 MC2BB13 MC2BB14 MC2BB15 MC2BB16 MC2BB17 MC2BB2 MC2BB3 MC2BB4 MC2BB5 MC2BB6 MC2BB7 MC2BB8 MC2BB9 MC2BG1 MC2BG10 MC2BG11 MC2BG12 MC2BG13 MC2BG14 MC2BG15 MC2BG16 MC2BG17 MC2BG2 MC2BG3 MC2BG4 MC2BG5 MC2BG6 MC2BG7 MC2BG8 MC2BG9
transsurva numeric vector
plotsurva numeric vector
paddocka numeric vector
plota factor with levels G1-1 G1-10 G1-11 G1-12 G1-13 G1-14 G1-15 G1-16 G1-17 G1-2 G1-3 G1-4 G1-5 G1-6 G1-7 G1-8 G1-9 G21 G210 G211 G22 G23 G24 G25 G26 G27 G28 G29 GC1 GC10 GC11 GC12 GC13 GC14 GC15 GC16 GC17 GC18 GC2 GC3 GC4 GC5 GC6 GC7 GC8 GC9 MC-1 MC-10 MC-11 MC-12 MC-13 MC-14 MC-15 MC-16 MC-2 MC-3 MC-4 MC-5 MC-6 MC-7 MC-8 MC-9 MC1-1 MC1-2 MC1-3 MC1-4 MC1-5 MC1-6 MC1-7 MC1-8 MC1-9 MC2-A1 MC2-A10 MC2-A11 MC2-A12 MC2-A13 MC2-A14 MC2-A15 MC2-A16 MC2-A17 MC2-A18 MC2-A2 MC2-A3 MC2-A4 MC2-A5 MC2-A6 MC2-A7 MC2-A8 MC2-A9 MC2-B1 MC2-B10 MC2-B11 MC2-B12 MC2-B13 MC2-B14 MC2-B15 MC2-B16 MC2-B17 MC2-B2 MC2-B3 MC2-B4 MC2-B5 MC2-B6 MC2-B7 MC2-B8 MC2-B9
obsa factor with levels Andrew Barbara Micah Sonia Steve
hbeea numeric vector
nbeea numeric vector
hovera numeric vector
datea numeric vector
sxa numeric vector
fxa numeric vector
sya numeric vector
fya numeric vector
loca factor with levels 100 150 200 400 450 edge
paira factor with levels I O
infa numeric vector
rankinfa numeric vector
dupla numeric vector
tempa numeric vector
windspeeda numeric vector
winddira factor with levels N NE NNE NNW NW
cloudca numeric vector
disttoedgecalca numeric vector
disttoedgemeasureda numeric vector
w100a numeric vector
w200a numeric vector
w300a numeric vector
w400a numeric vector
w500a numeric vector
w600a numeric vector
w700a numeric vector
w800a numeric vector
w900a numeric vector
w1000a numeric vector
w1500a numeric vector
w2000a numeric vector
c100a numeric vector
c200a numeric vector
c300a numeric vector
c400a numeric vector
c500a numeric vector
c1000a numeric vector
c1500a numeric vector
areaa numeric vector
perimetera numeric vector
gyrationa numeric vector
paratioa numeric vector
shapea numeric vector
fractaldimentiona numeric vector
circumscirclea numeric vector
contiguitya numeric vector
links100a numeric vector
links200a numeric vector
links300a numeric vector
links400a numeric vector
links500a numeric vector
links1000a numeric vector
links1500a numeric vector
links2000a numeric vector
windspeed2a numeric vector
For details, please see the source. This dataset was published as an appendix of the paper listed in the source. Where the long and lat were reprojected to easting and northing
The data source is [https://doi.org/10.25919/5f17b34638cca] or [https://data.csiro.au/collections/collection/CIcsiro:45533], which provides bees count data and relevant predictive variables along with a brief description of the data. The detailed descriptions of the data are available in: "Arthur, A. D., et al. (2010). "Influence of woody vegetation on pollinator densities in oilseed Brassica fields in an Australian temperate landscape." Basic and Applied Ecology 11: 406-414."
Arthur, A. D., Li, J., Henry, S., Cunningham, S.A. (2010). "Influence of woody vegetation on pollinator densities in oilseed Brassica fields in an Australian temperate landscape." Basic and Applied Ecology 11: 406-414.
This function is to calculates correct classification rate (ccr) for categorical data with the observed (obs) data specified as factor. It based on the differences between the predicted values for and the observed values of validation samples for cross-validation. For 0 and 1 data, the observed values need to be specified as factor in order to use this accuracy measure. It is modified from the function 'pred.acc' in 'spm' package.
ccr(obs, pred)ccr(obs, pred)
obs |
a vector of observation values of validation samples. |
pred |
a vector of prediction values of predictive models for validation samples. |
A list with the following component: ccr (correct classification rate) for categorical data.
Jin Li
Jin Li (2019). spm: Spatial Predictive Modeling. R package version 1.2.0. https://CRAN.R-project.org/package=spm.
set.seed(1234) x <- as.factor(sample(letters[1:2], 30, TRUE)) y <- sample(x, 30) ccr(x, y)set.seed(1234) x <- as.factor(sample(letters[1:2], 30, TRUE)) y <- sample(x, 30) ccr(x, y)
This is an updated and extended version of ‘spm' package. The change in package name from 'spm' to 'spm2' is due to the change in Author’s support from Geoscience Australia to Data2Action Australia.
## R CMD check results 0 errors | 0 warnings | 0 notes
Jin Li
This function is a data splitting function for k-fold cross- validation and uses a stratified random sampling technique. It resamples the training data based on sample quantiles.
datasplit(trainy, k.fold = 10)datasplit(trainy, k.fold = 10)
trainy |
a vector of response, must have a length equal to sample size. |
k.fold |
integer; number of folds in the cross-validation. if > 1, then apply k-fold cross validation; the default is 10, i.e., 10-fold cross validation that is recommended. |
A list of samples each with an index of k-fold number.
This function is largely based on rfcv in randomForest.
Jin Li
A. Liaw and M. Wiener (2002). Classification and Regression by randomForest. R News 2(3), 18-22.
library(spm) data(petrel) idx1 <- datasplit(petrel[, 3], k.fold = 10) table(idx1)library(spm) data(petrel) idx1 <- datasplit(petrel[, 3], k.fold = 10) table(idx1)
This function is to derive the digit number after decimal point for a numeric variable (e.g., lat and long).
decimaldigit(x, dechar = ".", nint = NA, ndec = NA, pad.left = TRUE)decimaldigit(x, dechar = ".", nint = NA, ndec = NA, pad.left = TRUE)
x |
one or more decimal numbers. |
dechar |
The character used to separate the decimal part of a number. |
nint |
The number of characters to which the integer part of the numbers should be padded. |
ndec |
The number of characters to which the decimal part of the numbers should be padded. |
pad.left |
Whether the left (integer) side of the numbers should be padded as well as the right. |
A list of integer number to show digit number after decimal point of x.
This function is modified from decimal.align in 'prettyR' package.
Jin Li
Jim Lemon and Philippe Grosjean (2019). 'prettyR': Pretty Descriptive Stats. R package version 2.1.1. https://CRAN.R-project.org/package=prettyR.
x<-c(0.1, 2.2, 3.03, 44.444, 555.0005, 6666.66666) decimaldigit(x)x<-c(0.1, 2.2, 3.03, 44.444, 555.0005, 6666.66666) decimaldigit(x)
This function is a cross validation function for 38 hybrid methods of 'gbm', 'kriging' and 'IDW', including the average of 'gbmkrige' and 'gbmidw' ('gbmkrigegbmidw') and the average of 'gbm', 'gbmkrige' and 'gbmidw' ('gbmgbmkrigegbmidw'), where 'kriging' methods include ordinary kriging ('OK'), simple kriging ('SK'), block 'OK' ('BOK') and block 'SK'('BSK') and 'IDW' also covers 'NN' and 'KNN'. The data splitting is based on a stratified random sampling method (see the 'datasplit' function for details).
gbmkrigeidwcv( longlat, trainx, trainy, var.monotone = rep(0, ncol(trainx)), family = "gaussian", n.trees = 3000, learning.rate = 0.001, interaction.depth = 2, bag.fraction = 0.5, train.fraction = 1, n.minobsinnode = 10, transformation = "none", weights = rep(1, nrow(trainx)), keep.data = FALSE, verbose = TRUE, delta = 1, formula = res1 ~ 1, vgm.args = "Sph", anis = c(0, 1), alpha = 0, block = 0, beta, nmaxkrige = 12, idp = 2, nmaxidw = 12, hybrid.parameter = 2, lambda = 1, validation = "CV", cv.fold = 10, predacc = "VEcv", n.cores = 6, ... )gbmkrigeidwcv( longlat, trainx, trainy, var.monotone = rep(0, ncol(trainx)), family = "gaussian", n.trees = 3000, learning.rate = 0.001, interaction.depth = 2, bag.fraction = 0.5, train.fraction = 1, n.minobsinnode = 10, transformation = "none", weights = rep(1, nrow(trainx)), keep.data = FALSE, verbose = TRUE, delta = 1, formula = res1 ~ 1, vgm.args = "Sph", anis = c(0, 1), alpha = 0, block = 0, beta, nmaxkrige = 12, idp = 2, nmaxidw = 12, hybrid.parameter = 2, lambda = 1, validation = "CV", cv.fold = 10, predacc = "VEcv", n.cores = 6, ... )
longlat |
a dataframe contains longitude and latitude of point samples. |
trainx |
a dataframe or matrix contains columns of predictive variables. |
trainy |
a vector of the response variable. |
var.monotone |
an optional vector, the same length as the number of predictors, indicating which variables have a monotone increasing (+1), decreasing (-1), or arbitrary (0) relationship with the outcome. By default, a vector of 0 is used. |
family |
either a character string specifying the name of the distribution to use or a list with a component name specifying the distribution and any additional parameters needed. See gbm for details. By default, "gaussian" is used. |
n.trees |
the total number of trees to fit. This is equivalent to the number of iterations and the number of basis functions in the additive expansion. By default, 3000 is used. |
learning.rate |
a shrinkage parameter applied to each tree in the expansion. Also known as step-size reduction. |
interaction.depth |
the maximum depth of variable interactions. 1 implies an additive model, 2 implies a model with up to 2-way interactions, etc. By default, 2 is used. |
bag.fraction |
the fraction of the training set observations randomly selected to propose the next tree in the expansion. By default, 0.5 is used. |
train.fraction |
The first train.fraction * nrows(data) observations are used to fit the gbm and the remainder are used for computing out-of-sample estimates of the loss function. |
n.minobsinnode |
minimum number of observations in the trees terminal nodes. Note that this is the actual number of observations not the total weight. By default, 10 is used. |
transformation |
transform the residuals of 'gbm' to normalize the data for 'krige'; can be "sqrt" for square root, "arcsine" for arcsine, "log" or "none" for non transformation. By default, "none" is used. |
weights |
an optional vector of weights to be used in the fitting process. Must be positive but do not need to be normalized. If keep.data = FALSE in the initial call to gbm then it is the user's responsibility to resupply the weights to gbm.more. By default, a vector of 1 is used. |
keep.data |
a logical variable indicating whether to keep the data and an index of the data stored with the object. Keeping the data and index makes subsequent calls to gbm.more faster at the cost of storing an extra copy of the dataset. By default, 'FALSE' is used. |
verbose |
If TRUE, gbm will print out progress and performance indicators. By default, 'TRUE' is used. |
delta |
numeric; to avoid log(0) in the log transformation. The default is 1. |
formula |
formula defining the response vector and (possible) regressor. an object (i.e., 'variogram.formula') for 'variogram' or a formula for 'krige'. see 'variogram' and 'krige' in 'gstat' for details. The default is 'formula = res1 ~ 1'. |
vgm.args |
arguments for 'vgm', e.g. variogram model of response variable and anisotropy parameters. see 'vgm' in 'gstat' for details. By default, "Sph" is used. |
anis |
anisotropy parameters: see notes 'vgm' in 'gstat' for details. |
alpha |
direction in plane (x,y). see variogram in 'gstat' for details. |
block |
block size. see 'krige' in 'gstat' for details. |
beta |
for simple kriging. see 'krige' in 'gstat' for details. |
nmaxkrige |
for a local predicting: the number of nearest observations that should be used for a prediction or simulation, where nearest is defined in terms of the space of the spatial locations. By default, 12 observations are used. |
idp |
a numeric number specifying the inverse distance weighting power. |
nmaxidw |
for a local predicting: the number of nearest observations that should be used for a prediction or simulation, where nearest is defined in terms of the space of the spatial locations. By default, 12 observations are used. |
hybrid.parameter |
the default is 2 that is for 'gbmkrigegbmidw'; for 'gbmgbmkrigegbmidw', it needs to be 3. |
lambda |
ranging from 0 to 2; the default is 1 for 'gbmkrigegbmidw' and 'gbmgbmkrigegbmidw'; and if it is < 1, more weight is placed on 'krige', otherwise more weight is placed on 'idw'; and if it is 0, 'idw' is not considered and the resultant methods is 'gbmkrige' when the default 'hybrid.parameter' is used; and if it is 2, then the resultant method is 'gbmidw' when the default 'hybrid.parameter' is used. |
validation |
validation methods, include 'LOO': leave-one-out, and 'CV': cross-validation. |
cv.fold |
integer; number of folds in the cross-validation. if > 1, then apply n-fold cross validation; the default is 10, i.e., 10-fold cross validation that is recommended. |
predacc |
can be either "VEcv" for vecv or "ALL" for all measures in function pred.acc. |
n.cores |
The number of CPU cores to use. See gbm for details. By default, 6 is used. |
... |
other arguments passed on to 'randomForest', 'krige' and 'gstat'. |
A list with the following components: me, rme, mae, rmae, mse, rmse, rrmse, vecv and e1; or vecv only
This function is largely based on 'gbmcv' in 'spm', and 'krigecv' in 'spm2'.
Jin Li
Li, J. (2022). Spatial Predictive Modeling with R. Boca Raton, Chapman and Hall/CRC.
Li, J., Potter, A., Huang, Z., and Heap, A. (2012). Predicting Seabed Sand Content across the Australian Margin Using Machine Learning and Geostatistical Methods, Geoscience Australia, Record 2012/48, 115pp.
Li, J., Heap, A., Potter, A., and Danilel, J.J. (2011). Predicting Seabed Mud Content across the Australian Margin II: Performance of Machine Learning Methods and Their Combination with Ordinary Kriging and Inverse Distance Squared, Geoscience Australia, Record 2011/07, 69pp.
Greg Ridgeway with contributions from others (2015). gbm: Generalized Boosted Regression Models. R package version 2.1.1. https://CRAN.R-project.org/package=gbm
Pebesma, E.J., 2004. Multivariable geostatistics in S: the gstat package. Computers & Geosciences, 30: 683-691.
library(spm) # gbmgbmokgbmidw data(sponge) longlat <- sponge[, 1:2] set.seed(1234) gbmgbmkrigegbmidwcv1 <- gbmkrigeidwcv(longlat = longlat, trainx = sponge[, -3], trainy = sponge[, 3], family = "poisson", interaction.depth = 3, transformation = "none", formula = res1 ~ 1, vgm.args = "Sph", nmaxkrige = 12, idp = 2, nmaxidw = 12, hybrid.parameter = 3, validation = "CV", predacc = "ALL", n.cores = 2) gbmgbmkrigegbmidwcv1 # gbmokgbmidw for count data data(sponge) longlat <- sponge2[, 1:2] y = sponge[, 3] trainx = sponge[, -3] set.seed(1234) n <- 20 # number of iterations,60 to 100 is recommended. VEcv <- NULL for (i in 1:n) { gbmkrigegbmidwcv1 <- gbmkrigeidwcv(longlat = longlat, trainx = trainx, trainy = y, family = "poisson", interaction.depth = 3, transformation = "none", formula = res1 ~ 1, vgm.args = ("Sph"), nmaxkrige = 12, idp = 2, nmaxidw = 12, hybrid.parameter = 2, validation = "CV", predacc = "VEcv", n.cores = 2) VEcv [i] <- gbmkrigegbmidwcv1 } plot(VEcv ~ c(1:n), xlab = "Iteration for gbmokgbmidw", ylab = "VEcv (%)") points(cumsum(VEcv) / c(1:n) ~ c(1:n), col = 2) abline(h = mean(VEcv), col = 'blue', lwd = 2)library(spm) # gbmgbmokgbmidw data(sponge) longlat <- sponge[, 1:2] set.seed(1234) gbmgbmkrigegbmidwcv1 <- gbmkrigeidwcv(longlat = longlat, trainx = sponge[, -3], trainy = sponge[, 3], family = "poisson", interaction.depth = 3, transformation = "none", formula = res1 ~ 1, vgm.args = "Sph", nmaxkrige = 12, idp = 2, nmaxidw = 12, hybrid.parameter = 3, validation = "CV", predacc = "ALL", n.cores = 2) gbmgbmkrigegbmidwcv1 # gbmokgbmidw for count data data(sponge) longlat <- sponge2[, 1:2] y = sponge[, 3] trainx = sponge[, -3] set.seed(1234) n <- 20 # number of iterations,60 to 100 is recommended. VEcv <- NULL for (i in 1:n) { gbmkrigegbmidwcv1 <- gbmkrigeidwcv(longlat = longlat, trainx = trainx, trainy = y, family = "poisson", interaction.depth = 3, transformation = "none", formula = res1 ~ 1, vgm.args = ("Sph"), nmaxkrige = 12, idp = 2, nmaxidw = 12, hybrid.parameter = 2, validation = "CV", predacc = "VEcv", n.cores = 2) VEcv [i] <- gbmkrigegbmidwcv1 } plot(VEcv ~ c(1:n), xlab = "Iteration for gbmokgbmidw", ylab = "VEcv (%)") points(cumsum(VEcv) / c(1:n) ~ c(1:n), col = 2) abline(h = mean(VEcv), col = 'blue', lwd = 2)
This function is for generating spatial predictions using the hybrid methods of 'gbm', 'kriging' and 'IDW', including all methods implemented in 'gbmkrigeidwcv'.
gbmkrigeidwpred( longlat, trainx, predx, trainy, longlatpredx, var.monotone = rep(0, ncol(trainx)), family = "gaussian", n.trees = 3000, learning.rate = 0.001, interaction.depth = 2, bag.fraction = 0.5, train.fraction = 1, n.minobsinnode = 10, transformation = "none", weights = rep(1, nrow(trainx)), keep.data = FALSE, verbose = TRUE, delta = 1, formula = res1 ~ 1, vgm.args = "Sph", anis = c(0, 1), alpha = 0, block = 0, beta, nmaxkrige = 12, idp = 2, nmaxidw = 12, hybrid.parameter = 2, lambda = 1, cv.fold = 10, n.cores = 8, ... )gbmkrigeidwpred( longlat, trainx, predx, trainy, longlatpredx, var.monotone = rep(0, ncol(trainx)), family = "gaussian", n.trees = 3000, learning.rate = 0.001, interaction.depth = 2, bag.fraction = 0.5, train.fraction = 1, n.minobsinnode = 10, transformation = "none", weights = rep(1, nrow(trainx)), keep.data = FALSE, verbose = TRUE, delta = 1, formula = res1 ~ 1, vgm.args = "Sph", anis = c(0, 1), alpha = 0, block = 0, beta, nmaxkrige = 12, idp = 2, nmaxidw = 12, hybrid.parameter = 2, lambda = 1, cv.fold = 10, n.cores = 8, ... )
longlat |
a dataframe contains longitude and latitude of point samples. |
trainx |
a dataframe contains longitude (long), latitude (lat), predictive variables and the response variable of point samples. |
predx |
a dataframe or matrix contains columns of predictive variables for the grids to be predicted. |
trainy |
a vector of the response variable in the formula, that is, the left part of the formula. |
longlatpredx |
a dataframe contains longitude and latitude of point locations (i.e., the centers of grids) to be predicted. |
var.monotone |
an optional vector, the same length as the number of predictors, indicating which variables have a monotone increasing (+1), decreasing (-1), or arbitrary (0) relationship with the outcome. By default, a vector of 0 is used. |
family |
either a character string specifying the name of the distribution to use or a list with a component name specifying the distribution and any additional parameters needed. See gbm for details. By default, "gaussian" is used. |
n.trees |
the total number of trees to fit. This is equivalent to the number of iterations and the number of basis functions in the additive expansion. By default, 3000 is used. |
learning.rate |
a shrinkage parameter applied to each tree in the expansion. Also known as step-size reduction. |
interaction.depth |
the maximum depth of variable interactions. 1 implies an additive model, 2 implies a model with up to 2-way interactions, etc. By default, 2 is used. |
bag.fraction |
the fraction of the training set observations randomly selected to propose the next tree in the expansion. By default, 0.5 is used. |
train.fraction |
The first 'train.fraction * nrows(data)' observations are used to fit the gbm and the remainder are used for computing out-of-sample estimates of the loss function. |
n.minobsinnode |
minimum number of observations in the trees terminal nodes. Note that this is the actual number of observations not the total weight. By default, 10 is used. |
transformation |
transform the residuals of 'gbm' to normalise the data; can be "sqrt" for square root, "arcsine" for arcsine, "log" or "none" for non transformation. By default, "none" is used. |
weights |
an optional vector of weights to be used in the fitting process. Must be positive but do not need to be normalized. If keep.data = FALSE in the initial call to gbm then it is the user's responsibility to resupply the weights to gbm.more. By default, a vector of 1 is used. |
keep.data |
a logical variable indicating whether to keep the data and an index of the data stored with the object. Keeping the data and index makes subsequent calls to gbm.more faster at the cost of storing an extra copy of the dataset. By default, 'FALSE' is used. |
verbose |
If TRUE, gbm will print out progress and performance indicators. By default, 'TRUE' is used. |
delta |
numeric; to avoid log(0) in the log transformation. The default is 1. |
formula |
formula defining the response vector and (possible) regressor. an object (i.e., 'variogram.formula') for 'variogram' or a formula for 'krige'. see 'variogram' and 'krige' in 'gstat' for details. The default is 'formula = res1 ~ 1'. |
vgm.args |
arguments for 'vgm', e.g. variogram model of response variable and anisotropy parameters. see 'vgm' in 'gstat' for details. By default, "Sph" is used. |
anis |
anisotropy parameters: see notes 'vgm' in 'gstat' for details. |
alpha |
direction in plane (x,y). see variogram in 'gstat' for details. |
block |
block size. see 'krige' in 'gstat' for details. |
beta |
for simple kriging. see 'krige' in 'gstat' for details. |
nmaxkrige |
for a local predicting: the number of nearest observations that should be used for a prediction or simulation, where nearest is defined in terms of the space of the spatial locations. By default, 12 observations are used. |
idp |
a numeric number specifying the inverse distance weighting power. |
nmaxidw |
for a local predicting: the number of nearest observations that should be used for a prediction or simulation, where nearest is defined in terms of the space of the spatial locations. By default, 12 observations are used. |
hybrid.parameter |
the default is 2 that is for 'gbmkrigegbmidw'; for 'gbmgbmkrigegbmidw', it needs to be 3. |
lambda |
ranging from 0 to 2; the default is 1 for 'gbmkrigegbmidw' and 'gbmgbmkrigegbmidw'; and if it is < 1, more weight is placed on 'krige', otherwise more weight is placed on 'idw'; and if it is 0, 'idw' is not considered and the resultant methods is 'gbmkrige' when the default 'hybrid.parameter' is used; and if it is 2, then the resultant method is 'gbmidw' when the default 'hybrid.parameter' is used. |
cv.fold |
integer; number of folds in the cross-validation. if > 1, then apply n-fold cross validation; the default is 10, i.e., 10-fold cross validation that is recommended. |
n.cores |
The number of CPU cores to use. See gbm for details. By default, 6 is used. |
... |
other arguments passed on to 'gbm', 'krige' and 'gstat'. |
A dataframe of longitude, latitude, and predictions.
Jin Li
Li, J., Potter, A., Huang, Z., and Heap, A. (2012). Predicting Seabed Sand Content across the Australian Margin Using Machine Learning and Geostatistical Methods, Geoscience Australia, Record 2012/48, 115pp.
Li, J., Heap, A., Potter, A., and Danilel, J.J. (2011). Predicting Seabed Mud Content across the Australian Margin II: Performance of Machine Learning Methods and Their Combination with Ordinary Kriging and Inverse Distance Squared, Geoscience Australia, Record 2011/07, 69pp.
Greg Ridgeway with contributions from others (2015). gbm: Generalized Boosted Regression Models. R package version 2.1.1. https://CRAN.R-project.org/package=gbm
Pebesma, E.J., 2004. Multivariable geostatistics in S: the gstat package. Computers & Geosciences, 30: 683-691.
library(spm) data(sponge) data(sponge.grid) longlat <- sponge[, 1:2] set.seed(1234) gbmkrigeidwpred1 <- gbmkrigeidwpred(longlat = longlat, trainx = sponge[, -3], predx = sponge.grid, trainy = sponge[, 3], longlatpredx = sponge.grid[, c(1:2)], family = "poisson", interaction.depth = 3, transformation = "none", formula = res1 ~ 1, vgm.args = "Sph", nmaxkrige = 12, idp = 2, nmaxidw = 12, hybrid.parameter = 3, n.cores = 2) names(gbmkrigeidwpred1) range(gbmkrigeidwpred1$predictions)library(spm) data(sponge) data(sponge.grid) longlat <- sponge[, 1:2] set.seed(1234) gbmkrigeidwpred1 <- gbmkrigeidwpred(longlat = longlat, trainx = sponge[, -3], predx = sponge.grid, trainy = sponge[, 3], longlatpredx = sponge.grid[, c(1:2)], family = "poisson", interaction.depth = 3, transformation = "none", formula = res1 ~ 1, vgm.args = "Sph", nmaxkrige = 12, idp = 2, nmaxidw = 12, hybrid.parameter = 3, n.cores = 2) names(gbmkrigeidwpred1) range(gbmkrigeidwpred1$predictions)
This function is a cross validation function for 'glm' method in 'stats' package.
glmcv( formula = NULL, trainxy, y, family = "gaussian", validation = "CV", cv.fold = 10, predacc = "VEcv", ... )glmcv( formula = NULL, trainxy, y, family = "gaussian", validation = "CV", cv.fold = 10, predacc = "VEcv", ... )
formula |
a formula defining the response variable and predictive variables. |
trainxy |
a dataframe contains predictive variables and the response variable of point samples. The location information, longitude (long), latitude (lat), need to be included in the 'trainx' for spatial predictive modeling. |
y |
a vector of the response variable in the formula, that is, the left part of the formula. |
family |
a description of the error distribution and link function to be used in the model. See '?glm' for details. |
validation |
validation methods, include 'LOO': leave-one-out, and 'CV': cross-validation. |
cv.fold |
integer; number of folds in the cross-validation. if > 1, then apply n-fold cross validation; the default is 10, i.e., 10-fold cross validation that is recommended. |
predacc |
can be either "VEcv" for vecv or "ALL" for all measures in function pred.acc. |
... |
other arguments passed on to 'glm'. |
A list with the following components: me, rme, mae, rmae, mse, rmse, rrmse, vecv and e1; or vecv only
This function is largely based on 'rfcv' in 'randomForest' and 'glm' in 'stats'.
Jin Li
A. Liaw and M. Wiener (2002). Classification and Regression by randomForest. R News 2(3), 18-22.
library(spm) data(petrel) gravel <- petrel[, c(1, 2, 6:9, 5)] model <- log(gravel + 1) ~ lat + bathy + I(long^3) + I(lat^2) + I(lat^3) set.seed(1234) glmcv1 <- glmcv(formula = model, gravel, log(gravel[, 7] +1), validation = "CV", predacc = "ALL") glmcv1 # Since the default 'family' is used, it is actually a 'lm' model. data(sponge) model <- sponge ~ easting + I(easting^2) set.seed(1234) glmcv1 <- glmcv(formula = model, sponge, sponge[, 3], family = poisson, validation = "CV", predacc = "ALL") glmcv1 # For glm model <- gravel / 100 ~ lat + bathy + I(long^3) + I(lat^2) + I(lat^3) set.seed(1234) n <- 20 # number of iterations,60 to 100 is recommended. VEcv <- NULL for (i in 1:n) { glmcv1 <- glmcv(formula = model, gravel, gravel[, 7] / 100, family = binomial(link=logit), validation = "CV", predacc = "VEcv") VEcv [i] <- glmcv1 } plot(VEcv ~ c(1:n), xlab = "Iteration for GLM", ylab = "VEcv (%)") points(cumsum(VEcv) / c(1:n) ~ c(1:n), col = 2) abline(h = mean(VEcv), col = 'blue', lwd = 2)library(spm) data(petrel) gravel <- petrel[, c(1, 2, 6:9, 5)] model <- log(gravel + 1) ~ lat + bathy + I(long^3) + I(lat^2) + I(lat^3) set.seed(1234) glmcv1 <- glmcv(formula = model, gravel, log(gravel[, 7] +1), validation = "CV", predacc = "ALL") glmcv1 # Since the default 'family' is used, it is actually a 'lm' model. data(sponge) model <- sponge ~ easting + I(easting^2) set.seed(1234) glmcv1 <- glmcv(formula = model, sponge, sponge[, 3], family = poisson, validation = "CV", predacc = "ALL") glmcv1 # For glm model <- gravel / 100 ~ lat + bathy + I(long^3) + I(lat^2) + I(lat^3) set.seed(1234) n <- 20 # number of iterations,60 to 100 is recommended. VEcv <- NULL for (i in 1:n) { glmcv1 <- glmcv(formula = model, gravel, gravel[, 7] / 100, family = binomial(link=logit), validation = "CV", predacc = "VEcv") VEcv [i] <- glmcv1 } plot(VEcv ~ c(1:n), xlab = "Iteration for GLM", ylab = "VEcv (%)") points(cumsum(VEcv) / c(1:n) ~ c(1:n), col = 2) abline(h = mean(VEcv), col = 'blue', lwd = 2)
This function is a cross validation function for the hybrid method of 'glm' and 'idw' using 'gstat' (glmidw) (see reference #1), where the data splitting is based on a stratified random sampling method (see the 'datasplit' function for details).
glmidwcv( formula = NULL, longlat, trainxy, y, family = "gaussian", idp = 2, nmaxidw = 12, validation = "CV", cv.fold = 10, predacc = "VEcv", ... )glmidwcv( formula = NULL, longlat, trainxy, y, family = "gaussian", idp = 2, nmaxidw = 12, validation = "CV", cv.fold = 10, predacc = "VEcv", ... )
formula |
a formula defining the response variable and predictive variables for 'glm'. |
longlat |
a dataframe contains longitude and latitude of point samples. |
trainxy |
a dataframe contains longitude (long), latitude (lat), predictive variables and the response variable of point samples. |
y |
a vector of the response variable in the formula, that is, the left part of the formula. |
family |
a description of the error distribution and link function to be used in the model. See '?glm' for details. |
idp |
a numeric number specifying the inverse distance weighting power. |
nmaxidw |
for a local predicting: the number of nearest observations that should be used for a prediction or simulation, where nearest is defined in terms of the space of the spatial locations. By default, 12 observations are used. |
validation |
validation methods, include 'LOO': leave-one-out, and 'CV': cross-validation. |
cv.fold |
integer; number of folds in the cross-validation. if > 1, then apply n-fold cross validation; the default is 10, i.e., 10-fold cross validation that is recommended. |
predacc |
can be either "VEcv" for vecv or "ALL" for all measures in function pred.acc. |
... |
other arguments passed on to 'glm' and 'gstat'. |
A list with the following components: me, rme, mae, rmae, mse, rmse, rrmse, vecv and e1; or vecv only.
This function is largely based on 'rfcv' in 'randomForest', 'idwcv' in 'spm'and 'glm' in 'stats'.
Jin Li
Li, J., Alvarez, B., Siwabessy, J., Tran, M., Huang, Z., Przeslawski, R., Radke, L., Howard, F. and Nichol, S. (2017). "Application of random forest, generalised linear model and their hybrid methods with geostatistical techniques to count data: Predicting sponge species richness." Environmental Modelling & Software 97: 112-129.
A. Liaw and M. Wiener (2002). Classification and Regression by randomForest. R News 2(3), 18-22.
Pebesma, E.J., 2004. Multivariable geostatistics in S: the gstat package. Computers & Geosciences, 30: 683-691.
library(spm) data(petrel) gravel <- petrel[, c(1, 2, 6:9, 5)] longlat <- petrel[, c(1, 2)] model <- log(gravel + 1) ~ lat + bathy + I(long^3) + I(lat^2) + I(lat^3) y <- log(gravel[, 7] +1) set.seed(1234) glmidwcv1 <- glmidwcv(formula = model, longlat = longlat, trainxy = gravel, y = y, idp = 2, nmaxidw = 12, validation = "CV", predacc = "ALL") glmidwcv1 # Since the default 'family' is used, actually a 'lm' model is used. data(spongelonglat) longlat <- spongelonglat[, 7:8] model <- sponge ~ long + I(long^2) y = spongelonglat[, 1] set.seed(1234) glmidwcv1 <- glmidwcv(formula = model, longlat = longlat, trainxy = spongelonglat, y = y, family = poisson, idp = 2, nmaxidw = 12, validation = "CV", predacc = "ALL") glmidwcv1 # glmidw for count data data(spongelonglat) longlat <- spongelonglat[, 7:8] model <- sponge ~ . # use all predictive variables in the dataset y = spongelonglat[, 1] set.seed(1234) n <- 20 # number of iterations,60 to 100 is recommended. VEcv <- NULL for (i in 1:n) { glmidwcv1 <- glmidwcv(formula = model, longlat = longlat, trainxy = spongelonglat, y = y, family = poisson, idp = 2, nmaxidw = 12, validation = "CV", predacc = "VEcv") VEcv [i] <- glmidwcv1 } plot(VEcv ~ c(1:n), xlab = "Iteration for GLM", ylab = "VEcv (%)") points(cumsum(VEcv) / c(1:n) ~ c(1:n), col = 2) abline(h = mean(VEcv), col = 'blue', lwd = 2) # glmidw for percentage data longlat <- petrel[, c(1, 2)] model <- gravel / 100 ~ lat + bathy + I(long^3) + I(lat^2) + I(lat^3) set.seed(1234) n <- 20 # number of iterations,60 to 100 is recommended. VEcv <- NULL for (i in 1:n) { glmidwcv1 <- glmcv(formula = model, longlat = longlat, trainxy = gravel, y = gravel[, 7] / 100, family = binomial(link=logit), idp = 2, nmaxidw = 12, validation = "CV", predacc = "VEcv") VEcv [i] <- glmidwcv1 } plot(VEcv ~ c(1:n), xlab = "Iteration for GLM", ylab = "VEcv (%)") points(cumsum(VEcv) / c(1:n) ~ c(1:n), col = 2) abline(h = mean(VEcv), col = 'blue', lwd = 2)library(spm) data(petrel) gravel <- petrel[, c(1, 2, 6:9, 5)] longlat <- petrel[, c(1, 2)] model <- log(gravel + 1) ~ lat + bathy + I(long^3) + I(lat^2) + I(lat^3) y <- log(gravel[, 7] +1) set.seed(1234) glmidwcv1 <- glmidwcv(formula = model, longlat = longlat, trainxy = gravel, y = y, idp = 2, nmaxidw = 12, validation = "CV", predacc = "ALL") glmidwcv1 # Since the default 'family' is used, actually a 'lm' model is used. data(spongelonglat) longlat <- spongelonglat[, 7:8] model <- sponge ~ long + I(long^2) y = spongelonglat[, 1] set.seed(1234) glmidwcv1 <- glmidwcv(formula = model, longlat = longlat, trainxy = spongelonglat, y = y, family = poisson, idp = 2, nmaxidw = 12, validation = "CV", predacc = "ALL") glmidwcv1 # glmidw for count data data(spongelonglat) longlat <- spongelonglat[, 7:8] model <- sponge ~ . # use all predictive variables in the dataset y = spongelonglat[, 1] set.seed(1234) n <- 20 # number of iterations,60 to 100 is recommended. VEcv <- NULL for (i in 1:n) { glmidwcv1 <- glmidwcv(formula = model, longlat = longlat, trainxy = spongelonglat, y = y, family = poisson, idp = 2, nmaxidw = 12, validation = "CV", predacc = "VEcv") VEcv [i] <- glmidwcv1 } plot(VEcv ~ c(1:n), xlab = "Iteration for GLM", ylab = "VEcv (%)") points(cumsum(VEcv) / c(1:n) ~ c(1:n), col = 2) abline(h = mean(VEcv), col = 'blue', lwd = 2) # glmidw for percentage data longlat <- petrel[, c(1, 2)] model <- gravel / 100 ~ lat + bathy + I(long^3) + I(lat^2) + I(lat^3) set.seed(1234) n <- 20 # number of iterations,60 to 100 is recommended. VEcv <- NULL for (i in 1:n) { glmidwcv1 <- glmcv(formula = model, longlat = longlat, trainxy = gravel, y = gravel[, 7] / 100, family = binomial(link=logit), idp = 2, nmaxidw = 12, validation = "CV", predacc = "VEcv") VEcv [i] <- glmidwcv1 } plot(VEcv ~ c(1:n), xlab = "Iteration for GLM", ylab = "VEcv (%)") points(cumsum(VEcv) / c(1:n) ~ c(1:n), col = 2) abline(h = mean(VEcv), col = 'blue', lwd = 2)
This function is for generating spatial predictions using the hybrid method of 'glm' and 'idw' ('glmidw') (see reference #1).
glmidwpred( formula = NULL, longlat, trainxy, y, longlatpredx, predx, family = "gaussian", idp = 2, nmaxidw = 12, ... )glmidwpred( formula = NULL, longlat, trainxy, y, longlatpredx, predx, family = "gaussian", idp = 2, nmaxidw = 12, ... )
formula |
a formula defining the response variable and predictive variables for 'glm'. |
longlat |
a dataframe contains longitude and latitude of point samples. The location information must be named as 'long' and 'lat'. |
trainxy |
a dataframe contains longitude (long), latitude (lat), predictive variables and the response variable of point samples. That is, the location information must be named as 'long' and 'lat'. |
y |
a vector of the response variable in the formula, that is, the left part of the formula. |
longlatpredx |
a dataframe contains longitude and latitude of point locations (i.e., the centers of grids) to be predicted. |
predx |
a dataframe or matrix contains columns of predictive variables for the grids to be predicted. |
family |
a description of the error distribution and link function to be used in the model. See '?glm' for details. |
idp |
a numeric number specifying the inverse distance weighting power. |
nmaxidw |
for a local predicting: the number of nearest observations that should be used for a prediction or simulation, where nearest is defined in terms of the space of the spatial locations. By default, 12 observations are used. |
... |
other arguments passed on to 'glm'. |
A dataframe of longitude, latitude, and predictions.
Jin Li
Li, J., Alvarez, B., Siwabessy, J., Tran, M., Huang, Z., Przeslawski, R., Radke, L., Howard, F. and Nichol, S. (2017). "Application of random forest, generalised linear model and their hybrid methods with geostatistical techniques to count data: Predicting sponge species richness." Environmental Modelling & Software 97: 112-129.
Pebesma, E.J., 2004. Multivariable geostatistics in S: the gstat package. Computers & Geosciences, 30: 683-691.
library(spm) data(petrel) data(petrel.grid) gravel <- petrel[, c(1, 2, 6:9, 5)] longlat <- petrel[, c(1, 2)] model <- log(gravel + 1) ~ lat + bathy + I(long^3) + I(lat^2) + I(lat^3) y <- log(gravel[, 7] +1) glmidwpred1 <- glmidwpred(formula = model, longlat = longlat, trainxy = gravel, y = y, longlatpredx = petrel.grid[, c(1:2)], predx = petrel.grid, idp = 2, nmaxidw = 12) # Since the default 'family' is used, actually a 'lm' model is used. names(glmidwpred1) # Back transform 'glmidwpred$predictions' to generate the final predictions glmidwpred1$predictions.bt <- exp(glmidwpred1$predictions) - 1 range(glmidwpred1$predictions.bt)library(spm) data(petrel) data(petrel.grid) gravel <- petrel[, c(1, 2, 6:9, 5)] longlat <- petrel[, c(1, 2)] model <- log(gravel + 1) ~ lat + bathy + I(long^3) + I(lat^2) + I(lat^3) y <- log(gravel[, 7] +1) glmidwpred1 <- glmidwpred(formula = model, longlat = longlat, trainxy = gravel, y = y, longlatpredx = petrel.grid[, c(1:2)], predx = petrel.grid, idp = 2, nmaxidw = 12) # Since the default 'family' is used, actually a 'lm' model is used. names(glmidwpred1) # Back transform 'glmidwpred$predictions' to generate the final predictions glmidwpred1$predictions.bt <- exp(glmidwpred1$predictions) - 1 range(glmidwpred1$predictions.bt)
This function is a cross validation function for the hybrid method of 'glm' and 'krige' (glmkrige), where 'krige' methods include ordinary kriging ('OK'), simple kriging ('SK'), block 'OK' ('BOK') and block 'SK'('BSK') (see reference #1 for further info).
glmkrigecv( formula.glm = NULL, longlat, trainxy, y, family = "gaussian", transformation = "none", delta = 1, formula.krige = res1 ~ 1, vgm.args = c("Sph"), anis = c(0, 1), alpha = 0, block = 0, beta, nmaxkrige = 12, validation = "CV", cv.fold = 10, predacc = "VEcv", ... )glmkrigecv( formula.glm = NULL, longlat, trainxy, y, family = "gaussian", transformation = "none", delta = 1, formula.krige = res1 ~ 1, vgm.args = c("Sph"), anis = c(0, 1), alpha = 0, block = 0, beta, nmaxkrige = 12, validation = "CV", cv.fold = 10, predacc = "VEcv", ... )
formula.glm |
a formula defining the response variable and predictive variables for 'glm'. |
longlat |
a dataframe contains longitude and latitude of point samples. |
trainxy |
a dataframe contains longitude (long), latitude (lat), predictive variables and the response variable of point samples. |
y |
a vector of the response variable in the formula, that is, the left part of the formula. |
family |
a description of the error distribution and link function to be used in the model. See '?glm' for details. |
transformation |
transform the residuals of 'glm' to normalise the data; can be "sqrt" for square root, "arcsine" for arcsine, "log" or "none" for non transformation. By default, "none" is used. |
delta |
numeric; to avoid log(0) in the log transformation. The default is 1. |
formula.krige |
formula defining the response vector and (possible) regressor. an object (i.e., 'variogram.formula') for 'variogram' or a formula for 'krige'. see 'variogram' and 'krige' in 'gstat' for details. |
vgm.args |
arguments for 'vgm', e.g. variogram model of response variable and anisotropy parameters. see 'vgm' in 'gstat' for details. By default, "Sph" is used. |
anis |
anisotropy parameters: see notes 'vgm' in 'gstat' for details. |
alpha |
direction in plane (x,y). see variogram in 'gstat' for details. |
block |
block size. see 'krige' in 'gstat' for details. |
beta |
for simple kriging. see 'krige' in 'gstat' for details. |
nmaxkrige |
for a local predicting: the number of nearest observations that should be used for a prediction or simulation, where nearest is defined in terms of the space of the spatial locations. By default, 12 observations are used. |
validation |
validation methods, include 'LOO': leave-one-out, and 'CV': cross-validation. |
cv.fold |
integer; number of folds in the cross-validation. if > 1, then apply n-fold cross validation; the default is 10, i.e., 10-fold cross validation that is recommended. |
predacc |
can be either "VEcv" for vecv or "ALL" for all measures in function pred.acc. |
... |
other arguments passed on to 'glm' and 'krige'. |
A list with the following components: me, rme, mae, rmae, mse, rmse, rrmse, vecv and e1; or vecv only
This function is largely based on 'rfcv' in 'randomForest', 'krigecv' in 'spm2'and 'glm' in 'stats'.
Jin Li
Li, J., Alvarez, B., Siwabessy, J., Tran, M., Huang, Z., Przeslawski, R., Radke, L., Howard, F. and Nichol, S. (2017). "Application of random forest, generalised linear model and their hybrid methods with geostatistical techniques to count data: Predicting sponge species richness." Environmental Modelling & Software 97: 112-129.
Pebesma, E.J., 2004. Multivariable geostatistics in S: the gstat package. Computers & Geosciences, 30: 683-691.
library(spm) data(petrel) gravel <- petrel[, c(1, 2, 6:9, 5)] longlat <- petrel[, c(1, 2)] model <- log(gravel + 1) ~ lat + bathy + I(long^3) + I(lat^2) + I(lat^3) y <- log(gravel[, 7] +1) set.seed(1234) glmkrigecv1 <- glmkrigecv(formula.glm = model, longlat = longlat, trainxy = gravel, y = y, transformation = "none", formula.krige = res1 ~ 1, vgm.args = "Sph", nmaxkrige = 12, validation = "CV", predacc = "ALL") glmkrigecv1 # Since the default 'family' is used, actually a 'lm' model is used. data(spongelonglat) longlat <- spongelonglat[, 7:8] model <- sponge ~ long + I(long^2) y = spongelonglat[, 1] set.seed(1234) glmkrigecv1 <- glmkrigecv(formula.glm = model, longlat = longlat, trainxy = spongelonglat, y = y, family = poisson, transformation = "arcsine", formula.krige = res1 ~ 1, vgm.args = ("Sph"), nmaxkrige = 12, validation = "CV", predacc = "ALL") glmkrigecv1 # glmok for count data data(spongelonglat) longlat <- spongelonglat[, 7:8] model <- sponge ~ . # use all predictive variables in the dataset y = spongelonglat[, 1] set.seed(1234) n <- 20 # number of iterations,60 to 100 is recommended. VEcv <- NULL for (i in 1:n) { glmkrigecv1 <- glmkrigecv(formula.glm = model, longlat = longlat, trainxy = spongelonglat, y = y, family = poisson, formula.krige = res1 ~ 1, vgm.args = ("Sph"), nmaxkrige = 12, validation = "CV", predacc = "VEcv") VEcv [i] <- glmkrigecv1 } plot(VEcv ~ c(1:n), xlab = "Iteration for GLM", ylab = "VEcv (%)") points(cumsum(VEcv) / c(1:n) ~ c(1:n), col = 2) abline(h = mean(VEcv), col = 'blue', lwd = 2) # glmok for percentage data longlat <- petrel[, c(1, 2)] model <- gravel / 100 ~ lat + bathy + I(long^3) + I(lat^2) + I(lat^3) set.seed(1234) n <- 20 # number of iterations,60 to 100 is recommended. VEcv <- NULL for (i in 1:n) { glmkrigecv1 <- glmkrigecv(formula.glm = model, longlat = longlat, trainxy = gravel, y = gravel[, 7] / 100, family = binomial(link=logit), formula.krige = res1 ~ 1, vgm.args = ("Sph"), nmaxkrige = 12, validation = "CV", predacc = "VEcv") VEcv [i] <- glmkrigecv1 } plot(VEcv ~ c(1:n), xlab = "Iteration for GLM", ylab = "VEcv (%)") points(cumsum(VEcv) / c(1:n) ~ c(1:n), col = 2) abline(h = mean(VEcv), col = 'blue', lwd = 2)library(spm) data(petrel) gravel <- petrel[, c(1, 2, 6:9, 5)] longlat <- petrel[, c(1, 2)] model <- log(gravel + 1) ~ lat + bathy + I(long^3) + I(lat^2) + I(lat^3) y <- log(gravel[, 7] +1) set.seed(1234) glmkrigecv1 <- glmkrigecv(formula.glm = model, longlat = longlat, trainxy = gravel, y = y, transformation = "none", formula.krige = res1 ~ 1, vgm.args = "Sph", nmaxkrige = 12, validation = "CV", predacc = "ALL") glmkrigecv1 # Since the default 'family' is used, actually a 'lm' model is used. data(spongelonglat) longlat <- spongelonglat[, 7:8] model <- sponge ~ long + I(long^2) y = spongelonglat[, 1] set.seed(1234) glmkrigecv1 <- glmkrigecv(formula.glm = model, longlat = longlat, trainxy = spongelonglat, y = y, family = poisson, transformation = "arcsine", formula.krige = res1 ~ 1, vgm.args = ("Sph"), nmaxkrige = 12, validation = "CV", predacc = "ALL") glmkrigecv1 # glmok for count data data(spongelonglat) longlat <- spongelonglat[, 7:8] model <- sponge ~ . # use all predictive variables in the dataset y = spongelonglat[, 1] set.seed(1234) n <- 20 # number of iterations,60 to 100 is recommended. VEcv <- NULL for (i in 1:n) { glmkrigecv1 <- glmkrigecv(formula.glm = model, longlat = longlat, trainxy = spongelonglat, y = y, family = poisson, formula.krige = res1 ~ 1, vgm.args = ("Sph"), nmaxkrige = 12, validation = "CV", predacc = "VEcv") VEcv [i] <- glmkrigecv1 } plot(VEcv ~ c(1:n), xlab = "Iteration for GLM", ylab = "VEcv (%)") points(cumsum(VEcv) / c(1:n) ~ c(1:n), col = 2) abline(h = mean(VEcv), col = 'blue', lwd = 2) # glmok for percentage data longlat <- petrel[, c(1, 2)] model <- gravel / 100 ~ lat + bathy + I(long^3) + I(lat^2) + I(lat^3) set.seed(1234) n <- 20 # number of iterations,60 to 100 is recommended. VEcv <- NULL for (i in 1:n) { glmkrigecv1 <- glmkrigecv(formula.glm = model, longlat = longlat, trainxy = gravel, y = gravel[, 7] / 100, family = binomial(link=logit), formula.krige = res1 ~ 1, vgm.args = ("Sph"), nmaxkrige = 12, validation = "CV", predacc = "VEcv") VEcv [i] <- glmkrigecv1 } plot(VEcv ~ c(1:n), xlab = "Iteration for GLM", ylab = "VEcv (%)") points(cumsum(VEcv) / c(1:n) ~ c(1:n), col = 2) abline(h = mean(VEcv), col = 'blue', lwd = 2)
This function is a cross validation function for 38 hybrid methods of 'glm', 'kriging' and 'IDW', including the average of 'glmkrige' and 'glmidw' ('glmkrigeglmidw') and the average of 'glm', 'glmkrige' and 'glmidw' ('glmglmkrigeglmidw'), where 'kriging' methods include ordinary kriging ('OK'), simple kriging ('SK'), block 'OK' ('BOK') and block 'SK'('BSK') and 'IDW' also covers 'NN' and 'KNN' (for details, see reference #1). This function can also be sued for 38 hybrid methods of 'lm', 'kriging' and 'IDW'.
glmkrigeidwcv( formula.glm = NULL, longlat, trainxy, y, family = "gaussian", transformation = "none", delta = 1, formula.krige = res1 ~ 1, vgm.args = c("Sph"), anis = c(0, 1), alpha = 0, block = 0, beta, nmaxkrige = 12, idp = 2, nmaxidw = 12, hybrid.parameter = 2, lambda = 1, validation = "CV", cv.fold = 10, predacc = "VEcv", ... )glmkrigeidwcv( formula.glm = NULL, longlat, trainxy, y, family = "gaussian", transformation = "none", delta = 1, formula.krige = res1 ~ 1, vgm.args = c("Sph"), anis = c(0, 1), alpha = 0, block = 0, beta, nmaxkrige = 12, idp = 2, nmaxidw = 12, hybrid.parameter = 2, lambda = 1, validation = "CV", cv.fold = 10, predacc = "VEcv", ... )
formula.glm |
a formula defining the response variable and predictive variables for 'glm'. |
longlat |
a dataframe contains longitude and latitude of point samples. |
trainxy |
a dataframe contains longitude (long), latitude (lat), predictive variables and the response variable of point samples. |
y |
a vector of the response variable in the formula, that is, the left part of the formula. |
family |
a description of the error distribution and link function to be used in the model. See '?glm' for details. |
transformation |
transform the residuals of 'glm' to normalise the data for 'krige'; can be "sqrt" for square root, "arcsine" for arcsine, "log" or "none" for non transformation. By default, "none" is used. |
delta |
numeric; to avoid log(0) in the log transformation. The default is 1. |
formula.krige |
formula defining the response vector and (possible) regressor. an object (i.e., 'variogram.formula') for 'variogram' or a formula for 'krige'. see 'variogram' and 'krige' in 'gstat' for details. |
vgm.args |
arguments for 'vgm', e.g. variogram model of response variable and anisotropy parameters. see 'vgm' in 'gstat' for details. By default, "Sph" is used. |
anis |
anisotropy parameters: see notes 'vgm' in 'gstat' for details. |
alpha |
direction in plane (x,y). see variogram in 'gstat' for details. |
block |
block size. see 'krige' in 'gstat' for details. |
beta |
for simple kriging. see 'krige' in 'gstat' for details. |
nmaxkrige |
for a local predicting: the number of nearest observations that should be used for a prediction or simulation, where nearest is defined in terms of the space of the spatial locations. By default, 12 observations are used. |
idp |
a numeric number specifying the inverse distance weighting power. |
nmaxidw |
for a local predicting: the number of nearest observations that should be used for a prediction or simulation, where nearest is defined in terms of the space of the spatial locations. By default, 12 observations are used. |
hybrid.parameter |
the default is 2 that is for 'glmkrigeglmidw'; for 'glmglmkrigeglmidw', it needs to be 3. |
lambda |
ranging from 0 to 2; the default is 1 for 'glmkrigeglmidw' and 'glmglmkrigeglmidw'; and if it is < 1, more weight is placed on 'krige', otherwise more weight is placed on 'idw'; and if it is 0, 'idw' is not considered and the resultant methods is 'glmkrige' when the default 'hybrid.parameter' is used; and if it is 2, then the resultant method is 'glmidw' when the default 'hybrid.parameter' is used. |
validation |
validation methods, include 'LOO': leave-one-out, and 'CV': cross-validation. |
cv.fold |
integer; number of folds in the cross-validation. if > 1, then apply n-fold cross validation; the default is 10, i.e., 10-fold cross validation that is recommended. |
predacc |
can be either "VEcv" for vecv or "ALL" for all measures in function pred.acc. |
... |
other arguments passed on to 'glm', 'krige' and 'gstat'. |
A list with the following components: me, rme, mae, rmae, mse, rmse, rrmse, vecv and e1; or vecv only
This function is largely based on 'rfcv' in 'randomForest', 'krigecv' in 'spm2'and 'glm' in 'stats'.
Jin Li
Li, J. (2022). Spatial Predictive Modeling with R. Boca Raton, Chapman and Hall/CRC.
Li, J., Alvarez, B., Siwabessy, J., Tran, M., Huang, Z., Przeslawski, R., Radke, L., Howard, F. and Nichol, S. (2017). "Application of random forest, generalised linear model and their hybrid methods with geostatistical techniques to count data: Predicting sponge species richness." Environmental Modelling & Software 97: 112-129.
A. Liaw and M. Wiener (2002). Classification and Regression by randomForest. R News 2(3), 18-22.
Pebesma, E.J., 2004. Multivariable geostatistics in S: the gstat package. Computers & Geosciences, 30: 683-691.
library(spm) # glmokglidw data(petrel) gravel <- petrel[, c(1, 2, 6:9, 5)] longlat <- petrel[, c(1, 2)] model <- log(gravel + 1) ~ lat + bathy + I(long^3) + I(lat^2) + I(lat^3) y <- log(gravel[, 7] +1) set.seed(1234) glmkrigeglmidwcv1 <- glmkrigeidwcv(formula.glm = model, longlat = longlat, trainxy = gravel, y = y, transformation = "none", formula.krige = res1 ~ 1, vgm.args = "Sph", nmaxkrige = 12, idp = 2, nmaxidw = 12, validation = "CV", predacc = "ALL") glmkrigeglmidwcv1 # Since the default 'family' is used, actually a 'lm' model is used. # glmokglmidw data(spongelonglat) longlat <- spongelonglat[, 7:8] model <- sponge ~ long + I(long^2) y = spongelonglat[, 1] set.seed(1234) glmkrigeglmidwcv1 <- glmkrigeidwcv(formula.glm = model, longlat = longlat, trainxy = spongelonglat, y = y, family = poisson, transformation = "arcsine", formula.krige = res1 ~ 1, vgm.args = ("Sph"), nmaxkrige = 12, idp = 2, nmaxidw = 12, validation = "CV", predacc = "ALL") glmkrigeglmidwcv1 # glmglmokglmidw data(spongelonglat) longlat <- spongelonglat[, 7:8] model <- sponge ~ long + I(long^2) y = spongelonglat[, 1] set.seed(1234) glmglmkrigeglmidwcv1 <- glmkrigeidwcv(formula.glm = model, longlat = longlat, trainxy = spongelonglat, y = y, family = poisson, transformation = "arcsine", formula.krige = res1 ~ 1, vgm.args = ("Sph"), nmaxkrige = 12, idp = 2, nmaxidw = 12, hybrid.parameter = 3, validation = "CV", predacc = "ALL") glmglmkrigeglmidwcv1 # glmokglidw for count data data(spongelonglat) longlat <- spongelonglat[, 7:8] model <- sponge ~ . # use all predictive variables in the dataset y = spongelonglat[, 1] set.seed(1234) n <- 20 # number of iterations,60 to 100 is recommended. VEcv <- NULL for (i in 1:n) { glmkrigeglmidwcv1 <- glmkrigeidwcv(formula.glm = model, longlat = longlat, trainxy = spongelonglat, y = y, family = poisson, formula.krige = res1 ~ 1, vgm.args = ("Sph"), nmaxkrige = 12, idp = 2, nmaxidw = 12, validation = "CV", predacc = "VEcv") VEcv [i] <- glmkrigeglmidwcv1 } plot(VEcv ~ c(1:n), xlab = "Iteration for GLM", ylab = "VEcv (%)") points(cumsum(VEcv) / c(1:n) ~ c(1:n), col = 2) abline(h = mean(VEcv), col = 'blue', lwd = 2) # glmokglmidw for percentage data longlat <- petrel[, c(1, 2)] model <- gravel / 100 ~ lat + bathy + I(long^3) + I(lat^2) + I(lat^3) set.seed(1234) n <- 20 # number of iterations,60 to 100 is recommended. VEcv <- NULL for (i in 1:n) { glmkrigeglmidwcv1 <- glmkrigeidwcv(formula.glm = model, longlat = longlat, trainxy = gravel, y = gravel[, 7] / 100, family = binomial(link=logit), formula.krige = res1 ~ 1, vgm.args = ("Sph"), nmaxkrige = 12, idp = 2, nmaxidw = 12, validation = "CV", predacc = "VEcv") VEcv [i] <- glmkrigeglmidwcv1 } plot(VEcv ~ c(1:n), xlab = "Iteration for GLM", ylab = "VEcv (%)") points(cumsum(VEcv) / c(1:n) ~ c(1:n), col = 2) abline(h = mean(VEcv), col = 'blue', lwd = 2)library(spm) # glmokglidw data(petrel) gravel <- petrel[, c(1, 2, 6:9, 5)] longlat <- petrel[, c(1, 2)] model <- log(gravel + 1) ~ lat + bathy + I(long^3) + I(lat^2) + I(lat^3) y <- log(gravel[, 7] +1) set.seed(1234) glmkrigeglmidwcv1 <- glmkrigeidwcv(formula.glm = model, longlat = longlat, trainxy = gravel, y = y, transformation = "none", formula.krige = res1 ~ 1, vgm.args = "Sph", nmaxkrige = 12, idp = 2, nmaxidw = 12, validation = "CV", predacc = "ALL") glmkrigeglmidwcv1 # Since the default 'family' is used, actually a 'lm' model is used. # glmokglmidw data(spongelonglat) longlat <- spongelonglat[, 7:8] model <- sponge ~ long + I(long^2) y = spongelonglat[, 1] set.seed(1234) glmkrigeglmidwcv1 <- glmkrigeidwcv(formula.glm = model, longlat = longlat, trainxy = spongelonglat, y = y, family = poisson, transformation = "arcsine", formula.krige = res1 ~ 1, vgm.args = ("Sph"), nmaxkrige = 12, idp = 2, nmaxidw = 12, validation = "CV", predacc = "ALL") glmkrigeglmidwcv1 # glmglmokglmidw data(spongelonglat) longlat <- spongelonglat[, 7:8] model <- sponge ~ long + I(long^2) y = spongelonglat[, 1] set.seed(1234) glmglmkrigeglmidwcv1 <- glmkrigeidwcv(formula.glm = model, longlat = longlat, trainxy = spongelonglat, y = y, family = poisson, transformation = "arcsine", formula.krige = res1 ~ 1, vgm.args = ("Sph"), nmaxkrige = 12, idp = 2, nmaxidw = 12, hybrid.parameter = 3, validation = "CV", predacc = "ALL") glmglmkrigeglmidwcv1 # glmokglidw for count data data(spongelonglat) longlat <- spongelonglat[, 7:8] model <- sponge ~ . # use all predictive variables in the dataset y = spongelonglat[, 1] set.seed(1234) n <- 20 # number of iterations,60 to 100 is recommended. VEcv <- NULL for (i in 1:n) { glmkrigeglmidwcv1 <- glmkrigeidwcv(formula.glm = model, longlat = longlat, trainxy = spongelonglat, y = y, family = poisson, formula.krige = res1 ~ 1, vgm.args = ("Sph"), nmaxkrige = 12, idp = 2, nmaxidw = 12, validation = "CV", predacc = "VEcv") VEcv [i] <- glmkrigeglmidwcv1 } plot(VEcv ~ c(1:n), xlab = "Iteration for GLM", ylab = "VEcv (%)") points(cumsum(VEcv) / c(1:n) ~ c(1:n), col = 2) abline(h = mean(VEcv), col = 'blue', lwd = 2) # glmokglmidw for percentage data longlat <- petrel[, c(1, 2)] model <- gravel / 100 ~ lat + bathy + I(long^3) + I(lat^2) + I(lat^3) set.seed(1234) n <- 20 # number of iterations,60 to 100 is recommended. VEcv <- NULL for (i in 1:n) { glmkrigeglmidwcv1 <- glmkrigeidwcv(formula.glm = model, longlat = longlat, trainxy = gravel, y = gravel[, 7] / 100, family = binomial(link=logit), formula.krige = res1 ~ 1, vgm.args = ("Sph"), nmaxkrige = 12, idp = 2, nmaxidw = 12, validation = "CV", predacc = "VEcv") VEcv [i] <- glmkrigeglmidwcv1 } plot(VEcv ~ c(1:n), xlab = "Iteration for GLM", ylab = "VEcv (%)") points(cumsum(VEcv) / c(1:n) ~ c(1:n), col = 2) abline(h = mean(VEcv), col = 'blue', lwd = 2)
This function is for generating spatial predictions using the hybrid methods of 'glm', 'kriging' and 'IDW', including all methods implemented in 'glmkrigeidwcv'.
glmkrigeidwpred( formula.glm = NULL, longlat, trainxy, predx, y, longlatpredx, family = "gaussian", transformation = "none", delta = 1, formula.krige = res1 ~ 1, vgm.args = c("Sph"), anis = c(0, 1), alpha = 0, block = 0, beta, nmaxkrige = 12, idp = 2, nmaxidw = 12, hybrid.parameter = 2, lambda = 1, ... )glmkrigeidwpred( formula.glm = NULL, longlat, trainxy, predx, y, longlatpredx, family = "gaussian", transformation = "none", delta = 1, formula.krige = res1 ~ 1, vgm.args = c("Sph"), anis = c(0, 1), alpha = 0, block = 0, beta, nmaxkrige = 12, idp = 2, nmaxidw = 12, hybrid.parameter = 2, lambda = 1, ... )
formula.glm |
a formula defining the response variable and predictive variables for 'glm'. |
longlat |
a dataframe contains longitude and latitude of point samples. |
trainxy |
a dataframe contains longitude (long), latitude (lat), predictive variables and the response variable of point samples. |
predx |
a dataframe or matrix contains columns of predictive variables for the grids to be predicted. |
y |
a vector of the response variable in the formula, that is, the left part of the formula. |
longlatpredx |
a dataframe contains longitude and latitude of point locations (i.e., the centers of grids) to be predicted. |
family |
a description of the error distribution and link function to be used in the model. See '?glm' for details. |
transformation |
transform the residuals of 'glm' to normalise the data; can be "sqrt" for square root, "arcsine" for arcsine, "log" or "none" for non transformation. By default, "none" is used. |
delta |
numeric; to avoid log(0) in the log transformation. The default is 1. |
formula.krige |
formula defining the response vector and (possible) regressor. an object (i.e., 'variogram.formula') for 'variogram' or a formula for 'krige'. see 'variogram' and 'krige' in 'gstat' for details. |
vgm.args |
arguments for 'vgm', e.g. variogram model of response variable and anisotropy parameters. see 'vgm' in 'gstat' for details. By default, "Sph" is used. |
anis |
anisotropy parameters: see notes 'vgm' in 'gstat' for details. |
alpha |
direction in plane (x,y). see variogram in 'gstat' for details. |
block |
block size. see 'krige' in 'gstat' for details. |
beta |
for simple kriging. see 'krige' in 'gstat' for details. |
nmaxkrige |
for a local predicting: the number of nearest observations that should be used for a prediction or simulation, where nearest is defined in terms of the space of the spatial locations. By default, 12 observations are used. |
idp |
a numeric number specifying the inverse distance weighting power. |
nmaxidw |
for a local predicting: the number of nearest observations that should be used for a prediction or simulation, where nearest is defined in terms of the space of the spatial locations. By default, 12 observations are used. |
hybrid.parameter |
the default is 2 that is for 'glmkrigeglmidw'; for 'glmglmkrigeglmidw', it needs to be 3. |
lambda |
ranging from 0 to 2; the default is 1 for 'glmkrigeglmidw' and 'glmglmkrigeglmidw'; and if it is < 1, more weight is placed on 'krige', otherwise more weight is placed on 'idw'; and if it is 0, 'idw' is not considered and the resultant methods is 'glmkrige' when the default 'hybrid.parameter' is used; and if it is 2, then the resultant method is 'glmidw' when the default 'hybrid.parameter' is used. |
... |
other arguments passed on to 'glm', 'krige' and 'gstat'. |
A dataframe of longitude, latitude, and predictions.
Jin Li
Li, J., Alvarez, B., Siwabessy, J., Tran, M., Huang, Z., Przeslawski, R., Radke, L., Howard, F. and Nichol, S. (2017). "Application of random forest, generalised linear model and their hybrid methods with geostatistical techniques to count data: Predicting sponge species richness." Environmental Modelling & Software 97: 112-129.
Pebesma, E.J., 2004. Multivariable geostatistics in S: the gstat package. Computers & Geosciences, 30: 683-691.
library(spm) data(petrel) data(petrel.grid) gravel <- petrel[, c(1, 2, 6:9, 5)] longlat <- petrel[, c(1, 2)] model <- log(gravel + 1) ~ lat + bathy + I(long^3) + I(lat^2) + I(lat^3) y <- log(gravel[, 7] +1) glmkrigeidwpred1 <- glmkrigeidwpred(formula.glm = model, longlat = longlat, trainxy = gravel, predx = petrel.grid, y = y, longlatpredx = petrel.grid[, c(1:2)], formula.krige = res1 ~ 1, vgm.args = "Sph", nmaxkrige = 12, idp = 2, nmaxidw = 12) # Since the default 'family' is used, actually a 'lm' model is used. names(glmkrigeidwpred1) # Back transform 'glmkrigeidwpred$predictions' to generate the final predictions glmkrigeidw.predictions <- exp(glmkrigeidwpred1$predictions) - 1 range(glmkrigeidw.predictions)library(spm) data(petrel) data(petrel.grid) gravel <- petrel[, c(1, 2, 6:9, 5)] longlat <- petrel[, c(1, 2)] model <- log(gravel + 1) ~ lat + bathy + I(long^3) + I(lat^2) + I(lat^3) y <- log(gravel[, 7] +1) glmkrigeidwpred1 <- glmkrigeidwpred(formula.glm = model, longlat = longlat, trainxy = gravel, predx = petrel.grid, y = y, longlatpredx = petrel.grid[, c(1:2)], formula.krige = res1 ~ 1, vgm.args = "Sph", nmaxkrige = 12, idp = 2, nmaxidw = 12) # Since the default 'family' is used, actually a 'lm' model is used. names(glmkrigeidwpred1) # Back transform 'glmkrigeidwpred$predictions' to generate the final predictions glmkrigeidw.predictions <- exp(glmkrigeidwpred1$predictions) - 1 range(glmkrigeidw.predictions)
This function is for generating spatial predictions using the hybrid method of 'glm' and 'krige', including all methods implemented in 'glmkrigecv'. (see reference #1 for further info).
glmkrigepred( formula.glm = NULL, longlat, trainxy, predx, y, longlatpredx, family = "gaussian", transformation = "none", delta = 1, formula.krige = res1 ~ 1, vgm.args = c("Sph"), anis = c(0, 1), alpha = 0, block = 0, beta, nmaxkrige = 12, ... )glmkrigepred( formula.glm = NULL, longlat, trainxy, predx, y, longlatpredx, family = "gaussian", transformation = "none", delta = 1, formula.krige = res1 ~ 1, vgm.args = c("Sph"), anis = c(0, 1), alpha = 0, block = 0, beta, nmaxkrige = 12, ... )
formula.glm |
a formula defining the response variable and predictive variables for 'glm'. |
longlat |
a dataframe contains longitude and latitude of point samples. |
trainxy |
a dataframe contains longitude (long), latitude (lat), predictive variables and the response variable of point samples. |
predx |
a dataframe or matrix contains columns of predictive variables for the grids to be predicted. |
y |
a vector of the response variable in the formula, that is, the left part of the formula. |
longlatpredx |
a dataframe contains longitude and latitude of point locations (i.e., the centers of grids) to be predicted. |
family |
a description of the error distribution and link function to be used in the model. See '?glm' for details. |
transformation |
transform the residuals of 'glm' to normalise the data; can be "sqrt" for square root, "arcsine" for arcsine, "log" or "none" for non transformation. By default, "none" is used. |
delta |
numeric; to avoid log(0) in the log transformation. The default is 1. |
formula.krige |
formula defining the response vector and (possible) regressor. an object (i.e., 'variogram.formula') for 'variogram' or a formula for 'krige'. see 'variogram' and 'krige' in 'gstat' for details. |
vgm.args |
arguments for 'vgm', e.g. variogram model of response variable and anisotropy parameters. see 'vgm' in 'gstat' for details. By default, "Sph" is used. |
anis |
anisotropy parameters: see notes 'vgm' in 'gstat' for details. |
alpha |
direction in plane (x,y). see variogram in 'gstat' for details. |
block |
block size. see 'krige' in 'gstat' for details. |
beta |
for simple kriging. see 'krige' in 'gstat' for details. |
nmaxkrige |
for a local predicting: the number of nearest observations that should be used for a prediction or simulation, where nearest is defined in terms of the space of the spatial locations. By default, 12 observations are used. |
... |
other arguments passed on to 'glm' and 'krige'. |
A dataframe of longitude, latitude, and predictions.
Jin Li
Li, J., Alvarez, B., Siwabessy, J., Tran, M., Huang, Z., Przeslawski, R., Radke, L., Howard, F. and Nichol, S. (2017). "Application of random forest, generalised linear model and their hybrid methods with geostatistical techniques to count data: Predicting sponge species richness." Environmental Modelling & Software 97: 112-129.
Pebesma, E.J., 2004. Multivariable geostatistics in S: the gstat package. Computers & Geosciences, 30: 683-691.
library(spm) data(petrel) data(petrel.grid) gravel <- petrel[, c(1, 2, 6:9, 5)] longlat <- petrel[, c(1, 2)] model <- log(gravel + 1) ~ lat + bathy + I(long^3) + I(lat^2) + I(lat^3) y <- log(gravel[, 7] +1) glmkrigepred1 <- glmkrigepred(formula.glm = model, longlat = longlat, trainxy = gravel, predx = petrel.grid, y = y, longlatpredx = petrel.grid[, c(1:2)], transformation = "none", formula.krige = res1 ~ 1, vgm.args = "Sph", nmaxkrige = 12) # Since the default 'family' is used, actually a 'lm' model is used. names(glmkrigepred1) # Back transform 'glmkrigepred$predictions' to generate the final predictions glmkrige.predictions <- exp(glmkrigepred1$predictions) - 1 range(glmkrige.predictions)library(spm) data(petrel) data(petrel.grid) gravel <- petrel[, c(1, 2, 6:9, 5)] longlat <- petrel[, c(1, 2)] model <- log(gravel + 1) ~ lat + bathy + I(long^3) + I(lat^2) + I(lat^3) y <- log(gravel[, 7] +1) glmkrigepred1 <- glmkrigepred(formula.glm = model, longlat = longlat, trainxy = gravel, predx = petrel.grid, y = y, longlatpredx = petrel.grid[, c(1:2)], transformation = "none", formula.krige = res1 ~ 1, vgm.args = "Sph", nmaxkrige = 12) # Since the default 'family' is used, actually a 'lm' model is used. names(glmkrigepred1) # Back transform 'glmkrigepred$predictions' to generate the final predictions glmkrige.predictions <- exp(glmkrigepred1$predictions) - 1 range(glmkrige.predictions)
This function is a cross validation function for 'glmnet' method in 'glmnet' package.
glmnetcv( trainx, y, family = "gaussian", alpha = 0.5, relax = FALSE, type.measure = "mse", validation = "CV", cv.fold = 10, predacc = "VEcv", ... )glmnetcv( trainx, y, family = "gaussian", alpha = 0.5, relax = FALSE, type.measure = "mse", validation = "CV", cv.fold = 10, predacc = "VEcv", ... )
trainx |
a matrix contains predictive variables of point samples. The location information, longitude (long), latitude (lat), need to be included in the 'trainx' for spatial predictive modelling. |
y |
a vector of the response variable in the formula, that is, the left part of the formula. |
family |
a description of the error distribution and link function to be used in the model. See '?glmnet' for details. |
alpha |
an elasticnet mixing parameter, with $0 <= alpha <= 1$. See '?glmnet' for details. |
relax |
if TRUE then for each active set in the path of solutions, the model is refit without any regularization. See '?glmnet' for more information. |
type.measure |
loss to use for cross-validation. The default is type.measure="mse". See '?cv.glmnet' for more information. |
validation |
validation methods, include 'LOO': leave-one-out, and 'CV': cross-validation. |
cv.fold |
integer; number of folds in the cross-validation. if > 1, then apply n-fold cross validation; the default is 10, i.e., 10-fold cross validation that is recommended. |
predacc |
can be either "VEcv" for vecv or "ALL" for all measures in function pred.acc. |
... |
other arguments passed on to 'fields'. |
A list with the following components: me, rme, mae, rmae, mse, rmse, rrmse, vecv and e1; or vecv only
This function is largely based on 'glmcv' in this 'spm2' package.
Jin Li
A. Liaw and M. Wiener (2002). Classification and Regression by randomForest. R News 2(3), 18-22.
library(spm) data(petrel) x <- as.matrix(petrel[, c(1, 2, 6:9)]) y <- log(petrel[, 5] + 1) set.seed(1234) glmnetcv1 <- glmnetcv(x, y, validation = "CV", predacc = "ALL") glmnetcv1 data(sponge) x <- as.matrix(cbind(sponge$easting, sponge$easting^2)) set.seed(1234) glmnetcv1 <- glmnetcv(x, sponge[, 3], family = poisson, validation = "CV", predacc = "ALL") glmnetcv1 # For glmnet with gaussian x <- as.matrix(petrel[, c(1, 2, 6:9)]) y <- log(petrel[, 5] + 1) set.seed(1234) n <- 20 # number of iterations,60 to 100 is recommended. VEcv <- NULL for (i in 1:n) { glmnetcv1 <- glmnetcv(x, y, validation = "CV", predacc = "VEcv") VEcv [i] <- glmnetcv1 } plot(VEcv ~ c(1:n), xlab = "Iteration for glmnet", ylab = "VEcv (%)") points(cumsum(VEcv) / c(1:n) ~ c(1:n), col = 2) abline(h = mean(VEcv), col = 'blue', lwd = 2) # For glmnet with binomial x <- as.matrix(cbind(petrel[, c(2, 6)], petrel$long^3, petrel$lat^2, petrel$lat^3)) set.seed(1234) n <- 20 # number of iterations,60 to 100 is recommended. VEcv <- NULL for (i in 1:n) { glmnetcv1 <- glmnetcv(x, petrel[, 5] / 100, family = binomial(link=logit), validation = "CV", predacc = "VEcv") VEcv [i] <- glmnetcv1 } plot(VEcv ~ c(1:n), xlab = "Iteration for glmnet", ylab = "VEcv (%)") points(cumsum(VEcv) / c(1:n) ~ c(1:n), col = 2) abline(h = mean(VEcv), col = 'blue', lwd = 2)library(spm) data(petrel) x <- as.matrix(petrel[, c(1, 2, 6:9)]) y <- log(petrel[, 5] + 1) set.seed(1234) glmnetcv1 <- glmnetcv(x, y, validation = "CV", predacc = "ALL") glmnetcv1 data(sponge) x <- as.matrix(cbind(sponge$easting, sponge$easting^2)) set.seed(1234) glmnetcv1 <- glmnetcv(x, sponge[, 3], family = poisson, validation = "CV", predacc = "ALL") glmnetcv1 # For glmnet with gaussian x <- as.matrix(petrel[, c(1, 2, 6:9)]) y <- log(petrel[, 5] + 1) set.seed(1234) n <- 20 # number of iterations,60 to 100 is recommended. VEcv <- NULL for (i in 1:n) { glmnetcv1 <- glmnetcv(x, y, validation = "CV", predacc = "VEcv") VEcv [i] <- glmnetcv1 } plot(VEcv ~ c(1:n), xlab = "Iteration for glmnet", ylab = "VEcv (%)") points(cumsum(VEcv) / c(1:n) ~ c(1:n), col = 2) abline(h = mean(VEcv), col = 'blue', lwd = 2) # For glmnet with binomial x <- as.matrix(cbind(petrel[, c(2, 6)], petrel$long^3, petrel$lat^2, petrel$lat^3)) set.seed(1234) n <- 20 # number of iterations,60 to 100 is recommended. VEcv <- NULL for (i in 1:n) { glmnetcv1 <- glmnetcv(x, petrel[, 5] / 100, family = binomial(link=logit), validation = "CV", predacc = "VEcv") VEcv [i] <- glmnetcv1 } plot(VEcv ~ c(1:n), xlab = "Iteration for glmnet", ylab = "VEcv (%)") points(cumsum(VEcv) / c(1:n) ~ c(1:n), col = 2) abline(h = mean(VEcv), col = 'blue', lwd = 2)
This function is for generating spatial predictions using 'glm' method in 'stats' package.
glmpred(formula = NULL, trainxy, longlatpredx, predx, family = "gaussian", ...)glmpred(formula = NULL, trainxy, longlatpredx, predx, family = "gaussian", ...)
formula |
a formula defining the response variable and predictive variables. |
trainxy |
a dataframe contains predictive variables and the response variable of point samples. The location information, longitude (long), latitude (lat), need to be included in the 'trainx' for spatial predictive modeling, need to be named as 'long' and 'lat'. |
longlatpredx |
a dataframe contains longitude and latitude of point locations (i.e., the centers of grids) to be predicted. |
predx |
a dataframe or matrix contains columns of predictive variables for the grids to be predicted. |
family |
a description of the error distribution and link function to be used in the model. See '?glm' for details. |
... |
other arguments passed on to 'glm'. |
A dataframe of longitude, latitude and predictions.
Jin Li
library(spm) data(petrel) data(petrel.grid) gravel <- petrel[, c(1, 2, 6:9, 5)] model <- log(gravel + 1) ~ lat + bathy + I(long^3) + I(lat^2) + I(lat^3) glmpred1 <- glmpred(formula = model, trainxy = gravel, longlatpredx = petrel.grid[, c(1:2)], predx = petrel.grid) names(glmpred1) # Back transform 'glmpred1$pred.glm1' to generate the final predictions glm.predictions <- exp(glmpred1$pred.glm1) - 1 range(glm.predictions)library(spm) data(petrel) data(petrel.grid) gravel <- petrel[, c(1, 2, 6:9, 5)] model <- log(gravel + 1) ~ lat + bathy + I(long^3) + I(lat^2) + I(lat^3) glmpred1 <- glmpred(formula = model, trainxy = gravel, longlatpredx = petrel.grid[, c(1:2)], predx = petrel.grid) names(glmpred1) # Back transform 'glmpred1$pred.glm1' to generate the final predictions glm.predictions <- exp(glmpred1$pred.glm1) - 1 range(glm.predictions)
This function is a cross validation function for 'gls' method in 'nlme' package.
glscv( model = var1 ~ 1, trainxy, y, corr.args = NULL, weights = NULL, validation = "CV", cv.fold = 10, predacc = "VEcv", ... )glscv( model = var1 ~ 1, trainxy, y, corr.args = NULL, weights = NULL, validation = "CV", cv.fold = 10, predacc = "VEcv", ... )
model |
a formula defining the response variable and predictive variables. |
trainxy |
a dataframe contains longitude (long), latitude (lat), predictive variables and the response variable of point samples. That is, the location information must be names as 'long' and 'lat'. |
y |
a vector of the response variable in the formula, that is, the left part of the formula. |
corr.args |
arguments for 'correlation' in 'gls'. See '?corClasses' in 'nlme' for details. By default, "NULL" is used. When "NULL" is used, then 'gls' is actually performing 'lm'. |
weights |
describing the within-group heteroscedasticity structure. Defaults to "NULL", corresponding to homoscedastic errors. See '?gls' in 'nlme' for details. |
validation |
validation methods, include 'LOO': leave-one-out, and 'CV': cross-validation. |
cv.fold |
integer; number of folds in the cross-validation. if > 1, then apply n-fold cross validation; the default is 10, i.e., 10-fold cross validation that is recommended. |
predacc |
can be either "VEcv" for vecv or "ALL" for all measures in function pred.acc. |
... |
other arguments passed on to 'gls'. |
A list with the following components: me, rme, mae, rmae, mse, rmse, rrmse, vecv and e1; or vecv only
This function is largely based on rfcv in 'randomForest' and 'gls' in 'library(nlme)'.
Jin Li
Pinheiro, J. C. and D. M. Bates (2000). Mixed-Effects Models in S and S-PLUS. New York, Springer.
library(spm) library(nlme) data(petrel) gravel <- petrel[, c(1, 2, 6:9, 5)] range1 <- 0.8 nugget1 <- 0.5 model <- log(gravel + 1) ~ long + lat + bathy + dist + I(long^2) + I(lat^2) + I(lat^3) + I(bathy^2) + I(bathy^3) + I(dist^2) + I(dist^3) + I(relief^2) + I(relief^3) glscv1 <- glscv(model = model, gravel, log(gravel[, 7] +1), validation = "CV", corr.args = corSpher(c(range1, nugget1), form = ~ long + lat, nugget = TRUE), predacc = "ALL") glscv1 #For gls set.seed(1234) n <- 20 # number of iterations,60 to 100 is recommended. VEcv <- NULL for (i in 1:n) { glscv1 <- glscv(model = model, gravel, log(gravel[, 7] +1), validation = "CV", corr.args = corSpher(c(range1, nugget1), form = ~ long + lat, nugget = TRUE), predacc = "VEcv") VEcv [i] <- glscv1 } plot(VEcv ~ c(1:n), xlab = "Iteration for GLS", ylab = "VEcv (%)") points(cumsum(VEcv) / c(1:n) ~ c(1:n), col = 2) abline(h = mean(VEcv), col = 'blue', lwd = 2) # For lm, that is, gls with 'correlation = NULL' n <- 20 # number of iterations,60 to 100 is recommended. VEcv <- NULL set.seed(1234) for (i in 1:n) { glscv1 <- glscv(model = model, gravel, log(gravel[, 7] +1), validation = "CV", predacc = "VEcv") VEcv [i] <- glscv1 } plot(VEcv ~ c(1:n), xlab = "Iteration for GLS", ylab = "VEcv (%)") points(cumsum(VEcv) / c(1:n) ~ c(1:n), col = 2) abline(h = mean(VEcv), col = 'blue', lwd = 2)library(spm) library(nlme) data(petrel) gravel <- petrel[, c(1, 2, 6:9, 5)] range1 <- 0.8 nugget1 <- 0.5 model <- log(gravel + 1) ~ long + lat + bathy + dist + I(long^2) + I(lat^2) + I(lat^3) + I(bathy^2) + I(bathy^3) + I(dist^2) + I(dist^3) + I(relief^2) + I(relief^3) glscv1 <- glscv(model = model, gravel, log(gravel[, 7] +1), validation = "CV", corr.args = corSpher(c(range1, nugget1), form = ~ long + lat, nugget = TRUE), predacc = "ALL") glscv1 #For gls set.seed(1234) n <- 20 # number of iterations,60 to 100 is recommended. VEcv <- NULL for (i in 1:n) { glscv1 <- glscv(model = model, gravel, log(gravel[, 7] +1), validation = "CV", corr.args = corSpher(c(range1, nugget1), form = ~ long + lat, nugget = TRUE), predacc = "VEcv") VEcv [i] <- glscv1 } plot(VEcv ~ c(1:n), xlab = "Iteration for GLS", ylab = "VEcv (%)") points(cumsum(VEcv) / c(1:n) ~ c(1:n), col = 2) abline(h = mean(VEcv), col = 'blue', lwd = 2) # For lm, that is, gls with 'correlation = NULL' n <- 20 # number of iterations,60 to 100 is recommended. VEcv <- NULL set.seed(1234) for (i in 1:n) { glscv1 <- glscv(model = model, gravel, log(gravel[, 7] +1), validation = "CV", predacc = "VEcv") VEcv [i] <- glscv1 } plot(VEcv ~ c(1:n), xlab = "Iteration for GLS", ylab = "VEcv (%)") points(cumsum(VEcv) / c(1:n) ~ c(1:n), col = 2) abline(h = mean(VEcv), col = 'blue', lwd = 2)
This function is a cross validation function for the hybrid method of 'gls' and 'idw', where the data splitting is based on a stratified random sampling method (see the 'datasplit' function for details)
glsidwcv( model = var1 ~ 1, longlat, trainxy, y, corr.args = NULL, weights = NULL, idp = 2, nmaxidw = 12, validation = "CV", cv.fold = 10, predacc = "VEcv", ... )glsidwcv( model = var1 ~ 1, longlat, trainxy, y, corr.args = NULL, weights = NULL, idp = 2, nmaxidw = 12, validation = "CV", cv.fold = 10, predacc = "VEcv", ... )
model |
a formula defining the response variable and predictive variables. |
longlat |
a dataframe contains longitude and latitude of point samples. |
trainxy |
a dataframe contains longitude (long), latitude (lat), predictive variables and the response variable of point samples. That is, the location information must be names as 'long' and 'lat'. |
y |
a vector of the response variable in the formula, that is, the left part of the formula. |
corr.args |
arguments for 'correlation' in 'gls'. See '?corClasses' in 'nlme' for details. By default, "NULL" is used. When "NULL" is used, then 'gls' is actually performing 'lm'. |
weights |
describing the within-group heteroscedasticity structure. Defaults to "NULL", corresponding to homoscedastic errors. See '?gls' in 'nlme' for details. |
idp |
a numeric number specifying the inverse distance weighting power. |
nmaxidw |
for a local predicting: the number of nearest observations that should be used for a prediction or simulation, where nearest is defined in terms of the space of the spatial locations. By default, 12 observations are used. |
validation |
validation methods, include 'LOO': leave-one-out, and 'CV': cross-validation. |
cv.fold |
integer; number of folds in the cross-validation. if > 1, then apply n-fold cross validation; the default is 10, i.e., 10-fold cross validation that is recommended. |
predacc |
can be either "VEcv" for vecv or "ALL" for all measures in function pred.acc. |
... |
other arguments passed on to 'gls' and 'gstat'. |
A list with the following components: me, rme, mae, rmae, mse, rmse, rrmse, vecv and e1; or vecv only.
This function is largely based on rfcv in 'randomForest' and 'gls' in 'library(nlme)'.
Jin Li
Pinheiro, J. C. and D. M. Bates (2000). Mixed-Effects Models in S and S-PLUS. New York, Springer.
Pebesma, E.J., 2004. Multivariable geostatistics in S: the gstat package. Computers & Geosciences, 30: 683-691.
library(spm) library(nlme) data(petrel) gravel <- petrel[, c(1, 2, 6:9, 5)] longlat <- petrel[, c(1, 2)] range1 <- 0.8 nugget1 <- 0.5 model <- log(gravel + 1) ~ long + lat + bathy + dist + I(long^2) + I(lat^2) + I(lat^3) + I(bathy^2) + I(bathy^3) + I(dist^2) + I(dist^3) + I(relief^2) + I(relief^3) glsidwcv1 <- glsidwcv(model = model, longlat = longlat, trainxy = gravel, y = log(gravel[, 7] +1), idp = 2, nmaxidw = 12, validation = "CV", corr.args = corSpher(c(range1, nugget1), form = ~ lat + long, nugget = TRUE), predacc = "ALL") glsidwcv1 # For glsidw set.seed(1234) n <- 20 # number of iterations,60 to 100 is recommended. VEcv <- NULL for (i in 1:n) { glsidwcv1 <- glsidwcv(model = model, longlat = longlat, trainxy = gravel, y = log(gravel[, 7] +1), idp = 2, nmaxidw = 12, validation = "CV", corr.args = corSpher(c(range1, nugget1), form = ~ lat + long, nugget = TRUE), predacc = "VEcv") VEcv [i] <- glsidwcv1 } plot(VEcv ~ c(1:n), xlab = "Iteration for GLSIDW", ylab = "VEcv (%)") points(cumsum(VEcv) / c(1:n) ~ c(1:n), col = 2) abline(h = mean(VEcv), col = 'blue', lwd = 2)library(spm) library(nlme) data(petrel) gravel <- petrel[, c(1, 2, 6:9, 5)] longlat <- petrel[, c(1, 2)] range1 <- 0.8 nugget1 <- 0.5 model <- log(gravel + 1) ~ long + lat + bathy + dist + I(long^2) + I(lat^2) + I(lat^3) + I(bathy^2) + I(bathy^3) + I(dist^2) + I(dist^3) + I(relief^2) + I(relief^3) glsidwcv1 <- glsidwcv(model = model, longlat = longlat, trainxy = gravel, y = log(gravel[, 7] +1), idp = 2, nmaxidw = 12, validation = "CV", corr.args = corSpher(c(range1, nugget1), form = ~ lat + long, nugget = TRUE), predacc = "ALL") glsidwcv1 # For glsidw set.seed(1234) n <- 20 # number of iterations,60 to 100 is recommended. VEcv <- NULL for (i in 1:n) { glsidwcv1 <- glsidwcv(model = model, longlat = longlat, trainxy = gravel, y = log(gravel[, 7] +1), idp = 2, nmaxidw = 12, validation = "CV", corr.args = corSpher(c(range1, nugget1), form = ~ lat + long, nugget = TRUE), predacc = "VEcv") VEcv [i] <- glsidwcv1 } plot(VEcv ~ c(1:n), xlab = "Iteration for GLSIDW", ylab = "VEcv (%)") points(cumsum(VEcv) / c(1:n) ~ c(1:n), col = 2) abline(h = mean(VEcv), col = 'blue', lwd = 2)
This function is for generating spatial predictions using the hybrid method of 'gls' and 'idw' ('glsidw') (see reference #1).
glsidwpred( model = var1 ~ 1, longlat, trainxy, y, longlatpredx, predx, corr.args = NULL, weights = NULL, idp = 2, nmaxidw = 12, ... )glsidwpred( model = var1 ~ 1, longlat, trainxy, y, longlatpredx, predx, corr.args = NULL, weights = NULL, idp = 2, nmaxidw = 12, ... )
model |
a formula defining the response variable and predictive variables. |
longlat |
a dataframe contains longitude and latitude of point samples. |
trainxy |
a dataframe contains longitude (long), latitude (lat), predictive variables and the response variable of point samples. That is, the location information must be names as 'long' and 'lat'. |
y |
a vector of the response variable in the formula, that is, the left part of the formula. |
longlatpredx |
a dataframe contains longitude and latitude of point locations (i.e., the centers of grids) to be predicted. The location information must be named as 'long' and 'lat'. |
predx |
a dataframe or matrix contains columns of predictive variables for the grids to be predicted. |
corr.args |
arguments for 'correlation' in 'gls'. See '?corClasses' in 'nlme' for details. By default, "NULL" is used. When "NULL" is used, then 'gls' is actually performing 'lm'. |
weights |
describing the within-group heteroscedasticity structure. Defaults to "NULL", corresponding to homoscedastic errors. See '?gls' in 'nlme' for details. |
idp |
a numeric number specifying the inverse distance weighting power. |
nmaxidw |
for a local predicting: the number of nearest observations that should be used for a prediction or simulation, where nearest is defined in terms of the space of the spatial locations. By default, 12 observations are used. |
... |
other arguments passed on to 'gls' and 'gstat'. |
A dataframe of longitude, latitude, and predictions.
Jin Li
Pinheiro, J. C. and D. M. Bates (2000). Mixed-Effects Models in S and S-PLUS. New York, Springer.
Pebesma, E.J., 2004. Multivariable geostatistics in S: the gstat package. Computers & Geosciences, 30: 683-691.
library(spm) library(nlme) data(petrel) data(petrel.grid) gravel <- petrel[, c(1, 2, 6:9, 5)] longlat <- petrel[, c(1, 2)] range1 <- 0.8 nugget1 <- 0.5 model <- log(gravel + 1) ~ long + lat + bathy + dist + I(long^2) + I(lat^2) + I(lat^3) + I(bathy^2) + I(bathy^3) + I(dist^2) + I(dist^3) + I(relief^2) + I(relief^3) y <- log(gravel[, 7] +1) glsidwpred1 <- glsidwpred(model = model, longlat = longlat, trainxy = gravel, y = y, longlatpredx = petrel.grid[, c(1:2)], predx = petrel.grid, idp = 2, nmaxidw = 12, corr.args = corSpher(c(range1, nugget1), form = ~ lat + long, nugget = TRUE)) names(glsidwpred1) # Back transform 'glsidwpred$predictions' to generate the final predictions glsidw.predictions <- exp(glsidwpred1$predictions) - 1 range(glsidw.predictions)library(spm) library(nlme) data(petrel) data(petrel.grid) gravel <- petrel[, c(1, 2, 6:9, 5)] longlat <- petrel[, c(1, 2)] range1 <- 0.8 nugget1 <- 0.5 model <- log(gravel + 1) ~ long + lat + bathy + dist + I(long^2) + I(lat^2) + I(lat^3) + I(bathy^2) + I(bathy^3) + I(dist^2) + I(dist^3) + I(relief^2) + I(relief^3) y <- log(gravel[, 7] +1) glsidwpred1 <- glsidwpred(model = model, longlat = longlat, trainxy = gravel, y = y, longlatpredx = petrel.grid[, c(1:2)], predx = petrel.grid, idp = 2, nmaxidw = 12, corr.args = corSpher(c(range1, nugget1), form = ~ lat + long, nugget = TRUE)) names(glsidwpred1) # Back transform 'glsidwpred$predictions' to generate the final predictions glsidw.predictions <- exp(glsidwpred1$predictions) - 1 range(glsidw.predictions)
This function is a cross validation function for the hybrid method of 'gls' and 'krige' ('glskrige'), where the data splitting is based on a stratified random sampling method (see the 'datasplit' function for details)
glskrigecv( model = var1 ~ 1, longlat, trainxy, y, corr.args = NULL, weights = NULL, transformation = "none", delta = 1, formula.krige = res1 ~ 1, vgm.args = c("Sph"), anis = c(0, 1), alpha = 0, block = 0, beta, nmaxkrige = 12, validation = "CV", cv.fold = 10, predacc = "VEcv", ... )glskrigecv( model = var1 ~ 1, longlat, trainxy, y, corr.args = NULL, weights = NULL, transformation = "none", delta = 1, formula.krige = res1 ~ 1, vgm.args = c("Sph"), anis = c(0, 1), alpha = 0, block = 0, beta, nmaxkrige = 12, validation = "CV", cv.fold = 10, predacc = "VEcv", ... )
model |
a formula defining the response variable and predictive variables. |
longlat |
a dataframe contains longitude and latitude of point samples. |
trainxy |
a dataframe contains longitude (long), latitude (lat), predictive variables and the response variable of point samples. That is, the location information must be names as 'long' and 'lat'. |
y |
a vector of the response variable in the formula, that is, the left part of the formula. |
corr.args |
arguments for 'correlation' in 'gls'. See '?corClasses' in 'nlme' for details. By default, "NULL" is used. When "NULL" is used, then 'gls' is actually performing 'lm'. |
weights |
describing the within-group heteroscedasticity structure. Defaults to "NULL", corresponding to homoscedastic errors. See '?gls' in 'nlme' for details. |
transformation |
transform the residuals of 'gls' to normalize the data; can be "sqrt" for square root, "arcsine" for arcsine, "log" or "none" for non transformation. By default, "none" is used. |
delta |
numeric; to avoid log(0) in the log transformation. The default is 1. |
formula.krige |
formula defining the response vector and (possible) regressor. an object (i.e., 'variogram.formula') for 'variogram' or a formula for 'krige'. see 'variogram' and 'krige' in 'gstat' for details. |
vgm.args |
arguments for 'vgm', e.g. variogram model of response variable and anisotropy parameters. see 'vgm' in 'gstat' for details. By default, "Sph" is used. |
anis |
anisotropy parameters: see notes 'vgm' in 'gstat' for details. |
alpha |
direction in plane (x,y). see variogram in 'gstat' for details. |
block |
block size. see 'krige' in 'gstat' for details. |
beta |
for simple kriging. see 'krige' in 'gstat' for details. |
nmaxkrige |
for a local predicting: the number of nearest observations that should be used for a prediction or simulation, where nearest is defined in terms of the space of the spatial locations. By default, 12 observations are used. |
validation |
validation methods, include 'LOO': leave-one-out, and 'CV': cross-validation. |
cv.fold |
integer; number of folds in the cross-validation. if > 1, then apply n-fold cross validation; the default is 10, i.e., 10-fold cross validation that is recommended. |
predacc |
can be either "VEcv" for vecv or "ALL" for all measures in function pred.acc. |
... |
other arguments passed on to 'gls' and 'krige'. |
A list with the following components: me, rme, mae, rmae, mse, rmse, rrmse, vecv and e1; or vecv only.
This function is largely based on rfcv in 'randomForest', 'krigecv' in 'spm2' and 'gls' in 'library(nlme)'.
Jin Li
Pinheiro, J. C. and D. M. Bates (2000). Mixed-Effects Models in S and S-PLUS. New York, Springer.
Pebesma, E.J., 2004. Multivariable geostatistics in S: the gstat package. Computers & Geosciences, 30: 683-691.
library(spm) library(nlme) data(petrel) gravel <- petrel[, c(1, 2, 6:9, 5)] longlat <- petrel[, c(1, 2)] range1 <- 0.8 nugget1 <- 0.5 model <- log(gravel + 1) ~ long + lat + bathy + dist + I(long^2) + I(lat^2) + I(lat^3) + I(bathy^2) + I(bathy^3) + I(dist^2) + I(dist^3) + I(relief^2) + I(relief^3) glskrigecv1 <- glskrigecv(model = model, longlat = longlat, trainxy = gravel, y = log(gravel[, 7] +1), transformation = "none", formula.krige = res1 ~ 1, vgm.args = "Sph", nmaxkrige = 12, validation = "CV", corr.args = corSpher(c(range1, nugget1), form = ~ lat + long, nugget = TRUE), predacc = "ALL") glskrigecv1 # For glskrige set.seed(1234) n <- 20 # number of iterations,60 to 100 is recommended. VEcv <- NULL for (i in 1:n) { glskrigecv1 <- glskrigecv(model = model, longlat = longlat, trainxy = gravel, y = log(gravel[, 7] +1), transformation = "none", formula.krige = res1 ~ 1, vgm.args = "Sph", nmaxok = 12, validation = "CV", corr.args = corSpher(c(range1, nugget1), form = ~ lat + long, nugget = TRUE), predacc = "VEcv") VEcv [i] <- glskrigecv1 } plot(VEcv ~ c(1:n), xlab = "Iteration for GLSOK", ylab = "VEcv (%)") points(cumsum(VEcv) / c(1:n) ~ c(1:n), col = 2) abline(h = mean(VEcv), col = 'blue', lwd = 2)library(spm) library(nlme) data(petrel) gravel <- petrel[, c(1, 2, 6:9, 5)] longlat <- petrel[, c(1, 2)] range1 <- 0.8 nugget1 <- 0.5 model <- log(gravel + 1) ~ long + lat + bathy + dist + I(long^2) + I(lat^2) + I(lat^3) + I(bathy^2) + I(bathy^3) + I(dist^2) + I(dist^3) + I(relief^2) + I(relief^3) glskrigecv1 <- glskrigecv(model = model, longlat = longlat, trainxy = gravel, y = log(gravel[, 7] +1), transformation = "none", formula.krige = res1 ~ 1, vgm.args = "Sph", nmaxkrige = 12, validation = "CV", corr.args = corSpher(c(range1, nugget1), form = ~ lat + long, nugget = TRUE), predacc = "ALL") glskrigecv1 # For glskrige set.seed(1234) n <- 20 # number of iterations,60 to 100 is recommended. VEcv <- NULL for (i in 1:n) { glskrigecv1 <- glskrigecv(model = model, longlat = longlat, trainxy = gravel, y = log(gravel[, 7] +1), transformation = "none", formula.krige = res1 ~ 1, vgm.args = "Sph", nmaxok = 12, validation = "CV", corr.args = corSpher(c(range1, nugget1), form = ~ lat + long, nugget = TRUE), predacc = "VEcv") VEcv [i] <- glskrigecv1 } plot(VEcv ~ c(1:n), xlab = "Iteration for GLSOK", ylab = "VEcv (%)") points(cumsum(VEcv) / c(1:n) ~ c(1:n), col = 2) abline(h = mean(VEcv), col = 'blue', lwd = 2)
This function is a cross validation function for 38 hybrid methods of 'gls', 'kriging' and 'IDW', including the average of 'glskrige' and 'glsidw' ('glskrigeglsidw') and the average of 'gls', 'glskrige' and 'glsidw' ('glsglskrigeglsidw'), where 'kriging' methods include ordinary kriging ('OK'), simple kriging ('SK'), block 'OK' ('BOK') and block 'SK'('BSK') and 'IDW' also covers 'NN' and 'KNN'.. The data splitting is based on a stratified random sampling method (see the 'datasplit' function for details).
glskrigeidwcv( model = var1 ~ 1, longlat, trainxy, y, corr.args = NULL, weights = NULL, transformation = "none", delta = 1, formula.krige = res1 ~ 1, vgm.args = c("Sph"), anis = c(0, 1), alpha = 0, block = 0, beta, nmaxkrige = 12, idp = 2, nmaxidw = 12, hybrid.parameter = 2, lambda = 1, validation = "CV", cv.fold = 10, predacc = "VEcv", ... )glskrigeidwcv( model = var1 ~ 1, longlat, trainxy, y, corr.args = NULL, weights = NULL, transformation = "none", delta = 1, formula.krige = res1 ~ 1, vgm.args = c("Sph"), anis = c(0, 1), alpha = 0, block = 0, beta, nmaxkrige = 12, idp = 2, nmaxidw = 12, hybrid.parameter = 2, lambda = 1, validation = "CV", cv.fold = 10, predacc = "VEcv", ... )
model |
a formula defining the response variable and predictive variables. |
longlat |
a dataframe contains longitude and latitude of point samples. |
trainxy |
a dataframe contains longitude (long), latitude (lat), predictive variables and the response variable of point samples. That is, the location information must be names as 'long' and 'lat'. |
y |
a vector of the response variable in the formula, that is, the left part of the formula. |
corr.args |
arguments for 'correlation' in 'gls'. See '?corClasses' in 'nlme' for details. By default, "NULL" is used. When "NULL" is used, then 'gls' is actually performing 'lm'. |
weights |
describing the within-group heteroscedasticity structure. Defaults to "NULL", corresponding to homoscedastic errors. See '?gls' in 'nlme' for details. |
transformation |
transform the residuals of 'gls' to normalise the data for 'krige'; can be "sqrt" for square root, "arcsine" for arcsine, "log" or "none" for non transformation. By default, "none" is used. |
delta |
numeric; to avoid log(0) in the log transformation. The default is 1. |
formula.krige |
formula defining the response vector and (possible) regressor. an object (i.e., 'variogram.formula') for 'variogram' or a formula for 'krige'. see 'variogram' and 'krige' in 'gstat' for details. |
vgm.args |
arguments for 'vgm', e.g. variogram model of response variable and anisotropy parameters. see 'vgm' in 'gstat' for details. By default, "Sph" is used. |
anis |
anisotropy parameters: see notes 'vgm' in 'gstat' for details. |
alpha |
direction in plane (x,y). see variogram in 'gstat' for details. |
block |
block size. see 'krige' in 'gstat' for details. |
beta |
for simple kriging. see 'krige' in 'gstat' for details. |
nmaxkrige |
for a local predicting: the number of nearest observations that should be used for a prediction or simulation, where nearest is defined in terms of the space of the spatial locations. By default, 12 observations are used. |
idp |
a numeric number specifying the inverse distance weighting power. |
nmaxidw |
for a local predicting: the number of nearest observations that should be used for a prediction or simulation, where nearest is defined in terms of the space of the spatial locations. By default, 12 observations are used. |
hybrid.parameter |
the default is 2 that is for 'glskrigeglsidw'; for 'glsglskrigeglsidw', it needs to be 3. |
lambda |
ranging from 0 to 2; the default is 1 for 'glskrigeglsidw' and 'glsglskrigeglsidw'; and if it is < 1, more weight is placed on 'krige', otherwise more weight is placed on 'idw'; and if it is 0, 'idw' is not considered and the resultant methods is 'glskrige' when the default 'hybrid.parameter' is used; and if it is 2, then the resultant method is 'glsidw' when the default 'hybrid.parameter' is used. |
validation |
validation methods, include 'LOO': leave-one-out, and 'CV': cross-validation. |
cv.fold |
integer; number of folds in the cross-validation. if > 1, then apply n-fold cross validation; the default is 10, i.e., 10-fold cross validation that is recommended. |
predacc |
can be either "VEcv" for vecv or "ALL" for all measures in function pred.acc. |
... |
other arguments passed on to 'gls', 'krige' and 'gstat'. |
A list with the following components: me, rme, mae, rmae, mse, rmse, rrmse, vecv and e1; or vecv only.
This function is largely based on rfcv in 'randomForest', 'krigecv' in 'spm2' and 'gls' in 'library(nlme)'.
Jin Li
Li, J. (2022). Spatial Predictive Modeling with R. Boca Raton, Chapman and Hall/CRC.
Pinheiro, J. C. and D. M. Bates (2000). Mixed-Effects Models in S and S-PLUS. New York, Springer.
Pebesma, E.J., 2004. Multivariable geostatistics in S: the gstat package. Computers & Geosciences, 30: 683-691.
library(spm) library(nlme) data(petrel) gravel <- petrel[, c(1, 2, 6:9, 5)] longlat <- petrel[, c(1, 2)] range1 <- 0.8 nugget1 <- 0.5 model <- log(gravel + 1) ~ long + lat + bathy + dist + I(long^2) + I(lat^2) + I(lat^3) + I(bathy^2) + I(bathy^3) + I(dist^2) + I(dist^3) + I(relief^2) + I(relief^3) glskrigeidwcv1 <- glskrigeidwcv(model = model, longlat = longlat, trainxy = gravel, y = log(gravel[, 7] +1), transformation = "none", formula.krige = res1 ~ 1, vgm.args = "Sph", nmaxkrige = 12, idp = 2, nmaxidw = 12, validation = "CV", corr.args = corSpher(c(range1, nugget1), form = ~ lat + long, nugget = TRUE), predacc = "ALL") glskrigeidwcv1 # For glskrigeglsidw set.seed(1234) n <- 20 # number of iterations,60 to 100 is recommended. VEcv <- NULL for (i in 1:n) { glskrigeidwcv1 <- glskrigeidwcv(model = model, longlat = longlat, trainxy = gravel, y = log(gravel[, 7] +1), transformation = "none", formula.krige = res1 ~ 1, vgm.args = "Sph", nmaxkrige = 12, idp = 2, nmaxidw = 12, validation = "CV", corr.args = corSpher(c(range1, nugget1), form = ~ lat + long, nugget = TRUE), predacc = "VEcv") VEcv [i] <- glskrigeidwcv1 } plot(VEcv ~ c(1:n), xlab = "Iteration for GLSOKGLSIDW", ylab = "VEcv (%)") points(cumsum(VEcv) / c(1:n) ~ c(1:n), col = 2) abline(h = mean(VEcv), col = 'blue', lwd = 2) # For glsglskrigeglsidw set.seed(1234) n <- 20 # number of iterations,60 to 100 is recommended. VEcv <- NULL for (i in 1:n) { glskrigeidwcv1 <- glskrigeidwcv(model = model, longlat = longlat, trainxy = gravel, y = log(gravel[, 7] +1), transformation = "none", formula.krige = res1 ~ 1, vgm.args = "Sph", nmaxkrige = 12, idp = 2, nmaxidw = 12, hybrid.parameter = 3, validation = "CV", corr.args = corSpher(c(range1, nugget1), form = ~ lat + long, nugget = TRUE), predacc = "VEcv") VEcv [i] <- glskrigeidwcv1 } plot(VEcv ~ c(1:n), xlab = "Iteration for GLSOKGLSIDW", ylab = "VEcv (%)") points(cumsum(VEcv) / c(1:n) ~ c(1:n), col = 2) abline(h = mean(VEcv), col = 'blue', lwd = 2)library(spm) library(nlme) data(petrel) gravel <- petrel[, c(1, 2, 6:9, 5)] longlat <- petrel[, c(1, 2)] range1 <- 0.8 nugget1 <- 0.5 model <- log(gravel + 1) ~ long + lat + bathy + dist + I(long^2) + I(lat^2) + I(lat^3) + I(bathy^2) + I(bathy^3) + I(dist^2) + I(dist^3) + I(relief^2) + I(relief^3) glskrigeidwcv1 <- glskrigeidwcv(model = model, longlat = longlat, trainxy = gravel, y = log(gravel[, 7] +1), transformation = "none", formula.krige = res1 ~ 1, vgm.args = "Sph", nmaxkrige = 12, idp = 2, nmaxidw = 12, validation = "CV", corr.args = corSpher(c(range1, nugget1), form = ~ lat + long, nugget = TRUE), predacc = "ALL") glskrigeidwcv1 # For glskrigeglsidw set.seed(1234) n <- 20 # number of iterations,60 to 100 is recommended. VEcv <- NULL for (i in 1:n) { glskrigeidwcv1 <- glskrigeidwcv(model = model, longlat = longlat, trainxy = gravel, y = log(gravel[, 7] +1), transformation = "none", formula.krige = res1 ~ 1, vgm.args = "Sph", nmaxkrige = 12, idp = 2, nmaxidw = 12, validation = "CV", corr.args = corSpher(c(range1, nugget1), form = ~ lat + long, nugget = TRUE), predacc = "VEcv") VEcv [i] <- glskrigeidwcv1 } plot(VEcv ~ c(1:n), xlab = "Iteration for GLSOKGLSIDW", ylab = "VEcv (%)") points(cumsum(VEcv) / c(1:n) ~ c(1:n), col = 2) abline(h = mean(VEcv), col = 'blue', lwd = 2) # For glsglskrigeglsidw set.seed(1234) n <- 20 # number of iterations,60 to 100 is recommended. VEcv <- NULL for (i in 1:n) { glskrigeidwcv1 <- glskrigeidwcv(model = model, longlat = longlat, trainxy = gravel, y = log(gravel[, 7] +1), transformation = "none", formula.krige = res1 ~ 1, vgm.args = "Sph", nmaxkrige = 12, idp = 2, nmaxidw = 12, hybrid.parameter = 3, validation = "CV", corr.args = corSpher(c(range1, nugget1), form = ~ lat + long, nugget = TRUE), predacc = "VEcv") VEcv [i] <- glskrigeidwcv1 } plot(VEcv ~ c(1:n), xlab = "Iteration for GLSOKGLSIDW", ylab = "VEcv (%)") points(cumsum(VEcv) / c(1:n) ~ c(1:n), col = 2) abline(h = mean(VEcv), col = 'blue', lwd = 2)
This function is for generating spatial predictions using the hybrid methods of 'gls', 'kriging' and 'IDW', including all methods implemented in 'glskrigeidwcv'.
glskrigeidwpred( model = var1 ~ 1, longlat, trainxy, predx, y, longlatpredx, corr.args = NULL, weights = NULL, transformation = "none", delta = 1, formula.krige = res1 ~ 1, vgm.args = c("Sph"), anis = c(0, 1), alpha = 0, block = 0, beta, nmaxkrige = 12, idp = 2, nmaxidw = 12, hybrid.parameter = 2, lambda = 1, ... )glskrigeidwpred( model = var1 ~ 1, longlat, trainxy, predx, y, longlatpredx, corr.args = NULL, weights = NULL, transformation = "none", delta = 1, formula.krige = res1 ~ 1, vgm.args = c("Sph"), anis = c(0, 1), alpha = 0, block = 0, beta, nmaxkrige = 12, idp = 2, nmaxidw = 12, hybrid.parameter = 2, lambda = 1, ... )
model |
a formula defining the response variable and predictive variables. |
longlat |
a dataframe contains longitude and latitude of point samples. |
trainxy |
a dataframe contains longitude (long), latitude (lat), predictive variables and the response variable of point samples. That is, the location information must be names as 'long' and 'lat'. |
predx |
a dataframe or matrix contains columns of predictive variables for the grids to be predicted. |
y |
a vector of the response variable in the formula, that is, the left part of the formula. |
longlatpredx |
a dataframe contains longitude and latitude of point locations (i.e., the centers of grids) to be predicted. The location information must be named as 'long' and 'lat'. |
corr.args |
arguments for 'correlation' in 'gls'. See '?corClasses' in 'nlme' for details. By default, "NULL" is used. When "NULL" is used, then 'gls' is actually performing 'lm'. |
weights |
describing the within-group heteroscedasticity structure. Defaults to "NULL", corresponding to homoscedastic errors. See '?gls' in 'nlme' for details. |
transformation |
transform the residuals of 'gls' to normalise the data for 'krige'; can be "sqrt" for square root, "arcsine" for arcsine, "log" or "none" for non transformation. By default, "none" is used. |
delta |
numeric; to avoid log(0) in the log transformation. The default is 1. |
formula.krige |
formula defining the response vector and (possible) regressor. an object (i.e., 'variogram.formula') for 'variogram' or a formula for 'krige'. see 'variogram' and 'krige' in 'gstat' for details. |
vgm.args |
arguments for 'vgm', e.g. variogram model of response variable and anisotropy parameters. see 'vgm' in 'gstat' for details. By default, "Sph" is used. |
anis |
anisotropy parameters: see notes 'vgm' in 'gstat' for details. |
alpha |
direction in plane (x,y). see variogram in 'gstat' for details. |
block |
block size. see 'krige' in 'gstat' for details. |
beta |
for simple kriging. see 'krige' in 'gstat' for details. |
nmaxkrige |
for a local predicting: the number of nearest observations that should be used for a prediction or simulation, where nearest is defined in terms of the space of the spatial locations. By default, 12 observations are used. |
idp |
a numeric number specifying the inverse distance weighting power. |
nmaxidw |
for a local predicting: the number of nearest observations that should be used for a prediction or simulation, where nearest is defined in terms of the space of the spatial locations. By default, 12 observations are used. |
hybrid.parameter |
the default is 2 that is for 'glskrigeglsidw'; for 'glsglskrigeglsidw', it needs to be 3. |
lambda |
ranging from 0 to 2; the default is 1 for 'glskrigeglsidw' and 'glsglskrigeglsidw'; and if it is < 1, more weight is placed on 'krige', otherwise more weight is placed on 'idw'; and if it is 0, 'idw' is not considered and the resultant methods is 'glskrige' when the default 'hybrid.parameter' is used; and if it is 2, then the resultant method is 'glsidw' when the default 'hybrid.parameter' is used. |
... |
other arguments passed on to 'gls', 'krige' and 'gstat'. |
A dataframe of longitude, latitude, and predictions.
Jin Li
Pinheiro, J. C. and D. M. Bates (2000). Mixed-Effects Models in S and S-PLUS. New York, Springer.
Pebesma, E.J., 2004. Multivariable geostatistics in S: the gstat package. Computers & Geosciences, 30: 683-691.
library(spm) library(nlme) data(petrel) data(petrel.grid) gravel <- petrel[, c(1, 2, 6:9, 5)] longlat <- petrel[, c(1, 2)] range1 <- 0.8 nugget1 <- 0.5 model <- log(gravel + 1) ~ long + lat + bathy + dist + I(long^2) + I(lat^2) + I(lat^3) + I(bathy^2) + I(bathy^3) + I(dist^2) + I(dist^3) + I(relief^2) + I(relief^3) y <- log(gravel[, 7] +1) glskrigeidwpred1 <- glskrigeidwpred(model = model, longlat = longlat, trainxy = gravel, predx = petrel.grid, y = y, longlatpredx = petrel.grid[, c(1:2)], transformation = "none", formula.krige = res1 ~ 1, vgm.args = "Sph", nmaxkrige = 12, idp = 2, nmaxidw = 12, corr.args = corSpher(c(range1, nugget1), form = ~ lat + long, nugget = TRUE)) names(glskrigeidwpred1) # Back transform 'glskrigeidwpred$predictions' to generate the final predictions glskrigeidw.predictions <- exp(glskrigeidwpred1$predictions) - 1 range(glskrigeidw.predictions)library(spm) library(nlme) data(petrel) data(petrel.grid) gravel <- petrel[, c(1, 2, 6:9, 5)] longlat <- petrel[, c(1, 2)] range1 <- 0.8 nugget1 <- 0.5 model <- log(gravel + 1) ~ long + lat + bathy + dist + I(long^2) + I(lat^2) + I(lat^3) + I(bathy^2) + I(bathy^3) + I(dist^2) + I(dist^3) + I(relief^2) + I(relief^3) y <- log(gravel[, 7] +1) glskrigeidwpred1 <- glskrigeidwpred(model = model, longlat = longlat, trainxy = gravel, predx = petrel.grid, y = y, longlatpredx = petrel.grid[, c(1:2)], transformation = "none", formula.krige = res1 ~ 1, vgm.args = "Sph", nmaxkrige = 12, idp = 2, nmaxidw = 12, corr.args = corSpher(c(range1, nugget1), form = ~ lat + long, nugget = TRUE)) names(glskrigeidwpred1) # Back transform 'glskrigeidwpred$predictions' to generate the final predictions glskrigeidw.predictions <- exp(glskrigeidwpred1$predictions) - 1 range(glskrigeidw.predictions)
This function is for generating spatial predictions using the hybrid method of 'gls' and 'krige' (glskrige).
glskrigepred( model = var1 ~ 1, longlat, trainxy, predx, y, longlatpredx, corr.args = NULL, weights = NULL, transformation = "none", delta = 1, formula.krige = res1 ~ 1, vgm.args = c("Sph"), anis = c(0, 1), alpha = 0, block = 0, beta, nmaxkrige = 12, ... )glskrigepred( model = var1 ~ 1, longlat, trainxy, predx, y, longlatpredx, corr.args = NULL, weights = NULL, transformation = "none", delta = 1, formula.krige = res1 ~ 1, vgm.args = c("Sph"), anis = c(0, 1), alpha = 0, block = 0, beta, nmaxkrige = 12, ... )
model |
a formula defining the response variable and predictive variables. |
longlat |
a dataframe contains longitude and latitude of point samples. |
trainxy |
a dataframe contains longitude (long), latitude (lat), predictive variables and the response variable of point samples. That is, the location information must be names as 'long' and 'lat'. |
predx |
a dataframe or matrix contains columns of predictive variables for the grids to be predicted. |
y |
a vector of the response variable in the formula, that is, the left part of the formula. |
longlatpredx |
a dataframe contains longitude and latitude of point locations (i.e., the centers of grids) to be predicted. The location information must be named as 'long' and 'lat'. |
corr.args |
arguments for 'correlation' in 'gls'. See '?corClasses' in 'nlme' for details. By default, "NULL" is used. When "NULL" is used, then 'gls' is actually performing 'lm'. |
weights |
describing the within-group heteroscedasticity structure. Defaults to "NULL", corresponding to homoscedastic errors. See '?gls' in 'nlme' for details. |
transformation |
transform the residuals of 'gls' to normalize the data; can be "sqrt" for square root, "arcsine" for arcsine, "log" or "none" for non transformation. By default, "none" is used. |
delta |
numeric; to avoid log(0) in the log transformation. The default is 1. |
formula.krige |
formula defining the response vector and (possible) regressor. an object (i.e., 'variogram.formula') for 'variogram' or a formula for 'krige'. see 'variogram' and 'krige' in 'gstat' for details. |
vgm.args |
arguments for 'vgm', e.g. variogram model of response variable and anisotropy parameters. see 'vgm' in 'gstat' for details. By default, "Sph" is used. |
anis |
anisotropy parameters: see notes 'vgm' in 'gstat' for details. |
alpha |
direction in plane (x,y). see variogram in 'gstat' for details. |
block |
block size. see 'krige' in 'gstat' for details. |
beta |
for simple kriging. see 'krige' in 'gstat' for details. |
nmaxkrige |
for a local predicting: the number of nearest observations that should be used for a prediction or simulation, where nearest is defined in terms of the space of the spatial locations. By default, 12 observations are used. |
... |
other arguments passed on to 'gls' and 'krige'. |
A dataframe of longitude, latitude, and predictions.
Jin Li
Pinheiro, J. C. and D. M. Bates (2000). Mixed-Effects Models in S and S-PLUS. New York, Springer.
Pebesma, E.J., 2004. Multivariable geostatistics in S: the gstat package. Computers & Geosciences, 30: 683-691.
library(spm) library(nlme) data(petrel) data(petrel.grid) gravel <- petrel[, c(1, 2, 6:9, 5)] longlat <- petrel[, c(1, 2)] range1 <- 0.8 nugget1 <- 0.5 model <- log(gravel + 1) ~ long + lat + bathy + dist + I(long^2) + I(lat^2) + I(lat^3) + I(bathy^2) + I(bathy^3) + I(dist^2) + I(dist^3) + I(relief^2) + I(relief^3) y <- log(gravel[, 7] +1) glskrigepred1 <- glskrigepred(model = model, longlat = longlat, trainxy = gravel, predx = petrel.grid, y = y, longlatpredx = petrel.grid[, c(1:2)], transformation = "none", formula.krige = res1 ~ 1, vgm.args = "Sph", nmaxkrige = 12, corr.args = corSpher(c(range1, nugget1), form = ~ lat + long, nugget = TRUE)) names(glskrigepred1) # Back transform 'glskrigepred$predictions' to generate the final predictions glskrige.predictions <- exp(glskrigepred1$predictions) - 1 range(glskrige.predictions)library(spm) library(nlme) data(petrel) data(petrel.grid) gravel <- petrel[, c(1, 2, 6:9, 5)] longlat <- petrel[, c(1, 2)] range1 <- 0.8 nugget1 <- 0.5 model <- log(gravel + 1) ~ long + lat + bathy + dist + I(long^2) + I(lat^2) + I(lat^3) + I(bathy^2) + I(bathy^3) + I(dist^2) + I(dist^3) + I(relief^2) + I(relief^3) y <- log(gravel[, 7] +1) glskrigepred1 <- glskrigepred(model = model, longlat = longlat, trainxy = gravel, predx = petrel.grid, y = y, longlatpredx = petrel.grid[, c(1:2)], transformation = "none", formula.krige = res1 ~ 1, vgm.args = "Sph", nmaxkrige = 12, corr.args = corSpher(c(range1, nugget1), form = ~ lat + long, nugget = TRUE)) names(glskrigepred1) # Back transform 'glskrigepred$predictions' to generate the final predictions glskrige.predictions <- exp(glskrigepred1$predictions) - 1 range(glskrige.predictions)
This function is for generating spatial predictions using 'gls' method in 'nlme' package.
glspred( model = var1 ~ 1, trainxy, longlatpredx, predx, corr.args = NULL, weights = NULL, ... )glspred( model = var1 ~ 1, trainxy, longlatpredx, predx, corr.args = NULL, weights = NULL, ... )
model |
a formula defining the response variable and predictive variables. |
trainxy |
a dataframe contains longitude (long), latitude (lat), predictive variables and the response variable of point samples. That is, the location information must be names as 'long' and 'lat'. |
longlatpredx |
a dataframe contains longitude and latitude of point locations (i.e., the centers of grids) to be predicted, need to be named as 'long' and 'lat'. |
predx |
a dataframe or matrix contains columns of predictive variables for the grids to be predicted. |
corr.args |
arguments for 'correlation' in 'gls'. See '?corClasses' in 'nlme' for details. By default, "NULL" is used. When "NULL" is used, then 'gls' is actually performing 'lm'. |
weights |
describing the within-group heteroscedasticity structure. Defaults to "NULL", corresponding to homoscedastic errors. See '?gls' in 'nlme' for details. |
... |
other arguments passed on to 'gls'. |
A dataframe of longitude, latitude and predictions.
Jin Li
Pinheiro, J. C. and D. M. Bates (2000). Mixed-Effects Models in S and S-PLUS. New York, Springer.
library(spm) library(nlme) data(petrel) data(petrel.grid) gravel <- petrel[, c(1, 2, 6:9, 5)] range1 <- 0.8 nugget1 <- 0.5 model <- log(gravel + 1) ~ long + lat + bathy + dist + I(long^2) + I(lat^2) + I(lat^3) + I(bathy^2) + I(bathy^3) + I(dist^2) + I(dist^3) + I(relief^2) + I(relief^3) glspred1 <- glspred(model = model, trainxy = gravel, longlatpredx = petrel.grid[, c(1:2)], predx = petrel.grid, corr.args = corSpher(c(range1, nugget1), form = ~ lat + long, nugget = TRUE)) names(glspred1) # Back transform 'glspred1$predictions' to generate the final predictions gls.predictions <- exp(glspred1$predictions) - 1 range(gls.predictions)library(spm) library(nlme) data(petrel) data(petrel.grid) gravel <- petrel[, c(1, 2, 6:9, 5)] range1 <- 0.8 nugget1 <- 0.5 model <- log(gravel + 1) ~ long + lat + bathy + dist + I(long^2) + I(lat^2) + I(lat^3) + I(bathy^2) + I(bathy^3) + I(dist^2) + I(dist^3) + I(relief^2) + I(relief^3) glspred1 <- glspred(model = model, trainxy = gravel, longlatpredx = petrel.grid[, c(1:2)], predx = petrel.grid, corr.args = corSpher(c(range1, nugget1), form = ~ lat + long, nugget = TRUE)) names(glspred1) # Back transform 'glspred1$predictions' to generate the final predictions gls.predictions <- exp(glspred1$predictions) - 1 range(gls.predictions)
This function is a cross validation function for kriging methods ('krige') in 'gstat'.
krigecv( longlat, trainy, trainpredx = NULL, validation = "CV", cv.fold = 10, nmax = 12, transformation = "none", delta = 1, formula = var1 ~ 1, vgm.args = ("Sph"), anis = c(0, 1), alpha = 0, block = 0, beta, predacc = "VEcv", ... )krigecv( longlat, trainy, trainpredx = NULL, validation = "CV", cv.fold = 10, nmax = 12, transformation = "none", delta = 1, formula = var1 ~ 1, vgm.args = ("Sph"), anis = c(0, 1), alpha = 0, block = 0, beta, predacc = "VEcv", ... )
longlat |
a dataframe contains longitude and latitude of point samples. |
trainy |
a vector of response, must have length equal to the number of rows in trainx. |
trainpredx |
a dataframe contains predictive variables of point samples. If longitude and latitude are going to be used as predictive variables, they should also be included but they should be named in names other than 'long' and 'lat'. |
validation |
validation methods, include 'LOO': leave-one-out, and 'CV': cross-validation. |
cv.fold |
integer; number of folds in the cross-validation. if > 1, then apply n-fold cross validation; the default is 10, i.e., 10-fold cross validation that is recommended. |
nmax |
for local kriging: the number of nearest observations that should be used for a kriging prediction or simulation, where nearest is defined in terms of the space of the spatial locations. By default, 12 observations are used. |
transformation |
transform response variable to normalise the data; can be "sqrt" for square root, "arcsine" for arcsine, "log" or "none" for non transformation. By default, "none" is used. |
delta |
numeric; to avoid 'log(0)' in "log" transformation. The default is 1. |
formula |
formula defining response vector and (possible) regressor. an object (i.e., 'variogram.formula') for 'variogram' or a formula for 'krige'. see 'variogram' and 'krige' in the 'gstat' package for details. |
vgm.args |
arguments for 'vgm', e.g. variogram model of response variable and anisotropy parameters. see 'vgm' in the 'gstat' package for details. By default, "Sph" is used. |
anis |
anisotropy parameters: see notes 'vgm' in the 'gstat' package for details. |
alpha |
direction in plane (x,y). see variogram in the 'gstat' package for details. |
block |
block size. see 'krige' in the 'gstat' package for details. |
beta |
for simple kriging. see 'krige' in the 'gstat' package for details. |
predacc |
can be either "VEcv" for vecv or "ALL" for all measures in function pred.acc. |
... |
other arguments passed on to the function 'gstat'. |
A list with the following components: me, rme, mae, rmae, mse, rmse, rrmse, vecv and e1; or vecv only
This function is largely based on rfcv in 'randomForest' and some functions in 'library(gstat)'. When 'A zero or negative range was fitted to variogram' occurs, to allow 'gstat' running, the range was set to be positive by using 'min(vgm1$dist)'. In this case, caution should be taken in applying this method. If it still occurs for 'okpred' function, different method may need to be used.
Jin Li
Li, J., 2013. Predictive Modelling Using Random Forest and Its Hybrid Methods with Geostatistical Techniques in Marine Environmental Geosciences, In: Christen, P., Kennedy, P., Liu, L., Ong, K.-L., Stranieri, A., Zhao, Y. (Eds.), The proceedings of the Eleventh Australasian Data Mining Conference (AusDM 2013), Canberra, Australia, 13-15 November 2013. Conferences in Research and Practice in Information Technology, Vol. 146.
A. Liaw and M. Wiener (2002). Classification and Regression by randomForest. R News 2(3), 18-22.
Pebesma, E.J., 2004. Multivariable geostatistics in S: the gstat package. Computers & Geosciences, 30: 683-691.
library(sp) library(spm) data(swmud) data(petrel) set.seed(1234) okcv1 <- krigecv(longlat = swmud[, c(1,2)], trainy = swmud[, 3], nmax = 7, transformation = "arcsine", vgm.args = ("Sph"), predacc = "VEcv") okcv1 set.seed(1234) skcv1 <- krigecv(longlat = swmud[, c(1,2)], trainy = swmud[, 3], nmax = 7, transformation = "arcsine", vgm.args = ("Sph"), predacc = "VEcv", beta = mean(swmud[, 3])) skcv1 set.seed(1234) ukcv1 <- krigecv(longlat = swmud[, c(1,2)], trainy = swmud[, 3], nmax = 7, transformation = "arcsine", formula = var1 ~ long + lat, vgm.args = ("Sph"), predacc = "VEcv") ukcv1 set.seed(1234) okcv2 <- krigecv(longlat = swmud[, c(1,2)], trainy = swmud[, 3], validation = "LOO", nmax = 7, transformation = "arcsine", vgm.args = ("Sph"), predacc = "ALL") okcv2 set.seed(1234) n <- 20 # number of iterations,60 to 100 is recommended. VEcv <- NULL for (i in 1:n) { okcv1 <- krigecv(longlat = petrel[, c(1,2)], trainy = petrel[, 5], nmax = 12, transformation = "arcsine", predacc = "VEcv") VEcv [i] <- okcv1 } plot(VEcv ~ c(1:n), xlab = "Iteration for OK", ylab = "VEcv (%)") points(cumsum(VEcv) / c(1:n) ~ c(1:n), col = 2) abline(h = mean(VEcv), col = 'blue', lwd = 2) set.seed(1234) n <- 20 # number of iterations, 60 to 100 is recommended. measures <- NULL for (i in 1:n) { okcv1 <- krigecv(longlat = petrel[, c(1,2)], trainy = petrel[, 3], nmax = 12, transformation = "arcsine", predacc = "ALL") measures <- rbind(measures, okcv1$vecv) } plot(measures ~ c(1:n), xlab = "Iteration for OK", ylab = "VEcv (%)") points(cumsum(measures) / c(1:n) ~ c(1:n), col = 2) abline(h = mean(measures), col = 'blue', lwd = 2)library(sp) library(spm) data(swmud) data(petrel) set.seed(1234) okcv1 <- krigecv(longlat = swmud[, c(1,2)], trainy = swmud[, 3], nmax = 7, transformation = "arcsine", vgm.args = ("Sph"), predacc = "VEcv") okcv1 set.seed(1234) skcv1 <- krigecv(longlat = swmud[, c(1,2)], trainy = swmud[, 3], nmax = 7, transformation = "arcsine", vgm.args = ("Sph"), predacc = "VEcv", beta = mean(swmud[, 3])) skcv1 set.seed(1234) ukcv1 <- krigecv(longlat = swmud[, c(1,2)], trainy = swmud[, 3], nmax = 7, transformation = "arcsine", formula = var1 ~ long + lat, vgm.args = ("Sph"), predacc = "VEcv") ukcv1 set.seed(1234) okcv2 <- krigecv(longlat = swmud[, c(1,2)], trainy = swmud[, 3], validation = "LOO", nmax = 7, transformation = "arcsine", vgm.args = ("Sph"), predacc = "ALL") okcv2 set.seed(1234) n <- 20 # number of iterations,60 to 100 is recommended. VEcv <- NULL for (i in 1:n) { okcv1 <- krigecv(longlat = petrel[, c(1,2)], trainy = petrel[, 5], nmax = 12, transformation = "arcsine", predacc = "VEcv") VEcv [i] <- okcv1 } plot(VEcv ~ c(1:n), xlab = "Iteration for OK", ylab = "VEcv (%)") points(cumsum(VEcv) / c(1:n) ~ c(1:n), col = 2) abline(h = mean(VEcv), col = 'blue', lwd = 2) set.seed(1234) n <- 20 # number of iterations, 60 to 100 is recommended. measures <- NULL for (i in 1:n) { okcv1 <- krigecv(longlat = petrel[, c(1,2)], trainy = petrel[, 3], nmax = 12, transformation = "arcsine", predacc = "ALL") measures <- rbind(measures, okcv1$vecv) } plot(measures ~ c(1:n), xlab = "Iteration for OK", ylab = "VEcv (%)") points(cumsum(measures) / c(1:n) ~ c(1:n), col = 2) abline(h = mean(measures), col = 'blue', lwd = 2)
This function is to make spatial predictions using kriging methods ('krige').
krigepred( trainx, trainy, trainx2, nmax = 12, transformation = "none", delta = 1, formula = var1 ~ 1, vgm.args = ("Sph"), anis = c(0, 1), alpha = 0, block = 0, beta, ... )krigepred( trainx, trainy, trainx2, nmax = 12, transformation = "none", delta = 1, formula = var1 ~ 1, vgm.args = ("Sph"), anis = c(0, 1), alpha = 0, block = 0, beta, ... )
trainx |
a dataframe contains longitude (long), latitude (lat) and predictive variables of point samples. The location information must be named as 'long' and 'lat'. |
trainy |
a vector of response, must have length equal to the number of rows in trainx. |
trainx2 |
a dataframe contains longitude (long), latitude (lat) and predictive variables of point locations (i.e., the centres of grids) to be predicted. The location information must be named as 'long' and 'lat' and in the first two columns respectively.. |
nmax |
for local kriging: the number of nearest observations that should be used for a kriging prediction or simulation, where nearest is defined in terms of the space of the spatial locations. By default, 12 observations are used. |
transformation |
transform the response variable to normalise the data; can be "sqrt" for square root, "arcsine" for arcsine, "log" or "none" for non transformation. By default, "none" is used. |
delta |
numeric; to avoid log(0) in the log transformation. |
formula |
formula defining the response vector and (possible) regressor. an object (i.e., 'variogram.formula') for 'variogram' or a formula for 'krige'. see 'variogram' and 'krige' in gstat for details. |
vgm.args |
arguments for vgm, e.g. variogram model of response variable and anisotropy parameters. see notes vgm in gstat for details. By default, "Sph" is used. |
anis |
anisotropy parameters: see notes vgm in gstat for details. |
alpha |
direction in plane (x,y). see variogram in gstat for details. |
block |
block size. see krige in gstat for details. |
beta |
for simple kriging. see krige in gstat for details. |
... |
other arguments passed on to gstat. |
A dataframe of longitude, latitude, predictions and variances.
The variances in the output are not transformed back when a transformation is used. This is because kriging variances are not measuring the uncertainty of predictions but they are indicator of the spatial distribution of sample density. The variances are exported only for interested users; and if needed, they can be transformed back from the output.
Jin Li
Pebesma, E.J., 2004. Multivariable geostatistics in S: the gstat package. Computers & Geosciences, 30: 683-691.
library(sp) library(spm) data(swmud) data(sw) okpred1 <- krigepred(swmud[, c(1,2)], swmud[, 3], sw, nmax = 7, transformation = "arcsine", vgm.args = ("Sph")) names(okpred1)library(sp) library(spm) data(swmud) data(sw) okpred1 <- krigepred(swmud[, c(1,2)], swmud[, 3], sw, nmax = 7, transformation = "arcsine", vgm.args = ("Sph")) names(okpred1)
This function is a cross validation function for 38 hybrid methods of 'RF', 'kriging' and 'IDW', including the average of 'rfkrige' and 'rfidw' ('rfkrigerfidw') and the average of 'rf', 'rfkrige' and 'rfidw' ('rfrfkrigerfidw'), where 'kriging' methods include ordinary kriging ('OK'), simple kriging ('SK'), block 'OK' ('BOK') and block 'SK'('BSK') and 'IDW' also covers 'NN' and 'KNN'.. The data splitting is based on a stratified random sampling method (see the 'datasplit' function for details).
rfkrigeidwcv( longlat, trainx, trainy, mtry = function(p) max(1, floor(sqrt(p))), ntree = 500, transformation = "none", delta = 1, formula = res1 ~ 1, vgm.args = c("Sph"), anis = c(0, 1), alpha = 0, block = 0, beta, nmaxkrige = 12, idp = 2, nmaxidw = 12, hybrid.parameter = 2, lambda = 1, validation = "CV", cv.fold = 10, predacc = "VEcv", ... )rfkrigeidwcv( longlat, trainx, trainy, mtry = function(p) max(1, floor(sqrt(p))), ntree = 500, transformation = "none", delta = 1, formula = res1 ~ 1, vgm.args = c("Sph"), anis = c(0, 1), alpha = 0, block = 0, beta, nmaxkrige = 12, idp = 2, nmaxidw = 12, hybrid.parameter = 2, lambda = 1, validation = "CV", cv.fold = 10, predacc = "VEcv", ... )
longlat |
a dataframe contains longitude and latitude of point samples. |
trainx |
a dataframe or matrix contains columns of predictive variables. |
trainy |
a vector of the response variable. |
mtry |
a function of number of remaining predictor variables to use as the 'mtry' parameter in the 'randomForest' call. |
ntree |
number of trees to grow. This should not be set to too small a number, to ensure that every input row gets predicted at least a few times. By default, 500 is used. |
transformation |
transform the residuals of 'rf' to normalize the data for 'krige'; can be "sqrt" for square root, "arcsine" for arcsine, "log" or "none" for non transformation. By default, "none" is used. |
delta |
numeric; to avoid log(0) in the log transformation. The default is 1. |
formula |
formula defining the response vector and (possible) regressor. an object (i.e., 'variogram.formula') for 'variogram' or a formula for 'krige'. see 'variogram' and 'krige' in 'gstat' for details. |
vgm.args |
arguments for 'vgm', e.g. variogram model of response variable and anisotropy parameters. see 'vgm' in 'gstat' for details. By default, "Sph" is used. |
anis |
anisotropy parameters: see notes 'vgm' in 'gstat' for details. |
alpha |
direction in plane (x,y). see variogram in 'gstat' for details. |
block |
block size. see 'krige' in 'gstat' for details. |
beta |
for simple kriging. see 'krige' in 'gstat' for details. |
nmaxkrige |
for a local predicting: the number of nearest observations that should be used for a prediction or simulation, where nearest is defined in terms of the space of the spatial locations. By default, 12 observations are used. |
idp |
a numeric number specifying the inverse distance weighting power. |
nmaxidw |
for a local predicting: the number of nearest observations that should be used for a prediction or simulation, where nearest is defined in terms of the space of the spatial locations. By default, 12 observations are used. |
hybrid.parameter |
the default is 2 that is for 'rfkrigerfidw'; for 'rfrfkrigerfidw', it needs to be 3. |
lambda |
ranging from 0 to 2; the default is 1 for 'rfkrigerfidw' and 'rfrfkrigerfidw'; and if it is < 1, more weight is placed on 'krige', otherwise more weight is placed on 'idw'; and if it is 0, 'idw' is not considered and the resultant methods is 'rfkrige' when the default 'hybrid.parameter' is used; and if it is 2, then the resultant method is 'rfidw' when the default 'hybrid.parameter' is used. |
validation |
validation methods, include 'LOO': leave-one-out, and 'CV': cross-validation. |
cv.fold |
integer; number of folds in the cross-validation. if > 1, then apply n-fold cross validation; the default is 10, i.e., 10-fold cross validation that is recommended. |
predacc |
can be either "VEcv" for vecv or "ALL" for all measures in function pred.acc. |
... |
other arguments passed on to 'randomForest', 'krige' and 'gstat'. |
A list with the following components: me, rme, mae, rmae, mse, rmse, rrmse, vecv and e1; or vecv only
This function is largely based on 'rfcv' in 'randomForest', and 'krigecv' in 'spm2'.
Jin Li
Li, J. (2022). Spatial Predictive Modeling with R. Boca Raton, Chapman and Hall/CRC.
Li, J., Potter, A., Huang, Z., and Heap, A. (2012). Predicting Seabed Sand Content across the Australian Margin Using Machine Learning and Geostatistical Methods, Geoscience Australia, Record 2012/48, 115pp.
Li, J., Heap, A., Potter, A., and Danilel, J.J. (2011). Predicting Seabed Mud Content across the Australian Margin II: Performance of Machine Learning Methods and Their Combination with Ordinary Kriging and Inverse Distance Squared, Geoscience Australia, Record 2011/07, 69pp.
Liaw, A. and M. Wiener (2002). Classification and Regression by randomForest. R News 2(3), 18-22.
Pebesma, E.J., 2004. Multivariable geostatistics in S: the gstat package. Computers & Geosciences, 30: 683-691.
library(spm) # rfrfokrfidw data(sponge) longlat <- sponge[, 1:2] set.seed(1234) rfrfkrigerfidwcv1 <- rfkrigeidwcv(longlat = longlat, trainx = sponge[, -3], trainy = sponge[, 3], formula = res1 ~ 1, vgm.args = ("Sph"), nmaxkrige = 12, idp = 2, nmaxidw = 12, hybrid.parameter = 3, validation = "CV", predacc = "ALL") rfrfkrigerfidwcv1 # rfokrfidw for count data data(sponge) longlat <- sponge2[, 1:2] y = sponge[, 3] trainx = sponge[, -3] set.seed(1234) n <- 20 # number of iterations,60 to 100 is recommended. VEcv <- NULL for (i in 1:n) { rfkrigerfidwcv1 <- rfkrigeidwcv(longlat = longlat, trainx = trainx, trainy = y, formula = res1 ~ 1, vgm.args = ("Sph"), nmaxkrige = 12, idp = 2, nmaxidw = 12, validation = "CV", predacc = "VEcv") VEcv [i] <- rfkrigerfidwcv1 } plot(VEcv ~ c(1:n), xlab = "Iteration for rfokrfidw", ylab = "VEcv (%)") points(cumsum(VEcv) / c(1:n) ~ c(1:n), col = 2) abline(h = mean(VEcv), col = 'blue', lwd = 2)library(spm) # rfrfokrfidw data(sponge) longlat <- sponge[, 1:2] set.seed(1234) rfrfkrigerfidwcv1 <- rfkrigeidwcv(longlat = longlat, trainx = sponge[, -3], trainy = sponge[, 3], formula = res1 ~ 1, vgm.args = ("Sph"), nmaxkrige = 12, idp = 2, nmaxidw = 12, hybrid.parameter = 3, validation = "CV", predacc = "ALL") rfrfkrigerfidwcv1 # rfokrfidw for count data data(sponge) longlat <- sponge2[, 1:2] y = sponge[, 3] trainx = sponge[, -3] set.seed(1234) n <- 20 # number of iterations,60 to 100 is recommended. VEcv <- NULL for (i in 1:n) { rfkrigerfidwcv1 <- rfkrigeidwcv(longlat = longlat, trainx = trainx, trainy = y, formula = res1 ~ 1, vgm.args = ("Sph"), nmaxkrige = 12, idp = 2, nmaxidw = 12, validation = "CV", predacc = "VEcv") VEcv [i] <- rfkrigerfidwcv1 } plot(VEcv ~ c(1:n), xlab = "Iteration for rfokrfidw", ylab = "VEcv (%)") points(cumsum(VEcv) / c(1:n) ~ c(1:n), col = 2) abline(h = mean(VEcv), col = 'blue', lwd = 2)
This function is for generating spatial predictions using the hybrid methods of 'RF', 'kriging' and 'IDW', including all methods implemented in 'rfkrigeidwcv'.
rfkrigeidwpred( longlat, trainx, predx, trainy, longlatpredx, mtry = function(p) max(1, floor(sqrt(p))), ntree = 500, transformation = "none", delta = 1, formula.krige = res1 ~ 1, vgm.args = c("Sph"), anis = c(0, 1), alpha = 0, block = 0, beta, nmaxkrige = 12, idp = 2, nmaxidw = 12, hybrid.parameter = 2, lambda = 1, ... )rfkrigeidwpred( longlat, trainx, predx, trainy, longlatpredx, mtry = function(p) max(1, floor(sqrt(p))), ntree = 500, transformation = "none", delta = 1, formula.krige = res1 ~ 1, vgm.args = c("Sph"), anis = c(0, 1), alpha = 0, block = 0, beta, nmaxkrige = 12, idp = 2, nmaxidw = 12, hybrid.parameter = 2, lambda = 1, ... )
longlat |
a dataframe contains longitude and latitude of point samples. |
trainx |
a dataframe contains longitude (long), latitude (lat), predictive variables and the response variable of point samples. That is, the location information must be named as 'long' and 'lat'. |
predx |
a dataframe or matrix contains columns of predictive variables for the grids to be predicted. |
trainy |
a vector of the response variable in the formula, that is, the left part of the formula. |
longlatpredx |
a dataframe contains longitude and latitude of point locations (i.e., the centers of grids) to be predicted. |
mtry |
a function of number of remaining predictor variables to use as the 'mtry' parameter in the 'randomForest' call. |
ntree |
number of trees to grow. This should not be set to too small a number, to ensure that every input row gets predicted at least a few times. By default, 500 is used. #' @param longlatpredx a dataframe contains longitude and latitude of point locations (i.e., the centers of grids) to be predicted. The location information must be named as 'long' and 'lat'. |
transformation |
transform the residuals of 'rf' to normalise the data; can be "sqrt" for square root, "arcsine" for arcsine, "log" or "none" for non transformation. By default, "none" is used. |
delta |
numeric; to avoid log(0) in the log transformation. The default is 1. |
formula.krige |
formula defining the response vector and (possible) regressor. an object (i.e., 'variogram.formula') for 'variogram' or a formula for 'krige'. see 'variogram' and 'krige' in 'gstat' for details. |
vgm.args |
arguments for 'vgm', e.g. variogram model of response variable and anisotropy parameters. see 'vgm' in 'gstat' for details. By default, "Sph" is used. |
anis |
anisotropy parameters: see notes 'vgm' in 'gstat' for details. |
alpha |
direction in plane (x,y). see variogram in 'gstat' for details. |
block |
block size. see 'krige' in 'gstat' for details. |
beta |
for simple kriging. see 'krige' in 'gstat' for details. |
nmaxkrige |
for a local predicting: the number of nearest observations that should be used for a prediction or simulation, where nearest is defined in terms of the space of the spatial locations. By default, 12 observations are used. |
idp |
a numeric number specifying the inverse distance weighting power. |
nmaxidw |
for a local predicting: the number of nearest observations that should be used for a prediction or simulation, where nearest is defined in terms of the space of the spatial locations. By default, 12 observations are used. |
hybrid.parameter |
the default is 2 that is for 'rfkrigerfidw'; for 'rfrfkrigerfidw', it needs to be 3. |
lambda |
ranging from 0 to 2; the default is 1 for 'rfkrigerfidw' and 'rfrfkrigerfidw'; and if it is < 1, more weight is placed on 'krige', otherwise more weight is placed on 'idw'; and if it is 0, 'idw' is not considered and the resultant methods is 'rfkrige' when the default 'hybrid.parameter' is used; and if it is 2, then the resultant method is 'rfidw' when the default 'hybrid.parameter' is used. |
... |
other arguments passed on to 'rf', 'krige' and 'gstat'. |
A dataframe of longitude, latitude, and predictions.
Jin Li
Li, J., Potter, A., Huang, Z., and Heap, A. (2012). Predicting Seabed Sand Content across the Australian Margin Using Machine Learning and Geostatistical Methods, Geoscience Australia, Record 2012/48, 115pp.
Li, J., Heap, A., Potter, A., and Danilel, J.J. (2011). Predicting Seabed Mud Content across the Australian Margin II: Performance of Machine Learning Methods and Their Combination with Ordinary Kriging and Inverse Distance Squared, Geoscience Australia, Record 2011/07, 69pp.
Liaw, A. and M. Wiener (2002). Classification and Regression by randomForest. R News 2(3), 18-22.
Pebesma, E.J., 2004. Multivariable geostatistics in S: the gstat package. Computers & Geosciences, 30: 683-691.
library(spm) data(sponge) data(sponge.grid) longlat <- sponge[, 1:2] y = sponge[, 3] trainx = sponge[, -3] set.seed(1234) rfkrigeidwpred1 <- rfkrigeidwpred(longlat = longlat, trainx = trainx, predx = sponge.grid, trainy = y, longlatpredx = sponge.grid[, c(1:2)], formula.krige = res1 ~ 1, vgm.args = "Sph", nmaxkrige = 12, idp = 2, nmaxidw = 12) names(rfkrigeidwpred1) range(rfkrigeidwpred1$predictions)library(spm) data(sponge) data(sponge.grid) longlat <- sponge[, 1:2] y = sponge[, 3] trainx = sponge[, -3] set.seed(1234) rfkrigeidwpred1 <- rfkrigeidwpred(longlat = longlat, trainx = trainx, predx = sponge.grid, trainy = y, longlatpredx = sponge.grid[, c(1:2)], formula.krige = res1 ~ 1, vgm.args = "Sph", nmaxkrige = 12, idp = 2, nmaxidw = 12) names(rfkrigeidwpred1) range(rfkrigeidwpred1$predictions)
This dataset contains 77 samples of 81 variables including easting (longitude), northing (latitude), bathy, backscatter and their derived variables.
data("sponge2")data("sponge2")
A data frame with 77 observations on the following 89 variables.
eastinga numeric vector, m
northinga numeric vector, m
species.richnessa numeric vector, no unit
muda numeric vector, percentage
sanda numeric vector, percentage
gravela numeric vector, percentage
bathya numeric vector, m
bs25a numeric vector, dB
bs10a numeric vector, dB
bs11a numeric vector, dB
bs12a numeric vector, dB
bs13a numeric vector, dB
bs14a numeric vector, dB
bs15a numeric vector, dB
bs16a numeric vector, dB
bs17a numeric vector, dB
bs18a numeric vector, dB
bs19a numeric vector, dB
bs20a numeric vector, dB
bs21a numeric vector, dB
bs22a numeric vector, dB
bs23a numeric vector, dB
bs24a numeric vector, dB
bs26a numeric vector, dB
bs27a numeric vector, dB
bs28a numeric vector, dB
bs29a numeric vector, dB
bs30a numeric vector, dB
bs31a numeric vector, dB
bs32a numeric vector, dB
bs33a numeric vector, dB
bs34a numeric vector, dB
bs35a numeric vector, dB
bs36a numeric vector, dB
bs_oa numeric vector, dB
bs_homo_oa numeric vector
bs_entro_oa numeric vector, no unit
bs_var_oa numeric vector, dB^2
bs_lmi_oa numeric vector
bathy_oa numeric vector, m
bathy_lmi_oa numeric vector
tpi_oa numeric vector, no unit
slope_oa numeric vector
plan_cur_oa numeric vector
prof_cur_oa numeric vector
relief_oa numeric vector
rugosity_oa numeric vector
dist.coasta numeric vector, m
rugosity3a numeric vector
rugosity5a numeric vector
rugosity7a numeric vector
tpi3a numeric vector, no unit
tpi5a numeric vector, no unit
tpi7a numeric vector, no unit
bathy_lmi3a numeric vector
bathy_lmi5a numeric vector
bathy_lmi7a numeric vector
plan_curv3a numeric vector
plan_curv5a numeric vector
plan_curv7a numeric vector
relief_3a numeric vector
relief_5a numeric vector
relief_7a numeric vector
slope3a numeric vector
slope5a numeric vector
slope7a numeric vector
prof_cur3a numeric vector
prof_cur5a numeric vector
prof_cur7a numeric vector
entro3a numeric vector, no unit
entro5a numeric vector, no unit
entro7a numeric vector, no unit
homo3a numeric vector
homo5a numeric vector
homo7a numeric vector
var3a numeric vector, dB^2
var5a numeric vector, dB^2
var7a numeric vector, dB^2
bs_lmi3a numeric vector
bs_lmi5a numeric vector
bs_lmi7a numeric vector
For details, please see the source. This dataset was published as an appendix of the paper listed in the source. Where the long and lat were reprojected to easting and northing.
see Appendix A-D. Supplementary data at: "http://dx.doi.org/10.1016/j.envsoft.2017.07.016."
Li, J., B. Alvarez, J. Siwabessy, M. Tran, Z. Huang, R. Przeslawski, L. Radke, F. Howard, and S. Nichol. 2017. Application of random forest, generalised linear model and their hybrid methods with geostatistical techniques to count data: Predicting sponge species richness. Environmental Modelling & Software, 97: 112-129.
This dataset contains 77 samples of 7 predictive variables including longitude, latitude, bathy, backscatter and their derived variables. It is the sponge dataset in 'spm' package, but with long and lat instead of easting and northing.
data("spongelonglat")data("spongelonglat")
A data frame with 77 observations on the following 8 variables.
spongea numeric vector
tpi3a numeric vector
var7a numeric vector
entro7a numeric vector
bs34a numeric vector
bs11a numeric vector
longa numeric vector
lata numeric vector
For details, please see sponge dataset in library(spm). Where the long and lat were projected to easting and northing.
sponge dataset in library(spm)
Li, J., B. Alvarez, J. Siwabessy, M. Tran, Z. Huang, R. Przeslawski, L. Radke, F. Howard, and S. Nichol. 2017. Application of random forest, generalised linear model and their hybrid methods with geostatistical techniques to count data: Predicting sponge species richness. Environmental Modelling & Software, 97: 112-129.
data(spongelonglat) ## maybe str(spongelonglat) ; plot(spongelonglat) ...data(spongelonglat) ## maybe str(spongelonglat) ; plot(spongelonglat) ...
This function is a cross validation function for 'svm' regression in 'e1071' package.
svmcv( formula = NULL, trainxy, y, scale = TRUE, type = NULL, kernel = "radial", degree = 3, gamma = if (is.vector(trainxy)) 1 else 1/ncol(trainxy), coef0 = 0, cost = 1, nu = 0.5, tolerance = 0.001, epsilon = 0.1, validation = "CV", cv.fold = 10, predacc = "VEcv", ... )svmcv( formula = NULL, trainxy, y, scale = TRUE, type = NULL, kernel = "radial", degree = 3, gamma = if (is.vector(trainxy)) 1 else 1/ncol(trainxy), coef0 = 0, cost = 1, nu = 0.5, tolerance = 0.001, epsilon = 0.1, validation = "CV", cv.fold = 10, predacc = "VEcv", ... )
formula |
a formula defining the response variable and predictive variables. |
trainxy |
a dataframe contains predictive variables and the response variable of point samples. The location information, longitude (long), latitude (lat), need to be included in the 'trainx' for spatial predictive modelling, need to be named as 'long' and 'lat'. |
y |
a vector of the response variable in the formula, that is, the left part of the formula. |
scale |
A logical vector indicating the variables to be scaled (default: TRUE). |
type |
the default setting is 'NULL'. See '?svm' for various options. |
kernel |
the default setting is 'radial'. See '?svm' for other options. |
degree |
a parameter needed for kernel of type polynomial (default: 3). |
gamma |
a parameter needed for all 'kernels' except 'linear' (default: 1/(data dimension)). |
coef0 |
a parameter needed for kernels of type 'polynomial' and 'sigmoid'(default: 0). |
cost |
cost of constraints violation (default: 1). |
nu |
a parameter needed for 'nu-classification', 'nu-regression', and 'one-classification' (default: 0.5). |
tolerance |
tolerance of termination criterion (default: 0.001). |
epsilon |
'epsilon' in the insensitive-loss function (default: 0.1). |
validation |
validation methods, include 'LOO': leave-one-out, and 'CV': cross-validation. |
cv.fold |
integer; number of folds in the cross-validation. if > 1, then apply n-fold cross validation; the default is 10, i.e., 10-fold cross validation that is recommended. |
predacc |
can be either "VEcv" for 'vecv' or "ALL" for all measures in function pred.acc. |
... |
other arguments passed on to 'svm'. |
A list with the following components: me, rme, mae, rmae, mse, rmse, rrmse, vecv and e1; or vecv only
This function is largely based on 'rfcv' in 'randomForest' and 'svm' in 'e1071'.
Jin Li
David Meyer, Evgenia Dimitriadou, Kurt Hornik, Andreas Weingessel and Friedrich Leisch (2020). e1071: Misc Functions of the Department of Statistics, Probability Theory Group (Formerly: E1071), TU Wien. R package version 1.7-4. https://CRAN.R-project.org/package=e1071.
library(spm) data(petrel) gravel <- petrel[, c(1, 2, 6:9, 5)] model <- log(gravel + 1) ~ lat + bathy + I(long^3) + I(lat^2) + I(lat^3) set.seed(1234) svmcv1 <- svmcv(formula = model, gravel, log(gravel[, 7] +1), validation = "CV", predacc = "ALL") svmcv1 data(sponge2) model <- species.richness ~ . set.seed(1234) svmcv1 <- svmcv(formula = model, sponge2[, -4], sponge[, 3], gamma = 0.01, cost = 3.5, scale = TRUE, validation = "CV", predacc = "VEcv") svmcv1 # For svm model <- species.richness ~ . set.seed(1234) n <- 20 # number of iterations,60 to 100 is recommended. VEcv <- NULL for (i in 1:n) { svmcv1 <- svmcv(formula = model, sponge2[, -4], sponge[, 3], gamma = 0.01, cost = 3.5, scale = TRUE, validation = "CV", predacc = "VEcv") VEcv [i] <- svmcv1 } plot(VEcv ~ c(1:n), xlab = "Iteration for svm", ylab = "VEcv (%)") points(cumsum(VEcv) / c(1:n) ~ c(1:n), col = 2) abline(h = mean(VEcv), col = 'blue', lwd = 2)library(spm) data(petrel) gravel <- petrel[, c(1, 2, 6:9, 5)] model <- log(gravel + 1) ~ lat + bathy + I(long^3) + I(lat^2) + I(lat^3) set.seed(1234) svmcv1 <- svmcv(formula = model, gravel, log(gravel[, 7] +1), validation = "CV", predacc = "ALL") svmcv1 data(sponge2) model <- species.richness ~ . set.seed(1234) svmcv1 <- svmcv(formula = model, sponge2[, -4], sponge[, 3], gamma = 0.01, cost = 3.5, scale = TRUE, validation = "CV", predacc = "VEcv") svmcv1 # For svm model <- species.richness ~ . set.seed(1234) n <- 20 # number of iterations,60 to 100 is recommended. VEcv <- NULL for (i in 1:n) { svmcv1 <- svmcv(formula = model, sponge2[, -4], sponge[, 3], gamma = 0.01, cost = 3.5, scale = TRUE, validation = "CV", predacc = "VEcv") VEcv [i] <- svmcv1 } plot(VEcv ~ c(1:n), xlab = "Iteration for svm", ylab = "VEcv (%)") points(cumsum(VEcv) / c(1:n) ~ c(1:n), col = 2) abline(h = mean(VEcv), col = 'blue', lwd = 2)
This function is a cross validation function for the hybrid method of 'svm' regression and 'idw' using 'gstat' (svmidw), where the data splitting is based on a stratified random sampling method (see the 'datasplit' function for details).
svmidwcv( formula = NULL, longlat, trainxy, y, scale = TRUE, type = NULL, kernel = "radial", degree = 3, gamma = if (is.vector(trainxy)) 1 else 1/ncol(trainxy), coef0 = 0, cost = 1, nu = 0.5, tolerance = 0.001, epsilon = 0.1, idp = 2, nmaxidw = 12, validation = "CV", cv.fold = 10, predacc = "VEcv", ... )svmidwcv( formula = NULL, longlat, trainxy, y, scale = TRUE, type = NULL, kernel = "radial", degree = 3, gamma = if (is.vector(trainxy)) 1 else 1/ncol(trainxy), coef0 = 0, cost = 1, nu = 0.5, tolerance = 0.001, epsilon = 0.1, idp = 2, nmaxidw = 12, validation = "CV", cv.fold = 10, predacc = "VEcv", ... )
formula |
a formula defining the response variable and predictive variables for 'svm'. |
longlat |
a dataframe contains longitude and latitude of point samples. |
trainxy |
a dataframe contains longitude (long), latitude (lat), predictive variables and the response variable of point samples. That is, the location information must be named as 'long' and 'lat'. |
y |
a vector of the response variable in the formula, that is, the left part of the formula. |
scale |
A logical vector indicating the variables to be scaled (default: TRUE). |
type |
the default setting is 'NULL'. See '?svm' for various options. |
kernel |
the default setting is 'radial'. See '?svm' for other options. |
degree |
a parameter needed for kernel of type polynomial (default: 3). |
gamma |
a parameter needed for all 'kernels' except 'linear' (default: 1/(data dimension)). |
coef0 |
a parameter needed for kernels of type 'polynomial' and 'sigmoid'(default: 0). |
cost |
cost of constraints violation (default: 1). |
nu |
a parameter needed for 'nu-classification', 'nu-regression', and 'one-classification' (default: 0.5). |
tolerance |
tolerance of termination criterion (default: 0.001). |
epsilon |
'epsilon' in the insensitive-loss function (default: 0.1). |
idp |
a numeric number specifying the inverse distance weighting power. |
nmaxidw |
for a local predicting: the number of nearest observations that should be used for a prediction or simulation, where nearest is defined in terms of the space of the spatial locations. By default, 12 observations are used. |
validation |
validation methods, include 'LOO': leave-one-out, and 'CV': cross-validation. |
cv.fold |
integer; number of folds in the cross-validation. if > 1, then apply n-fold cross validation; the default is 10, i.e., 10-fold cross validation that is recommended. |
predacc |
can be either "VEcv" for 'vecv' or "ALL" for all measures in function pred.acc. |
... |
other arguments passed on to 'svm' and 'gstat'. |
A list with the following components: me, rme, mae, rmae, mse, rmse, rrmse, vecv and e1; or vecv only.
This function is largely based on 'rfcv' in 'randomForest', 'idwcv' in 'spm'and 'svm' in 'e1071'.
Jin Li
Li, J., Potter, A., Huang, Z., and Heap, A. (2012). Predicting Seabed Sand Content across the Australian Margin Using Machine Learning and Geostatistical Methods, Geoscience Australia, Record 2012/48, 115pp.
Li, J., Heap, A., Potter, A., and Danilel, J.J. (2011). Predicting Seabed Mud Content across the Australian Margin II: Performance of Machine Learning Methods and Their Combination with Ordinary Kriging and Inverse Distance Squared, Geoscience Australia, Record 2011/07, 69pp.
David Meyer, Evgenia Dimitriadou, Kurt Hornik, Andreas Weingessel and Friedrich Leisch (2020). e1071: Misc Functions of the Department of Statistics, Probability Theory Group (Formerly: E1071), TU Wien. R package version 1.7-4. https://CRAN.R-project.org/package=e1071.
Pebesma, E.J., 2004. Multivariable geostatistics in S: the gstat package. Computers & Geosciences, 30: 683-691.
library(spm) data(petrel) gravel <- petrel[, c(1, 2, 6:9, 5)] longlat <- petrel[, c(1, 2)] model <- log(gravel + 1) ~ lat + bathy + I(long^3) + I(lat^2) + I(lat^3) y <- log(gravel[, 7] +1) set.seed(1234) svmidwcv1 <- svmidwcv(formula = model, longlat = longlat, trainxy = gravel, y = y, idp = 2, nmaxidw = 12, validation = "CV", predacc = "ALL") svmidwcv1 # svmidw for count data data(sponge2) model <- species.richness ~ . # use all predictive variables in the dataset longlat <- sponge2[, 1:2] set.seed(1234) n <- 20 # number of iterations,60 to 100 is recommended. VEcv <- NULL for (i in 1:n) { svmidwcv1 <- svmidwcv(formula = model, longlat = longlat, trainxy = sponge2[, -4], y = sponge[, 3], gamma = 0.01, cost = 3.5, scale = TRUE, idp = 2, nmaxidw = 12, validation = "CV", predacc = "VEcv") VEcv [i] <- svmidwcv1 } plot(VEcv ~ c(1:n), xlab = "Iteration for svmidw", ylab = "VEcv (%)") points(cumsum(VEcv) / c(1:n) ~ c(1:n), col = 2) abline(h = mean(VEcv), col = 'blue', lwd = 2)library(spm) data(petrel) gravel <- petrel[, c(1, 2, 6:9, 5)] longlat <- petrel[, c(1, 2)] model <- log(gravel + 1) ~ lat + bathy + I(long^3) + I(lat^2) + I(lat^3) y <- log(gravel[, 7] +1) set.seed(1234) svmidwcv1 <- svmidwcv(formula = model, longlat = longlat, trainxy = gravel, y = y, idp = 2, nmaxidw = 12, validation = "CV", predacc = "ALL") svmidwcv1 # svmidw for count data data(sponge2) model <- species.richness ~ . # use all predictive variables in the dataset longlat <- sponge2[, 1:2] set.seed(1234) n <- 20 # number of iterations,60 to 100 is recommended. VEcv <- NULL for (i in 1:n) { svmidwcv1 <- svmidwcv(formula = model, longlat = longlat, trainxy = sponge2[, -4], y = sponge[, 3], gamma = 0.01, cost = 3.5, scale = TRUE, idp = 2, nmaxidw = 12, validation = "CV", predacc = "VEcv") VEcv [i] <- svmidwcv1 } plot(VEcv ~ c(1:n), xlab = "Iteration for svmidw", ylab = "VEcv (%)") points(cumsum(VEcv) / c(1:n) ~ c(1:n), col = 2) abline(h = mean(VEcv), col = 'blue', lwd = 2)
This function is for generating spatial predictions using the hybrid method of 'svm' and 'idw' ('svmidw').
svmidwpred( formula = NULL, longlat, trainxy, y, longlatpredx, predx, scale = TRUE, type = NULL, kernel = "radial", degree = 3, gamma = if (is.vector(trainxy)) 1 else 1/ncol(trainxy), coef0 = 0, cost = 1, nu = 0.5, tolerance = 0.001, epsilon = 0.1, idp = 2, nmaxidw = 12, ... )svmidwpred( formula = NULL, longlat, trainxy, y, longlatpredx, predx, scale = TRUE, type = NULL, kernel = "radial", degree = 3, gamma = if (is.vector(trainxy)) 1 else 1/ncol(trainxy), coef0 = 0, cost = 1, nu = 0.5, tolerance = 0.001, epsilon = 0.1, idp = 2, nmaxidw = 12, ... )
formula |
a formula defining the response variable and predictive variables for 'svm'. |
longlat |
a dataframe contains longitude and latitude of point samples. The location information must be named as 'long' and 'lat'. |
trainxy |
a dataframe contains longitude (long), latitude (lat), predictive variables and the response variable of point samples. |
y |
a vector of the response variable in the formula, that is, the left part of the formula. |
longlatpredx |
a dataframe contains longitude and latitude of point locations (i.e., the centers of grids) to be predicted. |
predx |
a dataframe or matrix contains columns of predictive variables for the grids to be predicted. |
scale |
A logical vector indicating the variables to be scaled (default: TRUE). |
type |
the default setting is 'NULL'. See '?svm' for various options. |
kernel |
the default setting is 'radial'. See '?svm' for other options. |
degree |
a parameter needed for kernel of type polynomial (default: 3). |
gamma |
a parameter needed for all 'kernels' except 'linear' (default: 1/(data dimension)). |
coef0 |
a parameter needed for kernels of type 'polynomial' and 'sigmoid'(default: 0). |
cost |
cost of constraints violation (default: 1). |
nu |
a parameter needed for 'nu-classification', 'nu-regression', and 'one-classification' (default: 0.5). |
tolerance |
tolerance of termination criterion (default: 0.001). |
epsilon |
'epsilon' in the insensitive-loss function (default: 0.1). See '?svm' for details. |
idp |
a numeric number specifying the inverse distance weighting power. |
nmaxidw |
for a local predicting: the number of nearest observations that should be used for a prediction or simulation, where nearest is defined in terms of the space of the spatial locations. By default, 12 observations are used. |
... |
other arguments passed on to 'svm'. |
A dataframe of longitude, latitude, and predictions.
Jin Li
Li, J., Potter, A., Huang, Z., and Heap, A. (2012). Predicting Seabed Sand Content across the Australian Margin Using Machine Learning and Geostatistical Methods, Geoscience Australia, Record 2012/48, 115pp.
Li, J., Heap, A., Potter, A., and Danilel, J.J. (2011). Predicting Seabed Mud Content across the Australian Margin II: Performance of Machine Learning Methods and Their Combination with Ordinary Kriging and Inverse Distance Squared, Geoscience Australia, Record 2011/07, 69pp.
David Meyer, Evgenia Dimitriadou, Kurt Hornik, Andreas Weingessel and Friedrich Leisch (2020). e1071: Misc Functions of the Department of Statistics, Probability Theory Group (Formerly: E1071), TU Wien. R package version 1.7-4. https://CRAN.R-project.org/package=e1071.
Pebesma, E.J., 2004. Multivariable geostatistics in S: the gstat package. Computers & Geosciences, 30: 683-691.
library(spm) data(petrel) data(petrel.grid) gravel <- petrel[, c(1, 2, 6:9, 5)] longlat <- petrel[, c(1, 2)] model <- log(gravel + 1) ~ lat + bathy + I(long^3) + I(lat^2) + I(lat^3) y <- log(gravel[, 7] +1) svmidwpred1 <- svmidwpred(formula = model, longlat = longlat, trainxy = gravel, y = y, longlatpredx = petrel.grid[, c(1:2)], predx = petrel.grid, idp = 2, nmaxidw = 12) names(svmidwpred1) # Back transform 'svmidwpred$predictions' to generate the final predictions svmidw.predictions <- exp(svmidwpred1$predictions) - 1 range(svmidw.predictions)library(spm) data(petrel) data(petrel.grid) gravel <- petrel[, c(1, 2, 6:9, 5)] longlat <- petrel[, c(1, 2)] model <- log(gravel + 1) ~ lat + bathy + I(long^3) + I(lat^2) + I(lat^3) y <- log(gravel[, 7] +1) svmidwpred1 <- svmidwpred(formula = model, longlat = longlat, trainxy = gravel, y = y, longlatpredx = petrel.grid[, c(1:2)], predx = petrel.grid, idp = 2, nmaxidw = 12) names(svmidwpred1) # Back transform 'svmidwpred$predictions' to generate the final predictions svmidw.predictions <- exp(svmidwpred1$predictions) - 1 range(svmidw.predictions)
This function is a cross validation function for the hybrid method of 'svm' regression and 'krige' (svmkrige), where the data splitting is based on a stratified random sampling method (see the 'datasplit' function for details).
svmkrigecv( formula.svm = NULL, longlat, trainxy, y, scale = TRUE, type = NULL, kernel = "radial", degree = 3, gamma = if (is.vector(trainxy)) 1 else 1/ncol(trainxy), coef0 = 0, cost = 1, nu = 0.5, tolerance = 0.001, epsilon = 0.1, transformation = "none", delta = 1, formula.krige = res1 ~ 1, vgm.args = c("Sph"), anis = c(0, 1), alpha = 0, block = 0, beta, nmaxkrige = 12, validation = "CV", cv.fold = 10, predacc = "VEcv", ... )svmkrigecv( formula.svm = NULL, longlat, trainxy, y, scale = TRUE, type = NULL, kernel = "radial", degree = 3, gamma = if (is.vector(trainxy)) 1 else 1/ncol(trainxy), coef0 = 0, cost = 1, nu = 0.5, tolerance = 0.001, epsilon = 0.1, transformation = "none", delta = 1, formula.krige = res1 ~ 1, vgm.args = c("Sph"), anis = c(0, 1), alpha = 0, block = 0, beta, nmaxkrige = 12, validation = "CV", cv.fold = 10, predacc = "VEcv", ... )
formula.svm |
a formula defining the response variable and predictive variables for 'svm'. |
longlat |
a dataframe contains longitude and latitude of point samples. |
trainxy |
a dataframe contains longitude (long), latitude (lat), predictive variables and the response variable of point samples. That is, the location information must be named as 'long' and 'lat'. |
y |
a vector of the response variable in the formula.svm, that is, the left part of the formula.svm. |
scale |
A logical vector indicating the variables to be scaled (default: TRUE). |
type |
the default setting is 'NULL'. See '?svm' for various options. |
kernel |
the default setting is 'radial'. See '?svm' for other options. |
degree |
a parameter needed for kernel of type polynomial (default: 3). |
gamma |
a parameter needed for all 'kernels' except 'linear' (default: 1/(data dimension)). |
coef0 |
a parameter needed for kernels of type 'polynomial' and 'sigmoid'(default: 0). |
cost |
cost of constraints violation (default: 1). |
nu |
a parameter needed for 'nu-classification', 'nu-regression', and 'one-classification' (default: 0.5). |
tolerance |
tolerance of termination criterion (default: 0.001). |
epsilon |
'epsilon' in the insensitive-loss function (default: 0.1). |
transformation |
transform the residuals of 'svm' to normalise the data; can be "sqrt" for square root, "arcsine" for arcsine, "log" or "none" for non transformation. By default, "none" is used. |
delta |
numeric; to avoid log(0) in the log transformation. The default is 1. |
formula.krige |
formula defining the response vector and (possible) regressor. an object (i.e., 'variogram.formula') for 'variogram' or a formula for 'krige'. see 'variogram' and 'krige' in 'gstat' for details. |
vgm.args |
arguments for 'vgm', e.g. variogram model of response variable and anisotropy parameters. see 'vgm' in 'gstat' for details. By default, "Sph" is used. |
anis |
anisotropy parameters: see notes 'vgm' in 'gstat' for details. |
alpha |
direction in plane (x,y). see variogram in 'gstat' for details. |
block |
block size. see 'krige' in 'gstat' for details. |
beta |
for simple kriging. see 'krige' in 'gstat' for details. |
nmaxkrige |
for a local predicting: the number of nearest observations that should be used for a prediction or simulation, where nearest is defined in terms of the space of the spatial locations. By default, 12 observations are used. |
validation |
validation methods, include 'LOO': leave-one-out, and 'CV': cross-validation. |
cv.fold |
integer; number of folds in the cross-validation. if > 1, then apply n-fold cross validation; the default is 10, i.e., 10-fold cross validation that is recommended. |
predacc |
can be either "VEcv" for 'vecv' or "ALL" for all measures in function pred.acc. |
... |
other arguments passed on to 'svm' and 'krige'. |
A list with the following components: me, rme, mae, rmae, mse, rmse, rrmse, vecv and e1; or vecv only
This function is largely based on 'rfcv' in 'randomForest', 'krigecv' in 'spm2'and 'svm' in 'e1071'.
Jin Li
Li, J., Potter, A., Huang, Z., and Heap, A. (2012). Predicting Seabed Sand Content across the Australian Margin Using Machine Learning and Geostatistical Methods, Geoscience Australia, Record 2012/48, 115pp.
Li, J., Heap, A., Potter, A., and Danilel, J.J. (2011). Predicting Seabed Mud Content across the Australian Margin II: Performance of Machine Learning Methods and Their Combination with Ordinary Kriging and Inverse Distance Squared, Geoscience Australia, Record 2011/07, 69pp.
David Meyer, Evgenia Dimitriadou, Kurt Hornik, Andreas Weingessel and Friedrich Leisch (2020). e1071: Misc Functions of the Department of Statistics, Probability Theory Group (Formerly: E1071), TU Wien. R package version 1.7-4. https://CRAN.R-project.org/package=e1071.
Pebesma, E.J., 2004. Multivariable geostatistics in S: the gstat package. Computers & Geosciences, 30: 683-691.
library(spm) data(petrel) gravel <- petrel[, c(1, 2, 6:9, 5)] longlat <- petrel[, c(1, 2)] model <- log(gravel + 1) ~ lat + bathy + I(long^3) + I(lat^2) + I(lat^3) y <- log(gravel[, 7] +1) set.seed(1234) svmkrigecv1 <- svmkrigecv(formula.svm = model, longlat = longlat, trainxy = gravel, y = y, transformation = "none", formula.krige = res1 ~ 1, vgm.args = "Sph", nmaxkrige = 12, validation = "CV", predacc = "ALL") svmkrigecv1 # svmok for count data data(sponge2) model <- species.richness ~ . # use all predictive variables in the dataset longlat <- sponge2[, 1:2] set.seed(1234) n <- 20 # number of iterations,60 to 100 is recommended. VEcv <- NULL for (i in 1:n) { svmkrigecv1 <- svmkrigecv(formula.svm = model, longlat = longlat, trainxy = sponge2[, -4], y = sponge2[, 3], gamma = 0.01, cost = 3.5, scale = TRUE, formula.krige = res1 ~ 1, vgm.args = ("Sph"), nmaxkrige = 12, validation = "CV", predacc = "VEcv") VEcv [i] <- svmkrigecv1 } plot(VEcv ~ c(1:n), xlab = "Iteration for svm", ylab = "VEcv (%)") points(cumsum(VEcv) / c(1:n) ~ c(1:n), col = 2) abline(h = mean(VEcv), col = 'blue', lwd = 2)library(spm) data(petrel) gravel <- petrel[, c(1, 2, 6:9, 5)] longlat <- petrel[, c(1, 2)] model <- log(gravel + 1) ~ lat + bathy + I(long^3) + I(lat^2) + I(lat^3) y <- log(gravel[, 7] +1) set.seed(1234) svmkrigecv1 <- svmkrigecv(formula.svm = model, longlat = longlat, trainxy = gravel, y = y, transformation = "none", formula.krige = res1 ~ 1, vgm.args = "Sph", nmaxkrige = 12, validation = "CV", predacc = "ALL") svmkrigecv1 # svmok for count data data(sponge2) model <- species.richness ~ . # use all predictive variables in the dataset longlat <- sponge2[, 1:2] set.seed(1234) n <- 20 # number of iterations,60 to 100 is recommended. VEcv <- NULL for (i in 1:n) { svmkrigecv1 <- svmkrigecv(formula.svm = model, longlat = longlat, trainxy = sponge2[, -4], y = sponge2[, 3], gamma = 0.01, cost = 3.5, scale = TRUE, formula.krige = res1 ~ 1, vgm.args = ("Sph"), nmaxkrige = 12, validation = "CV", predacc = "VEcv") VEcv [i] <- svmkrigecv1 } plot(VEcv ~ c(1:n), xlab = "Iteration for svm", ylab = "VEcv (%)") points(cumsum(VEcv) / c(1:n) ~ c(1:n), col = 2) abline(h = mean(VEcv), col = 'blue', lwd = 2)
This function is a cross validation function for 38 hybrid methods of 'svm', 'kriging' and 'IDW', including the average of 'svmkrige' and 'svmidw' ('svmkrigesvmidw') and the average of 'svm', 'svmkrige' and 'svmidw' ('svmsvmkrigesvmidw'), where 'kriging' methods include ordinary kriging ('OK'), simple kriging ('SK'), block 'OK' ('BOK') and block 'SK'('BSK') and 'IDW' also covers 'NN' and 'KNN'.. The data splitting is based on a stratified random sampling method (see the 'datasplit' function for details).
svmkrigeidwcv( formula.svm = NULL, longlat, trainxy, y, scale = TRUE, type = NULL, kernel = "radial", degree = 3, gamma = if (is.vector(trainxy)) 1 else 1/ncol(trainxy), coef0 = 0, cost = 1, nu = 0.5, tolerance = 0.001, epsilon = 0.1, transformation = "none", delta = 1, formula.krige = res1 ~ 1, vgm.args = c("Sph"), anis = c(0, 1), alpha = 0, block = 0, beta, nmaxkrige = 12, idp = 2, nmaxidw = 12, hybrid.parameter = 2, lambda = 1, validation = "CV", cv.fold = 10, predacc = "VEcv", ... )svmkrigeidwcv( formula.svm = NULL, longlat, trainxy, y, scale = TRUE, type = NULL, kernel = "radial", degree = 3, gamma = if (is.vector(trainxy)) 1 else 1/ncol(trainxy), coef0 = 0, cost = 1, nu = 0.5, tolerance = 0.001, epsilon = 0.1, transformation = "none", delta = 1, formula.krige = res1 ~ 1, vgm.args = c("Sph"), anis = c(0, 1), alpha = 0, block = 0, beta, nmaxkrige = 12, idp = 2, nmaxidw = 12, hybrid.parameter = 2, lambda = 1, validation = "CV", cv.fold = 10, predacc = "VEcv", ... )
formula.svm |
a formula defining the response variable and predictive variables for 'svm'. |
longlat |
a dataframe contains longitude and latitude of point samples. |
trainxy |
a dataframe contains longitude (long), latitude (lat), predictive variables and the response variable of point samples. That is, the location information must be named as 'long' and 'lat'. |
y |
a vector of the response variable in the formula, that is, the left part of the formula. |
scale |
A logical vector indicating the variables to be scaled (default: TRUE). |
type |
the default setting is 'NULL'. See '?svm' for various options. |
kernel |
the default setting is 'radial'. See '?svm' for other options. |
degree |
a parameter needed for kernel of type polynomial (default: 3). |
gamma |
a parameter needed for all 'kernels' except 'linear' (default: 1/(data dimension)). |
coef0 |
a parameter needed for kernels of type 'polynomial' and 'sigmoid'(default: 0). |
cost |
cost of constraints violation (default: 1). |
nu |
a parameter needed for 'nu-classification', 'nu-regression', and 'one-classification' (default: 0.5). |
tolerance |
tolerance of termination criterion (default: 0.001). |
epsilon |
'epsilon' in the insensitive-loss function (default: 0.1). |
transformation |
transform the residuals of 'svm' to normalise the data for 'krige'; can be "sqrt" for square root, "arcsine" for arcsine, "log" or "none" for non transformation. By default, "none" is used. |
delta |
numeric; to avoid log(0) in the log transformation. The default is 1. |
formula.krige |
formula defining the response vector and (possible) regressor. an object (i.e., 'variogram.formula') for 'variogram' or a formula for 'krige'. see 'variogram' and 'krige' in 'gstat' for details. |
vgm.args |
arguments for 'vgm', e.g. variogram model of response variable and anisotropy parameters. see 'vgm' in 'gstat' for details. By default, "Sph" is used. |
anis |
anisotropy parameters: see notes 'vgm' in 'gstat' for details. |
alpha |
direction in plane (x,y). see variogram in 'gstat' for details. |
block |
block size. see 'krige' in 'gstat' for details. |
beta |
for simple kriging. see 'krige' in 'gstat' for details. |
nmaxkrige |
for a local predicting: the number of nearest observations that should be used for a prediction or simulation, where nearest is defined in terms of the space of the spatial locations. By default, 12 observations are used. |
idp |
a numeric number specifying the inverse distance weighting power. |
nmaxidw |
for a local predicting: the number of nearest observations that should be used for a prediction or simulation, where nearest is defined in terms of the space of the spatial locations. By default, 12 observations are used. |
hybrid.parameter |
the default is 2 that is for 'svmkrigesvmidw'; for 'svmsvmkrigesvmidw', it needs to be 3. |
lambda |
ranging from 0 to 2; the default is 1 for 'svmkrigesvmidw' and 'svmsvmkrigesvmidw'; and if it is < 1, more weight is placed on 'krige', otherwise more weight is placed on 'idw'; and if it is 0, 'idw' is not considered and the resultant methods is 'svmkrige' when the default 'hybrid.parameter' is used; and if it is 2, then the resultant method is 'svmidw' when the default 'hybrid.parameter' is used. |
validation |
validation methods, include 'LOO': leave-one-out, and 'CV': cross-validation. |
cv.fold |
integer; number of folds in the cross-validation. if > 1, then apply n-fold cross validation; the default is 10, i.e., 10-fold cross validation that is recommended. |
predacc |
can be either "VEcv" for vecv or "ALL" for all measures in function pred.acc. |
... |
other arguments passed on to 'svm', 'krige' and 'gstat'. |
A list with the following components: me, rme, mae, rmae, mse, rmse, rrmse, vecv and e1; or vecv only
This function is largely based on 'rfcv' in 'randomForest', 'krigecv' in 'spm2'and 'svm' in 'e1071'.
Jin Li
Li, J. (2022). Spatial Predictive Modeling with R. Boca Raton, Chapman and Hall/CRC.
Li, J., Potter, A., Huang, Z., and Heap, A. (2012). Predicting Seabed Sand Content across the Australian Margin Using Machine Learning and Geostatistical Methods, Geoscience Australia, Record 2012/48, 115pp.
Li, J., Heap, A., Potter, A., and Danilel, J.J. (2011). Predicting Seabed Mud Content across the Australian Margin II: Performance of Machine Learning Methods and Their Combination with Ordinary Kriging and Inverse Distance Squared, Geoscience Australia, Record 2011/07, 69pp.
David Meyer, Evgenia Dimitriadou, Kurt Hornik, Andreas Weingessel and Friedrich Leisch (2020). e1071: Misc Functions of the Department of Statistics, Probability Theory Group (Formerly: E1071), TU Wien. R package version 1.7-4. https://CRAN.R-project.org/package=e1071.
Pebesma, E.J., 2004. Multivariable geostatistics in S: the gstat package. Computers & Geosciences, 30: 683-691.
library(spm) # svmokglidw data(petrel) gravel <- petrel[, c(1, 2, 6:9, 5)] longlat <- petrel[, c(1, 2)] model <- log(gravel + 1) ~ lat + bathy + I(long^3) + I(lat^2) + I(lat^3) y <- log(gravel[, 7] +1) set.seed(1234) svmkrigesvmidwcv1 <- svmkrigeidwcv(formula.svm = model, longlat = longlat, trainxy = gravel, y = y, transformation = "none", formula.krige = res1 ~ 1, vgm.args = "Sph", nmaxkrige = 12, idp = 2, nmaxidw = 12, validation = "CV", predacc = "ALL") svmkrigesvmidwcv1 # svmsvmoksvmidw data(sponge2) model <- species.richness ~ . # use all predictive variables in the dataset longlat <- sponge2[, 1:2] y = sponge[, 3] set.seed(1234) svmsvmkrigesvmidwcv1 <- svmkrigeidwcv(formula.svm = model, longlat = longlat, trainxy = sponge2[, -4], y = y, gamma = 0.01, cost = 3.5, scale = TRUE, formula.krige = res1 ~ 1, vgm.args = ("Sph"), nmaxkrige = 12, idp = 2, nmaxidw = 12, hybrid.parameter = 3, validation = "CV", predacc = "ALL") svmsvmkrigesvmidwcv1 # svmoksvmidw for count data data(sponge2) model <- species.richness ~ . # use all predictive variables in the dataset longlat <- sponge2[, 1:2] y = sponge2[, 3] set.seed(1234) n <- 20 # number of iterations,60 to 100 is recommended. VEcv <- NULL for (i in 1:n) { svmkrigesvmidwcv1 <- svmkrigeidwcv(formula.svm = model, longlat = longlat, trainxy = sponge2[, -4], y = y, gamma = 0.01, cost = 3.5, scale = TRUE, formula.krige = res1 ~ 1, vgm.args = ("Sph"), nmaxkrige = 12, idp = 2, nmaxidw = 12, validation = "CV", predacc = "VEcv") VEcv [i] <- svmkrigesvmidwcv1 } plot(VEcv ~ c(1:n), xlab = "Iteration for svm", ylab = "VEcv (%)") points(cumsum(VEcv) / c(1:n) ~ c(1:n), col = 2) abline(h = mean(VEcv), col = 'blue', lwd = 2)library(spm) # svmokglidw data(petrel) gravel <- petrel[, c(1, 2, 6:9, 5)] longlat <- petrel[, c(1, 2)] model <- log(gravel + 1) ~ lat + bathy + I(long^3) + I(lat^2) + I(lat^3) y <- log(gravel[, 7] +1) set.seed(1234) svmkrigesvmidwcv1 <- svmkrigeidwcv(formula.svm = model, longlat = longlat, trainxy = gravel, y = y, transformation = "none", formula.krige = res1 ~ 1, vgm.args = "Sph", nmaxkrige = 12, idp = 2, nmaxidw = 12, validation = "CV", predacc = "ALL") svmkrigesvmidwcv1 # svmsvmoksvmidw data(sponge2) model <- species.richness ~ . # use all predictive variables in the dataset longlat <- sponge2[, 1:2] y = sponge[, 3] set.seed(1234) svmsvmkrigesvmidwcv1 <- svmkrigeidwcv(formula.svm = model, longlat = longlat, trainxy = sponge2[, -4], y = y, gamma = 0.01, cost = 3.5, scale = TRUE, formula.krige = res1 ~ 1, vgm.args = ("Sph"), nmaxkrige = 12, idp = 2, nmaxidw = 12, hybrid.parameter = 3, validation = "CV", predacc = "ALL") svmsvmkrigesvmidwcv1 # svmoksvmidw for count data data(sponge2) model <- species.richness ~ . # use all predictive variables in the dataset longlat <- sponge2[, 1:2] y = sponge2[, 3] set.seed(1234) n <- 20 # number of iterations,60 to 100 is recommended. VEcv <- NULL for (i in 1:n) { svmkrigesvmidwcv1 <- svmkrigeidwcv(formula.svm = model, longlat = longlat, trainxy = sponge2[, -4], y = y, gamma = 0.01, cost = 3.5, scale = TRUE, formula.krige = res1 ~ 1, vgm.args = ("Sph"), nmaxkrige = 12, idp = 2, nmaxidw = 12, validation = "CV", predacc = "VEcv") VEcv [i] <- svmkrigesvmidwcv1 } plot(VEcv ~ c(1:n), xlab = "Iteration for svm", ylab = "VEcv (%)") points(cumsum(VEcv) / c(1:n) ~ c(1:n), col = 2) abline(h = mean(VEcv), col = 'blue', lwd = 2)
This function is for generating spatial predictions using the hybrid methods of 'svm', 'kriging' and 'IDW', including all methods implemented in 'svmkrigeidwcv'.
svmkrigeidwpred( formula.svm = NULL, longlat, trainxy, predx, y, longlatpredx, scale = TRUE, type = NULL, kernel = "radial", degree = 3, gamma = if (is.vector(trainxy)) 1 else 1/ncol(trainxy), coef0 = 0, cost = 1, nu = 0.5, tolerance = 0.001, epsilon = 0.1, transformation = "none", delta = 1, formula.krige = res1 ~ 1, vgm.args = c("Sph"), anis = c(0, 1), alpha = 0, block = 0, beta, nmaxkrige = 12, idp = 2, nmaxidw = 12, hybrid.parameter = 2, lambda = 1, ... )svmkrigeidwpred( formula.svm = NULL, longlat, trainxy, predx, y, longlatpredx, scale = TRUE, type = NULL, kernel = "radial", degree = 3, gamma = if (is.vector(trainxy)) 1 else 1/ncol(trainxy), coef0 = 0, cost = 1, nu = 0.5, tolerance = 0.001, epsilon = 0.1, transformation = "none", delta = 1, formula.krige = res1 ~ 1, vgm.args = c("Sph"), anis = c(0, 1), alpha = 0, block = 0, beta, nmaxkrige = 12, idp = 2, nmaxidw = 12, hybrid.parameter = 2, lambda = 1, ... )
formula.svm |
a formula defining the response variable and predictive variables for 'svm'. |
longlat |
a dataframe contains longitude and latitude of point samples. |
trainxy |
a dataframe contains longitude (long), latitude (lat), predictive variables and the response variable of point samples. |
predx |
a dataframe or matrix contains columns of predictive variables for the grids to be predicted. |
y |
a vector of the response variable in the formula, that is, the left part of the formula. |
longlatpredx |
a dataframe contains longitude and latitude of point locations (i.e., the centers of grids) to be predicted. |
scale |
A logical vector indicating the variables to be scaled (default: TRUE). |
type |
the default setting is 'NULL'. See '?svm' for various options. |
kernel |
the default setting is 'radial'. See '?svm' for other options. |
degree |
a parameter needed for kernel of type polynomial (default: 3). |
gamma |
a parameter needed for all 'kernels' except 'linear' (default: 1/(data dimension)). |
coef0 |
a parameter needed for kernels of type 'polynomial' and 'sigmoid'(default: 0). |
cost |
cost of constraints violation (default: 1). |
nu |
a parameter needed for 'nu-classification', 'nu-regression', and 'one-classification' (default: 0.5). |
tolerance |
tolerance of termination criterion (default: 0.001). |
epsilon |
'epsilon' in the insensitive-loss function (default: 0.1). See '?svm' for details. |
transformation |
transform the residuals of 'svm' to normalise the data; can be "sqrt" for square root, "arcsine" for arcsine, "log" or "none" for non transformation. By default, "none" is used. |
delta |
numeric; to avoid log(0) in the log transformation. The default is 1. |
formula.krige |
formula defining the response vector and (possible) regressor. an object (i.e., 'variogram.formula') for 'variogram' or a formula for 'krige'. see 'variogram' and 'krige' in 'gstat' for details. |
vgm.args |
arguments for 'vgm', e.g. variogram model of response variable and anisotropy parameters. see 'vgm' in 'gstat' for details. By default, "Sph" is used. |
anis |
anisotropy parameters: see notes 'vgm' in 'gstat' for details. |
alpha |
direction in plane (x,y). see variogram in 'gstat' for details. |
block |
block size. see 'krige' in 'gstat' for details. |
beta |
for simple kriging. see 'krige' in 'gstat' for details. |
nmaxkrige |
for a local predicting: the number of nearest observations that should be used for a prediction or simulation, where nearest is defined in terms of the space of the spatial locations. By default, 12 observations are used. |
idp |
a numeric number specifying the inverse distance weighting power. |
nmaxidw |
for a local predicting: the number of nearest observations that should be used for a prediction or simulation, where nearest is defined in terms of the space of the spatial locations. By default, 12 observations are used. |
hybrid.parameter |
the default is 2 that is for 'svmkrigesvmidw'; for 'svmsvmkrigesvmidw', it needs to be 3. |
lambda |
ranging from 0 to 2; the default is 1 for 'svmkrigesvmidw' and 'svmsvmkrigesvmidw'; and if it is < 1, more weight is placed on 'krige', otherwise more weight is placed on 'idw'; and if it is 0, 'idw' is not considered and the resultant methods is 'svmkrige' when the default 'hybrid.parameter' is used; and if it is 2, then the resultant method is 'svmidw' when the default 'hybrid.parameter' is used. |
... |
other arguments passed on to 'svm', 'krige' and 'gstat'. |
A dataframe of longitude, latitude, and predictions.
Jin Li
Li, J., Potter, A., Huang, Z., and Heap, A. (2012). Predicting Seabed Sand Content across the Australian Margin Using Machine Learning and Geostatistical Methods, Geoscience Australia, Record 2012/48, 115pp.
Li, J., Heap, A., Potter, A., and Danilel, J.J. (2011). Predicting Seabed Mud Content across the Australian Margin II: Performance of Machine Learning Methods and Their Combination with Ordinary Kriging and Inverse Distance Squared, Geoscience Australia, Record 2011/07, 69pp.
David Meyer, Evgenia Dimitriadou, Kurt Hornik, Andreas Weingessel and Friedrich Leisch (2020). e1071: Misc Functions of the Department of Statistics, Probability Theory Group (Formerly: E1071), TU Wien. R package version 1.7-4. https://CRAN.R-project.org/package=e1071.
Pebesma, E.J., 2004. Multivariable geostatistics in S: the gstat package. Computers & Geosciences, 30: 683-691.
library(spm) data(petrel) data(petrel.grid) gravel <- petrel[, c(1, 2, 6:9, 5)] longlat <- petrel[, c(1, 2)] model <- log(gravel + 1) ~ lat + bathy + I(long^3) + I(lat^2) + I(lat^3) y <- log(gravel[, 7] +1) svmkrigeidwpred1 <- svmkrigeidwpred(formula.svm = model, longlat = longlat, trainxy = gravel, predx = petrel.grid, y = y, longlatpredx = petrel.grid[, c(1:2)], formula.krige = res1 ~ 1, vgm.args = "Sph", nmaxkrige = 12, idp = 2, nmaxidw = 12) names(svmkrigeidwpred1) # Back transform 'svmkrigeidwpred$predictions' to generate the final predictions svmkrigeidw.predictions <- exp(svmkrigeidwpred1$predictions) - 1 range(svmkrigeidw.predictions)library(spm) data(petrel) data(petrel.grid) gravel <- petrel[, c(1, 2, 6:9, 5)] longlat <- petrel[, c(1, 2)] model <- log(gravel + 1) ~ lat + bathy + I(long^3) + I(lat^2) + I(lat^3) y <- log(gravel[, 7] +1) svmkrigeidwpred1 <- svmkrigeidwpred(formula.svm = model, longlat = longlat, trainxy = gravel, predx = petrel.grid, y = y, longlatpredx = petrel.grid[, c(1:2)], formula.krige = res1 ~ 1, vgm.args = "Sph", nmaxkrige = 12, idp = 2, nmaxidw = 12) names(svmkrigeidwpred1) # Back transform 'svmkrigeidwpred$predictions' to generate the final predictions svmkrigeidw.predictions <- exp(svmkrigeidwpred1$predictions) - 1 range(svmkrigeidw.predictions)
This function is for generating spatial predictions using the hybrid method of 'svm' and 'krige' (svmkrige).
svmkrigepred( formula.svm = NULL, longlat, trainxy, predx, y, longlatpredx, scale = TRUE, type = NULL, kernel = "radial", degree = 3, gamma = if (is.vector(trainxy)) 1 else 1/ncol(trainxy), coef0 = 0, cost = 1, nu = 0.5, tolerance = 0.001, epsilon = 0.1, transformation = "none", delta = 1, formula.krige = res1 ~ 1, vgm.args = c("Sph"), anis = c(0, 1), alpha = 0, block = 0, beta, nmaxkrige = 12, ... )svmkrigepred( formula.svm = NULL, longlat, trainxy, predx, y, longlatpredx, scale = TRUE, type = NULL, kernel = "radial", degree = 3, gamma = if (is.vector(trainxy)) 1 else 1/ncol(trainxy), coef0 = 0, cost = 1, nu = 0.5, tolerance = 0.001, epsilon = 0.1, transformation = "none", delta = 1, formula.krige = res1 ~ 1, vgm.args = c("Sph"), anis = c(0, 1), alpha = 0, block = 0, beta, nmaxkrige = 12, ... )
formula.svm |
a formula defining the response variable and predictive variables for 'svm'. |
longlat |
a dataframe contains longitude and latitude of point samples. |
trainxy |
a dataframe contains longitude (long), latitude (lat), predictive variables and the response variable of point samples. |
predx |
a dataframe or matrix contains columns of predictive variables for the grids to be predicted. |
y |
a vector of the response variable in the formula, that is, the left part of the formula. |
longlatpredx |
a dataframe contains longitude and latitude of point locations (i.e., the centers of grids) to be predicted. |
scale |
A logical vector indicating the variables to be scaled (default: TRUE). |
type |
the default setting is 'NULL'. See '?svm' for various options. |
kernel |
the default setting is 'radial'. See '?svm' for other options. |
degree |
a parameter needed for kernel of type polynomial (default: 3). |
gamma |
a parameter needed for all 'kernels' except 'linear' (default: 1/(data dimension)). |
coef0 |
a parameter needed for kernels of type 'polynomial' and 'sigmoid'(default: 0). |
cost |
cost of constraints violation (default: 1). |
nu |
a parameter needed for 'nu-classification', 'nu-regression', and 'one-classification' (default: 0.5). |
tolerance |
tolerance of termination criterion (default: 0.001). |
epsilon |
'epsilon' in the insensitive-loss function (default: 0.1). See '?svm' for details. |
transformation |
transform the residuals of 'svm' to normalise the data; can be "sqrt" for square root, "arcsine" for arcsine, "log" or "none" for non transformation. By default, "none" is used. |
delta |
numeric; to avoid log(0) in the log transformation. The default is 1. |
formula.krige |
formula defining the response vector and (possible) regressor. an object (i.e., 'variogram.formula') for 'variogram' or a formula for 'krige'. see 'variogram' and 'krige' in 'gstat' for details. |
vgm.args |
arguments for 'vgm', e.g. variogram model of response variable and anisotropy parameters. see 'vgm' in 'gstat' for details. By default, "Sph" is used. |
anis |
anisotropy parameters: see notes 'vgm' in 'gstat' for details. |
alpha |
direction in plane (x,y). see variogram in 'gstat' for details. |
block |
block size. see 'krige' in 'gstat' for details. |
beta |
for simple kriging. see 'krige' in 'gstat' for details. |
nmaxkrige |
for a local predicting: the number of nearest observations that should be used for a prediction or simulation, where nearest is defined in terms of the space of the spatial locations. By default, 12 observations are used. |
... |
other arguments passed on to 'svm' and 'krige'. |
A dataframe of longitude, latitude, and predictions.
Jin Li
Li, J., Potter, A., Huang, Z., and Heap, A. (2012). Predicting Seabed Sand Content across the Australian Margin Using Machine Learning and Geostatistical Methods, Geoscience Australia, Record 2012/48, 115pp.
Li, J., Heap, A., Potter, A., and Danilel, J.J. (2011). Predicting Seabed Mud Content across the Australian Margin II: Performance of Machine Learning Methods and Their Combination with Ordinary Kriging and Inverse Distance Squared, Geoscience Australia, Record 2011/07, 69pp.
David Meyer, Evgenia Dimitriadou, Kurt Hornik, Andreas Weingessel and Friedrich Leisch (2020). e1071: Misc Functions of the Department of Statistics, Probability Theory Group (Formerly: E1071), TU Wien. R package version 1.7-4. https://CRAN.R-project.org/package=e1071.
Pebesma, E.J., 2004. Multivariable geostatistics in S: the gstat package. Computers & Geosciences, 30: 683-691.
library(spm) data(petrel) data(petrel.grid) gravel <- petrel[, c(1, 2, 6:9, 5)] longlat <- petrel[, c(1, 2)] model <- log(gravel + 1) ~ lat + bathy + I(long^3) + I(lat^2) + I(lat^3) y <- log(gravel[, 7] +1) svmkrigepred1 <- svmkrigepred(formula.svm = model, longlat = longlat, trainxy = gravel, predx = petrel.grid, y = y, longlatpredx = petrel.grid[, c(1:2)], transformation = "none", formula.krige = res1 ~ 1, vgm.args = "Sph", nmaxkrige = 12) names(svmkrigepred1) # Back transform 'svmkrigepred$predictions' to generate the final predictions svmkrige.predictions <- exp(svmkrigepred1$predictions) - 1 range(svmkrige.predictions)library(spm) data(petrel) data(petrel.grid) gravel <- petrel[, c(1, 2, 6:9, 5)] longlat <- petrel[, c(1, 2)] model <- log(gravel + 1) ~ lat + bathy + I(long^3) + I(lat^2) + I(lat^3) y <- log(gravel[, 7] +1) svmkrigepred1 <- svmkrigepred(formula.svm = model, longlat = longlat, trainxy = gravel, predx = petrel.grid, y = y, longlatpredx = petrel.grid[, c(1:2)], transformation = "none", formula.krige = res1 ~ 1, vgm.args = "Sph", nmaxkrige = 12) names(svmkrigepred1) # Back transform 'svmkrigepred$predictions' to generate the final predictions svmkrige.predictions <- exp(svmkrigepred1$predictions) - 1 range(svmkrige.predictions)
This function is for generating spatial predictions using 'svm' method in 'e1071' package.
svmpred( formula = NULL, trainxy, longlatpredx, predx, scale = TRUE, type = NULL, kernel = "radial", degree = 3, gamma = if (is.vector(trainxy)) 1 else 1/ncol(trainxy), coef0 = 0, cost = 1, nu = 0.5, tolerance = 0.001, epsilon = 0.1, ... )svmpred( formula = NULL, trainxy, longlatpredx, predx, scale = TRUE, type = NULL, kernel = "radial", degree = 3, gamma = if (is.vector(trainxy)) 1 else 1/ncol(trainxy), coef0 = 0, cost = 1, nu = 0.5, tolerance = 0.001, epsilon = 0.1, ... )
formula |
a formula defining the response variable and predictive variables. |
trainxy |
a dataframe contains predictive variables and the response variable of point samples. The location information, longitude (long), latitude (lat), need to be included in the 'trainx' for spatial predictive modeling, need to be named as 'long' and 'lat'. |
longlatpredx |
a dataframe contains longitude and latitude of point locations (i.e., the centers of grids) to be predicted, need to be named as 'long' and 'lat'. |
predx |
a dataframe or matrix contains columns of predictive variables for the grids to be predicted. |
scale |
A logical vector indicating the variables to be scaled (default: TRUE). |
type |
the default setting is 'NULL'. See '?svm' for various options. |
kernel |
the default setting is 'radial'. See '?svm' for other options. |
degree |
a parameter needed for kernel of type polynomial (default: 3). |
gamma |
a parameter needed for all 'kernels' except 'linear' (default: 1/(data dimension)). |
coef0 |
a parameter needed for kernels of type 'polynomial' and 'sigmoid'(default: 0). |
cost |
cost of constraints violation (default: 1). |
nu |
a parameter needed for 'nu-classification', 'nu-regression', and 'one-classification' (default: 0.5). |
tolerance |
tolerance of termination criterion (default: 0.001). |
epsilon |
'epsilon' in the insensitive-loss function (default: 0.1). See '?svm' for details. |
... |
other arguments passed on to 'svm'. |
A dataframe of longitude, latitude and predictions.
Jin Li
David Meyer, Evgenia Dimitriadou, Kurt Hornik, Andreas Weingessel and Friedrich Leisch (2020). e1071: Misc Functions of the Department of Statistics, Probability Theory Group (Formerly: E1071), TU Wien. R package version 1.7-4. https://CRAN.R-project.org/package=e1071.
library(spm) data(petrel) data(petrel.grid) gravel <- petrel[, c(1, 2, 6:9, 5)] model <- log(gravel + 1) ~ lat + bathy + I(long^3) + I(lat^2) + I(lat^3) svmpred1 <- svmpred(formula = model, trainxy = gravel, longlatpredx = petrel.grid[, c(1:2)], predx = petrel.grid) names(svmpred1) # Back transform 'svmpred1$pred.svm1' to generate the final predictions svm.predictions <- exp(svmpred1$pred.svm1) - 1 range(svm.predictions)library(spm) data(petrel) data(petrel.grid) gravel <- petrel[, c(1, 2, 6:9, 5)] model <- log(gravel + 1) ~ lat + bathy + I(long^3) + I(lat^2) + I(lat^3) svmpred1 <- svmpred(formula = model, trainxy = gravel, longlatpredx = petrel.grid[, c(1:2)], predx = petrel.grid) names(svmpred1) # Back transform 'svmpred1$pred.svm1' to generate the final predictions svm.predictions <- exp(svmpred1$pred.svm1) - 1 range(svm.predictions)
This function is a cross validation function for 'Tps' method in 'fields' package.
tpscv( trainx, trainy, m = NULL, p = NULL, theta = 3, lambda = NULL, lon.lat = TRUE, validation = "CV", cv.fold = 10, predacc = "VEcv", ... )tpscv( trainx, trainy, m = NULL, p = NULL, theta = 3, lambda = NULL, lon.lat = TRUE, validation = "CV", cv.fold = 10, predacc = "VEcv", ... )
trainx |
a dataframe contains longitude (long), latitude (lat) and predictive variables of point samples. That is, they must be names as 'long' and 'lat'. |
trainy |
a vector of response, must have length equal to the number of rows in trainx. |
m |
A polynomial function of degree (m-1) will be included in the model as the drift (or spatial trend) component. Default is 'm = NULL' that is the value such that 2m-d is greater than zero where d is the dimension of x. |
p |
polynomial power for Wendland radial basis functions as in 'Tps'. 'p = NULL' that leads to a default value of 2 for spatial predictive modelling based on 'x' containing only the location information. |
theta |
the tapering range. 'theta = 3' degrees is a very generous taper range. For spatial predictive modeling the taper should be large enough to about 20 non zero nearest neighbors for every location. |
lambda |
smoothing parameter, the default is 'NULL'. See '?Tps' for further info. |
lon.lat |
if 'TRUE' locations are interpreted as longitude and latitude and great circle distance is used to find distances among locations. |
validation |
validation methods, include 'LOO': leave-one-out, and 'CV': cross-validation. |
cv.fold |
integer; number of folds in the cross-validation. if > 1, then apply n-fold cross validation; the default is 10, i.e., 10-fold cross validation that is recommended. |
predacc |
can be either "VEcv" for vecv or "ALL" for all measures in function pred.acc. |
... |
other arguments passed on to 'gstat'. |
A list with the following components: me, rme, mae, rmae, mse, rmse, rrmse, vecv and e1; or vecv only
This function is largely based on 'krigecv' in this package and 'Tps' and 'fastTpsMLE' in 'fields' package.
Jin Li
Douglas Nychka and Reinhard Furrer and John Paige and Stephan Sain and Florian Gerber and Matthew Iverson, 2020. fields: Tools for Spatial Data, R package version 10.3 https://CRAN.R-project.org/package=fields.
library(fields) library(spm) data(petrel) tpscv1 <- tpscv(petrel[, c(1,2)], petrel[, 5], cv.fold = 5, predacc = "VEcv") tpscv1 tpscv1 <- tpscv(petrel[, c(1,2)], petrel[, 5], lambda = 0.9, cv.fold = 5, predacc = "VEcv") tpscv1 tpscv1 <- tpscv(petrel[, c(1,2)], petrel[, 5], validation = "LOO", predacc = "VEcv") tpscv1 set.seed(1234) n <- 20 # number of iterations,60 to 100 is recommended. VEcv <- NULL for (i in 1:n) { tpscv1 <- tpscv(petrel[, c(1,2)], petrel[, 5], cv.fold = 10, lambda = 0.13, predacc = "VEcv") VEcv [i] <- tpscv1 } plot(VEcv ~ c(1:n), xlab = "Iteration for TPS", ylab = "VEcv (%)") points(cumsum(VEcv) / c(1:n) ~ c(1:n), col = 2) abline(h = mean(VEcv), col = 'blue', lwd = 2) n <- 20 # number of iterations,60 to 100 is recommended. # set.seed(1234) VEcv <- NULL for (i in 1:n) { set.seed(1234 + i) # set random seed for each iteration. You can remove # this line and use above set.seed(1234) and see what you can get. tpscv1 <- tpscv(petrel[, c(1,2)], petrel[, 5], predacc = "VEcv") VEcv [i] <- tpscv1 } plot(VEcv ~ c(1:n), xlab = "Iteration for TPS", ylab = "VEcv (%)") points(cumsum(VEcv) / c(1:n) ~ c(1:n), col = 2) abline(h = mean(VEcv), col = 'blue', lwd = 2)library(fields) library(spm) data(petrel) tpscv1 <- tpscv(petrel[, c(1,2)], petrel[, 5], cv.fold = 5, predacc = "VEcv") tpscv1 tpscv1 <- tpscv(petrel[, c(1,2)], petrel[, 5], lambda = 0.9, cv.fold = 5, predacc = "VEcv") tpscv1 tpscv1 <- tpscv(petrel[, c(1,2)], petrel[, 5], validation = "LOO", predacc = "VEcv") tpscv1 set.seed(1234) n <- 20 # number of iterations,60 to 100 is recommended. VEcv <- NULL for (i in 1:n) { tpscv1 <- tpscv(petrel[, c(1,2)], petrel[, 5], cv.fold = 10, lambda = 0.13, predacc = "VEcv") VEcv [i] <- tpscv1 } plot(VEcv ~ c(1:n), xlab = "Iteration for TPS", ylab = "VEcv (%)") points(cumsum(VEcv) / c(1:n) ~ c(1:n), col = 2) abline(h = mean(VEcv), col = 'blue', lwd = 2) n <- 20 # number of iterations,60 to 100 is recommended. # set.seed(1234) VEcv <- NULL for (i in 1:n) { set.seed(1234 + i) # set random seed for each iteration. You can remove # this line and use above set.seed(1234) and see what you can get. tpscv1 <- tpscv(petrel[, c(1,2)], petrel[, 5], predacc = "VEcv") VEcv [i] <- tpscv1 } plot(VEcv ~ c(1:n), xlab = "Iteration for TPS", ylab = "VEcv (%)") points(cumsum(VEcv) / c(1:n) ~ c(1:n), col = 2) abline(h = mean(VEcv), col = 'blue', lwd = 2)