Title: | Ordination Methods for the Analysis of Beta-Diversity Indices |
---|---|
Description: | The analysis of different aspects of biodiversity requires specific algorithms. For example, in regionalisation analyses, the high frequency of ties and zero values in dissimilarity matrices based on beta-diversity turnover yields hierarchical cluster dendrograms whose topology and bootstrap supports are affected by the order of rows in the original matrix. Moreover, visualisation of biogeographical regionalisation can be facilitated by a combination of hierarchical clustering and multi-dimensional scaling. The recluster package provides robust techniques to visualise and analyse patterns of biodiversity and to improve occurrence data for cryptic taxa. |
Authors: | Leonardo Dapporto, Matteo Ramazzotti, Simone Fattorini, Roger Vila, Gerard Talavera, Roger H.L. Dennis |
Maintainer: | Leonardo Dapporto <[email protected]> |
License: | GPL (>= 2.0) |
Version: | 3.3 |
Built: | 2025-02-07 06:29:00 UTC |
Source: | https://github.com/leondap/recluster |
biodecrypt uses the function ahull from the package alphahull to construct concave hulls with a custom concavity (alpha) for each taxon. The function can also remove sea or ground areas from the analysis based on a SpatialPolygonsDataFrame representing the area of interest. The main inputs are: i) a matrix of longitude and latitude (decimal degrees, WGS84) for all occurrence records, and ii) a vector indicating the species membership of each record, in the same order as the matrix (1, 2, ..., n for known species and 0 for records to be attributed). Using the spatial coordinates, the vector of identified records and the alpha values, biodecrypt computes a concave hull for each species based on its known records. The function then attempts to attribute the unknown records to their most likely species by comparing hull location and geometry with the location of the occurrence data (see details).
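The inside/outside classification that drives the attribution can be illustrated with a small base-R sketch. It is a simplified stand-in, not the package's code: it uses convex hulls from grDevices::chull (the fallback the documentation describes for species with fewer than `minimum` records) instead of alphahull::ahull, ignores polygon clipping, and the helpers species_hulls and in_polygon are hypothetical.

```r
# Hypothetical helpers (not part of recluster): convex hulls per species as a
# simplified stand-in for the alpha hulls built by biodecrypt.

# mat: two-column matrix (longitude, latitude); id: 1..n known, 0 unknown.
# Returns a named list of hull vertex matrices, one per known species.
species_hulls <- function(mat, id) {
  sp <- sort(unique(id[id != 0]))
  hulls <- lapply(sp, function(s) {
    pts <- mat[id == s, , drop = FALSE]
    pts[grDevices::chull(pts), , drop = FALSE]  # vertices in hull order
  })
  setNames(hulls, sp)
}

# Ray-casting test: is point p = c(x, y) inside polygon `poly`?
in_polygon <- function(p, poly) {
  x <- poly[, 1]
  y <- poly[, 2]
  inside <- FALSE
  j <- nrow(poly)
  for (i in seq_len(nrow(poly))) {
    crosses <- (y[i] > p[2]) != (y[j] > p[2])
    if (crosses && p[1] < (x[j] - x[i]) * (p[2] - y[i]) / (y[j] - y[i]) + x[i]) {
      inside <- !inside
    }
    j <- i
  }
  inside
}

# Example: two well-separated species and three unknown records
mat <- rbind(c(0, 0), c(2, 0), c(2, 2), c(0, 2),
             c(10, 10), c(12, 10), c(12, 12), c(10, 12),
             c(1, 1), c(11, 11), c(5, 5))
id <- c(1, 1, 1, 1, 2, 2, 2, 2, 0, 0, 0)
hulls <- species_hulls(mat, id)
unknown <- which(id == 0)
# For each unknown record, count how many hulls contain it
n_inside <- sapply(unknown, function(k)
  sum(vapply(hulls, function(h) in_polygon(mat[k, ], h), logical(1))))
n_inside  # one record in each hull, the third outside both
```

With alpha hulls, concave gaps in a species' range can leave records outside its hull that a convex hull would include, which is why the choice of alpha matters.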
biodecrypt(mat, id, alpha = NULL, ratio = 2.5, buffer = 90, polygon=NULL, checkdist = T, minimum = 7, plot=T, map = NULL, xlim = NULL, ylim = NULL, main = NULL)
mat |
A two-column matrix of longitude and latitude (in decimal degrees) for all records. |
id |
A vector indicating the species membership of each record (in the same order as mat). Identified records are indicated with 1, 2, ..., n, unidentified records with 0. |
alpha |
A vector indicating an initial alpha value for each species. If NULL, the default value of 8 for all species is used. |
ratio |
The minimum ratio between the distance to the second-closest hull and the distance to the closest hull required for attribution. Default 2.5. |
buffer |
A distance buffer from hulls (metres). |
polygon |
A SpatialPolygonsDataFrame with the areas of interest (ground or sea), typically obtained from Natural Earth (https://www.naturalearthdata.com/). If NULL, no removal is applied. |
checkdist |
Logical, if TRUE cases attributed to a given species based on relative distance from hulls but closer to an identified record of another species are not attributed. |
minimum |
The minimum number of specimens required to build alpha hulls. If the number of identified specimens is lower, convex hulls are calculated to improve procedure stability. |
map |
A map to be plotted during the procedure to show the separation progress. |
plot |
Logical; set to FALSE if plotting the result is not required. |
xlim |
Longitude boundaries for the map. |
ylim |
Latitude boundaries for the map. |
main |
The title to be plotted on the graph. |
Once the hulls for the species are drawn according to the distribution of known records, each unidentified record can be: i) inside more than one hull, ii) inside a single hull, or iii) outside all hulls.
- Cases inside more than one hull: the function cannot attribute the unidentified record to a species, and only the a priori identified records belonging to intersection areas are passed to the final vector as identified.
- Cases inside a single hull: an unidentified record falling inside a single hull is attributed to that species if its distance to every other hull is larger than the buffer value provided by the user. Unidentified records inside the buffer of another hull are not attributed.
- Cases outside all hulls: an unidentified record that does not fall inside any hull is attributed to the closest hull if i) its distance from the second-closest hull is larger than the buffer, and ii) the ratio between its distance to the second-closest hull and its distance to the closest hull is larger than the value indicated by the user (ratio).
- Check for distances from the nearest identified record: as described above, the attribution of unknown records is determined strictly by the distance from the hulls. biodecrypt also has an option (checkdist=T) to check whether records attributed to a given species based on relative distance from hulls are closer to an identified record of another species, which may occasionally occur. If this option is selected (as in the default settings), these records are not attributed to any species.
In general, alpha values < 5 return small and fragmented hulls with low predictive power; moreover, depending on the topology of the occurrence data, some low alpha values may return no hull at all. In these cases, biodecrypt tries to increase alpha up to the lowest value for which a hull is obtained. The alpha values used are stored in the alphaused vector.
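The three hull cases can be condensed into a small decision function. This is a hedged sketch, not the package's implementation: the helper attribute_record is hypothetical, the distances from a record to each hull are assumed to be precomputed (0 meaning "inside"), at least two species are assumed, and the checkdist step is omitted.

```r
# Hypothetical helper (not part of recluster): attribution rule for ONE
# unidentified record.
# dists: distances (same unit as buffer) from the record to each species hull;
#        0 means the record lies inside that hull. Needs >= 2 species.
# Returns the index of the attributed species, or 0 if not attributed.
attribute_record <- function(dists, buffer = 90, ratio = 2.5) {
  inside <- which(dists == 0)
  if (length(inside) > 1) {
    return(0)  # inside more than one hull: cannot decide
  }
  if (length(inside) == 1) {
    # inside a single hull: every other hull must be farther than the buffer
    if (all(dists[-inside] > buffer)) return(inside) else return(0)
  }
  # outside all hulls: compare the closest and second-closest hulls
  ord <- order(dists)
  d1 <- dists[ord[1]]
  d2 <- dists[ord[2]]
  if (d2 > buffer && d2 / d1 > ratio) ord[1] else 0
}

attribute_record(c(0, 150))   # inside hull 1, hull 2 beyond the buffer -> 1
attribute_record(c(40, 100))  # ratio 100/40 = 2.5 is not > 2.5 -> 0
```

The strict inequality on ratio follows the wording "more than a value indicated by the user"; whether the package uses `>` or `>=` is not stated.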
type |
"sep", an argument passed to biodecrypt.plot. |
areas |
The areas of the hulls for all species (in square kilometres). |
intersections |
The areas of intersections among hulls for each pair of species. |
sympatry |
The fraction of the overlap area compared to the total area of the two hulls. |
NUR |
The percentage of Non-attributed Unidentified Records (NUR). |
table |
The result table with Longitude and Latitude for each occurrence datum, its id after the biodecrypt procedure (id2, the result of the procedure) and its initial id (id). |
hulls |
The hulls in sf format. |
hullspl |
The hulls in alphahull format. |
alphaused |
The values of alpha used for each species (see details). |
Leonardo Dapporto
Platania L. et al. Assigning occurrence data to cryptic taxa improves climatic niche assessments: biodecrypt, a new tool tested on European butterflies. Global Ecology and Biogeography (2020).
n1<-7
n2<-7
mat<-rbind(cbind(rnorm(n = n1, mean = 1, sd = 2),rnorm(n = n1, mean = 40, sd = 2)),
 cbind(rnorm(n = n2, mean = 7, sd = 2),rnorm(n = n2, mean = 45, sd = 2)))
id<-c(rep(1,n1),rep(2,n2))
id[sample(c(1:(n1+n2)))[1:round((n1+n2)/4,0)]]<-0
# Make the separation with custom parameters
attribution<-biodecrypt(mat,id, alpha=c(10,10))
# Plot the results
plot(mat,type="n")
biodecrypt.plot(attribution)
# Group plots into pies
biodecrypt.plot(attribution, square=2, minsize=0.5)
# Make the separation with custom parameters
# With a lower alpha value the first hull (alpha equal to 1) can become more
# concave. Excluded dots work as a punctiform sub-hull in the attribution.
attribution<-biodecrypt(mat,id, alpha=c(1,5), buffer=20, ratio=2, minimum=5)
# Plot the results
plot(mat,type="n")
biodecrypt.plot(attribution)
The function biodecrypt.cross wraps the biodecrypt function to carry out a cross-validation of known records, thereby verifying the robustness of the attribution of unknown records. It requires the same inputs as biodecrypt (coordinates and the attribution vector, together with the values of distance ratio, buffer and alpha). In addition, it requires a runs value defining the number of runs, and hence the fraction of test records included in each run. In each run, a randomly selected group of test records (actually identified to a given species) is treated as unidentified (value 0), and biodecrypt is used to attribute them. The analysis is repeated as many times as specified by runs (runs = 10 performs a ten-fold cross-validation based on an initial partition into ten randomly composed subsets).
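The run structure can be sketched in base R: the identified records of each species are shuffled and dealt cyclically to the runs, so runs receive similar numbers of test records overall and per species. The helper assign_runs is hypothetical and the package's own allocation may differ; with as many runs as identified records it reduces to a jackknife.

```r
# Hypothetical helper (not part of recluster): assign identified records to
# `runs` test groups, balancing each species and the totals across runs.
# id: 1..n for identified records, 0 for unidentified ones.
# Returns the run (1..runs) of each identified record, NA for unidentified.
assign_runs <- function(id, runs = 10) {
  run <- rep(NA_integer_, length(id))
  offset <- 0L
  for (sp in sort(unique(id[id != 0]))) {
    idx <- which(id == sp)
    idx <- idx[sample.int(length(idx))]          # shuffle this species
    run[idx] <- as.integer((seq_along(idx) - 1L + offset) %% runs + 1L)
    offset <- offset + length(idx)               # continue the cycle
  }
  run
}

set.seed(1)
id <- c(rep(1, 7), rep(2, 6), 0, 0)
run <- assign_runs(id, runs = 3)
table(run)  # 5, 4 and 4 test records per run

# In run k the test records are masked as unidentified before biodecrypt
id_test <- id
id_test[which(run == 1)] <- 0
```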
biodecrypt.cross(mat,id,alpha=NULL,ratio=2.5,buffer=90,fraction=0.95, partCount = 10, checkdist=T, clipToCoast="terrestrial", proj = "+proj=longlat +datum=WGS84",minimum=7, map=NULL,xlim=NULL,ylim=NULL,main=NULL,runs=10,test=T)
mat |
A two-column matrix of longitude and latitude (in decimal degrees) for all records. |
id |
A vector indicating the species membership of each record (in the same order as mat). Identified records are indicated with 1, 2, ..., n, unidentified records with 0. |
alpha |
A vector indicating an initial alpha value for each species. If NULL, the default value of 8 for all species is used. |
ratio |
The minimum ratio between the distance to the second-closest hull and the distance to the closest hull required for attribution. Default 2.5. |
buffer |
A distance buffer from hulls (in km). |
fraction |
The minimum fraction of occurrences that must be included in the polygon. |
partCount |
The maximum number of disjunct polygons that are allowed. |
clipToCoast |
Either "no" (no clipping), "terrestrial" (only terrestrial part of range is kept) or "aquatic" (only non-terrestrial part is clipped). |
checkdist |
Logical, if TRUE cases attributed to a given species based on relative distance from hulls but closer to an identified record of another species are not attributed. |
proj |
The projection information for mat. In this version, the default is the only supported option. |
minimum |
The minimum number of specimens required to build alpha hulls. If the number of identified specimens is lower, convex hulls are calculated to improve procedure stability. |
map |
A map to be plotted during the procedure to show the separation progress. |
xlim |
Longitude boundaries for the map. |
ylim |
Latitude boundaries for the map. |
main |
The title to be plotted on the graph. |
runs |
The number of runs among which the identified records are randomly distributed as non-attributed (test) records. |
test |
A logical, if TRUE, a biodecrypt analysis is also carried out to compute NUR. |
The procedure assigns subsets of identified records to the test group (unknown cases) as evenly as possible among runs, both in terms of the total number of test records and of records belonging to the same original species. If the number of runs equals the number of records, each identified record is attributed individually in a jackknife procedure. The resulting attribution vector is then compared with the original membership, and two values are provided: the percentage of identified records attributed to a wrong species (Mis-Identified Records, MIR) and the percentage of identified records not attributed to any species (Non-attributed Identified Records, NIR). The function also has an option (test) to calculate the percentage of Non-attributed Unidentified Records (NUR), i.e. the fraction of unknown records that could not be attributed to a species in a standard biodecrypt analysis using the parameters provided by the user and the complete set of records.
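The three scores reduce to simple proportions. A minimal sketch, assuming that MIR and NIR are expressed as percentages of all identified test records and NUR as a percentage of all initially unidentified records; the helpers cross_scores and nur_percent are hypothetical and not part of the package.

```r
# Hypothetical helpers (not part of recluster).
# original: a priori species of the identified test records (1..n)
# predicted: species attributed during cross-validation (0 = not attributed)
cross_scores <- function(original, predicted) {
  n <- length(original)
  mir <- 100 * sum(predicted != 0 & predicted != original) / n  # wrong species
  nir <- 100 * sum(predicted == 0) / n                          # not attributed
  c(MIR = mir, NIR = nir)
}

# id: initial membership (0 = unidentified); id2: membership after biodecrypt
nur_percent <- function(id, id2) {
  unknown <- id == 0
  100 * sum(id2[unknown] == 0) / sum(unknown)
}

scores <- cross_scores(original = c(1, 1, 2, 2, 2),
                       predicted = c(1, 0, 2, 1, 2))
scores  # MIR = 20, NIR = 20
nur <- nur_percent(id = c(0, 0, 1, 0), id2 = c(1, 0, 1, 2))
nur     # one of three unknown records left unattributed
```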
type |
"cross", an argument passed to biodecrypt.plot. |
NUR |
The percentage of Non-attributed Unidentified Records. |
areas |
The hull areas for all species (in square kilometres). |
intersections |
The areas of intersections among hulls for each pair of species. |
sympatry |
The fraction of the overlap area compared to the total area of the two hulls. |
table |
The result table of the test (if test=TRUE) with Longitude and Latitude for each occurrence datum, its id after the biodecrypt procedure (id2) and its initial id (id). |
cross |
The result table with the original attribution (original), the attribution obtained after cross validation (predicted) and the classification as MIR or NIR. Longitude and Latitude are also provided. |
MIR |
The percentage of Mis-Identified Records. |
NIR |
The percentage of Non-attributed Identified Records. |
Leonardo Dapporto
Platania L. et al. Assigning occurrence data to cryptic taxa improves climatic niche assessments: biodecrypt, a new tool tested on European butterflies. Global Ecology and Biogeography (2020).
## Not Run
## Create an example dataset
#mat<-rbind(cbind(rnorm(n = 20, mean = 1, sd = 4),rnorm(n = 20, mean = 40, sd = 3)),
#cbind(rnorm(n = 20, mean = 7, sd = 5),rnorm(n = 20, mean = 45, sd = 2)))
#id<-c(rep(1,20),rep(2,20))
#id[sample(c(1:40))[1:10]]<-0
#cross<-biodecrypt.cross(mat,id)
#plot(mat,type="n")
#biodecrypt.plot(cross)
The function biodecrypt.optimise analyses the output of biodecrypt.wrap. By default, MIR^2 + NIR + NUR is used as the penalty value for each combination of parameters, giving more weight to MIR. The exponents can be changed by the user (coef). Since the combination showing the lowest penalty in cross-validation is not necessarily optimal for the final analysis, all combinations whose penalty exceeds the lowest penalty by no more than a given threshold (penalty) are considered similarly good. The default threshold of 10 corresponds to a variation of about 3 for each term of the penalty. The optimal parameters are then calculated as the means of distance ratio, alpha and buffer across the selected cross-validation analyses, weighted by 1/penalty so that the solutions with the lowest penalties contribute most.
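The selection and weighting rule can be written out in a few lines of base R. A sketch under stated assumptions, not the package's code: the helper optimise_sketch is hypothetical, the input columns are assumed to be named ratio, buffer, alpha, MIR, NIR and NUR, and the small floor on the penalty (to avoid dividing by zero) is an addition of this sketch.

```r
# Hypothetical sketch of the selection rule (not the package's code).
# tab: data frame with columns ratio, buffer, alpha, MIR, NIR, NUR
# (these column names are assumed for the example).
optimise_sketch <- function(tab, coef = c(2, 1, 1), penalty = 10) {
  pen <- tab$MIR^coef[1] + tab$NIR^coef[2] + tab$NUR^coef[3]
  keep <- pen <= min(pen) + penalty   # combinations close to the best one
  sel <- tab[keep, , drop = FALSE]
  w <- 1 / pmax(pen[keep], 1e-9)      # 1/penalty weights; floor avoids 1/0
  vars <- c("ratio", "buffer", "alpha", "MIR", "NIR", "NUR")
  sapply(vars, function(v) weighted.mean(sel[[v]], w))
}

tab <- data.frame(ratio = c(2, 4, 3), buffer = c(20, 50, 40),
                  alpha = c(1, 4, 5),  MIR = c(2, 3, 10),
                  NIR = c(5, 4, 5),    NUR = c(1, 1, 1))
# Penalties are 10, 14 and 106: only the first two rows are within 10 of
# the best, and they are weighted 1/10 and 1/14 (i.e. 7:5).
best <- optimise_sketch(tab)
best
```

Squaring MIR means a combination with 10% misidentifications is penalised by 100, so it is excluded even if its NIR and NUR are low.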
biodecrypt.optimise(tab,coef=c(2,1,1), penalty=10)
tab |
A matrix obtained with biodecrypt.wrap. |
coef |
The three exponents to be applied to MIR, NIR and NUR, respectively, to calculate the penalties. |
penalty |
The penalty threshold for inclusion in the calculation. |
ratio |
The optimized ratio value. |
buffer |
The optimized buffer value. |
alpha |
The optimized alpha value. |
MIR |
The weighted average MIR among selected combinations. |
NIR |
The weighted average NIR among selected combinations. |
NUR |
The weighted average NUR among selected combinations. |
Leonardo Dapporto
Platania L. et al. Assigning occurrence data to cryptic taxa improves climatic niche assessments: biodecrypt, a new tool tested on European butterflies. Global Ecology and Biogeography (2020).
#See the example provided in biodecrypt.wrap
The function plots the results of biodecrypt and biodecrypt.cross analyses as circles of different colours identifying different kinds of records. Records known a priori can be distinguished from records attributed by biodecrypt to a given species and from NUR (or from MIR and NIR in biodecrypt.cross).
biodecrypt.plot(x,minsize=0.3,pchid=1,cexid=0.1,square=0.001,col=c("red","darkgreen", "blue","purple"), attributed=c("fade","points"), NUR="black", fading=50, ... )
x |
An object obtained by biodecrypt or biodecrypt.cross |
minsize |
The size of the dots to be plotted. |
pchid |
The pch of the points marking known records when attributed="points". |
cexid |
The size of the points marking known records when attributed="points". |
square |
The size of the square grid cells into which occurrences are collapsed and organised in pies. If the value is lower than the data resolution, records are not grouped in pies. |
col |
The colours to be attributed to species: 1...n. |
attributed |
The method used to plot known records. attributed="fade" makes the dots of attributed records paler than those of known records, according to fading (see below). attributed="points" plots a black dot to distinguish known records. |
NUR |
The colour for NUR records after biodecrypt. |
fading |
The degree of fading for the colours of records attributed by biodecrypt when attributed="fade" (100 makes the points white). |
... |
Other parameters passed to the default plot. |
The function adds dots to an existing plot (usually a map). Records with an a priori known attribution (1...n in id) are marked either with a point inside the dot (attributed="points") or by fading the colour of the dots of records that have been attributed by biodecrypt (attributed="fade"). In the results of biodecrypt.cross, MIR are represented as black dots and NIR as white dots. For biodecrypt, the default black colour for NUR can be changed with the NUR argument.
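Collapsing records into pies through the square argument can be pictured as binning coordinates into grid cells and tabulating species per cell. A minimal sketch under assumptions: the helper grid_pies is hypothetical, cells are assumed to be aligned on multiples of square starting from 0, and the real function may bin and draw differently.

```r
# Hypothetical helper (not part of recluster): collapse records to square
# grid cells of side `square` (same unit as the coordinates) and return the
# fraction of each species per cell, i.e. the slices of each pie.
grid_pies <- function(mat, id, square = 2) {
  cx <- floor(mat[, 1] / square)
  cy <- floor(mat[, 2] / square)
  cell <- paste(cx, cy, sep = "_")
  counts <- table(cell, species = id)
  prop.table(counts, margin = 1)   # each row (cell) sums to 1
}

mat <- rbind(c(0.5, 0.5), c(1.5, 0.2), c(1.9, 1.9), c(5, 5))
id <- c(1, 2, 1, 2)
pies <- grid_pies(mat, id, square = 2)
pies  # cell "0_0": 2/3 species 1, 1/3 species 2; cell "2_2": species 2 only
```

A very small square (such as the 0.001 default) puts almost every record in its own cell, which matches the documented behaviour that records are then not grouped in pies.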
a plot
Leonardo Dapporto
Platania L. et al. Assigning occurrence data to cryptic taxa improves climatic niche assessments: biodecrypt, a new tool tested on European butterflies. Global Ecology and Biogeography (2020).
#See examples in biodecrypt and biodecrypt.cross
biodecrypt.view is no longer supported. Please use biodecrypt followed by biodecrypt.plot to achieve the same result.
The function biodecrypt.wrap wraps the biodecrypt.cross analysis by using all possible combinations of a series of distance ratio, alpha and buffer values to compare their resulting MIR, NIR and NUR.
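Because every combination is cross-validated, the cost grows multiplicatively with the number of candidate values. A short base-R sketch with expand.grid, using the default candidate values from the usage line of biodecrypt.wrap, counts the combinations; that the function enumerates exactly this grid is assumed from the description above.

```r
# Default candidate values of biodecrypt.wrap (from its usage line)
alpha  <- c(1, 5, 10, 15)
ratio  <- c(2, 3, 4, 5)
buffer <- c(0, 40, 80, 120, 160)

# Every combination of the three parameters: 4 x 4 x 5 rows
grid <- expand.grid(alpha = alpha, ratio = ratio, buffer = buffer)
nrow(grid)  # 80 combinations, each requiring its own cross-validation
```

With runs = 10, each of these combinations in turn involves ten biodecrypt attributions, which is why the example below uses a reduced grid and runs = 2 for a quick test.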
biodecrypt.wrap(mat,id,alpha=c(1,5,10,15),alphamat=NULL,ratio=c(2,3,4,5), buffer=c(0,40,80,120,160),fraction=0.95, partCount=10, checkdist=T, clipToCoast="terrestrial", proj="+proj=longlat +datum=WGS84", minimum=7, map=NULL,xlim=NULL,ylim=NULL,main=NULL,save=T,name="res_cross.txt",runs=10)
mat |
A two-column matrix of longitude and latitude (in decimal degrees) for all records. |
id |
A vector indicating the species membership of each record (in the same order as mat). Identified records are indicated with 1, 2, ..., n, unidentified records with 0. |
alpha |
A vector of initial alpha values, applied to all species. |
alphamat |
A matrix indicating different alpha values for different species (optional). |
ratio |
A vector of ratio values to be tested. |
buffer |
A vector of buffer values to be tested. |
fraction |
The minimum fraction of occurrences that must be included in the polygon. |
partCount |
The maximum number of disjunct polygons that are allowed. |
checkdist |
Logical, if TRUE cases attributed to a given species based on relative distance from hulls but closer to an identified record of another species are not attributed. |
clipToCoast |
Either "no" (no clipping), "terrestrial" (only terrestrial part of range is kept) or "aquatic" (only non-terrestrial part is clipped). |
proj |
The projection information for mat. In this version, the default is the only supported option. |
minimum |
The minimum number of specimens required to build alpha hulls. If the number of identified specimens is lower, convex hulls are calculated to improve procedure stability. |
map |
A map to be plotted during the procedure to show the separation progress. |
xlim |
Longitude boundaries for the map. |
ylim |
Latitude boundaries for the map. |
main |
The title to be plotted on the graph. |
save |
Logical; if TRUE, the result table is saved after each biodecrypt.cross run. |
name |
The name of the saved file |
runs |
The number of runs among which the identified records are randomly distributed as non-attributed (test) records. |
The resulting table can be passed to biodecrypt.optimise to compute the best combination of alpha, buffer and ratio.
table |
The result table indicating for each cross validation test the MIR, NIR and NUR values together with the used ratio, buffer and alpha values. |
Leonardo Dapporto
Platania L. et al. Assigning occurrence data to cryptic taxa improves climatic niche assessments: biodecrypt, a new tool tested on European butterflies. Global Ecology and Biogeography (2020).
# Create an example dataset
mat<-rbind(cbind(rnorm(n = 20, mean = 1, sd = 4),rnorm(n = 20, mean = 40, sd = 3)),
 cbind(rnorm(n = 20, mean = 7, sd = 5),rnorm(n = 20, mean = 45, sd = 2)))
id<-c(rep(1,20),rep(2,20))
id[sample(c(1:40))[1:10]]<-0
## Not run:
wrap_data_fast<-biodecrypt.wrap(mat,id, alpha=c(1,4), ratio=2, buffer=20, runs=2)
parameters<-biodecrypt.optimise(wrap_data_fast$table,penalty=10)
# Make the example with the default 10 runs and more values
wrap_data<-biodecrypt.wrap(mat,id, alpha=c(1,4), ratio=c(2,4), buffer=c(20,50))
parameters<-biodecrypt.optimise(wrap_data$table)
# Inspect the optimised parameters
parameters
# Use different alpha values for the two species
# alpha for the first species
alpha1<-c(1,3)
# alpha for the second species
alpha2<-c(1,5)
alphamat<-cbind(alpha1,alpha2)
wrap_data<-biodecrypt.wrap(mat,id, alphamat=alphamat, ratio=c(2,4), buffer=c(20,50))
parameters<-biodecrypt.optimise(wrap_data$table, penalty=20)
# Inspect the optimised parameters
parameters
## End(Not run)
This dataset represents occurrence data of butterfly species on 30 western Mediterranean islands.
data(dataisl)
A data frame with 30 observations (islands) on 123 binary variables (species).
Leonardo Dapporto and Roger Vila
Dapporto L., Ramazzotti M., Fattorini S., Talavera G., Vila R., Dennis R. "recluster: an unbiased clustering procedure for beta-diversity turnover" Ecography (2013), 36:1070-1075.
This dataset represents a series of virtual faunas in different sites
data(datamod)
A data frame with 9 observations (sites) on 31 binary variables (species).
Dapporto L., Ramazzotti M., Fattorini S., Talavera G., Vila R., Dennis R. "recluster: an unbiased clustering procedure for beta-diversity turnover" Ecography (2013), 36:1070-1075.
This dataset represents an output for a multiscale bootstrap composed of 30 scales (x1-x30).
data(dataisl)
A data frame with 29 nodes (rows) and 30 different bootstrap scales (columns). NA values represent collapsed nodes.
Leonardo Dapporto
Dapporto L., Ramazzotti M., Fattorini S., Talavera G., Vila R., Dennis R. "recluster: an unbiased clustering procedure for beta-diversity turnover" Ecography (2013), 36:1070-1075.
The analysis of different aspects of biodiversity requires specific algorithms. For example, in regionalisation analyses, the high frequency of ties and zero values in dissimilarity matrices based on beta-diversity turnover yields hierarchical cluster dendrograms whose topology and bootstrap supports are affected by the order of rows in the original matrix. Moreover, visualisation of biogeographical regionalisation can be facilitated by a combination of hierarchical clustering and multi-dimensional scaling. The recluster package provides robust techniques to visualise and analyse patterns of biodiversity and to improve occurrence data for cryptic taxa.
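Why ties and zeros are so frequent becomes clear from the Simpson turnover index, beta_sim = min(b, c) / (a + min(b, c)), where a is the number of species shared by two sites and b and c are the numbers of species unique to each site: whenever one fauna is nested in another, min(b, c) = 0 and the dissimilarity is exactly zero. The base-R sketch below uses a hypothetical helper, simpson_dist; recluster.dist is the package's own implementation.

```r
# Hypothetical helper: Simpson turnover dissimilarity between the rows
# (sites) of a presence/absence matrix; NA when a site has no species.
simpson_dist <- function(m) {
  m <- as.matrix(m) > 0
  n <- nrow(m)
  d <- matrix(0, n, n, dimnames = list(rownames(m), rownames(m)))
  for (i in seq_len(n)) {
    for (j in seq_len(n)) {
      shared <- sum(m[i, ] & m[j, ])                  # a
      smaller <- min(sum(m[i, ] & !m[j, ]),            # min(b, c)
                     sum(!m[i, ] & m[j, ]))
      d[i, j] <- if (shared + smaller == 0) NA else smaller / (shared + smaller)
    }
  }
  as.dist(d)
}

# Sites A, B and C form a nested series; D shares species with B and C only
m <- rbind(A = c(1, 1, 0, 0),
           B = c(1, 1, 1, 0),
           C = c(1, 1, 1, 1),
           D = c(0, 0, 1, 1))
d <- simpson_dist(m)
d  # four of the six pairs are tied at zero
```

With so many tied values, which pair hclust merges first can depend on the order of the rows, which is the bias that the consensus approach of recluster.cons is designed to remove.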
Package: | recluster |
Type: | Package |
Version: | 3.0 |
Date: | 2020-05-09 |
License: | GPL (>= 2.0) |
Leonardo Dapporto, Matteo Ramazzotti, Simone Fattorini, Roger Vila, Gerard Talavera, Roger H.L. Dennis
Maintainer: Leonardo Dapporto <[email protected]>
Dapporto L., Ramazzotti M., Fattorini S., Talavera G., Vila R., Dennis R. "recluster: an unbiased clustering procedure for beta-diversity turnover" Ecography (2013), 36:1070-1075.
Dapporto L., Fattorini S., Voda R., Dinca V., Vila R. "Biogeography of western Mediterranean butterflies: combining turnover and nestedness components of faunal dissimilarity" J Biogeogr (2014), 41: 1639-1650.
Dapporto L., Voda R., Dinca V., Vila R. "Comparing population patterns for genetic and morphological markers with uneven sample sizes. An example for the butterfly Maniola jurtina" Methods Ecol Evol (2014), 5: 834-843.
Platania L., Menchetti M., Dinca V., Corbella C., Kay-Lavelle I., Vila R., Wiemers M., Schweiger O., Dapporto L. "Assigning occurrence data to cryptic taxa improves climatic niche assessments: biodecrypt, a new tool tested on European butterflies" Global Ecology and Biogeography (2020).
https://github.com/leondap/recluster
#load model data provided with the package
## Not run:
data(datamod)
#explore zero and tied values in the data set
simpdiss<- recluster.dist(datamod)
recluster.hist(simpdiss)
#create and view unbiased consensus tree (100% consensus, p=1)
constree_full<-recluster.cons(datamod, tr=10, p=1)
plot(constree_full$cons,direction="downwards")
#compute and view node strength
recluster.node.strength(datamod, tr=10)
#create and view unbiased consensus tree (50% majority-rule consensus, p=0.5)
constree_half<-recluster.cons(datamod, tr=10, p=0.5)
plot(constree_half$cons, direction="downwards")
#the latter is the correct tree
tree<-constree_half$cons
#perform and view bootstrap on nodes
boot<-recluster.boot(tree, datamod, tr=10, p=0.5, boot=50)
recluster.plot(tree,boot)
#perform and view multiscale bootstrap on nodes
multiboot<- recluster.multi(tree, datamod, tr=10, boot=50, levels=2, step=1)
recluster.plot(tree,multiboot,low=1,high=2, direction="downwards")
#project and plot a bi-dimensional plot in the RGB colour space
sordiss<- recluster.dist(datamod,dist="sorensen")
points<-cmdscale(sordiss)
col<-recluster.col(points)
recluster.plot.col(col)
#inspect explained diversity for different cuts of a tree
tree<-recluster.cons(datamod, dist="sorensen",tr=10, p=0.5)
expl_div<-recluster.expl.diss(tree$cons,sordiss)
expl_div
#Select cut #4 and group data in RGB space
ncol<-recluster.group.col(col,expl_div$matrix[,4])
#Plot mean values for clusters
recluster.plot.col(ncol$aggr)
#Plot mean colours for sites in the geographic space
lat<-c(2,2,2,1,3,1,1,3,3)
long<-c(1,5,3,3,3,1,5,1,5)
recluster.plot.sites.col(long, lat, ncol$all,text=TRUE)
#Use recluster.region procedure on island butterflies
data(dataisl)
simpson<-recluster.dist(dataisl)
turn_cl<-recluster.region(simpson,tr=10,rettree=TRUE)
turn_cl
#Select the solution with three clusters and plot the tree
plot(turn_cl$tree[[2]])
turn_cl$grouping
#Perform a procrustes with uneven sample size
#Create and plot a target matrix
ex1 <-rbind(c(1,5),c(5,5),c(3,4),c(3,6))
plot(ex1,col=c(1:4),pch=19,xlim=c(0,6),ylim=c(0,6),cex=2)
#Create and plot a matrix to be rotated. Only the points 1-4 are shared
ex2<-rbind(c(3,1),c(3,3),c(2.5,2),c(3.5,2),c(3,4))
plot(ex2,col=c(1:5),pch=19,xlim=c(0,6),ylim=c(0,6),cex=2)
#Perform the procrustes on points 1-4
#Apply the transformation to point 5 of ex2 and plot the matrices
procr1<-recluster.procrustes(ex1,ex2,num=4)
plot(procr1$X,col=c(1:4),pch=19,xlim=c(-4,4),ylim=c(-4,4),cex=2)
plot(procr1$Yrot,col=c(1:5),pch=19,xlim=c(-4,4),ylim=c(-4,4),cex=2)
# Create an example for biodecrypt
mat<-rbind(cbind(rnorm(n = 20, mean = 1, sd = 4),rnorm(n = 20, mean = 40, sd = 3)),
 cbind(rnorm(n = 20, mean = 7, sd = 5),rnorm(n = 20, mean = 45, sd = 2)))
id<-c(rep(1,20),rep(2,20))
id[sample(c(1:40))[1:10]]<-0
# Perform biodecrypt with default parameters
# alpha gets high to include 95
attribution<-biodecrypt(mat,id)
#plot the results
plot(mat,type="n")
biodecrypt.plot(attribution)
## End(Not run)
Given an initial tree and a data matrix, this function computes bootstrap support for nodes. Each tree used for bootstrap is constructed by resampling the row order several times and by applying a consensus rule, as done by recluster.cons. The number of sampled columns (species) can be varied.
recluster.boot(tree, mat, phylo = NULL, tr = 100, p = 0.5, dist = "simpson", method = "average", boot = 1000, level = 1)
tree |
A reference tree of class phylo for sites, typically constructed with the recluster.cons function. |
mat |
The matrix used to construct the tree. |
phylo |
An ultrametric and rooted tree for species phylogeny having the same labels as the mat columns. Only required for phylogenetic beta-diversity indices. |
tr |
The number of trees to be included in the consensus. |
p |
A numeric value between 0.5 and 1 giving the minimum proportion of trees in which a clade must occur to be represented in the consensus tree. |
dist |
A beta-diversity index (the Simpson index by default) included in recluster.dist, or any custom binary dissimilarity specified according to the syntax of the designdist function of the vegan package. |
method |
Any clustering method allowed by hclust. |
boot |
The number of trees used for bootstrap computation. |
level |
The ratio between the number of species to be included in the analysis and the original number of species in the mat matrix. |
Computation can be time consuming because of the high number of trees required for the analysis. We suggest assessing the degree of row bias with recluster.hist and recluster.node.strength, to optimize the number of required consensus trees, before starting the analysis.
A vector indicating the percentage of bootstrap trees replicating each original node.
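The node-support idea can be illustrated with a minimal base-R sketch. It is not the package's implementation and simplifies it under stated assumptions: Jaccard dissimilarity (base R's binary distance) stands in for Simpson, each replicate builds a single average-linkage tree instead of a consensus of tr trees, species columns are resampled with replacement, and clades_of, jaccard and boot_support are hypothetical names.

```r
# Hypothetical helper: tip labels under every internal node of an hclust tree,
# each clade encoded as a sorted, comma-separated string.
clades_of <- function(h) {
  members <- vector("list", nrow(h$merge))
  for (i in seq_len(nrow(h$merge))) {
    parts <- lapply(h$merge[i, ], function(k) {
      if (k < 0) h$labels[-k] else members[[k]]
    })
    members[[i]] <- sort(unlist(parts))
  }
  vapply(members, paste, character(1), collapse = ",")
}

# Jaccard dissimilarity; pairs with no species in either site are set to 0
jaccard <- function(m) {
  d <- dist(m, method = "binary")
  d[is.na(d)] <- 0
  d
}

# Percentage of bootstrap trees recovering each clade of the reference tree
boot_support <- function(mat, boot = 100, level = 1, method = "average") {
  ref <- clades_of(hclust(jaccard(mat), method = method))
  hits <- numeric(length(ref))
  for (b in seq_len(boot)) {
    # resample species (columns) with replacement; level scales the sample size
    cols <- sample(ncol(mat), round(level * ncol(mat)), replace = TRUE)
    rep_tree <- hclust(jaccard(mat[, cols, drop = FALSE]), method = method)
    hits <- hits + (ref %in% clades_of(rep_tree))
  }
  setNames(100 * hits / boot, ref)
}

sites <- rbind(a = c(1, 1, 1, 0, 0, 0), b = c(1, 1, 1, 0, 0, 0),
               c = c(0, 0, 0, 1, 1, 1), d = c(0, 0, 0, 1, 1, 1))
set.seed(1)
support <- boot_support(sites, boot = 50)
support
```

In this sketch level plays the role of the species-sampling ratio; values above 1 draw more columns than the original matrix contains.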
Leonardo Dapporto and Matteo Ramazzotti
Dapporto L., Ramazzotti M., Fattorini S., Talavera G., Vila R., Dennis R. "recluster: an unbiased clustering procedure for beta-diversity turnover" Ecography (2013), 36:1070-1075.
data(datamod)
tree<-recluster.cons(datamod,tr=10)
boot<-recluster.boot(tree$cons,tr=5,boot=50,datamod)
recluster.plot(tree$cons,boot,direction="downwards")
data(treemod)
tree<-recluster.cons(datamod,treemod, dist="phylosort", tr=10)
boot<-recluster.boot(tree$cons, datamod, treemod,tr=5,boot=50)
recluster.plot(tree$cons,boot,direction="downwards")
This function projects a two-dimensional matrix into an RGB colour space with red, green, yellow and blue at its four corners. The RGB combination corresponding to the position of each case in this space is returned together with its new coordinates.
recluster.col(mat,st=TRUE,rot=TRUE)
mat |
A matrix containing two dimensional coordinates for cases. |
st |
Logical, if TRUE then values in axes are standardized between 0 and 1, if FALSE then original values are maintained. |
rot |
Logical, if TRUE then the axis with highest variance is oriented on the x-axis. |
A matrix with the first two columns representing the coordinates and the third, fourth and fifth representing the red, green and blue components, respectively.
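A minimal sketch of a four-corner colouring, assuming bilinear interpolation between corner colours and an arbitrary corner assignment (red at the origin, green on the right, blue at the top, yellow at the far corner). The actual layout used by recluster.col may differ, the rot option is not reproduced, and rgb_square is a hypothetical name.

```r
rgb_square <- function(mat, st = TRUE) {
  xy <- as.matrix(mat[, 1:2, drop = FALSE])
  if (st) {
    # min-max rescaling of each axis to [0, 1]; needs two distinct values per axis
    xy <- apply(xy, 2, function(v) (v - min(v)) / (max(v) - min(v)))
  }
  x <- xy[, 1]
  y <- xy[, 2]
  # Assumed corner colours: (0,0) red, (1,0) green, (0,1) blue, (1,1) yellow
  corners <- rbind(red = c(1, 0, 0), green = c(0, 1, 0),
                   blue = c(0, 0, 1), yellow = c(1, 1, 0))
  # Bilinear weights of each point with respect to the four corners
  w <- cbind((1 - x) * (1 - y), x * (1 - y), (1 - x) * y, x * y)
  rgb_mat <- w %*% corners
  colnames(rgb_mat) <- c("R", "G", "B")
  cbind(x = x, y = y, rgb_mat)
}

pts <- cbind(c(0, 10, 0, 10, 5), c(0, 0, 4, 4, 2))
cols <- rgb_square(pts)
rgb(cols[, "R"], cols[, "G"], cols[, "B"])   # hex colours for plotting
```

The central point receives an even blend of the four corners, which is what makes nearby cases share similar hues.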
Leonardo Dapporto and Matteo Ramazzotti
Kreft H., Jetz W. "A framework for delineating biogeographic regions based on species distributions." J Biogeogr (2010), 37: 2029-2053.
Dapporto, L., Fattorini, S., Voda, R., Dinca, V., Vila, R. "Biogeography of western Mediterranean butterflies: combining turnover and nestedness components of faunal dissimilarity." J Biogeogr (2014), 41: 1639-1650.
data(datamod)
sordiss<- recluster.dist(datamod,dist="sorensen")
points<-cmdscale(sordiss)
col<-recluster.col(points)
col
This function creates a series of trees by resampling the order of sites in the original dissimilarity matrix. Then, it computes a consensus among them. The resulting tree is independent of the original row order.
recluster.cons(mat, phylo = NULL, tr = 100, p = 0.5, dist = "simpson", method = "average", blenghts=TRUE, select=FALSE)
mat |
A matrix containing sites (rows) and species (columns) or any dissimilarity matrix. |
phylo |
An ultrametric and rooted tree for species phylogeny having the same labels as in mat columns. Only required to compute phylogenetic beta-diversity indexes. |
tr |
The number of trees to be used for the consensus. |
p |
A numeric value between 0.5 and 1 giving the minimum proportion of trees in which a clade must occur to be represented in the consensus tree. |
dist |
A beta-diversity index (the Simpson index by default) included in recluster.dist, or any custom binary dissimilarity specified according to the syntax of the designdist function of the vegan package. |
method |
Any clustering method allowed by hclust. |
blenghts |
A logical indicating if non-negative least squares branch lengths should be computed. |
select |
A logical indicating if only trees having a fit higher than the median value in the least squares regression should be included in the consensus analysis. |
As required by the consensus function of the ape package, p must range between 0.5 and 1. Setting select = TRUE can reduce polytomies by removing trees whose topology shows a particularly low correlation with the distance matrix. Row names are required.
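To see why a consensus over permuted row orders removes row bias, the clade-counting step can be sketched in base R. This is an illustration under assumptions, not recluster.cons itself: it records the clades of average-linkage trees built on randomly reordered matrices and keeps those present in at least a proportion p of trees, without assembling them into a phylo tree as ape's consensus does. Row names carry site identity across permutations; clades_of and consensus_clades are hypothetical names.

```r
# Hypothetical helper: tip labels under every internal node of an hclust tree
clades_of <- function(h) {
  members <- vector("list", nrow(h$merge))
  for (i in seq_len(nrow(h$merge))) {
    parts <- lapply(h$merge[i, ], function(k) {
      if (k < 0) h$labels[-k] else members[[k]]
    })
    members[[i]] <- sort(unlist(parts))
  }
  vapply(members, paste, character(1), collapse = ",")
}

# Clades found in at least a proportion p of trees built on permuted row orders
consensus_clades <- function(d, tr = 100, p = 0.5, method = "average") {
  dm <- as.matrix(d)   # row names identify sites across permutations
  found <- unlist(lapply(seq_len(tr), function(i) {
    o <- sample(nrow(dm))                       # random row order
    clades_of(hclust(as.dist(dm[o, o]), method = method))
  }))
  freq <- table(found) / tr                     # each clade counted once per tree
  names(freq)[freq >= p]
}

# Well-separated sites: every permutation gives the same clades
m <- matrix(c(0, 1, 5, 5,
              1, 0, 5, 5,
              5, 5, 0, 1,
              5, 5, 1, 0), nrow = 4,
            dimnames = list(letters[1:4], letters[1:4]))
set.seed(42)
strict <- consensus_clades(as.dist(m), tr = 20, p = 1)

# Fully tied sites: which pair is merged first may depend on row order
tied <- matrix(1, 3, 3, dimnames = list(c("x", "y", "z"), c("x", "y", "z")))
diag(tied) <- 0
tied_strict <- consensus_clades(as.dist(tied), tr = 20, p = 1)
```

With tied dissimilarities, a strict rule (p = 1) retains only the clades that every permutation recovers, which is the row-order-independent part of the tree.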
cons |
The consensus tree, an object of class phylo. |
trees |
The trees used to construct the final consensus tree. |
RSS |
The residual sum of squares of the trees, returned if select = TRUE. |
Leonardo Dapporto and Matteo Ramazzotti
Dapporto L., Ramazzotti M., Fattorini S., Talavera G., Vila R., Dennis R. "recluster: an unbiased clustering procedure for beta-diversity turnover" Ecography (2013), 36:1070-1075.
#Faunistic beta diversity
data(datamod,treemod)
tree<-recluster.cons(datamod,tr=10)
plot(tree$cons,direction="downwards")
#Phylogenetic beta diversity
tree_p<-recluster.cons(datamod,treemod,dist="phylosort",tr=10)
plot(tree_p$cons, direction="downwards")
This function computes dissimilarity matrices based on the two most widely used partitions of faunistic and phylogenetic beta-diversity. In particular, for faunistic indexes: Jaccard = beta3 + richness (Carvalho et al. 2012), Jaccard = Jturnover + Jnestedness (Baselga, 2012) and Sorensen = Simpson + nestedness (Baselga 2010); for phylogenetic indexes: Unifrac = Unifrac_turn + Unifrac_PD and PhyloSor = PhyloSor_turn + PhyloSor_PD (Leprieur et al. 2012). Any other binary index can be supplied, in brackets, using the syntax of the designdist function of the vegan package.
recluster.dist(mat, phylo=NULL, dist="simpson")
mat |
A matrix containing sites (rows) and species (columns). |
phylo |
An ultrametric and rooted phylogenetic tree for species having the same labels as in mat columns. Only required for phylogenetic beta-diversity indexes. |
dist |
One of the 14 beta-diversity indexes |
Syntax for custom binary indices follows designdist (vegan): J, number of shared species; A and B, total number of species in the first and in the second site, respectively.
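For a single pair of sites, the Sorensen = Simpson + nestedness partition (Baselga 2010) can be written with a, the number of shared species, and b and c, the numbers of species exclusive to each site: Simpson = min(b, c)/(a + min(b, c)) and Sorensen = (b + c)/(2a + b + c). The sketch below, with the hypothetical function beta_components, computes the three values from two presence/absence vectors; it is not the package's code and does not guard against two empty sites.

```r
beta_components <- function(x, y) {
  x <- x > 0
  y <- y > 0
  shared <- sum(x & y)   # a: species present in both sites
  only_x <- sum(x & !y)  # b: species found only in the first site
  only_y <- sum(!x & y)  # c: species found only in the second site
  simpson <- min(only_x, only_y) / (shared + min(only_x, only_y))
  sorensen <- (only_x + only_y) / (2 * shared + only_x + only_y)
  c(simpson = simpson, sorensen = sorensen, nestedness = sorensen - simpson)
}

# The second site holds a subset of the first site's species:
# no turnover (simpson = 0), so all Sorensen dissimilarity is nestedness (1/3)
nested <- beta_components(c(1, 1, 1, 1, 0), c(1, 1, 0, 0, 0))
nested
```

Because Simpson ignores the richness difference between sites, a perfectly nested pair scores zero turnover, which is why it is the default index for turnover-based regionalisation.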
An object of class dist (see vegan::designdist for further details).
Leonardo Dapporto and Matteo Ramazzotti
Baselga A. "Partitioning the turnover and nestedness components of beta diversity." Global Ecol Biogeogr (2010), 19: 134-143.
Carvalho J. C., Cardoso P., Gomes P. "Determining the relative roles of species replacement and species richness differences in generating beta-diversity patterns." Global Ecol Biogeogr (2012), 21: 760-771.
Leprieur F., Albouy C., De Bortoli J., Cowman P.F., Bellwood D.R., Mouillot D. "Quantifying Phylogenetic Beta Diversity: Distinguishing between 'True' Turnover of Lineages and Phylogenetic Diversity Gradients." Plos One (2012), 7
This function computes the fraction of the distances contained in a dissimilarity matrix that is explained by a clustering solution of its elements. The value is obtained as the sum of all the dissimilarity values between elements belonging to different clusters, divided by the sum of all the cells of the original dissimilarity matrix.
recluster.expl(mat, clust)
mat |
A dissimilarity matrix |
clust |
A clustering solution for the cases contained in the dissimilarity matrix. |
A number ranging between 0 and 1 indicating the fraction of explained dissimilarity.
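The ratio described above can be written in a few lines of base R. This is a sketch with a hypothetical function name, assuming d is a dist object (or a symmetric matrix with a zero diagonal) and clust lists memberships in the same order as its rows; counting each pair once or twice gives the same ratio.

```r
explained_dissimilarity <- function(d, clust) {
  dm <- as.matrix(d)
  between <- outer(clust, clust, "!=")   # TRUE for pairs in different clusters
  sum(dm[between]) / sum(dm)
}

m <- matrix(c(0, 1, 5, 5,
              1, 0, 5, 5,
              5, 5, 0, 1,
              5, 5, 1, 0), nrow = 4)
explained_dissimilarity(as.dist(m), c(1, 1, 2, 2))   # 40 / 44 = 10/11
```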
Leonardo Dapporto
Holt B.G. et al. "An Update of Wallace's Zoogeographic Regions of the World." Science (2013), 339: 74-78.
data(datamod)
sor_tree<- recluster.cons(datamod, dist="sorensen")
sor_diss <- recluster.dist(datamod, dist="sorensen")
expl_diss <- recluster.expl.diss(sor_tree$cons,sor_diss)
expl_diss
This function cuts a phylo tree at each of its nodes, provides the membership of each element in the resulting series of clustering solutions and computes the fraction of dissimilarity explained by each solution.
recluster.expl.diss(tree, dist, maxcl=NULL, mincl=NULL, maxnode=NULL, expld=TRUE)
tree |
A phylo tree |
dist |
A dissimilarity matrix. |
maxcl |
A custom number indicating the solution with the maximum number of clusters. If NULL the maximum number of clusters is returned. |
mincl |
A custom number indicating the solution with the minimum number of clusters. If NULL the minimum number of clusters is returned. |
maxnode |
A custom number indicating the outermost node at which to cut. If NULL all the nodes will be cut. |
expld |
A logical. If TRUE then the matrix for explained dissimilarity is computed. |
When polytomic nodes are involved in a cut, the number of clusters at that cut can increase by more than one unit. It is also possible that more than two clusters are identified at the first cut, so the first solution can show a higher number of clusters than the minimum number set in mincl. Holt et al. (2013) identified levels of explained dissimilarity to be used as a reliable threshold for assessing a tree cut. When cases are very numerous, maxnode can be set to avoid very long computations, keeping in mind that a cut at node 6 can produce solutions with more than 6 clusters.
matrix |
A matrix indicating cluster membership of each site in each cut of the tree. |
expl.div |
A vector indicating the explained dissimilarity for each cut. |
nclust |
A vector indicating the number of clusters resulting from each cut. |
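A sketch of the cut-and-score loop for a binary hclust tree, where cutting node by node is equivalent to asking cutree() for 2, 3, ..., n clusters. It assumes average linkage and no polytomies, so, unlike recluster.expl.diss, the number of clusters always grows by one per cut; expl_by_cut is a hypothetical name and d needs at least three sites.

```r
expl_by_cut <- function(d, method = "average") {
  n <- attr(d, "Size")
  ks <- 2:n                                   # from two clusters to singletons
  membership <- cutree(hclust(d, method = method), k = ks)  # one column per k
  dm <- as.matrix(d)
  explained <- apply(membership, 2, function(cl) {
    sum(dm[outer(cl, cl, "!=")]) / sum(dm)
  })
  list(membership = membership, explained = unname(explained), nclust = ks)
}

m <- matrix(c(0, 1, 5, 5,
              1, 0, 5, 5,
              5, 5, 0, 1,
              5, 5, 1, 0), nrow = 4)
res <- expl_by_cut(as.dist(m))
res$explained
```

Reading res$explained from left to right shows how much additional dissimilarity each further cut accounts for, which is the quantity a threshold such as the one discussed by Holt et al. (2013) is applied to.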
Leonardo Dapporto
Dapporto L., Ciolli G., Dennis R.L.H., Fox R., Shreeve T.G. "A new procedure for extrapolating turnover regionalization at mid-small spatial scales, tested on British butterflies." Methods in Ecology and Evolution (2015), 6: 1287-1297.
data(datamod)
sor_tree<- recluster.cons(datamod, dist="sorensen")
sor_diss <- recluster.dist(datamod, dist="sorensen")
expl_diss <- recluster.expl.diss(sor_tree$cons,sor_diss)
expl_diss
This function computes several indexes of genetic differentiation based on a distance matrix and on a vector of population membership.
recluster.fst(dist,vect,setzero=F,setnazero=F)
dist |
A distance matrix. |
vect |
A vector indicating population membership. Cases must be in the same order as in the distance matrix. |
setzero |
A logical indicating if negative values should be set to zero |
setnazero |
A logical indicating if NA values should be set to zero |
There has been a large debate around FST-like indexes. This function calculates two main indexes: the absolute differentiation (Dst) and the standardized differentiation (Gst) (Nei, 1987). Dst is calculated as Dst = Ht - Hs, where Ht is the average distance among all the specimens in the sample and Hs is the average of the intra-area (or intra-sub-area) distances. Thus, Dst represents the average genetic differentiation among areas in p-distance units. Gst is a standardized index defined as Gst = Dst/Ht, representing the fraction of the total genetic differentiation encompassed by the differentiation among areas (Nei, 1987). This index ranges from negative values to 1 (complete differentiation).

Negative values of Gst and Dst (intra-area differentiation higher than inter-area differentiation) can have different subtle meanings, but are most often a bias due to relatively small sample sizes; they are usually set to zero (Meirmans & Hedrick, 2011), which this function does when setzero = TRUE. For species showing no mutations in the sample, Gst returns NA (while Dst equals zero); these cases can also be set to zero with setnazero = TRUE. The use of Dst and Gst as measures of population differentiation has been debated for extremely variable markers (such as microsatellites), because they tend to underestimate differentiation among populations and to depend strongly on intra-population variability (Jost, 2008; Whitlock, 2011). The D and G'st indices are less affected by high values of Hs.
Ht |
The average distances among all the specimens in the sample. |
lengthHt |
The number of distances among all the specimens in the sample. |
Hs |
The average distances among the specimens of the same populations. |
lengthHs |
The number of distances among the specimens of the same populations. |
Dst |
The Dst value. |
Gst |
The Gst value. |
D |
The D value. |
G1st |
The G'st value. |
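The Dst and Gst definitions in the details above translate directly into base R. The sketch assumes Hs is the pooled mean of all within-population distances (rather than a mean of per-population means) and covers only Dst and Gst; fst_sketch is a hypothetical name, not the package's code.

```r
fst_sketch <- function(d, pop, setzero = FALSE, setnazero = FALSE) {
  dm <- as.matrix(d)
  low <- lower.tri(dm)                   # each pair of specimens counted once
  same <- low & outer(pop, pop, "==")    # pairs within the same population
  Ht <- mean(dm[low])
  Hs <- mean(dm[same])                   # pooled within-population mean (assumed)
  Dst <- Ht - Hs
  Gst <- Dst / Ht                        # NaN when Ht == 0 (no variation)
  if (setzero) {
    Dst <- max(Dst, 0)
    if (!is.na(Gst)) Gst <- max(Gst, 0)
  }
  if (setnazero && is.na(Gst)) Gst <- 0
  c(Ht = Ht, Hs = Hs, Dst = Dst, Gst = Gst)
}

# Two populations of two specimens each: close within, distant between
m <- matrix(0.5, 4, 4)
m[1:2, 1:2] <- 0.1
m[3:4, 3:4] <- 0.1
diag(m) <- 0
fst_sketch(as.dist(m), c(1, 1, 2, 2))
```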
Leonardo Dapporto
Jost L. "GST and its relatives do not measure differentiation." Mol Ecol (2008), 17:4015-4026.
Meirmans P. G., Hedrick P. W. "Assessing population structure: FST and related measures: Invited Technical Review." Mol Ecol Res (2011), 11: 5-18.
Nei M. Molecular evolutionary genetics (1987), Columbia University Press.
Whitlock M.C. "G'ST and D do not replace FST." Mol Ecol (2011), 20: 1083-1091.
datavirtual<-data.frame(replicate(10,sample(0:1,60,replace=TRUE)))
dist<-recluster.dist(datavirtual)
population<-c(rep(1,20),rep(2,20),rep(3,20))
recluster.fst(dist,population)
This function computes pairwise indexes of genetic differentiation among populations based on a distance matrix and on a vector for populations.
recluster.fst.pair(dist,vect,setzero=F,setnazero=F)
dist |
A distance matrix. |
vect |
A vector indicating population membership. Cases must be in the same order as in the distance matrix. |
setzero |
A logical indicating if negative values should be set to zero |
setnazero |
A logical indicating if NA values should be set to zero |
The formulas used for pairwise calculations between populations i and j are:
Dstij = Htij - Hsij
Gstij = Dstij/Htij
Dij = (Dstij/(1-Hsij))*2
G'stij = Gstij/((1-Hsij)/(1+Hsij))
See also recluster.fst for a discussion of these indexes.
Dstm |
The Dst distance matrix. |
Gstm |
The Gst distance matrix. |
Dm |
The D distance matrix. |
G1stm |
The G'st distance matrix. |
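A sketch of the pairwise formulas, assuming that Gstij is divided by the pairwise Htij and that Hs is the pooled mean of within-population distances for the two populations being compared. fst_pair_sketch is a hypothetical name, not the package's implementation, and D becomes infinite when Hsij equals 1.

```r
fst_pair_sketch <- function(d, pop) {
  dm <- as.matrix(d)
  pops <- sort(unique(pop))
  k <- length(pops)
  blank <- matrix(0, k, k, dimnames = list(pops, pops))
  out <- list(Dst = blank, Gst = blank, D = blank, G1st = blank)
  for (i in seq_len(k - 1)) {
    for (j in (i + 1):k) {
      keep <- pop %in% pops[c(i, j)]     # specimens of populations i and j
      sub <- dm[keep, keep]
      p <- pop[keep]
      low <- lower.tri(sub)
      Ht <- mean(sub[low])
      Hs <- mean(sub[low & outer(p, p, "==")])
      Dst <- Ht - Hs
      Gst <- Dst / Ht
      vals <- c(Dst = Dst, Gst = Gst,
                D = 2 * Dst / (1 - Hs),
                G1st = Gst / ((1 - Hs) / (1 + Hs)))
      for (nm in names(vals)) {          # fill both triangles symmetrically
        out[[nm]][i, j] <- vals[[nm]]
        out[[nm]][j, i] <- vals[[nm]]
      }
    }
  }
  out
}

m <- matrix(0.5, 4, 4)
m[1:2, 1:2] <- 0.1
m[3:4, 3:4] <- 0.1
diag(m) <- 0
res <- fst_pair_sketch(as.dist(m), c(1, 1, 2, 2))
res$G1st
```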
Leonardo Dapporto
Jost L. "GST and its relatives do not measure differentiation." Mol Ecol (2008), 17:4015-4026.
Meirmans P. G., Hedrick P. W. "Assessing population structure: FST and related measures: Invited Technical Review." Mol Ecol Res (2011), 11: 5-18.
Nei M. Molecular evolutionary genetics (1987), Columbia University Press.
Whitlock M.C. "G'ST and D do not replace FST." Mol Ecol (2011), 20: 1083-1091.
datavirtual<-data.frame(replicate(20,sample(0:1,60,replace=TRUE)))
dist<-recluster.dist(datavirtual)
population<-c(rep(1,20),rep(2,20),rep(3,20))
recluster.fst.pair(dist,population)
This function computes the barycentre and the RGB colour of each group of cases, starting from an RGB colour matrix obtained by recluster.col.
recluster.group.col(mat,member)
mat |
A matrix produced by recluster.col, containing the original RGB colour space. |
member |
A vector indicating group membership for each case. |
aggr |
A matrix in the recluster.col format with mean values for coordinates and RGB colours for groups. |
all |
A matrix in the recluster.col format reporting mean RGB colours of the group of each original case. |
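A minimal sketch of the grouping step, assuming that both coordinates and RGB components are simply averaged within each group (the package may instead recompute colours from the barycentre position); group_means is a hypothetical name.

```r
group_means <- function(mat, member) {
  sums <- rowsum(mat, member)                   # one row per group, sorted
  aggr <- sums / as.vector(table(member))       # divide each row by group size
  per_case <- aggr[match(member, rownames(aggr)), , drop = FALSE]
  list(aggr = aggr, all = per_case)
}

cols <- cbind(x = c(0, 2, 10, 12), y = c(0, 0, 4, 4),
              R = c(1, 0.8, 0, 0), G = c(0, 0.2, 1, 0.6), B = c(0, 0, 0, 0.4))
g <- group_means(cols, c(1, 1, 2, 2))
g$aggr
```

The all element repeats each group's mean row for every original case, mirroring the two outputs listed above.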
Leonardo Dapporto and Matteo Ramazzotti
Kreft H., Jetz W. "A framework for delineating biogeographic regions based on species distributions." J Biogeogr (2010), 37: 2029-2053.
Dapporto, L., Fattorini, S., Voda, R., Dinca, V., Vila, R. "Biogeography of western Mediterranean butterflies: combining turnover and nestedness components of faunal dissimilarity." J Biogeogr (2014), 41: 1639-1650.
data(datamod)
sordiss<- recluster.dist(datamod,dist="sorensen")
points<-cmdscale(sordiss)
col<-recluster.col(points)
group<-c(1,2,3,3,3,1,2,1,2)
ncol<-recluster.group.col(col,group)
recluster.plot.col(ncol$aggr)
This function creates a histogram of the values in a dissimilarity matrix, where the number of cells with zero value is explicitly shown in the first bar. Moreover, it reports the percentage of cells having tied (equal) values in the matrix.
recluster.hist(x)
x |
A dissimilarity matrix. |
A histogram with supplementary information. The first bar shows only the zero values.
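The two summary figures can be sketched as follows, assuming that a cell counts as tied when its value occurs in at least one other cell and that values are compared exactly; tie_summary is a hypothetical name, not part of the package.

```r
tie_summary <- function(d) {
  v <- as.vector(d)                    # for a dist object: each pair once
  tied <- duplicated(v) | duplicated(v, fromLast = TRUE)
  c(zero_pct = 100 * mean(v == 0), tied_pct = 100 * mean(tied))
}

# Two zeros and one repeated non-zero value among six cells
tie_summary(c(0, 0, 0.5, 0.5, 1, 0.25))
```

High values of either percentage are the situation in which the row order of the matrix can change the topology of a hierarchical tree.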
Leonardo Dapporto and Matteo Ramazzotti
Dapporto L., Ramazzotti M., Fattorini S., Talavera G., Vila R., Dennis R. "recluster: an unbiased clustering procedure for beta-diversity turnover" Ecography (2013), 36:1070-1075.
data(datamod)
simpdiss<- recluster.dist(datamod)
recluster.hist(simpdiss)
This function helps to understand different behaviours of node supports in multiscale bootstrap by i) plotting trends of support values across bootstrap scales, ii) identifying the bootstrap scale with the highest diversification between two groups of nodes and iii) classifying nodes into two classes according to the best bootstrap scale identified in (ii) and plotting their mean support values.
recluster.identify.nodes(mat, low=TRUE)
mat |
A matrix containing nodes (rows) and bootstrap levels (columns) as obtained by recluster.multi. |
low |
A logical value indicating if lower scales should be favoured in the selection. |
This function recognizes nodes showing different trends of support in multiscale bootstrap. In the analysis of turnover in biogeography, some nodes may show a substantial increase in support in a multiscale bootstrap. Areas connected by these nodes may host only a few species responsible for turnover, but the biogeographic pattern is clear. Other nodes may show a slow (or no) increase in support; in this case, the links among areas can be considered uncertain. Partitioning Around Medoids is used to identify two classes of nodes at each level; then the bootstrap scale showing the best diversification into two classes is identified by silhouette scores weighted by the differences in mean values between classes. If low is set to TRUE, the function favours low scales.
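The scale-selection logic can be approximated in base R. This sketch makes several assumptions: an exhaustive search over medoid pairs (exact k-medoids for k = 2, suitable for small numbers of nodes) replaces PAM from the cluster package, the silhouette score is multiplied by the absolute difference between class means, and the preference for low scales (low = TRUE) is not modelled. All function names are hypothetical.

```r
# Exact 2-medoid partition of a numeric vector by exhaustive search
two_medoids <- function(x) {
  dx <- abs(outer(x, x, "-"))
  n <- length(x)
  best <- NULL
  best_cost <- Inf
  for (i in seq_len(n - 1)) {
    for (j in (i + 1):n) {
      if (x[i] == x[j]) next             # identical medoids give a single class
      cost <- sum(pmin(dx[, i], dx[, j]))
      if (cost < best_cost) {
        best_cost <- cost
        best <- ifelse(dx[, i] <= dx[, j], 1L, 2L)
      }
    }
  }
  best
}

# Average silhouette width of a one-dimensional two-class partition
mean_silhouette <- function(x, cl) {
  dx <- abs(outer(x, x, "-"))
  s <- vapply(seq_along(x), function(i) {
    own <- cl == cl[i]
    own[i] <- FALSE
    if (!any(own)) return(0)             # singleton convention
    a <- mean(dx[i, own])
    b <- mean(dx[i, cl != cl[i]])
    (b - a) / max(a, b)
  }, numeric(1))
  mean(s)
}

# Score each bootstrap scale (column) and keep the best one
best_scale <- function(mat) {
  scores <- apply(mat, 2, function(x) {
    cl <- two_medoids(x)
    if (is.null(cl)) return(-Inf)        # all supports equal: no split
    mean_silhouette(x, cl) * abs(diff(tapply(x, cl, mean)))
  })
  best <- which.max(scores)
  list(scale = unname(best), nodes = two_medoids(mat[, best]),
       scores = unname(scores))
}

supports <- cbind(x1 = c(50, 52, 51, 90, 91, 89),
                  x2 = c(60, 61, 62, 63, 64, 65))
res <- best_scale(supports)
res$scale
```

In the toy matrix the first scale separates two well-defined groups of nodes, while the second spreads supports evenly, so the weighted silhouette favours the first.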
A plot with bootstrap supports and their means (diamonds) for the best combination of two groups of nodes (black and red).
scale |
The best bootstrap scale to identify two groups of nodes. |
nodes |
A vector containing classification for nodes in the best bootstrap scale. |
Leonardo Dapporto and Matteo Ramazzotti
Dapporto L., Ramazzotti M., Fattorini S., Talavera G., Vila R., Dennis R. "recluster: an unbiased clustering procedure for beta-diversity turnover" Ecography (2013), 36:1070-1075.
data(multiboot)
recluster.identify.nodes(multiboot)
This function identifies a line in a configuration based on different criteria and returns its slope and intercept. It can be used together with recluster.rotate to rotate a configuration based on a custom line.
recluster.line(mat,type="maxd",X1=NULL,X2=NULL)
mat |
The bidimensional configuration. |
type |
The type of line to be computed: "maxd" is the line connecting the most distant points, "regression" is the regression line between X and Y values, "points" is the line connecting two custom points of the configuration (X1 and X2). |
X1 |
The row number in mat of the first custom point. |
X2 |
The row number in mat of the second custom point. |
m |
The slope of the line. |
q |
The intercept of the line. |
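The "maxd" and "regression" criteria reduce to simple geometry. The sketch below, with the hypothetical function line_params, returns the slope m and intercept q in the same spirit as recluster.line but is not its implementation; a vertical maxd line gives an infinite slope.

```r
line_params <- function(mat, type = c("maxd", "regression")) {
  type <- match.arg(type)
  x <- as.numeric(mat[, 1])
  y <- as.numeric(mat[, 2])
  if (type == "regression") {
    fit <- coef(lm(y ~ x))                       # least squares of y on x
    return(c(m = unname(fit[2]), q = unname(fit[1])))
  }
  # "maxd": the line through the two most distant points
  dm <- as.matrix(dist(cbind(x, y)))
  ij <- which(dm == max(dm), arr.ind = TRUE)[1, ]
  m <- (y[ij[2]] - y[ij[1]]) / (x[ij[2]] - x[ij[1]])  # Inf if vertical
  c(m = m, q = y[ij[1]] - m * x[ij[1]])
}

pts <- cbind(c(0, 1, 4), c(0, 2, 4))
line_params(pts, "maxd")         # m = 1, q = 0
line_params(pts, "regression")   # m = 12/13, q = 6/13
```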
Leonardo Dapporto
Dapporto L., Voda R., Dinca V., Vila R. "Comparing population patterns for genetic and morphological markers with uneven sample sizes. An example for the butterfly Maniola jurtina" Methods Ecol Evol (2014), 5, 834-843.
data(dataisl)
#Compute bidimensional representation for islands
pcoa<-cmdscale(recluster.dist(dataisl))
#Compute the line
lin<-recluster.line(pcoa)
Given an initial tree and a data matrix, this function computes bootstrap support for nodes as done by recluster.boot. Different levels of bootstrap can be computed by varying the proportion of species sampled from the original matrix.
recluster.multi(tree, mat, phylo = NULL, tr = 100, p = 0.5, dist = "simpson", method = "average", boot = 1000, levels = 2, step = 1)
tree |
A reference phylo tree for sites presumably constructed with recluster.cons function. |
mat |
The matrix used to construct the tree. |
phylo |
An ultrametric and rooted phylo tree for species having the same labels as in mat columns. Only required for phylogenetic beta-diversity indexes. |
tr |
The number of trees to be included in the consensus. |
p |
A numeric value between 0.5 and 1 giving the minimum proportion of trees in which a clade must occur to be represented in the consensus tree. |
dist |
One of the twelve beta-diversity indexes |
method |
Any clustering method allowed by hclust. |
boot |
The number of trees used for bootstrap computation. |
levels |
The number of levels to be used in multiscale bootstrap. |
step |
The increase in ratio between the first level (x1) and the next ones. |
Computation can be time consuming. We suggest assessing the degree of row bias with recluster.hist and recluster.node.strength, to optimize the number of consensus trees, before starting the analysis.
A matrix indicating the percentage of bootstrap trees replicating each node for each level.
Leonardo Dapporto and Matteo Ramazzotti
Dapporto L., Ramazzotti M., Fattorini S., Talavera G., Vila R., Dennis R. "recluster: an unbiased clustering procedure for beta-diversity turnover" Ecography (2013), 36:1070-1075.
data(datamod)
tree<-recluster.cons(datamod,tr=10)
multiboot<-recluster.multi(tree$cons,tr=10,boot=50,datamod,levels=2,step=1)
recluster.plot(tree$cons,multiboot,1,2,direction="downwards")
This function helps to understand the magnitude of row bias by computing a first tree with the original order of areas. Then it creates a default series of six trees by recluster.cons with increasing consensus rule from 50
recluster.node.strength(mat, phylo = NULL, dist = "simpson", nodelab.cex=0.8, tr = 100, levels=6, method = "average", ...)
mat |
A matrix containing sites (rows) and species (columns). |
phylo |
An ultrametric and rooted phylogenetic tree for species having the same labels as in mat columns. Only required for phylogenetic beta-diversity indexes. |
tr |
The number of trees to be used for the consensus. |
dist |
A beta-diversity index (the Simpson index by default) included in recluster.dist, or any custom binary dissimilarity specified according to the syntax of the designdist function of the vegan package. |
nodelab.cex |
the cex() parameter for controlling the size of the labels on the nodes (see |
levels |
The number of levels of different consensus threshold to be used. |
method |
Any clustering method allowed by hclust. |
... |
Arguments to be passed to plot.phylo methods, see the ape package manual and |
Note that the values obtained by this function are not bootstrap supports for nodes but a crude indication of the magnitude of the row bias. Nodes with low values in this analysis can have strong bootstrap support and vice versa. This preliminary analysis can avoid that the use of a strict consensus (100
A dendrogram showing, for each node, its percentage of recurrence over the different consensus runs.
Leonardo Dapporto and Matteo Ramazzotti
Dapporto L., Ramazzotti M., Fattorini S., Talavera G., Vila R., Dennis R. "recluster: an unbiased clustering procedure for beta-diversity turnover" Ecography (2013), 36:1070-1075.
data(datamod)
recluster.node.strength(datamod, tr=10)
This function plots recluster trees and labels their nodes with single or paired support values from single-scale or multiscale analyses.
recluster.plot(tree, data, low = 1, high = 0, id=NULL, nodelab.cex=0.8, direction="downwards",...)
tree |
A phylo tree presumably constructed with recluster.cons function. |
data |
A vector of support values from recluster.boot or a matrix from recluster.multi. |
id |
A vector used to mark node supports (low and high) with different colours. Such a classification is typically produced by recluster.identify.nodes. |
low |
The low scale level for which bootstrap values should be indicated in the tree. |
high |
The high scale level for which bootstrap values should be indicated in the tree. |
nodelab.cex |
the cex() parameter for controlling the size of the labels on the nodes (see |
direction |
the |
... |
Arguments to be passed to plot.phylo methods, see the ape package manual and |
This function prints one or two bootstrap labels on each node of a tree and optimizes their layout. This is done with the nodelabels function of ape, by setting its adj parameter appropriately.
A plot of the tree with pairs of bootstrap values: the low-scale value (usually x1 BP) below and the high-scale value above.
Leonardo Dapporto and Matteo Ramazzotti
Dapporto L., Ramazzotti M., Fattorini S., Talavera G., Vila R., Dennis R. "recluster: an unbiased clustering procedure for beta-diversity turnover" Ecography (2013), 36:1070-1075.
data(datamod)
tree<-recluster.cons(datamod, tr=10)
boot<-recluster.boot(tree$cons,datamod, tr=10, boot=50)
recluster.plot(tree$cons,boot,direction="downwards")
This function plots a matrix obtained by recluster.col in the RGB space.
recluster.plot.col(mat,cext=0.3,cex=1,cex.axis=0.7,cex.lab=0.8,pch=16,text=TRUE, add=F,xlim=NULL,ylim=NULL,ylab="Axis 2",xlab="Axis 1",...)
mat |
A matrix produced by recluster.col. |
cext |
Size of the row-name labels. |
cex |
Size of the dots. |
cex.axis |
Size of the axis annotation (tick labels). |
cex.lab |
Size of the x- and y-axis labels. |
text |
A logical indicating if row names should be plotted. |
pch |
The shape of the dots (See par()). |
add |
A logical indicating if the plot should be added to an existing graph. |
xlim |
The limit values for the x-axis; if NULL, the range of the values in the original matrix is used. |
ylim |
The limit values for the y-axis; if NULL, the range of the values in the original matrix is used. |
ylab |
The label of the y-axis |
xlab |
The label of the x-axis |
... |
See par() for other graphical parameters |
A colour plot.
Leonardo Dapporto and Matteo Ramazzotti
Kreft H., Jetz W. "A framework for delineating biogeographic regions based on species distributions." J Biogeogr (2010), 37: 2029-2053.
Dapporto, L., Fattorini, S., Voda, R., Dinca, V., Vila, R. "Biogeography of western Mediterranean butterflies: combining turnover and nestedness components of faunal dissimilarity." J Biogeogr (2014), 41: 1639-1650.
data(datamod)
sordiss<- recluster.dist(datamod,dist="sorensen")
points<-cmdscale(sordiss)
col<-recluster.col(points)
recluster.plot.col(col)
This function plots the values of the cells of a matrix in grey scale.
recluster.plot.matrix(mat)
mat |
A dissimilarity matrix. |
A plot of cell values.
Leonardo Dapporto and Matteo Ramazzotti
Dapporto, L., Fattorini, S., Voda, R., Dinca, V., Vila, R. "Biogeography of western Mediterranean butterflies: combining turnover and nestedness components of faunal dissimilarity." J Biogeogr (2014), 41: 1639-1650.
data(datamod)
simpdiss<- recluster.dist(datamod)
recluster.plot.matrix(simpdiss)
This function groups cases based on a spatial grid in a user-defined set of coordinates (usually longitude and latitude) and plots them as pies using RGB colours. The function can either use the output of the recluster.col function or compute colours from any distance matrix whose cases are in the same order as the latitude and longitude data.
recluster.plot.pie(long, lat, mat=NULL, distance=NULL, loc=NULL, areas=NULL, square=2, map=NULL,add=FALSE,minsize=NULL,proportional=T,xlim=NULL,ylim=NULL, main=NULL,xlab=NULL,ylab=NULL,...)
long |
A vector indicating longitude for cases. |
lat |
A vector indicating latitude for cases. |
mat |
A matrix produced by recluster.col. |
distance |
A dissimilarity matrix for cases. |
loc |
A list of localities to group cases, if available. |
square |
The grid to be used to divide cases into groups (2 degrees latitude and longitude by default). |
areas |
An additional vector to divide groups (e.g. islands versus continents). |
map |
A map to be plotted. |
add |
A logical. If TRUE then the points are added to an existing graph. |
minsize |
Size of a single-case pie. |
proportional |
A logical. If TRUE then the point area is proportional to the number of cases. |
xlim |
Limits of the plot in the x-axis. |
ylim |
Limits of the plot in the y-axis. |
main |
The title of the graph. |
xlab |
The label of the x-axis. |
ylab |
The label of the y-axis. |
... |
See par() for other graphical parameters |
A colour plot.
Leonardo Dapporto
Hernandez Roldan J.L., Dapporto L., Dinca V., Vicente J.C., Hornett E.A., Sichova J., Lukhtanov V.L., Talavera G. & Vila R. Integrative analyses unveil speciation linked to host plant shift in Spialia butterflies. Molecular Ecology (2016) 25: 4267-4284.
# create a virtual dataset and a corresponding distance matrix
lat<-runif(50,min=20,max=40)
long<-runif(50,min=20,max=40)
datavirtual<-data.frame(replicate(20,sample(0:1,50,rep=TRUE)))
dist<-recluster.dist(datavirtual)
# Make a plot using a custom distance
recluster.plot.pie(long,lat,distance=dist,xlab="Longitude",ylab="Latitude")
# Make a plot using a recluster.col matrix
colours<-recluster.col(cmdscale(dist))
recluster.plot.pie(long,lat,mat=colours,xlab="Longitude",ylab="Latitude")
# Make points of equal size
recluster.plot.pie(long,lat,mat=colours,xlab="Longitude", proportional=FALSE, ylab="Latitude")
# Reduce the grid
recluster.plot.pie(long,lat,distance=dist,square=1, xlab="Longitude",ylab="Latitude")
# Reduce the size of the plots
recluster.plot.pie(long,lat,distance=dist,xlab="Longitude",ylab="Latitude", minsize=0.5)
# Use a custom colour matrix
pcoa<-cmdscale(dist)
colour<-recluster.col(pcoa)
recluster.plot.col(colour)
recluster.plot.pie(long,lat,mat=colour,xlab="Longitude",ylab="Latitude")
# Include an additional factor for separating dots in groups (e.g. two continents)
continent<-rep(1,50)
continent[which(long>25)]<-2
recluster.plot.pie(long,lat,distance=dist,xlab="Longitude",ylab="Latitude", areas=continent)
This function plots the RGB-coloured dots of a matrix obtained by recluster.col on a user-defined set of coordinates (usually the longitude and latitude of the original sites).
recluster.plot.sites.col (long, lat, mat, cext = 0.3, cex = 1, cex.axis = 0.7, cex.lab = 0.8, text = FALSE, pch=21, add = FALSE,...)
long |
A vector indicating longitude for cases. |
lat |
A vector indicating latitude for cases. |
mat |
A colour matrix returned by recluster.col. |
text |
A logical indicating if row names should be plotted. |
cext |
Size of the row-name labels. |
cex |
Size of dots. |
cex.axis |
Size of axis annotation. |
cex.lab |
Size of axis labels. |
add |
A logical. If TRUE then the points are added to an existing graph. |
pch |
The symbol to use when plotting points |
... |
See par() for other graphical parameters |
A colour plot.
Leonardo Dapporto and Matteo Ramazzotti
Dapporto, L., Fattorini, S., Voda, R., Dinca, V., Vila, R. "Biogeography of western Mediterranean butterflies: combining turnover and nestedness components of faunal dissimilarity." J Biogeogr (2014), 41: 1639-1650.
data(datamod)
sordiss<- recluster.dist(datamod, dist="sorensen")
lat<-c(2,2,2,1,3,1,1,3,3)
long<-c(1,5,3,3,3,1,5,1,5)
points<-cmdscale(sordiss)
col<-recluster.col(points)
recluster.plot.sites.col(long, lat, col,text=TRUE)
This function performs a Procrustes analysis (as done by the vegan procrustes function) but also allows the two matrices to include a subset of shared cases and some unshared cases. The shared cases must be listed first, in the same order in both matrices, and their number must be indicated. The function scales, mirrors and rotates the second matrix to minimize its dissimilarity from the first on the basis of the shared cases. The same transformation is then applied to the unshared cases of the second matrix. Finally, the function can also transform a matrix of coordinates for variables, as obtained, for example, by PCA.
recluster.procrustes(X, Y, Yv=FALSE, num=nrow(X), scale = TRUE, ...)
X |
Target matrix. |
Y |
Matrix to be rotated. |
Yv |
Matrix of variables for the matrix to be rotated. |
num |
Number of shared cases between the target matrix and the matrix to be rotated (by default all). |
scale |
A logical. If TRUE, scaling of Y is allowed (see procrustes in vegan). |
... |
See procrustes() for other parameters |
recluster.procrustes uses the vegan function procrustes to rotate a configuration (Y) to maximum similarity with a target configuration (X) on the basis of a series of shared objects (rows). These objects must be in the same order in both X and Y. When X and Y contain additional rows, the same transformation is applied to the rows of Y that are not shared with X. Moreover, the same transformation can be applied to an additional matrix Yv, typically representing the coordinates of variables as obtained, for example, by PCA or other ordination methods. The function returns an object of class "procrustes" as implemented in vegan.
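The shared-cases logic can be sketched in base R with the classical SVD solution of the orthogonal Procrustes problem, the same least-squares solution used by vegan's procrustes. `procrustes_shared` is a hypothetical illustration, not recluster's code: it fits centring, rotation (possibly with mirroring) and an optional scale on the first `num` rows only, then applies that transformation to every row of Y. Results are expressed in the centred frame of X.

```r
# Hypothetical sketch, not recluster's implementation.
procrustes_shared <- function(X, Y, num = nrow(X), scale = TRUE) {
  shared <- seq_len(num)
  xmean <- colMeans(X[shared, , drop = FALSE])  # centroids of the shared rows
  ymean <- colMeans(Y[shared, , drop = FALSE])
  Xc <- sweep(X[shared, , drop = FALSE], 2, xmean)
  Yc <- sweep(Y[shared, , drop = FALSE], 2, ymean)
  s <- svd(crossprod(Xc, Yc))                   # SVD of t(Xc) %*% Yc
  A <- s$v %*% t(s$u)                           # orthogonal rotation (may mirror)
  k <- if (scale) sum(s$d) / sum(Yc^2) else 1   # least-squares scaling factor
  Yrot <- k * sweep(Y, 2, ymean) %*% A          # applied to shared and unshared rows
  list(X = sweep(X, 2, xmean), Yrot = Yrot, rotation = A, scale = k,
       ss = sum((Xc - Yrot[shared, , drop = FALSE])^2))
}

# Y is X rotated by 90 degrees, doubled and shifted, plus one unshared row
X <- rbind(c(1, 5), c(5, 5), c(3, 4), c(3, 6))
R <- matrix(c(0, 1, -1, 0), 2)
Y <- sweep(rbind(2 * X %*% R, c(0, 0)), 2, c(10, 10), "+")
fit <- procrustes_shared(X, Y, num = 4)
round(fit$Yrot, 6)
```

Because the fit uses only the shared rows, the unshared fifth row of Y is carried along by the same transformation and lands where the corresponding point would sit in the centred frame of X.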
Yrot |
Rotated matrix Y. |
X |
Target matrix. |
Yvrot |
Rotated matrix of variables Yv. |
ss |
Sum of squared differences between X and Yrot on the basis of shared objects. |
rotation |
Orthogonal rotation matrix on the basis of shared objects. |
translation |
Translation of the origin on the basis of shared objects. |
scale |
Scaling factor on the basis of shared objects. |
xmean |
The centroid of the target on the basis of shared objects. |
Leonardo Dapporto
Dapporto L., Voda R., Dinca V., Vila R. "Comparing population patterns for genetic and morphological markers with uneven sample sizes. An example for the butterfly Maniola jurtina" Methods Ecol Evol (2014), 5, 834-843.
#Create and plot a target matrix
ex1 <-rbind(c(1,5),c(5,5),c(3,4),c(3,6))
plot(ex1,col=c(1:4),pch=19,xlim=c(0,6),ylim=c(0,6),cex=2)
#Create and plot a matrix to be rotated. Only the points 1-4 are shared
ex2<-rbind(c(3,1),c(3,3),c(2.5,2),c(3.5,2),c(3,4))
plot(ex2,col=c(1:5),pch=19,xlim=c(0,6),ylim=c(0,6),cex=2)
#Perform the procrustes and plot the matrices
procr1<-recluster.procrustes(ex1,ex2,num=4)
plot(procr1$X,col=c(1:4),pch=19,xlim=c(-4,4),ylim=c(-4,4),cex=2)
plot(procr1$Yrot,col=c(1:5),pch=19,xlim=c(-4,4),ylim=c(-4,4),cex=2)
This function is specifically designed to facilitate regionalization analysis in cases where zero and tied values are particularly frequent. This often occurs when using turnover indices at small or intermediate spatial scales where large barriers are absent. The function requires a matrix as input, with areas in rows and species occurrence (1,0) in columns. It also allows for the inclusion of a phylogenetic tree to compute phylogenetic beta-diversity.
The indices used are those supported by recluster.dist, but custom indices can also be introduced (see recluster.dist). Alternatively, a dissimilarity matrix generated by any function can be provided. The user specifies the number of trees (tr, default 50) and a range of numbers of regions to be identified (mincl to maxcl, default 2 to 3). Clustering methods implemented in hclust are supported, as well as Partitioning Around Medoids (PAM) and DIANA. The default method, ward.D2, typically offers the best performance, but ward.D, complete linkage, PAM and DIANA may also perform well.
The function generates tr trees by randomly reordering the rows of the original matrix. These trees are then cut at different nodes (from the (mincl-1)th to the (maxcl-1)th node), resulting in an increasing number of clusters. For each cut level, the function compares the clustering solutions across the resampled trees and builds a dissimilarity matrix between areas based on how often each pair of areas falls in different clusters. This dissimilarity is standardized by the number of resampled trees, yielding values from 0 (pairs of areas always in the same cluster) to 1 (pairs never in the same cluster).
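The resampling idea can be sketched in base R. The sketch is simplified: each resampled tree is cut into exactly `k` groups with `cutree`, whereas recluster works with node-based cuts and mean numbers of clusters; `consensus_dissim` is a hypothetical name, not a package function.

```r
# Hypothetical sketch, not recluster's implementation.
consensus_dissim <- function(d, k, tr = 50, method = "ward.D2") {
  m <- as.matrix(d)
  n <- nrow(m)
  split <- matrix(0, n, n, dimnames = dimnames(m))
  for (i in seq_len(tr)) {
    o <- sample(n)                             # random reordering of the rows
    tree <- hclust(as.dist(m[o, o]), method = method)
    memb <- cutree(tree, k = k)[order(o)]      # memberships back in original order
    split <- split + outer(memb, memb, "!=")   # 1 where a pair is separated
  }
  as.dist(split / tr)                          # 0 = always together, 1 = never
}

set.seed(1)
d <- dist(c(0, 0, 1, 10, 10, 11))              # tied values, two clear groups
cd <- as.matrix(consensus_dissim(d, k = 2, tr = 20))
```

With strongly structured data every resampled tree agrees, so the consensus values are only 0 or 1; intermediate values appear when ties let the row order change the tree topology.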
A final hierarchical clustering is then applied to these dissimilarity matrices to obtain solutions for each number of regions k from mincl to maxcl. Since the user-defined number of clusters may not exactly match the mean number of clusters obtained from the tree cuts, the solution for each k is derived from the dissimilarity matrix whose mean number of clusters is closest to k.
recluster.region (mat,tr=50,dist="simpson",method="ward.D2", members=NULL, phylo=NULL, mincl=2,maxcl=3, rettree=FALSE,retmat=FALSE,retmemb=FALSE)
mat |
A binary presence-absence community matrix or any dissimilarity matrix. |
tr |
The number of trees to be included in the consensus. |
dist |
One of the beta-diversity indices allowed by recluster.dist, or a custom binary dissimilarity specified according to the syntax of the designdist function of the vegan package. Not required when the input is a dissimilarity matrix. |
method |
Any clustering method allowed by hclust but also "pam" and "diana". |
members |
For hclust methods, NULL or a vector passed to the members argument of hclust. |
phylo |
An ultrametric and rooted phylogenetic tree for species having the same labels as in mat columns. Only required for phylogenetic beta-diversity indices. |
mincl |
The minimum number of regions requested |
maxcl |
The maximum number of regions requested |
rettree |
Logical, if TRUE the final trees are returned. |
retmat |
Logical, if TRUE the new dissimilarity matrices are returned. |
retmemb |
Logical, if TRUE the memberships of areas in the different random trees are returned. |
To evaluate the goodness of clustering solutions, the function provides silhouette values and the explained dissimilarity. The explained dissimilarity (sensu Holt et al. 2013) is the ratio between the sum of mean dissimilarities among members of different clusters and the sum of all dissimilarities in the matrix. This value tends to 1 when each area is treated as an independent group. Silhouette width measures the strength of a partition of objects from a dissimilarity matrix by comparing, for each cell, its mean dissimilarity to the other cells of its own cluster with its mean dissimilarity to the cells of the nearest other cluster (see the silhouette function in the cluster package). Silhouette values range between -1 and +1; a negative value for a cell suggests that it is probably placed in an incorrect cluster, and a negative mean value suggests that most cells are.
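The silhouette width can be written out in a few lines of base R. This is a sketch of the standard definition used by silhouette() in the cluster package, not recluster's own code; `silhouette_width` is a hypothetical name, and it assumes at least two clusters.

```r
# Hypothetical sketch of the silhouette width for a dissimilarity `d` and
# cluster labels `cl`:
# a(i) = mean dissimilarity of i to the other members of its own cluster,
# b(i) = smallest mean dissimilarity of i to the members of another cluster,
# s(i) = (b(i) - a(i)) / max(a(i), b(i)); s(i) = 0 for singleton clusters.
silhouette_width <- function(d, cl) {
  m <- as.matrix(d)
  n <- nrow(m)
  s <- numeric(n)
  for (i in seq_len(n)) {
    own <- setdiff(which(cl == cl[i]), i)
    if (length(own) == 0) next                   # singleton: s(i) stays 0
    a <- mean(m[i, own])
    others <- setdiff(unique(cl), cl[i])
    b <- min(sapply(others, function(g) mean(m[i, cl == g])))
    s[i] <- (b - a) / max(a, b)
  }
  s
}

d <- dist(c(0, 1, 10, 11))
s_good <- silhouette_width(d, c(1, 1, 2, 2))     # natural partition
s_bad  <- silhouette_width(d, c(1, 2, 1, 2))     # deliberately wrong partition
c(mean(s_good), mean(s_bad))
```

The natural partition gives values close to +1, while the deliberately wrong one gives negative values for every cell.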
memb |
An array with different matrices indicating for each area (rows) the membership in each random tree (columns) in each cut (matrix). |
matrices |
The new dissimilarity matrices, with upper-triangle cells given as NA. |
nclust |
Mean number of clusters among random trees obtained by different cuts. |
solutions |
A matrix providing number of clusters for each solution (k), the associated mean number of clusters obtained by cuts (clust), the silhouette (silh) value and the explained dissimilarity (ex.diss). |
grouping |
A matrix indicating cluster membership of each site in each solution for different numbers of clusters. |
Leonardo Dapporto
Dapporto L. et al. A new procedure for extrapolating turnover regionalization at mid-small spatial scales, tested on British butterflies. Methods Ecol Evol (2015), 6, 1287-1297
data(dataisl)
simpson<-recluster.dist(dataisl)
turn_cl<-recluster.region(simpson,tr=10,rettree=TRUE)
#plot the tree for three clusters
plot(turn_cl$tree[[2]])
#inspect cluster membership
turn_cl$grouping
This function rotates a bidimensional configuration so that a line, identified by its intercept and its slope (angular coefficient), becomes horizontal. The function can also flip or centre the configuration.
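The geometry of the rotation can be sketched in base R: a line of slope m makes an angle atan(m) with the x-axis, so rotating the configuration by -atan(m) makes the line horizontal. `rotate_to_line` is a hypothetical helper, not recluster's code; shifting the intercept q to the origin is an assumption of this sketch, and flipping is not covered.

```r
# Hypothetical sketch, not recluster's implementation.
rotate_to_line <- function(table, m, q, centre = TRUE) {
  theta <- atan(m)                                  # angle of the line with the x-axis
  # right-multiplying row vectors by R rotates them by -theta
  R <- matrix(c(cos(theta), sin(theta), -sin(theta), cos(theta)), 2)
  xy <- sweep(as.matrix(table), 2, c(0, q))         # put the intercept at the origin
  out <- xy %*% R
  if (centre) out <- sweep(out, 2, colMeans(out))   # centre on mean X and Y
  out
}

pts <- cbind(c(0, 2, 4), c(1, 2, 3))                # points on y = 0.5 * x + 1
flat <- rotate_to_line(pts, m = 0.5, q = 1, centre = FALSE)
```

Since the transformation is a rigid rotation plus a translation, pairwise distances between points are unchanged; only the orientation of the configuration differs.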
recluster.rotate(table,m=FALSE,q=FALSE,flip="none",centre=TRUE)
table |
The bidimensional configuration. |
m |
The line slope. |
q |
The line intercept |
flip |
The kind of flip: "none", no flip; "hor", horizontal flip; "ver", vertical flip; "both", horizontal and vertical flip. |
centre |
A logical. If TRUE, the configuration is centred on its mean X and Y values after transformation. |
table2 |
The transformed bidimensional configuration. |
Leonardo Dapporto
Dapporto L., Voda R., Dinca V., Vila R. "Comparing population patterns for genetic and morphological markers with uneven sample sizes. An example for the butterfly Maniola jurtina" Methods Ecol Evol (2014), 5, 834-843.
data(dataisl)
#Compute bidimensional representation for islands
pcoa<-cmdscale(recluster.dist(dataisl))
plot (pcoa)
#Compute the line
lin<-recluster.line(pcoa)
transf<-recluster.rotate(pcoa,m=lin$m,q=lin$q)
plot(transf)
This function evaluates the amount of variation maintained by a bidimensional configuration after the elements are reduced to the barycentres according to a grouping variable. If elements of different groups are randomly scattered in the configuration, almost all barycentres are expected to attain a rather central position with respect to the original elements, which would result in a small mean distance between barycentres. Conversely, if the elements of different groups are strictly clustered in the representation, the distances among barycentres are expected to be similar to the distances among original elements.
recluster.test.dist(mat1,mat2,member,perm=1000,elev=2)
mat1 |
The bidimensional configuration before computing barycentres for groups. |
mat2 |
The bidimensional configuration after computing barycentres for groups. |
member |
A vector indicating group membership for each element. |
perm |
The number of permutations. |
elev |
The power applied to distances (by default 2, i.e. squared distances). |
The function produces a ratio between the mean squared pairwise distance for all elements and the mean squared pairwise distance for barycentres. This ratio is calculated for the overall configuration and for the two axes separately. The function also provides a test for the significance of the variation preserved by barycentres by creating a custom number of matrices (1000 by default) by randomly sampling the original vector defining groups. Then it computes the frequency of mean squared distance ratios in random configurations higher than the observed ratio.
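The ratio and its permutation test can be sketched in base R. This is a hypothetical illustration, not recluster's code, and it rests on stated assumptions: Euclidean distances raised to `elev`, unweighted group barycentres, and the ratio taken as mean barycentre distance over mean element distance. Under this orientation a larger ratio means more variation retained, so the p-value counts permutations at least as large as the observed ratio; the package's own orientation and tail may differ.

```r
# Hypothetical sketch, not recluster's implementation.
barycentre_ratio <- function(xy, member, elev = 2) {
  bar <- apply(xy, 2, function(v) tapply(v, member, mean))  # one row per group
  mean(dist(bar)^elev) / mean(dist(xy)^elev)
}

perm_test_ratio <- function(xy, member, perm = 1000, elev = 2) {
  obs <- barycentre_ratio(xy, member, elev)
  rnd <- replicate(perm, barycentre_ratio(xy, sample(member), elev))
  # share of permuted group vectors whose ratio is at least the observed one
  list(ratio = obs, p = mean(rnd >= obs))
}

set.seed(1)
xy <- rbind(c(0, 0), c(0, 1), c(1, 0), c(10, 10), c(10, 11), c(11, 10))
member <- c(1, 1, 1, 2, 2, 2)
res <- perm_test_ratio(xy, member, perm = 200)
```

Applying the same functions to a single column of `xy` gives the per-axis analogue of ratioX and ratioY.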
ratio |
The ratio between mean distances among original elements and barycentres over the overall configuration. |
ratioX |
The ratio between mean distances among original elements and barycentres on the X axis. |
ratioY |
The ratio between mean distances among original elements and barycentres on the Y axis. |
test |
The permutation test for variation maintained over the overall configuration. |
testX |
The permutation test for variation maintained along the X axis. |
testY |
The permutation test for variation maintained along the Y axis. |
Leonardo Dapporto
Dapporto L., Voda R., Dinca V., Vila R. "Comparing population patterns for genetic and morphological markers with uneven sample sizes. An example for the butterfly Maniola jurtina" Methods Ecol Evol (2014), 5, 834-843.
data(dataisl)
#Define groups of islands
memb<-c(2,3,5,7,5,3,1,1,2,5,1,3,1,1,5,2,2,1,2,4,1,3,1,5,2,1,7,6,1,1,1)
#Compute bidimensional representation for elements
pcoa<-cmdscale(recluster.dist(dataisl))
bar<-aggregate(pcoa~memb,FUN="mean")[,2:3]
# test if the variation has been significantly lost
recluster.test.dist(pcoa,bar,memb,perm=100)
This phylogenetic tree was built from the known phylogeny of butterflies at family and subfamily level and from COI sequences at genus and species level. Branch lengths were calculated with Grafen's method.
data(treeisl)
A phylogenetic tree of butterfly species occurring on Western Mediterranean islands.
Gerard Talavera and Roger Vila
Dapporto L., Ramazzotti M., Fattorini S., Talavera G., Vila R., Dennis R. "recluster: an unbiased clustering procedure for beta-diversity turnover" Ecography (2013), 36:1070-1075.
This phylogenetic tree was created for the species in the datamod dataset, which represents a series of virtual faunas in different sites.
data(treemod)
A phylogenetic tree of 31 species taken from 9 sites.
Gerard Talavera
Dapporto L., Ramazzotti M., Fattorini S., Talavera G., Vila R., Dennis R. "recluster: an unbiased clustering procedure for beta-diversity turnover" Ecography (2013), 36:1070-1075.