### GENERAL REPOSITORY ###

Title of repository: Data and code underlying the article "Forest Disturbance and Recovery in Peruvian Amazonia"

Creators: 
Requena Suarez, Daniela 	(1);	ORCID: 0000-0002-3081-6882
Rozendaal, Danaë 		(2,3);	ORCID: 0000-0002-3007-3222
De Sy, Niki 			(1); 	ORCID: 0000-0003-3647-7866
Decuyper, Mathieu 		(4,5);	ORCID: 0000-0002-1713-8562
Málaga, Natalia 		(1);	ORCID: 0000-0002-0545-4664
Durán Montesinos, Patricia	(6) 	
Arana Olivos, Alexs 		(6); 	ORCID: 0000-0002-1248-5194
De la Cruz Paiva, Ricardo 	(6)
Martius, Christopher 		(7); 	ORCID: 0000-0002-6884-0298
Herold, Martin 			(8); 	ORCID: 0000-0003-0246-6886

Institutions: 
1. Laboratory of Geo-Information Science and Remote Sensing, Wageningen University & Research, Wageningen, the Netherlands
2. Plant Production Systems Group, Wageningen University & Research, Wageningen, the Netherlands
3. Centre for Crop Systems Analysis, Wageningen University & Research, Wageningen, the Netherlands
4. Forest Ecology and Forest Management Group, Wageningen University & Research, Wageningen, the Netherlands
5. Centre for International Forestry Research and World Agroforestry (CIFOR-ICRAF), Nairobi, Kenya
6. Servicio Nacional Forestal y de Fauna Silvestre (SERFOR), Ministerio de Desarrollo Agrario y Riego (MIDAGRI), Lima, Peru
7. Center for International Forestry Research (CIFOR) Germany gGmbH, Bonn, Germany
8. Helmholtz Center Potsdam GFZ German Research Centre for Geosciences, Section 1.4 Remote Sensing and Geoinformatics, Potsdam, Germany

Related publication: 
Requena Suarez, D., Rozendaal, D., De Sy, N., Decuyper, M., Málaga, N., Durán Montesinos, P., Arana Olivos, A., De la Cruz Paiva, R., Martius, C., Herold, M. (2023), Forest Disturbance and Recovery in Peruvian Amazonia. Global Change Biology.

Spatial coverage: 
Peruvian Amazonia, Peru.

Temporal coverage:
1984-2019

Details: 
The data and code in this repository can be used to reproduce the analysis in Requena Suarez et al. (2023), "Forest Disturbance and Recovery in Peruvian Amazonia". Sensitive information, such as original plot codes, plot locations and tree species data, is not available in this repository. Spatial datasets used in this study are accessible from the sources cited in Table 1 of the main study. Disturbance and time since disturbance were estimated with the AVOCADO algorithm (Decuyper et al., 2022, https://doi.org/10.1016/j.rse.2021.112829), using Landsat imagery downloaded from Google Earth Engine. The underlying code for AVOCADO is available in the GitHub repository https://github.com/MDecuy/AVOCADO, and a tutorial is available at https://www.pucv.cl/uuaa/labgrs/proyectos/avocado.

Key words: 
tropical forests, disturbance intensity, time since disturbance, aboveground biomass (AGB), species richness, National Forest Inventory (NFI)


Funding: 
This research is part of CIFOR's Global Comparative Study on REDD+ (www.cifor.org/gcs). The funding partners that have supported this research include the Norwegian Agency for Development Cooperation (Norad), the Australian Department of Foreign Affairs and Trade (DFAT), the European Commission (EC), the International Climate Initiative (IKI) of the German Federal Ministry for the Environment, Nature Conservation and Nuclear Safety (BMU), the United Kingdom Department for International Development (UKAID), and the CGIAR Research Program on Forests, Trees and Agroforestry (CRP-FTA), with financial support from the donors contributing to the CGIAR Fund. 

 
### DATASETS ###

This repository contains the following data files: 
1. full_dataset.csv
2. dataset_AGB_dist.csv
3. dataset_relAGB_similarity_dist.csv
4. dataset_Srare_dist.csv
5. dataset_relSrare_dist.csv
6. dataset_deadtrees.csv

Description of files and variables:
The following datasets contain plot-level estimates of aboveground biomass (AGB), necromass and relative species richness in SERFOR subplots in Peruvian Amazonia. Based on time since disturbance and disturbance intensity, both estimated with the AVOCADO algorithm described in Decuyper et al. (2022), a subset of "disturbed plots" was used to explore the effects of disturbance, environmental conditions and human use on AGB, relative AGB (AGB (%r)), species richness, relative species richness (sp. richness (%r)) and relative recovery of similarity in species composition. These files contain the underlying data of this study and are described below.

1. full_dataset.csv is the underlying file containing all 1840 plots included in this study, alongside the plot-level structural and diversity estimates, as well as disturbance estimates derived using the AVOCADO algorithm. It contains the following columns: 

plot_ecozone			Forest type in which the plot is located, called "Ecozonas" by SERFOR (MINAGRI, 2016) (HI = Zona Hidromórfica / Tropical Wetlands; SA = Selva Alta Accesible / Accessible Montane Forests; SD = Selva Alta de Difícil Acceso / Inaccessible Montane Forests; SB = Selva Baja / Lowland Forests)
new_cod_plot			Anonymised plot cluster code
new_cod_subplot3		Anonymised plot code 
forest_category_2		Forest disturbance category, derived from AVOCADO
AGB_CWD_ha			Plot-level Aboveground Biomass (AGB); in Mg/ha
Basal_area_m2_ha		Plot-level Basal Area (BA), in m2/ha
time_since_disturbance		Years since detected disturbance, derived from AVOCADO; in years
Absolute_anomaly		NDMI Anomaly at the time of detected disturbance, derived from AVOCADO; no units
similarity_mean			Similarity in species composition towards undisturbed levels, with values ranging from 0 to 1; no units
Srare				Rarefied species richness up to 10 stems; number of species

2. dataset_AGB_dist.csv is the underlying dataset used to assess the effects of disturbance, environmental conditions and human use on aboveground biomass (AGB) in disturbed forest plots. This dataset was scaled for 283 plots located in areas with detected disturbance. It contains the following columns: 

new_cod_plot			Anonymised plot cluster code	
new_cod_subplot3		Anonymised plot code 	
AGB_CWD_ha			Plot-level Aboveground Biomass (AGB); in Mg/ha	
CWD				Climatic Water Deficit (mm.yr-1); scaled	
slope				Slope (in degrees); scaled
nitrogen_0.5cm_mean		Total nitrogen (N) soil content (g.kg-1); scaled
tc.5000m			Surrounding tree cover in a 5 km radius (%); scaled
nrst_navigwtrway_road		Distance to nearest navigable waterway or road (in km); scaled
time_since_disturbance		Years since detected disturbance, derived from AVOCADO (yr); scaled	
time_since_disturbance_ln	Natural Logarithm of years since detected disturbance; scaled
Absolute_anomaly		NDMI Anomaly at the time of detected disturbance, derived from AVOCADO; scaled	
time_since_disturbance_unstd	Years since detected disturbance, derived from AVOCADO; in years
Absolute_anomaly_unstd		NDMI Anomaly at the time of detected disturbance, derived from AVOCADO; no units
forest_category_2		Forest disturbance category, derived from AVOCADO
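
This README does not state how the "scaled" columns were standardised. Assuming a z-score transformation (subtract the mean, divide by the standard deviation, as done by base R's scale()), the relationship between a scaled column and its *_unstd counterpart can be sketched as follows. The values are made up for illustration and are not taken from dataset_AGB_dist.csv.

```r
# Toy values only; z-score scaling is an ASSUMPTION, not confirmed by this README.
time_since_disturbance_unstd <- c(2, 5, 10, 20, 33)

# scale() centres on the mean and divides by the sample standard deviation.
z <- scale(time_since_disturbance_unstd)
time_since_disturbance <- as.numeric(z)

# scale() stores the centring and scaling constants as attributes,
# so the original units (years) can be recovered from the scaled values.
centre <- attr(z, "scaled:center")
spread <- attr(z, "scaled:scale")
back_transformed <- time_since_disturbance * spread + centre

print(round(time_since_disturbance, 3))
print(back_transformed)
```

If the scaling was done differently, the *_unstd columns remain the reliable source for values in original units.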

3. dataset_relAGB_similarity_dist.csv is the underlying dataset used to assess the effects of disturbance, environmental conditions and human use on the recovery of aboveground biomass (AGB (%r)) and recovery in species composition (sp. composition (%r)) in disturbed forest plots. This dataset was scaled for 208 plots located in areas with detected disturbance and with at least one undisturbed plot within its cluster. It contains the following columns: 

new_cod_plot			Anonymised plot cluster code	
new_cod_subplot3		Anonymised plot code			
rel_AGB				Percent recovery of aboveground biomass towards values found in nearby undisturbed forests (AGB (%r)); in %	
adj_similarity_mean_prc		Adjusted similarity in species composition towards undisturbed levels, considering a mean similarity between undisturbed plots of 0.37; in %
CWD				Climatic Water Deficit (mm.yr-1); scaled	
slope				Slope (in degrees); scaled
nitrogen_0.5cm_mean		Total nitrogen (N) soil content (g.kg-1); scaled
tc.5000m			Surrounding tree cover in a 5 km radius (%); scaled
nrst_navigwtrway_road		Distance to nearest navigable waterway or road (in km); scaled
time_since_disturbance		Years since detected disturbance, derived from AVOCADO (yr); scaled	
time_since_disturbance_ln	Natural Logarithm of years since detected disturbance; scaled
Absolute_anomaly		NDMI Anomaly at the time of detected disturbance, derived from AVOCADO; scaled	
time_since_disturbance_unstd	Years since detected disturbance, derived from AVOCADO; in years
Absolute_anomaly_unstd		NDMI Anomaly at the time of detected disturbance, derived from AVOCADO; no units
forest_category_2		Forest disturbance category, derived from AVOCADO
undisturbed_plots		Number of undisturbed plots within the same cluster
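
To illustrate what a percent-recovery value expresses, the sketch below divides the AGB of each disturbed plot by the mean AGB of the undisturbed plots in the same cluster. The choice of reference (cluster mean of undisturbed plots), the category labels "disturbed" and "undisturbed", and all values are assumptions made for this example; the procedure actually used is described in the main study.

```r
# Toy values only; labels and the reference definition are ASSUMPTIONS.
toy <- data.frame(
  new_cod_plot      = c("A", "A", "A", "B", "B"),
  forest_category_2 = c("undisturbed", "undisturbed", "disturbed",
                        "undisturbed", "disturbed"),
  AGB_CWD_ha        = c(200, 240, 110, 300, 225)
)

undist <- toy[toy$forest_category_2 == "undisturbed", ]
disturbed <- toy[toy$forest_category_2 == "disturbed", ]

# Reference AGB per cluster: mean over its undisturbed plots.
ref <- aggregate(AGB_CWD_ha ~ new_cod_plot, data = undist, FUN = mean)
names(ref)[2] <- "AGB_reference"

# Number of undisturbed plots available in each cluster.
n_ref <- aggregate(AGB_CWD_ha ~ new_cod_plot, data = undist, FUN = length)
names(n_ref)[2] <- "undisturbed_plots"

# Attach both to the disturbed plots and express AGB as % of the reference.
recovery <- merge(disturbed, ref, by = "new_cod_plot")
recovery <- merge(recovery, n_ref, by = "new_cod_plot")
recovery$rel_AGB <- 100 * recovery$AGB_CWD_ha / recovery$AGB_reference

print(recovery[, c("new_cod_plot", "undisturbed_plots", "rel_AGB")])
```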

4. dataset_Srare_dist.csv is the underlying dataset used to assess the effects of disturbance, environmental conditions and human use on tree species richness in disturbed forest plots. This dataset was scaled for 213 plots located in areas with detected disturbance and with at least 10 stems. It contains the following columns: 

new_cod_plot			Anonymised plot cluster code	
new_cod_subplot3		Anonymised plot code
Srare				Rarefied species richness up to 10 stems; number of species
CWD				Climatic Water Deficit (mm.yr-1); scaled	
slope				Slope (in degrees); scaled
nitrogen_0.5cm_mean		Total nitrogen (N) soil content (g.kg-1); scaled
tc.5000m			Surrounding tree cover in a 5 km radius (%); scaled
nrst_navigwtrway_road		Distance to nearest navigable waterway or road (in km); scaled
time_since_disturbance		Years since detected disturbance, derived from AVOCADO (yr); scaled	
time_since_disturbance_ln	Natural Logarithm of years since detected disturbance; scaled
Absolute_anomaly		NDMI Anomaly at the time of detected disturbance, derived from AVOCADO; scaled	
time_since_disturbance_unstd	Years since detected disturbance, derived from AVOCADO; in years
Absolute_anomaly_unstd		NDMI Anomaly at the time of detected disturbance, derived from AVOCADO; no units
forest_category_2		Forest disturbance category, derived from AVOCADO


5. dataset_relSrare_dist.csv is the underlying dataset used to assess the effects of disturbance, environmental conditions and human use on the recovery of tree species richness (sp. richness (%r)) in disturbed forest plots. This dataset was scaled for 165 plots located in areas with detected disturbance, with at least 10 stems, and with at least one undisturbed forest plot with at least 10 stems within its plot cluster. It contains the following columns: 

new_cod_plot			Anonymised plot cluster code	
new_cod_subplot3		Anonymised plot code
rel_Srare			Percent recovery of species richness towards values found in nearby undisturbed forests (sp. richness (%r)); in %	
CWD				Climatic Water Deficit (mm.yr-1); scaled	
slope				Slope (in degrees); scaled
nitrogen_0.5cm_mean		Total nitrogen (N) soil content (g.kg-1); scaled
tc.5000m			Surrounding tree cover in a 5 km radius (%); scaled
nrst_navigwtrway_road		Distance to nearest navigable waterway or road (in km); scaled
time_since_disturbance		Years since detected disturbance, derived from AVOCADO (yr); scaled	
time_since_disturbance_ln	Natural Logarithm of years since detected disturbance; scaled
Absolute_anomaly		NDMI Anomaly at the time of detected disturbance, derived from AVOCADO; scaled	
time_since_disturbance_unstd	Years since detected disturbance, derived from AVOCADO; in years
Absolute_anomaly_unstd		NDMI Anomaly at the time of detected disturbance, derived from AVOCADO; no units
forest_category_2		Forest disturbance category, derived from AVOCADO
undisturbed_plots		Number of undisturbed plots within the same cluster and with at least 10 stems

6. dataset_deadtrees.csv is the underlying dataset used to explore the predominance of dead trees and stumps across all plots. This dataset contains information on necromass for the 1840 plots included in this study. It contains the following columns: 

new_cod_plot			Anonymised plot cluster code	
new_cod_subplot3		Anonymised plot code
Necromass_Basal_area_m2_ha	Basal area of dead trees and stumps; in m2.ha-1
Necromass_Basal_area_m2_ha_stumps	Basal area of stumps; in m2.ha-1
deadtrees_total			Total number of dead trees and stumps	
deadtrees_standing		Total number of standing dead trees
deadtrees_stump			Total number of stumps
tree_n				Total number of live trees
Basal_area_m2_ha		Plot-level Basal Area (BA); in m2/ha
forest_category_2		Forest disturbance category, derived from AVOCADO
perc_dead_trees_n		Relative number of dead trees and stumps in relation to the total number of dead and live trees and stumps; in %
perc_dead_trees_BA		Relative basal area of dead trees and stumps in relation to the total basal area of dead and live trees and stumps; in %
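
The two relative measures above follow from the other columns in this file. A minimal sketch in R, using made-up values rather than values from dataset_deadtrees.csv, and assuming that Basal_area_m2_ha refers to live trees only:

```r
# Toy values only; these are NOT taken from dataset_deadtrees.csv.
# ASSUMPTION: Basal_area_m2_ha is the basal area of live trees only.
toy <- data.frame(
  deadtrees_total            = c(2, 0, 5),
  tree_n                     = c(18, 25, 15),
  Necromass_Basal_area_m2_ha = c(1.5, 0.0, 6.0),
  Basal_area_m2_ha           = c(28.5, 30.0, 18.0)
)

# Dead trees and stumps as a share of all (dead + live) stems, in %.
toy$perc_dead_trees_n <- 100 * toy$deadtrees_total /
  (toy$deadtrees_total + toy$tree_n)

# Dead basal area as a share of the total (dead + live) basal area, in %.
toy$perc_dead_trees_BA <- 100 * toy$Necromass_Basal_area_m2_ha /
  (toy$Necromass_Basal_area_m2_ha + toy$Basal_area_m2_ha)

print(toy[, c("perc_dead_trees_n", "perc_dead_trees_BA")])
```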
	
References: 
MINAGRI (2016). "Memoria Descriptiva del Mapa de Ecozonas, Inventario Nacional Forestal y de Fauna Silvestre (INFFS)-Peru". Tech. rep. SERFOR, 32.

### CODE ###

Statistical analyses were carried out using R version 4.1.0 (released 2021-05-18; R Core Team, 2021). This repository contains the following code: 

1. Peru_NFI_finalanalysis_UR2_Fig6_Fig7_Fig8.R 
2. Peru_NFI_Manuscript_Figure4.R
3. Peru_NFI_Manuscript_Figure5.R
4. Reply to reviewers.R
5. Peru_NFI_AGB_Supplementary_deadtreesandstumps_UR2.R - Analysis on dead trees and stumps. 

Description of files:

1. Peru_NFI_finalanalysis_UR2_Fig6_Fig7_Fig8.R	
Will read all csv files except for dataset_deadtrees.csv to create all mixed-effects models used in this study and perform the standardised coefficient analysis. Furthermore, it will predict AGB change rates as well as the ranges of AGB, AGB (%r), species richness, species richness (%r) and recovery in species composition towards undisturbed values reported in this study. Finally, it will summarise the results as shown in Figures 6, 7 and 8 of the main study. The following packages (incl. dependencies) will need to be installed: plyr, dplyr, lme4, lmerTest, Hmisc, corrplot, performance, sjPlot, sjlabelled, sjmisc, ggplot2, grid. 

2. Peru_NFI_Manuscript_Figure4.R 
Will read full_dataset.csv to create Figure 4b and Figure 4d, which visually explore the distribution of time since disturbance and disturbance intensity across all plots located in areas of detected disturbance. The following packages (incl. dependencies) will need to be installed: plyr, dplyr, ggplot2. 

3. Peru_NFI_Manuscript_Figure5.R 
Will read all csv files except for dataset_deadtrees.csv to create Figure 5, which shows the distribution of aboveground biomass (AGB), its recovery (AGB (%r)), species richness, its recovery (species richness (%r)) and the recovery of species composition (species composition (%r)) for all plots located in disturbed and undisturbed forests. The following packages (incl. dependencies) will need to be installed: plyr, dplyr, ggplot2, ggpubr, ggsignif, grid, gridExtra. 

4. Reply to reviewers.R 
Will read full_dataset.csv, dataset_AGB_dist.csv and dataset_relAGB_similarity_dist.csv to perform additional analyses carried out during the revision process, including testing collinearity between AGB and CWD, testing for differences between disturbed and undisturbed plots, and testing for similarities in working with aboveground biomass (AGB) and basal area (BA). The following packages (incl. dependencies) will need to be installed: corrplot, plyr, dplyr, Hmisc.
 
5. Peru_NFI_AGB_Supplementary_deadtreesandstumps_UR2.R 
Will read dataset_deadtrees.csv to create Table S1 and Figure S1 in the Supporting Information, and to test for differences between disturbed and undisturbed plots in relation to the predominance of dead trees and stumps. The following packages (incl. dependencies) will need to be installed: plyr, dplyr, lme4, ggplot2, sjPlot.
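
Before running the scripts, it may help to check which of the packages listed above are already installed. The sketch below is one possible way to do this in base R; missing_packages() is a hypothetical helper written for this README and is not part of the repository's scripts. Note that grid ships with R itself and does not need to be installed from CRAN.

```r
# Union of the packages named in the five script descriptions above.
required_pkgs <- c(
  "plyr", "dplyr", "lme4", "lmerTest", "Hmisc", "corrplot", "performance",
  "sjPlot", "sjlabelled", "sjmisc", "ggplot2", "ggpubr", "ggsignif",
  "grid", "gridExtra"
)

# Hypothetical helper: returns the required packages that are not installed.
missing_packages <- function(required,
                             installed = rownames(installed.packages())) {
  setdiff(required, installed)
}

to_install <- missing_packages(required_pkgs)

if (length(to_install) > 0) {
  message("Missing packages: ", paste(to_install, collapse = ", "))
  # install.packages(to_install)  # uncomment to install them from CRAN
} else {
  message("All required packages are installed.")
}
```

The install.packages() call is left commented out so that running the snippet only reports what is missing.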


 