Farmers make a wide range of decisions regarding the management of their crops, involving pest management, planting/harvest dates, fertilization, irrigation, and, as we focus on in this study, crop rotation. Along with external factors that fall outside farmers’ control, such as weather, these decisions are likely to affect crop performance and yield substantially. A rigorous quantitative understanding of the factors that affect crop yield, including farmer management decisions, is an essential prerequisite for developing management strategies that maximize yield.

There are several possible mechanisms by which the crops previously grown in a field can affect crop yield. First, different crops have different effects on the nutrient composition of the soil, so the identities of crops previously grown in a field can affect nutrient availability and crop yield. For example, nitrogen-limited crops can benefit from rotation with nitrogen-fixing legumes, and phosphorus nutrition in California cotton is shaped by whether or not the previous crop received phosphorus fertilizer. Second, certain crops may increase the local abundance of particular insect pests and pathogens. Since different crops are often susceptible and resistant to different pathogens and pests, the identities of the crops recently grown in a field can affect yield. For example, if one crop increases local abundances of an insect pest that also attacks a second crop, planting the second crop immediately following the first may lead to decreased yield resulting from attack by the built-up local pest population. In contrast, such a yield depression could potentially be averted if the second crop were planted following a crop that does not lead to local accumulation of the pest.
In monocultures of wheat, substantial yield declines have been noted and attributed to the buildup of the soil-borne fungal pathogen Gaeumannomyces graminis. Third, many studies have shown that a field’s crop rotational history can strongly affect weed densities. Numerous other mechanistic explanations for the yield effects of crop rotation have also been suggested.

Crop rotation has been practiced for thousands of years; evidence of its use dates back to ancient Roman and Greek societies. Experimental studies on the effects of crop rotation first appeared in the early 20th century, revealing that growing crops in rotation led to yield increases of up to 100% compared to continuous planting of a single crop. Interest in the yield effects of crop rotation waned during the middle of the 20th century, due to the increasing availability of cheap fertilizers, insecticides, and herbicides. However, crop rotation continues to be a relevant and important practice; low-input farming remains desirable due to the costs of fertilizers and pesticides, and fertilizer and pesticide applications often cannot fully compensate for the benefits afforded by crop rotation. In addition, the significant environmental and public health concerns surrounding fertilizer and pesticide use highlight the desirability of alternative means of increasing crop yield, such as crop rotation. The effects of rotational histories on yield are well understood for some crops, such as corn, where rotation is recognized to be critical in avoiding the buildup of corn rootworms. However, for many crops, the direction, magnitude, and mechanism of the effect of crop rotational histories on crop yield remain poorly understood.
Cotton is one such crop. Experimental field studies of the effect of crop rotation on cotton yield have demonstrated increased cotton yield, compared to continuous cultivation of cotton, when cotton is grown in rotation with sorghum, corn, and wheat. Despite these useful results, only a small subset of possible rotations has been studied, experiments have been restricted to plots significantly smaller than typical commercial cotton fields, and mechanisms for these effects remain poorly understood. To help address these limitations, we seek to expand upon this work by exploring the effects of crop rotational histories on yield in commercial cotton fields in California, using an “ecoinformatics” approach capitalizing on existing observational data gathered by growers and professional agricultural pest consultants.

In recent years, there has been a surge in research and interest involving the rapidly emerging field of “big data.” The big data movement has been fueled by several developments, including a dramatic increase in the magnitude of data generation, an improved ability to cheaply store, manipulate, and explore massive datasets, and the development of new analytic methods. Most importantly, the movement has been driven by a growing realization that existing data, and data generated as a byproduct of our everyday lives, can be leveraged to explore key questions about nature and human behavior, even if the data were not collected for this purpose. Ecoinformatics is a nascent field focused on harnessing the power of big data to address questions in environmental biology. Ecoinformatics approaches typically involve the analysis of large datasets, the synthesis of diverse data sources, and the analysis of pre-existing, observational datasets.
In some commercial agricultural settings, farmers, along with hired consultants, collect a great deal of regular data about their fields that are used to guide real-time crop management decisions, such as the timing of pesticide applications.
By capitalizing on data that are already generated as a byproduct of commercial agriculture, ecoinformatics provides a low-cost means of obtaining a large dataset that can be used to explore key questions in agricultural biology, some of which might be too difficult or too costly to explore experimentally. Furthermore, the large size of datasets created for ecoinformatics can afford greater statistical power than could feasibly be generated through experimental work.

Experimentally studying the yield effects of crop rotational histories is challenging for several reasons. There is a plethora of possible rotational histories, which means that a large number of treatments would be required to explore the space of possible rotational histories thoroughly. Furthermore, experimentally studying effects of crop rotations requires experiments spanning several growing seasons, which may be logistically challenging. Finally, in order to maintain realism and applicability to commercial fields, which are typically quite large, sizeable experimental plots would be required, especially in light of research suggesting that landscape composition as far as 20 km from a focal field can affect the densities of agricultural pests in that field. While yield effects of non-mobile factors such as soil characteristics may be readily detected through small plot experimentation, the effects of highly mobile arthropods may only be detected at much larger spatial scales.

An ecoinformatics approach offers attractive solutions to these challenges. Since we analyze a large preexisting dataset that includes over a thousand records, a diversity of the possible crop rotational histories already exists in the dataset. In addition, our dataset spans 11 years, which is the temporal scale necessary to ask questions regarding effects of multi-year rotational histories.
And, since the data come from the exact setting where we wish to apply our results, the data are realistic and capture the appropriate spatial scale of commercial agriculture.

First, we sought to identify which crop rotational histories are associated with increased and decreased cotton yield, and to quantify these yield effects. We then explored possible explanations for the yield effects identified in the previous step by examining the associations between crop rotational histories and pest abundance.

We employed a hierarchical Bayesian modeling approach, fitting linear mixed models to explore our questions about the effects of crop rotational histories on cotton yield. Mixed models combine the use of random effects and fixed effects, making them well suited for analysis of data that are structured, or clustered, in some known way, such that separate observations from within clusters are expected to be similar to one another. When we model a source of clustering using a random effect, we assume that each cluster-specific parameter was drawn from a common distribution, and we estimate the parameters of this distribution from the data. We use this common distribution as the prior when calculating the posterior distribution of each cluster-specific parameter. The parameters of this common distribution (the hyperparameters) have posteriors that are themselves estimated from the data, typically after assuming uninformative priors for them. Using a common, empirical prior for all cluster-specific parameters allows pooling of information across clusters, so that data from all clusters can help inform estimates of every other per-cluster parameter. Assuming all clusters are the same introduces high bias and tends to underfit the data, whereas estimating fixed effects for each cluster introduces high variance and tends to overfit the data; a random effect provides a data-driven compromise between introducing bias and introducing variance.
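The pooling behavior described above can be illustrated with a minimal sketch. It assumes a simple normal-normal model in which the common distribution's mean (`mu`) and standard deviation (`tau`) are treated as known, whereas the full hierarchical model estimates them from the data. The function name and all numbers are hypothetical and are not taken from the study.

```python
import numpy as np

def partial_pool(cluster_means, cluster_sizes, sigma_within, mu, tau):
    """Posterior mean of each cluster-specific parameter under a
    normal-normal model with known mu, tau, and sigma_within."""
    cluster_means = np.asarray(cluster_means, dtype=float)
    cluster_sizes = np.asarray(cluster_sizes, dtype=float)
    # Precision contributed by each cluster's own data
    data_precision = cluster_sizes / sigma_within**2
    # Precision contributed by the common (prior) distribution
    prior_precision = 1.0 / tau**2
    # Weight on the cluster's own mean; small clusters get small weights
    weight = data_precision / (data_precision + prior_precision)
    return weight * cluster_means + (1.0 - weight) * mu

# Hypothetical clusters with 1, 3, and 30 observations
means = [12.0, 14.0, 6.0]
sizes = [1, 3, 30]
pooled = partial_pool(means, sizes, sigma_within=2.0, mu=10.0, tau=1.0)
print(pooled)
```

In this sketch the cluster with a single observation is pulled most of the way toward the common mean, while the cluster with 30 observations stays close to its own mean. That is the compromise between complete pooling and separate per-cluster estimates.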
In this dataset, there are several plausible sources of clustering. First, we expect the data to be clustered by field, since there likely exist field-specific factors that affect yield, such as soil characteristics, local climate, and grower agronomic and pest management practices. We controlled for variable yield potential between fields by including field identity as a random effect in our models. Random effects allow pooling of information across clusters, so they are particularly useful when there are few observations from some clusters, a situation in which it is difficult to accurately estimate each per-cluster parameter with only the data from that one cluster. Since there are three or fewer records for 78% of the fields in our database, we judged that including field as a random effect was preferable to trying to estimate field-specific fixed effects with very few observations per field. Additionally, including field as a random effect provides a straightforward way to make predictions for fields not represented in our database. Since modeling field as a random effect involves estimating a distribution of field-specific parameters, we can sample a field-specific parameter from this distribution if we wish to make predictions about a previously unobserved field. Uncertainty in this field-specific parameter can be propagated by simulating many samples from this distribution, while simultaneously accounting for uncertainty in the parameters of this distribution. However, if we were to model field as a fixed effect, we would not estimate a distribution of field-specific parameters. We would only estimate parameters for the specific fields in our database, leaving us with no obvious way to make inferences about new fields.

Second, we expect that our data are clustered by year, since there is substantial between-year variability in climate, particularly in the winter and early spring. Climatic variables can affect crop performance, planting date, and insect pest populations, all of which can in turn affect cotton yield.
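The prediction procedure for a previously unobserved field can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: the posterior draws are random stand-ins generated on the spot, and `intercept_draws`, `sigma_field_draws`, and `predict_new_field_mean` are hypothetical names. In practice the draws would come from the fitted model.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-ins for posterior draws (hypothetical values, arbitrary yield units)
n_draws = 15_000
intercept_draws = rng.normal(1500.0, 20.0, size=n_draws)
sigma_field_draws = np.abs(rng.normal(150.0, 10.0, size=n_draws))

def predict_new_field_mean(intercept_draws, sigma_field_draws, rng):
    """Expected yield for an unobserved field, one value per posterior draw.

    For each draw, a new field effect is sampled from Normal(0, sigma_field),
    so uncertainty in sigma_field and in the field effect are both propagated.
    """
    new_field_effect = rng.normal(0.0, sigma_field_draws)
    return intercept_draws + new_field_effect

pred = predict_new_field_mean(intercept_draws, sigma_field_draws, rng)
lower, upper = np.percentile(pred, [2.5, 97.5])
print(f"95% interval for a new field: {lower:.0f} to {upper:.0f}")
```

Because each posterior draw carries its own value of the between-field standard deviation, the resulting interval reflects both the spread of field effects and the uncertainty about that spread. A fixed-effects model provides no such distribution to sample from.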
To control for and quantify variation in yield due to year-specific factors, we included year as a random effect in our models. Our reasons for including year as a random effect are the same as those for field: there are few observations from some years, and we may wish to make predictions for future years not covered by the existing database.

All models were fit using the No-U-Turn Sampler variant of Hamiltonian Monte Carlo implemented in Stan version 1.3.0, accessed through the rstan package in R. We ran three chains from random initializations, each with 10,000 samples, and discarded the first 5,000 samples from each as burn-in. Inferences were based upon the remaining 15,000 samples. We checked convergence by making sure that R̂, an estimate of the potential scale reduction of the posterior that would result if sampling were continued indefinitely, was near 1.

To explore the yield effects of the crop grown in the same field the previous year, we fit a linear mixed model with yield as the response variable. The predictor variable of primary interest was the identity of the crop grown in that field the previous year, which was included as a fixed effect. Given that we are working with an observational dataset, a critical step in order to make meaningful inferences about the variable of primary interest (the crop grown the year before) was to control, to the extent possible, for potentially confounding variables that could generate spurious correlations and taint the validity of our inferences about crop rotation. To control for variable yield potential between fields and years, field and year were included in the model as random effects. The field terms control for the possibility that some fields may have higher yield potential due to their location, soil characteristics, or growing practices; the year terms control for the substantial year-to-year variation in cotton yield, which likely results from yearly weather differences.
A term indicating cotton species was included in the model to account for yield differences between cotton species.
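For the convergence check mentioned in the description of model fitting, the following is a minimal sketch of the potential scale reduction statistic in its classic (non-split) Gelman-Rubin form. This is an assumption made for illustration; Stan's own calculation differs in detail. The chains here are simulated stand-ins, arranged as three chains of 10,000 iterations with the first 5,000 discarded, which matches the setup described in the text.

```python
import numpy as np

def potential_scale_reduction(chains):
    """Classic Gelman-Rubin R-hat for an array of shape (n_chains, n_iter)."""
    chains = np.asarray(chains, dtype=float)
    n_chains, n_iter = chains.shape
    chain_means = chains.mean(axis=1)
    # W: average within-chain variance
    within = chains.var(axis=1, ddof=1).mean()
    # B: between-chain variance, scaled by the chain length
    between = n_iter * chain_means.var(ddof=1)
    # Pooled estimate of the posterior variance
    var_plus = (n_iter - 1) / n_iter * within + between / n_iter
    return np.sqrt(var_plus / within)

rng = np.random.default_rng(1)
n_chains, n_iter, n_burn = 3, 10_000, 5_000

# Simulated draws standing in for sampler output (hypothetical)
draws = rng.normal(0.0, 1.0, size=(n_chains, n_iter))
kept = draws[:, n_burn:]  # 3 chains x 5,000 kept draws = 15,000 samples

rhat_good = potential_scale_reduction(kept)

# A chain stuck in a different region inflates R-hat well above 1
shifted = kept + np.array([[0.0], [0.0], [3.0]])
rhat_bad = potential_scale_reduction(shifted)
print(rhat_good, rhat_bad)
```

Values near 1 indicate that the between-chain variance is small relative to the within-chain variance, which is the condition the convergence check looks for.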