The clearest distinction in this figure appeared to be whether the growing sites are for-profit or nonprofit.

For instance, 75 percent of the articles returned in our Web of Science search were published after 2009, and 18 percent were published in 2017-18. A more recent search of these terms in April 2018 returned 1,622 records, revealing continued growth in the literature on urban agriculture. Journal articles dominate these records; the remainder include book reviews, article reviews, proceedings papers, and meeting abstracts. The main contributing journals included Land Use Policy, Landscape & Urban Planning, Agriculture & Human Values, Sustainability, and Local Environment; however, the sources were quite diverse. Each record represents a single document, and together they form the corpus used to build the reference topic model. Prior to processing, we removed stop words, punctuation, and URLs. We performed LDA topic modelling using the MALLET program. We produced topic models at three granularities and used the models with the greatest log likelihood. We then examined their topic composition and removed topics dominated by non-meaning-bearing terms, including time and location indicators and general publication information. These topics were identified using the alpha hyperparameter, where relatively high values indicated that a topic was common throughout the corpus and therefore not useful for examining differences within our sample. After these adjustments, drawing on our expert knowledge of the urban agriculture literature, we determined that the 25-topic model was best suited for analysis. The reference topic model was created in order to perform inference on content produced by urban agriculture growing sites and regional organizations in San Diego County – in other words, to interpret the content produced by the key actors identified above.
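The preprocessing step described above can be sketched as follows. The stop-word list and regular expressions here are illustrative assumptions for exposition only; they are not the exact filters used in the study (MALLET, for instance, ships its own stop-word list).

```python
import re

# A tiny illustrative stop-word list; the actual list used in the study
# (and MALLET's built-in list) is much larger.
STOP_WORDS = {"the", "a", "an", "and", "of", "in", "to", "is", "for"}

def preprocess(text: str) -> list[str]:
    """Strip URLs and punctuation, lowercase, and drop stop words."""
    text = re.sub(r"https?://\S+|www\.\S+", " ", text)  # remove URLs
    text = re.sub(r"[^\w\s]", " ", text)                # remove punctuation
    tokens = text.lower().split()
    return [t for t in tokens if t not in STOP_WORDS]

print(preprocess("Urban agriculture in San Diego: see https://example.org for details."))
```

The cleaned token lists would then be fed to the topic modelling step.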
We created a corpus including all textual content from the websites of agencies in our sample, with content from each of the 48 observations contained in a single document in the corpus.

Textual content included any written descriptions on the website, including history, mission and vision statements, and program descriptions, but excluding location, contact, and event information. For growing sites associated with larger organizations or institutions, we also collected basic descriptive content from the parent website. By applying the reference model to all the documents, each document is characterized in terms of topic composition, allowing comparisons among documents. The output of the inferencing process is a document-topic distribution matrix, from which we computed a matrix of cosine similarities among documents. In order to visualize these similarities, we used a dimensionality reduction technique known as multidimensional scaling. In the resulting output, each document is described as a 2-D point in Cartesian coordinates, where proximity relates to similarity. The resulting discursive map displayed the inferenced website corpus, with each point representing a single growing site or organization. The location of each point relates to its particular topic composition. The distance between points is indicative of their discursive similarity – the closer two points are in the discursive map, the more similar their topic composition; the farther apart, the more dissimilar. We investigated this map, but also created a series of variations, altering the symbology of the discursive map to reflect particular features of the sites. This allowed us to examine the connections between characteristics like growing methods and topic composition. We were also interested in discovering clusters among the data points, and so we utilized k-means clustering to identify meaningful groups in our data. K-means is a heuristic algorithm that attempts to partition an input dataset into k groups, allowing researchers to explore clusters within a dataset. Our data seemed to occupy primarily three quadrants in the discursive map, and so we chose to identify three classes.
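The step from document-topic distributions to pairwise cosine similarities can be sketched as follows. The matrix values are toy numbers for illustration, not the study's actual 25-topic inference output.

```python
import numpy as np

# Toy document-topic matrix: 4 documents x 3 topics (illustrative values only).
doc_topic = np.array([
    [0.8, 0.1, 0.1],
    [0.7, 0.2, 0.1],
    [0.1, 0.8, 0.1],
    [0.1, 0.1, 0.8],
])

# Cosine similarity among documents: normalize each row to unit length,
# then take dot products between rows.
norms = np.linalg.norm(doc_topic, axis=1, keepdims=True)
unit = doc_topic / norms
cos_sim = unit @ unit.T

print(np.round(cos_sim, 2))
```

Documents 0 and 1 share a dominant topic, so their similarity is near 1; this similarity matrix is what the multidimensional scaling step projects into 2-D coordinates.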
This algorithm was run for 1,000 iterations, and the results with the lowest sum of squared errors – a metric that quantifies the difference between each observation and its corresponding k-means centroid – were chosen as representative. This analysis complemented our visual analysis of symbology patterns. The growing methods symbology, illustrating the practices used by growing sites, revealed a distinct but blurry pattern between motivation and practice.
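The restart-and-select procedure described above – running k-means repeatedly and keeping the run with the lowest sum of squared errors – can be sketched as below. The point coordinates are hypothetical stand-ins for the 2-D discursive map, not the study's data.

```python
import numpy as np

rng = np.random.default_rng(0)

def kmeans(points, k, n_restarts=50, n_iters=100):
    """Basic k-means with random restarts; keep the run with the lowest
    sum of squared errors (SSE), mirroring the selection rule in the text."""
    best = (np.inf, None, None)
    for _ in range(n_restarts):
        centers = points[rng.choice(len(points), k, replace=False)]
        for _ in range(n_iters):
            d = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
            labels = d.argmin(axis=1)
            new_centers = np.array([
                points[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
                for j in range(k)
            ])
            if np.allclose(new_centers, centers):
                break
            centers = new_centers
        # Final assignment and SSE for this restart.
        d = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        sse = (d[np.arange(len(points)), labels] ** 2).sum()
        if sse < best[0]:
            best = (sse, labels, centers)
    return best

# Toy 2-D "discursive map" coordinates: three well-separated pairs.
pts = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0], [0.0, 5.0], [0.1, 5.0]])
sse, labels, centers = kmeans(pts, k=3)
print(labels.tolist(), round(float(sse), 3))
```

With well-separated toy clusters, the restarts reliably recover the three groups; on real data the restarts guard against poor random initializations.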

When analyzing the map using this symbology, a general pattern emerged in which technologically advanced sites tended to group in the top-left quadrant of the map, with two outliers: Go Green Agriculture and Archi’s Acres. The absence of innovation in these outliers’ top-three loadings suggested that other topics precede technology in how these growing sites describe themselves, despite their use of advanced technologies. Generally, soil-based sites occupied the right side of the discursive map; however, soil-based farms such as Suzie’s Farm, Good Taste Farm, and Point Loma Farms were grouped with the soilless sites. Growing site and organization descriptions of their processes did drive their location on the discursive map. For instance, the soilless sites often described the inventive and underrepresented practices they use to grow produce in the urban environment. However, the content did not end there: other topics like social movements, climate change, and food access were also present among these sites. We saw a similar trend with sites using a community gardening model. When we explored the full topic loadings of growing sites and organizations, ignoring practice-based topics like innovation and community gardening, we saw that the clusters have far more similarities than differences. Interestingly, these soilless sites are typically affiliated with businesses, as opposed to the nonprofits that dominate the right side of the map, where most soil-based sites are located. Indeed, we expected that business and nonprofit website content would vary, and these results provide evidence to that effect. San Diego Food System Alliance, the leading regional nonprofit organization, is located in the center of the map.
This location is not surprising in the context of neoliberal governance, in which cities and regional organizations are more focused on building consensus and supporting apolitical agendas than on taking on political causes. The affiliation symbology, illustrating the relationship between institutional affiliation and content, was less coherent than the symbologies displayed in previous figures, but still offered important insights. Growing sites were affiliated with a variety of institutions including schools, churches, organizations hosting training and educational programs, and for-profit businesses.

Education sites were located throughout the map, suggesting that training and skill-building are not major dividing factors in discourse. In other words, many different types of organizations claim to focus on education. However, church, community, and school gardens tended to concentrate in the top-right section of the map, which is typically associated with soil-based community gardens. Still, it cannot be assumed that the for-profit sites lack a social mission. For example, Archi’s Acres, a for-profit hydroponic farm in Escondido, includes a social enterprise function focused on training veterans in hydroponic farming. Sundial Farms, a veteran- and immigrant-owned hydroponic farm in the Innovation cluster, is a direct result of this program. This social function features prominently in its website content: “At Archi’s, we believe a key aspect of successful business is how it meets its responsibility to the community in which it operates and the customers which make up its marketplace. We do this by integrating into our business model an opportunity to support others including our military service members and veterans.” This broader social mission may explain its topic loadings and the absence of innovation as a primary topic. The overall uniqueness of this growing site may explain its peripheral location in the discursive map. Solutions Farms, an aquaponic operation associated with Solutions for Change, was the only nonprofit located in the for-profit-dominated section of the map. The organization aims to alleviate family homelessness in the county through skill development, including training in aquaponic farming. However, innovation is the primary topic in their content, influencing their location among other sites whose discourse is focused on innovation. Multivariate clustering was performed on the discursive map to identify clusters in the sites and group them accordingly.
Figure 8 contains the k-means results including three classes. Transitional sites were identified by creating a 4-class result. The topic compositions of sites in each cluster were examined, and the clusters were given descriptive names reflecting their dominant topics: Innovation, Community, and Access. The transitional sites – those that broke off into their own group in the 4-class result – were signified using an overlaid line pattern. These sites were close to or straddled the center axes of the map. The Innovation cluster was distinct from the other clusters. The predominant topic among this group was innovation, which includes words and phrases like rooftop farming, zero-acreage farming, soilless, aquaponics, buildings, hydroponic, vertical, greenhouses, indoor, and technology, as well as production, yield, growth, and quality. Unsurprisingly, all of the technologically advanced sites resided in this cluster, with the exception of Valley View Farms, which experiments with hydroponics but focuses primarily on animal farming. Among the topic loadings in this group were community gardening, food access, social movements, climate change, water management, food production, and food security. This cluster also consisted primarily of for-profit growing sites, with the exception of Roger’s Community Garden, located on the University of California, San Diego campus. An interesting outlier is Go Green Agriculture, a hydroponic farm located on the border of the Community cluster. This location is likely driven by its top topics – community gardening, location, and climate change – which are well represented in both the Innovation and Community clusters. The Community cluster emphasized connections with local residents, primarily promoting home and community gardening – community gardening was the most prevalent topic in this cluster.

Although this cluster overlapped considerably with the Access cluster, there was a clear emphasis on environmental topics including ecosystem conservation, water management, location, water contamination, innovation, and climate change. The social movement topic was also prevalent throughout this cluster, with many of its sites expressing a dedication to alternative forms of organization. It is also worth noting that the socio-economic characteristics of the two neighborhoods are quite different. Southeastern San Diego, specifically zip code 92102 where Mt. Hope Community Garden is located, is a primarily Hispanic community, followed by White, African American, and Asian residents. The median income is $42,464, with only 24 percent of the population exceeding $75,000 annually. The sites and organizations in this cluster also placed considerably less emphasis on environmental topics in favor of more social topics, including public health, food production, and urban greening. Still, topics like ecology and climate change were present, suggesting that environmental and social concerns were not mutually exclusive. The sites in the Access cluster were also predominantly affiliated with educational and training programs. Two particularly interesting examples are UrbanLife Farms and Second Chance Youth Garden. Both growing sites are wings of social justice organizations that offer job training and skills development for youth living in City Heights and Southeastern San Diego – communities that have seen considerable disinvestment and suffer from high unemployment. Other growing sites like Rolling Hills Grammar School and Literacy Garden and Olivewood Gardens and Learning Center also focus on youth programming. Not all the growing sites in this cluster work with youth: New Roots Farm concentrates on providing resettled refugees with land for farming, small-business training, and nutrition education to help them adjust to a new life away from their home country.
This mission guided its topic loadings of food security, social movements, and food access. The five urban agriculture supporting organizations we surveyed spanned the Community and Access clusters. Slow Food San Diego, Slow Food Urban San Diego, San Diego Roots Sustainable Food Project, and San Diego Community Garden Network are located in the Community cluster. San Diego Food System Alliance was located at the border between the Community and Access clusters, suggesting that food access was a more prominent topic for this organization. Further, its central position illustrated the consensus focus of the organization, which caters to a diverse group of actors including politicians, businesses, and nonprofit organizations. Overall, the placement of the organizations made sense, as they are nonprofit facilitators for other sites aimed at broader social goals like increasing food access and building community.

Hypatia automatically deploys clustering experiments to account for all of these challenges.

Local tasks fetch from the file system or a co-located database; remote tasks fetch their data over the network between the EC and PC. Hypatia estimates the input/output dataset transfer time to the remote cloud in two ways. We use the first when Hypatia has no history on the job type in the database, i.e., when the job is run for the first time. When no knowledge of the job’s tasks is available, Hypatia uses the job’s input and output dataset sizes and the number of concurrent connections to estimate transfer time using the iPerf lookup table, whose values represent the time in seconds needed to transfer a dataset of a given size in kilobytes from the edge to the remote cloud with a given number of concurrent connections. The lookup table provides fast access but may introduce error because the data in the table is a “snapshot” in time. Table 5.1 shows a snapshot of a Hypatia lookup table. Files of different sizes, listed on the left, are sent over the network with a variable number of concurrent connections, listed across the top. The data is produced using iPerf to measure the network performance between the EC and PC when Hypatia is first started. Each number in the table is an average over 10 profiling runs. For job types that Hypatia has seen before, Hypatia uses the average time measured across past tasks of the same type to estimate the transfer time. This computation takes longer than a simple table lookup but uses recent history to make predictions. Hypatia launches a set of virtual machine instance types in the EC and PC; the number of each is specified by the user in the job description. Instance type names map to the amount of memory and compute resources that each provides. We deploy one Hypatia worker to each processor. Thus, we view each compute resource in terms of the number of workers it can support. The Hypatia queueing service is placed on the local instance and maintains two queues: local and remote.
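The lookup-table estimate described above can be sketched as follows. The sizes, connection counts, and timings below are invented placeholders, not the measured values in Table 5.1, and the nearest-neighbour snapping policy is one simple choice (interpolation would be another).

```python
# Hypothetical snapshot of a transfer-time lookup table (seconds), keyed by
# dataset size in KB (rows) and concurrent connections (columns). The real
# table is populated from iPerf measurements between the EC and PC at startup.
LOOKUP = {
    100:   {1: 0.4,  4: 0.2, 8: 0.15},
    1000:  {1: 2.1,  4: 0.9, 8: 0.6},
    10000: {1: 18.5, 4: 7.2, 8: 4.8},
}

def estimate_transfer_time(size_kb: float, connections: int) -> float:
    """Estimate transfer time by snapping to the nearest measured dataset
    size and connection count in the lookup table."""
    size_key = min(LOOKUP, key=lambda s: abs(s - size_kb))
    conn_key = min(LOOKUP[size_key], key=lambda c: abs(c - connections))
    return LOOKUP[size_key][conn_key]

print(estimate_transfer_time(900, 4))  # snaps to the 1000 KB row, 4 connections
```

Once history accumulates for a job type, this table lookup would be replaced by the average over past tasks of the same type.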
Based on the load split ratio, Hypatia places tasks in the local or remote queues. Workers then pull tasks from their assigned queue for execution on a first-come-first-served basis.
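The two-queue placement and first-come-first-served pulling can be sketched as below; the function names and the round-based split are illustrative assumptions, not Hypatia's actual implementation.

```python
from collections import deque

# Minimal sketch of Hypatia-style task placement: one local and one remote queue.
local_queue, remote_queue = deque(), deque()

def place_tasks(tasks, local_ratio):
    """Split tasks between the local and remote queues by the load split ratio."""
    n_local = round(len(tasks) * local_ratio)
    for i, task in enumerate(tasks):
        (local_queue if i < n_local else remote_queue).append(task)

def pull(queue):
    """Workers pull from their assigned queue first-come-first-served."""
    return queue.popleft() if queue else None

place_tasks([f"task-{i}" for i in range(10)], local_ratio=0.7)
first = pull(local_queue)  # the oldest local task is served first
print(len(local_queue), len(remote_queue), first)
```

Each worker repeatedly calls `pull` on its assigned queue until the queue is drained.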

Given a job J with n tasks and dataset D, di is the amount of data that must be transferred if the job is to use the PC. The scheduling plan uses this value, the network bandwidth between the EC and PC, the number of concurrent connections required to saturate the link, the number of available workers in each cloud, and the current state of the queues. The scheduler computes the ratio of the number of tasks that will execute locally and remotely such that the time to completion for the job is minimized. Hypatia estimates the time to completion for EC tasks using the average time to complete past tasks of a similar type and the state of the EC queue, expressed as the lag caused by unfinished tasks in the EC queue. The estimate for PC tasks includes a time estimate for data transfer; in addition, Hypatia uses the lag in the PC queue instead of the EC queue for this estimate. To evaluate the efficacy of the Hypatia scheduler, we execute multiple workloads across multi-tier deployments and measure the time to completion per job. For the experiments, we consider one edge instance and two sets of public cloud instances. On the edge, we use an m3.xlarge instance with 4 CPUs and 2GiB memory. In the public cloud, our first instance set consists of a single “free tier” t2.medium instance type with 2 CPUs and 4GiB of memory. Our second public cloud set consists of an m5ad.xlarge instance type with 4 CPUs and 16GiB of memory. Our experiments consider deployments with 2, 4, 8, and 12 CPUs in the public cloud. Our workloads consist of two machine learning applications that we developed in the previous chapters of this dissertation: linear regression and k-means clustering. For each experiment, we evaluate the performance of the mixed load against executing all of the jobs on the edge cloud or all of the jobs in the public cloud.
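The split computation described above can be sketched as a makespan minimization over candidate splits. The cost model below (per-task times, per-worker division, additive queue lag and transfer cost) is a simplified stand-in for Hypatia's estimator, and all parameter values are hypothetical.

```python
def completion_time(n_local, n_remote, t_local, t_remote, w_local, w_remote,
                    lag_local=0.0, lag_remote=0.0, transfer=0.0):
    """Makespan estimate: each tier finishes after its queue lag plus its
    share of tasks divided across its workers; remote tasks also pay a
    per-task data transfer cost."""
    local = lag_local + n_local * t_local / w_local
    remote = lag_remote + n_remote * (t_remote + transfer) / w_remote
    return max(local, remote)

def best_split(n, **kw):
    """Pick the local/remote task split that minimizes estimated completion."""
    return min(range(n + 1), key=lambda n_local: completion_time(n_local, n - n_local, **kw))

# Hypothetical job: 100 tasks, 4 workers per tier, faster remote compute but
# a 1 s per-task transfer penalty for using the PC.
n_local = best_split(100, t_local=1.0, t_remote=0.5, w_local=4, w_remote=4, transfer=1.0)
print(n_local, 100 - n_local)
```

With these numbers the optimum balances the two tiers: the slower-but-transfer-free EC takes the larger share, and both tiers finish at the same estimated time.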
We refer to Hypatia deployments as mixed because Hypatia will schedule job tasks on both the edge cloud and the public cloud when doing so reduces time to completion. Since our jobs consist of tasks with different parameters, the time to transfer data and to perform the computation differs across tasks. To empirically evaluate this effect, for each set of machine learning models, we use two sets of jobs: uniform, containing tasks having the same parameters, and variable, containing tasks having different parameters. The statistics for each machine learning algorithm, for the variable and uniform sets, for EC and PC workers are listed in Table 5.3.

We present the mean and standard deviation of the time it takes to transfer the data needed for a task and to process the task. The first experiment uses jobs that have 900 tasks, where each task is given a set of parameters that it then uses to compute the coefficients of a linear regression model based on two time series. The length of the time series defines the dataset size, and thus the computation time per task. Jobs are either uniform or variable task sets as defined above. The uniform linear regression tasks take as input a one-month time series of 5-minute interval measurements. The variable job tasks take as input time series that vary between one day and one month of 5-minute measurements. For this experiment, the EC has 4 workers, and the PC has 2, 4, or 6 workers. Figure 5.3 shows the average time in seconds it took to complete a job with 900 uniform tasks using 4 EC workers and either 2, 4, or 6 PC workers, averaged across 5 jobs. For each set of results, the first three bars show the average time in seconds that it takes to complete a task using mixed, local, and remote deployments, respectively. The second three bars per set show the average time in seconds that Hypatia estimates each deployment should have taken, which we discuss later in this chapter. In every case, the mixed Hypatia deployment achieves the best time to completion for the workload. For uniform jobs, the mixed load using 2 remote workers finishes in 176 seconds on average, while EC-only takes 207s and PC-only takes 1160s on average. With 6 PC workers, the mixed workload takes 144s, EC-only takes 199s, and PC-only takes 387s on average. The results also show that as the number of PC workers increases, Hypatia is able to accurately split jobs between the two resource sets and time to completion decreases: 176s for PC2, 160s for PC4, and 144s for PC6 on average.
Data for the linear regression experiments is stored in the database running on a separate instance within the local edge cloud. This makes data transfer from an EC worker much faster than transfer from a PC worker. The average data transfer time for EC workers is 0.85s and for PC workers is 2.42s. Processing time is 0.03s for EC and 0.13s for PC workers on average. For the variable jobs in Figure 5.3, the mixed workload takes an average of 102s with 2 remote workers, 91s with 4 remote workers, and 80s with 6 remote workers. Even though we expect the EC workload to be the same across all 3 experiments in this figure, since the local workers do not change, we still see some variation, with workloads taking on average 110s in PC2, 105s in PC4, and 105s in PC6. The total time of the workload is significantly less than for uniform workloads because many jobs employ much smaller input dataset sizes.

The average data transfer time is 0.47s for EC and 1.6s for PC workers, while computation takes 0.02s for EC and 0.12s for PC workers, respectively. We see that for this particular job type, the PC workers take more time to fetch and process tasks. To investigate this further, we next change the instance type from the free tier t2.medium to m5ad.xlarge – which we note is 4.4 times more expensive monetarily. Figure 5.4 shows results for the linear regression application with uniform jobs and with remote instances having 4, 8, and 12 workers, respectively. Only with 12 PC workers do we observe remote experiments outperforming EC-only workloads. Similarly, with the increased number of workers, the scheduler picked a correct split to minimize the time to completion, which was 158s, 122s, and 100s for 4, 8, and 12 PC workers, respectively. Local-only experiments took 195s on average, while remote took 566s, 296s, and 187s, respectively. For the k-means application, the mean computation time of the uniform workload is 0.19s for EC and 0.31s for PC workers, with standard deviations of 0.04s and 0.05s, respectively. For the variable workload, the mean computation time for EC workers was 2.86s with a standard deviation of 5.09s, while PC workers have a mean of 4.36s and a standard deviation of 8.13s. As with the linear regression application, Hypatia is able to deploy the k-means workload to achieve the shortest time to completion on average. As the number of PC workers increases, Hypatia adapts to employ the extra computational power by using the high-overhead communication link sparingly. The average time to completion is 306s, 238s, and 198s for the mixed variable workload with 4, 8, and 12 PC workers, respectively. For the EC-only variable workload, we see 423s, 430s, and 424s for three different repeats of the same experiment. PC-only workloads took 778s with PC4, 423s with PC8, and 316s with PC12.
For uniform mixed workloads, the average time to completion is 69s, 56s, and 43s for 4, 8, and 12 PC workers, respectively. The EC-only workload took 84s on average, while PC-only workloads took 250s, 121s, and 82s on average. With more remote workers, we observe that PC-only outperforms EC-only.

These results reflect the importance of considering both data transfer and computation time when deciding when to use remote, public/private cloud resources in multi-tier settings. Moreover, they show that Hypatia is able to adapt to different resource configurations to achieve the best time to completion for different machine learning applications automatically. With this dissertation, we present Hypatia – a scalable system for distributed, data-driven IoT applications. Hypatia ingresses data from disparate sensors and systems, and provides a wide range of analytics, visualization, and recommendation services with which to process this data and extract actionable insights. With a few examples of commonly used machine learning algorithms, like clustering and regression, we provide abstractions that make it easy to plug in different algorithms that are of interest to agronomists and other specialists who work with datasets that can benefit from their locality. Hypatia integrates an intelligent scheduler that automatically splits analytics applications and workloads across the edge, private, and public cloud systems to minimize the time to completion, while accounting for the cost of data transfer and remote computation. We use Hypatia to investigate k-means clustering, considering different methods for computing correlation, the use of large numbers of trials, and cluster degeneracy. The system then scores the clustering results using the Bayesian Information Criterion to provide users with recommendations as to the “best” clustering. We validate the system using synthesized data sets with known clusters, and then use it to analyze and scale measurements of electrical conductivity of soil from a large number of farms. We compare our approach to the state of the art in clustering for EC data and show that our work significantly outperforms it. We also show that the system is easy to use by experts and novices and provides a wide range of visualization options for analysts.
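The BIC-based recommendation step can be sketched as below. This uses a common simplified, variance-based form of the Bayesian Information Criterion for clustering, which is an assumption for illustration; the exact formulation Hypatia uses may differ, and the SSE values are invented.

```python
import math

def bic_score(sse: float, n: int, k: int) -> float:
    """Simplified variance-based BIC for a clustering of n points into k
    clusters with total within-cluster SSE: a data-fit term plus a penalty
    that grows with the number of clusters. Lower is better."""
    return n * math.log(sse / n) + k * math.log(n)

# Compare hypothetical SSE values for k = 2..4 on a 100-point dataset:
# k = 3 cuts SSE sharply, while k = 4 barely improves it.
sses = {2: 50.0, 3: 12.0, 4: 11.8}
best_k = min(sses, key=lambda k: bic_score(sses[k], n=100, k=k))
print(best_k)
```

The penalty term `k * log(n)` is what stops the score from always preferring more clusters, so the recommended k is where added clusters stop paying for themselves in reduced SSE.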
We next extend the system with support for data ingress from sensors and develop a new approach for “virtualizing” sensors to extend their capability. Specifically, we show that it is possible to estimate outdoor temperature accurately from the processor temperature of simple, low-cost, single-board computers.