Hypatia automatically deploys clustering experiments to account for all of these challenges.

Local tasks fetch their data from the file system or a co-located database, whereas remote tasks fetch their data over the network between the EC and PC. Hypatia estimates the time to transfer the input and output datasets to the remote cloud in one of two ways. It uses the first when the database holds no history for the job type, i.e., when the job runs for the first time. When no knowledge of the job’s tasks is available, Hypatia uses the job’s input and output dataset sizes and the number of concurrent connections to estimate transfer time from an iPerf lookup table. Each entry in the table is the time, in seconds, needed to transfer a dataset of a given size (in kilobytes) from the edge to the remote cloud using a given number of concurrent connections. The lookup table provides fast access but may introduce error, because its data is a “snapshot” in time. Table 5.1 shows a snapshot of a Hypatia lookup table. Files of different sizes, listed on the left, are sent over the network with a varying number of concurrent connections, listed across the top. The data is produced when Hypatia first starts, using iPerf to measure network performance between the EC and PC. Each number in the table is an average over 10 profiling runs. For job types that Hypatia has seen before, it estimates the transfer time as the average time measured across past tasks of the same type. This computation takes longer than a simple table lookup but uses recent history to make predictions.

Hypatia launches a set of virtual machine instance types in the EC and PC; the user specifies the number of each in the job description. Instance type names map to the amount of memory and compute resources that each provides. We deploy one Hypatia worker to each processor, so we view each compute resource in terms of the number of workers it can support. The Hypatia queue runs on the local instance and consists of two queues: local and remote.
Based on the load split ratio, Hypatia places tasks in the local or remote queues. Workers then pull tasks from their assigned queue for execution on a first-come-first-served basis.
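The two transfer-time estimation paths can be sketched as follows. This is a minimal sketch under stated assumptions: the table values are hypothetical placeholders (not the measurements in Table 5.1), choosing the nearest profiled size and connection count stands in for whatever lookup rule Hypatia actually applies, input and output transfer times are simply summed, and the history is a plain dictionary from job type to transfer times measured for past tasks. All names are illustrative.

```python
from statistics import mean

# Hypothetical iPerf lookup table (placeholder values, NOT Table 5.1):
# dataset size in KB -> {concurrent connections -> transfer time in seconds}.
IPERF_TABLE = {
    1_000: {1: 0.9, 2: 0.6, 4: 0.5},
    10_000: {1: 8.1, 2: 4.7, 4: 3.2},
    100_000: {1: 80.4, 2: 45.0, 4: 30.9},
}


def lookup_transfer_time(size_kb, connections, table=IPERF_TABLE):
    """Return the entry for the closest profiled size and connection count."""
    size = min(table, key=lambda s: abs(s - size_kb))
    row = table[size]
    conns = min(row, key=lambda c: abs(c - connections))
    return row[conns]


def estimate_transfer_time(job_type, input_kb, output_kb, connections, history):
    """Estimate the remote transfer time in seconds for one task of `job_type`.

    `history` maps a job type to the transfer times measured for its past tasks.
    """
    past = history.get(job_type)
    if past:
        # Job type seen before: average over past tasks of the same type.
        return mean(past)
    # First run: fall back to the profiled snapshot for input and output data.
    return (lookup_transfer_time(input_kb, connections)
            + lookup_transfer_time(output_kb, connections))


if __name__ == "__main__":
    history = {"linreg": [2.4, 2.5, 2.3]}
    print(estimate_transfer_time("linreg", 10_000, 1_000, 4, history))  # history path
    print(estimate_transfer_time("kmeans", 12_000, 900, 4, history))    # table path
```

The trade-off in the text shows up directly: the table path is a constant-time dictionary access that can drift from current network conditions, while the history path reflects recent measurements but only exists once a job type has run.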

Given a job J with n tasks and dataset D, d_i is the amount of data that must be transferred if task i is to run in the PC. The scheduling plan uses this value, the network bandwidth between the EC and PC, the number of concurrent connections required to saturate the link, the number of available workers in each cloud, and the current state of the queues. The scheduler computes the ratio of tasks that execute locally to tasks that execute remotely such that the job's time to completion is minimized. Hypatia estimates the time to completion for EC tasks from the average time to complete past tasks of a similar type and from the state of the EC queue, expressed as the lag caused by unfinished tasks in that queue. The estimate for PC tasks additionally includes the estimated data transfer time and uses the lag in the PC queue instead of the EC queue.

To evaluate the efficacy of the Hypatia scheduler, we execute multiple workloads across multi-tier deployments and measure the time to completion per job. For the experiments, we consider one edge instance and two sets of public cloud instances. On the edge, we use an m3.xlarge instance with 4 CPUs and 2 GiB of memory. In the public cloud, our first instance set consists of a single “free tier” instance type, t2.medium, with 2 CPUs and 4 GiB of memory. Our second public cloud set consists of the m5ad.xlarge instance type, with 4 CPUs and 16 GiB of memory. Our experiments consider deployments with 2, 4, 8, and 12 CPUs in the public cloud. Our workloads consist of two machine learning applications that we developed in the previous chapters of this dissertation: linear regression and k-means clustering. For each experiment, we compare the performance of the mixed load against executing all of the jobs on the edge cloud or all of the jobs in the public cloud.
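The scheduler's split computation described above can be illustrated with a brute-force search. The sketch rests on assumptions the text does not state: tasks run in waves of one task per worker, every task in a job has the same estimated per-task time, each remote task pays a fixed per-task transfer cost, and each queue's lag is a single number of seconds. Under those assumptions it tries every local/remote split of n tasks, keeps the split with the smallest estimated completion time (the later of the two clouds' finish times), and fills the two first-come, first-served queues accordingly. All function and parameter names are hypothetical.

```python
import math
from collections import deque


def finish_time(count, workers, per_task_s, lag_s):
    """Estimated seconds for one cloud to finish `count` tasks after its queue lag."""
    if count == 0:
        return 0.0  # this cloud receives no tasks from the job
    if workers == 0:
        return math.inf  # no workers: the tasks would never finish
    waves = math.ceil(count / workers)  # each worker runs one task per wave
    return lag_s + waves * per_task_s


def choose_split(n, ec_workers, pc_workers, ec_task_s, pc_task_s,
                 pc_transfer_s, ec_lag_s=0.0, pc_lag_s=0.0):
    """Return (k, estimate): run k tasks locally and n - k remotely."""
    pc_cost = pc_task_s + pc_transfer_s  # remote tasks must also move their data
    best_k, best_t = n, math.inf
    for k in range(n + 1):
        t = max(finish_time(k, ec_workers, ec_task_s, ec_lag_s),
                finish_time(n - k, pc_workers, pc_cost, pc_lag_s))
        if t < best_t:
            best_k, best_t = k, t
    return best_k, best_t


def enqueue(tasks, k):
    """Place the first k tasks in the local queue and the rest in the remote queue."""
    return deque(tasks[:k]), deque(tasks[k:])


if __name__ == "__main__":
    k, est = choose_split(n=10, ec_workers=2, pc_workers=2,
                          ec_task_s=1.0, pc_task_s=1.0, pc_transfer_s=2.0)
    local_q, remote_q = enqueue(list(range(10)), k)
    # Workers would then call popleft() on their assigned queue.
    print(k, est, len(local_q), len(remote_q))  # 8 4.0 8 2
```

In this toy configuration, each remote task costs three times as much as a local one, so the search sends only 2 of the 10 tasks to the PC; the split ratio is k/n = 0.8.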
We refer to Hypatia deployments as mixed because Hypatia schedules job tasks on both the edge cloud and the public cloud when doing so reduces time to completion.

Because our jobs consist of tasks with different parameters, the time to transfer data and to perform the computation differs across tasks. To evaluate this effect empirically, for each machine learning model we use two sets of jobs: uniform, containing tasks with the same parameters, and variable, containing tasks with different parameters. Table 5.3 lists the statistics for each machine learning algorithm, for the uniform and variable sets, on EC and PC workers.

We present the mean and standard deviation of the time it takes to transfer the data needed for a task and to process the task.

The first experiment uses jobs with 900 tasks, where each task is given a set of parameters that it uses to compute the coefficients of a linear regression model based on two time series. The length of the time series defines the dataset size, and thus the computation time per task. Jobs consist of either uniform or variable task sets, as defined above. The uniform linear regression tasks take as input a one-month time series of 5-minute interval measurements. The variable tasks take as input time series ranging from one day to one month of 5-minute measurements. For this experiment, the EC has 4 workers, and the PC has 2, 4, or 6 workers. Figure 5.3 shows the average time in seconds to complete a job with 900 uniform tasks using 4 EC workers and either 2, 4, or 6 PC workers, averaged across 5 jobs. For each set of results, the first three bars show the average time in seconds to complete a job using mixed, local, and remote deployments, respectively. The second three bars per set show the average time in seconds that Hypatia estimates each deployment should have taken, which we discuss later in this chapter. In every case, the mixed Hypatia deployment achieves the shortest time to completion. For uniform jobs, the mixed load using 2 remote workers finishes in 176s on average, while EC-only takes 207s and PC-only takes 1160s. With 6 PC workers, the mixed workload takes 144s, EC-only takes 199s, and PC-only takes 387s on average. The results also show that as the number of PC workers increases, Hypatia accurately splits jobs between the two resource sets and time to completion decreases: 176s for PC2, 160s for PC4, and 144s for PC6 on average.
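To express these averages in relative terms, the short computation below derives the percentage by which the mixed deployment shortens time to completion relative to each single-cloud baseline for uniform linear regression jobs. It uses only the averages quoted above; the helper name is illustrative.

```python
def reduction_pct(baseline_s, mixed_s):
    """Percentage by which the mixed deployment shortens a baseline's time to completion."""
    return 100.0 * (baseline_s - mixed_s) / baseline_s


# Average times to completion (seconds) for uniform jobs, as reported above.
RESULTS = {
    "PC2": {"mixed": 176, "ec_only": 207, "pc_only": 1160},
    "PC6": {"mixed": 144, "ec_only": 199, "pc_only": 387},
}

for config, r in RESULTS.items():
    vs_ec = reduction_pct(r["ec_only"], r["mixed"])
    vs_pc = reduction_pct(r["pc_only"], r["mixed"])
    print(f"{config}: {vs_ec:.1f}% shorter than EC-only, {vs_pc:.1f}% shorter than PC-only")
```

With 2 PC workers, the mixed deployment is about 15.0% shorter than EC-only and 84.8% shorter than PC-only; with 6 PC workers, these figures become 27.6% and 62.8%. The gain over EC-only grows as remote workers are added, while the margin over PC-only shrinks.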
Data for the linear regression experiments is stored in a database running on a separate instance within the local edge cloud. As a result, data transfer to an EC worker is much faster than transfer to a PC worker: the average data transfer time is 0.85s for EC workers and 2.42s for PC workers. Processing time averages 0.03s for EC and 0.13s for PC workers. For the variable jobs in Figure 5.3, the mixed workload takes an average of 102s with 2 remote workers, 91s with 4 remote workers, and 80s with 6 remote workers. Although we expect the EC-only workload to take the same time across all three experiments in this figure, since the local workers do not change, we still see some variation, with the workload taking on average 110s in PC2, 105s in PC4, and 105s in PC6. The total time of the variable workload is significantly less than that of the uniform workload because many of its tasks use much smaller input datasets.

For the variable jobs, the average data transfer time is 0.47s for EC and 1.6s for PC workers, while computation takes 0.02s for EC and 0.12s for PC workers. For this job type, the PC workers take more time to fetch and process tasks. To investigate this further, we next change the instance type from the free tier t2.medium to m5ad.xlarge, which we note is 4.4 times more expensive. Figure 5.4 shows results for the linear regression application with uniform jobs and with remote instances providing 4, 8, and 12 workers. Remote-only experiments outperform EC-only workloads only with 12 PC workers. As the number of workers increases, the scheduler again picks a split that minimizes the time to completion: 158s, 122s, and 100s for 4, 8, and 12 PC workers, respectively. Local-only experiments take 195s on average, while remote-only experiments take 566s, 296s, and 187s, respectively.

The mean computation time of the uniform K-means workload is 0.19s for EC and 0.31s for PC workers, with standard deviations of 0.04s and 0.05s, respectively. For the variable workload, the mean computation time for EC workers is 2.86s with a standard deviation of 5.09s, while PC workers have a mean of 4.36s and a standard deviation of 8.13s. As with the linear regression application, Hypatia deploys the K-means workload so as to achieve the shortest time to completion on average. As the number of PC workers increases, Hypatia adapts to employ the extra computational power while using the high-overhead communication link sparingly. The average time to completion for the mixed variable workload is 306s, 238s, and 198s for 4, 8, and 12 PC workers, respectively. For the EC-only variable workload, we see 423s, 430s, and 424s across three repeats of the same experiment. PC-only workloads take 778s with PC4, 423s with PC8, and 316s with PC12.
For uniform mixed workloads, the average time to completion is 69s, 56s, and 43s for 4, 8, and 12 PC workers, respectively. The EC-only workload takes 84s on average, while PC-only workloads take 250s, 121s, and 82s. With enough remote workers (12 in these experiments), PC-only outperforms EC-only.

These results reflect the importance of considering both data transfer and computation time when deciding whether to use remote public or private cloud resources in multi-tier settings. Moreover, they show that Hypatia automatically adapts to different resource configurations to achieve the best time to completion for different machine learning applications.

With this dissertation, we present Hypatia, a scalable system for distributed, data-driven IoT applications. Hypatia ingests data from disparate sensors and systems and provides a wide range of analytics, visualization, and recommendation services with which to process this data and extract actionable insights. Using commonly used machine learning algorithms, such as clustering and regression, as examples, we provide abstractions that make it easy to plug in other algorithms of interest to agronomists and other specialists who work with datasets that can benefit from their locality. Hypatia integrates an intelligent scheduler that automatically splits analytics applications and workloads across edge, private, and public cloud systems to minimize the time to completion, while accounting for the cost of data transfer and remote computation. We use Hypatia to investigate K-means clustering, considering different methods for computing correlation, large numbers of trials, and cluster degeneracy. Hypatia then scores the clustering results using the Bayesian Information Criterion to recommend the “best” clustering to users. We validate the system using synthesized datasets with known clusters, and then use it to analyze, at scale, measurements of soil electrical conductivity from a large number of farms. We compare our approach to the state of the art in clustering for electrical conductivity data and show that our work significantly outperforms it. We also show that the system is easy to use for experts and novices alike and provides a wide range of visualization options for analysts.
We next extend the system with support for data ingress from sensors and develop a new approach for “virtualizing” sensors to extend their capability. Specifically, we show that it is possible to estimate outdoor temperature accurately from the processor temperature of simple, low-cost, single-board computers.