Additionally, biomarkers of proliferation and cell health such as Pax7, MyoD, and Myogenin may be measured to improve the robustness of predictions and correlations across assays. None of these metrics will aid in optimization if a sufficient model of the relationship between cell growth, media cost, and overall process cost is not considered. Therefore, a techno-economic model of the process is needed to tie the large-scale production process to bench-top measurements. Secondly, further “white-box” studies that focus on the metabolomics of the cell lines would be very useful in defining the upper/lower bounds and important factors of these DOE studies. Developing robust cell lines adapted to serum-free conditions would open up the design space for use in DOE studies, because very poorly growing cells are difficult to optimize in DOE studies. In general, white-box or traditional studies act to constrain the complexity of future DOE studies, and so must be conducted in tandem with DOE.

Experimental optimization of physical and biological processes is a difficult task. To address this, sequential surrogate models combined with search algorithms have been employed to solve nonlinear, high-dimensional design problems with expensive objective function evaluations. In this article, a hybrid surrogate framework was built to learn the optimal parameters of a diverse set of simulated design problems meant to represent real-world physical and biological processes in both dimensionality and nonlinearity. The framework uses a hybrid radial basis function/genetic algorithm combined with dynamic coordinate search using response surfaces, utilizing the strengths of both algorithms. The new hybrid method performs at least as well as its constituent algorithms in 19 of 20 high-dimensional test functions, making it a very practical surrogate framework for a wide variety of optimization design problems.
Experiments also show that the hybrid framework can be improved even further when optimizing processes with simulated noise.

The design and optimization of modern engineering systems often requires the use of high-fidelity simulations and/or field experiments. These black-box systems often have nonlinear responses, high dimensionality, and many local optima. This makes such systems costly and time-consuming to model, understand, and optimize when simulations take hours or experiments performed in the lab require extensive time and resources. The first attempt to improve over experimental optimization methods, such as ‘one-factor-at-a-time’ and random experiments, was through the field of Design of Experiments (DOE). Techniques in DOE have been adapted to many computational and experimental fields in order to reduce the number of samples needed for optimization. These methods often involve performing experiments or simulations at the vertices of the design space hypercube. Full-Factorial Designs are arguably the simplest to implement, where data is collected at all potential combinations of p parameters at l levels, requiring l^p samples in total. Even when l = 2, the number of experiments or simulations quickly becomes infeasible, so Fractional-Factorial Designs using l^(p−k) experiments for k ‘generators’ are often used to reduce the burden. While such designs are more efficient, they have lower resolution than full designs and confound potentially important interaction effects. Therefore, DOE techniques are often combined with Response Surface Methodology (RSM) to iteratively move the sampling location, improve model fidelity as more data is collected, and focus experiments in regions of interest. Stochastic optimization methods such as Genetic Algorithms (GAs), Particle Swarm Optimization, and Differential Evolution have also been used to explore design spaces and perform optimization on both simulated and experimental data, often requiring fewer experiments than traditional DOE-RSM techniques. The quickly developing field of surrogate optimization attempts to leverage more robust modeling techniques, such as Kriging/Gaussian process models, to optimize nonlinear systems.
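As a concrete illustration of the factorial designs discussed above, the following Python sketch enumerates a two-level full-factorial design and shows how quickly the l^p sample count grows; the factor bounds used here are hypothetical.

```python
from itertools import product

# Hypothetical design space: p = 3 factors, each at l = 2 levels (low/high).
levels = [(0.0, 1.0), (10.0, 50.0), (0.1, 0.9)]

# Full-Factorial Design: every combination of levels, l**p = 2**3 = 8 runs.
full_factorial = list(product(*levels))
print(f"{len(full_factorial)} runs for the full design")

# The l**p growth is the core problem: at p = 20 factors, even two levels
# already demand 2**20 (over a million) experiments, which is why
# Fractional-Factorial Designs with l**(p - k) runs are used instead.
print(f"2-level design with 20 factors: {2**20} runs")
```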
They often employ a stochastic, uncertainty-based, or Bayesian search algorithm to intelligently select new sample points to query for experimentation or simulation. Due to the variety of modeling techniques and search algorithms available, hybrid algorithms, which attempt to leverage each method’s strengths, have proliferated. These hybrid approaches usually involve taking ensembles of surrogate models and asking each surrogate for its best set of predicted query points. New queries are then conducted at these points, often weighted in favor of regions/surrogates with low sample variance or optimal response values. The drawback of many of these algorithms is that they are not always generalizable to design problems of diverse dimensionality and nonlinearity.

A surrogate optimization algorithm is presented here, which uses an evolving RBF model and a hybrid search algorithm. This search algorithm selects half of its query points using a Euclidean distance metric, truncated to provide diversity in suggested query points. This is based on a neural network genetic algorithm (NNGA) developed for bio-process optimization, which has been shown to be more efficient than traditional DOE-RSM methods. The other half of the query points are selected using a dynamic coordinate search for response surface methods (DYCORS) algorithm based on work developed for computationally expensive simulations. The performance of the NNGA-DYCORS hybrid algorithm is tested against NNGA and DYCORS separately. Further evaluation is performed to probe potentially useful extensions of the hybrid algorithm: to address simulated experimental noise, to improve algorithm convergence over time, and to address cases in which certain groups of parameters have a greater influence on the response values than others.

The NNGA algorithm is based on an RBF-assisted GA. The NNGA uses an RBF model to suggest points that are close to, but not directly on top of, optima, using a truncated genetic algorithm (TGA).
One advantage that GAs have over gradient-based methods is that their randomness allows them to efficiently explore both global and local regions of optimality. This makes them very attractive for an optimization framework attempting to look for global optima while facing the uncertainty associated with a sparsely explored parameter space, and thus untrustworthy RBF models. This framework is shown in Figure 2.1 and the TGA is illustrated in Figure 2.2. First, a database of inputs X and outputs Y of N_0 total queries is collected. An RBF model is constructed using the training regime discussed in Section 2.2.1. Next, a TGA is run using a randomly initiated population of potential query points with the goal of minimizing the RBF-predicted output. In each iteration of the TGA, the queries expected to perform best survive a culling process and have their information propagated into the next iteration through pairing, crossover, and random mutation steps. After each iteration, the best predicted query is recorded. When the average normalized Euclidean distance between the TGA’s current predicted best query and its next N − 1 predicted best queries, d_av,norm, is less than or equal to the critical distance parameter CD = 0.2, the TGA is considered to be converged and submits this list of N best points for potential querying (this convergence test is sketched below). This TGA is run a total of k_max = 4 times, and its query selections from all rounds are pooled to give the next set of data for simulation or experiments.

The NNGA-DYCORS algorithm was tested against its constituent algorithms, NNGA and DYCORS, individually. Examining the performance of the constituent algorithms, the NNGA algorithm consistently works well in high dimensions, while the DYCORS algorithm performs better in low dimensions. This was the case both over time and at the final optimal query points. Given these differences in performance, it stands to reason that a hybrid approach would provide a sensible route to a more robust algorithm that could be used on a wider variety of dimensions. As seen in Figure 2.3, the hybrid NNGA-DYCORS often outperforms or performs similarly to the next best constituent algorithm in each experiment. This is reinforced by the data in Tables A.1 and A.2, where the final optimum of the hybrid NNGA-DYCORS is less than or equal to the final optimum of the next best constituent algorithm in 19 of 20 experiments. An optimum may be considered better if its upper bound is less than the mean of another algorithm’s optimum.
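A minimal sketch of the TGA convergence test described above: the average normalized Euclidean distance between the current best predicted query and the next N − 1 best queries is compared against CD = 0.2. The function name and the normalization of each coordinate to the unit hypercube are illustrative assumptions, not the exact implementation.

```python
import numpy as np

def tga_converged(best_queries, lower, upper, CD=0.2):
    """Check the TGA convergence criterion: the average normalized Euclidean
    distance between the top predicted query (row 0) and the next N - 1
    predicted best queries must be <= the critical distance CD.

    best_queries: (N, p) array sorted so that row 0 is the current best.
    lower, upper: (p,) arrays of design-space bounds (assumed normalization).
    """
    span = upper - lower
    normed = (best_queries - lower) / span            # map coordinates to [0, 1]
    dists = np.linalg.norm(normed[1:] - normed[0], axis=1)
    return dists.mean() <= CD

# Example: three candidate queries clustered together in a 2-D unit box.
queries = np.array([[0.20, 0.50], [0.25, 0.48], [0.18, 0.52]])
print(tga_converged(queries, lower=np.zeros(2), upper=np.ones(2)))  # True
```

In the full framework, a check like this would run after each TGA iteration, and on convergence the N points would be submitted for potential querying.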
While this is a rough approximation of the comparative performance of the algorithms, it strongly indicates that the NNGA-DYCORS is robust on a wide variety of problem sets and dimensions. In intermediate cases, the NNGA-DYCORS continued to outperform or perform as well as its most competitive constituent algorithm, showing its usefulness in design optimization problems where it is not obvious a priori what dimensionality counts as ‘high’ or ‘low’.

To test the effect of random noise on the ability of the surrogate optimization algorithms to find optimal parameters, a random noise e was added to the output of the simulation. It is common practice, especially in noisy and data-sparse settings, to improve out-of-sample generalizability through model selection procedures such as cross-validation in order to avoid overfitting. To address the stochasticity in these experiments, a hyperparameter optimization loop for the number of nodes n_nodes in the RBF model was added to the NNGA-DYCORS algorithm, where cross-validation over the database was used to select the optimal n_nodes (see the sketch below). In this case we deliberately trade higher bias for lower variance to reduce overfitting. As can be seen in Figure 2.4, application of a node optimization scheme improved the learner’s performance over the regular scheme in nearly all cases. It should be noted that in these experiments, the linear tail of the RBF was excluded, so Equation 2.3 was modified to Φλ = Y and solved directly.

There is a seemingly infinite number of modeling techniques, search optimization algorithms, and initialization/infill strategies in the literature for optimizing expensive objective functions. However, the characteristics of the experimental system and design space are never really known a priori, so having an algorithm that is more efficient than traditional methods and able to work with a wide variety of problems is advantageous. Therefore, the goal of this article was to develop a surrogate optimization framework that could be successfully applied to test problems with a wide range of dimensionality and degrees of nonlinearity. The NNGA-DYCORS algorithm runs two surrogate optimization algorithms in parallel. The NNGA uses a Euclidean distance-based metric to truncate a genetic algorithm, whose best members are distilled via k-means clustering into a final query list. This acts as a global optimization process because the internal genetic algorithm searches over the entire design space. The DYCORS algorithm perturbs the best previous queries using a dynamic Gaussian distribution, where the perturbations are adjusted based on cumulative success and the total number of queries in the database. Thus, DYCORS acts as a local search method in the region defined by a Gaussian centred at its best queries. Both arms of the hybrid algorithm use an RBF for prediction.

The result was that the NNGA-DYCORS hybrid algorithm was statistically equal to or outperformed its constituent algorithms in 19 of 20 test problems. This demonstrates the robustness of the NNGA-DYCORS, as it performs as a best-case scenario on a variety of test problem dimensions and shapes. This is important because, in real experimental problems, one does not know the shape of the surface a priori, highlighting the utility of a generalizable optimization framework such as the NNGA-DYCORS.
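A minimal sketch of the no-tail RBF fit Φλ = Y and the node cross-validation loop described above, assuming a Gaussian kernel, numpy’s least-squares solver, and a simple first-n-training-points rule for choosing node centers; all of these choices are illustrative assumptions rather than the exact implementation.

```python
import numpy as np

def rbf_fit(centers, X, Y, gamma=1.0):
    """Solve Phi @ lam = Y for a no-linear-tail RBF (Gaussian kernel assumed)."""
    Phi = np.exp(-gamma * ((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1))
    lam, *_ = np.linalg.lstsq(Phi, Y, rcond=None)
    return lam

def rbf_predict(centers, lam, X, gamma=1.0):
    Phi = np.exp(-gamma * ((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1))
    return Phi @ lam

def select_n_nodes(X, Y, candidates=(5, 10, 20), k=5, seed=0):
    """Pick n_nodes by k-fold cross-validation over the query database.

    Fewer nodes than data points gives a smoother, higher-bias model,
    trading bias for variance as described in the text.
    """
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(X))
    folds = np.array_split(idx, k)
    best_n, best_err = None, np.inf
    for n in candidates:
        err = 0.0
        for fold in folds:
            train = np.setdiff1d(idx, fold)
            centers = X[train[:min(n, len(train))]]   # assumed center rule
            lam = rbf_fit(centers, X[train], Y[train])
            err += np.mean((rbf_predict(centers, lam, X[fold]) - Y[fold]) ** 2)
        if err < best_err:
            best_n, best_err = n, err
    return best_n
```

With a loop of this shape, any hyperparameter of the surrogate (not only the node count) could be selected against held-out queries, which is the generalization suggested in the discussion below.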
In addition, it is never clear what constitutes a ‘high’- or ‘low’-dimensionality design problem, so an algorithm that performs well in arbitrary dimensions should have large practical value. The DYCORS algorithm was already shown to be competitive compared to other heuristics, and the NNGA was demonstrated to be significantly more efficient than traditional experimental optimization methods. It stands to reason that this hybrid framework should extend the usefulness of both algorithms to test problems of arbitrary dimensionality and degree of nonlinearity. Using a node optimization scheme to reduce model variance during query selection improves hybrid algorithm performance, especially for noisy surfaces. Practitioners should therefore consider built-in regularization to avoid overfitting when dealing with expensive, data-sparse, and noisy systems. Optimizing the number of nodes was specific to this RBF variant, but the optimization loop in Section 3.2 could be applied to any model hyperparameter. In the next set of experiments, making the NNGA-DYCORS convergence parameters dynamic during query selection did not improve performance. This indicates that it may not be fruitful to pursue extensive algorithm parameter adjustments/heuristics for this algorithm, and that the outcome shows little sensitivity to the selection of algorithm convergence parameters, unlike the results in previous articles on the subject. Finally, to mimic typical engineering scenarios where response sensitivity varies with the inputs, the test functions were scaled with a sensitivity vector, as illustrated in the sketch below.
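A minimal sketch of such a sensitivity scaling, in which each input dimension is weighted before a test function is evaluated so that some parameters influence the response far more than others; the weights and the sphere test function used here are hypothetical.

```python
import numpy as np

# Hypothetical sensitivity vector: the first coordinates dominate the response.
s = np.array([10.0, 5.0, 1.0, 0.1])

def sphere(x):
    """Simple test function used as a stand-in objective."""
    return np.sum(x ** 2)

def scaled_sphere(x, s=s):
    """Test function with input-dependent sensitivity: f(s * x)."""
    return sphere(s * x)

print(scaled_sphere(np.ones(4)))  # 10**2 + 5**2 + 1**2 + 0.1**2 = 126.01
```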