Five adaptation configurations are proposed using Mini-ImageNet, three PV settings, and AFD. As shown in Figure 4, S1 uses a general dataset in base-training and meta-learning and the target dataset in test, which is adaptation from one domain to another, denoted in Formula 3. S2 uses a general dataset in base-training and the target dataset in meta-learning and test, denoted in Formula 4. S3 uses the target dataset in all three stages, denoted in Formula 5. S4 uses a general dataset in base-training, a similar-target dataset in meta-learning, and the target dataset in test, denoted in Formula 6. When AFD is used in test, PV is considered a domain similar to the target domain, because both are associated with leaf diseases of plants. S5 uses the similar-target dataset in base-training and meta-learning and the target-domain dataset in test, denoted in Formula 7. S1, S4, and S5 are cross-domain strategies, and S2 and S3 are intra-domain strategies. According to the definitions of SD and TD, e2, e3, e5, e6, e8, and e9 are intra-domain experiments, because the data used in meta-learning and test comes from the same dataset. The results are shown in Table 4 and Figure 5A. In PV-Split-2, the accuracy of e5 is better than that of e4 and e6. In PV-Split-3, the accuracy of e8 is better than that of e7 and e9. What the two settings have in common is that the disease classes belong to different plants. For these diverse-species cases, S2 is better than S1 and S3, and the larger the number of species, the more obvious the superiority of S2. As listed, e6 comes close to e5, but e8 is much better than e9, which means that the general dataset provides better support when the testing data is more diverse: broad prior knowledge is very useful for adapting to a diverse target. However, in PV-Split-1, e3 (using S3) is the best, because the testing data belongs to the same plant; the features of the testing data are therefore concentrated, and the general data in base-training is not helpful.
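For reference in the analysis that follows, the five strategies can be stated compactly as dataset assignments for the three stages (a restatement of the descriptions above; the exact notation of Formulas 3–7 may differ):

\[
\begin{aligned}
\text{S1}: &\; (\mathcal{D}_{\mathrm{base}},\, \mathcal{D}_{\mathrm{meta}},\, \mathcal{D}_{\mathrm{test}}) = (G,\, G,\, T)\\
\text{S2}: &\; (\mathcal{D}_{\mathrm{base}},\, \mathcal{D}_{\mathrm{meta}},\, \mathcal{D}_{\mathrm{test}}) = (G,\, T,\, T)\\
\text{S3}: &\; (\mathcal{D}_{\mathrm{base}},\, \mathcal{D}_{\mathrm{meta}},\, \mathcal{D}_{\mathrm{test}}) = (T,\, T,\, T)\\
\text{S4}: &\; (\mathcal{D}_{\mathrm{base}},\, \mathcal{D}_{\mathrm{meta}},\, \mathcal{D}_{\mathrm{test}}) = (G,\, S,\, T)\\
\text{S5}: &\; (\mathcal{D}_{\mathrm{base}},\, \mathcal{D}_{\mathrm{meta}},\, \mathcal{D}_{\mathrm{test}}) = (S,\, S,\, T)
\end{aligned}
\]

where G denotes the general dataset (Mini-ImageNet), S the similar-target dataset (PV, when AFD is the target), and T the target dataset.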
By contrast, data drawn from the same dataset as the test classes is easier to adapt to. In short, for the intra-domain cases, if the testing classes span super-classes, S2 is the best strategy; if the testing classes are sub-classes, S3 is the best strategy. Experiments e1, e4, e7, e10, e11, and e12 are cross-domain cases. e1, e4, e7, and e10 achieve the worst results in their respective data settings by using S1, due to the large gap between the general domain and the target domain. Comparing e10, e11, and e12, e11 achieves the highest accuracy by using S4, as shown in Table 4 and Figure 5B. e12 is not as good as e11 because overly concentrated features extracted from monotonous samples lead to weaker adaptation. S4 is therefore the best training strategy for cross-domain cases: it uses a general dataset in base-training to learn broad prior knowledge, and a similar-target dataset in meta-learning to adapt smoothly to the new domain.

Ablation experiments e13–e22 are conducted to show the positive effects of the CMSFF module and the CA module, respectively. The results are listed in Table 5. Under four data configurations (PV-Setting-1, PV-Setting-2, PV-Setting-3, and AFD), we execute eight experiments with the following training settings: Mini-ImageNet is used in base-training; the backbone network is Resnet12; the distance metric is cosine similarity; the training strategies are S2 and S4. Taking e2, e5, e8, and e11 as the baseline, adding the CMSFF module yields the results of e13, e15, e19, and e21, which show the improvement brought by CMSFF. e14, e18, e20, and e22 indicate that CA further improves performance on the basis of CMSFF (a sketch of the channel-attention idea follows below). e15 and e17 compare the PMSFF module with the CMSFF module, and the results show that CMSFF outperforms PMSFF.

Sub-classes are defined as classes that belong to the same entry-level class. PV-Setting-1 and AFD are sub-class classification examples. Sub-class classification is also called fine-grained visual categorization, which aims to distinguish subordinate categories within entry-level categories. Because samples belonging to the same super-class are similar to each other, sub-class classification is a challenging problem. In Table 4, PV-Setting-1 is the lowest-accuracy group among the three PV settings, as its samples all belong to tomato and are hard to distinguish.
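Returning briefly to the ablation modules: the exact CMSFF and CA designs are specified elsewhere in the paper, but the general idea of channel attention can be sketched in the squeeze-and-excitation style (an illustrative sketch with hypothetical names, not our exact implementation):

```python
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    """Squeeze-and-excitation style channel attention; an illustrative
    stand-in for the CA module, not its exact implementation."""
    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)   # squeeze: global average per channel
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),                     # per-channel weights in (0, 1)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, _, _ = x.shape
        w = self.fc(self.pool(x).view(b, c)).view(b, c, 1, 1)
        return x * w                          # reweight the feature channels
```

In a pipeline like ours, such a block would sit on top of the fused multi-scale features, which matches the ablation ordering above: CMSFF first, CA added on top.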
The results of the AFD group are worse than those of PV-Setting-1, not only because of the sub-class problem but also because of the cross-domain and in-the-wild setting of its images. Even though the images of AFD are pre-processed, their backgrounds still differ from PV, as do the illumination conditions, resolutions, and photography devices. Intuitively, the feature gap from SD to TD causes the accuracy decline.

N-way and K-shot are the task configurations that indicate the difficulty of the task. Given a fixed K, the accuracy decreases as N increases. The result of PV-Split-1 with N-way, 10-shot is shown in Figure 5C: the accuracy drops from 85.39% to 64.35% as N increases from 3 to 10. All experimental results listed in Table 4 are executed with N fixed at 5, and regardless of the data configuration, all experiments follow the common trend that accuracy increases with the number of shots. The accuracy rises sharply from 1-shot to 5-shot, tends to stabilize beyond 10-shot, and grows only marginally beyond 20-shot. From 1-shot to 50-shot, the accuracy increase ranges from at least 10% to a maximum of 32%. The results show that accuracy increases with the number of shots and decreases with the number of ways: more ways mean higher complexity, and more shots mean more supporting information. In existing research, N is generally set to 5. In application scenarios, however, N is determined by the number of target categories and should not be limited to 5. For example, a plant may have more than five diseases, in which case the number of ways should equal the number of diseases that may occur in the specific scenario. N-way and K-shot form a trade-off: when expanding to more novel classes, we can increase the number of shots as compensation to maintain accuracy. For a new class to be identified, it is acceptable to collect 10 to 50 samples as its support set. However, the positive relationship between shots and accuracy is not linear; the accuracy gain from increasing K has a ceiling, and when K is larger than 30 the accuracy still grows, but very slowly.

In this work, we compared three distance metrics: dot product, cosine similarity, and Euclidean distance. The choice matters because, even though the distance module contains no trainable parameters, the losses calculated from the distance measurement still affect the parameter updates in each iteration.
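As a minimal sketch of how the three metrics score query embeddings against class prototypes (the function name and tensor shapes are illustrative, not our exact code):

```python
import torch
import torch.nn.functional as F

def scores(query: torch.Tensor, protos: torch.Tensor, metric: str) -> torch.Tensor:
    """Similarity scores between query embeddings [Q, D] and prototypes [N, D].

    Returns a [Q, N] score matrix; higher means more similar, so the
    Euclidean distance is negated.
    """
    if metric == "dot":
        return query @ protos.t()
    if metric == "cosine":
        return F.normalize(query, dim=-1) @ F.normalize(protos, dim=-1).t()
    if metric == "euclidean":
        return -torch.cdist(query, protos)
    raise ValueError(f"unknown metric: {metric}")
```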
An appropriate distance metric significantly helps to improve the performance of classification, clustering, and related processes. Cosine similarity achieves the best performance, as shown in Table 6 and Figure 5D. The reason is that the vectors obtained from the encoder are high dimensional, and cosine similarity is often used to counteract the weakness of Euclidean distance in high-dimensional space. The normalization in cosine similarity also has a positive effect.

In this work, we compared different backbone networks: Convnet4, AlexNet, Resnet12, Resnet18, Resnet50, Resnet101, DenseNet, and MobileNet-V2. Convnet4 is the classical architecture used in FSL, which stacks four convolutional blocks. Different networks contain different numbers of trainable parameters, and there are more trainable parameters in base-training than in meta-learning because the base-training classifier is removed in meta-learning. The numbers of trainable parameters, learning rates, training times, and epochs in the two training stages are listed in Table 7. e25–e31 are conducted with the configuration that Mini-ImageNet is used in base-training and PV-2-22 is used in meta-learning. The different numbers of iterations are due to the different convergence speeds in meta-learning. The performances of the backbone networks are listed in Table 8: Resnet12 and Resnet50 outperform the other networks, with Resnet12 being more efficient.

In base-training and meta-learning, we use the validation data to test the accuracy of 5-way, 1-shot tasks, as shown in Figure 6. The black numbers on the black lines are the best accuracies in base-training, and the black numbers on the red lines are the best accuracies in meta-learning; the accuracy gains in meta-learning are marked in red. The figure shows that the model trained in the base-training stage already has some few-shot identification ability, even without training on tasks in meta-learning. However, in base-training the model converges on image-wise data and the task-testing accuracy no longer increases, although the model still has room to improve. Based on this, meta-learning with task-wise data further raises the accuracy by around 20% to 30%.

In recent years, network architectures have grown deeper and deeper, and some researchers have asked whether we really need such deep networks. Our results show that a medium-sized network outperforms the other networks in this task, for which we summarize two reasons. First, in CNNs, the simpler and more basic features are learnt in shallower layers, while the more abstract and complex features are learnt in deeper layers: from shallow to deep, the features transition from edges, lines, and colors, to textures and patterns, to complex graphics, and even to specific objects. For our specific task, even humans rely mostly on color, shape, and texture for disease identification, so very deep networks may not be critical here (the depth hierarchy is sketched below).
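The stage-wise hierarchy can be inspected directly; the sketch below uses torchvision's resnet18 purely as a stand-in (our Resnet12 is not in torchvision), tapping feature maps at increasing depth:

```python
import torch
from torchvision.models import resnet18
from torchvision.models.feature_extraction import create_feature_extractor

# resnet18 as an illustrative stand-in for a medium-depth backbone.
model = resnet18(weights=None).eval()
extractor = create_feature_extractor(
    model, return_nodes={"layer1": "shallow", "layer2": "mid", "layer4": "deep"})

with torch.no_grad():
    feats = extractor(torch.randn(1, 3, 224, 224))
for name, f in feats.items():
    print(name, tuple(f.shape))
# shallow (1, 64, 56, 56) -> mid (1, 128, 28, 28) -> deep (1, 512, 7, 7):
# spatial resolution shrinks while channel depth grows with layer depth.
```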
Second, FSL is a kind of learning task with a limited data scale. A deeper network has a larger number of parameters to update, and under data limitations an overly deep network may suffer from insufficient parameter updating in back propagation because of the long back-propagation path. In parameter updating, shallower networks are more flexible, while deeper networks are comparatively unwieldy. In short, deeper networks do not always outperform shallower ones; the size of the network should match the specific task and data resources.

In order to show the superiority of our method, we conducted several experiments to compare it with recent related research. Argüeso et al. used a Siamese Network and a Triplet Network, with PV as their experimental material. They set a different data splitting: 32 classes are used for training and the remaining six classes for testing. They reported results for three methods: transfer learning, Siamese Network, and Triplet Network, with Inception-V3 as the backbone. To be comparable, we executed experiments with the same data setting as their work: Mini-ImageNet is used in base-training, 32 classes of PV in meta-learning, and the remaining 6 classes in test. The results of e32–e34 are shown in Table 9. We also compared with Li and Chao, who proposed a semi-supervised FSL approach. Their baseline is a typical fine-tuning model; the Single SS adds a semi-supervised step on top of the baseline, and the Iterative SS adds one more semi-supervised step on top of Single SS. PV was also used as their experimental material, arranged into three splits, each with 28 classes for training and the remaining 10 classes for testing; they also compared with Argüeso et al. We conducted experiments with our method under the same data settings as Li and Chao. The results of e35–e43 are shown in Table 9, and all comparison results are shown in Figure 7. Although the data settings of the two references differ from ours, our method outperforms the existing works under all data settings, which indicates that it is both superior and robust.

Learning from few samples is very promising in plant disease recognition and has a wide range of potential application scenarios owing to the data cost it saves. When expanding the range of application, a well-established FSL model can generalize to novel species or diseases without retraining or large-scale training data (a sketch of this expansion follows below). However, some limitations of FSL itself and of the specific application areas still need to be considered.
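As an illustration of this deployment-time expansion (the encoder, names, and shapes are assumptions, not our released pipeline): a new disease class is added by averaging the embeddings of its few support images into a prototype, and queries are then classified by nearest prototype under cosine similarity, with no retraining.

```python
import torch
import torch.nn.functional as F

def add_class_and_classify(encoder, support_imgs, protos, query_img):
    """Extend a deployed few-shot model with one new class, no retraining.

    encoder: a frozen embedding network mapping images to D-dim vectors
    support_imgs: [K, 3, H, W] few labeled shots of the new class
    protos: [N, D] prototypes of the N classes already deployed
    query_img: [3, H, W] image to classify over the N+1 classes
    """
    with torch.no_grad():
        new_proto = encoder(support_imgs).mean(dim=0, keepdim=True)  # [1, D]
        protos = torch.cat([protos, new_proto], dim=0)               # [N+1, D]
        q = encoder(query_img.unsqueeze(0))                          # [1, D]
        sims = F.normalize(q, dim=-1) @ F.normalize(protos, dim=-1).t()
    return sims.argmax(dim=-1).item(), protos
```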