Winter wheat yield prediction using UAV-based multivariate time series data and variate-independent tokenization

Authors: Yan Ge, Zhichang Zhu, Shichao Jin, Jingrong Zang, Ruinan Zhang, Zhaoyu Zhai*, et al.

Journal: Plant Phenomics

Impact Factor: 6.4

Type: Journal Paper

Tags: Yield prediction

DOI: https://doi.org/10.1016/j.plaphe.2025.100039

Abstract

The breeding of high-yield wheat varieties is needed to ensure food security. Accurately and rapidly predicting wheat yield at the plot level via UAVs would enable breeders to identify meaningful genotypic variations and select superior lines, thus accelerating the selection of climate-adapted high-yield varieties. Although current prediction models have already utilized multivariate time series data, these models usually adopt a simple concatenation operation to embed all the raw data, resulting in low prediction accuracy. To address these limitations, we propose an improved transformer-based wheat yield prediction model with a variate-independent tokenization approach. The proposed variate-independent tokenization approach facilitates the embedding of 14 vegetation indices and 28 morphological traits via the feature dimension, enabling the learning of variate-centric representations. We also apply a multivariate attention mechanism to evaluate the contribution of each variate and capture the multivariate correlation. Extensive experiments are conducted to verify the effectiveness of our model, including comparisons across 3 nitrogen treatments, 2 years, and 56 wheat varieties. We also compare our model with state-of-the-art approaches. The experimental results indicate that our model achieves the optimal prediction performance, with an R² of 0.862, surpassing those of the classical recurrent neural network and transformer variants. We also confirm that combining both the vegetation indices and morphological traits is advantageous over using single-source data for the prediction task, achieving an approximately 4% prediction performance gain. In conclusion, this study provides a novel approach for utilizing an improved transformer model and multivariate time series data to quantitatively predict plot-level wheat yield, thus enabling the rapid selection of high-yield varieties for breeding.
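
The difference between the two tokenization schemes mentioned in the abstract can be illustrated with a small NumPy sketch. This is not the authors' implementation: the embedding width `D`, the random weights, and the synthetic input are all assumptions made for illustration, and the time-step baseline is only one plausible reading of "simple concatenation". Only the counts (10 UAV flights, 14 vegetation indices plus 28 morphological traits) come from the listing.

```python
import numpy as np

rng = np.random.default_rng(0)

T = 10        # UAV flights (time steps), as stated in the listing
N = 14 + 28   # variates: vegetation indices + morphological traits
D = 16        # embedding width (hypothetical, chosen for illustration)

# Synthetic stand-in for one plot's multivariate time series.
x = rng.normal(size=(T, N))

# Baseline reading: all variates at one time step are embedded together,
# giving one token per flight.
W_time = rng.normal(size=(N, D)) / np.sqrt(N)
time_tokens = x @ W_time          # shape (T, D)

# Variate-independent tokenization: each variate's whole series becomes
# its own token, giving one token per variate.
W_var = rng.normal(size=(T, D)) / np.sqrt(T)
variate_tokens = x.T @ W_var      # shape (N, D)


def self_attention(tokens, rng):
    """Single-head scaled dot-product self-attention with random weights."""
    d = tokens.shape[1]
    Wq, Wk, Wv = (rng.normal(size=(d, d)) / np.sqrt(d) for _ in range(3))
    q, k, v = tokens @ Wq, tokens @ Wk, tokens @ Wv
    scores = q @ k.T / np.sqrt(d)
    scores -= scores.max(axis=1, keepdims=True)   # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=1, keepdims=True)
    return weights @ v, weights


# Attention over variate tokens yields an N x N map: row i shows how much
# variate i attends to every other variate.
out, attn = self_attention(variate_tokens, rng)
```

With variate tokens, the attention map has one row and one column per variate, which is what makes it possible to read off per-variate contributions and pairwise relationships between variates.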

Key Contributions

  • RGB and multispectral data are used to predict wheat yield. In total, 14 vegetation indices and 28 morphological traits were extracted from the raw data. The dataset covers 7 key growth stages of wheat with 10 UAV flights. We confirm that combining both data sources is advantageous over using single-source data for the prediction task.
  • We propose an improved transformer model with a variate-independent tokenization approach. This configuration gives the best prediction performance, with an R² of 0.862. The approach also allows us to evaluate the importance of each variate from the feature dimension, as well as to analyze the correlation among multivariate data.
  • The improved transformer model is validated over multiple wheat varieties and fertilizer treatments. It exhibits robust prediction performance on unseen data (i.e., data collected in different years), supporting quantitative selection in high-yield wheat variety breeding.
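
For readers unfamiliar with the metric, the sketch below computes R² as the coefficient of determination, 1 - SS_res / SS_tot. The listing does not say which definition the paper uses (some studies report the squared Pearson correlation instead), so this definition is an assumption, and the yield values are made up for illustration.

```python
import numpy as np


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)          # residual sum of squares
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)   # total sum of squares
    return 1.0 - ss_res / ss_tot


# Made-up plot yields and predictions, not data from the paper.
observed = [6.0, 7.0, 8.0, 9.0]
predicted = [6.5, 7.0, 7.5, 9.0]
r2 = r2_score(observed, predicted)
```

Under this definition, a model that always predicts the mean observed yield scores 0 and a perfect model scores 1.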

Citation

@article{ge2025winter,
  title={Winter wheat yield prediction using UAV-based multivariate time series data and variate-independent tokenization},
  author={Ge, Yan and Zhu, Zhichang and Jin, Shichao and Zang, Jingrong and Zhang, Ruinan and Li, Qing and Sun, Zhuangzhuang and Liu, Shouyang and Xu, Huanliang and Zhai, Zhaoyu},
  journal={Plant Phenomics},
  volume={7},
  number={2},
  pages={100039},
  year={2025},
  publisher={Elsevier}
}