[1] K. Benidis, S. S. Rangapuram, V. Flunkert, Y. Wang, D. Maddix, C. Turkmen, J. Gasthaus,
M. Bohlke-Schneider, D. Salinas, L. Stella, F.-X. Aubet, L. Callot, and T. Januschowski,
Deep learning for time series forecasting: Tutorial and literature survey, 2022.
[2] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, L. Kaiser, and I. Polosukhin, Attention is all you need, in Proceedings of the 31st International Conference on Neural Information Processing Systems (NeurIPS), 2017.
[3] J. Devlin, M.-W. Chang, K. Lee, and K. Toutanova, BERT: Pre-training of deep bidirectional transformers for language understanding, in Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2019, pp. 4171–4186.
[4] A. Dosovitskiy, L. Beyer, A. Kolesnikov, D. Weissenborn, X. Zhai, T. Unterthiner, M. Dehghani, M. Minderer, G. Heigold, S. Gelly, J. Uszkoreit, and N. Houlsby, An image is worth 16x16 words: Transformers for image recognition at scale, in Proceedings of the International Conference on Learning Representations (ICLR), 2021.
[5] X. Dong, J. Li, D. Yu, F. Seide, and M. L. Seltzer, Speech recognition with deep learning: A
review, IEEE Transactions on Audio, Speech, and Language Processing, 26 (2018), pp. 1–13.
[6] B. Lim and S. Zohren, Time-series forecasting with deep learning: A survey, Philosophical Transactions of the Royal Society A, 379 (2021), 20200209.
[7] Y. Tay, M. Dehghani, D. Bahri, and D. Metzler, Efficient transformers: A survey, ACM Computing Surveys, 55 (2022).
[8] J. F. Torres, D. Hadjout, A. Sebaa, F. Martínez-Álvarez, and A. Troncoso, Deep learning for time series forecasting: A survey, Big Data, 9 (2021), pp. 3–21.
[9] S. Tuli, S. K. Singh, S. K. Singh, and R. Buyya, Anomaly detection in time series data using
deep learning, Journal of Intelligent Information Systems, 58 (2022), pp. 1–15.
[10] X. Kong, Z. Chen, W. Liu, K. Ning, L. Zhang, S. M. Marier, Y. Liu, Y. Chen, and F. Xia,
Deep learning for time series forecasting: A survey, International Journal of Machine Learning and Cybernetics, (2025).
[11] Y. W. Xiong, K. Tang, M. Ma, J. Zhang, J. Xu, and T. Li, Modeling temporal dependencies within the target for long-term time series forecasting, arXiv preprint arXiv:2406.04777, 2024.
[12] Y. Wu, L. Zhang, and Y. Zhang, Deep learning for time series forecasting: A review, Journal
of Forecasting, 40 (2021), pp. 1–23.
[13] X. Xu, J. Li, L. Zhang, and Y. Zhang, Anomaly detection in time series data using deep
learning, Journal of Intelligent Information Systems, 58 (2022), pp. 1–15.
[14] A. Casolaro, V. Capone, G. Iannuzzo, and F. Camastra, Deep learning for time series forecasting: Advances and open problems, Information, 14 (2023), 598. DOI:10.3390/info14110598.
[15] G. Zerveas, S. Jayaraman, D. Patel, A. Bhamidipaty, and C. Eickhoff, A transformer-based framework for multivariate time series representation learning, in Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery & Data Mining, 2021, pp. 2114–2124.
[16] P. Lara-Benítez, M. Carranza-García, and J. C. Riquelme, An experimental review on deep learning architectures for time series forecasting, International Journal of Neural Systems, 31 (2021).
[17] H. Zhou, S. Zhang, J. Peng, S. Zhang, J. Li, H. Xiong, and W. Zhang, Informer: Beyond efficient transformer for long sequence time-series forecasting, in Proceedings of the AAAI Conference on Artificial Intelligence, 2021, pp. 11106–11115.
[18] Y. Nie, N. H. Nguyen, P. Sinthong, and J. Kalagnanam, A time series is worth 64 words: Long-term forecasting with transformers, in Proceedings of the International Conference on Learning Representations (ICLR), 2023.
[19] T. Zhou, Z. Ma, Q. Wen, X. Wang, L. Sun, and R. Jin, FEDformer: Frequency enhanced decomposed transformer for long-term series forecasting, in Proceedings of the 39th International Conference on Machine Learning (ICML), 2022.
[20] H. Wu, J. Xu, J. Wang, and M. Long, Autoformer: Decomposition transformers with auto-correlation for long-term series forecasting, in Advances in Neural Information Processing Systems (NeurIPS), 2021.
[21] D. E. Rumelhart, G. E. Hinton, and R. J. Williams, Learning representations by back-propagating errors, Nature, 323 (1986), pp. 533–536.
[22] K. Cho, B. van Merriënboer, C. Gulcehre, D. Bahdanau, F. Bougares, H. Schwenk, and Y. Bengio, Learning phrase representations using RNN encoder–decoder for statistical machine translation, in Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), 2014, pp. 1724–1734.
[23] S. Hochreiter and J. Schmidhuber, Long short-term memory, Neural Computation, 9 (1997),
pp. 1735–1780.
[24] S. Bai, J. Z. Kolter, and V. Koltun, An empirical evaluation of generic convolutional and
recurrent networks for sequence modeling, arXiv preprint arXiv:1803.01271, 2018.
[25] Y. Nie, N. H. Nguyen, P. Sinthong, and J. Kalagnanam, Patch time series transformer in
Hugging Face - Getting started, Hugging Face Blog, 2023.
[26] Towards Data Science, PatchTST: A breakthrough in time series forecasting, Towards Data
Science, 2023.
[27] R. J. Hyndman and G. Athanasopoulos, Forecasting: Principles and practice, 3rd ed.,
OTexts: Melbourne, Australia, 2021. OTexts.com/fpp3.
[28] C. Bergmeir and J. M. Benítez, On the use of cross-validation for time series predictor evaluation, Information Sciences, 191 (2012), pp. 192–213. DOI:10.1016/j.ins.2011.12.028.
[29] J. Bergstra and Y. Bengio, Random search for hyper-parameter optimization, Journal of
Machine Learning Research, 13 (2012), pp. 281–305.
[30] M. Feurer, A. Klein, K. Eggensperger, J. Springenberg, M. Blum, and F. Hutter, Efficient and robust automated machine learning, in Advances in Neural Information Processing Systems (NeurIPS), 2015.
[31] Optuna Development Team, Optuna hyperparameter optimization guide, Documentation,
https://optuna.org/, 2023.
[32] Ray Team, Hyperparameter tuning with Ray Tune, Documentation,
https://docs.ray.io/en/latest/tune/, 2023.
[33] D. Maclaurin, D. Duvenaud, and R. P. Adams, Gradient-based hyperparameter optimization
through reversible learning, in Proceedings of the 32nd International Conference on Machine
Learning (ICML), 2015. https://proceedings.mlr.press/v37/maclaurin15.html.
[34] T. Bollerslev, R. F. Engle, and J. M. Wooldridge, A capital asset pricing model with time-varying covariances, Journal of Political Economy, 96 (1988), pp. 116–131. DOI:10.1086/261527.
[35] M. Asai, C.-L. Chang, and M. McAleer, Realized volatility and MGARCH models: A review,
Econometrics, 9 (2021).
[36] R. Taleblou and P. Mohajeri, Modeling the daily volatility of oil, gold, dollar, bitcoin and
Iranian stock markets: An empirical application of a nonlinear space state model, Iranian
Economic Review, (2023).
[37] G. Kastner and S. Frühwirth-Schnatter, Ancillarity-sufficiency interweaving strategy (ASIS) for boosting MCMC estimation of stochastic volatility models, Computational Statistics & Data Analysis, 76 (2014), pp. 408–423.
[38] F. Chollet, Deep learning with Python, 2nd ed., Manning Publications, Shelter Island, NY,
2021.
[39] M. Rezaei, N. Neshat, A. Jafari Nodoushan, and A. Ahmadzadeh, The artificial neural
networks for investigation of correlation between economic variables and stock market indices,
Journal of Mathematics and Modeling in Finance, 3 (2023), pp. 19–35.
[40] M. Abdollahzadeh, A. Baagherzadeh Hushmandi, and P. Nabati, Improving the accuracy of
financial time series prediction using nonlinear exponential autoregressive models, Journal
of Mathematics and Modeling in Finance, 4 (2024), pp. 159–173.
[41] M. Goldani, Comparative analysis on forecasting methods and how to choose a suitable one:
Case study in financial time series, Journal of Mathematics and Modeling in Finance, 3
(2023), pp. 37–61.