Sensitivity assessing to data volume for forecasting: introducing similarity methods as suitable ones in feature selection methods

Journal of Mathematics and Modeling in Finance
Volume 4, Issue 2, Esfand 2024, Pages 115-134. Full Text (588.55 K)
Article Type: Research Article
DOI: 10.22054/jmmf.2024.81735.1145

Authors
Mahdi Goldani* 1; Soraya Asadi Tirvan 2
1 Faculty of Literature and Humanities, Hakim Sabzevari University, Sabzevar, Iran
2 Department of Energy Economics, Allameh Tabatabai University, Tehran, Iran
Abstract
In predictive modeling, overfitting poses a significant risk, particularly when the feature count surpasses the number of observations, a common scenario in high-dimensional datasets. To mitigate this risk, feature selection is employed to enhance model generalizability by reducing the dimensionality of the data. This study evaluates the stability of feature selection techniques with respect to varying data volumes, focusing on time series similarity methods. Utilizing a comprehensive dataset that includes the closing, opening, high, and low prices of stocks from 100 high-income companies listed in the Fortune Global 500, this research compares several feature selection methods, including variance thresholds, edit distance, and Hausdorff distance metrics. Numerous feature selection methods have been investigated in the literature, and selecting the most accurate one for forecasting can be challenging [1]. This study therefore examines the performance of the most well-known feature selection methods across different data sizes. The aim is to identify methods that show minimal sensitivity to the quantity of data, ensuring robustness and reliability in predictions, which is crucial for financial forecasting. Results indicate that among the tested feature selection strategies, the variance threshold, edit distance, and Hausdorff distance methods exhibit the least sensitivity to changes in data volume. These methods therefore provide a dependable approach to reducing the feature space without significantly compromising predictive accuracy. This study highlights the effectiveness of time series similarity methods in feature selection and underlines their potential in applications involving fluctuating datasets, such as financial markets or dynamic economic conditions.
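To make the similarity-based selection idea concrete, the following is a minimal sketch (not the paper's implementation; all function names are illustrative) of ranking candidate feature series by Hausdorff distance to a target series, where each 1-D series is treated as a set of (time, value) points in the plane:

```python
import numpy as np

def hausdorff_distance(a, b):
    """Symmetric Hausdorff distance between two 1-D series,
    each treated as a point set {(t, value)} in the plane."""
    pa = np.column_stack([np.arange(len(a)), a])
    pb = np.column_stack([np.arange(len(b)), b])
    # Pairwise Euclidean distances between every point of a and of b.
    d = np.linalg.norm(pa[:, None, :] - pb[None, :, :], axis=2)
    # max over each set of the distance to its nearest neighbour
    # in the other set, then take the larger directed value.
    return max(d.min(axis=1).max(), d.min(axis=0).max())

def select_by_similarity(target, candidates, k=2):
    """Rank candidate feature series by Hausdorff distance to the
    target and return the indices of the k most similar ones."""
    dists = [hausdorff_distance(target, c) for c in candidates]
    return np.argsort(dists)[:k]

# Toy usage: the candidate identical to the target ranks first.
target = np.array([1.0, 2.0, 3.0, 4.0])
candidates = [np.array([10.0, 10.0, 10.0, 10.0]),
              np.array([1.0, 2.0, 3.0, 4.0]),
              np.array([1.5, 2.5, 3.5, 4.5])]
print(select_by_similarity(target, candidates, k=2))
```

Because the distance depends only on the geometry of the two series, the ranking (and hence the selected feature subset) is comparatively stable as the sample window grows or shrinks, which is the sensitivity property the study measures.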
Keywords
feature selection; sample size; overfitting; similarity methods
References
[1] Y. Hmamouche, P. Przymus, A. Casali, and L. Lakhal, GFSM: a feature selection method for improving time series forecasting, Int. J. Adv. Syst. Meas., (2017).
[2] E. W. Newell and Y. Cheng, Mass cytometry: blessed with the curse of dimensionality, Nat. Immunol., 17 (2016), pp. 890–895. doi:10.1038/ni.3485.
[3] B. Remeseiro and V. Bolon-Canedo, A review of feature selection methods in medical applications, Comput. Biol. Med., 112 (2019). doi:10.1016/j.compbiomed.2019.103375.
[4] E. Ergüner Özkoç, Clustering of Time-Series Data, IntechOpen, (2021). doi:10.5772/intechopen.84490.
[5] A. Alqahtani, M. Ali, X. Xie, and M. W. Jones, Deep Time-Series Clustering: A Review, Electronics, 10 (23) (2021), 3001. doi:10.3390/electronics10233001.
[6] J. L. Vermeulen, Geometric similarity measures and their applications [dissertation], Utrecht University, (2023).
[7] H. Xie, J. Li, and H. Xue, A survey of dimensionality reduction techniques based on random projection, arXiv, (2017). Available from: https://arxiv.org/abs/1706.04371.
[8] X. Zhu, Y. Wang, Y. Li, Y. Tan, G. Wang, and Q. Song, A new unsupervised feature selection algorithm using similarity-based feature clustering, Comput. Intell., 35 (1) (2019), pp. 2–22. doi:10.1111/coin.12192.
[9] P. Mitra, C. A. Murthy, and S. K. Pal, Unsupervised feature selection using feature similarity, IEEE Trans. Pattern Anal. Mach. Intell., 24 (3) (2002), pp. 301–312. doi:10.1109/34.990133.
[10] Q. Yu, S. Jiang, R. Wang, and H. Wang, A feature selection approach based on a similarity measure for software defect prediction, Front. Inf. Technol. Electron. Eng., 18 (11) (2017), pp. 1744–1753. doi:10.1631/FITEE.1601322.
[11] Y. Shi, C. Zu, M. Hong, L. Zhou, L. Wang, X. Wu, et al., ASMFS: Adaptive-similarity-based multi-modality feature selection for classification of Alzheimer's disease, Pattern Recognit., 126 (2022), 108566. doi:10.1016/j.patcog.2022.108566.
[12] X. Fu, F. Tan, H. Wang, Y. Zhang, and R. W. Harrison, Feature similarity based redundancy reduction for gene selection, In: Proceedings of the International Conference on Data Mining (DMIN), (2006), pp. 357–360.
[13] A. Vabalas, E. Gowen, E. Poliakoff, and A. J. Casson, Machine learning algorithm validation with a limited sample size, PLoS One, 14 (11) (2019), e0224365.
[14] G. L. Perry and M. E. Dickson, Using machine learning to predict geomorphic disturbance: The effects of sample size, sample prevalence, and sampling strategy, J. Geophys. Res. Earth Surf., 123 (11) (2018), pp. 2954–2970. doi:10.1029/2018JF004640.
[15] Z. Cui and G. Gong, The effect of machine learning regression algorithms and sample size on individualized behavioral prediction with functional connectivity features, Neuroimage, 178 (2018), pp. 622–637. doi:10.1016/j.neuroimage.2018.06.001.
[16] L. I. Kuncheva, C. E. Matthews, A. Arnaiz-González, and J. J. Rodríguez, Feature selection from high-dimensional data with very low sample size: A cautionary tale, arXiv, (2020). Available from: https://arxiv.org/abs/2008.12025.
[17] L. I. Kuncheva and J. J. Rodríguez, On feature selection protocols for very low-sample-size data, Pattern Recognit., 81 (2018), pp. 660–673. doi:10.1016/j.patcog.2018.03.012.
[18] J. Doak, An evaluation of feature selection methods and their application to computer security [Technical Report], CSE-92-18, (1992).
[19] H. Liu and L. Yu, Toward integrating feature selection algorithms for classification and clustering, IEEE Trans. Knowl. Data Eng., 17 (4) (2005), pp. 491–502. doi:10.1109/TKDE.2005.66.
[20] C. F. Tsai and Y. T. Sung, Ensemble feature selection in high dimension, low sample size datasets: Parallel and serial combination approaches, Knowl. Based Syst., 203 (2020), 106097. doi:10.1016/j.knosys.2020.106097.
[21] U. Mori, A. Mendiburu, and J. A. Lozano, Similarity measure selection for clustering time series databases, IEEE Trans. Knowl. Data Eng., 28 (1) (2015), pp. 181–195. doi:10.1109/TKDE.2015.2462369.
[22] M. Goldani, A review of time series similarity methods, In: Proceedings of the 3rd International Conference on Innovation in Business Management and Economics, (2022).
[23] S. Palkhiwala, M. Shah, and M. Shah, Analysis of machine learning algorithms for predicting a student's grade, J. Data Inf. Manag., 4 (2022), pp. 329–341. doi:10.1007/s42488-022-00078-2.
[24] A. C. Rencher and W. F. Christensen, Methods of Multivariate Analysis, 3rd ed., Hoboken: John Wiley & Sons, 2012.