Managing exposure to PM2.5 in urban environments, and dealing with the consequences of its concentration, requires accurate modeling of the pollutant's spatial-temporal changes. Such modeling in turn requires appropriate methods and complete, accurate data. These data are measured by different sensors with different accuracies and variability, and they contain missing values due to unavoidable factors such as sensor damage. Missing data cause problems such as reduced sample size and errors in data analysis, so the missing values must be estimated before the concentration of PM2.5 can be modeled. In this study, a method based on extra tree and decision tree models is proposed to impute the missing values of PM2.5, taking into account the relationships between variables while preserving their variability and natural uncertainty. Meteorological variables and other major pollutants (O3, PM10, CO, SO2, and NO2) were considered as effective variables for imputing the missing PM2.5 values. The meteorological variables, namely total precipitation, relative humidity, and temperature, were extracted from the model of the European Centre for Medium-Range Weather Forecasts (ECMWF). Besides effectively increasing the number of meteorological stations, the ECMWF model provides hourly data with very few missing values, whereas the limited number of available stations report at a three-hour resolution with many missing meteorological values. The results showed that, in imputing missing PM2.5 values, the extra tree method (average R² = 0.813) was more accurate than the decision tree method (average R² = 0.653), owing to its reduction of bias.
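The imputation step can be sketched in Python with scikit-learn. This is a minimal illustration under stated assumptions, not the authors' implementation: the column names, the synthetic data generator, the 30% hold-out split, and the hyperparameters are all hypothetical. Each model is trained only on rows where PM2.5 was observed, both are scored by R² on held-out observed values, and the gaps are then filled with extra tree predictions.

```python
import numpy as np
from sklearn.ensemble import ExtraTreesRegressor
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# Hypothetical predictor names mirroring the variables listed in the text.
FEATURES = ["temperature", "relative_humidity", "total_precipitation",
            "O3", "PM10", "CO", "SO2", "NO2"]


def make_synthetic_data(n=2000, missing_frac=0.2, seed=0):
    """Toy hourly data; PM2.5 is a made-up function of the predictors, NOT real data."""
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(n, len(FEATURES)))
    pm25 = (0.6 * X[:, 4] + 0.3 * X[:, 1] - 0.4 * np.tanh(X[:, 2])
            + 0.2 * X[:, 5] * X[:, 7] + 0.1 * rng.normal(size=n))
    y = pm25.copy()
    y[rng.random(n) < missing_frac] = np.nan  # knock out PM2.5 values at random
    return X, y


def build_models(seed=0):
    return {
        "decision_tree": DecisionTreeRegressor(random_state=seed),
        "extra_trees": ExtraTreesRegressor(n_estimators=200, random_state=seed),
    }


def compare_models(X, y, seed=0):
    """Hold out 30% of the *observed* PM2.5 values and report R^2 for each model."""
    observed = ~np.isnan(y)
    X_tr, X_te, y_tr, y_te = train_test_split(
        X[observed], y[observed], test_size=0.3, random_state=seed)
    scores = {}
    for name, model in build_models(seed).items():
        model.fit(X_tr, y_tr)
        scores[name] = float(r2_score(y_te, model.predict(X_te)))
    return scores


def impute_pm25(X, y, model):
    """Fit on rows with observed PM2.5, then predict only the missing entries."""
    observed = ~np.isnan(y)
    model.fit(X[observed], y[observed])
    y_filled = y.copy()
    if (~observed).any():
        y_filled[~observed] = model.predict(X[~observed])
    return y_filled


if __name__ == "__main__":
    X, y = make_synthetic_data()
    print(compare_models(X, y))  # R^2 per model on held-out observed values
    y_filled = impute_pm25(X, y, ExtraTreesRegressor(n_estimators=200, random_state=0))
    print(int(np.isnan(y_filled).sum()), "missing values left")
```

On synthetic data the R² values will not match the figures reported above; the sketch only shows how such a comparison could be set up. It assumes the predictors themselves are complete (roughly what hourly ECMWF meteorology gives for the meteorological columns), and it produces single point estimates, so it does not attempt the variability-preserving aspect described in the text.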
After the missing data were handled with the extra tree method, XGBoost was used to model the spatial-temporal changes of PM2.5 in different geographical contexts, because it evaluates the importance of the effective variables non-linearly; the aim was to increase accuracy and reduce computational cost.
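A hedged sketch of this second stage ranks the predictors with a gradient-boosted tree model and then refits on the top-k subset to lower the training cost. To keep dependencies light, it uses scikit-learn's `GradientBoostingRegressor` as a stand-in for XGBoost (`xgboost.XGBRegressor` also exposes `fit`, `predict`, and `feature_importances_`). The feature names, synthetic target, `k`, and hyperparameters are hypothetical, and spatial or temporal predictors (for example station coordinates or hour of day) are omitted for brevity.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

# Hypothetical predictor names mirroring the variables listed in the text.
FEATURES = ["temperature", "relative_humidity", "total_precipitation",
            "O3", "PM10", "CO", "SO2", "NO2"]


def make_synthetic_data(n=2000, seed=0):
    """Toy, already-imputed data; by construction PM2.5 depends mainly on PM10."""
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(n, len(FEATURES)))
    y = 2.0 * X[:, 4] + np.sin(X[:, 1]) + 0.1 * rng.normal(size=n)
    return X, y


def make_model(seed=0):
    # Stand-in for xgboost.XGBRegressor; hyperparameters are illustrative only.
    return GradientBoostingRegressor(n_estimators=200, max_depth=3,
                                     learning_rate=0.05, random_state=seed)


def rank_features(X, y, names, seed=0):
    """Fit a boosted model and return (name, importance) pairs, most important first."""
    importances = make_model(seed).fit(X, y).feature_importances_
    order = np.argsort(importances)[::-1]
    return [(names[i], float(importances[i])) for i in order]


def refit_on_top_k(X, y, names, k, seed=0):
    """Rank features on the training split only, keep the top k, and report held-out R^2."""
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=seed)
    ranking = rank_features(X_tr, y_tr, names, seed)
    keep = [names.index(name) for name, _ in ranking[:k]]
    model = make_model(seed).fit(X_tr[:, keep], y_tr)
    r2 = float(r2_score(y_te, model.predict(X_te[:, keep])))
    return [names[i] for i in keep], r2


if __name__ == "__main__":
    X, y = make_synthetic_data()
    for name, score in rank_features(X, y, FEATURES):
        print(f"{name:>20s}  {score:.3f}")
    print(refit_on_top_k(X, y, FEATURES, k=3))
```

Ranking on the training split only keeps the held-out R² honest. Refitting on a reduced feature set is one simple way to trade a little accuracy for lower training cost; the abstract does not state exactly how the study applied the importance scores.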
Type of Study: Research | Subject: GIS | Received: 2023/02/20
Haghbayan S, Tashayo B, Hosseinii M. Modeling spatial-temporal changes in PM2.5 concentration based on data imputation and the use of machine learning methods in different geographical contexts of the Tehran metropolis. JGST 2023; 12(4): 5. URL: http://jgst.issgeac.ir/article-1-1136-en.html