Abstract
Accurate prediction of equivalent primary energy use (EPEU) in buildings is crucial for energy-efficient design, optimized operation, and sustainability goals in the built environment. This study develops prediction models for EPEU using a comprehensive dataset of buildings in Portland, USA. A systematic machine learning framework is adopted, covering feature selection, data preprocessing, model training, and performance evaluation. Several machine learning algorithms are applied, including Random Forest (RF), Gradient Boosting Decision Tree (GBDT), and Back-Propagation (BP) neural networks. The models are trained on key features known to influence energy consumption, such as building type, gross floor area, construction year, and operational characteristics. The dataset is cleaned and normalized to support model generalizability and reduce bias. Model performance is assessed with the coefficient of determination (R²), Mean Absolute Error (MAE), and Root Mean Squared Error (RMSE). Among the tested models, ensemble learning methods, particularly RF and GBDT, consistently outperform the others in prediction accuracy, robustness, and stability across building types. The results highlight the potential of machine learning for energy prediction and offer actionable insights for architects, engineers, facility managers, and policymakers. By identifying the most influential variables and applying effective predictive models, this research supports data-driven decisions aimed at improving building energy performance.
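For readers unfamiliar with the three evaluation metrics named above, the following is a minimal sketch of their standard definitions in plain Python. It is not the authors' implementation; the function name `regression_metrics` and the sample values are hypothetical and serve only as illustration.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return R², MAE and RMSE for paired observed/predicted values.

    Hypothetical helper for illustration; uses the textbook definitions.
    """
    n = len(y_true)
    if n == 0 or n != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")

    # Residuals: observed minus predicted.
    errors = [t - p for t, p in zip(y_true, y_pred)]

    # MAE: mean of absolute residuals.
    mae = sum(abs(e) for e in errors) / n

    # RMSE: square root of the mean squared residual.
    ss_res = sum(e * e for e in errors)
    rmse = math.sqrt(ss_res / n)

    # R²: 1 minus residual sum of squares over total sum of squares.
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    r2 = 1.0 - ss_res / ss_tot if ss_tot > 0 else float("nan")

    return {"r2": r2, "mae": mae, "rmse": rmse}


# Made-up energy-use values (not taken from the Portland dataset).
observed = [100.0, 200.0, 300.0, 400.0]
predicted = [110.0, 190.0, 310.0, 390.0]
metrics = regression_metrics(observed, predicted)
print(metrics)
```

A higher R² and lower MAE and RMSE indicate a closer fit. RMSE penalizes large errors more heavily than MAE, which is one reason the two are commonly reported together.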

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.
Copyright (c) 2024 Yin Junjia, Aidi Hizami Alias, Nuzul Azam Haron, Nabilah Abu Bakar