JCSE, vol. 16, no. 4, pp.222-232, December, 2022

DOI: http://dx.doi.org/10.5626/JCSE.2022.16.4.222

A Study of Job Failure Prediction on Supercomputers with Application Semantic Enhancement

Haotong Zhang, Gang Xian, Wenxiang Yang, and Jie Yu
College of Information Science and Engineering, Chongqing Jiaotong University, Chongqing, China; Computational Aerodynamics Institute, China Aerodynamics Research and Development Center, Mianyang, China; Computational Aerodynamics Institute, China Aerodynamics Research and Development Center, Mianyang, China; State Key Laboratory of Aerodynamics, China Aerodynamics Research and Development Center, Mianyang, China

Abstract: The powerful computing capabilities of supercomputers play an important role in today's scientific computing. A large number of high-performance computing jobs are submitted and executed concurrently in the system. Job failures waste system resources and reduce the efficiency of both the system and user jobs. Job failure prediction can support fault-tolerance techniques that mitigate this problem on supercomputers. Existing work mainly predicts job failure from real-time performance attributes of jobs, but the high cost of collecting these attributes makes such methods difficult to apply in real environments. In addition to analyzing the time and resource attributes in the job logs, this study also explores the semantic information of jobs. We mine job application semantic information from job names and job paths, where the job path is collected by additional monitoring of the job submission process. A prediction method based on job application semantic enhancement is proposed, and the prediction results of non-ensemble and ensemble learning algorithms are compared under each evaluation indicator. The method requires only a small feature collection and computation overhead and is easy to apply. The experimental results show that job application semantic enhancement improves prediction performance: the final evaluation indicator, the S score, improves by 5%-6%, with the method reaching 88.16% accuracy, 95.23% specificity, and 88.24% sensitivity.
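
As a reading aid, the three reported indicators can be computed from a confusion matrix using their standard definitions. The following is a minimal sketch, not the authors' code: it assumes that a failed job is the positive class (label 1), and the function names `confusion` and `indicators` are hypothetical. The abstract does not define the S score, so it is not reproduced here.

```python
from typing import Dict, Sequence, Tuple


def confusion(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[int, int, int, int]:
    """Count (tp, fn, tn, fp), assuming label 1 = failed job, 0 = successful job."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    tp = fn = tn = fp = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == 1 and pred == 1:
            tp += 1  # failure correctly predicted
        elif truth == 1:
            fn += 1  # failure missed
        elif pred == 0:
            tn += 1  # success correctly predicted
        else:
            fp += 1  # success flagged as failure
    return tp, fn, tn, fp


def _ratio(numerator: int, denominator: int) -> float:
    """Return numerator / denominator, or 0.0 when the denominator is zero."""
    return numerator / denominator if denominator else 0.0


def indicators(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Standard accuracy, sensitivity, and specificity."""
    tp, fn, tn, fp = confusion(y_true, y_pred)
    return {
        "accuracy": _ratio(tp + tn, tp + fn + tn + fp),
        "sensitivity": _ratio(tp, tp + fn),  # share of failed jobs detected
        "specificity": _ratio(tn, tn + fp),  # share of successful jobs recognized
    }


if __name__ == "__main__":
    # Toy labels for illustration only; these are not data from the paper.
    truth = [1, 1, 1, 0, 0, 0, 0, 0]
    predicted = [1, 1, 0, 0, 0, 0, 0, 1]
    print(indicators(truth, predicted))
```

Under the assumed labeling, sensitivity measures how many failing jobs are caught, while specificity measures how rarely successful jobs are wrongly flagged.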

Keywords: Execution Efficiency; Job Failure Prediction; Application Semantic Enhancement; Machine Learning

ⓒ Copyright 2010 KIISE – All Rights Reserved.    